Paper
27 July 2001 Mining biological databases for candidate disease genes
Terry A. Braun, Todd Scheetz, Gregg Lewis Webster, Thomas L. Casavant
Author Affiliations +
Proceedings Volume 4528, Commercial Applications for High-Performance Computing; (2001) https://doi.org/10.1117/12.434869
Event: ITCom 2001: International Symposium on the Convergence of IT and Communications, 2001, Denver, CO, United States
Abstract
The publicly-funded effort to sequence the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has currently produced more than 93% of the 3 billion nucleotides of the human genome into a preliminary `draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000 - 40,000 human `genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high speed cluster of Linux of workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedle Syndrome (BBS).
© (2001) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Terry A. Braun, Todd Scheetz, Gregg Lewis Webster, and Thomas L. Casavant "Mining biological databases for candidate disease genes", Proc. SPIE 4528, Commercial Applications for High-Performance Computing, (27 July 2001); https://doi.org/10.1117/12.434869
Advertisement
Advertisement
RIGHTS & PERMISSIONS
Get copyright permission  Get copyright permission on Copyright Marketplace
KEYWORDS
Databases

Mining

Data acquisition

Proteins

Data mining

Genetics

Organisms

Back to Top