Gene discovery through comparisons of networks of structural...

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Chemical analysis

Reexamination Certificate

Details

C435S006120, C708S131000, C712S200000

active

06633819

ABSTRACT:

SPECIFICATION
The present specification contains a 3 page computer program listing which appears as a microfiche (3 frames) and 131 frames.
1. INTRODUCTION
The present invention relates to methods for identifying novel genes comprising: (i) generating one or more specialized databases containing information on gene/protein structure, function and/or regulatory interactions; and (ii) searching the specialized databases for homology or for a particular motif and thereby identifying a putative novel gene of interest. The invention may further comprise performing simulation and hypothesis testing to identify or confirm that the putative gene is a novel gene of interest.
Specifically, the present invention provides for the generation of specialized databases containing information on gene/protein structure, function and regulatory interactions based on the retrieval of such information from research articles and databases, and computer representation of such information in a manner that allows efficient access to the extracted information. The invention further provides for the use of the specialized databases for identifying novel genes based on detection of sequence similarities and domain/motif matches between genes/proteins, computation and interpretation of phylogenetic trees for multigene families, and analysis of homologous regulatory networks. The methods of the invention are based on the observation that functionally similar regulatory systems are generated during evolution by genetic duplication of ancestral genes. Thus, a comparison of homologous/similar networks within the same organism and between different species will allow the identification of genes absent in one of the systems under comparison. In this way genes that contribute to the phenotype of a specific disease associated with a particular biological system under analysis may be identified.
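The network comparison described above can be illustrated with a minimal sketch. It assumes each regulatory network has already been reduced to a mapping from gene name to gene family; the function name `missing_family_members`, the gene names, and the family labels are all hypothetical and are not taken from the specification. A family represented in one network but absent from its homologous counterpart marks a place where an undiscovered gene might be sought.

```python
from typing import Dict, Set


def missing_family_members(network_a: Dict[str, str],
                           network_b: Dict[str, str]) -> Set[str]:
    """Return gene families that occur in network_a but have no member in network_b.

    Each network maps a gene name to the gene family it belongs to.
    """
    families_a = set(network_a.values())
    families_b = set(network_b.values())
    return families_a - families_b


# Hypothetical toy data: two homologous pathways from different species.
pathway_species_1 = {
    "geneA1": "ligand",
    "geneB1": "receptor",
    "geneC1": "kinase",
    "geneD1": "transcription_factor",
}
pathway_species_2 = {
    "geneA2": "ligand",
    "geneB2": "receptor",
    "geneD2": "transcription_factor",
}

# Families known in species 1 with no known counterpart in species 2.
candidates = missing_family_members(pathway_species_1, pathway_species_2)
print(sorted(candidates))
```

In this toy case the only family without a counterpart in the second pathway is the kinase, so a kinase homolog would be the candidate to search for in the second species.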
2. BACKGROUND OF THE INVENTION
A variety of different methods are currently utilized for the identification and characterization of novel genes. Perhaps the most widely used method for generating large quantities of sequence information is high-throughput nucleotide sequencing of random DNA fragments. A disadvantage associated with this gene discovery technique is that in most instances when genes are identified their function is unknown.
For identification of specific disease genes, positional cloning is currently the most effective method. The positional cloning approach combines methods of formal genetics, physical mapping and mutation analysis and usually starts with a precise description of the disease phenotype and a tracing of the disease through families of affected individuals. Genetic linkage data obtained from the analysis of affected families frequently allows the determination of an approximate genomic localization of the candidate disease gene with a precision of several million nucleotides. Once localized, the genetically defined chromosomal region is then recovered from genomic libraries as a contiguous set of genomic fragments. Genes residing in the disease-related region are determined by analysis of transcripts that are transcribed from the genomic fragment. From this analysis an initial set of candidate genes for a particular disease is identified based on the presence of the gene product in the biological system affected by disease and a correlation between its expression pattern and the pattern of disease progression.
Important information for selection of candidate genes also comes from analysis of their homology with genes known to be part of the same or related biological system. Finally, the ultimate proof of association between a gene and a genetic disorder comes from mutational analysis of a gene in patients affected by the disorder and from demonstration of a statistical correlation between occurrence of mutation and the disease phenotype.
Although positional cloning is a powerful method for gene discovery, the experimental method is extremely tedious and expensive. Moreover, disease genes implicated in genetically complex disorders, i.e., those controlled by multiple loci, are difficult to find using this strategy because of the complications associated with multiple loci linkage analysis.
Specialized databases for homology searches have also been utilized in disease gene discovery projects. In recent years a number of efficient sequence comparison tools have been developed, such as the BLAST (Basic Local Alignment Search Tool) family of programs designed for comparison of a single “search sequence” with a database (see Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402), the family of Hidden Markov Model methods for comparison of a set of aligned sequences that usually represent a protein motif or domain with a database (e.g., Krogh et al., 1994, J. Mol. Biol. 235:1501-1531; Grundy et al., 1997, Biochem. Biophys. Res. Commun. 231:760-6) and various other comparison tools (Wu et al., 1996, Comput. Appl. Biosci. 12:109-118; Neuwald et al., 1995, Protein Sci. 4:1618-1632; Neuwald, 1997, Nucleic Acids Res. 25:1665-1677).
When used in disease gene discovery projects, homology searches can be enhanced by creating specialized databases that utilize statistical analysis for evaluating the significance of sequence similarities when new sequences are compared with a database of known sequences. This statistical evaluation is sensitive to the size of the database used (Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1997, Nucleic Acids Res. 25:3389-3402), so that the same level of homology between a search sequence and a database sequence can be determined to be highly significant if the search sequence is compared with a smaller database, or insignificant, and thus undetectable, if the search sequence is compared with a larger database.
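The dependence of significance on database size can be shown with a small numerical sketch. It assumes the standard bit-score form of the expected number of chance hits used for BLAST-style searches, E = m · n · 2^(−S'), where m is the query length, n is the total database length and S' is the bit score; edge-effect corrections to the lengths are ignored. The function name `expected_hits` and all numbers are hypothetical.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments with at least this bit score.

    Uses E = m * n * 2**(-S'), ignoring effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# The same alignment (bit score 40, query of 300 residues) evaluated against
# a small specialized database and a large general-purpose one.
small_db = expected_hits(bit_score=40.0, query_len=300, db_len=1_000_000)
large_db = expected_hits(bit_score=40.0, query_len=300, db_len=1_000_000_000)

print(f"E-value, small database: {small_db:.2e}")
print(f"E-value, large database: {large_db:.2e}")
print(f"ratio: {large_db / small_db:.0f}")
```

Because E scales linearly with database length, a thousandfold larger database makes the same alignment a thousandfold less significant, which is the effect a small, specialized database is meant to avoid.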
As an alternative to standard homology searches, in projects oriented towards gene discovery, researchers usually have some a priori knowledge about the set of genes/proteins that might display important similarity to the unknown new gene. Therefore, selecting an a priori defined set of genes/proteins for comparison with new experimental sequences is a feasible and useful strategy. This strategy was successfully applied to search for homologs of disease genes in yeast and nematode genomes by Mushegian et al. (1997, Proc. Natl. Acad. Sci. USA 94:5831-5836).
Two homologous genes taken from different species that originate from the nearest common ancestor by speciation are referred to as orthologs, while any two genes that originate from a common ancestor via a series of events involving intragenomic duplications are called paralogs. Tatusov et al. (1994, Proc. Natl. Acad. Sci. USA 91:12091-12095) describe comparisons of proteins encoded by the genomes of different phylogenetic lineages and elucidation of consistent patterns of sequence similarities permitting the delineation of clusters of orthologous groups (COGs). Each COG consists of individual orthologous genes or orthologous groups of paralogs from different phylogenetic lineages. Since orthologs typically have the same function, the classification of known genes and proteins into clusters of orthologous groups permits the assignment of a function to a newly discovered gene or protein by merely classifying it into a COG. Although Tatusov describes a method for assigning a function to a newly discovered gene, he does not describe a method for predicting the existence of undiscovered genes. In addition, Yuan et al. attempted simultaneous reconstruction of a species tree and identification of paralogous groups of sequences and detection of orthologs in sequence databases (Yuan et al., 1998, Bioinformatics 143:285-289).
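The distinction between orthologs and paralogs can be made concrete with a minimal sketch of the reciprocal-best-hit heuristic, a common simplification for proposing ortholog pairs between two genomes. This is an assumption introduced for illustration: it is not the COG procedure of Tatusov et al., and the function names, gene names and similarity scores are hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


def best_hit(query: str, targets: Iterable[str],
             score: Callable[[str, str], float]) -> Optional[str]:
    """Return the target with the highest positive score against query, if any."""
    scored = [(score(query, t), t) for t in targets]
    scored = [(s, t) for s, t in scored if s > 0]
    return max(scored)[1] if scored else None


def reciprocal_best_hits(genes_a: Iterable[str], genes_b: Iterable[str],
                         scores: Dict[Tuple[str, str], float]) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a and b are each other's best-scoring hit."""
    genes_a, genes_b = list(genes_a), list(genes_b)

    def score(x: str, y: str) -> float:
        # Scores are treated as symmetric; missing pairs count as no similarity.
        return scores.get((x, y), scores.get((y, x), 0.0))

    pairs = set()
    for a in genes_a:
        b = best_hit(a, genes_b, score)
        if b is not None and best_hit(b, genes_a, score) == a:
            pairs.add((a, b))
    return pairs


# Hypothetical similarity scores between genes of species A and species B.
toy_scores = {("a1", "b1"): 200.0, ("a2", "b1"): 120.0, ("a2", "b2"): 80.0}
orthologs = reciprocal_best_hits(["a1", "a2"], ["b1", "b2"], toy_scores)
print(orthologs)
```

Here `a1` and `b1` select each other and are proposed as orthologs, while `a2` is most similar to `b1` without being selected in return, which is the pattern expected if `a2` arose by duplication within species A, i.e. as a paralog of `a1`.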
Other groups have aimed at capturing interactions among molecules through the use of programs designed to compare structures and functions of proteins (Kazic, 1994, In: Molecular Modeling: From Virtual Tools to Real Problems, Kumosinski, T. and Liebman, M. N. (Eds.), American Chemical Society, Washington, D.C., pp. 486-494; Kazic, 1994, In:
