Application of protein structure predictions

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate


C702S027000, C702S031000, C436S086000, C436S089000

active

06377893

ABSTRACT:

INTRODUCTION
1. Field of the Invention
This invention relates to the area of bioinformatics, more specifically to methods for analyzing the sequences of evolutionarily related proteins, and most specifically for identifying evolutionary and functional relationships between proteins and the genes that encode them.
2. Background
Proteins are linear polypeptide chains composed of 20 different amino acid building blocks. Determining the sequence of amino acids in a protein is now experimentally routine, both by direct chemical analysis of the proteins themselves, and by translation of genes that encode proteins. The size of protein sequence databases will grow explosively over the next decade as genome sequencing projects are completed.
The polypeptide chain in a protein folds to give secondary structural units (most commonly alpha helices and beta strands) which then fold to give supersecondary structures (for example, a beta sheet or a strand-turn-helix) and a tertiary structure. These are collectively termed “conformation” or, more colloquially, the “fold”. Most behaviors of a protein are determined by the fold, including those that are important for allowing the protein to function in a living system. The folded structure must be known before pharmaceuticals can be rationally designed to bind to the protein, for example.
In principle, the linear polypeptide sequence, by providing the constitution of the protein, also determines all of its other properties, including secondary and tertiary structure, stability, interaction with other molecules, and, through these and other properties, biological activity. The connection between amino acid sequence and these other properties is not transparent, however. For example, some 30 years have been spent developing tools that allow the biochemist to predict the secondary structure of proteins from sequence data. Many of the classical approaches to predicting secondary structure from sequence, for example, were summarized in the disclosure of Ser. No. 07/857,224, filed Mar. 25, 1992, which is herein incorporated by reference.
In the mid-1970s, a relationship between evolutionary ancestry and protein conformation was established. Rossmann noted that lactate, glyceraldehyde-3-phosphate, and alcohol dehydrogenases, which act on quite different substrates, all have a domain that folds to give a parallel sheet flanked by helices (a “Rossmann fold”) [Rossmann, M. G., & Argos, P. (1976). Exploring structural homology of proteins. 105, 75-95].
It is now widely appreciated that homologous proteins can diverge so far that no significant sequence similarity remains between them, even though their overall folds may be the same. Since 1976, many have attempted to exploit the fact that homologous proteins share the same fold as a tool for predicting fold. Where the target protein was sufficiently similar in sequence to a protein of known conformation to establish homology with reasonable statistical confidence, “homology modeling” was used. Homology modeling is best defined strictly as a process for building a model of the conformation of a target protein that begins by identifying a homolog of the target whose conformation is known, and then uses that homolog as the starting point for modeling the conformation of the target (May & Blundell, 1995; Sali, 1995) [May, A. C. W., & Blundell, T. L. (1995). Automated comparative modelling of protein structures. 5, 355-360. Sali, A. (1995). Modeling mutations and homologous proteins. 6, 437-451.]
As is well known to those skilled in the art, sequence analysis becomes ineffective as a tool to establish homology once sequence identity between two homologous proteins drops below approximately 25% for a protein of typical length. In this range (the “twilight zone”), non-homologous sequences share the same level of sequence similarity with a target protein as homologous sequences do, making it impossible to determine from sequence data alone whether two proteins are homologous. Thus, while a high similarity score (corresponding to high sequence identity in an alignment with few gaps) is generally a strong indicator of homology, a low score is generally not a reliable indicator of non-homology. Many of the sequence analysis tools presently being developed attempt to extract evidence of homology from sequence data for proteins that have statistically marginal or sub-significant similarities, and to use this evidence to predict conformation.
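The 25% rule of thumb depends on how identity is counted. The following is a minimal sketch, assuming the pairwise alignment is already available as two equal-length gapped strings (with '-' marking a gap) and that identity is counted over gap-free columns only. The function names and the fixed 25% cutoff are illustrative assumptions. As the text notes, the real boundary also depends on protein length.

```python
"""Percent identity of a pairwise alignment, checked against the ~25% twilight zone.

Assumptions (illustrative only): the alignment is given as two equal-length
strings with '-' for gaps, and identity is computed over columns where
neither sequence has a gap. Other conventions give different numbers.
"""

TWILIGHT_ZONE_IDENTITY = 25.0  # approximate threshold from the text, in percent


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity over gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def in_twilight_zone(aligned_a: str, aligned_b: str) -> bool:
    """True when identity falls below the ~25% level at which sequence
    similarity alone no longer separates homologs from non-homologs."""
    return percent_identity(aligned_a, aligned_b) < TWILIGHT_ZONE_IDENTITY


if __name__ == "__main__":
    # Hypothetical toy alignment, not real protein data.
    a = "MKTAYIAKQR-QISFVKSHFSRQ"
    b = "MKSAYLAKQRGQLSFAKEHWARN"
    print(f"{percent_identity(a, b):.1f}% identity; twilight zone: {in_twilight_zone(a, b)}")
```

A result above the cutoff supports homology. A result below it is uninformative, not evidence against homology, which is the asymmetry the paragraph above describes.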
One approach for identifying long distance homologs when alignment scores are statistically marginal is “profile analysis” [Gribskov, M., McLachlan, A. D., Eisenberg, D. Profile analysis: Detection of distantly related proteins. Proc. Nat. Acad. Sci. 84, 4355-4358 (1987)]. In this approach, a set of sequences of members of a protein family is examined. The sequence similarities within this set must be sufficient to establish that its members are homologous and adopt the same fold. A multiple alignment of the sequences is constructed. Then, for each position in the multiple alignment, a position-specific scoring matrix is constructed using as input the amino acids found at that position in each aligned protein. The “profile” of the family is the collection of these matrices over every position of the alignment. The sequence of a protein that is a possible homolog of the family (but whose sequence is too dissimilar from that of any individual family member to give a statistically adequate score) is then matched against the profile and scored. If the score is high, the hypothesis that the protein is a homolog of the family is strengthened.
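Building and scoring a profile can be sketched in a few lines. This is a simplified log-odds variant of the idea, not the exact scheme of Gribskov et al. (1987), which derives column scores from a substitution matrix. The sketch assumes an ungapped multiple alignment, uniform background frequencies over the 20 amino acids, and a fixed pseudocount. The function names `build_profile`, `score_query` and `best_window_score` are hypothetical.

```python
"""Log-odds position-specific profile built from an ungapped multiple alignment.

Assumptions (illustrative only): equal-length aligned sequences, uniform
background frequencies, a simple additive pseudocount, and ungapped scoring.
"""
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency


def build_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Return one {amino acid: log-odds score} column per alignment position."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for pos in range(length):
        column = [seq[pos] for seq in alignment]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        # Observed frequency (with pseudocount) relative to background, in bits.
        profile.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def score_query(profile: list[dict[str, float]], query: str) -> float:
    """Sum position-specific scores for an ungapped query of profile length.
    Residues outside the 20 standard amino acids are scored as 0 (neutral)."""
    if len(query) != len(profile):
        raise ValueError("query length must match profile length")
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, query))


def best_window_score(profile: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Slide the profile along a longer sequence; return (best score, start offset)."""
    width = len(profile)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        score = score_query(profile, sequence[start:start + width])
        if score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    # Hypothetical toy "family", not real protein data.
    family = ["GKSTL", "GKTTL", "GRSTL", "GKSSL"]
    profile = build_profile(family)
    print(score_query(profile, "GKSTL"), score_query(profile, "WWWWW"))
    print(best_window_score(profile, "AAAGKSTLAAA"))
```

On the toy family, "GKSTL" scores well above "WWWWW", and the sliding scan locates the family-like window at offset 3. A high raw score is still only suggestive. As the next paragraph explains, genuine hits can be buried among false positives, so practical tools usually judge a score against the scores obtained from unrelated sequences.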
In practice, profile analyses identify many proteins in a database that are possible homologs, where the correct “hits” are buried in a large number of false positives. For this reason, profile analysis is virtually useless as a tool for excluding the possibility that two proteins are homologous, or contain the same core fold.
Another approach for identifying long distance homologs when alignment scores are statistically marginal is to search for sequence “templates” or “motifs”. These are short segments of polypeptide chain that might be conserved over long evolutionary distances [Taylor, W. R. J. Mol. Biol. 188, 233-258 (1986); Taylor, W. R., Thornton, J. M. J. Mol. Biol. 173, 487-514 (1984); Wierenga, R. K., Terpstra, P., Hol, W. G. J. J. Mol. Biol. 187, 101-107 (1986)]. Here, the presence of analogous motifs in two protein sequences can be used to infer long distance homology between a target protein and a protein with known conformation. From this inference, the target protein can be modelled on the structure of the other. As with profile analysis, the presence of a template is not a reliable indicator of long distance homology and similar fold. Consider the first example presented in Ser. No. 07/857,224, protein kinase. Several groups had noted that this protein has the sequence motif Gly-Xxx-Gly-Xxx-Xxx-Gly, where Xxx is any amino acid [Sternberg, M. J. E., Taylor, W. R. Modeling the ATP binding site of oncogene products, the epidermal growth-factor receptor and related proteins. FEBS Lett. 175, 387-392 (1984)]. It was further noted that a similar motif is found in adenylate kinase, whose crystal structure was known, and it was therefore proposed that the two proteins are homologous. From this proposal, it was deduced in the literature that protein kinase would adopt the same fold as adenylate kinase. Ser. No. 07/857,224 argued that this deduction was incorrect, and it was later shown to be incorrect experimentally [Knighton, D. R., Zheng, J., Ten Eyck, L. F., Ashford, V. A., Xuong, N. H., Taylor, S. S., Sowadski, J. M. Crystal structure of the catalytic subunit of cyclic adenosine monophosphate-dependent protein kinase. Science 253, 407-414 (1991)].
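A motif search of the kind described above amounts to pattern matching on the sequence string. The following sketch is a minimal illustration that encodes Gly-Xxx-Gly-Xxx-Xxx-Gly as a Python regular expression. 'G' stands for glycine and '.' stands for Xxx. A zero-width lookahead is used so that overlapping occurrences are all reported. The function name `find_motif` and the toy sequences are hypothetical. The sketch also shows why a motif hit alone is weak evidence. Assume, for simplicity, that each residue is independently glycine with probability 1/20. Then any given start position matches with probability (1/20)^3 = 1/8000. A database containing millions of residues will therefore contain many chance matches.

```python
"""Scan protein sequences for the Gly-Xxx-Gly-Xxx-Xxx-Gly motif.

A minimal sketch: 'G' is glycine, '.' is Xxx (any amino acid). The pattern
sits inside a lookahead so overlapping matches are all found.
"""
import re

# Gly-Xxx-Gly-Xxx-Xxx-Gly as a regex; the zero-width lookahead permits overlaps.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_motif(sequence: str) -> list[int]:
    """Return 0-based start positions of every Gly-Xxx-Gly-Xxx-Xxx-Gly match."""
    return [m.start() for m in GXGXXG.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequences, not real protein data.
    seqs = {
        "toy_1": "MALGEGSFGKVLAR",
        "toy_2": "MSTNPKPQRKTKRNTNRR",
    }
    for name, seq in seqs.items():
        print(name, find_motif(seq))
```

Finding the same pattern in two proteins shows only that both contain it. As the protein kinase and adenylate kinase case illustrates, shared motifs do not establish homology or a shared fold.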
Further, motif analysis has not (prior to Ser. No. 07/857,224) been used as part of any tool to infer t
