Reexamination Certificate
1998-05-01
2003-01-28
Brusca, John S. (Department: 1631)
Data processing: measuring, calibrating, or testing
Measurement system in a specific environment
Biological or biochemical
C435S004000
active
06512981
ABSTRACT:
TECHNICAL FIELD
This invention relates to a computer-assisted method for identifying protein sequences that fold into a known three-dimensional structure, and more particularly to a computer-assisted method for assigning an amino acid probe sequence to a known three-dimensional protein structure.
BACKGROUND
Proteins (or polypeptides) are linear polymers of amino acids. The polymerization reaction which produces a protein results in the loss of one molecule of water from each amino acid, and hence proteins are often said to be composed of amino acid “residues.” Natural protein molecules may contain as many as 20 different types of amino acid residues, each of which contains a distinctive side chain. The particular linear sequence of amino acid residues in a protein defines the primary sequence, or primary structure, of the protein. The primary structure of a protein can be determined with relative ease using known methods.
Proteins fold into a three-dimensional structure. The folding is determined by the sequence of amino acids and by the protein's environment. Examination of the three-dimensional structure of numerous natural proteins has revealed a number of recurring patterns, or secondary structure. Secondary structures known as alpha helices, parallel beta sheets, and anti-parallel beta sheets are the most commonly observed. A description of such secondary structures is provided by Dickerson, R. E., et al. in The Structure and Action of Proteins, W. A. Benjamin, Inc., Calif. (1969). The helices, sheets, and turns of a protein's secondary structure pack together to produce the folded three-dimensional, or tertiary, structure of the protein.
In the past, the three-dimensional structure of proteins has been determined in a number of ways. Perhaps the best known way of determining protein structure involves the technique of x-ray crystallography. A general review of this technique can be found in Physical Biochemistry, Van Holde, K. E. (Prentice-Hall, N.J. 1971), pp. 221-239, or in Physical Chemistry with Applications to the Life Sciences, D. Eisenberg & D. C. Crothers (Benjamin Cummings, Menlo Park 1979). Using this technique, it is possible to elucidate three-dimensional structure with good precision. Additionally, protein structure may be determined by neutron diffraction or by nuclear magnetic resonance (NMR). See, e.g., Physical Chemistry, 4th Ed., Moore, W. J. (Prentice-Hall, N.J. 1972) and NMR of Proteins and Nucleic Acids, K. Wüthrich (Wiley-Interscience, NY 1986).
The biological properties of proteins depend directly on the protein's three-dimensional (3D) conformation. The 3D conformation determines the activity of enzymes, the capacity and specificity of binding proteins, and the structural attributes of receptor molecules. Because the three-dimensional structure of a protein molecule is so significant, it has long been recognized that a means for readily determining a protein's three-dimensional structure from its known amino acid sequence would be highly desirable. However, it has proved extremely difficult to make such a determination. One difficulty is that each protein has an astronomical number of possible conformations (about 10^16 for a small protein of 100 residues; see K. A. Dill, Biochemistry, 24, 1501-1509, 1985), and there is no reliable method for picking the one conformation stable in aqueous solution. A second difficulty is that there are no accurate and reliable force laws for the interaction of one part of a protein with another part, and with water. Proteins exist in a dynamic equilibrium between a folded, ordered state and an unfolded, disordered state. These and other factors have contributed to the enormous complexity of determining the most probable relative 3D location of each residue in a known protein sequence.
The protein folding problem, the problem of determining a protein's three-dimensional tertiary structure from its amino acid sequence, or primary structure, has defied solution for over 30 years. In the last decade, however, the increase in the number of known protein sequences, and the fact that many sequences have been found to fold into the same basic three-dimensional structure, have focused attention on a related problem: the inverse protein folding problem. The inverse protein folding problem asks: given a known three-dimensional protein structure, which amino acid sequences fold into that structure?
As a result of the molecular biology revolution, the number of known protein sequences is about 50 times greater than the number of known three-dimensional protein structures. This disparity hinders progress in many areas of biochemistry because a protein sequence has little meaning outside the context of the three-dimensional structure. The disparity is less severe than the numbers might suggest, however, because different proteins often adopt similar three-dimensional folds. As a result, each new protein structure can serve as a model for other protein structures. These structural similarities occur because the current array of protein structures probably evolved from a small number of primordial folds. If the number of folds is indeed limited, it is possible that x-ray crystallographers and NMR spectroscopists may eventually describe examples of essentially every fold. In that event, protein structure prediction theoretically would reduce, at least in crude form, to the inverse protein folding problem—the problem of identifying which fold in this limited repertoire a particular amino acid sequence adopts. Thus, protein fold recognition aims to assign each new amino acid sequence to the known 3D fold that the sequence most closely resembles.
The inverse protein folding problem is most often approached by seeking sequences that are similar to the sequence of a protein whose structure is known. If a sequence relationship can be found, it can often be inferred that the protein of known sequence but unknown structure adopts a fold similar to the protein of known structure. The strategy works well for closely related sequences, but structural similarities can go undetected as the level of sequence identity drops below about 25 percent.
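The sequence-identity figure mentioned above can be made concrete with a short sketch. The function names, the gap character, and the choice to count only gap-free aligned columns in the denominator are assumptions made for illustration; they are not taken from the patent, and other definitions of percent identity are in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over gap-free columns of a pairwise alignment.

    Assumes the two strings are already aligned (equal length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def above_identity_cutoff(aligned_a: str, aligned_b: str, cutoff: float = 25.0) -> bool:
    """True when identity is at or above the cutoff (about 25 percent in the text)."""
    return percent_identity(aligned_a, aligned_b) >= cutoff
```

For example, `above_identity_cutoff("AAAAA", "ACDEF")` compares five columns with one match (20 percent) and so falls below the cutoff, which is the regime where the text says structural similarity can go undetected by sequence comparison alone.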
A more direct attack on the inverse protein folding problem has been to search for sequences that are compatible with a given structure. In this “tertiary template” method, the backbone of a known protein structure (the amino acid residues less the side chains) is kept fixed, and the side-chains in the protein core are then replaced and tested combinatorially by computer to find which combinations of new side-chains could fit into the core. A set of core sequences is thereby enumerated that could in principle be tolerated in the protein structure. In this manner, the method of tertiary templates provides a direct link between possible three-dimensional structure and known sequence. See Ponder & Richards, J. Mol. Biol., 193, 775-791 (1987).
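The combinatorial enumeration described above can be sketched as a toy example. This is not the published tertiary template procedure, which tests steric packing of side-chains against a fixed backbone; here a single total-volume check stands in for the fit test. The residue volumes are rough illustrative values, and the function name, the five-letter alphabet, and the tolerance parameter are all hypothetical.

```python
from itertools import product

# Rough, illustrative residue volumes in cubic angstroms (assumed values,
# rounded for readability; not taken from the patent).
ILLUSTRATIVE_VOLUME = {"A": 90.0, "V": 140.0, "L": 165.0, "I": 165.0, "F": 190.0}


def enumerate_core_sequences(n_core_positions, core_volume, tolerance,
                             volumes=ILLUSTRATIVE_VOLUME):
    """List residue combinations whose summed volume fits the core cavity.

    The backbone is treated as fixed, so the only test applied is whether the
    total side-chain volume is within `tolerance` of `core_volume`.
    """
    accepted = []
    for combo in product(sorted(volumes), repeat=n_core_positions):
        total = sum(volumes[residue] for residue in combo)
        if abs(total - core_volume) <= tolerance:
            accepted.append("".join(combo))
    return accepted
```

With two core positions and a cavity of 330 cubic angstroms at zero tolerance, the sketch accepts only Leu/Ile pairs and the Val/Phe pairs. Note that the search grows as the alphabet size raised to the number of core positions, and that a fixed backbone with fixed spacing between core positions is built into this kind of enumeration.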
The rules used to relate one-dimensional amino acid sequences to possible three-dimensional structures in the tertiary template method may be excessively rigid. Proteins that fold into similar structures can have large differences in the size and shape of residues at equivalent positions. These changes are tolerated not only because of replacements or movements in nearby side-chains, but also as a result of shifts in the protein backbone. Moreover, insertions and deletions in the amino acid sequence, which are commonly found in related protein structures, are not considered in the implementation of tertiary templates. To describe realistically the sequence requirements of a particular fold, the constraints of a rigid backbone and a fixed spacing between core residues must somehow be relaxed.
Another approach, suggested by work done by one of the present inventors, is a profile method that characterizes the amino acid sequences of families of proteins aligned by sequence or structural similarities. The profile method builds a table of weighted values that reflect the frequency that
Eisenberg David
Fischer Daniel
Brusca John S.
Fish & Richardson P.C.
The Regents of the University of California