Reexamination Certificate
2000-01-25
2002-12-03
Zeman, Mary K. (Department: 1631)
Data processing: measuring, calibrating, or testing
Measurement system in a specific environment
Chemical analysis
C702S019000, C702S020000, C703S002000, C703S011000
active
06490532
ABSTRACT:
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark files or records, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
The invention relates to a computer-implemented method for determining all-atom, real-space protein structures.
BACKGROUND OF THE INVENTION
Protein sequences can be deduced from the DNA sequence of an organism, and the worldwide genome project has provided tens of thousands of new protein sequences. Proteins have flexible backbones and protruding, rotating side-chains. They can take up a countless number of shapes, i.e. conformations, in three-dimensional space. Yet proteins eventually fold into an ordered structure, the native conformation. The protein-folding problem reflects the inability to predict the native folded conformation of a protein given only its amino acid sequence.
Various methods have been described for solving the folding problem. The methods include direct and template-based methods. Direct methods try to determine the native conformation as a lowest energy point in some defined hyperspace of conformational possibilities. Template-based methods compare a sequence of unknown three-dimensional structure against a library of known three-dimensional structures and score good matches as likely folds. There is a substantive body of research literature on these methods, but successes are rare and often not reproducible. There has been a call for new computational methods that broadly explore conformational space and that are true to the details of protein structure (Dill, K. A. et al., Nature Structural Biology 4:10, 1997; Karplus, M., Fold Des. 2:S69, 1997).
SUMMARY OF THE INVENTION
The present inventors have developed a method to generate plausible random protein structures. All-atom proteins are made directly in continuous 3-dimensional space starting from primary sequence with an N to C directed build-up method. The method uses a novel pipelined residue addition approach in which the leading edge of the protein is constructed 3 residues at a time for optimal protein geometry, including the placement of cis proline. Build-up methods represent a classic N-body problem, expected to scale as N². When proteins become more collapsed, build-up methods are susceptible to backtracking problems which can scale exponentially with the number of residues required to back out of a trapped walk. Solutions to both these problems have been provided, using a multiway binary tree that makes the N-body problem of bump-checking scale as NlogN, and speeding up backtracking by varying the number of tries before backtracking based on available conformational space.
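The difference between the two scaling laws can be made concrete with a little arithmetic. The sketch below is only an idealized operation count, not a measurement of the method: it assumes that a naive check compares every item with every earlier item, and that a tree-based check costs about log2(N) node visits per item. The function names are illustrative.

```python
import math


def pairwise_checks(n: int) -> int:
    """Distance tests when each of n items is compared with all earlier items."""
    return n * (n - 1) // 2


def tree_checks(n: int) -> float:
    """Idealized count when each of n insertions costs about log2(n) node visits."""
    return n * math.log2(n)


if __name__ == "__main__":
    for n in (100, 1000, 2500):
        ratio = pairwise_checks(n) / tree_checks(n)
        print(f"N={n}: pairwise={pairwise_checks(n)}, "
              f"tree~{tree_checks(n):.0f}, ratio~{ratio:.0f}")
```

Under these assumptions the gap widens steadily as N grows, which is why the quadratic check dominates run time for long chains.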
In particular, the method constructs all-atom protein structures in O(NlogN) time (rather than in quadratic O(N²) time) by residue addition that is balanced in both speed and detail. The method employs the primary sequence and a multidimensional trajectory graph system, which directs the sampling of conformational space and behaves like the theoretical protein folding funnel. Trajectory graphs can direct either the random sampling of protein conformer space (the funnel “mouth”), or direct the reconstruction of a known protein backbone (the funnel “spout”). Several novel geometrical, methodological, and algorithmic approaches are introduced in the method. A schematic diagram of a method of the invention is shown in FIG. 1.
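One way to picture sampling from a trajectory graph is as drawing a cell from a discretized probability grid by binary search over cumulative weights. The sketch below is a minimal illustration under stated assumptions: the flattened 2 x 3 grid, the weights, and the names `build_cdf` and `sample_cell` are hypothetical and are not taken from the patent.

```python
import bisect
import itertools
import random


def build_cdf(weights):
    """Running totals of non-negative cell weights (a flattened grid)."""
    cdf = list(itertools.accumulate(weights))
    if not cdf or cdf[-1] <= 0:
        raise ValueError("weights must have a positive sum")
    return cdf


def sample_cell(cdf, rng):
    """Pick a cell index with probability proportional to its weight.

    The lookup is a binary search, so it costs O(log n) per draw.
    """
    u = rng.random() * cdf[-1]
    index = bisect.bisect_right(cdf, u)
    # Clamp guards against floating-point rounding of u up to the total.
    return min(index, len(cdf) - 1)


if __name__ == "__main__":
    n_cols = 3
    # Hypothetical 2 x 3 grid, flattened row by row; zero-weight cells are never drawn.
    weights = [0.0, 5.0, 1.0,
               0.0, 0.0, 4.0]
    cdf = build_cdf(weights)
    rng = random.Random(42)
    row, col = divmod(sample_cell(cdf, rng), n_cols)
    print("sampled cell:", row, col)
```

A sharply peaked grid restricts the draws to a few cells, while a flat grid spreads them widely, which loosely mirrors the "spout" and "mouth" of the funnel described above.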
The methods of the invention have been validated at both extremes of the folding funnel by comparison with polymer theory, and by reconstructing known proteins. In particular, random all-atom proteins generated using the E. coli genomic amino acid composition had radius of gyration statistics that showed the expected swelling compared to non-self-avoiding random polyalanine, and Flory's (12) theoretical curves approximate a lower bound for these results. For tests of protein fold reconstruction using nine different protein folds, an average RMSD of 0.63 Å was obtained for Cβ, C, N and O backbone atoms. WHAT_CHECK, a protein-structure checking software suite (30), validated that the method generates physically and chirally valid backbones and sidechains.
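The two validation statistics mentioned above have simple definitions, sketched here for reference. This is an illustration only: it assumes an unweighted (not mass-weighted) radius of gyration, and an RMSD computed on coordinates that have already been superimposed. The text does not say which conventions were used, so both choices are assumptions.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid (unweighted)."""
    if not coords:
        raise ValueError("coords must not be empty")
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    total = sum((p[i] - centroid[i]) ** 2 for p in coords for i in range(3))
    return math.sqrt(total / n)


def rmsd(a, b):
    """RMSD between two equal-length coordinate lists, with no superposition step."""
    if not a or len(a) != len(b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(radius_of_gyration(pts))
    print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```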
The binary-d tree is a new hierarchical data structure developed to deal with the O(N²) problem of atomic bump-checking (collision detection based on atomic radii). It permits overall O(NlogN) time complexity (validated out to N=2,500 amino acids), together with efficient backtracking. It utilizes a unique 3-dimensional tree that partitions space in a relative fashion unlike voxels used in an octree system. Branch and bound search methods on the binary-d tree can retrieve coordinates contained by probe volumes. The method allows atoms or sections of molecules to be moved without repartitioning the space occupied by the entire set of atoms. Binary searches are also used in the fitting of amino acid backbones between alpha-carbons, and in the random sampling of the trajectory graphs, which also contribute to the overall O(NlogN) performance of the method of the invention.
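The binary-d tree itself is not specified in this passage, so the sketch below substitutes a plain incremental k-d tree to illustrate the general idea of tree-based bump-checking with a branch and bound search. It is a stand-in, not the patented structure: it uses one cutoff distance instead of per-atom radii, does not support moving or removing points, and is only logarithmic on average when points arrive in no particular order. The class and method names are hypothetical.

```python
class Node:
    """One stored point plus the axis (0, 1 or 2) used to split space at it."""
    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis):
        self.point = point
        self.axis = axis
        self.left = None
        self.right = None


class KDTree:
    """Incremental 3-D k-d tree (a stand-in for the binary-d tree)."""

    def __init__(self):
        self.root = None

    def insert(self, point):
        """Add a point by walking down to an empty child slot."""
        if self.root is None:
            self.root = Node(point, 0)
            return
        node = self.root
        while True:
            side = "left" if point[node.axis] < node.point[node.axis] else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(point, (node.axis + 1) % 3))
                return
            node = child

    def any_within(self, probe, radius):
        """Return True if some stored point is closer than radius to probe."""
        r2 = radius * radius
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node is None:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(node.point, probe))
            if d2 < r2:
                return True
            diff = probe[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            stack.append(near)
            # Bound step: the far side can only hold a hit if the probe
            # sphere reaches across the splitting plane.
            if abs(diff) < radius:
                stack.append(far)
        return False


if __name__ == "__main__":
    tree = KDTree()
    for atom in [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]:
        tree.insert(atom)
    print(tree.any_within((3.8, 1.0, 0.0), 2.0))   # probe close to the second atom
    print(tree.any_within((3.8, 9.0, 0.0), 2.0))   # probe far from every atom
```

The pruning test is what avoids comparing the probe against every stored point: whole subtrees are skipped when they lie entirely beyond the cutoff on the splitting axis.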
Therefore, in accordance with an aspect of the invention, a method is provided for creating or identifying a conformation of a protein of known or unknown structure which comprises the steps of:
(a) providing an amino acid sequence of the protein;
(b) constructing a backbone structure of α-carbons of the protein by adding and removing carbon atoms through chain elongation and backtracking, wherein an atom is positioned based on a predicted two-dimensional space, and wherein backtracking removes an atom if it is closer to its neighbour than allowed by van der Waals radii;
(c) positioning β carbons, C, N, and O atoms to provide favourable bond lengths and bond angles; and
(d) positioning sidechain rotamers; thereby outputting a conformation of the protein.
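Step (b) can be illustrated with a toy chain-elongation loop that backtracks when it gets trapped. The sketch below is not the claimed method and makes several labeled assumptions: directions are drawn uniformly at random instead of from a predicted two-dimensional space, consecutive α-carbons sit 3.8 Å apart, a single hypothetical 4.0 Å cutoff stands in for van der Waals radii, the number of tries is fixed, and the bump check is a brute-force scan where a spatial tree would normally be used. Steps (c) and (d) are not covered.

```python
import math
import random

CA_CA = 3.8   # Å, typical distance between consecutive alpha-carbons
BUMP = 4.0    # Å, assumed minimum distance between non-adjacent alpha-carbons


def random_unit_vector(rng):
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def clashes(candidate, chain):
    """Brute-force bump check; the bonded predecessor (chain[-1]) is skipped."""
    return any(math.dist(candidate, p) < BUMP for p in chain[:-1])


def build_ca_trace(n_residues, rng, max_tries=50, max_steps=100_000):
    """Grow an alpha-carbon trace, removing the last atom when the walk is trapped."""
    if n_residues < 1:
        raise ValueError("n_residues must be at least 1")
    chain = [(0.0, 0.0, 0.0)]
    steps = 0
    while len(chain) < n_residues:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("gave up: too many elongation attempts")
        last = chain[-1]
        for _ in range(max_tries):
            d = random_unit_vector(rng)
            candidate = tuple(last[i] + CA_CA * d[i] for i in range(3))
            if not clashes(candidate, chain):
                chain.append(candidate)   # chain elongation
                break
        else:
            if len(chain) > 1:
                chain.pop()               # backtracking: undo the last placement
    return chain


if __name__ == "__main__":
    trace = build_ca_trace(50, random.Random(7))
    print(len(trace), "alpha-carbons placed")
```

In this toy version the number of tries is constant; the text describes varying it according to the available conformational space, which the sketch does not attempt.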
The method constructs the conformation of the protein in O(NlogN) time, and the conformation is constructed in real space and not confined to a lattice. The conformation is preferably an all-atom protein structure, including hydrogen atoms. The method may further comprise assembling different conformations of the protein to provide an ensemble of conformations of the protein. The ensemble may be incorporated in a database which may comprise from about 50,000 to 500,000 different conformations of the protein.
Another aspect of the invention is a computer-implemented process for identifying a conformation of a protein of known or unknown structure from an amino acid sequence of the protein. The steps of the process performed by the computer include (a) constructing a backbone structure of α-carbons of the protein by adding and removing carbon atoms through chain elongation and backtracking, wherein an atom is positioned based on a predicted two-dimensional space, and wherein backtracking removes an atom if it is closer to its neighbour than allowed by van der Waals radii; (b) positioning β carbons, C, N, and O atoms to provide favourable bond lengths and bond angles; and (c) positioning sidechain rotamers; thereby identifying a conformation of the protein.
Another aspect of the invention is part of a computer system for creating or identifying a conformation of a protein of known or unknown structure from an amino acid sequence of the protein. This part of the computer system includes (a) means for constructing a backbone structure of α-carbons of the protein by adding and removing carbon atoms through chain elongation and backtracking, wherein an atom is positioned based on a predicted two-dimensional space, and wherein backtracking removes an atom if it is closer to its neighbour than allowed by van der Waals radii; (b) means for positioning β carbons, C, N, and O atoms to provide favourable bond lengths and bond angles; and (c) means for positioning sidechain rotamers.
Another aspect of the invention is part of a computer system for identifying favorable areas of
Feldman Howard
Hogue Christopher
Merchant & Gould P.C.
Mount Sinai Hospital
Zeman Mary K.