Method and apparatus for extracting and evaluating mutually...

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C436S063000, C702S027000

active

06370479

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a method and apparatus for extracting and evaluating mutually coinciding or similar portions between sequences of atoms or atomic groups in molecules and/or between three-dimensional structures of molecules and, particularly, to a method and apparatus for automatically extracting and evaluating mutually coinciding or similar portions between amino acid sequences in protein molecules and/or between three-dimensional structures of protein molecules.
2. Description of the Related Art
A gene is in substance DNA, and is expressed as a base sequence composed of four bases: A (adenine), T (thymine), C (cytosine), and G (guanine). There are about twenty types of amino acids constituting an organism, and it has been shown that arrangements of three bases correspond to the respective amino acids. Accordingly, it has been found that amino acids are linked in the order given by the base sequence of the DNA in the organism, and that a protein is formed by folding the resulting chain of amino acids. The arrangement of amino acids is expressed as an amino acid sequence in which, as in the base sequence, each amino acid is expressed by a letter.
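As a minimal sketch of the correspondence between arrangements of three bases and amino acids, the following Python fragment translates a base sequence into one-letter amino acid codes. The table is deliberately partial (a complete table has 64 entries), and the names `CODON_TO_AMINO_ACID` and `translate` are illustrative assumptions, not part of the invention.

```python
# Partial codon table for illustration only; a complete table has 64 entries.
CODON_TO_AMINO_ACID = {
    "ATG": "M",  # methionine
    "GCT": "A",  # alanine
    "TGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "GAA": "E",  # glutamic acid
}


def translate(base_sequence):
    """Translate a base sequence into one-letter amino acid codes.

    Only complete groups of three bases are read; a group that is missing
    from the partial table above is reported as "?".
    """
    usable_length = len(base_sequence) - len(base_sequence) % 3
    letters = []
    for start in range(0, usable_length, 3):
        codon = base_sequence[start:start + 3]
        letters.append(CODON_TO_AMINO_ACID.get(codon, "?"))
    return "".join(letters)


if __name__ == "__main__":
    print(translate("ATGGCTTGGAAA"))
```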
Methods for determining sequences of bases and amino acids have been established together with the development of molecular biology, and a huge amount of gene information, including base sequence data and amino acid sequence data, has therefore been stored. Thus, in the field of gene information processing, a core subject has been how to extract biological information concerning the structure and function of proteins from this huge amount of stored gene information.
A basic technique in extracting the biological information is to compare sequences. This is because it is considered that, if sequences are similar, their biological functions are similar as well. Accordingly, two techniques are presently studied: homology search, which estimates the function of an unknown sequence by searching a data base of sequences whose functions are known for sequences similar to the unknown one; and alignment, which rearranges the sequences so as to maximize the degree of analogy between them when researchers compare them.
Further, it is considered that a region of the sequence in which a function important for the organism is coded is preserved in the evolution process. For instance, a commonly existing sequence pattern (region) is known to be found when the amino acid sequences of proteins having the same function are compared between different types of organisms. This region is called a motif. Accordingly, if it is possible to extract the motif automatically, the property and function of a protein can be shown by finding which motif is included in its sequence. Further, automatic motif extraction is applicable to a variety of protein engineering fields, such as strengthening the properties of preexisting proteins, adding functions to preexisting proteins, and synthesizing new proteins. As described above, extracting the motif from the amino acid sequence can be considered an effective means of extracting the biological information. However, no extraction method has yet been established, and researchers currently decide manually which part is a motif sequence after the homology search and alignment.
A dynamic programming technique of the kind used in voice recognition processing has been the only method used for automatically comparing two amino acid sequences.
However, according to the method of comparing the amino acid sequences using the dynamic programming technique, the amino acid sequences are compared two-dimensionally. Thus, this method requires a large memory capacity and a long processing time.
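The two-dimensional nature of such a comparison can be illustrated with a minimal sketch of a dynamic programming comparison in Python. This is a generic global-alignment scoring scheme, not necessarily the exact technique the passage has in mind; the function name and the match, mismatch, and gap scores are assumptions chosen for illustration. The table holds one cell for every pair of prefix lengths, which is why memory and time grow with the product of the two sequence lengths.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Score two sequences by dynamic programming over a 2-D table.

    Returns (best_score, number_of_table_cells). The scoring values are
    illustrative assumptions, not values taken from the text.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # One cell for every pair of prefix lengths: (m + 1) x (n + 1) cells.
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap  # prefix of seq_a aligned against gaps only
    for j in range(1, cols):
        table[0][j] = j * gap  # prefix of seq_b aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = table[i - 1][j - 1] + (match if same else mismatch)
            table[i][j] = max(diagonal,
                              table[i - 1][j] + gap,   # gap in seq_b
                              table[i][j - 1] + gap)   # gap in seq_a
    return table[-1][-1], rows * cols


if __name__ == "__main__":
    print(global_alignment_score("ACDE", "ACE"))
```

For two sequences of a few hundred residues each, the table already holds tens of thousands of cells, which reflects the memory and processing-time cost noted above.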
Meanwhile, in the fields of physics and chemistry, in order to examine the properties of a new (unknown) substance and to produce the new substance artificially, three-dimensional structures of substances are determined by a technique such as X-ray crystal analysis or NMR analysis, and information on the determined three-dimensional structures is stored in a data base. As a typical data base, the PDB (Protein Data Bank), in which three-dimensional structures of proteins or the like identified by X-ray crystal analysis are registered, is widely known and universally used. Further, the CSD (Cambridge Structural Database) is known as a data base in which chemical substances are registered.
In a protein, a plurality of amino acids are linked to one another as a single chain, and this chain is folded in an organism to thereby form a three-dimensional structure. In this way, the protein exhibits a variety of functions. The respective amino acids are identified by numbering them from the N-terminal through the C-terminal. These numbers are called amino acid numbers, amino acid sequence numbers, or amino acid residue numbers. Each amino acid includes a plurality of atoms according to its type. Therefore, the PDB registers the names and administration numbers of proteins, the amino acid numbers of the amino acids constituting each protein, the types and three-dimensional coordinates of the atoms constituting the respective amino acids, and the like.
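As a rough illustration of how such registered items can be read, the sketch below parses one atom record in the fixed-column ATOM layout of the PDB file format, recovering the atom type, the amino acid name and number, and the three-dimensional coordinates. The function name `parse_atom_record` and the sample record (a hypothetical atom with made-up coordinates) are assumptions for illustration.

```python
def parse_atom_record(line):
    """Parse one fixed-column ATOM record of a PDB-format file."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "atom_name": line[12:16].strip(),      # type of atom, e.g. CA
        "residue_name": line[17:20].strip(),   # amino acid, e.g. MET
        "chain_id": line[21],
        "residue_number": int(line[22:26]),    # amino acid number
        "xyz": (float(line[30:38]),            # coordinates in angstroms
                float(line[38:46]),
                float(line[46:54])),
    }


# Hypothetical record, built with a format string so that every field
# lands in its fixed column range.
SAMPLE_LINE = (
    "ATOM  {:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   {:8.3f}{:8.3f}{:8.3f}"
    .format(1, " CA ", " ", "MET", "A", 1, " ", 1.0, 2.5, -3.25)
)

if __name__ == "__main__":
    print(parse_atom_record(SAMPLE_LINE))
```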
It is known from the results of chemical studies conducted thus far that the three-dimensional structure of a substance is closely related to its function, and the relationship between three-dimensional structure and function is examined through chemical experiments in order to modify a substance or to produce a substance having a new function. Particularly, since a structurally similar portion (or a specific portion) between substances having the same function is considered to influence the function of the substance, it is essential to discover a similar structure commonly existing in the three-dimensional structures.
However, since there is no method of extracting a characteristic portion directly from the three-dimensional coordinates, researchers are at present compelled to display the respective three-dimensional structures on a three-dimensional graphic system and to search for the characteristic portion manually. There is in general no method of determining an orientation of the substance, and thus the characteristic portion is searched for while rotating one substance using the other substance as a reference, which requires a substantial amount of time.
When a researcher searches for similar three-dimensional structures, an r.m.s.d (root mean square distance) value is used as a scale of the similarity of the three-dimensional structures of the substances. The r.m.s.d value is the square root of the mean square distance between the corresponding elements constituting the substances. Empirically, the substances are thought to be exceedingly similar to each other in the case where the r.m.s.d value between the substances is not greater than 1 Å.
For instance, it is assumed that there are substances expressed by a point set $A = \{a_1, a_2, \ldots, a_i, \ldots, a_m\}$ and a point set $B = \{b_1, b_2, \ldots, b_j, \ldots, b_n\}$, wherein $a_i$ ($i = 1, 2, \ldots, m$) and $b_j$ ($j = 1, 2, \ldots, n$) are vectors expressing positions of the respective elements in the three-dimensional space. The elements constituting these substances A and B are related to each other, and the substance B is rotated and moved so that the r.m.s.d value between the corresponding elements is minimized. For example, if $a_k$ is related to $b_k$ ($k = 1, 2, \ldots, n$), the r.m.s.d value is obtained in the following equation (1) wherein $U$ denotes a rotation matrix and $w_k$ denote respective weights:

$$\mathrm{r.m.s.d.} = \frac{\left( \sum_{k=1}^{n} w_k \left( U b_k - a_k \right)^2 \right)^{1/2}}{n} \tag{1}$$
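As a minimal sketch of how an r.m.s.d value can be evaluated for a given rotation, the Python fragment below rotates each element of B by a 3x3 matrix and accumulates the weighted squared distances to the corresponding elements of A. It follows the verbal definition given above (the square root of the mean square distance), so the sum is divided by the number of pairs inside the square root; that normalization, the function names, and the use of plain tuples and nested lists are assumptions of this sketch. It does not search for the minimizing rotation; the matrix is taken as given.

```python
import math


def rotate(u, point):
    """Apply a 3x3 rotation matrix (nested lists) to a 3-D point."""
    return tuple(sum(u[r][c] * point[c] for c in range(3)) for r in range(3))


def rmsd(points_a, points_b, u, weights=None):
    """Root mean square distance between corresponding points.

    points_a[k] is taken to correspond to points_b[k]; every point of B is
    rotated by u before the comparison. Weights default to 1 (assumption).
    """
    n = len(points_a)
    if n == 0 or n != len(points_b):
        raise ValueError("point sets must be non-empty and of equal size")
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for a, b, w in zip(points_a, points_b, weights):
        rotated = rotate(u, b)
        total += w * sum((rotated[i] - a[i]) ** 2 for i in range(3))
    return math.sqrt(total / n)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Rotation by 90 degrees about the z axis: (1, 0, 0) maps to (0, 1, 0).
ROTATE_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    a = [(0.0, 1.0, 0.0)]
    b = [(1.0, 0.0, 0.0)]
    print(rmsd(a, b, IDENTITY), rmsd(a, b, ROTATE_Z_90))
```

In the usage lines, the same pair of points gives a nonzero value under the identity matrix and zero once B is rotated onto A, which is the effect that the rotation is chosen to achieve.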
A technique of obtaining the rotation and movement of the substances that minimizes the r.m.s.d value between these corresponding points has been proposed by Kabsch et al. (for example, refer to “A Solution for the
