Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-02-16
2001-07-03
Choules, Jack (Department: 2177)
C707S793000, C707S793000, C435S069100, C435S068100, C702S022000
active
06256647
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method of searching a database of three-dimensional protein structures (hereinafter simply referred to as a “protein structure database”), and particularly to a method of searching a protein structure database through use of peripheral distributions of distance maps.
2. Description of the Related Art
The three-dimensional structure of a protein provides various kinds of information in terms of pharmacology and physical chemistry, as well as important information in terms of biology. With recent progress in structure determination techniques, the number of entries in protein structure databases has increased drastically. One technique for analyzing proteins is comparative analysis, in which similar structures are compared with one another. Comparative analysis requires a technique for searching a very large structure database for structures resembling a three-dimensional structure obtained by a researcher.
SUMMARY OF THE INVENTION
In view of the foregoing, an object of the present invention is to provide a method of searching a protein structure database with peripheral distributions of distance maps, in which a protein structure, which is three-dimensional information, is converted into one-dimensional information called a peripheral distribution and then subjected to a dynamic programming (DP) algorithm. The method can realize high-speed search with high detection sensitivity.
In order to achieve the above object, the present invention provides a method of searching a database of three-dimensional protein structures, comprising the steps of setting a three-dimensional protein structure; forming a two-dimensional distance map based on the three-dimensional protein structure; forming a one-dimensional peripheral distribution based on the distance map; and comparing the one-dimensional peripheral distribution with that for another three-dimensional protein structure by use of a dynamic programming algorithm.
Preferably, the distance map is a two-dimensional image structured as a triangular matrix in which the rows and columns correspond to the residues of a protein: the i-th row corresponds to the i-th amino acid residue counted from the N-terminal end, and the j-th column corresponds to the j-th amino acid residue counted from the N-terminal end. Each element (i, j) of the matrix corresponds to the distance between the α carbon of the i-th residue and the α carbon of the j-th residue. When the distance is smaller than or equal to a given threshold $r_0$, a dot is assigned to that element, and when the distance is greater than the threshold $r_0$, a blank space is assigned to that element; this operation is performed for each element in order to complete a binary distance map.
Preferably, the peripheral distribution is composed of a vertical peripheral distribution obtained in the form of a distribution of the frequency of dots at respective rows in a binary distance map and a horizontal peripheral distribution obtained in the form of a distribution of the frequency of dots at respective columns in the binary distance map.
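As a rough sketch of this projection step, the two distributions can be computed by summing the dots along rows and along columns. The function name `peripheral_distributions`, the 0/1 list-of-lists representation of the binary distance map, and the use of raw dot counts as the "frequency" are assumptions made for illustration.

```python
def peripheral_distributions(dmap):
    """Project a binary distance map onto its two axes.

    dmap: list of rows, each a list of 0/1 flags (1 = dot); assumed format.
    Returns (vertical, horizontal), where vertical[i] is the number of dots
    in row i and horizontal[j] is the number of dots in column j.
    """
    vertical = [sum(row) for row in dmap]
    horizontal = [sum(col) for col in zip(*dmap)]
    return vertical, horizontal


# Toy upper-triangular map, invented for illustration.
dmap = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
vertical, horizontal = peripheral_distributions(dmap)
print(vertical)    # [2, 2, 1, 0]
print(horizontal)  # [0, 1, 1, 3]
```

Because the map is triangular rather than symmetric, the row and column projections differ, which is why both are carried forward into the comparison.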
Preferably, for comparison between peripheral distributions, the alignment score obtained by the dynamic programming algorithm is used as the similarity between the corresponding protein structures.
A two-dimensional matrix, D, is required for the comparison of peripheral distributions. Each element of the matrix D is preferably obtained by solving the following recurrence equation:
$$D_{i,j} = \max\{\, D_{i-1,j-1} + s_{i,j},\; D_{i-1,j} - g,\; D_{i,j-1} - g \,\}$$
where
$s_{i,j}$ indicates the similarity between the i-th element of the peripheral distribution of protein A and the j-th element of the peripheral distribution of protein B; and
$g = 5$ is the gap penalty (however, $g = 0$ at the boundary).
Through the solution of the equation, the similarity is accumulated from the upper left corner toward the lower right corner of the matrix D, considering insertion and deletion. Then, the similarity between two peripheral distributions is obtained as a value for the element of the lower right corner of the matrix D.
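The recurrence can be sketched as a standard dynamic programming table fill. This is a hedged illustration only: the function name `alignment_score` and the pluggable `score` callback are hypothetical, and "g = 0 at the boundary" is read here as initializing the first row and first column of D to zero so that leading gaps are free. Other readings of that boundary condition are possible.

```python
def alignment_score(dist_a, dist_b, score, gap=5.0):
    """Fill the matrix D and return its lower-right element.

    dist_a, dist_b: sequences of peripheral-distribution elements.
    score: callable(elem_a, elem_b) -> similarity; stands in for s_ij.
    gap: gap penalty g (5 in the text).
    Assumption: the first row and column stay at 0, so gaps along the
    upper and left boundary are not penalized.
    """
    n, m = len(dist_a), len(dist_b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + score(dist_a[i - 1], dist_b[j - 1]),  # match
                D[i - 1][j] - gap,  # gap in B (deletion)
                D[i][j - 1] - gap,  # gap in A (insertion)
            )
    return D[n][m]


# Toy scoring function, invented for illustration: 10 for equal elements, else 0.
def toy(x, y):
    return 10.0 if x == y else 0.0


print(alignment_score([1, 2, 3], [1, 2, 3], toy))  # 30.0
print(alignment_score([1, 2, 3], [1, 3], toy))     # 15.0
```

In the second call, the best path matches 1 with 1, skips 2 at a cost of one gap penalty, and matches 3 with 3, giving 10 - 5 + 10.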
$s_{i,j}$ is obtained by the following equation:
$$s_{i,j} = \frac{a}{(N^{A}_{i} - N^{B}_{j})^{2} + b} + \frac{a}{(C^{A}_{i} - C^{B}_{j})^{2} + b}$$
where
$N^{A}_{i}$ indicates the i-th frequency of the vertical peripheral distribution of protein A;
$C^{A}_{i}$ indicates the i-th frequency of the horizontal peripheral distribution of protein A;
$N^{B}_{j}$ indicates the j-th frequency of the vertical peripheral distribution of protein B;
$C^{B}_{j}$ indicates the j-th frequency of the horizontal peripheral distribution of protein B; and
where a = 50 and b = 2.
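A short sketch of the element similarity follows, assuming the frequencies are plain numbers. The function name `element_similarity` and its argument names are hypothetical. The constants a = 50 and b = 2 come from the text.

```python
def element_similarity(n_a, c_a, n_b, c_b, a=50.0, b=2.0):
    """Similarity between position i of protein A and position j of protein B.

    n_a, c_a: vertical and horizontal frequencies of protein A at position i.
    n_b, c_b: vertical and horizontal frequencies of protein B at position j.
    Each term peaks at a / b when the two frequencies are equal and decays
    with the squared difference.
    """
    vertical_term = a / ((n_a - n_b) ** 2 + b)
    horizontal_term = a / ((c_a - c_b) ** 2 + b)
    return vertical_term + horizontal_term


print(element_similarity(3, 4, 3, 4))  # 50.0, identical frequencies
print(element_similarity(3, 4, 5, 4))  # vertical term drops to 50 / 6
```

With these constants the score is always positive and reaches its maximum of 50 when both frequencies agree; the constant b keeps the denominator away from zero.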
Preferably, a dot frequency R of a distance map is defined as follows:
$$R = \frac{\text{number of dot elements in the distance map}}{\text{total number of elements in the distance map}}$$; and
the threshold is determined such that the dot frequency R falls within a predetermined range, and thus the detection sensitivity is increased.
More preferably, the threshold is determined such that the dot frequency R falls within the range of 0.12 to 0.16.
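The text does not say how the threshold is searched for, so the following is only one possible sketch: scan a list of candidate thresholds and keep the first whose dot frequency R lands in the target range. The function names `dot_frequency` and `choose_threshold`, the candidate list, and the choice to count only the residue pairs with j > i as the "total number of elements" are all assumptions.

```python
import math
from itertools import combinations


def dot_frequency(ca_coords, r0):
    """R = dots / total elements, counted over residue pairs with j > i."""
    pairs = list(combinations(ca_coords, 2))
    if not pairs:
        return 0.0  # fewer than two residues: no elements to count
    dots = sum(1 for p, q in pairs if math.dist(p, q) <= r0)
    return dots / len(pairs)


def choose_threshold(ca_coords, candidates, low=0.12, high=0.16):
    """Return the first candidate r0 with low <= R <= high, or None."""
    for r0 in candidates:
        if low <= dot_frequency(ca_coords, r0) <= high:
            return r0
    return None


# Toy chain of 30 points spaced 1.0 apart on a line, invented for illustration.
coords = [(float(k), 0.0, 0.0) for k in range(30)]
print(choose_threshold(coords, [1.0, 2.0, 3.0]))  # 2.0
```

For the toy chain there are 435 pairs; a threshold of 2.0 marks 57 of them as dots, giving R of about 0.131, which falls inside the 0.12 to 0.16 range.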
Biomolecular Engineering Research Institute
Choules Jack
Le Debbie M.
Oblon & Spivak, McClelland, Maier & Neustadt P.C.