Method for comparison of DNA base sequences

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C435S006120, C702S027000, C702S019000

active

06662115

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for comparing DNA base sequences and a method for searching for DNA base sequences. In particular, it relates to a method for high-sensitivity detection of similarities between DNA base sequences and a method for estimating the amino acid sequence coded for by a DNA base sequence.
2. Description of the Related Art
In recent years, it has become increasingly common to determine the DNA base sequences of various organisms and to analyze the function of the protein coded for by each DNA base sequence. A DNA base sequence is a sequence of four kinds of bases, A, C, G and T, and portions of the sequence code for biofunctional proteins. Proteins with important functions can be utilized, for example, in the design and development of drugs, so a technique for accurately estimating the function of the protein coded for by a DNA base sequence is desired. In general, determining a DNA base sequence is technically easier than sequencing a protein experimentally.
The function of a protein coded for by a newly determined DNA base sequence is estimated as follows. The DNA base sequence is translated into an amino acid sequence (which gives the protein sequence) by using the well-known codon table, which prescribes the starting point of translation, the terminating point of translation and the kind of each amino acid in units of nucleotide triplets (codons). The resulting amino acid sequence is then compared with data on a protein having a known function to judge whether the two proteins are similar.
In a DNA base sequence, the exon region, which codes for protein information, is the region to be translated into amino acids. Each codon is translated unambiguously into one amino acid. When the direction of translation and the translation starting point are known, the DNA base sequence can be translated into an amino acid sequence, i.e., a protein, by picking out successive triplets of nucleotides. However, if the DNA base sequence contains an error due to a nucleotide insertion or deletion, the codon boundaries in the exon region are shifted. Since the DNA base sequence is translated in codon units, the portion following the insertion or deletion is translated into completely different amino acids.
To compare two DNA base sequences at the amino acid level, each DNA base sequence must first be translated into its amino acid sequence, and the translated amino acid sequences are then compared.
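The effect of a single inserted base on codon-unit translation can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the names `CODON_TABLE` and `translate` and the example sequences are hypothetical and are not taken from the patent.

```python
from itertools import product

# Standard genetic code written in TCAG order for each codon position.
# '*' marks a stop codon. (Assumption: the standard table applies.)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate dna in codon units, starting at the given base offset.

    Trailing bases that do not fill a whole codon are ignored, and any
    codon containing a character other than A, C, G or T becomes 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


original = "ATGGCCAAGTTGACC"
# Hypothetical sequencing error: one extra 'A' inserted after the first codon.
with_insertion = original[:3] + "A" + original[3:]

print(translate(original))        # MAKLT
print(translate(with_insertion))  # MSQVD
```

Only the first codon survives the insertion; every later codon boundary moves by one base, so the remaining amino acids all change.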
FIG. 1 is a diagram illustrating 6 kinds of reading frames in a DNA base sequence in the translation of the DNA base sequence into an amino acid sequence [(first prior art): for example, reference 1: Biotechnology textbook series 11 “Introduction of Computer in Biotechnology” written by Haruki Nakamura and Kenta Nakai, pp. 66-67 (1995), CORONA PUBLISHING CO., LTD., Tokyo, Japan].
The 6 kinds of the reading frames are as follows:
Frame (1): a frame according to which a DNA base sequence is translated into an amino acid sequence as codon units from the 5′-terminal of the DNA base sequence.
Frame (2): a frame according to which the DNA base sequence is translated into an amino acid sequence as codon units while shifting the starting position of each codon by one base from that in frame (1).
Frame (3): a frame according to which the DNA base sequence is translated into an amino acid sequence as codon units while shifting the starting position of each codon by two bases from that in frame (1).
Frame (4): a frame according to which the translation of a sequence complementary to the DNA base sequence into an amino acid sequence as codon units is initiated from the 5′-terminal of the complementary sequence.
Frame (5): a frame according to which the complementary sequence is translated into an amino acid sequence as codon units while shifting the translation starting position by one base from that in frame (4).
Frame (6): a frame according to which the complementary sequence is translated into an amino acid sequence as codon units while shifting the translation starting position by two bases from that in frame (4).
From frame (1) to frame (3), the translation starting position is shifted base by base from the 5′-terminal. From frame (4) to frame (6), the translation starting position is shifted base by base from the 5′-terminal of the sequence complementary to the original DNA base sequence (the 3′-terminal of the original DNA base sequence). There are therefore six kinds of reading frames, (1) to (6). A DNA base sequence is translated into an amino acid sequence by employing each of frames (1) to (6), and the amino acid sequences translated from two DNA base sequences by employing the same frame are compared. Thus, 6 kinds, in all, of amino acid sequences translated from one of the DNA base sequences are compared with those translated from the other DNA base sequence.
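The six reading frames described above can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes the standard genetic code, and the helper names `reverse_complement`, `translate` and `six_frame_translations` are hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon (assumption).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written from its own 5' end."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate dna in codon units, starting at the given base offset."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Map frame numbers 1..6 to the amino acid sequence of each frame."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for shift in range(3):
        frames[shift + 1] = translate(forward, shift)  # frames (1) to (3)
        frames[shift + 4] = translate(reverse, shift)  # frames (4) to (6)
    return frames


for frame, protein in sorted(six_frame_translations("ATGGCCAAG").items()):
    print(frame, protein)
```

For the short example sequence, frame (1) yields `MAK` and frame (4), read from the complementary strand, yields `LGH`.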
As a typical program for searching for similar sequences, BLAST, developed by Altschul et al. of NCBI, a branch of the U.S. NIH, is widely known, and its source program has been disclosed (see, for example, reference 1, pages 141 to 143). The BLAST family includes BLASTN for comparing DNA base sequences; BLASTP for comparing amino acid sequences; BLASTX, which mechanically translates a DNA base sequence according to each of the above-mentioned 6 kinds of frames and searches an amino acid sequence data base with each of the 6 resulting amino acid sequences; and TBLASTX, which mechanically translates both a query DNA base sequence (a first DNA base sequence) and a DNA base sequence read out of a DNA base sequence data base (a target DNA base sequence, i.e., a second DNA base sequence) according to each of the 6 kinds of frames and compares the 36 combinations of the 6 amino acid sequences translated from the first DNA base sequence with the 6 translated from the second. In the BLAST family, high-speed pattern matching of a base sequence of definite length taken from the query DNA base sequence against the target DNA base sequence is carried out first, and a region similar to the query DNA base sequence is then detected on the basis of the position at which that base sequence of definite length is found in the target DNA base sequence.
In the Smith-Waterman method, each base of a query DNA base sequence is compared with each base of a target DNA base sequence, a score (similarity) suited to each combination of two bases is assigned, the scores are accumulated, and the path (alignment) that maximizes the accumulated score is sought [(third prior art): for example, reference 2: “Identification of Common Molecular Subsequences”, J. Mol. Biol., 147 (1981), pp. 195-197].
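The first stage described above, matching fixed-length base sequences from the query against the target, can be sketched as an exact word lookup. The sketch below is a simplified illustration and not BLAST's actual implementation; the function name `find_seed_hits` and the word length of 4 are hypothetical choices made only for the example.

```python
from collections import defaultdict


def find_seed_hits(query, target, word_length=4):
    """Return (query_pos, target_pos) pairs where a fixed-length word
    from the query occurs exactly in the target.

    word_length is an illustrative value, not a BLAST default.
    """
    # Index every word of the query by its starting position.
    index = defaultdict(list)
    for q in range(len(query) - word_length + 1):
        index[query[q:q + word_length]].append(q)

    # Scan the target once and look each of its words up in the index.
    hits = []
    for t in range(len(target) - word_length + 1):
        for q in index.get(target[t:t + word_length], ()):
            hits.append((q, t))
    return hits


print(find_seed_hits("ACGTAC", "TTACGTAA"))  # [(0, 2), (1, 3)]
```

Both hits in the example lie on the same diagonal (target position minus query position equals 2), which is the kind of positional information a later stage could use to locate a similar region.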
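A minimal Python sketch of this kind of score accumulation is given below. It is an illustration under stated assumptions, not the patent's method: the match, mismatch and gap values are hypothetical, a simple linear gap penalty is used, and only the maximum score and the cell where it occurs are returned (no traceback of the alignment path). The function name `smith_waterman_score` is also hypothetical.

```python
def smith_waterman_score(query, target, match=1, mismatch=-1, gap=-2):
    """Return (best_score, (x, y)) for the best local alignment.

    x indexes the query and y indexes the target, both 1-based, so the
    best alignment ends at query[x - 1] and target[y - 1].
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(target) + 1, len(query) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)

    for y in range(1, rows):
        for x in range(1, cols):
            pair = match if query[x - 1] == target[y - 1] else mismatch
            score[y][x] = max(
                0,                          # start a new local alignment
                score[y - 1][x - 1] + pair, # align the two bases
                score[y - 1][x] + gap,      # gap in the query
                score[y][x - 1] + gap,      # gap in the target
            )
            if score[y][x] > best:
                best, best_cell = score[y][x], (x, y)
    return best, best_cell


print(smith_waterman_score("ACGT", "TTACGTTT"))  # (4, (4, 6))
```

Clamping each cell at zero is what makes the alignment local: a poorly matching prefix cannot drag down the score of a well-matching region that follows it.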
In the third prior art, the combinations of bases from the two DNA base sequences are compared by a dynamic programming method, and scores between the two DNA base sequences are determined. When a DNA base sequence similar to a specific DNA base sequence of interest (hereinafter referred to as the “query DNA base sequence” or “first DNA base sequence”) is searched for in a DNA base sequence data base, a matrix is formed by arranging the bases of the query DNA base sequence (number of bases: M) in order from the 5′-terminal along a first axis (for example, the x-axis) and the bases of a DNA base sequence read out of the DNA base sequence data base (number of bases: N; hereinafter referred to as the “target DNA base sequence” or “second DNA base sequence”) in order from the 5′-terminal along a second axis (for example, the y-axis) (FIG. 2). In the present specification, such a matrix is hereinafter referred to as a “score matrix”.
FIG. 2


Profile ID: LFUS-PAI-O-3120452
