Method and apparatus for high-speed approximate sub-string...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C707S793000, C707S793000, C706S013000, C709S214000, C712S020000, C712S203000

Reexamination Certificate

active

06223175

ABSTRACT:

TECHNICAL FIELD
The present disclosure relates to sequence searching, and more specifically to computer implemented sequence searching using parallel processing for purposes such as the human genome effort.
BACKGROUND
In 1983, Doolittle and colleagues reported similarity between a newly discovered oncogene and a normal gene for growth. Based on this similarity, these researchers concluded that cancer is probably caused by mutation of an otherwise normal growth gene. This incident demonstrates the value of genetic database search techniques.
A pair of similar DNA sequences usually represent code for similar strands of amino acids and therefore express similar functions or structures. When a new strand of DNA is sequenced, the strand homology to a well-studied and well-documented strand of DNA stored in the DNA database usually provides the first clue as to the new strand's function. Instead of testing and analyzing the coded protein and generations of bacteria, biologists can search the database for similar sequences. The researchers can then design experiments to test the results of the search.
As a result of the enormous improvement of DNA sequencing technology, the rate of growth of the DNA database has grown exponentially over the last decade from 1.5 million nucleotides per year in 1989 to an expected 1.6 billion nucleotides per year in 1999. However, this boom in the genetic database poses serious problems for conventional database search methods. These conventional methods are based on heuristic or dynamic programming techniques which typically require time in the order is of N×M, where N is the length of the database sequence being searched and M is the length of the target sequence being searched for. Two examples of heuristic search techniques are FASTP described by D. J. Lipman, et al., in “Rapid and sensitive protein similarity searches”, Science 227: 1435-1441, 1985, and BLAST described by Stephen Altschul, et al., in “Basic local alignment search tool”, J. of Molecular Biology, 215:403-410, 1990. Dynamic programming techniques are described by T. F. Smith, et al., in “Identification of Common Molecular Subsequences”, J. of Molecular Biology, 147:195-197.
SUMMARY
The present disclosure describes methods and apparatus implementing a technique for searching in a source sequence for a target sequence using parallel processing. The technique preferably provides an indication of similarity between the target sequence and sub-strings within the source sequence.
In one aspect, the technique includes supplying each symbol of the target sequence to a corresponding processing element, where the processing elements are connected to form an array and at least one processing element supplies an accumulated score to a succeeding processing element in the array; supplying a symbol from the source sequence in parallel to all of the processing elements, so that each processing element receives the same source sequence symbol; determining a score based on the source sequence symbol and the processing element's target sequence symbol for each processing element; adding the score for a processing element to the accumulated score received by that processing element from a preceding processing element to generate a local score; comparing the local score to an accumulation value; when the local score is less than the accumulation value, supplying a default accumulation score to the succeeding processing element as the accumulated score; and when the local score is not less than the accumulation value, supplying the local score to the succeeding processing element as the accumulated score.
In another implementation, a system for comparing a target sequence of symbols having a first number of symbols against a source sequence of symbols having a second number of symbols, includes a processor array for calculating an accumulated score based on comparing symbols of the target sequence to symbols of the source sequence, and a result storage table connected to the processor array for storing accumulated scores. The processor array includes a plurality of processing elements, where: at least one processing element is connected to another processing element, all the processing elements receive a common symbol from the source sequence, and each processing element receives a different symbol from the target sequence.
The design of the processing elements provides a dense, inexpensive, and very fast solution to sub-string searching. These processing elements can provide inexpensive database searches at speeds of 100 million comparisons per second. In addition, with minimal modification, the processing elements can be modified to provide bidirectional searching, effectively doubling the number of comparisons.


REFERENCES:
patent: 5251131 (1993-10-01), Masand et al.
patent: 5335106 (1994-08-01), Paquin et al.
patent: 5432934 (1995-07-01), Levin et al.
patent: 5632041 (1997-05-01), Peterson et al.
patent: 5706498 (1998-01-01), Fujimiya et al.
patent: 5964860 (1999-10-01), Peterson et al.
patent: 6058385 (2000-05-01), Koza et al.
patent: 6108648 (2000-08-01), Lakshmi et al.
patent: 6137839 (2000-10-01), Mannering et al.
Huang H. et al., “Gene Alert—a sequence search results keyword parser,” IEEE Engineering in Medicine and Biology Magazine, Mar.-Apr. 1998, vol. 17, Issue: 2, pp. 119-122.*
Roos, Thomas et al., “A Data Structure of Approximate String Searching,” Proceedings of the Twenty-Seventh Annual Hawaii International Conference on System Science, Volume. V, Jan. 4-7, 1994, pp. 45-46.*
Sastry, Raghu et al., “A Systolic Array for Approximate String Matchine,” Proceedings of the 1993 IEEE International Conference on Computer Design: VLSI in Computers and Processors, ICCD '93, Oct. 3-6, 1993, pp. 402-405.*
Yoshikawa Takanobu et al., “On the Implementation of a Phylogenetic Tree Database,” 1999 IEEE Pacific Rim Conference on Communications, Computers and Signal Processings, Aug. 22-24, 1999 pp. 42-45.*
T.F. Smith and M. S. Waterman; Identification of Common Molecular Subsequences; Mar. 25, 1981; Journal of Molecular Biology, 147:195-197.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for high-speed approximate sub-string... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for high-speed approximate sub-string..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for high-speed approximate sub-string... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2529521

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.