Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-06-14
2004-04-06
Amsbury, Wayne (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
Reexamination Certificate
active
06718325
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to methods and apparatus for matching strings.
2. Background Art
Approximate string matching techniques are used in searching for strings that match a query term. Approximate string matching is an important task in applications such as spell checking in text processors, information retrieval in databases, net and latch mapping in computer-aided design, protein and nuclei sequence identification in molecular biology, and handwriting and speech recognition. Approximate string matching techniques involve finding occurrences of a pattern string P=p
1
p
2
. . . p
m
in a text string T=t
1
t
2
. . . t
n
, where t
i
, p
i
belong to some known alphabet. Approximate string matching techniques find all locations j in T such that there is a suffix of T=[1 . . . j] matching P with k or fewer differences, where k is greater than or equal to zero. When k is zero, the matching scheme is said to be exact. Approximating string matching techniques have been studied extensively in the field of computer science. See, for example, Amihood Amir and Martin Farach, “Efficient 2-dimensional approximate matching of non-rectangular figures,” Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms, 1991, pages 212-223, and Richard Cole and Ramesh Hariharan, “Approximate String Matching: A Simpler Faster Algorithm,” Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, 1998, pages 463-472.
Approximate string matching techniques involve computing the “edit distance” between two strings. The “edit distance” between two strings is the minimum number of insertions, deletions, and substitutions required to convert one string to the other. The objective of approximate string matching techniques is to determine the cost edit distance, i.e., the minimum number of edit operations, required to transform one string to the other. The most common method of computing cost edit distance is dynamic programming. See, for example, Gene Myers, “A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming” Journal of the ACM, Vol. 46, No. 3, May 1999, pages 395-415. The exact nature of dynamic programming is known and will not be discussed in this application. There are other methods for determining cost edit distance which do not involve dynamic programming. For example, U.S. Pat. No. 5,761,538 issued to Hull discloses a method for estimating cost edit distance which includes equalizing the lengths of two strings by adding padding elements to the shorter one of the strings. The two strings are sorted according to their element values. Then a sum of substitution costs of the elements in corresponding positions in the sorted strings are calculated. The sum of the substitution costs are then set as the lower bound estimate of the cost edit distance.
SUMMARY OF THE INVENTION
In one aspect, the invention is a method for comparing two delimited strings, each of which has a plurality of substrings. The method comprises pairing each substring in one of the delimited string with a corresponding substring in the other one of the delimited strings, computing a proximity value for each pair of substrings, and computing a set of decaying weights corresponding to the pairs of substrings. The method further comprises multiplying the proximity value for each pair of substrings by the corresponding weight and summing the weighted proximity values to obtain a strength of match between the delimited strings.
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.
REFERENCES:
patent: 4743458 (1988-05-01), Gellman et al.
patent: 5216627 (1993-06-01), McClellan et al.
patent: 5276616 (1994-01-01), Kuga et al.
patent: 5553272 (1996-09-01), Ranganathan et al.
patent: 5761538 (1998-06-01), Hull
patent: 5832480 (1998-11-01), Byrd et al.
patent: 6026398 (2000-02-01), Brown et al.
patent: 6038561 (2000-03-01), Snyder et al.
patent: 6076088 (2000-06-01), Paik et al.
patent: 6085186 (2000-07-01), Christianson et al.
patent: 6098034 (2000-08-01), Razin et al.
patent: 6137911 (2000-10-01), Zhilyaev
patent: 6377945 (2002-04-01), Risvik
R. Cole and R. Hariharan; “Appproximate String Matching: A Simpler Faster Algorithm,” Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms, 1998, pp. 463-472.
G. Myers, “A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming,” Journal of the ACM, vol. 46, No. 3, May 1999, pp. 395-415.
A. Amir and M. Farach, “Efficient 2-dimensional approximate matching of non-rectangular figures,” Proceedings of the second annual ACM-SIAM symposium on Discrete algorithms, 1991, pp. 212-223.
Amsbury Wayne
Rosenthal & Osha L.L.P.
Sun Microsystems Inc.
LandOfFree
Approximate string matcher for delimited strings does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Approximate string matcher for delimited strings, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Approximate string matcher for delimited strings will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3245831