Learning edit distance costs

Data processing: artificial intelligence – Machine learning

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06295524

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to data string comparisons and specifically to the learning of optimal costs for string edit distances. The invention has application in varied fields, including speech recognition, database retrieval and molecular biology to compare DNA sequences.
BACKGROUND OF THE INVENTION
In the art of edit distance matching or dynamic programming matching, it is known that if there are two data strings that are different, it is useful to measure how similar or different the data strings are. Edit distance between two finite strings over a finite alphabet finds the least costly way to transform the first string into the second string into the second string by single-symbol insertions, deletions and substitutions each with a predetermined cost stored in a table in memory.
These prior art methods can be thought of as a black box into which two strings are applied as inputs along with a predetermined cost table. The box's output is a number representing the least costly way to transform the first string into the second.
SUMMARY OF THE INVENTION
The present invention improves the process for determining the edit distance costs or similarity of two data strings by introducing a procedure optimizing or minimizing the cost based upon a learning algorithm relying upon examples of similar strings, i.e. learning data. The procedure begins with a computer, a training list of string pairs and a memory containing an initial cost table. An algorithm causes the memory to provide an improved cost table with reference to the training list and the initial cost table. In this manner, the cost table is learned (from the training list of string pairs) rather than being a predetermined fixed or engineered table of entries. The table output is improved based on the sense of the summer distance between each training pair.
This process may be repeated to further improve the cost table using the previously learned cost table as the initial cost table. Edit distance has many widespread applications, the present invention provides a simple method to learn costs which may improve performance in many applications.
A principal object of the invention is the provision of a method of learning costs of string edit distances.
Further and still other objects of the invention will become more clearly apparent when the following description is read in conjunction with the accompanying drawing.


REFERENCES:
patent: 4845610 (1989-07-01), Parvin
patent: 5459739 (1995-10-01), Handley et al.
patent: 5757959 (1998-05-01), Lopresti
patent: 5761538 (1998-06-01), Hull
patent: 5774588 (1998-06-01), Li
patent: 5832474 (1998-11-01), Lopresti et al.
D. Smith and E. Pierzchala, “An Algorithm and Architecture for Approximate String Matching,” Proc. 33rd Midwest Symposium on Circuits and Systems, vol. 2, pp. 736-739, Aug. 1990.*
H. Bunke and J. Csirik, “Inference of edit costs using parametric string matching,” Proc. 11th IAPR Int'l. Conf. on Pattern REcognition, pp. 549-552, Sep. 1992.*
E.J. Ciaccio, et al., “Biosignal Pattern Recognition and Interpretation Systems,” IEEE Engineering in Medicine and Biology, pp. 129-135, Feb. 1994.*
H. Bunke and J. Csirik, “Parametric String Edit Distance and its Application to Pattern Recognition,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 25(1), pp. 202-206, Jan. 1995.*
J.D. Golic and S.V. Petrovic, “Correlation Attacks on Clock-Controlled Shift Registers in Keystream Generators,” IEEE Trans. on Computers, vol. 46(4), pp. 482-486, Apr. 1996.*
H. Bunke and J. Csirik, “Edit Distance of Run-Length Coded Strings”, Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing: Technological Challenges of the 1990's, 137-43, Mar. 1992.*
A. Weigel and F. Fein, “Normalizing the Weighted Edit Distance”, Proceedings of the 12th IAPR International Conference on Pattern Recognition: Computer Vision & Image Processing, 399-402, Oct. 1994.*
A. Weigel et al., “Lexical Postprocessing by Heuristic Search and Automatic Determination of the Edit Costs”, Proceedings of the Third International Conference on Document Analysis and Recognition, 857-60, Aug. 1995.*
H. Bunke, “A Fast Algorithm for Finding the Nearest Neighbor of a Word in a Dictionary”, Proceedings of the Second International Conference on Document Analysis and Recognition, 632-37, Oct. 1993.*
D.T. Hoang, “Searcing Genetic Databases on Splash 2”, Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, 185-91, Apr. 1993.*
H. Bunke and J. Csirik, “Inference of Edit Costs Using Parametric String Matching”, Proceedings of the 11th IAPR International Conference on Methodology and Systems, 549-52, Aug. 1992.*
E. Vidal et al., “Fast Computation of Normalized Edit Distances”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 899-902, Sep. 1995.*
H. Bunke and J. Csirik, “Parametric String Edit Distance and its Application to Pattern Recognition”, IEEE Transactions on Systems, Man and Cybernetics, 202-06, Jan. 1995.*
A. Marzal and E. Vidal, “Computation of Normalized Edit Distance and Applications”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 926-32, Sep. 1993.*
Microsoft Press, Computer Dictionary: Third Edition, 451, 1997.*
Merriam Webster's Collegiate Dictionary: Tenth Edition, 161, 663, 1997.*
Joseph B. Kruskal et al., “An Overview of Sequence Comparison”, Time Warps, String Edits, and Macromolecules the Theory and Practice of Sequence Comparison, Addison-Wesley Pub. Co. 1983, pp. 1-43.
Hall et al., “Approximate String Matching”, Computing Surveys, vol. 12, 1980, pp. 381-402.
NEC Research Institute, Inc. Technical Report, “Finite Growth Models”, TR #96-074, Aug. 1996.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Learning edit distance costs does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Learning edit distance costs, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Learning edit distance costs will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2529015

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.