Image analysis – Pattern recognition – Template matching
Reexamination Certificate
1998-04-30
2001-08-28
Johns, Andrew W. (Department: 2621)
Image analysis
Pattern recognition
Template matching
Reexamination Certificate
active
06282318
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to a method and computer system for combining pattern matching and optimization, for example, providing an efficient match between the elements of two data files.
FIELD OF THE INVENTION
Our work relates to methodology and systems for combining disparate techniques, namely, pattern matching and optimization. This work is of interest in solving problems that can arise in the following three illustrative areas.
First, as companies merge and form partnerships that include the sharing of electronic data, it often becomes necessary to reconcile two or more databases that may contain data related to the same set of items. The individual databases may represent this data in different ways, for example, using different part numbers to identify the same item. Since these databases are typically very large, manually examining every record to determine the correct mapping is impractical. The consequences of an incorrect mapping can be severe, resulting in, for example, incorrect manufacturing processes or incorrect customer orders.
In addition to the manufacturing example cited above, there are numerous other instances in which it is desirable to compare two sets of elements and determine when an element in one set is the same as an element in another set. Consider the problem of merging two mailing lists, in which there may be slight differences in the spelling or punctuation of names and addresses. To reduce mailing costs, duplicates should be merged into a single entry, but to ensure completeness of the mailing list, and the accuracy of any additional data associated with the persons on the mailing list (e.g., buying history) it is important to avoid merging records associated with different individuals.
A third example occurs when lists of people are merged. For example, two agencies might wish to merge client or customer lists in order to detect fraud or incomplete data. For example, a court system might want to compare its list of potential jury members with voter rolls or tax rolls. If the court system finds a voter or tax payer who is not in the list of potential jury members, it represents a person who might be added to the list of potential jury members.
SUMMARY OF THE INVENTION
We have now discovered novel methodology and system construct that can combine the disparate technologies of pattern matching and optimization, to an end of addressing and solving the foregoing illustrative problems arising in the contemplated situation of reconciling two or more databases.
In particular, we disclose a method and system for efficiently computing a full or partial matching, that is, a one-to-one mapping, between two sets of elements, preferably based on one or more attributes associated with each of the elements. Our method is especially applicable in cases where there is more than one candidate match for some of the elements; in this case, our method can produce a matching, or partial matching, that is unlikely to have incorrect matches.
In a first aspect, the present invention discloses a method for matching the elements of two data files utilizing a programmable digital computer for performing the steps of:
a) reading the data elements and corresponding attributes for each of the two data files;
b) performing pattern matching on the elements and the corresponding attributes of each of the two files read in step a;
c) performing optimization on the results of step b for finding a best total or partial matching of the elements of the two files; and
d) outputting a file selected from the group consisting of the matches produced by step c, and a file containing the elements that are not matched.
In a second aspect, the present invention discloses a computer system suitable for matching the elements of two data files, comprising:
means for reading the data elements and corresponding attributes for each of the two data files;
means for performing pattern matching on the elements and the corresponding attributes of each of the two files read;
means for performing optimization on the results for finding a best total matching of the elements of the two files; and
means for outputting a file selected from the group consisting of the matches produced, and a file containing the elements that are not matched.
In a third aspect, the present invention discloses a method for matching the elements of two data files utilizing a programmable digital computer for performing the steps of:
a) reading the data elements and corresponding attributes for each of the two data files;
b) using relevant attributes for computing a similarity measure between some elements in the first file and some elements in the second file;
c) constructing using the similarity measure a set of possible matches for at least one element in one data file, the set of possible matches comprising elements from the other data file;
d) selecting an element that has at least one potential match and choosing from among the potential matches a best possible match;
e) deleting the selected element in step d from its data file, and the matched element in step d from each set of possible matches;
f) repeating step d and step e until a stopping criterion is met; and
g) outputting a file selected from the group consisting of the matches produced by steps d and e, and a file containing the elements that are not matched along with their remaining potential matches.
For this third aspect of the present invention, we note the following preferred embodiments.
The attributes in step b preferably are matched using a similarity measure selected from the group consisting of numerical distance, lexicographic distance, Euclean or non-Euclidean metric, and perceptual distance.
The elements in the first data file may be matched with elements in the second data file using a combination of similarity measures that apply to the elements' attributes.
Step c may comprise constructing a set of possible matches based on the similarity measure exceeding a predefined minimum value, or alternatively, having a predefined maximum number of elements comprising the elements with the largest similarity values.
With respect to step d, the method may comprise selecting the element that has the fewest possible matches. Also the method may comprise selecting the element that has the match with the highest similarity measure. Alternatively, step d may comprise selecting the element that has the greatest difference between the similarity measure of the match with the highest similarity measure and the similarity measure of the match with the second highest similarity measure.
With respect to the step f stepping criteria, one may alternatively:
f) stop when the only matches that are left have a similarity measure that is below a predefined minimum value;
f) stop when the only elements that are left have two or more matches and the difference between the similarity measure of the match with the highest similarity measure and the match with the second highest similarity measure is less than a predefined minimum value;
f) stop when alternate f
1
applies to all elements that only have 1 possible match and alternate f
2
applies to all other elements;
f) stop when a predetermined amount of time has elapsed;
f) stop in accordance with a predetermined amount of time measured by processor time;
f) stop in accordance with a predetermined amount of time measured by elapsed wall-clock time; or
f) stop when a predefined percentage of elements in either the first set or the second set has been matched.
REFERENCES:
patent: 4561105 (1985-12-01), Crane et al.
patent: 5588028 (1996-12-01), Parizhsky
patent: 5784490 (1998-07-01), Akra et al.
patent: 5793888 (1998-08-01), Delanoy
patent: 5915250 (1999-06-01), Jain et al.
Dietrich Brenda Lynn
Dietrich, Jr. Walter C.
Azarian Seyed
International Business Machines - Corporation
Johns Andrew W.
Kaufman, Esq. Stephen C.
McGinn & Gibb PLLC
LandOfFree
Method and system for combining pattern matching and... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and system for combining pattern matching and..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and system for combining pattern matching and... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2552004