Reexamination Certificate
2000-06-26
2003-04-08
Mehta, Bhavesh (Department: 2625)
Image analysis
Pattern recognition
Template matching
C382S232000
active
06546136
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a document analysis system and more particularly to efficient techniques for matching one document to another.
Matching an electronic representation of one image to an electronic representation of another image is useful in many applications. For example, consider an automatic filing application in which document images are stored in directories that contain “similar” documents, where similarity is defined by the degree to which two images have significant areas in common.
There are several available approaches for matching document images. Most consist of two steps: feature extraction, followed by matching of the extracted features against those of document images in a database. An input image is matched to a database image if the two share a significant number of features.
The feature extraction technique is critical to the performance of the matching system. Ideally, feature extraction should be fast and memory-efficient, and should produce a unique representation of the input image. Uniqueness of the representation ensures that a given document image matches itself with high probability and matches no other document.
Prior art feature extraction techniques for document image matching rely on, for example, image texture, character transition probabilities, sequences of consecutive word lengths, invariant relationships between graphic elements of a document, and spacings between boxes surrounding connected sets of pixels. What is needed is a document matching system based on feature extraction that improves on the prior art techniques in speed, memory efficiency, and uniqueness of representation.
SUMMARY OF THE INVENTION
A fast, memory-efficient, and accurate document image matching system is provided by the present invention. In certain embodiments, document image matching is based on identifying anchor points of characters in the document. The document matching process includes a feature extraction step in which anchor points, e.g., points representing approximate locations of characters, are identified as features for matching.
In a particularly efficient implementation, the anchor points are “pass codes” in a line-by-line compressed representation of a document image. A pass code within the compressed representation of a given line indicates that a run of white or black pixels present substantially above the pass code on the previous line is not found on the current line. The CCITT Group III and Group IV facsimile coding standards are examples of compression schemes whose pass codes may be exploited by the present invention.
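The pass-code idea can be illustrated with a minimal Python sketch. It is not the patent's implementation: it works on uncompressed pixel rows (0 = white, 1 = black) rather than parsing a real facsimile bitstream, and it replays only the mode decision of the two-dimensional coding scheme, in which a pass code is emitted when the reference-line run ends (b2) to the left of the next colour change on the coding line (a1). The function names `find_b1`, `next_where`, `pass_code_columns`, and `anchor_points` are hypothetical, and treating the line above the first row as all white is an assumption.

```python
def next_where(row, start, color, equal):
    """Smallest index > start where (row[i] == color) is `equal`; len(row) if none."""
    for i in range(start + 1, len(row)):
        if (row[i] == color) == equal:
            return i
    return len(row)


def find_b1(ref, a0, color):
    """First changing element on the reference row right of a0 whose colour is opposite to `color`."""
    for i in range(a0 + 1, len(ref)):
        left = ref[i - 1] if i > 0 else 0  # imaginary white pixel before column 0
        if ref[i] != color and left == color:
            return i
    return len(ref)


def pass_code_columns(ref, cur):
    """Columns at which a pass code would be emitted when coding `cur` against `ref`."""
    assert len(ref) == len(cur)
    width = len(cur)
    a0, color = -1, 0  # start on an imaginary white pixel left of the row
    columns = []
    while a0 < width:
        a1 = next_where(cur, a0, color, False)  # next colour change on the coding row
        b1 = find_b1(ref, a0, color)
        b2 = next_where(ref, b1, color, True)   # end of the reference run starting at b1
        if b2 < a1:
            # Pass mode: the run b1..b2 above has no counterpart on this row.
            columns.append(b2)
            a0 = b2
        elif abs(a1 - b1) <= 3:
            # Vertical mode: a1 is coded relative to b1, colour flips.
            a0, color = a1, 1 - color
        else:
            # Horizontal mode: two run lengths are coded, colour is unchanged.
            a0 = next_where(cur, a1, color, True)
    return columns


def anchor_points(rows):
    """Collect (x, y) pass-code positions for a page given as a list of pixel rows."""
    points = []
    ref = [0] * len(rows[0])  # assumption: an all-white line precedes the first row
    for y, cur in enumerate(rows):
        points.extend((x, y) for x in pass_code_columns(ref, cur))
        ref = cur
    return points


page = [
    [0, 0, 1, 1, 0, 0, 0, 0],  # a short black run
    [0, 0, 0, 0, 0, 0, 0, 0],  # the run has ended, so a pass code appears here
]
print(anchor_points(page))
```

In this toy page the black run on the first row is absent from the second, so the only anchor point falls on the second row just past the end of that run. A real system would read the pass codes directly from the compressed stream instead of recomputing them from pixels.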
Another feature provided by the present invention is the application of a modified Hausdorff metric to compare the set of anchor points found in an input document image with sets of anchor points previously identified for prospective matching document images. This metric has been found to be efficient to compute and robust to image degradation caused by photocopying.
A pass-code-based implementation has been found to provide fast and accurate matching even when given only one-square-inch patches of images to use for matching. This type of matching system may be easily embodied in a facsimile receiver, where the appropriate compressed representation is already available.
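As a rough illustration of comparing anchor-point sets, the Python sketch below uses a partial (ranked) directed Hausdorff distance. This section does not define the patent's modification, so the ranked variant is an assumption chosen because it tolerates a few outlier points; the names `partial_directed_hausdorff` and `best_match` and the `fraction` parameter are hypothetical.

```python
import math


def partial_directed_hausdorff(a, b, fraction=0.75):
    """Ranked directed distance from point set `a` to point set `b`.

    For each point in `a`, take the distance to its nearest neighbour in `b`,
    sort those distances, and return the value at the given rank fraction.
    fraction=1.0 gives the classic directed Hausdorff distance (the maximum).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    nearest = sorted(min(math.dist(p, q) for q in b) for p in a)
    k = max(1, math.ceil(fraction * len(nearest)))
    return nearest[k - 1]


def best_match(query, database, fraction=0.75):
    """Name of the database entry whose anchor points lie closest to the query's."""
    return min(
        database,
        key=lambda name: partial_directed_hausdorff(query, database[name], fraction),
    )


# Hypothetical anchor points: the last query point is a spurious outlier.
query = [(0, 0), (10, 0), (20, 0), (90, 90)]
database = {
    "doc_a": [(0, 1), (10, 1), (20, 1)],
    "doc_b": [(50, 50)],
}
print(best_match(query, database))
```

With `fraction=0.75` the single outlier is ignored and the query is judged close to `doc_a`; with `fraction=1.0` the same outlier would dominate the distance. This brute-force version compares every pair of points and is meant only to show the ranking step.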
REFERENCES:
patent: 4809081 (1989-02-01), Linehan
patent: 4941193 (1990-07-01), Barnsley et al.
patent: 5065447 (1991-11-01), Barnsley et al.
patent: 5245676 (1993-09-01), Spitz
patent: 5263136 (1993-11-01), DeAguiar et al.
patent: 5267047 (1993-11-01), Argenta et al.
patent: 5388167 (1995-02-01), Koga et al.
patent: 5414781 (1995-05-01), Spitz et al.
patent: 5465353 (1995-11-01), Hull et al.
patent: 5533144 (1996-07-01), Fan
patent: 5559942 (1996-09-01), Gough et al.
patent: 5574840 (1996-11-01), Kwatinetz et al.
patent: 5586196 (1996-12-01), Sussman
patent: 5623679 (1997-04-01), Rivette et al.
patent: 5708825 (1998-01-01), Sotomayor
patent: 6104834 (2000-08-01), Hull
patent: 6128102 (2000-10-01), Ota
patent: 6182062 (2001-01-01), Fujisawa et al.
Hull, Jonathan J., “Document Image Matching and Retrieval with Multiple Distortion-Invariant Descriptors”, International Association for Pattern Recognition Workshop on Document Analysis Systems, Series in Machine Perception and Artificial Intelligence, vol. 14, published by World Scientific Publishing Co. Pte. Ltd. 1995, pp. 379-396.
Huttenlocher, Daniel P., et al., “Comparing Images Using the Hausdorff Distance”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, No. 9, Sep. 1993, pp. 850-863.
Rucklidge, William, “Efficient Computation of the Minimum Hausdorff Distance for Visual Recognition”, Technical Report, Department of Computer Science Cornell University, Ithaca, New York, Sep. 1994, pp. 1-169.
Spitz, A. Lawrence, “Skew Determination in CCITT Group 4 Compressed Document Images”, pp. 11-25 (last reference 1989).
Mehta Bhavesh
Patel Kanji
Ricoh Company, Ltd.
Townsend and Townsend and Crew LLP