Compressed document matching

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C382S181000, C345S427000

active

07359901

ABSTRACT:
An apparatus and method for determining if a query document matches one or more of a plurality of documents in a database. In a coarse matching stage, a compressed file or other query document is scanned to produce a bit profile. Global statistics such as line spacing and text height are calculated from the bit profile and used to narrow the field of documents to be searched in an image database. The bit profile is cross-correlated with bit profiles of documents in the search space to identify candidates for a detailed matching stage. If multiple candidates are generated in the coarse matching stage, a set of endpoint features is extracted from the query document for detailed matching in the detailed matching stage. Endpoint features contain sufficient information for various levels of processing, including page skew and orientation estimation. In addition, endpoint features are stable, symmetric and easily computable from commonly used compressed files including, but not limited to, CCITT Group 4 compressed files. Endpoint features extracted in the detailed matching stage are used to correctly identify a matching document in a high percentage of cases.
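The coarse matching stage described above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a bit profile can be treated as a plain list of integers (one value per scanline), and it uses normalized cross-correlation over a small range of vertical shifts as the similarity measure. The function names, the `max_lag` and `threshold` parameters, and the sample data are all hypothetical.

```python
from math import sqrt


def normalized_cross_correlation(a, b, max_lag):
    """Return (best_score, best_lag) over shifts of b in [-max_lag, max_lag].

    a and b are bit profiles, assumed here to be lists of integers.
    A lag of k pairs a[i] with b[i + k].
    """
    best_score, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        # Collect the overlapping samples for this shift.
        pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
        if len(pairs) < 2:
            continue
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        num = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        den = sqrt(
            sum((x - mean_x) ** 2 for x, _ in pairs)
            * sum((y - mean_y) ** 2 for _, y in pairs)
        )
        score = num / den if den else 0.0
        if score > best_score:
            best_score, best_lag = score, lag
    return best_score, best_lag


def coarse_candidates(query_profile, database, max_lag=3, threshold=0.9):
    """Return [(doc_id, score, lag)] for profiles that correlate with the query.

    database maps a document id to its stored bit profile. The threshold
    is an illustrative value, not one taken from the source.
    """
    hits = []
    for doc_id, profile in database.items():
        score, lag = normalized_cross_correlation(query_profile, profile, max_lag)
        if score >= threshold:
            hits.append((doc_id, score, lag))
    # Strongest correlation first; these go on to detailed matching.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical data: "a" is the query shifted down by one scanline,
# "b" is an unrelated profile.
query = [5, 40, 42, 6, 5, 38, 41, 7, 5, 39]
database = {
    "a": [9] + query[:-1],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
candidates = coarse_candidates(query, database)
```

In this sketch, any document that survives the threshold would be passed to the detailed matching stage, where endpoint features are compared. That stage is not shown here.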

