Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-09-02
2001-11-20
Coby, Frantz (Department: 2171)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06321232
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to pattern localization and, more particularly, to a method for using a data structure to facilitate such pattern localization. The data structure is particularly suited for locating geometric patterns (including distinguishable features) in unsegmented images.
BACKGROUND OF THE INVENTION
Image content-based retrieval is becoming a powerful alternative/addition to conventional text annotation-based retrieval. Even so, it has yet to reach the robustness and computational effectiveness of text-based retrieval. Text-based retrieval, on the other hand, is notoriously lacking in precision, even when boolean combinations of key-words are allowed. It is a common observation with those using popular conventional search that full text indexing of documents (scanned or electronic) causes a large number of irrelevant documents to be retrieved.
A more productive use of text-based querying is when it is combined with image content-based querying. A special case of this occurs when the text strings relevant for indexing documents occur within image structures, such as text in special regions of a news video or text within region fields of a form. Retrieval based on such structured text can yield fewer but more relevant matching documents.
An example of the above-mentioned special case arises in the area of processing engineering drawing documents, a large number of which still exist in paper form. Creating electronic conversion of such documents is an important business for large format scanner makers. As is known, large format scanners can scan engineering drawing documents at a relatively fast rate of 25 sheets/minute, and are quickly giving rise to very large databases (in excess of 100,000 objects) of large-sized drawing images (e.g., 14000×9000 pixels). Currently, indexing of such documents is done manually with skilled keyboard operators, and is considered a highly labor intensive activity constituting a significant cost in the digitizing of scanned images. Manual indexing by a keyboard operator can also be unreliable since the keywords employed by a user may not match the ones attached to the documents during database creation.
In contrast to full-text indexing of pure text documents, automatic full-text indexing using conventional OCR algorithms will not yield useful results for drawing images. Fortunately, useful text information for indexing such drawing images is found in specific image structures called “title blocks”. Typically, a title block will include information pertinent for indexing a corresponding drawing, such as part number, name of the unit being depicted, date of design, and architect name. Indexing keyword extraction from such image structures requires that the image structures themselves be first identified.
As will appear from the Detailed Description below, the present invention employs some of the principles underlying a solution for a model indexing problem, namely the principles underlying “Geometric Hashing”. Referring to articles by Y. Lamdan and H. J. Wolfson (entitled “Geometric hashing: A general and efficient model-based recognition scheme”, in Proceeding of the International Conference on Computer Vision, pages 238-249, 1988, and “Transformation invariant indexing” in Geometric Invariants in Computer Vision, IT Press, pages 334-352, 1992), Geometric Hashing has been used to identify objects in pre-segmented image regions. Another work extending the basic geometric hashing scheme for use with line features includes an article by F. C. D. Tsai entitled “Geometric hashing with line features” in Pattern Recognition, Vol. 27, No. 3, pages 377-389, 1994. An extensive analysis of the geometric hashing scheme is provided in an article by W. E. L. Grimson and D. Huttenlocher entitled “On the sensitivity of geometric hashing”, in Proceedings International Conference on Computer Vision, pages 334-339, 1990.
Obtaining suitable geometric hash functions has also been explored in an article by G. Bebis, M. Georgiopolous and N. Lobo entitled “Learning geometric hashing functions for model-based object recognition” in Proceedings International Conference on Computer Vision, pages 543-548, 1995, and a discussion of using the concept of “rehashing” in the context of geometric hashing is provided in an article by I. Rigoustos and R. Hummel “Massively parallel model matching: Geometric hashing on the connection machine” in IEEE Computer, pages 33-41, February 1992.
As taught by now-allowed U.S. patent application Ser. No. 08/878,512 to Syeda-Mahmood (the disclosure of which is incorporated herein by reference), a data structure known as the “geometric hash table” can be used effectively to index handwritten words in a handwriting localization scheme. While the handwriting technique is believed to provide fast search and retrieval in the context of locating and recognizing handwritten word queries in handwritten documents, the same speed for search and retrieval is not obtainable when locating and recognizing two-dimensional patterns in a relatively large document (e.g. engineering drawing document). This degradation of search and retrieval speed is attributable, in substantial part, to the size of the geometric hash table. It would be desirable to provide a system in which localization of two-dimensional patterns could be achieved with a data structure that is considerably more compact than the geometric hash table.
For several types of document images, such as the type of document image associated with typical engineering drawing documents, the size of a corresponding geometric hash table can be quite large. It has been found that a geometric hash table for a group of images from a typical engineering drawing document set can be as large as 40 Gbytes, a size that far exceeds the size of main memory for most computer systems. Thus a database cannot be formed readily for a geometric hash table developed from one of several types of document images. It would be desirable to provide a relatively compact structure that both exploits the principles underlying the geometric hash tree and lends itself readily to searching in databases having images of all sizes.
SUMMARY OF THE INVENTION
In accordance with the present invention there is provided a method for creating a geometric hash tree in a document processing system having a memory. A plurality of images are stored in the memory and organized in a database. Each image includes curve groups wherein each curve group is corresponded with a feature set. The method for creating a geometric hash tree includes the steps of: (1) associating a list of basis triples with an affine coordinate set, the basis triples and the affine coordinate set both varying as a function of the images and their corresponding curve groups; (2) storing both the affine coordinate set and the list of basis triples in the memory; (3) quantizing the affine coordinate set into a plurality of subsets; (4) assigning an order to the plurality of subsets; and (5) creating a geometric hash tree with the quantized affine coordinate set using the order from (4) such that the geometric hash tree is more compact in size than a conventional geometric hash table.
REFERENCES:
patent: 5465353 (1995-11-01), Hull et al.
patent: 5742807 (1998-04-01), Masinter
patent: 5799312 (1998-08-01), Rigoutsos
patent: 5802525 (1998-09-01), Rigoutsos
patent: 5875264 (1999-02-01), Carlstrom
patent: 5897637 (1999-04-01), Guha
patent: 5953451 (1999-09-01), Syeda-Mahmood
patent: 6014733 (2000-01-01), Bennett
patent: 6047283 (2000-04-01), Braun
patent: 6067547 (2000-05-01), Douceur
patent: 6212525 (2001-04-01), Guha
Yehezkel Lamdan and Haim J. Wolfson, “Geometric Hashing: A general and Efficient Model-Based Recognition Scheme”, 1/88, pp. 238-249.
Haim J. Wolfson and Yehezkel Lamdan, “Transformation Invariant Indexing”, in Geometric Invariants in Computer Vision, IT Press, pp. 334-352, 1992.
Frank C.D. Tsai, “Geometric Hashing with Line Features”, Pattern Recognition, vol. 27, No. 3, pp. 377-389, 1994.
W. Eric L. Grimson and Daniel P. Hutten
Blair Philip E.
Coby Frantz
Cohen Gary B.
Xerox Corporation
LandOfFree
Method for creating a geometric hash tree in a document... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for creating a geometric hash tree in a document..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for creating a geometric hash tree in a document... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2617824