Reexamination Certificate
1997-03-31
2001-01-30
Feild, Joseph H. (Department: 2776)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C345S215000
active
06182090
BACKGROUND OF THE INVENTION
The present invention relates to the field of document storage and retrieval, in particular, the retrieval of a document from a document database using content from an example page taken from the document.
A general approach to the problem of retrieving a target document from a document database is to store a set of key words with each document, either physically with the document or, more probably, in a lookup table in which the keys are indexed and table entries point to documents in the database. Keys can be easily generated from documents if electronic versions of documents are available. If only paper versions of the documents are available, they can be scanned to form digital images of the pages of the documents and the digital images can be processed by a character recognizer to extract the text of the document and thus the keys. In a more labor-intensive system, the keys can be manually entered.
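The lookup table described above can be sketched as an inverted index. This is a minimal illustration, not the implementation of any system named in this document; the function names `build_key_index` and `lookup`, the document IDs, and the sample keys are all hypothetical.

```python
from collections import defaultdict


def build_key_index(documents):
    """Map each key word to the set of document IDs stored under it."""
    index = defaultdict(set)
    for doc_id, keys in documents.items():
        for key in keys:
            # Keys are lower-cased so that lookups are case-insensitive.
            index[key.lower()].add(doc_id)
    return index


def lookup(index, keys):
    """Return the IDs of documents indexed under every supplied key."""
    sets = [index.get(key.lower(), set()) for key in keys]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical sample data: document IDs and their key words.
documents = {
    "doc-1": ["hashing", "retrieval", "image"],
    "doc-2": ["retrieval", "copier"],
}
index = build_key_index(documents)
matches = lookup(index, ["retrieval", "image"])
```

Requiring every key to match is one possible design choice; a search engine could equally rank documents by the number of keys they share with the query.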
To retrieve a document, the keys are supplied to a search engine. Where a user is not likely to remember the keys for every document stored in the database, the user can retain an example page from each document as it is stored and supply that example page to a page analyzer for key extraction.
The disadvantage of this general approach is that either the documents in the document database and the example pages need to originate and remain in electronic form, or character recognition needs to be done on example pages to determine the keys. Thus, the example page either needs to be electronic or has to be of sufficient quality that errors do not occur in the scanning and character recognition process.
One example of a prior art system for document presentation is the RightPages document presentation system described in G. Story, “The Right Pages Image-Based Electronic Library for Alerting and Browsing”, COMPUTER, Sept. 1992. In that system, a user is presented with a series of journal covers and the user browses the journal covers to find a desired journal, then browses its table of contents and then selects an article from the journal. Once an example page of a journal article is selected, the system retrieves the target article from a document database. The disadvantage of the RightPages system is that the icons are presented on a computer monitor and are therefore of lower resolution than print, and the links between the journal covers and the pages must already exist. Thus, the user must be at the computer monitor to browse example pages.
The document storage and retrieval system taught by U.S. Pat. No. 5,465,353 to Hull, et al., entitled “IMAGE MATCHING AND RETRIEVAL BY MULTI-ACCESS REDUNDANT HASHING” (commonly owned by the assignees of the present application, incorporated by reference herein, and hereinafter “Hull”) is a system for retrieving a target document from a document database by submitting a paper example page retained from the target document to a search engine. The search engine analyzes the example page and determines likely matches among the documents in the database. Where many documents are to be stored, however, storage and organization of the example pages raise some of the same problems that document database storage tries to alleviate, such as having to allocate storage space for paper pages and keeping them organized.
Thus, what is needed is a system for efficiently storing example pages for use in document retrieval and document management.
SUMMARY OF THE INVENTION
An improved document server is provided by virtue of the present invention. A document server is a computer system which maintains a database of documents, either in a structured form such as editable computer files, as digitized images of paper pages from the documents, or a combination of both. A target document is a document in the document database whose retrieval is desired. To retrieve the document, an input is provided to the document server indicating one or more characteristics of the target document, such as keys, a unique label, or an example page. Typically, a document is provided to the document server and only one page is retained. The retained page can then serve as the example page, to be provided when the entire document is desired. An example page could be the first page of the document, but it need not be the first page, nor even a complete page of the document, so long as the example page (or page portion) could be used to distinguish the target document from the other documents in the document database, or at least to identify a set of candidate matching documents which closely match the target document and can be presented to a user for selection of the target document from among the candidate matching documents.
In one embodiment of a document server according to the present invention, an example page for each document in a document database is processed by a page processor to generate an icon, i.e., an iconic representation, of an example page of the document. Typically, this is done at the time the document is first stored in the document database. The page processor analyzes an example page to segment regions of the example page according to image types, such as text, line art, photographs, other graphics, borders, colored areas, glyphs, bar codes, etc. Of course, not all image types need be found in all example pages and image types are not limited to those mentioned here. Once segmented, each region is characterized and reduced in a manner appropriate for the image type of the region. For example, text in text regions is replaced with a block font (defined below) and reduced, while graphics regions are reduced in resolution (by lowering pixel precision and/or the number of pixels per unit area). The reduced regions of the example page are then reassembled into an icon of the example page.
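The reduce-and-reassemble step can be sketched as follows, assuming the page has already been segmented into rectangular regions. This is an illustration under stated assumptions, not the page processor of the invention: the `Region` type, the function names, and the parameters `factor` and `levels` are hypothetical, and the text branch is only a stand-in for the block font, which is defined elsewhere in the specification. Here a text tile simply turns solid black if it contains any ink, while a graphics tile is averaged (fewer pixels per unit area) and quantized (lower pixel precision).

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str     # "text" or "graphics"; the source lists more image types
    x: int        # left edge on the page, in pixels
    y: int        # top edge on the page, in pixels
    pixels: list  # rows of grayscale values, 0 (black) to 255 (white)


def reduce_region(region, factor, levels=4):
    """Shrink a region by `factor` in a way that depends on its image type."""
    rows = -(-len(region.pixels) // factor)     # ceiling division
    cols = -(-len(region.pixels[0]) // factor)
    step = 256 // levels
    out = [[255] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            tile = [
                value
                for row in region.pixels[r * factor:(r + 1) * factor]
                for value in row[c * factor:(c + 1) * factor]
            ]
            if region.kind == "text":
                # Assumed stand-in for the block font: any ink fills the tile.
                out[r][c] = 0 if min(tile) < 128 else 255
            else:
                # Average the tile, then keep only `levels` gray levels.
                mean = sum(tile) // len(tile)
                out[r][c] = (mean // step) * step
    return Region(region.kind, region.x // factor, region.y // factor, out)


def assemble_icon(regions, page_width, page_height, factor):
    """Paste the reduced regions onto a blank (white) icon canvas."""
    width = -(-page_width // factor)
    height = -(-page_height // factor)
    icon = [[255] * width for _ in range(height)]
    for region in regions:
        small = reduce_region(region, factor)
        for dr, row in enumerate(small.pixels):
            for dc, value in enumerate(row):
                rr, cc = small.y + dr, small.x + dc
                if rr < height and cc < width:
                    icon[rr][cc] = value
    return icon


# Hypothetical 8x8 page: a text region with one ink pixel, a gray graphic.
text = Region("text", 0, 0, [[0, 255, 255, 255]] + [[255] * 4 for _ in range(3)])
graphic = Region("graphics", 4, 4, [[100] * 4 for _ in range(4)])
icon = assemble_icon([text, graphic], 8, 8, factor=2)
```

Treating each image type with its own reduction rule is the point of the sketch: a uniform downsample would blur text into gray, whereas a type-specific rule keeps text regions visually distinct from graphics in the icon.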
In a specific application of the present invention, many icons are printed on a single page, referred to herein as a “guide” page. This guide page, or multiple guide pages depending on the number of icons, is provided to a user. To retrieve a document, the user visually scans the guide page to find an icon which is visually associated with the target document and then supplies an indication of the selected icon to the document server. The document server analyzes the contents of the icon to detect distinguishing features of the example page represented by the icon and provides those features to a search engine. The search engine then identifies candidate matching documents in the document database. If more than one candidate matching document is returned, the document server provides information about each candidate, such as a thumbnail image of a portion of the candidate document, so that the user can manually select the target document from the candidate matching documents.
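The retrieval flow in this paragraph, where a single match is returned directly and several matches are handed back for manual selection, might look like the sketch below. The scoring rule (fraction of the icon's features shared by a document) is an assumption chosen for brevity and is not the redundant hashing method of Hull; the names `find_candidates`, `retrieve`, and `min_overlap`, and the feature sets, are hypothetical.

```python
def find_candidates(icon_features, database, min_overlap=0.5):
    """Rank documents by the fraction of the icon's features they share."""
    if not icon_features:
        return []
    scored = []
    for doc_id, features in database.items():
        overlap = len(icon_features & features) / len(icon_features)
        if overlap >= min_overlap:
            scored.append((overlap, doc_id))
    # Best overlap first; ties are broken by document ID for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored]


def retrieve(icon_features, database):
    """Return the target document, or the candidates the user must pick from."""
    candidates = find_candidates(icon_features, database)
    if not candidates:
        return {"status": "not_found"}
    if len(candidates) == 1:
        return {"status": "found", "document": candidates[0]}
    # More than one candidate: the server would show a thumbnail of each.
    return {"status": "choose", "candidates": candidates}


# Hypothetical feature sets extracted from stored example pages.
database = {
    "doc-1": {"a", "b", "c"},
    "doc-2": {"a", "b"},
    "doc-3": {"z"},
}
ambiguous = retrieve({"a", "b", "c", "d"}, database)
unique = retrieve({"z"}, database)
```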
Alternatively, each icon could be assigned an identifying label, such as a unique alphanumeric code or machine-readable bar code, which the user provides to the document server for a lookup of the target document. Although the document server does not need to use the content of the icon image for document retrieval, the content of the icon is nonetheless useful to the user, to provide compact visual cues to the target document. With a guide page, the user can scan many icons quickly. Because of the page reduction process, the distinguishing features of the example documents are preserved over the iconization process, and the icons can be made smaller while their distinguishing features remain recognizable to the user. Instead of each icon having a unique identifier, the icon might be specified by a unique identifier for the guide page on which it is found and the icon's location (e.g., row/column) on the guide page.
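Both addressing schemes described here, a unique label per icon or a guide-page identifier plus row and column, reduce to a table lookup. The sketch below assumes a simple in-memory registry; `IconLocation`, `IconRegistry`, and the sample identifiers are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class IconLocation(NamedTuple):
    guide_page_id: str
    row: int
    column: int


class IconRegistry:
    """Resolve an icon to its document by label or by guide-page position."""

    def __init__(self):
        self._by_label = {}
        self._by_location = {}

    def register(self, doc_id, label, location):
        # Each icon is reachable through either addressing scheme.
        self._by_label[label] = doc_id
        self._by_location[location] = doc_id

    def resolve(self, label=None, location=None) -> Optional[str]:
        if label is not None:
            return self._by_label.get(label)
        if location is not None:
            return self._by_location.get(location)
        return None


registry = IconRegistry()
registry.register("doc-1", "A17", IconLocation("guide-3", row=2, column=5))
by_label = registry.resolve(label="A17")
by_position = registry.resolve(location=IconLocation("guide-3", 2, 5))
```

With position-based addressing the icons need no printed label of their own, at the cost of requiring the guide page identifier to be readable when the page is submitted.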
Variations of the above embodiments are envisioned. For example, the document server might be integrated with a digital copier to allow the digital copier to output an entire document in response to a user submitting a guide page with an icon circled. The digital copier would scan the submitted guide page and either extract information from the content
Albert Philip H.
Feild Joseph H.
Ricoh & Company, Ltd.
Townsend and Townsend and Crew LLP