Adaptive recognition of documents using layout attributes

Image analysis – Image transformation or preprocessing – Image storage or retrieval

Reexamination Certificate

Details

C382S224000, C707S793000

active

06243501

ABSTRACT:

TECHNICAL FIELD
This invention pertains to the field of data storage and filing systems, and more specifically, to those systems employing optical character recognition.
BACKGROUND ART
Today's businesses rely heavily on paper for many of their daily functions. For instance, most corporate information resides in paper documents. Also, the majority of transactions necessitate either updating existing paper documents or creating new ones. This dependence on paper will continue to characterize businesses for some time to come. For this reason, businesses are always looking for new and efficient means to handle paper documents in order to be able to respond rapidly to events and to cut down on cost.
Currently, manual operations continue to be the method of choice for processing paper documents. In general, a human operator first identifies the document and routes it appropriately. The document may then go through several stations before its processing is judged to be complete. At the end of the cycle, the document is typically archived in a storage filing cabinet according to some preset procedure. If at any later time this same document is needed again, a human operator retrieves it and the cycle starts over. Slow retrieval time, high probability of erroneous filing, and excessive cost associated with the storage space are known to be the major drawbacks of this approach.
The need for efficient methods to process paper documents is not new to the business community. In fact, this need has evolved over the last ten to fifteen years. In the past, businesses spoke of the need for better data management as a way to control the information that flows in and out of an organization. Currently, businesses speak of the need for better document management techniques instead. In the context of paper documents, this is taken to mean the need for more advanced methods to automate the handling of paper documents within an enterprise.
Approaches that attempt to address this problem are collectively referred to as document imaging systems. The basic function of a document imaging system is to convert the paper document into an image bitmap. This image bitmap, rather than the paper copy, is then stored in the system. Other functions may include document identification, attachment of user-identifying information, extraction of either partial or full text from the image, attachment of indexing information, attachment of tracking information, filing into a specific folder, routing over the network, archiving in a specific location, and retrieval.
Document imaging systems aim at providing greater efficiency, easier reuse, a reduction of product cycle time, and significant savings. However, this technology is still in its infancy and has been slow to deliver on its promise. A major hurdle has been that these systems are very difficult to fully automate. Human operators are still needed to identify and organize documents before they can be scanned into the system. This operation is time-consuming and can reduce or eliminate the intended savings. Also, human operators are needed to enter the necessary keywords by which scanned documents can be retrieved. Manual entry of keywords is both slow and cumbersome, which reduces the overall efficiency of the system. Additional manual operations may also be needed to perform other tasks such as attachment of tracking information, filing into a specific folder, routing over the network, and archiving in a specific location. Manual functions limit the response time of the overall system and increase cost.
Optical Character Recognition technology has made it possible to automate the entry of keywords for the purpose of retrieving documents. It does so by converting the text in the image of the document to ASCII or other character code. Any word in the extracted ASCII text can then be used to search for the document in question. This solution does not, however, address some rather common business needs. For instance, typical businesses process several classes of documents on any given day. In some situations, it may be desired to attach a different list of keywords to each class of documents. This list may be used alone or in addition to the text extracted from the image. The list of special keywords may include the type of the document, the user ID, the owner of the document, the folder where the document is stored, and, perhaps, some other attributes that are relevant only to the class of documents to which they are attached. In other situations, one may wish to extract keywords only from a limited set of fields in the scanned document. In both of these cases, Optical Character Recognition alone is not sufficient.
Cover sheets or forms-based methods have been proposed to deal with the problem of identifying documents at scan time. These same approaches have also attempted to resolve other tasks such as attaching tracking information, filing into a specific folder, routing over the network, and archiving in a specific location. Existing solutions are, however, very limited, document specific, and not easy to generalize. Another issue inherent to document imaging systems is the limited amount of resources available for storing document images. This problem is exacerbated when duplicate images of documents are stored after documents are mistakenly input into the system multiple times. Therefore, there is a need for a file storage and retrieval system which allows any user to enter documents into the system and have the correct actions performed upon the document, and which alerts the user upon recognizing duplicate documents, so that the user can delete duplicate images to conserve storage space.
DISCLOSURE OF INVENTION
The system of the present invention uses an attribute extracting module (256) to extract attributes from a document (50) input into the system. The system then uses an attribute comparison module (270) to compare the extracted attributes with multiple classes (54) of documents (56). Upon determining that the attributes of the document (50) match attributes of one of the classes (54), the document (50) is classified as belonging to the class (54) and is processed in accordance with the system actions associated with the matched class (54). In one embodiment of the present invention, the attributes of the input document (50) are then compared to the documents (56) belonging to the matched class (54) which are already on the system. If the system determines that the input document (50) matches one of the existing images (56), the user (240) is alerted that the input document (50) already exists in the system.
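The flow described above (extract attributes, match a class, apply the class's action, then check for duplicates within that class) can be sketched as follows. This is only an illustrative sketch, not the patented method: the names `DocumentClass`, `overlap`, and `process`, the string encoding of attributes, the use of Jaccard overlap as the comparison, and both threshold values are assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentClass:
    """Hypothetical stand-in for a class (54) of documents (56)."""
    name: str
    attributes: frozenset   # attributes that characterize the class
    action: str             # system action associated with the class
    members: dict = field(default_factory=dict)  # stored doc id -> attributes


def overlap(a, b):
    """Jaccard overlap of two attribute sets (an assumed comparison)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def process(doc_id, attrs, classes, class_threshold=0.5, duplicate_threshold=0.9):
    """Classify a document, then look for a duplicate in the matched class.

    Returns (class name, action, id of duplicate or None), or
    (None, None, None) when no class matches well enough.
    """
    attrs = frozenset(attrs)
    best = max(classes, key=lambda c: overlap(attrs, c.attributes), default=None)
    if best is None or overlap(attrs, best.attributes) < class_threshold:
        return None, None, None

    # Compare against documents already stored under the matched class.
    for stored_id, stored_attrs in best.members.items():
        if overlap(attrs, stored_attrs) >= duplicate_threshold:
            return best.name, best.action, stored_id  # caller alerts the user

    best.members[doc_id] = attrs  # new document: store it under the class
    return best.name, best.action, None
```

In this sketch a duplicate is reported rather than stored, leaving the decision to keep or discard the image with the user, in line with the alerting behavior described above.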
In a preferred embodiment, a match is determined in response to a comparison quality measure determined by a quality assessment module (258). The comparison quality measure measures the accuracy of the comparison. If the comparison quality measure exceeds a threshold, a match is determined to have been made. The comparison quality measure examines, among other factors, sizes, locations, and word accuracy values of matching regions within the input document (50) and the matching class (54) or document (56).
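One way such a comparison quality measure might combine region size, location, and word accuracy is sketched below. The text does not give a formula, so everything here is an assumption made for illustration: the `Region` type, page-relative coordinates, the product-style weighting in `quality`, and the threshold value in `is_match`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical matching region; coordinates are fractions of the page."""
    x: float              # left edge (0..1)
    y: float              # top edge (0..1)
    width: float          # 0..1
    height: float         # 0..1
    word_accuracy: float  # recognition accuracy of the words inside (0..1)


def quality(pairs):
    """Score a list of (input_region, reference_region) matched pairs.

    Each pair contributes its shared area, scaled down by how far the
    two regions are displaced and by the lower of the two word accuracies.
    """
    score = 0.0
    for a, b in pairs:
        area = min(a.width * a.height, b.width * b.height)     # size
        offset = abs(a.x - b.x) + abs(a.y - b.y)                # location
        closeness = max(0.0, 1.0 - offset)
        accuracy = min(a.word_accuracy, b.word_accuracy)        # word accuracy
        score += area * closeness * accuracy
    return score


def is_match(pairs, threshold=0.3):
    """A match is declared only when the measure exceeds the threshold."""
    return quality(pairs) > threshold
```

Because every factor lies between 0 and 1, large, well-aligned, accurately recognized regions dominate the score, while small or displaced regions contribute little.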


