Image analysis – Pattern recognition – Context analysis or word recognition
Reexamination Certificate
2002-06-13
2004-07-27
Patel, Kanji (Department: 2625)
C704S235000
active
06768816
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to document image analysis, and to text and object recognition techniques for the purpose of creating searchable files from document images. More particularly, the present invention relates to providing more efficient tools and techniques for human based ground-truthing of the searchable files.
2. Description of Related Art
The longtime goal of vendors of text recognition technologies has been to create 100% accurate computer searchable files, totally automatically, from a wide range of document images. However, after decades of trying, it has become increasingly apparent that this goal of automation may never be achieved. See, David Doermann, The indexing and retrieval of document images: A survey. Technical Report CS-TR-3876, University of Maryland, Computer Science Department, February, 1998.
So, to compensate for the limited automation of these technologies, human assistance is required. Specifically, text recognition technologies, which include, but are not limited to, Optical Character Recognition (OCR) and Intelligent Character Recognition (ICR), require human assistance, referred to as ground-truthing, that involves human proofreading of the textual output, human comparison of this textual output with the original image text, and human correction of textual recognition errors. See, Doe-Wan Kim, and Tapas Kanungo. A Point Matching Algorithm for Automatic Groundtruth Generation. Technical Report: LAMP-TR-064/CS-TR-4217/CAR-TR-961/MDA-9049-6C-1250, University of Maryland, College Park, 2001; R. A. Wilkinson, M. D. Garris, J. C. Geist. Machine-Assisted Human Classification of Segmented Characters for OCR Testing and Training, Technical Report NISTIR 5105 [102K], December, 1992 and In D. P. D'Amato, editor, volume 1906. SPIE, San Jose, Calif., 1993; and Chang Ha Lee, and Tapas Kanungo. The Architecture of TRUEVIZ: A groundTRUth/metadata Editing and VIsualiZing toolkit. Technical Report: LAMP-TR-062/CS-TR-4212/CAR-TR-959/MDA-9049-6C-1250, University of Maryland, College Park, 2001.
For mainstream businesses and government agencies that wish to post mountains of scanned documents to public Web sites and corporate Intranets, this line-by-line checking for, and correction of, recognition errors is impractical. And since these mainstream organizations require 100% accuracy to ensure that their document images can be reliably retrieved, they have rejected these text recognition products entirely.
Nonetheless, with or without text recognition products, mainstream organizations do realize that a significant amount of human interaction is required in order to guarantee 100% retrieval. So, what these organizations are seeking is a way to make this time-consuming manual process far more efficient.
Thus, with this goal in mind, the present invention was created.
SUMMARY OF THE INVENTION
The present invention provides a method and a system by which a document image is analyzed for the purposes of establishing a searchable data structure characterizing ground-truthed contents of the document represented by the document image, and in some embodiments including resources for reconstructing an image of the document. According to the present invention, the document image is segmented into a set of image objects, and the image objects are linked with fields that store metadata.
Image objects are specified regions of a document image that may contain a structural element, where examples of structural elements include, but are not limited to, a single word, a title, an author section, a heading, a paragraph, a page, an equation, a signature, a picture, a bar-code, a border, a halftone image, noise, and the entire document image. The image objects into which the document image is segmented may or may not be exclusive, where exclusive image objects do not overlap with other image objects. In embodiments in which the document image consists of a bitmap, image objects may consist of portions of the bitmap that include a shape or shapes including black or colored pixels that are separated from other black or colored pixels by clear regions having specified characteristics.
The image objects are identified and linked with fields for storing metadata. The metadata is used to bind logical structure, and thus meaning, to image objects in the document image. Thus examples of metadata include, but are not limited to, indications, pointers, tags, flags, and plain text represented in computer readable code, such as ASCII, EBCDIC, Unicode, and the like. Image objects linked with metadata fields storing ground-truthed metadata can be organized into searchable records, such as hierarchically organized documents. Thus, the data structure including image objects and linked metadata can be independently searched, retrieved, stored, managed, viewed, highlighted, shared, printed, protected, indexed, edited, extracted, redacted, toggled (between image view and metadata view) and the like.
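As a rough illustration of the data structure described above, the following Python sketch links a rectangular image region with a metadata field and supports a simple text search over ground-truthed records. The class names, the bounding-box representation, and the `search` helper are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageObject:
    """A rectangular region of a page image (hypothetical representation)."""
    page: int
    x: int
    y: int
    width: int
    height: int

    def overlaps(self, other: "ImageObject") -> bool:
        """True when two regions on the same page share at least one pixel."""
        if self.page != other.page:
            return False
        return (self.x < other.x + other.width
                and other.x < self.x + self.width
                and self.y < other.y + other.height
                and other.y < self.y + self.height)


@dataclass
class ImageObjectPair:
    """An image object linked with a metadata field; None means empty."""
    image_object: ImageObject
    metadata: Optional[str] = None
    ground_truthed: bool = False


def search(pairs: List[ImageObjectPair], text: str) -> List[ImageObjectPair]:
    """Return pairs whose ground-truthed metadata contains the text."""
    needle = text.lower()
    return [p for p in pairs
            if p.ground_truthed
            and p.metadata is not None
            and needle in p.metadata.lower()]
```

Because a word region may lie inside a paragraph region, `overlaps` can be used to tell exclusive image objects from non-exclusive ones.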
In the present invention, an interactive framework is presented for efficiently ground-truthing document images via image objects paired with fields for ground-truthed metadata (called herein “image object pairs”). Here, ground-truthing an image object pair is accomplished by ground-truthing its metadata. More specifically, in one embodiment of the invention, in order to “ground-truth” an image object pair, the following two computer assisted steps are available:
1. Initial metadata is input into an image object pair by either (a) manually creating it, (b) automatically creating it (such as with text recognition, etc.), or (c) importing it.
2. The accuracy of this initial metadata is manually verified, or this initial metadata is manually corrected.
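The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Pair` class, the `source` callback (standing in for manual entry, text recognition, or import), and the `reviewer` callback (standing in for the human operator) are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pair:
    """Hypothetical image object pair: an image identifier plus metadata."""
    image_id: str
    metadata: Optional[str] = None
    ground_truthed: bool = False


def input_initial_metadata(pair: Pair, source: Callable[[str], str]) -> Pair:
    """Step 1: fill the metadata field from a manual, automatic, or
    imported source. The result is not yet ground-truthed."""
    pair.metadata = source(pair.image_id)
    pair.ground_truthed = False
    return pair


def verify_or_correct(pair: Pair,
                      reviewer: Callable[[str, Optional[str]], str]) -> Pair:
    """Step 2: a human reviewer returns the metadata unchanged (verified)
    or returns a corrected value; either way the pair is now ground-truthed."""
    pair.metadata = reviewer(pair.image_id, pair.metadata)
    pair.ground_truthed = True
    return pair
```

In this sketch the reviewer callback returns the final text in both cases, so verification and correction share one code path.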
Embodiments of the present invention increase the efficiency of human ground-truthing by using an index of unique image object pairs. This image object pairs index can eliminate the time and expense of ground-truthing each instance of each unique image object pair one-by-one, as required by text recognition products. Moreover, this index increases the efficiency of human ground-truthing even more as (1) the number of instances associated with any unique image object pair increases, and as (2) the accuracy of the segmentation process increases. Indeed, since the efficiency of human ground-truthing is so strongly influenced by the accuracy of segmentation, the present invention allows for human control over the segmentation process.
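One way to picture the saving: if a document contains N instances that collapse into U unique image object pairs, the operator reviews U entries instead of N. The Python sketch below illustrates this with a hypothetical `ImageObjectPairsIndex` class. It assumes each instance already carries a `signature` string that is identical for matching image objects; how such signatures are computed is not addressed here.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[int, int]  # hypothetical (page number, object number)


class ImageObjectPairsIndex:
    """Maps each unique image object pair to all of its instances."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[Location]] = {}
        self._metadata: Dict[str, Optional[str]] = {}

    def add(self, signature: str, location: Location) -> None:
        """Register one instance under its (assumed) matching signature."""
        self._instances.setdefault(signature, []).append(location)
        self._metadata.setdefault(signature, None)

    def ground_truth(self, signature: str, metadata: str) -> int:
        """Store verified metadata once for a unique pair and return how
        many instances are covered by that single review."""
        if signature not in self._instances:
            raise KeyError(signature)
        self._metadata[signature] = metadata
        return len(self._instances[signature])

    def metadata_for(self, signature: str) -> Optional[str]:
        """Metadata of a unique pair, or None if not yet supplied."""
        return self._metadata[signature]

    def unique_count(self) -> int:
        """Number of entries a human has to review (U)."""
        return len(self._instances)

    def instance_count(self) -> int:
        """Total number of instances in the document (N)."""
        return sum(len(v) for v in self._instances.values())
```

The more instances share a signature, the larger the gap between `instance_count()` and `unique_count()`, which is the effect the paragraph above describes.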
The efficiency of human ground-truthing is also strongly influenced by the quality of the document images being processed. Specifically, poor quality document images that have a lot of ambiguous content, such as those created from faxed, aged, photocopied, and faded paper originals, may tremendously reduce the effectiveness of an image object pairs index, and thus the efficiency of human ground-truthing. As a result, the present invention also describes a method for ground-truthing image object pairs without using an image object pairs index. This method is also useful for ground-truthing document images that contain a substantial amount of handwritten or hand-printed content.
Moreover, it should be pointed out that an image object pairs index is also extremely useful even when no ground-truthing occurs. For example, in one embodiment of the invention, an image object pairs index can be used to efficiently retrieve some, or all, of the instances of any unique image object pair contained within the index, when the metadata within each image object pair is NULL.
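A minimal Python sketch of such metadata-free retrieval follows. It assumes that matching instances have already been assigned a common signature string; the function names and the `(page, object number)` location format are hypothetical. No metadata is stored or consulted at any point.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Location = Tuple[int, int]  # hypothetical (page number, object number)


def build_index(instances: Iterable[Tuple[Location, str]]) -> Dict[str, List[Location]]:
    """Group instance locations by signature; metadata is not required."""
    by_signature: Dict[str, List[Location]] = defaultdict(list)
    for location, signature in instances:
        by_signature[signature].append(location)
    return dict(by_signature)


def find_instances(index: Dict[str, List[Location]],
                   location: Location) -> List[Location]:
    """Return every location that shares a unique image object pair
    with the given location, or an empty list if it is unknown."""
    for locations in index.values():
        if location in locations:
            return list(locations)
    return []
```

Selecting one occurrence of a word image could then bring up all of its other occurrences, even though no text has been recognized or entered for it.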
In one aspect of the invention, a method for analyzing a document image is provided which comprises segmenting the document image to identify a set of image objects within the document image, and processing the set to group image objects within the set into a plurality of subsets, where the subsets may include one or more members. In this aspect, reference image objects are linked to corresponding subsets in a plurality of subsets. Machine-readable data structures are created includi
Hall, Jr. Floyd Steven
Howie Cameron Telfer
Convey Corporation
Haynes Mark A.
Haynes Beffel & Wolfeld LLP
Patel Kanji