Image analysis – Pattern recognition
Reexamination Certificate
2000-02-28
2003-03-25
Johns, Andrew W. (Department: 2721)
C382S182000
active
06539112
ABSTRACT:
TECHNICAL FIELD
This invention relates to the field of automated processing of drop-out forms for use with optical character recognition technology.
BACKGROUND OF THE INVENTION
Automated forms processing is possible because data that has been entered into a form is highly structured. The physical location of data and the structure of data, once located, are well specified. As used throughout this specification and the attached claims, the term “data” refers to any information entered into a form, whether numbers, text, initials, logos, shaded areas, or any other sort of marking entered into the form.
Unfortunately, scanned images can be rotated, stretched, offset, or skewed. Thus, in order to successfully read a form, software must be able to correct for any of these image transformations. By finding landmarks on the image and comparing them with the expected locations of these landmarks as exemplified by a template form, the mapping from image coordinates to template form coordinates can be determined. These landmarks are called registration points and the process of finding the image-to-template coordinate transformation is called registration.
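By way of illustration only, the registration step described above can be sketched as a least-squares fit of an affine map, which covers rotation, stretch, offset, and skew. The specification does not prescribe this particular computation. The function names `solve3`, `fit_affine`, and `apply_affine` are hypothetical, and the sketch assumes that at least three non-collinear registration points have already been matched between the image and the template form.

```python
def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(3):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("registration points are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(image_pts, template_pts):
    """Fit the image-to-template affine map from matched registration points.

    Returns (a, b, c, d, e, f) such that
        x_template = a * x_image + b * y_image + c
        y_template = d * x_image + e * y_image + f
    """
    if len(image_pts) != len(template_pts) or len(image_pts) < 3:
        raise ValueError("need at least three matched registration points")
    # Accumulate the normal equations (A^T A) p = A^T t, rows of A = [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (xt, yt) in zip(image_pts, template_pts):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * xt
            aty[i] += row[i] * yt
    a, b, c = solve3(ata, atx)
    d, e, f = solve3(ata, aty)
    return a, b, c, d, e, f


def apply_affine(params, point):
    """Map one image coordinate into template form coordinates."""
    a, b, c, d, e, f = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

With more than three registration points, the fit averages out small errors in the individual point locations; with exactly three, it reproduces the points exactly.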
One of the difficulties of using optical character recognition (OCR) technology to automatically read data on a form is that the form itself will often occlude the data. This happens when someone filling out the form does not properly provide the desired data within the boundaries provided on the form. If a box on a form is intended to hold data, but the person filling out the form writes too large to fit inside the box, then the lines of the box itself will strike through or obscure a portion of the data. The same result occurs if data is being typed or printed into a form and the data does not fall cleanly within the boundaries of the box.
To avoid occluding data, forms can be printed in drop-out ink (usually red or blue) that the scanner can filter out to leave only data in the scanned image. However, when a form is printed entirely in drop-out ink, all of the known landmarks are lost at the time of scanning. This leads to what is known as the drop-out form registration point location problem, which refers to the difficulties inherent in locating registration points in the absence of fixed landmarks on the form.
In the case of a mixed stream of image types, the processing system must identify the particular form with which each image is associated. For standard (non-drop-out) forms, this is a relatively simple task because the form is included in each image, and the form will contain landmarks to identify the form type. However, in the case of drop-out forms, where the original form is filtered out of the digital image, the lack of known landmarks makes the problem of form identification vastly more complicated. This is what is referred to as the drop-out form identification problem.
The drop-out form identification and registration point location problems are only two common examples of the problems encountered when processing forms that provide no fixed landmarks. Another problem is encountered when a form is being processed only to perform OCR on one particular type of data entry. If that particular data entry cannot be located, processing becomes impossible. Existing OCR systems do not provide a convenient, reliable, or efficient automated process to solve any of these problems. Solving these and other problems associated with automated drop-out form processing is the subject of the present invention.
SUMMARY OF THE INVENTION
This invention uses the patterns and structure of the actual data entered into the form to provide an identification region for use in processing the form. As used in this specification and the attached claims, the phrase “identification region” refers to an area in the digital image of a dropped-out form that corresponds to a pre-defined area on a template form. As used in this specification and the attached claims, the phrase “template form” refers to a digital image of the drop-out form that serves as a standardized reference against which later images may be compared. Also, as used in this specification and the attached claims, the pre-defined area on the template form, to which the identification region correlates, is referred to as a “reference region.”
Once identified, the data within the identification region may be used to identify the particular form from among a mixed stream of forms, provide a registration point for use in registering the image, or solve other types of problems encountered when processing drop-out forms that lack fixed landmarks. The steps in this invention may be configured by the user to function on any form type or mixed stream of form types.
The first step in implementing this invention is to set up the template form. During the set-up phase, the user locates and defines the boundaries of a region on the form in which the entered data ideally will have a distinctive and predictable pattern. This region on the template form is the reference region, and the corresponding region on the scanned image is called the identification region.
As used in this specification and the attached claims, the term “pattern” refers to the formation, shape, or structure represented by the data. One example of a data pattern would be that found in an address section of a filled-out form that uses a standard address format. The first horizontal line typically represents a name; the second line represents a street address; finally, there is a line for a city, state, and zip code. The type of data pattern selected will depend on the particular form and the information requested in the form. For example, a medical insurance claim form may have data fields for patient name, insurance carrier, and policy number. The size, number, distribution, and position relative to one another of data fields such as these define a particular data pattern. As used throughout this specification and in the attached claims, the term “defined data pattern” refers to the data pattern defined in a template form within the reference region, and the term “expected data pattern” refers to the data pattern found within the identification region in the digital image of a form. The expected data pattern corresponds to the defined data pattern.
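As one hedged illustration of the address example above, the number and vertical position of the text lines inside a region can be estimated from a row-by-row count of dark pixels. This is a common projection-profile technique offered here only as a sketch; the specification does not state that the invention measures data patterns this way. The names `row_profile` and `find_text_lines` and the `min_dark` parameter are hypothetical, and the region is assumed to be a binary image in which 1 marks a dark pixel.

```python
def row_profile(region):
    """Count the dark pixels (1s) in each row of a binary region."""
    return [sum(row) for row in region]


def find_text_lines(region, min_dark=1):
    """Return (top, bottom) row ranges of consecutive rows that contain data.

    A row counts as containing data when it has at least `min_dark` dark
    pixels (an assumed threshold that suppresses isolated specks).
    """
    lines = []
    start = None
    for y, count in enumerate(row_profile(region)):
        if count >= min_dark and start is None:
            start = y                      # a band of data begins
        elif count < min_dark and start is not None:
            lines.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        lines.append((start, len(region) - 1))
    return lines


# A toy region with three bands of data, as in a three-line address block.
address_region = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
]
lines = find_text_lines(address_region)
```

Comparing the number of bands found and their relative spacing against the values recorded for the template form gives one simple measure of whether the expected data pattern is present.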
As used in this specification and the attached claims, the term “distinctive” describes a data pattern that is dissimilar to other data patterns on the same form, thus reducing the probability of mistaking another data pattern for the expected data pattern. Also, as used in this specification and in the attached claims, the term “predictable” describes a data pattern that is expected to be present on substantially all filled-out forms and to possess a fairly standard and constant structure. Because the ink comprising a drop-out form is filtered out during scanning, the data field must be filled out, or there will be nothing to use in identifying the form, locating registration points, or performing other form processing procedures.
If part of the automated drop-out form processing requires identification of the form, then the defined data pattern should also be unique to one particular type of form. As used in this specification and in the attached claims, the term “unique” describes a data pattern that is at a particular location on only one type of form. Because someone using this invention selects the data pattern such that there is a one-to-one correspondence between the type of form and the particular location of the data pattern, verifying that the data pattern exists at that location verifies the identity of the form. If form identification is not required (for example, if only one type of form is being processed, or if distinguishing the type of form is not necessary), then the selected data pattern does not have to be unique.
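The one-to-one correspondence described above suggests a simple identification loop over the candidate form types, sketched below under stated assumptions. The names `crop`, `has_ink`, and `identify_form` are hypothetical, the image is assumed to be binary (1 for a dark pixel), and the `has_ink` test is a deliberately crude stand-in for whatever pattern test the user configures; it is not the matching method of the invention.

```python
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[int]]           # binary image: 1 = dark pixel, 0 = white
Box = Tuple[int, int, int, int]   # (left, top, right, bottom), end-exclusive
Matcher = Callable[[Image], bool]


def crop(image: Image, box: Box) -> Image:
    """Cut the identification region out of the scanned image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def has_ink(min_fraction: float) -> Matcher:
    """Build a placeholder matcher: true if enough of the region is dark."""
    def matcher(region: Image) -> bool:
        total = sum(len(row) for row in region)
        dark = sum(sum(row) for row in region)
        return total > 0 and dark / total >= min_fraction
    return matcher


def identify_form(image: Image,
                  templates: Dict[str, Tuple[Box, Matcher]]) -> Optional[str]:
    """Return the single form type whose pattern is found at its location.

    Returns None when no pattern matches, or when more than one matches,
    since an ambiguous result means the chosen patterns were not unique.
    """
    matches = [name for name, (box, matcher) in templates.items()
               if matcher(crop(image, box))]
    return matches[0] if len(matches) == 1 else None


templates = {
    "form_a": ((0, 0, 3, 2), has_ink(0.5)),   # pattern expected at top left
    "form_b": ((5, 2, 8, 4), has_ink(0.5)),   # pattern expected at bottom right
}
scanned = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
form_type = identify_form(scanned, templates)
```

Returning no result on an ambiguous match, rather than picking the first candidate, reflects the requirement that the data pattern be unique to one form type when identification is needed.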
To delineate the defined data pattern, the user divides the reference region into sub-regions where data (i.e., dark matter) is expected and sub-regions where no data (i.e., white space) is expected. A sub-region is referred to as a “dark zone” if
Azarian Seyed
Johns Andrew W.
RAF Technology, Inc.
Stoel Rives LLP