Methods and apparatus for gray image based text identification

Image analysis – Image segmentation – Distinguishing text from other regions

Reexamination Certificate

U.S. classification: 382/176, 382/173


active

06301386

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to text identification. More particularly, the invention relates to advantageous aspects of methods and apparatus for gray image based text identification.
BACKGROUND OF THE INVENTION
The ability to locate and read the relevant information on a financial item is a valuable feature of information processing, and is especially useful in the processing of financial documents. Many financial documents, such as checks, contain entries made in a human-readable format such as printing or handwriting. Many of these entries are not made in a standard machine-readable format, such as printing with magnetic ink according to a known standard like E13B. At least some of the non-standardized information appearing on a check must be translated into machine-readable format or entered by hand directly into a machine processing the check. For example, the amount of a check is typically not entered onto the check in machine-readable format at the time the check is written. The amount of the check, however, is critical to processing the check and must be communicated to the check-processing equipment. This has traditionally been done by human operators who read the amount written on the check and enter it into a machine, which then prints the amount onto the check in magnetic ink.
More recently, however, it has become possible to devise techniques for machine-reading the non-standardized information in order to increase processing speed and reduce costs. This machine-reading is typically done by capturing and interpreting an image of the item in order to extract text fields. The captured image is typically a gray image, having areas of varying lightness and darkness; that is, its pixels have differing gray levels.
Prior art methods typically begin by applying a binarization algorithm to the captured gray image of a document. This results in a binary image, where foreground pixels are black, and background pixels are white. Connected component analysis is performed on the binary image to assemble groups of touching black pixels. Connected components are then grouped into tokens, which are classified into horizontal lines, vertical lines, machine-printed text, and hand-printed text. Statistical features are extracted for each token. The document is classified based on the extracted tokens, where possible classifications include a business check, personal check, deposit slip, giro, or currency. Each area of machine-printed text and hand-printed text is grouped into a zone. Finally, optical character recognition is performed on the zones of interest.
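The first two steps of this prior-art pipeline, binarization and connected component analysis, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular system: it uses a single fixed global threshold (128, an assumed value) and 4-connectivity, and it stops before token grouping, classification, and OCR. The helper names `binarize`, `connected_components`, and `bounding_box` are hypothetical.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Global fixed-threshold binarization: 1 = foreground (dark), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Group 4-connected foreground pixels; return one list of (y, x) per component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components

def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)

# Hypothetical 4 x 6 gray image with two dark marks on light paper.
gray = [
    [250, 250, 250, 250, 250, 250],
    [250,  20,  20, 250,  30, 250],
    [250,  20, 250, 250,  30, 250],
    [250, 250, 250, 250, 250, 250],
]
comps = connected_components(binarize(gray))
print([bounding_box(c) for c in comps])  # [(1, 1, 2, 2), (1, 4, 2, 4)]
```

Grouping components into tokens and classifying them would follow these steps; they are left out because the text gives no criteria for them.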
However, it has become increasingly difficult to obtain a good-quality binary image, because financial institutions are using documents with increasingly complex graphical and/or textured backgrounds embedded to prevent fraud. These backgrounds appear lighter on the documents than the foreground information does, but prior-art binarization processes discard the information carried by the lightness of the background. Once binarization is complete, the background material appears as dark as the foreground material, making it difficult to separate the foreground from the background. Text recognition becomes more difficult, and errors in extracting text from the binary image are more likely to occur.
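The loss of lightness information can be seen on a single scanline. In this hypothetical example the gray values (about 30 for ink, 110 for a mid-gray security pattern, 240 for plain paper), the global threshold of 128, and the lower cutoff of 70 are all invented for illustration. A real binarizer may choose its threshold adaptively, but any pixel it maps to foreground loses its original gray level in the same way.

```python
def global_binarize(pixels, threshold):
    """Map each gray value to 1 (foreground) if darker than threshold, else 0."""
    return [1 if p < threshold else 0 for p in pixels]

def foreground_values(pixels, threshold):
    """Distinct gray values that the threshold collapses into the single value 1."""
    return sorted({p for p in pixels if p < threshold})

# Hypothetical scanline: paper = 240, security pattern = 110, ink = 30.
scanline = [240, 110, 110, 30, 30, 110, 240, 110, 30, 240]

binary = global_binarize(scanline, 128)
print(sum(binary))                         # 7 foreground pixels after binarization
print(foreground_values(scanline, 128))    # [30, 110]: ink and pattern both become 1
# Hypothetical cutoff of 70 applied to the gray values: only the ink pixels remain.
print(sum(1 for p in scanline if p < 70))  # 3
```

Once `binary` is formed, the three ink pixels and the four pattern pixels are indistinguishable, whereas the gray values still separate them.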
There exists, therefore, a need in the art for automatic extraction of information from a document in a way that is less susceptible to interference from background material.
SUMMARY OF THE INVENTION
In accordance with one aspect of the present invention, a method of text identification operates on a gray image as described below. The gray image is preferably subsampled to reduce the amount of data to be processed, and is preprocessed to remove horizontal and vertical lines. The image is then subjected to a morphological open, followed by foreground/background segmentation to produce a foreground image. The foreground image is subjected to region filtering, region merging, and region feature extraction and identification. Homogeneous regions are grouped and noise elimination is performed, leaving a number of small, identified regions. Optical character recognition may then conveniently be performed on the identified regions. Because the different degrees of lightness and darkness of different portions of the document are preserved, background and other extraneous information can be identified and removed, and text identification can proceed on smaller areas of specific interest, at greatly increased speed and efficiency compared with typical binarization-based text identification of the prior art.
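A few of these stages can be sketched on a small list-of-lists gray image (0 = black, 255 = white). This is a hedged illustration, not the patented implementation. The summary does not give the subsampling factor, the structuring element, the segmentation rule, or the noise criterion, so the sketch assumes 2x2 block averaging, a flat 1x3 horizontal window for the grayscale opening, a fixed threshold of 128 standing in for foreground/background segmentation, and a minimum region size of two pixels for noise elimination. Line removal, region merging, and feature-based identification are omitted, and all function names are hypothetical. The opening is applied directly to dark-on-light intensities, so it removes bright details narrower than the window and joins nearby dark strokes; the summary does not say whether the image is inverted first.

```python
from collections import deque

def subsample(gray, factor=2):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    h, w = len(gray) // factor, len(gray[0]) // factor
    return [[sum(gray[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) // (factor * factor)
             for x in range(w)]
            for y in range(h)]

def _row_filter(gray, k, op):
    """Apply op (min or max) over a horizontal window of width k, clipped at the borders."""
    r = k // 2
    return [[op(row[max(0, x - r):x + r + 1]) for x in range(len(row))] for row in gray]

def gray_open(gray, k=3):
    """Grayscale opening with a flat 1 x k element: erosion (local min), then dilation (local max)."""
    return _row_filter(_row_filter(gray, k, min), k, max)

def segment(gray, threshold=128):
    """Assumed stand-in for foreground/background segmentation: 1 where darker than threshold."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]

def regions(mask, min_pixels=2):
    """Bounding boxes (top, left, bottom, right) of 8-connected foreground regions.
    Regions smaller than min_pixels are discarded as noise (an assumed rule)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill over the 8-neighbourhood
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) >= min_pixels:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes

def text_regions(gray, factor=2, k=3, threshold=128, min_pixels=2):
    """Subsample, open, segment, then keep regions that survive the noise filter."""
    opened = gray_open(subsample(gray, factor), k)
    return regions(segment(opened, threshold), min_pixels)

# Hypothetical page: two thin dark strokes one pixel apart plus an isolated speck,
# built by doubling each pixel of `small` so that 2x2 subsampling recovers `small`.
small = [
    [240, 240, 30, 240, 30, 240, 240, 240],
    [240, 240, 240, 240, 240, 240, 240, 50],
]
page = [[v for v in row for _ in range(2)] for row in small for _ in range(2)]
print(text_regions(page))  # [(0, 2, 0, 4)] in subsampled coordinates
```

In this example the opening fills the one-pixel bright gap, so the two strokes form a single three-pixel region, while the isolated speck falls below the size limit and is dropped. Without the opening, `regions(segment(small))` returns an empty list, because each stroke is a separate one-pixel region.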
A more complete understanding of the present invention, as well as further features and advantages of the invention, will be apparent from the following Detailed Description and the accompanying drawings.


