Robust method for automatic reading of skewed, rotated or...

Image analysis – Pattern recognition – Template matching

Reexamination Certificate

Details

C382S177000, C382S284000

active

06735337

ABSTRACT:

TECHNICAL FIELD
The present method relates generally to character reading and more specifically to a robust technique for recognizing character strings in grayscale images where such strings may be of poor contrast or where some characters in the text string or the entire text string may be distorted or partially obscured.
BACKGROUND OF THE INVENTION
Various approaches have been applied to improve the classification accuracy of optical character recognition (OCR) methods. The present method relates generally to optical character recognition and more specifically to a technique for recognizing character strings in grayscale images where such strings may be of poor contrast or variable in position or rotation with respect to other characters in the string, or where characters in the string may be partially obscured.
Different challenges are posed in many industrial machine vision character reading applications, such as semiconductor wafer serial number identification, semiconductor chip package print character verification, vehicle tire identification, license plate reading, etc. In these applications, the font, size, and character set are well defined, yet the images may be low contrast; individual characters or groups of characters may be skewed in rotation, misaligned in position, or both; characters may be partially obscured; and the image may be acquired from objects under varying lighting conditions, image system distortions, etc. The challenge in these cases is to achieve highly accurate, repeatable, and robust character reading results.
Character recognition in digital computer images is an important machine vision application. Prior art optical character recognition methods work well (i.e., achieve high classification accuracy) when image contrast is sufficient to separate, or segment, the text from the background. In applications such as document scanning, the illumination and optical systems are designed to maximize signal contrast so that foreground (text) and background separation is easy. Furthermore, conventional approaches require that the characters be presented in their entirety and not be obscured or corrupted to any significant degree. While this is possible with binary images acquired from a scanner or grayscale images acquired from a well-controlled, low-noise image capture environment, it is not possible in a number of machine vision applications such as parts inspection, semiconductor processing, or circuit board inspection. These industrial applications are particularly difficult to deal with because of poor contrast or character obscuration, and they suffer a significant degradation in classification accuracy because of the poor characteristics of the input image. The method described herein utilizes two approaches to improve classification accuracy: (1) region-based hit-or-miss character correlation and (2) field context information.
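The two approaches named above can be illustrated with a small sketch. This section does not give the details of the correlation or of the context rules, so everything below is an assumption made for illustration: the 3x3 templates, the scoring rule (mean gray level over the "hit" region minus mean gray level over the "miss" region), and the names TEMPLATES, hit_miss_score, and read_field are all hypothetical. The sketch only shows why a region-based score can tolerate low contrast and a partly obscured stroke, and how knowing which characters a field position may hold narrows the choice.

```python
# Hypothetical 3x3 templates: 1 marks the "hit" region (stroke expected),
# 0 marks the "miss" region (background expected). Not taken from the patent.
TEMPLATES = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "7": [[1, 1, 1],
          [0, 0, 1],
          [0, 0, 1]],
}


def hit_miss_score(patch, template):
    """Mean gray level over hit pixels minus mean gray level over miss pixels.

    Assumes bright characters on a darker background. Because the score is a
    difference of region means, it depends on relative contrast rather than
    on an absolute threshold.
    """
    hits, misses = [], []
    for row_pixels, row_mask in zip(patch, template):
        for pixel, is_hit in zip(row_pixels, row_mask):
            (hits if is_hit else misses).append(pixel)
    return sum(hits) / len(hits) - sum(misses) / len(misses)


def read_field(patches, allowed_per_position):
    """Pick the best-scoring template per position, limited by field context.

    allowed_per_position[i] lists the characters permitted at position i
    (an assumed, simplified form of field context information).
    """
    chars = []
    for patch, allowed in zip(patches, allowed_per_position):
        chars.append(max(allowed, key=lambda ch: hit_miss_score(patch, TEMPLATES[ch])))
    return "".join(chars)


# Low-contrast "L" whose bottom-right stroke pixel is obscured.
patch_l = [[120, 100, 100],
           [118, 101, 99],
           [119, 117, 100]]
# Badly corrupted "7" that looks more like "I" when no context is used.
patch_7 = [[100, 118, 119],
           [100, 112, 100],
           [100, 115, 100]]

no_context = read_field([patch_l, patch_7], ["IL7", "IL7"])
with_context = read_field([patch_l, patch_7], ["IL7", "7"])  # position 2 must be a digit
print(no_context, with_context)
```

In this toy case the second patch is misread when any character is allowed, and the digit-only constraint on the second position corrects it. A real system would use full-size templates and a search over position and rotation, which this sketch omits.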
In the preferred embodiment, the invention described herein is particularly well suited for optical character recognition on text strings with poor contrast and partial character obscuration, as is typically the case in the manufacture of silicon wafers. Many semiconductor manufacturers now include a vendor code on each wafer for identification purposes and to monitor each wafer as it moves from process to process. The processing of silicon wafers involves many steps such as photolithographic exposure, etching, baking, and various chemical and physical processes. Each of these processes has the potential for corrupting the vendor code. Usually the corruption results in poor contrast between the characters and the background for some portion of the vendor code. In more severe cases, some of the characters may be photo-lithographically overwritten (exposed) with the pattern of an electronic circuit. This type of obscuration is difficult if not impossible to accommodate with prior art methods. Another possibility is that the vendor code will be written a character at a time (or in character groups) as processes accumulate. This can result in characters within the text string that are skewed or rotated with respect to the alignment of the overall text string.
PRIOR ART
Computerized document processing includes scanning of the document and the conversion of the actual image of a document into an electronic image of the document. The scanning process generates an electronic pixel representation of the image with a density of several hundred pixels per inch. Each pixel is represented by at least one unit of information indicating whether the particular pixel is associated with a ‘white’ or a ‘black’ area in the document. Pixel information may include colors other than ‘black’ and ‘white’, and it may include gray scale information. The pixel image of a document may be stored and processed directly, or it may be converted into a compressed image that requires less space on a storage medium such as a storage disk in a computer. Images of documents are often processed through OCR (Optical Character Recognition) so that the contents can be converted back to ASCII (American Standard Code for Information Interchange) coded text.
In image processing and character recognition, proper orientation of the image on the document to be processed is advantageous. One of the parameters to which image processing operations are sensitive is the skew of the image in the image field. The present invention provides for pre-processing of individual characters to eliminate skew and rotation characteristics detrimental to many image processing operations either for speed or accuracy. The present invention also accommodates characters that may be partially corrupted or obscured.
Prior art attempts to improve character classification accuracy by performing a contextual comparison between the raw OCR string output from the recognition engine and a lexicon of permissible words or character strings containing at least a portion of the characters contained in the unknown input string (U.S. Pat. No. 5,850,480 by Scanlon et al. entitled “OCR error correction methods and apparatus utilizing contextual comparison”, Second Preferred Method Embodiment, paragraphs 2-4). Typically, replacement words or character strings are assigned confidence values indicating the likelihood that the string represents the intended sequence of characters. Because Scanlon's method requires a large lexicon of acceptable string sequences, it is computationally expensive to implement, since comparisons must be made between the unknown sequence and all of the string sequences in the lexicon. Scanlon's method is also limited to applications where context information is readily available. Typical examples of this type of application include processing forms that have data fields with finite contents, such as computerized forms where city or state fields have been provided.
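The cost argument above can be made concrete with a minimal sketch of a lexicon comparison. It is not Scanlon's algorithm: the similarity measure here is Python's difflib.SequenceMatcher ratio, used only as a stand-in confidence value, and the lexicon contents and the name rank_candidates are hypothetical. What it does show is that the raw OCR string is compared against every lexicon entry, so the work grows with the size of the lexicon.

```python
from difflib import SequenceMatcher


def rank_candidates(raw_ocr, lexicon):
    """Score raw_ocr against every lexicon entry and sort best-first.

    Each score is a stand-in confidence value in [0, 1]. One comparison is
    made per lexicon entry, so cost is linear in the lexicon size.
    """
    scored = [(SequenceMatcher(None, raw_ocr, entry).ratio(), entry)
              for entry in lexicon]
    return sorted(scored, reverse=True)


# Hypothetical lexicon for a "city" form field.
LEXICON = ["BOSTON", "AUSTIN", "DALLAS", "DENVER"]

# Raw OCR output with the letter O misread as the digit 0.
ranked = rank_candidates("B0ST0N", LEXICON)
best_confidence, best_match = ranked[0]
print(best_match, round(best_confidence, 2))
```

A string with no permissible lexicon, such as a wafer serial number, gives this kind of comparison nothing to match against, which is the limitation the paragraph above points out.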
Other prior art approaches (U.S. Pat. No. 6,154,579 by Goldberg et al. entitled “Confusion Matrix Based Method and System for Correcting Misrecognized Words Appearing in Documents Generated by an Optical Character Recognition Technique”, Nov. 28, 2000, Detailed Description of the Invention, paragraphs 4-7 inclusive) improve overall classification accuracy by employing a confusion matrix based on sentence structure, grammatical rules, or spell checking algorithms subsequent to the primary OCR recognition phase. Each reference word is assigned a replacement word probability. This method, although effective for language-based OCR, does not apply to strings that have no grammatical or structural context, such as part numbers, random string sequences, encoded phrases, or passwords. In addition, Goldberg's approach does not reprocess the image to provide new input to the OCR algorithm.
Other prior art methods improve classification performance by utilizing a plurality of OCR sensing devices as input (U.S. Pat. No. 5,807,747 by Bradford et al. entitled “Apparatus and method for OCR character and confidence determination using multiple OCR devices”, Sep. 8, 1998,
