Method and apparatus for performing an automatic correction...

Image analysis – Pattern recognition – Context analysis or word recognition

Reexamination Certificate

Details

C382S310000, C704S251000, C704S256000

Reexamination Certificate

active

06219453

ABSTRACT:

BACKGROUND INFORMATION
The present invention is directed to a method and system for correcting misrecognized words in electronic documents produced by an optical character recognition system that scans text appearing on a physical medium. In particular, it is directed to a method and system that relies on a Hidden Markov Model-based algorithm to select, for each misrecognized word, the replacement word with the highest probability of matching the word in the original document that the misrecognized word was intended to match.
Devices that are used in conjunction with optical character recognition (“OCR”) techniques have been in use for some time. Examples of such devices are optical scanners and facsimile machines. Both types of device scan a physical document bearing printed or handwritten characters in order to produce an electronic image of the original document. The output image is then supplied to a computer or other processing device, which performs an OCR algorithm on the scanned image. The purpose of the OCR algorithm is to produce an electronic document comprising a collection of recognized words that are capable of being edited. The electronic document may be formatted for any one of a plurality of well-known applications. For example, if the recognized words are to be displayed on a computer monitor, they may be displayed as a Microsoft Word® document, a WORDPERFECT® document, or any other text-based document. Regardless of how the recognized words of the electronic document are formatted, the recognized words are intended to correspond exactly, in spelling and in arrangement, to the words printed on the original document.
Such exact correspondence, however, does not always occur; as a result, the electronic document may include misrecognized words that never appeared in the original document. For purposes of this discussion, the term “word” covers any set of characters, whether or not the set of characters corresponds to an actual word of a language. When the phrase “actual word” is used in this discussion, it means a cognizable, intelligible word of English or of any other language. Moreover, the term “word” covers sets of characters that include not only letters of the alphabet, but also numbers, punctuation marks, and such typographic symbols as “$”, “&”, “#”, etc. Thus, a misrecognized word may be a set of characters that is not an actual word, or it may be an actual word whose spelling differs from that of the corresponding word in the scanned document. For example, the word “got” may be misrecognized as the non-existent word “qot”, or the word “eat” may be misrecognized as “cat.” Such misrecognized words, whether they are real words or mere aggregations of characters, may be quite close in spelling to the words of the original document they were intended to match. Such misrecognition errors are largely due to the physical similarities that exist between certain letters of the alphabet. For example, as discussed above, such errors may occur when the letter “g” is confused with the physically similar letter “q”. Another common error that OCR algorithms make is confusing the letter “d” with the two-letter combination “ol.” The physical resemblance of certain characters is not the only cause of recognition errors, however. For example, the scanning device may include a faulty optical system or a defective charge-coupled device (CCD); the original document may be printed in a hard-to-scan font; or the original document may include scribbles and marks that obscure the actual text.
Certain techniques have been implemented in order to detect and correct such misrecognition errors. For example, if the electronic document containing the recognized words is formatted in a word processing application, a user viewing the document may use the spell checking function provided by the word processing application to correct any words that have been misspelled. Some of these word processing applications also provide a grammar checker, which would identify words that, although spelled correctly, do not belong in the particular sentences in which they appear.
A drawback of these techniques is that a user must apply them manually, because spell checkers and grammar checkers operate by displaying to the user a list of possible words that may include the correct word. By manipulating an appropriate sequence of keys or other data input means, the user must select from this list what the user believes to be the correct word and issue the appropriate commands for replacing the misrecognized word with the selected word. Such a correction technique is time-consuming and, moreover, prone to human error, because in carrying out such operations the user may inadvertently select an inappropriate word to replace the misrecognized word. What is therefore needed is a correction technique that automatically replaces each misrecognized word with the word most likely matching the corresponding word in the original document. Such a correction technique would not require user intervention.
SUMMARY OF THE INVENTION
In order to overcome the above-mentioned disadvantages found in previous techniques for correcting misrecognized words, the present invention is directed to a method and apparatus that automatically replaces each misrecognized word with a dynamically generated replacement word that has been determined to be the most likely correct word for replacing the misrecognized word. The recognized words may be based on words appearing on a physical medium (e.g., words printed on a sheet of paper) that has been optically scanned. For each character position of a word appearing in an original document, the present invention generates the N-best characters for occupying that character position. The present invention then generates a recognized word based on the N-best characters for each character position of the original word. The present invention then determines whether each recognized word is correct by executing a spell checking algorithm, a grammar checking algorithm, a natural language algorithm, or any combination thereof. For each incorrect recognized word, the present invention retrieves from memory the previously generated sets of N-best characters from which the incorrect recognized word was formed. The present invention then generates every possible word that can be generated from the characters included in the retrieved sets of N-best characters. Each of these generated words is referred to as a reference word. The incorrect recognized word is replaced by one of these reference words. In order to determine which reference word is to replace the incorrect recognized word, the present invention computes for each reference word a value that reflects the likelihood that the reference word matches the corresponding word appearing on the physical medium. The present invention replaces the incorrect recognized word with the reference word having the greatest likelihood of matching the corresponding word appearing on the physical medium.
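The flow summarized above (N-best characters per position, a recognized word, a correctness check, enumeration of reference words, and selection of the most likely one) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the `n_best` confidences are invented, the tiny `LEXICON` stands in for the spell checking step, and the score is a plain product of per-character confidences rather than the Hidden Markov Model-based value the invention refers to, whose details this summary does not give. Restricting candidates to lexicon entries is likewise an assumption of the sketch.

```python
from itertools import product
from math import prod

# Hypothetical N-best output for a three-letter word: one list per character
# position, each entry a (character, confidence) pair. The values are invented.
n_best = [
    [("q", 0.55), ("g", 0.45)],
    [("o", 0.90), ("a", 0.10)],
    [("t", 0.80), ("l", 0.20)],
]

# Stand-in for the spell checking step; a real system would use a full lexicon.
LEXICON = {"got", "gal", "eat", "cat"}


def recognized_word(n_best):
    """Form the recognized word from the top-ranked character at each position."""
    return "".join(max(cands, key=lambda c: c[1])[0] for cands in n_best)


def reference_words(n_best):
    """Yield every word that can be built from the N-best sets, with a score.

    The score is the product of the per-character confidences, an assumed
    stand-in for the likelihood value described in the text.
    """
    for combo in product(*n_best):
        word = "".join(ch for ch, _ in combo)
        yield word, prod(p for _, p in combo)


def correct(n_best, lexicon=LEXICON):
    """Return the recognized word, or the best-scoring reference word if it is wrong."""
    word = recognized_word(n_best)
    if word in lexicon:
        return word  # judged correct, nothing to replace
    # Assumption of this sketch: only reference words found in the lexicon compete.
    candidates = [(w, s) for w, s in reference_words(n_best) if w in lexicon]
    if not candidates:
        return word  # no acceptable replacement; leave the word unchanged
    return max(candidates, key=lambda ws: ws[1])[0]


print(recognized_word(n_best))  # the word formed from the top characters
print(correct(n_best))          # the replacement chosen by the sketch
```

With these invented confidences the top-ranked characters spell “qot”, mirroring the “got”/“qot” example in the background, and the sketch selects “got” as the replacement. The number of reference words is the product of the sizes of the N-best sets, so enumerating all of them is only practical for small N and short words.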


