Method and apparatus for character recognition

Image analysis – Pattern recognition – Context analysis or word recognition

Reexamination Certificate

Details

C382S310000

active

06341176

ABSTRACT:

BACKGROUND OF THE INVENTION
(1) Field of the Invention
The present invention relates to a method and an apparatus for character recognition used when a document that has not yet been converted into text data, such as a printed document, a hand-written document or the like, is converted into text data.
(2) Related Art
There is a certain type of character recognizing apparatus for converting a printed document or a hand-written document into text data in which post-processing is introduced to improve the rate of recognition: if the apparatus cannot accurately recognize a character in the document, it proposes a plurality of candidate characters and then determines the correct character among them.
FIG. 45 is a block diagram showing a general character recognizing apparatus. Now, an operation of the general character recognizing apparatus will be described with reference to FIG. 45. An image inputting unit 10 captures a paper document, and converts it into image data in the form of a bit map. A region dividing unit 31 divides the image data into a character region and a region of picture, graphics or the like other than the character region.
A character extracting unit 32 extracts one character from the divided character region, and supplies it to a character recognizing unit 33. The character recognizing unit 33 recognizes the character to convert it into character data, and makes a plurality of conversion candidate characters. When a process of recognizing all characters in the character region is completed, a post-processing unit 34 morphologically analyzes a sentence configured with a combination of the conversion candidate characters.
Namely, the post-processing unit 34 requests a dictionary searching unit 20 to search for a word as a search condition. The dictionary searching unit 20 searches for the given word in a word dictionary 40, and replies as to whether or not there is the word in the word dictionary 40. The post-processing unit 34 outputs the word as a correct word if the word exists in the word dictionary 40.
The character recognizing apparatus corrects a character improperly recognized by the character recognizing unit 33, using the dictionary, as above.
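The dictionary-based correction described above can be illustrated with a minimal Python sketch. It is not the apparatus of FIG. 45: the morphological analysis is reduced to a plain word lookup, and the function name `dictionary_post_process`, the candidate lists and the dictionary contents are all hypothetical, chosen only for illustration.

```python
from itertools import product


def dictionary_post_process(candidates, word_dictionary):
    """Return the first combination of candidate characters that is a
    dictionary word, or None if no combination is found.

    candidates: one list of conversion candidate characters per
                character image (hypothetical recognizer output).
    word_dictionary: a set standing in for the word dictionary 40.
    """
    for combo in product(*candidates):
        word = "".join(combo)
        # Stand-in for the dictionary searching unit 20: a membership test.
        if word in word_dictionary:
            return word
    return None


# Hypothetical example data, not taken from the patent.
candidates = [["c", "e"], ["a", "o"], ["t", "l"]]
word_dictionary = {"cat", "dog"}
result = dictionary_post_process(candidates, word_dictionary)
```

The sketch makes the dependence on the dictionary visible: a word that is absent from `word_dictionary` cannot be confirmed, however plausible its candidate characters are.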
However, because the above character recognizing apparatus carries out the morphological analysis using the dictionary as post-processing, it requires enormous labor and time to create and maintain a dictionary such as the word dictionary.
Further, the morphological analysis requires complex processes and a lot of time to configure and operate a system therefor, and tends to make many mistakes if there is an unrecognizable word in the document.
In the light of the above problems, an object of the present invention is to provide a method and an apparatus for character recognition, which can accurately correct misrecognition, and whose system can be configured readily and within a short period of time.
SUMMARY OF THE INVENTION
The object of the present invention is achieved by providing a character recognizing method, comprising the steps of:
recognizing an input character image indicating an input character of an input document as one or more conversion candidate characters denoting candidates for the input character for each of input character images indicating input characters of the input document;
selecting a series of search character images indicating a series of search input characters from the series of input character images;
selecting a plurality of particular conversion candidate character strings respectively corresponding to the series of search character images from the particular conversion candidate characters;
preparing registered text data indicating one or more registered documents;
searching the registered text data for one particular conversion candidate character string for each of the particular conversion candidate character strings to count an occurrence frequency of the particular conversion candidate character string in the registered text data for each of the particular conversion candidate character strings;
selecting a specific particular conversion candidate character string corresponding to the highest occurrence frequency among those of the particular conversion candidate character strings from the particular conversion candidate character strings; and
determining a series of specific particular conversion candidate characters composing the specific particular conversion candidate character string as a series of correct characters for the series of search character images.
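The steps above can be sketched in Python under simplifying assumptions: recognition is taken as already done, so the input is a list of conversion candidate characters per search character image, and the registered text data is a single in-memory string. The names `count_occurrences` and `correct_by_frequency` and the sample data are hypothetical and do not come from the patent.

```python
from itertools import product


def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def correct_by_frequency(candidates, registered_text):
    """Build every candidate character string, count how often each occurs
    in the registered text, and return the most frequent one together with
    all counts. On a tie, the first string in generation order is returned
    (an assumption of this sketch).
    """
    strings = ["".join(combo) for combo in product(*candidates)]
    freqs = {s: count_occurrences(registered_text, s) for s in strings}
    best = max(strings, key=lambda s: freqs[s])
    return best, freqs


# Hypothetical example data.
registered_text = "the text search counts how often a string occurs in registered text"
candidates = [["t", "f"], ["e", "c"], ["x", "k"], ["t", "l"]]
best, freqs = correct_by_frequency(candidates, registered_text)
```

In this example only the string "text" occurs in the registered text, so it is chosen as the series of correct characters. No word dictionary or morphological analysis is involved; the only resource is the registered text itself.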
The object of the present invention is also achieved by providing a character recognizing apparatus, comprising:
character recognizing means for recognizing an input character image indicating an input character of an input document as one or more conversion candidate characters denoting candidates for the input character for each of input character images indicating input characters of the input document, selecting a series of search character images indicating a series of search input characters from the series of input character images and selecting a plurality of particular conversion candidate character strings respectively corresponding to the series of search character images from the particular conversion candidate characters;
registered text data storing means for storing registered text data indicating one or more registered documents;
full text searching means for searching the registered text data stored by the registered text data storing means for one particular conversion candidate character string for each of the particular conversion candidate character strings recognized by the character recognizing means to count an occurrence frequency of the particular conversion candidate character string in the registered text data for each of the particular conversion candidate character strings;
post-processing means for selecting a specific particular conversion candidate character string corresponding to the highest occurrence frequency among those of the particular conversion candidate character strings counted by the full text searching means from the particular conversion candidate character strings recognized by the character recognizing means and determining a series of specific particular conversion candidate characters composing the specific particular conversion candidate character string as a series of correct characters for the series of search character images; and
registered text data outputting means for outputting the series of correct characters determined by the post-processing means as the series of search character images.
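One way to picture how the means of the apparatus fit together is the following Python sketch. It is only an illustration under stated assumptions: the class names `RegisteredTextStore`, `FullTextSearcher` and `PostProcessor` are hypothetical, the character recognizing means is replaced by hand-written candidate lists, and the outputting means is reduced to returning a string.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class RegisteredTextStore:
    """Stand-in for the registered text data storing means."""
    documents: list = field(default_factory=list)

    def register(self, text):
        self.documents.append(text)


@dataclass
class FullTextSearcher:
    """Stand-in for the full text searching means."""
    store: RegisteredTextStore

    def frequency(self, candidate_string):
        # str.count counts non-overlapping occurrences; that is a
        # simplification chosen for this sketch.
        return sum(doc.count(candidate_string) for doc in self.store.documents)


@dataclass
class PostProcessor:
    """Stand-in for the post-processing means."""
    searcher: FullTextSearcher

    def decide(self, candidate_strings):
        # Select the candidate string with the highest occurrence frequency.
        return max(candidate_strings, key=self.searcher.frequency)


# Hypothetical registered documents and recognizer output.
store = RegisteredTextStore()
store.register("optical character recognition")
store.register("character strings")

searcher = FullTextSearcher(store)
post = PostProcessor(searcher)

per_image_candidates = [["c", "e"], ["h", "b"], ["a", "o"], ["r", "n"]]
candidate_strings = ["".join(c) for c in product(*per_image_candidates)]
corrected = post.decide(candidate_strings)
```

Keeping the store, the searcher and the post-processor separate mirrors the division into means in the configuration above. The store could be swapped for a different full-text index without touching the selection logic.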
In the above steps and configuration, under circumstances where a character recognition cannot be correctly performed, an input character image indicating an input character is recognized as one or more conversion candidate characters for each of input character images indicating input characters. The conversion candidate characters denote candidates for the input character. Thereafter, a series of search character images is selected from the input character images, and a plurality of particular conversion candidate character strings respectively corresponding to the series of search character images are produced from the particular conversion candidate characters by repeatedly selecting the series of particular conversion candidate characters corresponding to the series of search character images. Thereafter, the registered text data indicating one or more registered documents is searched for each particular conversion candidate character string. Therefore, an occurrence frequency of each particular conversion candidate character string in the registered text data can be counted. Thereafter, a specific particular conversion candidate character string corresponding to the highest occurrence frequency is selected, and a series of specific particular conversion candidate characters composing the specific particular conversion candidate character string is determined as a seri
