Optical character reading method and system for a document...

Image analysis – Pattern recognition – On-line recognition of handwritten characters

Reexamination Certificate

Details

C382S194000, C382S198000, C382S219000, C382S228000, C382S161000

active

06636631

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention generally relates to an optical character reader (OCR) and more particularly to a method of and a system for recognizing printed and handwritten characters in a document or a filled-in form.
2. Description of the Prior Art
Similarity calculation techniques based on distance measures are generally used in prior art OCR systems. The distance measures are roughly classified into distance measures with respect to the dimension of each of the pattern vectors and distance measures with regard to the distribution of all dimensions. It is known that similarity calculation techniques based on the latter distance measure are more effective, as indicated in N. Kato et al., “A Handwritten Character Recognition System Using Modified Mahalanobis Distance,” Trans. IEICE (The Institute of Electronics, Information and Communication Engineers) Japan, Vol. J79-D-II, No. 1, January 1996, pp. 45-52 (which is hereby incorporated by reference). Among others, the multiple similarity method is one of the popular similarity calculation techniques in practical use.
The multiple similarity method first performs a principal component analysis on the distribution, in a pattern space, of training character patterns included in a character category to obtain N eigenvalues λi and N eigenvectors ψi, where i=1, 2, . . . , N. For an input character pattern vector f, the similarity S is calculated for each category j by using the eigenvectors ψi as reference according to the following equation:
\[
S_j = \left( \sum_{i=1}^{N} \frac{\lambda_{j,i}}{\lambda_{j,1}} \cdot \frac{(\psi_{j,i}, f)^2}{\|\psi_{j,i}\|^2 \cdot \|f\|^2} \right)^{1/2}.
\]
The largest similarity and the second largest similarity are compared with each other to determine whether to reject the calculation result. If not (the calculation result should not be rejected), the category of the largest similarity is generally used as the recognition result.
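The calculation and rejection test described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the procedure of any cited system: the eigenvalues are assumed to be sorted in descending order and each weight is normalized by the largest eigenvalue of the category, and the rejection rule (a minimum gap `min_margin` between the two largest similarities) and all function names are hypothetical, since the text says only that the two values are compared.

```python
import math

def multiple_similarity(f, eigenvalues, eigenvectors):
    """Similarity of pattern vector f to one category.

    eigenvalues are assumed sorted in descending order, so eigenvalues[0]
    is the largest; eigenvectors are given in the same order.
    """
    norm_f_sq = sum(x * x for x in f)
    if norm_f_sq == 0.0:
        return 0.0
    total = 0.0
    for lam, psi in zip(eigenvalues, eigenvectors):
        dot = sum(p * x for p, x in zip(psi, f))      # inner product (psi, f)
        norm_psi_sq = sum(p * p for p in psi)
        total += (lam / eigenvalues[0]) * dot * dot / (norm_psi_sq * norm_f_sq)
    return math.sqrt(total)

def classify(f, categories, min_margin=0.05):
    """Return the best category label, or None when the result is rejected.

    categories maps a label to (eigenvalues, eigenvectors).
    min_margin is a hypothetical threshold on the gap between the largest
    and the second largest similarity.
    """
    scored = sorted(
        ((multiple_similarity(f, ev, vecs), label)
         for label, (ev, vecs) in categories.items()),
        reverse=True,
    )
    best_score, best_label = scored[0]
    if len(scored) > 1 and best_score - scored[1][0] < min_margin:
        return None  # top two categories are too close: reject
    return best_label
```

A larger `min_margin` rejects more ambiguous inputs at the cost of a higher rejection rate.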
A contour structure matching method is one of the character recognition methods using structural features in a conventional OCR. The contour structure matching method is described in Y. Kurosawa et al., “Recognition of Unconstrained Handwritten characters,” Toshiba Review Vol.41, No.12, pp.1012-1055 (1986).
In this method, a Freeman chain code in eight directions is extracted as initial-stage feature data from each input character pattern. The extracted chain code is smoothed and divided by curvature into contour segments. The contour segments are classified into three kinds, i.e., convex segments, linear segments and concave segments. Each of the contour segments is given attribute information such as the length, the radius, the position and the direction of the contour segment.
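For illustration, an eight-direction chain code can be derived from an ordered list of contour points as sketched below. The direction numbering (0 = east, increasing counter-clockwise, y growing upward) and the function names are assumptions made for this sketch; the cited method may use a different convention. The smoothing and the curvature-based splitting into convex, linear and concave segments are not reproduced here; `turn_sequence` only gives the raw direction changes on which such a split could be based.

```python
# Assumed convention: code 0 points east and codes increase counter-clockwise,
# with the y axis growing upward.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}

def freeman_chain(points):
    """Encode consecutive 8-connected contour points as direction codes 0..7."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points are not 8-connected neighbours: {step}")
        codes.append(DIRECTIONS[step])
    return codes

def turn_sequence(codes):
    """Signed change between consecutive codes, in units of 45 degrees (-3..4)."""
    return [((b - a + 3) % 8) - 3 for a, b in zip(codes, codes[1:])]
```

For a unit square traced counter-clockwise, `freeman_chain([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])` gives `[0, 2, 4, 6]`, and every entry of the turn sequence is `2` (a 90-degree left turn).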
A database of this method stores, for each character, a description of a reference pattern comprising a sequence of contour segments, each having the above-mentioned attribute information. Each item of attribute information has an upper limit and a lower limit set. The recognition of an input pattern is performed by examining the correspondence between the segments of the input pattern and the segments of a reference pattern.
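The limit-based comparison can be sketched as below. The data layout (dictionaries with a `kind` field and a `limits` table) and the strict one-to-one, in-order correspondence are simplifying assumptions for illustration only; the text does not say how correspondence between segments is established.

```python
def segment_matches(segment, reference):
    """Check one input contour segment against one reference segment.

    segment:   {"kind": "convex" | "linear" | "concave", <attribute>: value, ...}
    reference: {"kind": ..., "limits": {<attribute>: (lower, upper), ...}}
    Both layouts are hypothetical.
    """
    if segment["kind"] != reference["kind"]:
        return False
    return all(
        lower <= segment[name] <= upper
        for name, (lower, upper) in reference["limits"].items()
    )

def pattern_matches(segments, reference_pattern):
    """True when every input segment falls inside the limits of the
    reference segment at the same position in the sequence."""
    if len(segments) != len(reference_pattern):
        return False
    return all(segment_matches(s, r) for s, r in zip(segments, reference_pattern))
```

Storing a lower and an upper limit per attribute lets a single reference description tolerate normal variation in handwriting, at the cost of tuning those limits for every character.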
Y. Kurosawa et al. also reported in the above-cited document that a combination of the contour structure matching method and the multiple similarity method provided a much lower rejection rate than either method used alone.
If a recognition method that uses only statistical characteristics obtained through a principal component analysis is used in an OCR system, it will be difficult for the system to discriminate similar categories that a human would easily discriminate, because structural differences in the character pattern are not taken into account in the recognition process. On the other hand, a recognition method that uses only structural characteristics is disadvantageously inefficient in creating a reference pattern database.
Further, if handwritten characters written on an entry form with ruled lines are to be optically read and recognized as they are (without the ruled lines dropped out), character extraction errors tend to occur. For example, a character may couple with a part of a ruled line or may lose a part of itself due to contact with, or crossing of, the ruled line. A correction of an amount of money may cause a part of a corrected portion or a correction seal impression (a small seal impression stamped by a corrector) to be extracted as a part of the character. In such a case, judging whether to reject the extracted pattern only by the degree of similarity disadvantageously results in an erroneous recognition.
SUMMARY OF THE INVENTION
Accordingly, it is an object of the invention to provide a character recognition method and system that exhibits an improved recognition performance by adopting both a statistical characteristic-based recognition method and a structural characteristic-based recognition method such that the above-mentioned problems are solved. Specifically, an inventive character recognition method and system mainly uses a recognition method based on statistical characteristics but uses structural characteristics only for discriminating similar categories that would invite errors with a recognition method based on statistical characteristics.
According to an aspect of the invention, a method of recognizing characters of a document including ruled lines is provided for a system having means for reading the document and storing read data as an input image of pels. In this method, the characters are separated from the ruled lines. For each of the characters, a bounding box and a corresponding character pattern are extracted. A bounding box is a box that contains a character with four sides thereof contacting the character. Since the following process is executed for each character, definite articles are used instead of “each” here. Contour information and ground information are extracted as fundamental features from the character pattern. Various statistical features are extracted on the basis of the fundamental features. Various structural features are extracted on the basis of the fundamental features and the character pattern. On the basis of the extracted statistical features, some candidates for the character and corresponding degrees of similarity are found, and at least the candidate having the largest degree of similarity is provided together with the degree(s) of similarity associated with the provided candidate(s). A final candidate for the character pattern is output on the basis of the provided candidate(s), the provided degree(s) of similarity, the structural features and the bounding box.
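The bounding box defined above (a box whose four sides touch the character) can be computed from a binary image of pels as in the following sketch. It assumes that the ruled lines have already been removed and that the image holds a single character; the function names and the list-of-rows image representation are hypothetical and are not taken from the described system.

```python
def bounding_box(image):
    """Return (top, left, bottom, right), inclusive, of the black pels.

    image is a list of equal-length rows of 0 (white) and 1 (black) pels.
    Returns None when the image contains no black pel.
    """
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]

def crop(image, box):
    """Cut out the character pattern defined by the bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Because each side of the box passes through at least one black pel, the cropped pattern carries no empty margin rows or columns.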
According to another aspect of the invention, a system for recognizing characters of a document including ruled lines is provided. The system comprises a scanner and digitizer for reading the document and storing read data as an input image of pels; a character extractor for separating the characters from the ruled lines and, for each character, extracting a bounding box and a corresponding character pattern defined by the bounding box; a fundamental feature extractor for extracting contour information and ground information as fundamental features from each character pattern; a statistical feature extractor for extracting at least one kind of statistical features on the basis of the fundamental features; a statistical feature-based recognizer for, on the basis of the extracted statistical features, finding some number of candidates for the character and corresponding degrees of similarity to provide at least such one of the candidates as has largest degree(s) of similarity and degree(s) of similarity associated with the provided candidate(s); a structural feature extractor for extracting at least one kind of structural features on the basis of the fundamental features and each character pattern; a structural feature-based recognizer for providing a final candi
