Character recognition system

Image analysis – Image enhancement or restoration

Reexamination Certificate

Details

C382S289000, C382S290000, C382S291000, C382S292000

active

06671417

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to character recognizing systems, character recognizing methods and recording media, in which control programs for the same are recorded, and more particularly to optical character recognizing systems for reading characters written on paper or the like with an optical sensor.
In prior art character recognizing systems of this type, preprocessing is executed on the inputted image to correct variations in its size, skew, etc.
Well-known examples of preprocessing include character size normalization and skew correction. Among these preprocessing steps, reference line detection and correction are applied particularly to character strings of English words or similar alphabetic characters.
FIG. 5 is a view illustrating the definition of reference lines of a character row. The reference lines are of two kinds, i.e., an upper and a lower reference line. The dashed and broken lines shown in superposition on the word “good” in FIG. 5 are the lower and upper reference lines, respectively. The position of the upper reference line is determined such that the upper end of lowercase characters without an ascender or descender (such as a, c, e, m, n, o, u, v, w, x and z) lies on or in the vicinity of the line. The position of the lower reference line is determined such that the lower end of lowercase characters without an ascender or descender lies on or in the vicinity of the line.
Considering rectangular areas inscribing and circumscribing a character row, the area under the lower reference line is referred to as the descender area, the area over the upper reference line is referred to as the ascender area, and the area between the upper and lower reference lines is referred to as the body area.
One purpose of reference line detection and correction is as follows. Usually, the area ratios of the body, descender and ascender areas of a hand-written character row are not fixed, because the descender and ascender sizes depend on the writer. Therefore, with size normalization of the entire character string image alone, variations in the body area size remain, so that it is difficult to read a character row with high accuracy in the succeeding character recognizing process stage.
By detecting the reference lines and correcting the image to obtain constant area ratios (or height ratios) of the body, descender and ascender areas (for instance 1:1), normalized body, descender and ascender areas are obtainable, so that it is possible to expect an accurate character recognizing process in the succeeding stage.
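The correction described above can be illustrated by a piecewise-linear mapping of the vertical coordinate. The following is only a minimal sketch under stated assumptions, not the method of the cited prior art: the function name `normalize_y`, the half-open row ranges, and the equal target heights are hypothetical choices made for illustration.

```python
def normalize_y(y, upper, lower, height, targets=(20, 20, 20)):
    """Map a source row y to a normalized row, piecewise linearly.

    Assumed layout (y grows downward):
      rows [0, upper)      ascender area
      rows [upper, lower)  body area
      rows [lower, height) descender area
    `targets` holds the desired heights of the three areas; the equal
    default values are an illustrative assumption.
    """
    if not 0 < upper < lower < height:
        raise ValueError("expected 0 < upper < lower < height")
    t_asc, t_body, t_desc = targets
    if y < upper:
        # Stretch or shrink the ascender area to t_asc rows.
        return y * t_asc / upper
    if y < lower:
        # Stretch or shrink the body area to t_body rows.
        return t_asc + (y - upper) * t_body / (lower - upper)
    # Stretch or shrink the descender area to t_desc rows.
    return t_asc + t_body + (y - lower) * t_desc / (height - lower)
```

With `upper=10`, `lower=40` and `height=60`, the body area of 30 rows is compressed to 20 rows while the ascender area of 10 rows is expanded to 20 rows, so that each area occupies a fixed share of the normalized image regardless of the writer.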
The reference line detection and correction have the following second purpose. Usually, characters in a character row are rarely written in an exactly horizontal direction. In many cases, as a hand-written character row proceeds rightward, the character position deviates vertically, and the character size also increases or decreases. Consequently, the upper and lower reference lines fail to be horizontal and parallel. (FIG. 6 shows such an example.)
By detecting the reference lines from the inputted character row image and correcting the image so that the reference lines become horizontal, variations of the character row skew and of the character size within the row can be absorbed, so that it is possible to expect accurate character recognition in the succeeding stage.
For the above purposes, the prior art character recognizing system has a character row reference line detecting and correcting means. The prior art described above is disclosed in Bozinovic et al., “Off-Line Cursive Script Word Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 1, pp. 68-83, 1989.
In a reference line detecting process disclosed in this literature, a histogram of a horizontally written character row image is obtained by projecting the image horizontally and counting the black pixels in each pixel row. Then, the differences between the black pixel numbers of adjacent pixel rows are calculated, the positions corresponding to the maximum and minimum differences are selected, and the horizontal straight lines containing the selected positions are taken as the reference lines. This method utilizes the fact that many black pixels are present in the body part of the image.
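The projection-based process described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of the cited literature: the image is assumed to be a list of pixel rows holding 1 for black and 0 for white, the y-coordinate is assumed to grow downward, and the function names and the rule for converting difference positions into row indices are hypothetical.

```python
def row_histogram(image):
    """Count the black pixels (value 1) in each pixel row of a binary image."""
    return [sum(row) for row in image]


def estimate_reference_rows(image):
    """Return (upper, lower) row indices chosen from histogram differences.

    Assumption: the upper reference line is placed at the row where the
    black pixel count rises most sharply, and the lower reference line at
    the last row before the count falls most sharply.
    """
    h = row_histogram(image)
    # diffs[k] = h[k + 1] - h[k], the change between adjacent pixel rows.
    diffs = [h[j] - h[j - 1] for j in range(1, len(h))]
    upper = diffs.index(max(diffs)) + 1  # first row after the largest rise
    lower = diffs.index(min(diffs))      # last row before the largest fall
    return upper, lower


# Toy image: rows 1-2 hold an ascender stroke, rows 3-5 the body,
# and rows 6-7 a descender stroke.
IMAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
```

For the toy image, `estimate_reference_rows(IMAGE)` selects rows 3 and 5, the first and last rows of the dense body part. Because a single horizontal position is chosen for each line, a sketch of this kind cannot represent skewed reference lines.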
Another method is disclosed in Caesar et al., “Estimating the Baseline for Written Material”, Proceedings of the Third International Conference on Document Analysis and Recognition, 1995.
In a reference line detection process disclosed in this literature, the contour lines of a horizontally hand-written character row image are vertically divided into two parts, and the locally maximal points of the upper contour parts and the locally minimal points of the lower contour parts are all extracted. Then, by the least squares method, straight lines are fitted as the upper and lower reference lines to the maximal points of the upper contour parts and the minimal points of the lower contour parts, respectively. This method utilizes the fact that the majority of the contour lines are located in the neighborhood of the reference lines.
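The contour-based fitting described above can be sketched as follows. This is a simplified illustration, not the procedure of the cited literature: each contour is assumed to be given as one height value per x position, the y-coordinate is assumed to grow upward (unlike the image coordinate used elsewhere in this description), only strict local extrema are extracted, and the function names are hypothetical.

```python
def local_extrema(profile, want_max):
    """Return (x, y) points that are strict local maxima or minima."""
    points = []
    for x in range(1, len(profile) - 1):
        left, mid, right = profile[x - 1], profile[x], profile[x + 1]
        if want_max and mid > left and mid > right:
            points.append((x, mid))
        elif not want_max and mid < left and mid < right:
            points.append((x, mid))
    return points


def fit_line(points):
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two points with distinct x")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


# Toy contours of a row that rises to the right (y grows upward).
UPPER_CONTOUR = [10, 12, 10, 13, 11, 14, 12]
LOWER_CONTOUR = [2, 0, 2, 1, 3, 2, 4]

upper_line = fit_line(local_extrema(UPPER_CONTOUR, want_max=True))
lower_line = fit_line(local_extrema(LOWER_CONTOUR, want_max=False))
```

In this toy case both fitted lines have a slope of 0.5. The sketch uses every extremum without weighting, so tall ascenders and deep descenders would pull the fitted lines away from the body part.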
As described above, the prior art techniques mostly utilize such geometric data as image projections and contour directions, and detect reference lines by outputting the straight lines that best fit as the upper and lower reference lines.
In the above prior art character recognizing method, only positions (y-coordinates) of the upper and lower reference lines, or only positions (y-coordinates) and skews of the reference lines, are estimated under the assumption that the reference lines are straight lines which are horizontal or have a given skew.
FIG. 7 is a block diagram showing the functional constitution of a prior art example of a character recognizing system. This example of a character recognizing system comprises an image recording means 1, a preprocessing means 2, a character row reading means 3, a reference line detecting means 6 and a reference line correcting means 5. The reference line detecting means 6 includes a reference line position estimating means 41 and a reference line skew estimating means 42.
The image recording means 1 stores an inputted character row image. The preprocessing means 2 executes a preprocessing, such as size normalization or character skew correction of the character row image reproduced from the image recording means 1. The character row reading means 3 reads out character rows in the image preprocessed in the preprocessing means 2 by adequately executing character segmentation, character recognition, language processing, etc.
The reference line detecting means 6 receives the character row image as the subject of preprocessing from the preprocessing means 2, and estimates the positions and skews of reference lines of the character rows. In the reference line detecting means 6, the reference line position estimating means 41 estimates reference line positions (y-coordinates), and the reference line skew estimating means 42 estimates reference line skews.
The reference line correcting means 5 receives the estimated values of the reference line positions and skews from the reference line detecting means 6, and shapes the character row image by affine transformation to obtain horizontal reference lines and predetermined area ratios of the body, descender and ascender areas.
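The skew part of such an affine correction can be sketched as a vertical shear. This is a minimal illustration only: it assumes a single estimated skew expressed as a slope (change in y per unit x), and the function names `shear_matrix` and `apply_affine` are hypothetical, not taken from the source.

```python
def shear_matrix(skew):
    """2x3 affine matrix that maps a line of slope `skew` to a horizontal line."""
    return ((1.0, 0.0, 0.0),
            (-skew, 1.0, 0.0))


def apply_affine(matrix, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    (a, b, c), (d, e, f) = matrix
    return a * x + b * y + c, d * x + e * y + f


# Points on an assumed reference line y = 0.25 * x + 5.
LINE_POINTS = [(0, 5), (8, 7), (16, 9)]
CORRECTED = [apply_affine(shear_matrix(0.25), x, y) for x, y in LINE_POINTS]
```

After the shear, all three points share the y-coordinate 5, so the reference line has become horizontal. A vertical scaling toward the predetermined area ratios would still have to be applied in addition, which this sketch omits.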
The reference line detecting means 6 projects the character row image horizontally, and produces a histogram by calculating the black pixel number for each pixel row. Specifically, the means 6 obtains the total black pixel number h(j) (j=1, . . . , N) of the pixel row corresponding to y-coordinate j (i.e., the vertical coordinate, being positive in the downward direction of the image), where N is the height of the character row image.
The reference line position estimating means 41 calculates histogram difference (h(j)−h(j−1)), (j=1, . . . , N−1) between adj
