Apparatus for character segmentation and apparatus for...

Image analysis – Image segmentation – Segmenting individual characters or words

Reexamination Certificate


C382S215000, C382S218000


active

06330358

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an apparatus for character segmentation and to an apparatus for character recognition using the same. More specifically, the present invention relates to an apparatus for character segmentation which receives hand-written characters as input, segments individual characters from the resulting character data and outputs the results of segmentation, and to an apparatus for character recognition which recognizes the segmented characters.
2. Description of the Background Art
FIG. 15 is a schematic block diagram showing a conventional apparatus for character recognition. Referring to FIG. 15, characters are converted into electronic data by input means 20 and are first separated from the series of characters into individual characters by character segmentation means 21. Features of each segmented character are then extracted by feature extracting means 22. Based on the extracted features, recognizing means 23 recognizes each character by calculating the distance between the features of the input character and features prepared in advance as a dictionary. The states and results of input means 20, character segmentation means 21, feature extracting means 22 and recognizing means 23 are output by output means 24.
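The conventional pipeline of FIG. 15 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify which features feature extracting means 22 computes, so the hypothetical `zoning_features` uses the ink density in a coarse mesh (one example of a "distribution of coordinate points" feature), and the hypothetical `recognize` stands in for recognizing means 23 by returning the dictionary entry at the smallest Euclidean distance.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[int]]  # binary character image, rows top to bottom; 1 = ink

def zoning_features(img: Image, grid: int = 2) -> List[float]:
    """Hypothetical feature extractor: ink density in each cell of a grid x grid mesh."""
    h, w = len(img), len(img[0])
    feats: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Guard against empty cells when the image is smaller than the grid.
            feats.append(sum(cells) / max(len(cells), 1))
    return feats

def recognize(img: Image, dictionary: Dict[str, Sequence[float]]) -> str:
    """Hypothetical recognizer: label whose dictionary feature is nearest (Euclidean)."""
    f = zoning_features(img)
    return min(dictionary, key=lambda label: math.dist(f, dictionary[label]))
```

Euclidean distance is used here only for simplicity; any distance between feature vectors could fill the same role.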
Segmenting individual characters from input character images is essential in any character recognition system. A character recognition system must therefore have character segmentation means of sufficiently high performance as well as character recognizing means of similarly high performance.
Further, before characters are segmented one by one, groups of characters (e.g. lines of characters) must first be extracted from a document. Generally, when a document is written laterally, the region in which characters exist is estimated from the projection of the character images in the vertical direction. Even in a hand-written document, laterally written characters are arranged relatively linearly, although individual characters may be inclined. Therefore, in most cases, lines of characters can be correctly extracted from documents by using projections of the document images. Even if individual hand-written characters are not neat and components of different characters overlap, the positions of the characters can be estimated fairly well from the projections of the document. A line extracted from a hand-written document may contain components of other characters, but it at least contains the characters of the objective array. In other words, in the prior art, lines containing the objective characters can be extracted sufficiently well from hand-written documents.
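The line extraction described above can be illustrated with a minimal sketch that assumes a binary document image stored as a list of rows (1 = ink). Summing each row projects the image onto the vertical axis, and every maximal run of rows containing ink is taken as one line of characters. The names `row_projection` and `extract_lines` and the `min_ink` threshold are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary document image, rows top to bottom; 1 = ink

def row_projection(img: Image) -> List[int]:
    """Project the image onto the vertical axis: ink count per row."""
    return [sum(row) for row in img]

def extract_lines(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, of runs with >= min_ink ink."""
    proj = row_projection(img)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, v in enumerate(proj):
        if v >= min_ink and start is None:
            start = y                      # a line of characters begins
        elif v < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(proj)))
    return lines
```

For a hand-written document, `min_ink` could be raised above 1 so that a few stray strokes between lines do not merge two lines; that choice is an assumption, not something the text prescribes.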
Segmentation of characters is effected by checking for the presence or absence of spaces between characters. Generally, when characters are written laterally, gaps in the projection of the character images in the horizontal direction are used as references, and the characters are segmented linearly, that is, one-dimensionally. Since italic letters and the like are inclined by a prescribed angle, the projection of the character images is taken in various directions and the direction giving the sharpest projection is employed, which allows such characters to be segmented from the projection in the same manner as ordinary characters. Unlike printed characters, whose spacing is constant, hand-written characters are often very close to their neighbors. In such a case there is no gap in the projection, as in the projections of the two adjacent characters shown in FIG. 3(a), and therefore these characters cannot be segmented directly from gaps in the projection. For this reason, forced character segmentation or similar processing is carried out on the assumption that the height of the array of characters represents the size of a character, exploiting the property of Japanese that the height and width of a character are identical in most cases. In the example of FIG. 3(a), the gap between the two characters can be estimated in this way, and using this gap as a reference, the two characters can be segmented.
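Gap-based segmentation and the forced segmentation fallback can be sketched as follows, assuming a binary line image given as a list of rows. The column projection is split at empty columns; any segment much wider than the line height is then cut into pieces about one line height wide, relying on the roughly square shape of Japanese characters. The function names and the `tolerance` factor of 1.5 are assumptions for illustration, not values taken from the patent.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary line image, rows top to bottom; 1 = ink

def column_projection(line_img: Image) -> List[int]:
    """Project the line onto the horizontal axis: ink count per column."""
    return [sum(col) for col in zip(*line_img)]

def segment_by_gaps(line_img: Image) -> List[Tuple[int, int]]:
    """Split a line at empty columns; returns (start, end) column ranges, end exclusive."""
    proj = column_projection(line_img)
    segs: List[Tuple[int, int]] = []
    start = None
    for x, v in enumerate(proj):
        if v > 0 and start is None:
            start = x
        elif v == 0 and start is not None:
            segs.append((start, x))
            start = None
    if start is not None:
        segs.append((start, len(proj)))
    return segs

def force_split(segs: List[Tuple[int, int]], line_height: int,
                tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Assumed rule: cut segments wider than tolerance * line_height into
    equal pieces roughly line_height columns wide (characters ~ square)."""
    out: List[Tuple[int, int]] = []
    for s, e in segs:
        width = e - s
        if width <= tolerance * line_height:
            out.append((s, e))
            continue
        n = max(1, round(width / line_height))   # estimated number of characters
        step = width / n
        for i in range(n):
            out.append((s + round(i * step), s + round((i + 1) * step)))
    return out
```

The cut positions depend only on the estimated width, not on where one character actually ends, which is exactly the weakness of forced segmentation that the following paragraphs point out.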
If the components of a character are spaced apart, as in the character shown in FIG. 3(b), those components may be erroneously segmented as separate characters. Alternatively, judging from the projection of FIG. 3(b), the left side of the character may be segmented as part of another character. Therefore, a gap in the projection cannot be fully relied on as a gap between characters. Accordingly, a general method has also been proposed in which portions tentatively segmented by using projections are labeled with recognition results and word knowledge, and these portions are combined to find the optimal combination based on grammatical meaning, thereby achieving complete segmentation. The features used in character recognition are, for example, strokes, which are the basic components of characters, and the distribution of coordinate points when characters are viewed as images, represented quantitatively as multidimensional vectors. These features are mainly conceived based on the intuition of the designer of the character recognition system, and a great variety of features is employed in character recognition systems. Recognition is performed by multivariate analysis of the multidimensional vectors obtained in this manner.
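The brute-force form of the general method can be sketched as below: every way of merging consecutive tentatively segmented pieces into characters is enumerated, and the grouping with the highest total score is kept. The `score` callback is a hypothetical placeholder for whatever combination of recognition confidence and word knowledge a real system would use; the patent does not define it, and `groupings` and `best_grouping` are likewise illustrative names.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

def groupings(n: int) -> Iterator[List[int]]:
    """Yield every split of n consecutive pieces into contiguous groups,
    expressed as a list of group sizes."""
    if n == 0:
        yield []
        return
    for first in range(1, n + 1):
        for rest in groupings(n - first):
            yield [first] + rest

def best_grouping(pieces: Sequence[str],
                  score: Callable[[Tuple[str, ...]], float]) -> List[Tuple[str, ...]]:
    """Exhaustively score every grouping and return the best one.
    `score` is a hypothetical plausibility measure for one merged group."""
    best: List[Tuple[str, ...]] = []
    best_total = float("-inf")
    for sizes in groupings(len(pieces)):
        groups: List[Tuple[str, ...]] = []
        i = 0
        for k in sizes:
            groups.append(tuple(pieces[i:i + k]))
            i += k
        total = sum(score(g) for g in groups)
        if total > best_total:
            best, best_total = groups, total
    return best
```

With pieces such as a left-hand radical, a right-hand radical and a third character, a score that rewards the merged radical pair selects that pair as one character. The exhaustive loop is what makes this approach slow as the number of pieces grows.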
As for character segmentation, projections are exclusively used as described above. Namely, character segmentation and character recognition are separately carried out using completely different methods.
However, when the method of overall determination using recognition results and word knowledge is employed, the number of combinations to be examined becomes very large, and hence the time necessary for determination becomes very long. The larger the number of characters, the longer the necessary time. In addition, the software for executing such determination is quite complicated, since it must handle exceptions in determination, so that much time and labor are necessary to produce it.
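The growth in the number of combinations can be made concrete with a simple count, under an assumption that is ours rather than the patent's: a line has been over-segmented into n pieces, and each character consists of consecutive pieces. Each of the n - 1 boundaries between adjacent pieces either is or is not a character boundary, so the number of candidate groupings is

```latex
N(n) = 2^{\,n-1}, \qquad N(20) = 2^{19} = 524\,288
```

and this is before any word-level or contextual alternatives are considered. Each additional piece doubles the number of groupings to examine.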
If the components of a Chinese character are segmented as separate portions, its left-hand radical and right-hand radical may be treated as independent characters. In such a case, the components must be analyzed not only with grammatical knowledge but also with the context. Therefore, an even longer time is required to examine the characters, and extensive knowledge of context is necessary to analyze all possible combinations. Especially for hand-written characters written very close to each other, individual characters are unlikely to be segmented correctly by projection, and satisfactory results are therefore difficult to obtain unless the optimal combination of the tentatively segmented portions is found.
When projections are used, segmentation is effected linearly. Therefore, if characters are forcibly separated at a character width estimated from the height of the array, components of other characters may remain in the segmented character region, which lowers the recognition rate.
Changing the direction of projection, as in the case of italic letters, may help to cope with the inclination of hand-written characters. Unlike printed characters, however, the inclination of the characters in the same array is not constant, even when they are written by the same writer. Further, the characters are often rotated and partly buried in one another. Therefore, it is difficult to estimate the spacing between characters correctly even if the direction of projection is varied.
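Varying the projection direction can be sketched by shearing the image: each row is shifted in proportion to its height above the bottom row before the column projection is taken, and the candidate slant whose projection is sharpest, measured here by the sum of squared column counts, is selected. Both the shear model and the sharpness measure are assumptions chosen for illustration; `sheared_column_projection`, `sharpness` and `best_slant` are hypothetical names. A single slant per line, as used here, cannot follow characters whose inclination varies within the line.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # binary line image, rows top to bottom; 1 = ink

def sheared_column_projection(img: Image, slant: float) -> List[int]:
    """Column projection after shifting each row left by
    slant * (number of rows above the bottom row)."""
    h = len(img)
    counts: Dict[int, int] = {}
    for y, row in enumerate(img):
        shift = round(slant * (h - 1 - y))
        for x, v in enumerate(row):
            if v:
                counts[x - shift] = counts.get(x - shift, 0) + v
    if not counts:
        return []
    lo, hi = min(counts), max(counts)
    return [counts.get(x, 0) for x in range(lo, hi + 1)]

def sharpness(proj: Sequence[int]) -> int:
    """Sum of squared counts: larger when ink piles up in few columns."""
    return sum(v * v for v in proj)

def best_slant(img: Image, slants: Sequence[float]) -> float:
    """Candidate slant with the sharpest sheared projection (first one wins ties)."""
    return max(slants, key=lambda s: sharpness(sheared_column_projection(img, s)))
```

For two right-leaning strokes, the unsheared projection has no gap at all, while at the selected slant each stroke collapses into a single column with empty columns between them.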
As described above, since hand-written characters are in most cases written very close to each other, it is difficult to extract characters one-dimensionally by using pro
