Pattern extraction apparatus

Image analysis – Image segmentation – Segmenting individual characters or words

Reexamination Certificate


Details

C382S178000, C382S185000, C382S194000, C382S202000

active

06434270

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a pattern extraction apparatus and a pattern extracting method, and is specifically applicable to a case where a box and a ruled line indicating the range of a pattern containing characters, graphics, symbols, images, etc. are extracted in a hand-written character recognition apparatus, a printed character recognition apparatus, a graphics recognition apparatus, etc.
2. Prior Art Technology
Recently, there has been an increasing demand for a hand-written character recognition apparatus such as an optical character reader as a peripheral unit for inputting financial documents, business documents, etc.
A conventional optical character reader performs a character segmenting process on each character of a character pattern from an input image before recognizing a character. To attain a high character recognition rate for each character, an optical character reader has to correctly segment a character as a pre-recognition process.
Therefore, to attain a high recognition rate with a conventional optical character reader, each character is written in a specified range of a document such as a listing in which the character input position is specified (not with a drop-out color but with, for example, a black rectangular box or a ruled line having a color or density similar to that of the characters).
However, the conventional optical character reader has the problem that the character recognition rate is low because a character cannot be correctly segmented when the ruled line or rectangular box indicating a specified input range touches or intersects the character. For example, a current optical character reader cannot recognize a slight obliqueness, concavity, or convexity of a rectangular box when the rectangular box is removed. As a result, if the position or the line width of a rectangular box is changed, a part of a character to be recognized may be lost or a part of the rectangular box may remain unremoved.
When a range of inputting characters in a listing is specified, the information about the position and the thickness of each ruled line must be stored in advance, and the information about the range of inputting characters must be updated whenever the listing format is changed. The conventional system therefore places a heavy burden on the user. Furthermore, a system that relies on a specified character range cannot process an unknown listing format.
In the previous Japanese patent application (Tokuganhei) No. 7-203259, the Applicant suggested a technology for extracting and removing a rectangular box without inputting format information about the position or size of the rectangular box. Listings applicable to this technology are a single rectangular box, a block rectangular box (containing a single horizontal row of characters, or a free-format rectangular box), and a table having rectangular boxes with horizontal lines regularly arranged. Furthermore, the technology can process listings having no rectangular tables, listings having more complicated table structures, and listings in which dotted lines and solid lines coexist.
Described below is the outline of the process performed by the pattern extraction apparatus described in the specification and the attached drawings of the previous Japanese patent application (Tokuganhei) No. 7-203259.
First, an input image is labelled, and each partial pattern formed from pixels linked to each other in any of eight directions, that is, horizontally, vertically, and diagonally, is extracted as a linked pattern.
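The labelling step can be illustrated with a minimal sketch. It assumes the image is a list of rows holding 0 (background) and 1 (pattern pixel); the function name `label_8_connected` and the breadth-first traversal are illustrative choices, not taken from the earlier application.

```python
from collections import deque


def label_8_connected(image):
    """Label 8-connected patterns in a binary image (hypothetical helper).

    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count identifying each linked pattern.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or labels[y][x] != 0:
                continue
            # Start a new linked pattern and flood it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                # Visit the horizontal, vertical, and diagonal neighbours.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

With this sketch, two pixels that touch only at a corner receive the same label, which is what distinguishes linking in eight directions from linking in four.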
Then, a masking process is performed on each linked pattern extracted by labelling the input image. This thins the horizontal and vertical lines and reduces the difference in line thickness between a character and a rectangular box. In the masking process, the entire image of the linked pattern is scanned using two types of masks, that is, a horizontal mask and a vertical mask, and the proportion of the mask occupied by the pattern is computed. If the proportion is above a predetermined value, then the entire mask is recognized as a pattern. If it is equal to or below the predetermined value, then the pattern in the mask is deleted. In this way, the vertical and horizontal elements are extracted.
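One possible reading of the masking process is sketched below. The mask size, the threshold, the one-pixel sliding step, and the function names are all assumptions made for illustration; the text above does not specify them. The sketch starts from an empty output, so a pixel survives only if at least one mask position covering it exceeds the threshold.

```python
def apply_horizontal_mask(image, mask_width=5, threshold=0.8):
    """Slide a 1 x mask_width mask along every row (assumed mask shape).

    A window whose proportion of pattern pixels exceeds threshold is filled
    entirely; pixels not covered by any such window are deleted.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[0] * width for _ in range(height)]
    if width == 0:
        return out
    last_start = max(width - mask_width, 0)
    for y in range(height):
        for x in range(last_start + 1):
            window = image[y][x:x + mask_width]
            if sum(window) / len(window) > threshold:
                for i in range(x, x + len(window)):
                    out[y][i] = 1
    return out


def apply_vertical_mask(image, mask_height=5, threshold=0.8):
    """Apply the same rule along columns by transposing the image."""
    transposed = [list(col) for col in zip(*image)]
    masked = apply_horizontal_mask(transposed, mask_height, threshold)
    return [list(row) for row in zip(*masked)]
```

Under these assumptions a long horizontal run with a one-pixel break is kept and the break is filled, while isolated pixels belonging to character strokes are removed from the horizontal result.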
Then, the masked pattern is divided into a plurality of pieces vertically or horizontally, and a contiguous projection value of the pattern is computed in each of the ranges divided vertically and horizontally. Based on the contiguous projection values, a line of a predetermined length, or a part of a straight line, is detected as an approximate rectangle. A contiguous projection value is obtained by adding the projection value of a target row or a target column to the projection values of the rows or columns close to it.
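The contiguous projection value defined above can be sketched as follows for rows; columns would be handled the same way on the transposed image. The parameter `neighbours` (how many rows on each side count as "close") and both function names are assumptions for illustration.

```python
def row_projection(image):
    """Ordinary projection: number of pattern pixels in each row."""
    return [sum(row) for row in image]


def contiguous_projection(projection, neighbours=1):
    """Add each row's projection value to those of its nearby rows.

    neighbours is a hypothetical parameter giving how many rows above and
    below the target row are included in the sum.
    """
    n = len(projection)
    result = []
    for i in range(n):
        lo = max(0, i - neighbours)
        hi = min(n, i + neighbours + 1)
        result.append(sum(projection[lo:hi]))
    return result


# A slightly oblique line, six pixels long, spread over two rows.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
projection = row_projection(image)           # [0, 3, 3, 0]
summed = contiguous_projection(projection)   # [3, 6, 6, 3]
```

In this example neither row alone contains the full length of the line, but the contiguous projection values of rows 1 and 2 both equal 6, so a length test applied to the summed values still finds the line.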
Next, among the lines each forming part of a rectangle obtained by the contiguous projection method, adjacent lines forming part of a rectangle are combined into a long line. Thus, the obtained lines form an approximate rectangle, and can be recognized as candidates for horizontal or vertical ruled lines of a listing.
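The combining step might look like the sketch below for horizontal pieces. It assumes each piece is given as an inclusive bounding rectangle `(x0, x1, y0, y1)`; the limits `max_gap` and `max_shift` and the greedy left-to-right strategy are illustrative assumptions, since the text does not state how adjacency is judged.

```python
def merge_segments(segments, max_gap=1, max_shift=1):
    """Combine adjacent horizontal line pieces into longer rectangles.

    segments: list of (x0, x1, y0, y1) inclusive bounding rectangles.
    max_gap: largest horizontal gap, in pixels, still treated as adjacent.
    max_shift: largest vertical offset still treated as the same line.
    """
    merged = []
    for x0, x1, y0, y1 in sorted(segments):
        for i, (mx0, mx1, my0, my1) in enumerate(merged):
            gap = x0 - mx1 - 1  # negative when the rectangles overlap in x
            near_in_x = gap <= max_gap
            near_in_y = y0 <= my1 + max_shift and y1 >= my0 - max_shift
            if near_in_x and near_in_y:
                # Grow the existing rectangle to cover the new piece.
                merged[i] = (mx0, max(mx1, x1), min(my0, y0), max(my1, y1))
                break
        else:
            merged.append((x0, x1, y0, y1))
    return merged
```

Each merged rectangle then serves as one candidate for a horizontal ruled line; vertical pieces could be handled by swapping the roles of x and y.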
Then, the horizontal or vertical lines recognized as candidates for ruled lines are searched to detect the left and right margins for the horizontal lines, and the upper and lower margins for the vertical lines.
Next, small patterns arranged at predetermined intervals are detected to extract dotted lines and obtain an approximate rectangle using the dotted lines as in the above described lines.
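A rough sketch of the dotted-line test is given below. It assumes the caller passes the bounding rectangles of the linked patterns found in one horizontal band, and it treats "small" and "predetermined intervals" through the hypothetical parameters `max_size`, `tolerance`, and `min_dots`; none of these values come from the source.

```python
def detect_dotted_line(boxes, max_size=3, tolerance=1, min_dots=3):
    """Return the rectangle spanned by a horizontal dotted line, or None.

    boxes: (x0, x1, y0, y1) inclusive bounding rectangles of linked patterns.
    A dotted line is assumed to be at least min_dots small patterns whose
    left edges are evenly spaced and whose top edges are nearly level.
    """
    small = sorted(
        b for b in boxes
        if b[1] - b[0] + 1 <= max_size and b[3] - b[2] + 1 <= max_size
    )
    if len(small) < min_dots:
        return None
    # Distance between the left edges of consecutive small patterns.
    pitches = [b[0] - a[0] for a, b in zip(small, small[1:])]
    if max(pitches) - min(pitches) > tolerance:
        return None
    tops = [b[2] for b in small]
    if max(tops) - min(tops) > tolerance:
        return None
    return (small[0][0], small[-1][1],
            min(tops), max(b[3] for b in small))
```

The returned rectangle can then be treated like the approximate rectangle of a solid line in the later steps.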
A set of two horizontal lines forming part of a rectangular box is determined from among the horizontal lines detected in the above described process. Two horizontal lines are sequentially extracted from the top. When the two extracted horizontal lines have the same length or the lower horizontal line is longer than the upper horizontal line, the two horizontal lines are recognized as a set of horizontal lines. Unless the two extracted horizontal lines have the same length or the lower horizontal line is longer than the upper horizontal line, the two lines are recognized as a set even if the lower line is shorter.
Then, from among the vertical lines detected in the above described process, those whose upper and lower ends both reach the above described set of two horizontal ruled lines are determined to be vertical ruled lines.
Then, the range of a rectangle encompassed by the above described set of two horizontal lines and the two vertical ruled lines both upper and lower ends of which reach the set of the two horizontal lines is extracted as a cell. A line forming part of the cell is recognized as a ruled line. A line not forming part of the cell is recognized as a pattern other than a ruled line.
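The selection of vertical ruled lines and the extraction of cells can be sketched as follows. The tuple layouts, the `tolerance` parameter, and the function name `extract_cells` are assumptions for illustration; a vertical line is taken to "reach" the set of horizontal lines when it spans at least from the upper line to the lower line within the tolerance.

```python
def extract_cells(top, bottom, verticals, tolerance=1):
    """Pick vertical ruled lines for one set of horizontal lines and list cells.

    top, bottom: (y, x0, x1) for the upper and lower horizontal lines.
    verticals: list of (x, y0, y1) vertical line candidates.
    Returns (ruled, cells); each cell is (x_left, y_top, x_right, y_bottom).
    """
    top_y, top_x0, top_x1 = top
    bottom_y, bottom_x0, bottom_x1 = bottom
    # Horizontal extent shared by both lines of the set.
    left = max(top_x0, bottom_x0)
    right = min(top_x1, bottom_x1)
    ruled = sorted(
        v for v in verticals
        if v[1] <= top_y + tolerance          # upper end reaches the top line
        and v[2] >= bottom_y - tolerance      # lower end reaches the bottom line
        and left - tolerance <= v[0] <= right + tolerance
    )
    # Each pair of neighbouring vertical ruled lines bounds one cell.
    cells = [(a[0], top_y, b[0], bottom_y) for a, b in zip(ruled, ruled[1:])]
    return ruled, cells
```

A candidate such as `(15, 3, 7)` between lines at y = 0 and y = 10 reaches neither horizontal line, so under this sketch it forms no cell and would be treated as a pattern other than a ruled line, for example a character stroke.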
When the rectangle encompassed by the horizontal and vertical ruled lines determined in the above described process is further divided into smaller rectangular areas, the rectangle is newly defined as a table. By repeating the above described process, the rectangular areas are divided into still smaller rectangles.
Thus, according to the conventional technology, any table formed by rectangular areas can be processed regardless of whether the structure of its rectangular boxes is regular or irregular. The process can also handle both solid lines and dotted lines as ruled lines.
However, the above described pattern extraction apparatus selects as a candidate for a ruled line an area having a high density of pixels. If characters are close to each other or touch each other, the density of the pixels becomes high around the characters, and the character area can be regarded as a candidate for a ruled line.
For example, in FIG. 1A, when a character string 201 is entered in a listing 200, the density of the pixels of the pattern in a rectangular area 202 is high. Therefore, the pattern is recognized as a candidate for a ruled line although it is part of the character string 201. However, since the rectangular area 202 does not touch any of the ruled lines forming the listing 200, the rectangular area 202 cannot
