Image analysis – Histogram processing – For segmenting an image
Reexamination Certificate
1997-08-11
2001-05-01
Au, Amelia (Department: 2723)
C382S199000, C382S286000, C707S793000
active
06226402
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a ruled line extracting apparatus for extracting a ruled line portion from an arbitrary document image read by a photoelectric converter, etc., and method thereof.
2. Description of the Related Art
In recent years, the demand for an electronic filing system which converts a paper document into an electronic form, and stores it on an optical disc, etc., has increased, in order to improve the efficiency of operations performed within a company. With a conventional electronic filing system, a paper document is converted into an image by a photoelectric converter such as an image scanner, etc., and the image with a search keyword attached is stored on an optical disc or on a hard disk. However, since the keyword must be input from a keyboard, the input operation is troublesome.
To overcome this troublesome operation, the present applicant previously filed “Title Extracting Apparatus for Extracting Title from Document Image and Method Thereof” (U.S. patent application Ser. No. 08/694,503; Japanese patent application H7-341983). With this method, a document title included in an image is automatically extracted and registered as a keyword. Additionally, management information such as a title, destination, transmitting source, etc., can be automatically extracted from various document images, including a table format document. For example, it has been shown that a title outside a table can be extracted with approximately 90% accuracy.
A title inside a table, however, can be extracted with only 55% accuracy, which is insufficient to be put into practical use. To extract a keyword such as a title from inside a table with high accuracy, ruled lines structuring the table must be accurately extracted. The technique for extracting a ruled line has been developed mainly for a spreadsheet in which characters, etc. are regularly lined up.
As the conventional techniques for extracting a ruled line, “Image Extracting Method” (Japanese patent laid-open H6-309498) and “Image Extracting Apparatus” (Japanese patent laid-open H7-28937) can be referred to. With these techniques, a frame can be extracted or removed without requiring an input of information such as a frame position etc., in a spreadsheet. A spreadsheet which can be processed is a sheet composed of one-character frames, block frames (horizontal one-line frames, or free format frames), or a sheet having a structure in which the shape of a frame is rectangular, and horizontal frame lines are regularly arranged.
Additionally, as the techniques for extracting a ruled line according to former applications in Japan by the present applicant, “Frame Extracting Apparatus and Rectangle Extracting Apparatus” (Japanese patent application H7-203259), “Pattern Area Extracting Apparatus and Pattern Extracting Apparatus” (Japanese patent application H7-282171), and “Pattern Extracting Apparatus and Pattern Area Extracting Method” (Japanese patent application H8-107568) can be referred to.
With these techniques, a frame can be extracted/removed even if the outer periphery of frames is rectangular as shown in FIG. 1A, or not rectangular as shown in FIG. 1B. Furthermore, the frame of a table structured by a rectangle which is surrounded by a frame, and partitioned into smaller portions, can also be extracted and removed, like the shaded portion shown in FIG. 1B. Provided below is the outline of this process.
(1) thinning: With a mask process, horizontal and vertical segments are made thinner, and the difference between the thickness of a character and that of a frame is eliminated.
(2) segment extraction: a relatively long straight line is extracted with the adjacency projection method according to the “Image Extracting Method” (Japanese patent laid-open H6-309498). The adjacency projection method is a method for recognizing the result of adding the projection value of pixels included in rows or columns around a specific row or column, to the projection value of pixels in the specific row or column, as the final projection value of the specific row or column. With this method, pixel distribution around a particular row or column can be globally identified.
(3) straight line extraction: extracted segments are sequentially searched, and it is examined whether or not there is an empty space of a predetermined length between segments. If there is no such empty space, the segments are sequentially linked, so that a long straight line is extracted.
(4) straight line integration: extracted straight lines are again integrated. Straight lines separated into two or more portions due to a blur are integrated into one straight line.
(5) straight line extension: a straight line which is made shorter due to a blur is extended, and restored to its original length, only when a spreadsheet is proved to be regular.
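Steps (2) and (3) above can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows of 0/1 pixels and horizontal segments given as inclusive (start, end) column pairs. The function names, the `width` parameter and the `max_gap` parameter are hypothetical, chosen for illustration only, and are not taken from the cited applications.

```python
def adjacent_projection(image, width=1):
    """Row-wise adjacency projection of a binary image (rows of 0/1 pixels).

    The plain projection of a row is its number of black pixels. The
    adjacency projection of row i adds the plain projections of the
    `width` rows above and below row i (assumed neighborhood size).
    """
    plain = [sum(row) for row in image]
    n = len(plain)
    return [
        sum(plain[max(0, i - width):min(n, i + width + 1)])
        for i in range(n)
    ]


def link_segments(segments, max_gap):
    """Link segments on the same row unless an empty space of at least
    `max_gap` pixels lies between them.

    Each segment is an inclusive (start, end) column pair.
    """
    linked = []
    for start, end in sorted(segments):
        # Number of empty pixels between the previous segment and this one.
        if linked and start - linked[-1][1] - 1 < max_gap:
            prev_start, prev_end = linked[-1]
            linked[-1] = (prev_start, max(prev_end, end))
        else:
            linked.append((start, end))
    return linked
```

Because the projection of a row includes its neighboring rows, a line whose pixels are spread over a few adjacent rows can still produce a clear peak, which is one way to read the statement that the pixel distribution around a row is identified globally.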
However, the above described techniques have the following problems.
According to the techniques disclosed in the former applications, whether the shape of a frame of a spreadsheet is regular or irregular, it can be processed as long as it is a table frame composed of rectangular regions. Whether a ruled line to be targeted is a solid or dotted line, it can be processed regardless of the existence of a blur. Furthermore, a straight line which is made shorter due to an extreme blur is extended only when a table is proved to be regular.
A normal input image may sometimes include characters of a thick font, or a shaded portion in a table, as shown in FIG. 1C. In such a case, a ruled line is erroneously extracted from a defaced character string in which characters touch one another, and ruled lines which are erroneously extracted may sometimes be integrated with correct ruled lines.
Additionally, a ruled line which touches a group of black pixels such as a shaded portion, or a ruled line which touches a character, cannot be extracted. To overcome these problems, it is desirable that the process target be a table document, such as a spreadsheet, whose ruled-line structure is known beforehand.
However, since it is not known beforehand what type of table a normal document handled by electronic filing includes, there is a high probability that various images, including images with defaced characters, etc., are input. Accordingly, a ruled line is not necessarily extracted correctly by the techniques of the former applications as they are.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a ruled line extracting apparatus and method thereof, which allow a ruled line portion to be extracted from a normal document image whose ruled-line structure cannot be predicted.
The ruled-line extracting apparatus according to the present invention comprises an estimating unit, storing unit, segment extracting unit, calculating unit, straight line extracting unit, graph generating unit, straight line processing unit, straight line integrating unit and a straight line deleting unit.
In a first aspect of the present invention, the estimating unit estimates the size of a standard pattern included in an input image; and the straight line extracting unit sets a threshold value based on the information about the size of the standard pattern, and extracts the information of one or more straight line patterns from the input image using the threshold value.
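The first aspect can be sketched as follows. The text here does not specify how the size of the standard pattern is estimated, so this sketch assumes it is the most frequent height among connected-component bounding boxes, and assumes the threshold is a fixed multiple of that size. The function names and the `factor` parameter are hypothetical.

```python
from collections import Counter


def estimate_standard_size(heights):
    """Assumed estimator: the most frequent pattern height (smallest on ties)."""
    counts = Counter(heights)
    best = max(counts.values())
    return min(h for h, c in counts.items() if c == best)


def extract_long_runs(row, min_length):
    """Return inclusive (start, end) pairs of black-pixel runs in one row
    that are at least `min_length` pixels long."""
    runs = []
    start = None
    # A trailing white pixel closes a run that reaches the end of the row.
    for x, pixel in enumerate(list(row) + [0]):
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            if x - start >= min_length:
                runs.append((start, x - 1))
            start = None
    return runs


def extract_lines(row, heights, factor=2.0):
    """Set the length threshold from the estimated standard pattern size,
    then keep only runs that are long relative to that size."""
    threshold = factor * estimate_standard_size(heights)
    return extract_long_runs(row, threshold)
```

Tying the threshold to the estimated pattern size, rather than to a fixed pixel count, lets the same procedure adapt to documents with different character sizes or scanning resolutions.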
In a second aspect of the present invention, the straight line extracting unit extracts the information about one or more straight line patterns from an input image; the calculating unit obtains a representative value of the sizes of the one or more straight line patterns; and the straight line processing unit sets a threshold value based on the representative value, and processes the information of the one or more straight line patterns using the threshold value.
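A minimal sketch of the second aspect follows. It assumes that the representative value is the median thickness of the extracted straight line patterns, and that the processing consists of discarding patterns much thicker than that value. Both choices, the dictionary layout of a line, and the `factor` parameter are assumptions for illustration; the text above does not fix them.

```python
from statistics import median


def filter_by_representative_thickness(lines, factor=2.0):
    """Keep straight line patterns whose thickness does not exceed
    `factor` times the median thickness of all extracted patterns.

    Each line is a dict with at least a 'thickness' entry (assumed layout).
    """
    if not lines:
        return []
    representative = median(line["thickness"] for line in lines)
    threshold = factor * representative
    return [line for line in lines if line["thickness"] <= threshold]
```

Under these assumptions, a pattern extracted by mistake from a shaded portion or from thick touching characters would tend to be much thicker than the genuine ruled lines and would therefore be dropped.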
In a third aspect of the present invention, the straight line extracting unit extracts the information of one or more straight line patterns from an input image; the calculating unit obtains a representative value of the sizes of one or
Au Amelia
Fujitsu Limited
Johnson Timothy M.
Staas & Halsey , LLP