Reexamination Certificate
1998-12-30
2004-03-23
Johnson, Timothy M. (Department: 2625)
Image analysis
Pattern recognition
Feature extraction
C382S176000
active
06711292
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to page segmentation systems for classifying data within specific regions of a document image. In particular, the present invention relates to a block selection system for identifying table images in a document image and for identifying features within the table images.
2. Incorporation by Reference
Commonly-assigned U.S. patent applications Ser. No. 07/873,012, now U.S. Pat. No. 5,680,479, entitled “Method and Apparatus For Character Recognition”, Ser. No. 08/171,720, now U.S. Pat. No. 5,588,072, entitled “Method and Apparatus For Selecting Text And/Or Non-Text Blocks In A Stored Document”, Ser. No. 08/338,781, entitled “Page Analysis System”, Ser. No. 08/514,250, now U.S. Pat. No. 5,774,579, entitled “Block Selection System In Which Overlapping Blocks Are Decomposed”, Ser. No. 08/514,252, now U.S. Pat. No. 5,848,186, entitled “Feature Extraction System”, Ser. No. 08/664,675, entitled “System For Extracting Attached Text”, and Ser. No. 09/002,684, entitled “System For Analyzing Table Images,” are herein incorporated as if set forth in full.
3. Description of the Related Art
A conventional page segmentation system can be applied to a document image in order to identify data types contained within specific regions of the document image. The identified types can then be used to extract data of a particular type from a specific region of the document image and to determine a processing method to be applied to the extracted data.
For example, using conventional systems, data identified as text data is extracted from a specific region of a document and subjected to optical character recognition (OCR) processing. Results of the OCR processing are stored in ASCII code along with information regarding the location of the specific region. Such storage facilitates word processing of the text data as well as subsequent reconstruction of the document. In addition, conventional systems can be used to extract data identified as graphics data, subject the extracted data to image compression, and store the compressed data along with location information. In sum, conventional page segmentation systems allow automatic conversion of bit-mapped image data of a document to an appropriate format, such as ASCII, JPEG, or the like, and also allow substantial reconstruction of the bit-mapped image.
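The type-dependent conversion described above can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `Block` and `StoredBlock` records, the `convert` function, and the `ocr` callable standing in for a real recognition engine. The standard-library `zlib` module is used as a stand-in for an image codec such as JPEG.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Tuple

BBox = Tuple[int, int, int, int]  # left, top, right, bottom (assumed convention)


@dataclass
class Block:
    kind: str      # "text" or "graphics", as assigned by page segmentation
    bbox: BBox     # location of the region in the document image
    pixels: bytes  # raw bit-mapped data of the region


@dataclass
class StoredBlock:
    kind: str
    bbox: BBox     # kept so the page can later be reconstructed
    fmt: str
    payload: bytes


def convert(block: Block, ocr: Callable[[bytes], str]) -> StoredBlock:
    """Choose a processing method from the block's identified data type."""
    if block.kind == "text":
        # Text regions go through OCR and are stored as ASCII.
        text = ocr(block.pixels)
        return StoredBlock("text", block.bbox, "ascii",
                           text.encode("ascii", errors="replace"))
    if block.kind == "graphics":
        # Graphics regions are compressed (zlib here, in place of JPEG).
        return StoredBlock("graphics", block.bbox, "zlib",
                           zlib.compress(block.pixels))
    raise ValueError(f"unsupported block type: {block.kind}")


# Usage with a fake OCR engine that always returns the same string.
text_out = convert(Block("text", (10, 10, 200, 40), b"\x00\xff"), lambda _: "Hello")
gfx_in = Block("graphics", (10, 60, 200, 300), b"\x01" * 64)
gfx_out = convert(gfx_in, lambda _: "")
```

Storing the bounding box alongside each payload is what allows the original layout to be rebuilt from the converted data.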
One specialized example of such page segmentation concerns table images within a document. Once a table image is identified, processing such as that described in above-mentioned U.S. Pat. No. 5,848,186 or U.S. patent application Ser. No. 09/002,684 can be used to identify rows and columns within the table, to extract text data within individual table cells defined by the rows and columns, and to subject the extracted text data to OCR processing. As a result, table image data located within a document image can be automatically input to a spreadsheet application in proper row/column format.
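As a hedged illustration of the final step, the sketch below writes recognized cell text to CSV in row/column order so that a spreadsheet application can load it. The `cells_to_csv` function and the `(row, column)` keyed dictionary are assumptions made for this example and are not taken from the referenced patents.

```python
import csv
import io
from typing import Dict, Tuple


def cells_to_csv(cells: Dict[Tuple[int, int], str]) -> str:
    """Serialize OCR'd cell text, keyed by (row, column) address, as CSV.

    Assumes `cells` is non-empty. Addresses with no recognized text
    are written as empty fields so the grid stays rectangular.
    """
    n_rows = max(r for r, _ in cells) + 1
    n_cols = max(c for _, c in cells) + 1
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for r in range(n_rows):
        writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
    return buf.getvalue()


full = cells_to_csv({(0, 0): "Item", (0, 1): "Qty", (1, 0): "Bolt", (1, 1): "4"})
sparse = cells_to_csv({(0, 0): "A", (1, 1): "B"})
```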
The above-described systems are designed to recognize a standard-format table image having a solid frame and solid horizontal and vertical lines defining rows and columns within the table image. Accordingly, in a case that a table image contains broken or dotted grid lines, or contains no grid lines at all, the above systems are not likely to identify the image as a table. Rather, the table is likely to be classified as a region of text or a line drawing. Consequently, row/column information is not determined, nor are individual cells within the table associated with row/column addresses.
SUMMARY OF THE INVENTION
The present invention addresses the foregoing by providing identification of a table image in a document in which grid lines of the table image are broken, dotted, or otherwise incomplete. An additional aspect of the present invention provides output of text block coordinates and coordinates of areas roughly corresponding to individual table cells within the identified table. Advantageously, such information can be input to a table feature identification system to identify table columns, rows, or other features.
In one specific aspect, the invention is a system for identifying a table image in a document image which includes identification of a frame image in the document image, identification of white areas within the frame image, identification of broken lines within the frame image, calculation of horizontal and vertical grid lines based on the identified white areas and the identified broken lines, and determination of whether the frame is a table image based on the calculated horizontal and vertical grid lines. Beneficially, the identified table image can then be subjected to table-specific processing.
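The sequence of steps in this aspect can be sketched on a toy binary image. The sketch below is not the method of the detailed description: it substitutes simple projection profiles for the identification steps, and the names `find_grid_lines`, `separator_centres`, `is_table`, the `line_ratio` threshold, and the rule of requiring at least one grid line in each direction are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]            # 1 = black pixel, 0 = white pixel
Frame = Tuple[int, int, int, int]  # top, left, bottom, right of the frame border


def separator_centres(is_separator: List[bool]) -> List[int]:
    """Centre index of each run of separator rows (or columns).

    Runs touching either end are skipped: they are the margin next to
    the frame, not a boundary between cells.
    """
    centres, start = [], None
    for i, flag in enumerate(is_separator + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if start > 0 and i < len(is_separator):
                centres.append((start + i - 1) // 2)
            start = None
    return centres


def find_grid_lines(img: Image, frame: Frame,
                    line_ratio: float = 0.5) -> Tuple[List[int], List[int]]:
    """Return (horizontal, vertical) grid-line positions inside the frame."""
    top, left, bottom, right = frame
    inner = [row[left + 1:right] for row in img[top + 1:bottom]]
    h, w = len(inner), len(inner[0])

    # Broken or dotted lines: rows/columns with a high black-pixel fraction.
    line_rows = [sum(row) / w >= line_ratio for row in inner]
    line_cols = [sum(row[x] for row in inner) / h >= line_ratio for x in range(w)]

    # White areas: rows/columns with no black pixels once line pixels are ignored.
    white_rows = [not any(inner[y][x] for x in range(w) if not line_cols[x])
                  for y in range(h)]
    white_cols = [not any(inner[y][x] for y in range(h) if not line_rows[y])
                  for x in range(w)]

    # A grid line is placed at the centre of each run of line or white rows/columns.
    horizontal = separator_centres([a or b for a, b in zip(line_rows, white_rows)])
    vertical = separator_centres([a or b for a, b in zip(line_cols, white_cols)])
    return horizontal, vertical


def is_table(horizontal: List[int], vertical: List[int]) -> bool:
    # Assumed rule: at least one interior grid line in each direction.
    return bool(horizontal) and bool(vertical)


# A framed region with four text blobs, a dotted horizontal rule,
# and a white vertical gap between the two columns.
rows = [
    "11111111111",
    "10000000001",
    "10110001101",
    "10000000001",
    "11010101011",
    "10000000001",
    "10110001101",
    "10000000001",
    "11111111111",
]
img = [[int(c) for c in r] for r in rows]
horizontal, vertical = find_grid_lines(img, (0, 0, 8, 10))
```

Positions are reported relative to the frame interior. In this toy image the dotted rule and the white rows on either side of it merge into a single horizontal grid line, and the white gap alone yields the vertical one, so the frame is accepted as a table even though it has no solid interior lines.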
As described above, conventional page segmentation systems often misidentify a table image which does not contain a full set of horizontal and vertical grid lines. The present invention can also be utilized in such cases to properly identify and process the table image. According to this aspect, the present invention relates to a system for processing a region as a table image in a block selection system for identifying regions of a document image. The invention includes acceptance of user input indicating that a region of a document image is a table image, identification of white areas within the region, identification of broken lines within the region, and calculation of horizontal and vertical grid lines based on the identified white areas and the identified broken lines. As a result of the foregoing features, table information is obtained corresponding to the region, and can be used to further analyze the region for table features such as rows, columns or the like.
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiments thereof in connection with the attached drawings.
REFERENCES:
patent: 4953108 (1990-08-01), Kato et al.
patent: 5048107 (1991-09-01), Tachikawa
patent: 5075895 (1991-12-01), Bessho
patent: 5101448 (1992-03-01), Kawachiya et al.
patent: 5129012 (1992-07-01), Abe
patent: 5185813 (1993-02-01), Tsujimoto
patent: 5278920 (1994-01-01), Bernzott et al.
patent: 5287417 (1994-02-01), Eller et al.
patent: 5335290 (1994-08-01), Cullen et al.
patent: 5341227 (1994-08-01), Kumashiro
patent: 5420695 (1995-05-01), Ohta
patent: 5448692 (1995-09-01), Ohta
patent: 5465304 (1995-11-01), Cullen et al.
patent: 5485566 (1996-01-01), Rahgozar
patent: 5587808 (1996-12-01), Hagihara et al.
patent: 5588072 (1996-12-01), Wang
patent: 5617485 (1997-04-01), Ohuchi et al.
patent: 5661818 (1997-08-01), Gaborski et al.
patent: 5680478 (1997-10-01), Wang et al.
patent: 5680479 (1997-10-01), Wang et al.
patent: 5689342 (1997-11-01), Nakatsuka
patent: 5729627 (1998-03-01), Mizuno et al.
patent: 5745596 (1998-04-01), Jefferson
patent: 5754708 (1998-05-01), Hayashi et al.
patent: 5771313 (1998-06-01), Hayashi et al.
patent: 5774579 (1998-06-01), Wang et al.
patent: 5822454 (1998-10-01), Rangarajan
patent: 6006240 (1999-12-01), Handley
patent: 6044383 (2000-03-01), Suzuki et al.
patent: 6081616 (2000-06-01), Vaezi et al.
Katsuhiko Itonori; “Table Structure Recognition based on Textblock Arrangement and Ruled Line Position”; IEEE; Proceedings of the Second International Conference; pp. 765-768, Jul. 1993.*
Yuki Hirayama; “A Method for Table Structure Analysis Using DP Matching”; IEEE; Proceedings of the Third International Conference; vol. 2; pp. 583-586, Sep. 1995.
Canon Kabushiki Kaisha
Fitzpatrick, Cella, Harper & Scinto
Johnson Timothy M.