Image analysis – Image segmentation – Distinguishing text from other regions
Reexamination Certificate
1998-01-05
2001-01-09
Mehta, Bhavesh (Department: 2721)
C382S175000, C382S180000, C382S171000
active
06173073
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to page segmentation systems for classifying regions of a document image. More particularly, the present invention relates to a block selection system for identifying and defining features within table images.
2. Incorporation by Reference
Commonly-assigned U.S. application Ser. No. 07/873,012, now U.S. Pat. No. 5,680,479, entitled “Method and Apparatus For Character Recognition”, Ser. No. 08/171,720, now U.S. Pat. No. 5,588,072, entitled “Method and Apparatus For Selecting Text And/Or Non-Text Blocks In A Stored Document”, Ser. No. 08/338,781, now U.S. Pat. No. 5,987,171, entitled “Page Analysis System”, Ser. No. 08/514,252, now U.S. Pat. No. 5,848,186, entitled “Feature Extraction System”, and Ser. No. 08/664,675, entitled “System For Extracting Attached Text”, are herein incorporated as if set forth in full.
3. Description of the Related Art
Conventional page segmentation systems are applied to document images in order to identify data types contained within specific regions of the document images. This information can be used to extract data within a specific region and to determine a type of processing to be applied to the extracted data.
For a document containing a table image, a region of text, or table cell, located within the table image can be converted to ASCII characters using optical character recognition (OCR) processing and stored in an ASCII file along with information corresponding to the location of the table cell. However, conventional systems cannot accurately determine a row and column address corresponding to the table cell. Accordingly, the recognized ASCII characters cannot be reliably input to a spreadsheet based on row and column address data.
In addition, the data produced by conventional systems is often insufficient to adequately recreate the internal features of a bit-mapped table image. For example, the data does not reflect vertical and horizontal grid lines within an analyzed table image. As defined herein, vertical and horizontal grid lines define each row and column within a table, and can be either visible or non-visible. Therefore, although a conventional system can be used to create an ASCII version of a bit-mapped table, the ASCII version does not include data representative of table grid lines. Accordingly, the stored data cannot be used to accurately recreate a bit-mapped version of grid lines within the table. Moreover, in a case that it is desired to edit text within a table cell, it is difficult to determine, based on information provided by conventional systems, whether the edited text will intersect with a grid line and thereby violate row/column boundaries.
Consequently, what is needed is a system for accurately identifying and representing internal features of a bit-mapped table image, such as rows, columns, and table grid lines.
SUMMARY OF THE INVENTION
The present invention addresses the foregoing by identifying super-cells within a bit-mapped table image. Super-cells are areas of a table image bounded by visible table grid lines and which include one or more table cells. Advantageously, the locations and dimensions of the identified super-cells can be used to reconstruct a bit-mapped image of the visible grid lines of the table image. In addition, by referring to the dimensions of a super-cell surrounding a table cell, it is possible to determine whether editing text of the table cell will cause the edited text to intersect a grid line.
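As a rough illustration of the last point, the following Python sketch tests whether the bounding box of edited text still lies inside its enclosing super-cell. The `Rect` type, the `fits_in_super_cell` helper, and the pixel coordinates are assumptions introduced for this example; they are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in pixel coordinates; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies entirely within this rectangle."""
        return (self.left <= other.left and self.top <= other.top
                and other.right <= self.right and other.bottom <= self.bottom)


def fits_in_super_cell(super_cell: Rect, edited_text: Rect) -> bool:
    """Hypothetical check: edited text must stay inside the visible grid lines."""
    return super_cell.contains(edited_text)


# Illustrative values only.
super_cell = Rect(left=10, top=10, right=210, bottom=60)
short_text = Rect(left=20, top=20, right=120, bottom=40)
long_text = Rect(left=20, top=20, right=260, bottom=40)

print(fits_in_super_cell(super_cell, short_text))  # True
print(fits_in_super_cell(super_cell, long_text))   # False
```

In this sketch, text that grows past the right edge of the super-cell is reported as crossing a grid line, so an editor could reject or reflow the edit.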
According to one aspect, the invention is a system for performing block selection on a bit-mapped image of a table comprised of table cells arranged into rows and columns, the rows and columns defined by visible and non-visible grid lines. The system identifies super-cells that include one or more table cells, wherein super-cells are identified according to traced white areas surrounding table cells and bounded by visible grid lines, determines whether vertical and horizontal grid lines bounding each table cell are visible or non-visible, and determines whether vertical and horizontal grid lines bounding each super-cell are visible or non-visible.
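One way to picture the visibility determination for a table cell is sketched below in Python. It assumes, purely for illustration, that cells are addressed by row and column index, that a super-cell covers an inclusive span of rows and columns, and that every grid line strictly inside a super-cell is non-visible. The `SuperCell` type and `edge_visibility` function are hypothetical names, and partial grid lines are not modeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SuperCell:
    """Inclusive span of rows and columns enclosed by visible grid lines."""
    first_row: int
    last_row: int
    first_col: int
    last_col: int


def edge_visibility(row: int, col: int, sc: SuperCell) -> Dict[str, bool]:
    """Report which of a cell's four bounding grid lines are visible.

    Assumption: an edge is visible only where the cell touches the
    boundary of its enclosing super-cell.
    """
    if not (sc.first_row <= row <= sc.last_row
            and sc.first_col <= col <= sc.last_col):
        raise ValueError("cell lies outside the given super-cell")
    return {
        "top": row == sc.first_row,
        "bottom": row == sc.last_row,
        "left": col == sc.first_col,
        "right": col == sc.last_col,
    }


# A super-cell holding two stacked cells in column 0 (illustrative values).
sc = SuperCell(first_row=0, last_row=1, first_col=0, last_col=0)
print(edge_visibility(1, 0, sc))
# {'top': False, 'bottom': True, 'left': True, 'right': True}
```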
By virtue of the foregoing, the present invention determines information which can be used to substantially reconstruct the internal features of a table image. Moreover, the determined information can be stored in or along with an ASCII file in order to provide an accurate representation of the entire table image.
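The reconstruction of visible grid lines from super-cell locations and dimensions can be sketched as follows. This is a minimal Python illustration, assuming each super-cell is stored as an inclusive pixel box and that grid lines are one pixel wide; the `draw_grid_lines` function and the storage format are assumptions, not details from the specification.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels, all four bounds inclusive (assumed format).
Box = Tuple[int, int, int, int]


def draw_grid_lines(width: int, height: int,
                    super_cells: List[Box]) -> List[List[int]]:
    """Rasterize the outline of every super-cell into a 0/1 bitmap."""
    bitmap = [[0] * width for _ in range(height)]
    for left, top, right, bottom in super_cells:
        for x in range(left, right + 1):      # top and bottom borders
            bitmap[top][x] = 1
            bitmap[bottom][x] = 1
        for y in range(top, bottom + 1):      # left and right borders
            bitmap[y][left] = 1
            bitmap[y][right] = 1
    return bitmap


# Two adjacent super-cells sharing the vertical line at x = 4.
super_cells = [(0, 0, 4, 2), (4, 0, 8, 2)]
lines = ["".join("#" if px else "." for px in row)
         for row in draw_grid_lines(9, 3, super_cells)]
print("\n".join(lines))
# #########
# #...#...#
# #########
```

Because only super-cell outlines are drawn, non-visible grid lines inside a super-cell leave no mark in the reconstructed bitmap.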
In another aspect, rows within a table image are identified by 1) detecting areas of reversed text within the image of the table, 2) calculating a horizontal histogram of connected components within the image of the table, the histogram not reflecting connected components within the detected areas, 3) defining rows within the image of the table according to the horizontal histogram, and 4) re-defining the rows based on locations of traced white areas and partial grid lines with respect to the defined rows.
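Steps 1) through 3) above can be sketched in Python as shown below. The sketch assumes that connected components and reversed-text areas are already available as bounding boxes, that a component is excluded when its box lies wholly inside a reversed-text area, and that a row is any maximal run of non-empty histogram bins. Step 4), the refinement using traced white areas and partial grid lines, is not implemented. All function names and sample boxes are hypothetical.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; right and bottom exclusive (assumed format).
Box = Tuple[int, int, int, int]


def inside(inner: Box, outer: Box) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def horizontal_histogram(components: List[Box], reversed_areas: List[Box],
                         height: int) -> List[int]:
    """Count, for each scan line y, the components that cover it."""
    hist = [0] * height
    for comp in components:
        if any(inside(comp, area) for area in reversed_areas):
            continue  # components in reversed-text areas are not reflected
        for y in range(max(comp[1], 0), min(comp[3], height)):
            hist[y] += 1
    return hist


def runs_to_bands(hist: List[int]) -> List[Tuple[int, int]]:
    """Turn maximal runs of non-zero bins into (start, end) bands, end exclusive."""
    bands, start = [], None
    for y, count in enumerate(hist):
        if count > 0 and start is None:
            start = y
        elif count == 0 and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(hist)))
    return bands


# Illustrative data: the last component sits inside a reversed-text area.
components = [(5, 2, 15, 8), (30, 3, 40, 9), (5, 14, 15, 20), (50, 10, 60, 13)]
reversed_areas = [(48, 9, 62, 14)]
hist = horizontal_histogram(components, reversed_areas, height=24)
rows = runs_to_bands(hist)
print(rows)  # [(2, 9), (14, 20)]
```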
As a result of the foregoing aspect, table cells can be accurately identified and input to appropriate rows of a spreadsheet.
In a related aspect, columns within the table image are identified by 1) detecting areas of reversed text within the image of the table, 2) calculating a vertical histogram of connected components within the image of the table, the histogram not reflecting connected components within the detected areas, 3) defining columns within the image of the table according to the vertical histogram, and 4) re-defining the columns based on locations of traced white areas and partial grid lines with respect to the defined columns.
By virtue of the foregoing aspects, data contained in table cells can be accurately extracted and output to a spreadsheet application. Moreover, additional cells can be easily added to the table based on existing rows and columns.
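To illustrate how identified rows and columns could feed a spreadsheet, the Python sketch below assigns each recognized cell a row and column address and writes the result as CSV. It assumes rows and columns are given as pixel bands, that a cell belongs to the band containing the center of its bounding box, and that CSV is an acceptable interchange format. The `band_index` and `to_csv` helpers and the sample data are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple

Band = Tuple[int, int]            # (start, end) in pixels, end exclusive
Box = Tuple[int, int, int, int]   # (left, top, right, bottom)


def band_index(bands: List[Band], position: float) -> Optional[int]:
    """Return the index of the band containing `position`, or None."""
    for i, (start, end) in enumerate(bands):
        if start <= position < end:
            return i
    return None


def to_csv(cells: List[Tuple[Box, str]], rows: List[Band],
           cols: List[Band]) -> str:
    """Place each cell's text at its row/column address and emit CSV."""
    grid = [["" for _ in cols] for _ in rows]
    for (left, top, right, bottom), text in cells:
        r = band_index(rows, (top + bottom) / 2)
        c = band_index(cols, (left + right) / 2)
        if r is not None and c is not None:
            grid[r][c] = text
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()


# Illustrative bands and OCR results.
rows = [(0, 10), (10, 20)]
cols = [(0, 50), (50, 100)]
cells = [((5, 2, 40, 8), "Item"), ((55, 2, 90, 8), "Price"),
         ((5, 12, 40, 18), "Pen"), ((55, 12, 90, 18), "1.20")]
print(to_csv(cells, rows, cols))
# Item,Price
# Pen,1.20
```

Adding a cell to the table then amounts to appending a band to `rows` or `cols` and supplying text for the new addresses.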
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiment thereof in connection with the attached drawings.
REFERENCES:
patent: 5048107 (1991-09-01), Tachikawa
patent: 5075895 (1991-12-01), Bessho
patent: 5101448 (1992-03-01), Kawachiya et al.
patent: 5129012 (1992-07-01), Abe
patent: 5185813 (1993-02-01), Tsujimoto
patent: 5278920 (1994-01-01), Bernzott et al.
patent: 5335290 (1994-08-01), Cullen et al.
patent: 5420695 (1995-05-01), Ohta
patent: 5448692 (1995-09-01), Ohta
patent: 5465304 (1995-11-01), Cullen et al.
patent: 5485566 (1996-01-01), Rahgozar
patent: 5588072 (1996-12-01), Wang
patent: 5661818 (1997-08-01), Gaborski et al.
patent: 5680478 (1997-10-01), Wang et al.
patent: 5680479 (1997-10-01), Wang et al.
patent: 5689342 (1997-11-01), Nakatsuka
Wang, Shin-Ywan, et al., “Block Selection: A Method for Segmenting Page Image of Various Editing Styles”, Proceedings of the Third International Conference on Document Analysis and Recognition, Aug. 14-16, 1995, (8 pages).
Canon Kabushiki Kaisha
Fitzpatrick, Cella, Harper & Scinto
Mehta Bhavesh