Image analysis – Pattern recognition – Classification
Reexamination Certificate
1997-06-20
2004-02-03
Au, Amelia M. (Department: 2623)
C382S180000
active
06687404
BACKGROUND OF THE INVENTION
The present invention relates generally to the field of computer-implemented methods of, and systems for, text image modeling, recognition and layout analysis, and more particularly to a method and system for training layout parameters specified in a two-dimensional (2D) image grammar that models text images. The image grammar is used in various document processing operations, including document image recognition and layout analysis operations.
1. Document Image Layout Analysis and Image Layout Models.
Document image layout analysis is a type of document image recognition operation implemented in a processor-controlled machine that automatically makes determinations about the geometric, spatial and functional relationships of the physical and logical structures of a text (or document) image. These physical and logical structures are referred to herein as “image constituents.” An image constituent as used herein is a portion of an image that is perceived by an observer to form a coherent unit in the image. Image constituents are typically represented in an image-based data structure as collections of pixels, but are generally described in terms of the perceived unit, rather than in terms of the pixels themselves. Examples of image constituents include the conventional text units of individual character or symbol images (referred to as “glyphs”), words, and text lines. Image constituents can contain other image constituents and so may also include groupings of these conventional text units into the logical, or semantic, notions of paragraphs, columns, sections, titles, footnotes, citations, headers, page numbers, and any one of a number of other logical structures to which the observer of a document may assign meaning. A glyph is typically considered to be the smallest image constituent; a group of image constituents is often called a “block,” and includes, by way of example, a word, a text line, or a group of text lines. The layout analysis process typically produces the image locations of blocks with functional labels assigned to them according to their physical features, their physical image location, or their logical meaning, as imposed by the functional requirements of a particular type of document. The location of an image constituent is typically expressed in terms of image coordinates that define a minimally-sized bounding box that includes the image constituent.
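The nesting of image constituents and their bounding-box locations can be illustrated with a small data structure. This is only a sketch: the `Constituent` class, the helper names, and the inclusive `(left, top, right, bottom)` coordinate convention are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed convention: (left, top, right, bottom) in inclusive pixel coordinates.
BBox = Tuple[int, int, int, int]


def bbox_of_pixels(pixels: List[Tuple[int, int]]) -> BBox:
    """Smallest box that contains every (x, y) pixel of a constituent."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))


def union(boxes: List[BBox]) -> BBox:
    """Smallest box that contains all of the given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


@dataclass
class Constituent:
    """Hypothetical record for one image constituent (glyph, word, line, ...)."""
    label: str
    bbox: BBox
    children: List["Constituent"] = field(default_factory=list)


def make_block(label: str, children: List[Constituent]) -> Constituent:
    """Group constituents into a block whose box is the union of the child boxes."""
    return Constituent(label, union([c.bbox for c in children]), list(children))


# Two glyphs described by a few of their pixels, grouped into a word and a line.
glyph_a = Constituent("glyph", bbox_of_pixels([(10, 5), (12, 9), (11, 7)]))
glyph_b = Constituent("glyph", bbox_of_pixels([(15, 4), (18, 9)]))
word = make_block("word", [glyph_a, glyph_b])
line = make_block("text line", [word])
print(word.label, word.bbox)
```

A block built this way is described by its label and location rather than by its pixels, which matches how the text above describes constituents in terms of the perceived unit.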
To enhance the ability to perform functional layout analysis, some document image layout systems use a priori information about the physical structure of a specific class of documents in order to accurately and efficiently identify constituents in documents that include specific types of higher-level image constituents. This a priori information is commonly referred to as a document “class layout specification,” or may be referred to as a document image model. A document class layout specification describes or models the structure of the class of documents to be analyzed and supplies information about the types, locations and other geometrical attributes of the constituents of a given class of document images.
A class layout specification may be supplied in one of two ways: (1) as an explicit data structure input to the layout analysis system, which typically allows for different types of documents to be processed according to the structural information provided by the data structure input; or (2) in the form of document description information that is implicitly built into the processing functionality of the system, on the assumption that all documents to be processed by the system are restricted to having the same structural layout specification. A class layout specification in effect “tunes” the layout analysis system to particular document structures and restricts the type of document image for which layout analysis is to be performed.
Examples of document image layout systems that make use of an explicit class layout specification are disclosed in U.S. Pat. No. 5,574,802, entitled “Method and Apparatus for Document Element Classification by Analysis of Major White Region Geometry”; in G. Story et al., “The RightPages image-based electronic library for alerting and browsing”, IEEE Computer, September 1992, pp. 17-26 (hereafter, “the Story reference”); in G. Nagy et al., “A prototype document image analysis system for technical journals”, IEEE Computer, July 1992, pp. 10-22 (hereafter, “the Nagy reference”); in A. Dengel, “ANASTASIL: a system for low-level and high-level geometric analysis of printed documents”, in H. Baird, H. Bunke and K. Yamamoto, Structured Document Image Analysis, Berlin: Springer-Verlag, 1992; in J. Higashino, H. Fujisawa, Y. Nakano, and M. Ejiri, “A knowledge-based segmentation method for document understanding”, Proceedings of the 8th International Conference on Pattern Recognition (ICPR), Paris, France, 1986, pp. 745-748; and in L. Spitz, “Style directed document recognition”, First Intl. Conf. on Doc. Analysis and Recognition (ICDAR), Saint Malo, France, September 1991, pp. 611-619.
U.S. Pat. No. 5,574,802 discloses a system for logically identifying document elements in a document image using structural models; the system includes a geometric relationship comparator for comparing geometric relationships in a document to the geometric relationships in a structural model to determine which one of a set of structural models of document images matches a given input document image. A logical tag assigning system then assigns logical tags to the document elements in the image based on the matching structural model. If the document elements are treated as nodes and the spatial relationships between the document elements are treated as links between the nodes, the document elements and relationships of a structural model form a graph data structure. Structural models are preferably predefined and prestored, and may be created by an end user, using a specific structural model definition support system, based on observation of model documents which best represent the type of document to be represented by a particular structural model. U.S. Pat. No. 5,574,802 discloses further that during creation of the structural model, the end-user may be prompted to designate rectangles for the document elements contained in sample document images, and the structural model definition support system then measures the distances between the designated rectangles for each of the major geometric relationships (i.e., either an “above-below” or “right-left” relationship) and stores these measurements.
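The graph view of a structural model described above, with document elements as nodes and measured geometric relationships as links, can be sketched as follows. The rule used here to decide which relationship holds between two rectangles, the edge-to-edge distance measure, and all names (`relate`, `build_model`, the element names) are assumptions for illustration; U.S. Pat. No. 5,574,802 may define them differently.

```python
from typing import Dict, Optional, Tuple

# Assumed convention: (left, top, right, bottom), with y increasing downward.
Rect = Tuple[int, int, int, int]
Link = Tuple[str, int]  # (relationship name, measured distance in pixels)


def relate(a: Rect, b: Rect) -> Optional[Link]:
    """Return the link from a to b, or None if neither relationship applies.

    "above-below": a and b overlap horizontally and a ends above b's top edge.
    "right-left":  a and b overlap vertically and a ends left of b's left edge.
    The distance is the gap between the facing edges.
    """
    h_overlap = min(a[2], b[2]) - max(a[0], b[0])
    v_overlap = min(a[3], b[3]) - max(a[1], b[1])
    if h_overlap > 0 and a[3] <= b[1]:
        return ("above-below", b[1] - a[3])
    if v_overlap > 0 and a[2] <= b[0]:
        return ("right-left", b[0] - a[2])
    return None


def build_model(elements: Dict[str, Rect]) -> Dict[Tuple[str, str], Link]:
    """Treat elements as nodes and store a measured link for each related pair."""
    links: Dict[Tuple[str, str], Link] = {}
    for name_a, rect_a in elements.items():
        for name_b, rect_b in elements.items():
            if name_a == name_b:
                continue
            link = relate(rect_a, rect_b)
            if link is not None:
                links[(name_a, name_b)] = link
    return links


# Rectangles a user might designate on a sample page (made-up coordinates).
elements = {
    "title": (50, 20, 550, 60),
    "author": (50, 80, 550, 100),
    "left_column": (50, 130, 290, 700),
    "right_column": (310, 130, 550, 700),
}
links = build_model(elements)
for pair, link in sorted(links.items()):
    print(pair, link)
```

Matching an input page against such a model would then amount to comparing the links measured on the page with the stored ones, which is the role the text assigns to the geometric relationship comparator.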
The Story reference discloses the use of explicitly-defined “partial order grammars” (“pogs”) to guide labeling of rectangular blocks that are extracted from journal table of contents page images. During pogs parsing of a page image in the RightPages system, each rectangular block identified and extracted is considered a terminal symbol and two relationships between blocks are defined: a left-right relationship and an above-below relationship. The grammar groups the rectangles into the image constituents.
The Nagy reference discloses a document image analysis system, called the “Gobbledoc” system, that uses an explicit document class layout specification in the form of a collection of publication-specific document grammars that are formal descriptions of all legal page formats that articles in a given technical journal can assume. The document grammar guides a segmentation and labeling process; applying the entire grammar to an input page image subdivides the image into a collection of nested rectangular blocks. The subdivision is represented in a data structure called the X-Y tree. The rectangular regions are labeled with logical categories such as abstract, title-block, byline-block, reference-entry and figure-caption.
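A subdivision of a page into nested rectangular blocks can be illustrated with a recursive projection-profile cut, a common way to build an X-Y tree. The sketch below is not the Gobbledoc procedure: it uses no document grammar and assigns no labels, and the function names, the dictionary node layout, and the half-open `(top, bottom, left, right)` box convention are assumptions.

```python
def runs(profile):
    """Return half-open [start, end) index ranges where the profile is non-zero."""
    out, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out


def xy_cut(img, top, bottom, left, right, horizontal=True, tried_other=False):
    """Recursively split a binary image region at blank rows or blank columns.

    img is a list of rows of 0/1 pixels. Boxes are (top, bottom, left, right),
    half-open. Cut directions alternate from one tree level to the next.
    """
    if horizontal:
        # Ink count per row: blank rows separate blocks stacked vertically.
        profile = [sum(img[y][left:right]) for y in range(top, bottom)]
        parts = [(top + s, top + e, left, right) for s, e in runs(profile)]
    else:
        # Ink count per column: blank columns separate side-by-side blocks.
        profile = [sum(img[y][x] for y in range(top, bottom))
                   for x in range(left, right)]
        parts = [(top, bottom, left + s, left + e) for s, e in runs(profile)]

    box = (top, bottom, left, right)
    if parts == [box]:
        if tried_other:
            return {"box": box, "children": []}  # no cut either way: a leaf
        return xy_cut(img, top, bottom, left, right, not horizontal, True)

    children = [xy_cut(img, t, b, l, r, not horizontal) for t, b, l, r in parts]
    return {"box": box, "children": children}


def leaves(node):
    """Collect the boxes of the leaf blocks, in tree order."""
    if not node["children"]:
        return [node["box"]]
    return [box for child in node["children"] for box in leaves(child)]


# Tiny page: a full-width line on top, then two columns separated by a blank column.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
tree = xy_cut(page, 0, len(page), 0, len(page[0]))
print(leaves(tree))
```

In a grammar-driven system such as the one the Nagy reference describes, the choice of cuts and the labels attached to the resulting blocks would come from the document grammar rather than from whitespace alone.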
Many existing systems rely on a two-part process to perform document image layout analysis. A first phase that performs feature analysis or extraction, com
Arnon Dennis S.
Chou Philip A.
Hull Jesse
Kopec Gary E.
Au Amelia M.
Miller Martin
Xerox Corporation