Document image decoding using text line column-based... (Reexamination Certificate)
Patent number: 06738518
Filed: 2000-05-12
Issued: 2004-05-18
Examiner: Dastouri, Mehrdad (Department: 2623)
Classification: Image analysis – Pattern recognition – Template matching
U.S. Classes: C382S228000, C382S194000
Status: active
BACKGROUND OF THE INVENTION
The present invention relates generally to image decoding and image recognition techniques, and specifically to image decoding and recognition techniques using stochastic finite state networks such as Markov sources. In particular, the present invention provides a technique for producing heuristic scores for use by a dynamic programming operation in the decoding of text line images.
Automatic speech recognition systems based on stochastic grammar frameworks such as finite state Markov models are known. Examples are described in U.S. Pat. No. 5,199,077 entitled “Wordspotting For Voice Editing And Indexing”, and in reference [2], both of which use hidden Markov models (HMMs). Bracketed numerals identify referenced publications listed in the Appendix of Referenced Documents.
Stochastic grammars have also been applied to document image recognition problems and to text recognition in particular. See, for example, the work of Bose and Kuo, identified in reference [1], and the work of Chen and Wilcox in reference [2], both of which use hidden Markov models (HMMs) for word or text line recognition. See also U.S. Pat. No. 5,020,112, issued to P. A. Chou and entitled “Image Recognition Using Two-Dimensional Stochastic Grammars.”
U.S. Pat. No. 5,321,773, issued to Kopec and Chou, discloses a document recognition technique known as Document Image Decoding (hereafter, DID) that is based on classical communication theory. This work is further discussed in references [2], [4] and [5]. The DID model 800, illustrated in FIG. 28, includes a stochastic message source 810, an imager 811, a channel 812 and a decoder 813. The stochastic message source 810 selects a finite string M from a set of candidate strings according to a prior probability distribution. The imager 811 converts the message into an ideal binary image Q. The channel model 812 maps the ideal image into an observed image Z by introducing distortions due to printing and scanning, such as skew, blur and additive noise. Finally, the decoder 813 receives observed image Z and produces an estimate M̂ of the original message according to a maximum a posteriori (MAP) decision criterion. Note that in the context of DID, the estimate M̂ of the original message is often referred to as the transcription of observed image Z.
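The MAP decision rule can be made concrete with a minimal sketch (the function names and the toy probability interfaces below are illustrative assumptions, not part of the patent): the decoder scores each candidate message M by log P(M) + log P(Z | M) and returns the highest-scoring candidate.

import math

def map_decode(candidates, prior, likelihood, observed_image):
    # Toy MAP decoder: return the message M that maximizes
    # log P(M) + log P(Z | M) over a finite candidate set.
    # In DID, prior(M) would come from the Markov source and
    # likelihood(Z, M) from the imager and channel models.
    best_message, best_score = None, -math.inf
    for message in candidates:
        score = math.log(prior(message)) + math.log(likelihood(observed_image, message))
        if score > best_score:
            best_message, best_score = message, score
    return best_message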
The structure of the message source and imager is captured formally by combining their functions into a single composite image source 815, as shown by the dotted lines in FIG. 28. Image source 815 models image generation using a Markov source. A Markov source is a stochastic finite-state automaton that describes, as a regular grammar, the entire two-dimensional (2D) spatial layout and the image components that occur in a particular class of document images, representing that layout and those components as a finite state network. Prior attempts with stochastic grammar representations of text images confined their representations to single words or single lines of text, without regard to where these words or lines were located on the 2D page. A general Markov source model 820 is depicted in FIG. 29 and comprises a finite state network made up of a set of nodes and a set of directed transitions into each node. There are two distinguished nodes 822 and 824 that indicate the initial and final states, respectively. A directed transition t between any two predecessor (L_t) and successor (R_t) states in the network of FIG. 29 has associated with it a 4-tuple of attributes 826 comprising a character template, Q, a label or message string, m, a transition probability, α, and a two-dimensional integer vector displacement, Δ.
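To make the 4-tuple of attributes 826 concrete, the transitions of a Markov source can be held in a small data structure; the Python class and field names below are hypothetical, chosen only to mirror the template Q, message string m, transition probability α and displacement Δ described above.

from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

@dataclass
class Transition:
    # Directed transition t from predecessor state L_t to successor state R_t.
    predecessor: str                   # L_t
    successor: str                     # R_t
    template: Optional[Any]            # character template Q (None for pure positioning moves)
    message: str                       # label or message string m
    probability: float                 # transition probability (alpha)
    displacement: Tuple[int, int]      # 2D integer vector displacement (delta x, delta y)

@dataclass
class MarkovSource:
    # Finite state network with distinguished initial and final nodes.
    initial_node: str                  # e.g. n_I
    final_node: str                    # e.g. n_F
    transitions: List[Transition]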
For example, Markov source model 830 illustrated in FIG. 30 is a simple source model for the class of 2D document images that show a single column of English text in 12 pt. Adobe Times Roman font. In this model, documents consist of a vertical sequence of horizontal text lines, alternating with white (background) space. A horizontal text line is a sequence of typeset upper- and lower-case symbols (i.e., letter characters, numbers and special characters in 12 pt. Adobe Times Roman font) that are included in the alphabet used by the English language. The image coordinate system used with the class of images defined by model 830 is one where horizontal movement, represented by x, increases to the right, vertical movement, represented by y, increases downward, the upper left corner of the image is at x=y=0, and the lower right corner of the image is at x=W, y=H, where W and H respectively indicate the width and height of the image in pixels.
As illustrated in FIG. 28, a Markov source model serves as an input to an image synthesizer in the DID framework. For an ordered sequence of characters in an input message string in the English language and using model 830 of FIG. 30, the image synthesizer generates a page image of a single text column by placing templates in positions in the page image that are specified by model 830. The operation of text column source model 830 as an image synthesizer may be explained in terms of an imager automaton that moves over the image plane under control of the source model. The movement of the automaton constitutes its path and, in the case of model 830, follows the assumptions indicated above for the conventional reading order for a single column of text in the English language.

From start state node n_I at the top left corner of the image, the imager automaton enters and self-transitions through iterations of node n_1 vertically downward, creating vertical white space. At some point the imager reaches the top of a text line and enters state n_2, which represents the creation of a horizontal text line. The displacement (0, 34) of the transition into n_2 moves the imager down to the text baseline; 34 is the font height above the baseline. The self-transitions at node n_2, indicated by the loop at n_2 and symbols 831 and 832, represent the individual characters of the font and horizontal white space such as occurs with spaces between words. The imager transitions horizontally from left to right along the text line through iterations of node n_2 until there are no more characters to be printed on the line (which may be indicated in a variety of ways not specifically shown in model 830). At the end of the text line, the imager drops down vertically by the font depth distance 13 and transitions to node n_3. At node n_3 one of two things can happen. If there are remaining text lines, the imager enters “carriage return” state n_4 to return to the left margin of the page and back to n_1. Or, if there are no more characters or the imager has reached the bottom right corner of the page, the imager transitions from n_3 to the final node n_F. Node n_2 may be considered the “printing” state, where text lines are produced.

Additional description of how an image synthesizer functions in the DID framework with model 830 may be found in U.S. Pat. No. 5,526,444 at cols. 5-7 and the description accompanying FIGS. 15-18 therein, and in U.S. Pat. No. 5,689,620 at cols. 36-40 and the description accompanying FIG. 14 at cols. 39-40 therein.
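The path of the imager automaton can be sketched as a short simulation; the function below is a hypothetical illustration of the n_1, n_2, n_3, n_4 cycle of model 830 (the helper names, the set_width callback and the line_gap parameter are assumptions for illustration, not elements of the patent).

def synthesize_column(text_lines, set_width, font_height=34, font_depth=13, line_gap=0):
    # Toy trace of the imager automaton for a single text column.
    # text_lines: list of strings, one per horizontal text line.
    # set_width(c): horizontal advance of the template for character c, in pixels.
    # Returns a list of (x, y, character) placements on the page.
    placements = []
    x, y = 0, 0                        # start state n_I at the top left corner
    for line in text_lines:
        y += line_gap                  # self-transitions at n_1: vertical white space
        y += font_height               # transition into n_2: drop to the text baseline (0, 34)
        for ch in line:                # self-transitions at n_2: the "printing" state
            placements.append((x, y, ch))
            x += set_width(ch)         # advance horizontally by the template's set width
        y += font_depth                # end of line: drop by the font depth and enter n_3
        x = 0                          # "carriage return" state n_4: back to the left margin
    return placements                  # after the last line the path ends at final node n_F

For instance, synthesize_column(["Hello", "world"], set_width=lambda c: 20) traces two text lines in which every template advances the imager by a fixed 20-pixel set width.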
The attributes on the transitions in model 830 of FIG. 30 have been simplified in this illustration. Each directed transition into n_2, for example, has the associated 4-tuple of attributes shown in FIG. 29: a transition probability, a message string identifying a symbol or character in the English language, a corresponding character template in the font to be used in the page image, and a vector displacement, shown as (w_t, 0) in FIG. 30, that indicates the (x, y) position in the image that the path takes next. For node n_2, displacement (w_t, 0) indicates a horizontal distance w_t that is the set width of the template. The set width of a template specifies the horizontal (x-direction) distance on the text line that the imager advances when that template is imaged.
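As a hypothetical worked example (the set-width values here are invented for illustration and are not taken from the patent or from actual 12 pt. Adobe Times Roman metrics): if the template for “m” has a set width of 29 pixels and the template for “i” has a set width of 11 pixels, an imager positioned at x=100 on the baseline places “m” at x=100, advances to x=129, places “i” at x=129, and advances to x=140, where the next transition out of n_2 resumes.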
Inventors: Bloomberg, Dan S.; Minka, Thomas P.; Popat, Ashok C.
Examiner: Dastouri, Mehrdad
Assignee: Xerox Corporation