Dynamic programming operation with skip mode for text line...

Image analysis – Pattern recognition – Template matching

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C382S229000, C382S233000, C704S243000, C704S256000, C707S793000

Reexamination Certificate

active

06594393

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates generally to image decoding and image recognition techniques, and in particular to such techniques using stochastic finite state networks such as Markov sources. In particular, the present invention provides a technique for improving the efficiency of decoding text line images using a stochastic finite state network.
Automatic speech recognition systems based on stochastic grammar frameworks such as finite state Markov models are known. Examples are described in U.S. Pat. No. 5,199,077 entitled “Wordspotting For Voice Editing And Indexing”, and in reference [2], both of which use hidden Markov models (HMMs). Bracketed numerals identify referenced publications listed in the Appendix of Referenced Documents.
Stochastic grammars have also been applied to document image recognition problems and to text recognition in particular. See, for example, the work of Bose and Kuo, identified in reference [1], and the work of Chen and Wilcox in reference [2] which both use hidden Markov models (HMMs) for word or text line recognition. See also U.S. Pat. No. 5,020,112, issued to P. A. Chou and entitled “Image Recognition Using Two-Dimensional Stochastic Grammars.”
U.S. Pat. No. 5,321,773 (hereafter the '773 DID patent), issued to Kopec and Chou, discloses a document recognition technique known as Document Image Decoding (hereafter, DID) that is based on classical communication theory. This work is further discussed in references [2], [4] and [5]. The DID model
800
, illustrated in
FIG. 7
, includes a stochastic message source
810
, an imager
811
, a channel
812
and a decoder
813
. The stochastic message source
810
selects a finite string M from a set of candidate strings according to a prior probability distribution. The imager
811
converts the message into an ideal binary image Q. The channel
812
maps the ideal image into an observed image Z by introducing distortions due to printing and scanning, such as skew, blur and additive noise. Finally, the decoder
813
receives observed image Z and produces an estimate {circumflex over (M)} of the original message according to a maximum a posteriori (MAP) decision criterion. Note that in the context of DID, the estimate {circumflex over (M)} of the original message is often referred to as the transcription of observed image Z.
The structure of the message source and imager is captured formally by combining their functions into a single composite image source
815
, as shown by the dotted lines in FIG.
7
. Image source
815
models image generation using a Markov source. A Markov source is a stochastic finite-state automaton that describes the spatial layout and image components that occur in a particular class of document images as a regular grammar, representing these spatial layout and image components as a finite state network. A general Markov source model
820
is depicted in FIG.
8
and comprises a finite state network made up of a set of nodes and a set of directed transitions into each node. There are two distinguished nodes
822
and
824
that indicate initial and final states, respectively. A directed transition t between any two predecessor (L
t
) and successor (R
t
) states in the network of
FIG. 8
has associated with it a 4-tuple of attributes
826
comprising a character template, Q, a label or message string, m, a transitional probability, &agr;, and a two-dimensional integer vector displacement, &Dgr;. The displacement indicates a horizontal distance that is the set width of the template. The set width of a template specifies the horizontal (x-direction) distance on the text line that the template associated with this transition occupies in the image.
U.S. Pat. No. 5,689,620 extended the principles of DID and the use of Markov source models to support the automatic supervised training of a set of character templates in the font of a particular collection or class of documents, thereby enabling the decoding of font-specific documents for which templates are not otherwise easily available. The use of a Markov source model to describe the spatial layout of a 2D document page and the arrangement of image components such as lines, words and character symbols on the page provides a great deal of flexibility for describing a wide variety of document layouts. This flexibility, combined with automatic training of character templates in a specific font, provide a powerful technological advantage in the field of automatic document recognition. DID enables the decoding (recognition) of any type of character symbols in virtually any type and size of font and in any type of 2D spatial layout.
The powerful flexibility offered by the DID system is limited in actual use by the time complexity involved in the decoding process. Decoding involves the search for the path through the finite state network representing the observed image document that is the most likely path that would have produced the observed image. The '773 DID patent discloses that decoding involves finding the best (MAP) path through a three-dimensional (3D) decoding trellis data structure indexed by the nodes of the model and the coordinates of the image plane, starting with the initial state and proceeding to the final state. Decoding is accomplished by a dynamic programming operation, typically implemented as a Viterbi algorithm. A general description of the implementation of the Viterbi algorithm in the context of Document Image Decoding is omitted here and is provided in the discussion of an implementation of the present invention in the Detailed Description below.
The dynamic programming operation used to decode an image involves computing the probability that the template of a transition corresponds to a region of the image to be decoded in the vicinity of the image point. This template-image probability is represented by a template-image matching score that indicates a measurement of the match between a particular template and the image region at the image point. Branches in the decoding trellis are labeled with the matching scores. The size and complexity of the image as defined by the model (i.e., the number of transitions) and the number of templates to be matched are major factors in computation time. Indeed, the time complexity of decoding using a two-dimensional image source model and a dynamic programming operation, is O(∥&bgr;∥×H×W), where ∥&bgr;∥ is the number of transitions in the source model and H and W are the image height and width, respectively, in pixels.
There are two factors that influence this complexity. The first is finding the baselines of horizontal text lines. Although decoding computation grows only linearly with image size, in absolute terms it can be prohibitive because, in effect, each row of pixels in the image is evaluated (decoded) as the baseline of a possible horizontal text line. For example, a two-dimensional image of a column of black text represented in a single known font printed on an 8.5×11 inch page of white background and scanned at 300 dpi resolution causes line decoding to occur 3300 times (300 dpi×11 inches).
A second key bottleneck in the implementation of the dynamic programming decoding procedure is the computation of template-image matching scores. A score is the measurement of the match between a template and a 2D region of the image. Each template is matched at each position of a horizontal row of pixels in the image during text line decoding. If there are 100 templates and 1500-2000 x-pixel positions in a line, then each template has to be matched at each x position on the line, requiring a minimum of 10
5
actual scores for the line. When the position of an actual baseline is not known exactly, each template could be matched at as many as five vertical pixel positions as well. Thus it was estimated that actual template-image matching scoring for each text line in an image required at least 10
6
image-template scores. In early implementati

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Dynamic programming operation with skip mode for text line... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Dynamic programming operation with skip mode for text line..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Dynamic programming operation with skip mode for text line... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3011238

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.