Extracting information from symbolically compressed document...

Filed: 1999-04-08
Issued: 2003-12-02
Examiner: Mehta, Bhavesh M. (Department: 2625)
Classification: Image analysis – Pattern recognition – Context analysis or word recognition

Classification codes: C382S160000, C382S177000, C382S228000, C382S232000, C707S793000
Type: Reexamination Certificate
Status: active
Patent number: 06658151
ABSTRACT:

The present invention relates to the field of document image processing, and more particularly to processing document images that have been symbolically compressed.
BACKGROUND OF THE INVENTION
Storage and transmission of electronic document images have become increasingly prevalent, spurring deployment and standardization of new and more efficient document compression techniques. Symbolic compression of document images, for example, is becoming increasingly common with the emergence of the JBIG2 standard and related commercial products. Symbolic compression techniques improve compression efficiency by 50% to 100% in comparison to the commonly used Group 4 compression standard (CCITT Specification T.6). A lossy version of symbolic compression can achieve 4 to 10 times better compression efficiency than Group 4.
In symbolic compression, document images are coded with respect to a library of pattern templates. Templates in the library are typically derived by grouping (clustering) together connected components (e.g., alphabetic characters) in the document that have similar shapes. One template is chosen or generated to represent each cluster of similarly shaped connected components. The connected components in the image are then represented by a sequence of template identifiers and their spatial offsets from the preceding component. In this way, an approximation of the original document is obtained without duplicating storage for similarly shaped connected components. Minor differences between individual components and their representative templates, as well as all other components which are not encoded in this manner, are optionally coded as residuals.
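The coding scheme described above can be illustrated with a minimal sketch. It is not the JBIG2 algorithm or the method of any particular product: the greedy matching rule, the Hamming-distance threshold `max_diff`, and the names `hamming` and `encode` are assumptions made for illustration, all bitmaps are assumed to have the same size, only horizontal offsets are tracked, and residual coding is omitted.

```python
from typing import List, Tuple

Bitmap = Tuple[str, ...]  # rows of '0'/'1' characters, all the same size (assumption)

def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def encode(components: List[Tuple[int, Bitmap]], max_diff: int = 1):
    """Greedily match (x_position, bitmap) components against a template library.

    Returns the library and a list of (template_id, x_offset) pairs, where
    x_offset is measured from the preceding component.
    """
    templates: List[Bitmap] = []
    codes: List[Tuple[int, int]] = []
    prev_x = 0
    for x, bitmap in components:
        for tid, tmpl in enumerate(templates):
            if hamming(tmpl, bitmap) <= max_diff:
                break  # close enough: reuse this template
        else:
            tid = len(templates)  # no match: the component becomes a new template
            templates.append(bitmap)
        codes.append((tid, x - prev_x))
        prev_x = x
    return templates, codes

# Toy input: two similar shapes (A, A2) and one different shape (B).
A = ("010", "111", "101")
A2 = ("010", "111", "100")  # differs from A by one pixel
B = ("110", "110", "111")
templates, codes = encode([(0, A), (4, B), (8, A2)])
print(codes)  # [(0, 0), (1, 4), (0, 4)]
```

Here the third component reuses template 0 rather than storing a new bitmap, which is where the storage saving comes from. The one-pixel difference between A2 and its template is what a residual would record in a lossless coder.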
Many document management activities, such as document classification, duplicate detection and language identification, are based on the semantic content of document images. Consequently, in traditional document management systems, compressed document images are first decompressed and then subjected to optical character recognition (OCR) to recover the semantic information needed for classification, language identification and duplicate detection. In the context of a database of symbolically compressed document images, the need to decompress and perform OCR consumes considerable processing resources. Also, because OCR engines are usually limited in the number and variety of typefaces they recognize, recovery of semantic information through conventional OCR techniques may not be possible for some symbolically compressed documents.
SUMMARY OF THE INVENTION
A method and apparatus for extracting information from symbolically compressed document images are disclosed. An input document image is represented by a sequence of template identifiers to reduce storage consumed by the input document image. The template identifiers are replaced with alphabet characters according to language statistics to generate a text string representative of text in the input document image. In one embodiment, the template identifiers are replaced with alphabet characters according to a hidden Markov model. Also, a conditional n-gram technique may be used to obtain indexing terms for document matching and other applications.
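As a rough illustration of replacing template identifiers with characters using a hidden Markov model, the sketch below runs standard Viterbi decoding over a sequence of template identifiers. It is not taken from the patent: the function name `viterbi`, the two-letter alphabet, and every probability value are toy assumptions. In particular, the emission probabilities (how likely each character is to appear as each template) are simply given here, whereas in practice they would have to be estimated from the document and the language statistics.

```python
import math
from typing import Dict, List

def viterbi(obs: List[int],
            states: List[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[int, float]]) -> str:
    """Return the most likely character string for a template-id sequence."""
    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = log-probability of the best path that ends in state s
    best = {s: log(start[s]) + log(emit[s].get(obs[0], 0.0)) for s in states}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        step, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] + log(trans[r][s]))
            step[s] = best[prev] + log(trans[prev][s]) + log(emit[s].get(o, 0.0))
            ptr[s] = prev
        best = step
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))

# Toy "language" in which the letters a and b tend to alternate (assumed values).
states = ["a", "b"]
start = {"a": 0.6, "b": 0.4}
trans = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}}
# Assumed emissions: template 0 mostly stands for 'a', template 1 for 'b'.
emit = {"a": {0: 0.8, 1: 0.2}, "b": {0: 0.3, 1: 0.7}}

print(viterbi([0, 1, 0], states, start, trans, emit))  # aba
```

The character bigram table `trans` plays the role of the language statistics, and the decoder never looks at template bitmaps: it works only on the identifier sequence, which is why no decompression or OCR step appears in the sketch.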
These and other features and advantages of the invention will be apparent from the accompanying drawings and from the detailed description that follows below.

REFERENCES:
patent: 4610025 (1986-09-01), Blum et al.
patent: 5062143 (1991-10-01), Schmitt
patent: 5418951 (1995-05-01), Damashek
patent: 5452442 (1995-09-01), Kephart
patent: 5467425 (1995-11-01), Lau et al.
patent: 5752051 (1998-05-01), Cohen
patent: 5809172 (1998-09-01), Melen
patent: 5809476 (1998-09-01), Ryan
patent: 5982929 (1999-11-01), Han et al.
patent: 6011905 (2000-01-01), Huttenlocher et al.
patent: 6038342 (2000-03-01), Bernzott et al.
patent: 6052481 (2000-04-01), Grajski
patent: 6088484 (2000-07-01), Mead
patent: 6092038 (2000-07-01), Kanevsky
patent: 6118899 (2000-09-01), Bloomfield et al.
patent: 6157905 (2000-12-01), Powell
patent: 6617369 (2000-12-01), Schulze
patent: 6169969 (2001-01-01), Cohen
patent: 6311152 (2001-10-01), Bai et al.

Affiliated with

Hull Jonathan J.

Inventor

Lee Dar-Shyang

Inventor

Also associated with

Blakely, Sokoloff, Taylor & Zafman LLP

Law Firm

Desire Gregory

Examiner

Mehta Bhavesh M.

Examiner

Ricoh Co. Ltd.

Corporate Assignee
