System for performing collective symbol-based compression of a c

Facsimile and static presentation processing – Static presentation processing – Data corruption, power interruption, or print prevention

Patent

Details

358 12, 358 11, G06T 1500, G05B 1100

Patent

active

060209720

ABSTRACT:
A method and apparatus for compressing a corpus of document images into a collective tokenized representation. Initially, documents in the corpus are individually compressed into a document tokenized format. A document image in the document tokenized format is represented using a symbol table and a table of positions. Each symbol in the symbol table is a shape in the original document image. The positions in the table of positions indicate where the symbols in the symbol table are placed to form the document image. Subsequently, the individual symbol tables of the documents in the corpus are assembled to form clusters of similar shapes. These clusters are then analyzed to identify the degree of interrelationship between the symbols in the individual symbol tables. Individual document symbol tables with a large number of recurring symbols are grouped together. For each group of symbol tables, a collective symbol table is computed. The collective symbol table improves the compression ratio of the corpus by eliminating redundant shapes that appear in the individual document symbol tables. The collective symbol table also identifies groupings of documents in the corpus that are related because a significant number of similar shapes are used in each of the documents.
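The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the names `TokenizedDocument`, `build_collective_table`, and `shared_fraction` are hypothetical, shapes are encoded as tuples of bit strings for convenience, and exact equality of shapes stands in for the clustering of similar shapes that the abstract describes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical encoding: a shape is a bitmap stored as rows of '0'/'1' characters.
Shape = Tuple[str, ...]
Position = Tuple[int, int, int]  # (symbol index, x, y)


@dataclass
class TokenizedDocument:
    symbols: List[Shape]       # per-document symbol table
    positions: List[Position]  # where each symbol is placed on the page


def build_collective_table(docs: List[TokenizedDocument]):
    """Merge per-document symbol tables into one shared table.

    Assumption: two shapes are "the same" only if their bitmaps are equal;
    a real system would cluster similar shapes instead.
    Returns the shared table and each document's positions re-indexed into it.
    """
    collective: List[Shape] = []
    index_of: Dict[Shape, int] = {}
    remapped: List[List[Position]] = []
    for doc in docs:
        local_to_shared = []
        for shape in doc.symbols:
            if shape not in index_of:
                index_of[shape] = len(collective)
                collective.append(shape)
            local_to_shared.append(index_of[shape])
        remapped.append([(local_to_shared[i], x, y) for i, x, y in doc.positions])
    return collective, remapped


def shared_fraction(a: TokenizedDocument, b: TokenizedDocument) -> float:
    """Fraction of the smaller symbol table that also occurs in the other one."""
    sa, sb = set(a.symbols), set(b.symbols)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))
```

Under these assumptions, documents whose `shared_fraction` exceeds some chosen threshold would be grouped and given a common table by `build_collective_table`, so each recurring shape is stored once per group rather than once per document.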

REFERENCES:
patent: 5303313 (1994-04-01), Mark et al.
patent: 5305433 (1994-04-01), Ohno
patent: 5321770 (1994-06-01), Huttenlocher et al.
patent: 5331556 (1994-07-01), Black, Jr. et al.
patent: 5504843 (1996-04-01), Catapano et al.
patent: 5539841 (1996-07-01), Huttenlocher et al.
patent: 5778361 (1998-07-01), Nanjo et al.
patent: 5884014 (1999-03-01), Huttenlocher et al.
patent: 5911140 (1999-06-01), Tukey et al.
patent: 5940822 (1999-08-01), Haderle et al.
U.S. Patent Application No. 08/575,305, entitled "Classification of Scanned Symbols into Equivalence Classes," to Daniel Davies, filed Dec. 20, 1995.
U.S. Patent Application No. 08/575,313, entitled "Consolidation Of Equivalence Classes Of Scanned Symbols," to Daniel Davies, filed Dec. 20, 1995.
U.S. Patent Application No. 08/652,864, entitled "Fontless Structured Document Image Representations for Efficient Rendering," to Daniel R. Huttenlocher et al., filed May 23, 1996.
U.S. Patent Application No. 08/655,546, entitled "Method and Apparatus for Comparing Symbols Extracted from Binary Images of Text," to William J. Rucklidge et al., filed May 30, 1996.
U.S. Patent Application No. 08/752,497, entitled "Using Fontless Structured Document Image Representations To Render Displayed And Printed Documents At Preferred Resolutions," to Daniel R. Huttenlocher et al., filed Nov. 8, 1996.
