Reexamination Certificate
2000-05-19
2003-09-30
Boudreau, Leo (Department: 2621)
Image analysis
Image compression or coding
C382S242000, C358S001150
Reexamination Certificate
active
06628837
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to document image encoding and decoding, and more particularly, to a method and apparatus for improving accuracy of optical character recognition (OCR).
2. Description of Related Art
Input scanners have been developed for uploading hardcopy documents into electronic document processing systems. These scanners typically convert the appearance of a hardcopy document into a raster formatted, digital data stream, thereby providing a bitmapped representation of the hardcopy document appearance. OCR systems such as Textbridge produced by ScanSoft, Inc. convert bitmapped document appearances into corresponding symbolic encodings. Unfortunately, OCR systems are not immune to making errors when inferring a correlation between a particular bitmap pattern and a corresponding document encoding (e.g., ASCII).
This problem has been addressed by designing special fonts such as OCR-B fonts, where characters that are likely to be confused (e.g., 1, l, and I) are given distinctly different typographic features. This allows an OCR system to more accurately infer the correlation between a bitmap pattern and its corresponding document encoding. In addition, Plumb et al. disclose in “Tools for Publishing Source Code via OCR,” 1997, printing the primary channel of a hardcopy document by replacing spaces and tabs with printable characters. Also, U.S. Pat. No. 4,105,997 discloses a method for using checksums of text in a document to locate errors during OCR.
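As a rough illustration of how checksums can localize OCR errors, the following minimal sketch computes one checksum per line and flags the lines whose recomputed checksum disagrees. It is not the method of U.S. Pat. No. 4,105,997; the choice of CRC-32, the per-line granularity, and the function names are all assumptions made for this example.

```python
import zlib

def line_checksums(lines):
    """Return one CRC-32 per line of text (CRC-32 is an illustrative choice)."""
    return [zlib.crc32(line.encode("utf-8")) for line in lines]

def locate_suspect_lines(ocr_lines, expected_checksums):
    """Return indices of OCR'd lines whose checksum differs from the expected one."""
    actual = line_checksums(ocr_lines)
    return [i for i, (a, e) in enumerate(zip(actual, expected_checksums)) if a != e]

# Checksums computed from the original text when the document is produced.
original = ["Invoice 1001", "Total: 118.50"]
printed = line_checksums(original)

# Hypothetical OCR output in which the digit 1 was read as the letter l.
ocr_output = ["Invoice 1001", "Total: l18.50"]
suspects = locate_suspect_lines(ocr_output, printed)
```

A per-line checksum of this kind tells the decoder which line is wrong but not which character, which is one reason finer-grained side information can be attractive.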
This problem has also been addressed in U.S. Pat. No. 5,486,686, which discloses a document processing system in which human readable hardcopy renderings of a document are integrated with complete or partial electronic representations of the document and/or its content. The electronic representation provides an “assist channel” that encodes information about the document or computed from the document. The assist channel is defined using printable machine-readable codes. In one illustrated example, the assist channel can be defined using compact glyph codes at the bottom of a document.
More specifically, an “assist channel” of a hardcopy document is a machine readable encoding of side information that aids an OCR application in decoding the contents of a primary channel. The “primary channel” of a hardcopy document includes the human readable information of the document. The primary channel, which cannot be modified and is slightly error prone to OCR processing, carries most of the information content of the document. One use of the assist channel is to encode information that assists in the identification of failures of an OCR application in decoding the contents of a primary channel, as disclosed for example in U.S. Pat. Nos. 5,625,721; 5,748,807; and 6,047,093.
Even with these advances that improve OCR processing using an assist channel, it continues to be desirable to provide an assist channel encoding that balances and improves the tradeoff between the amount of information encoded in the assist channel and the improved accuracy of the OCR system given the encoded information. At one extreme, the assist channel can contain as much information as the primary channel (i.e., redundant information). At the other extreme, the assist channel can simply contain a single checksum of the contents of a document. There exists therefore the desirability to provide an assist channel encoding that compensates for the failure of the primary channel during OCR processing yet is compact relative to the primary channel.
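To make the two extremes concrete, the following back-of-the-envelope sketch compares assist channel sizes. The page size of 2,000 symbols, the 8 bits per symbol, the 32-bit checksum, and the intermediate budget of 2 bits per symbol are illustrative assumptions, not figures from the source.

```python
def assist_channel_bits(num_symbols, side_bits_per_symbol):
    """Size in bits of an assist channel that spends a fixed budget per symbol."""
    return num_symbols * side_bits_per_symbol

NUM_SYMBOLS = 2000   # assumed number of symbols on one page
PRIMARY_BITS = 8     # assumed bits needed to represent one symbol

# One extreme: the assist channel repeats the primary channel in full.
redundant = assist_channel_bits(NUM_SYMBOLS, PRIMARY_BITS)
# Other extreme: a single checksum for the whole document (assumed 32 bits).
single_checksum = 32
# A hypothetical middle ground: a small amount of side information per symbol.
compact = assist_channel_bits(NUM_SYMBOLS, 2)

# Size of each option relative to the fully redundant assist channel.
overhead = {
    "redundant": redundant / redundant,
    "compact": compact / redundant,
    "single_checksum": single_checksum / redundant,
}
```

Under these assumptions the middle option costs a quarter of the fully redundant channel while still providing information about every symbol, which a single checksum does not.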
SUMMARY OF THE INVENTION
In accordance with the invention, there is provided a method, and apparatus therefor, for generating image data for rendering on a hardcopy document. A primary set of symbol data is identified that provides a first channel of human readable information to be rendered on the hardcopy document. A secondary set of encoding data is computed from the primary set of symbol data. The secondary set of encoding data provides an assist channel of machine readable information that is rendered on the hardcopy document.
In accordance with one aspect of the invention, the assist channel is encoded for a selected line of the primary set of symbol data, having an ordered set of c_1, c_2, c_3, . . . c_{i-1}, c_i, c_{i+1}, . . . c_n symbols, by: sequentially computing a hash of each of the symbols of the selected line with a state change function H, where the state change function H produces a hash h_i that is at least a function of the current symbol in the selected line c_i and the preceding computed hash h_{i-1}; and computing a set of guard values for each of the symbols of the selected line with a guard extractor function G, where the guard extractor function G produces a guard value g_i that is at least a function of the computed hash h_i; the computed set of guard values defining the second set of encoding data for the selected line.
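The chained hash and guard extraction described above can be sketched as follows. The source does not specify H or G, so both are hypothetical stand-ins here: H is an FNV-1a-style mixing step on a 32-bit state, G keeps the top few bits of the hash, and the initial hash of 0 and the 2-bit guard width are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF
FNV_PRIME = 16777619  # 32-bit FNV prime, used here only as a convenient mixer

def state_change(prev_hash, symbol):
    """Hypothetical H: combine the preceding hash with the current symbol."""
    return ((prev_hash ^ ord(symbol)) * FNV_PRIME) & MASK32

def guard_extractor(hash_value, guard_bits=2):
    """Hypothetical G: keep the top guard_bits bits of the 32-bit hash."""
    return hash_value >> (32 - guard_bits)

def encode_line(line, guard_bits=2, initial_hash=0):
    """Return the guard values g_1..g_n for the symbols c_1..c_n of one line."""
    guards = []
    h = initial_hash
    for symbol in line:
        h = state_change(h, symbol)              # h_i from c_i and h_{i-1}
        guards.append(guard_extractor(h, guard_bits))  # g_i from h_i
    return guards

def first_mismatch(candidate, guards, guard_bits=2, initial_hash=0):
    """Index of the first guard that disagrees with the candidate text, or None.

    Only the common length of the two sequences is compared.
    """
    recomputed = encode_line(candidate, guard_bits, initial_hash)
    for i, (g_new, g_ref) in enumerate(zip(recomputed, guards)):
        if g_new != g_ref:
            return i
    return None

# Guards computed from the original line when the document is produced.
guards = encode_line("Total: 118.50")
# A correctly recognized line reproduces every guard value.
clean_check = first_mismatch("Total: 118.50", guards)
# A misrecognized line can only disagree at or after the erroneous symbol.
error_check = first_mismatch("Total: l18.50", guards)
```

Because each hash depends on the preceding hash, guards before a misrecognized symbol always match, and a narrow guard that happens to collide at the error position may still be contradicted by a later guard. The guard width sets the tradeoff: fewer bits give a more compact assist channel but a higher chance that an error goes unnoticed for some symbols.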
REFERENCES:
patent: 4105997 (1978-08-01), McGinn
patent: 4728984 (1988-03-01), Daniele
patent: 5321773 (1994-06-01), Kopec et al.
patent: 5486686 (1996-01-01), Zdybel, Jr. et al.
patent: 5526444 (1996-06-01), Kopec et al.
patent: 5594809 (1997-01-01), Kopec et al.
patent: 5625721 (1997-04-01), Lopresti et al.
patent: 5689620 (1997-11-01), Kopec et al.
patent: 5748807 (1998-05-01), Lopresti et al.
patent: 5995668 (1999-11-01), Corset et al.
patent: 6047093 (2000-04-01), Lopresti et al.
Costello et al. “Applications of Error-Control Coding,” IEEE Transactions on Information Theory, vol. 44, No. 6, Oct. 1998, pp. 2531-2560.
Dobrushin “Shannon's Theorems for Channels with Synchronization Errors,” (Translated from Problemy Peredachi Informatsii, vol. 3, No. 4, pp. 18-36, 1967) UDC 621.391.13, pp. 11-26.
Electronic Frontier Foundation Cracking DES: Secrets of Encryption Research, Wiretap Politics, and Chip Design, Distributed by O'Reilly & Associates, May 1998, pp. 4-1 to 4-14.
Hagenauer “Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and Their Applications,” IEEE Transactions on Communications, vol. 36, No. 4, Apr. 1988, pp. 389-400.
Jelinek “Fast Sequential Decoding Algorithm Using a Stack,” IBM Journal of Research and Development, vol. 13, Nov. 1969, pp. 675-685.
Jones et al. “Integrating Multiple Knowledge Sources in a Bayesian OCR Post-Processor,” Proceedings of ICDAR 91, Saint-Malo, France; vol. 2, pp. 925-933.
Kam et al. “Document Image Decoding by Heuristic Search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, No. 9, Sep. 1996, pp. 945-950.
Kopec et al. “Document Image Decoding Using Markov Source Models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, No. 6, Jun. 1994, pp. 602-617.
Kopec “Multilevel Character Templates for Document Image Decoding,” Proceedings of Document Recognition IV, San Jose, California; Feb. 12-13, 1997, SPIE vol. 3027, pp. 168-177.
Lee Convolutional Coding: Fundamentals and Applications, Artech House, Boston, 1997, pp. 101-116 and pp. 225-244.
Lopresti et al. “Certifiable Optical Character Recognition,” IEEE Proceedings of the 2nd International Conference on Document Analysis and Recognition, pp. 432-435.
Peterson et al. Error-Correcting Codes, Second Edition, The Massachusetts Institute of Technology, 1972, pp. 269-309.
Schlegel Trellis Coding, IEEE Press, New York, 1997, pp. 43-89 and 233-262.
Schulman “Asymptotically Good Codes Correcting Insertions, Deletions, and Transpositions,” IEEE Transactions on Information Theory, vol. 45, No. 7, Nov. 1999, pp. 2552-2557.
Stambler “Memoryless Channels with Synchronization Errors: The General Case,” (Translated from Problemy Peredachi Informatsii, vol. 6, No. 3, pp. 43-49, Jul.-Sep., 1970) UDC 621.391.1, pp. 223-237.
Ullman “On the Capabilities of Codes to Correct Synchronization Errors,” IEEE Transactions on Information Theory, vol. IT-13, No. 1, Jan. 1967, pp. 95-105.
Witten et al. “Arithmetic Coding for Data Compression,” Comm. ACM, vol. 30, No. 6, 1987, pp. 520-540.
Greene Daniel H.
Popat Ashok C.
Boudreau Leo
Dang Duy M.
Inouye Patrick J.S.
Xerox Corporation
Zell Thomas B.