Method of downsampling documents

Image analysis – Image transformation or preprocessing

Reexamination Certificate

Details

active

06389178

ABSTRACT:

TECHNICAL FIELD
The invention relates to downsampling documents, and in particular to downsampling using optical character recognition, font substitution and equalization.
BACKGROUND OF THE INVENTION
Input/output devices in modern telecommunications and computer systems are devices by which information (e.g. text, data, video, images, etc.) can be transferred to or from the system or be displayed for further processing or interpretation, including interpretation by people using the system. The information of interest is termed a “document.” A document may be made manifest or rendered in a variety of forms. For example, a document could be rendered in an analog fashion, e.g. on paper, microfiche, or 35 mm film. Alternatively, the document may be rendered as a digital bit map, e.g. a screen dump, or the document may be rendered in a character representation, e.g. ASCII, Latin 1, Unicode or in a markup language such as LaTeX, SGML or PostScript. It is possible to convert a document in one representation to a document in another representation; however, the conversion may result in a loss of information or in the introduction of noise (e.g. the loss of resolution in documents produced by fax machines).
Importantly, a document often needs to be sampled at different rates in order to be rendered on different output devices. For example, laser printers typically produce tangible paper outputs with a resolution of 300-600 dots per inch (dpi) while the resolution of a fax output is 100-200 dpi, and the resolution of bit map terminals is 75-100 dpi. To output a document where the resolution of the document is higher than the resolution of the output device, it is typically necessary to downsample the higher resolution document so as to output only a portion of the information in the higher resolution document. Standard downsampling techniques include low pass filtering and decimation. However, these techniques do not work well for very low resolution devices.
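As an illustration of the standard technique named above, the following is a minimal sketch of low pass filtering followed by decimation, assuming the document is a bit map stored as a list of rows of pixel values. The function names `downsampling_factor` and `box_filter_decimate` are hypothetical, and the box (block average) filter is only one simple choice of low pass filter; the source does not specify a particular filter.

```python
def downsampling_factor(source_dpi, target_dpi):
    """Integer reduction factor between two resolutions (assumed helper)."""
    return max(1, round(source_dpi / target_dpi))


def box_filter_decimate(image, factor):
    """Average each factor x factor block, then keep one sample per block.

    Averaging over the block acts as a simple low pass (box) filter;
    emitting a single value per block is the decimation step.
    Assumes `image` is a non-empty list of equal-length rows.
    """
    # Ignore any partial blocks at the right and bottom edges.
    height = len(image) // factor * factor
    width = len(image[0]) // factor * factor
    out = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + i][c + j]
                     for i in range(factor)
                     for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Example: a 300 dpi page sent to a 100 dpi device needs a factor of 3.
factor = downsampling_factor(300, 100)
```

At large factors each output pixel averages many input pixels, so thin character strokes fade toward the background, which is consistent with the remark that these techniques do not work well for very low resolution devices.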
Another downsampling technique is font substitution. This method is applied only to documents in a text or character representation (i.e. a representation in which the sequence and location of characters is known). A font is a representation of a character set (e.g. an alphabet). A font has a number of attributes: the family (e.g. Times Roman, Helvetica, etc.); the face (e.g. bold, italics, etc.); the size (e.g. 12 point, 18 point); and the resolution of the output device via which the document will be rendered. In font substitution, a character in the higher resolution document is identified, and the character is output to the lower resolution device in a font designed to be “good looking” at the lower resolution. In short, in font substitution one or more of the font attributes are changed before the characters in the document are output to the lower resolution device. The problem with downsampling by font substitution is the need to know, reliably, the position and identity of the characters so that an appropriate substitute can be selected. Such information is available in documents represented in LaTeX or SGML, or from some optical character recognition (OCR) systems. However, this information is typically not readily available in many types of documents, e.g. faxes. Thus, there is a need for improved methods of downsampling in order to output documents on low resolution devices.
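The four font attributes listed above, and the idea of changing some of them before output, can be sketched as a small data structure. This is an illustrative sketch only: the names `Font`, `PlacedChar` and `substitute_font` are hypothetical, and picking Helvetica as the low resolution substitute family is an assumption made for the example, not something the source prescribes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Font:
    family: str          # e.g. "Times Roman", "Helvetica"
    face: str            # e.g. "bold", "italics"
    size_pt: int         # e.g. 12, 18
    resolution_dpi: int  # resolution of the device the font is designed for


@dataclass(frozen=True)
class PlacedChar:
    char: str   # identity of the character
    x: float    # position on the page (units are an assumption)
    y: float
    font: Font


def substitute_font(placed, target_dpi, substitute_family="Helvetica"):
    """Return the same character at the same position in a substitute font.

    Only the family and resolution attributes are changed here; a real
    system might also change face or size.
    """
    new_font = replace(placed.font,
                       family=substitute_family,
                       resolution_dpi=target_dpi)
    return replace(placed, font=new_font)
```

The sketch makes the stated limitation concrete: `substitute_font` needs both `char` and the position fields to be known and correct, which is exactly the information that a fax bit map does not provide directly.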
SUMMARY OF THE INVENTION
The aforementioned problems are solved, in accordance with the principles of the invention, by a method of downsampling a component in a document where the component is in a character representation and has an associated reliability measure. The reliability measure indicates the probability that the associated character representation correctly identifies the component. The method downsamples the component by a first method of downsampling if the reliability measure is above a threshold and by a second method of downsampling otherwise.
In preferred embodiments of the invention the first method of downsampling is so-called font substitution, and the second method is so-called decimation. In a further aspect of the invention, decimation is combined with nonlinear filtering in downsampling the component.
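The selection rule summarized above can be sketched as follows, assuming each component carries its bit map, its recognized character and a reliability value between 0 and 1. All names here (`Component`, `decimate_nonlinear`, `downsample_component`, `render_glyph`) are hypothetical. The OR reduction used in `decimate_nonlinear` is one simple nonlinear filter chosen for illustration; the source does not say which nonlinear filter is used.

```python
from dataclasses import dataclass
from typing import Callable, List

Bitmap = List[List[int]]  # rows of pixels, 1 = ink, 0 = background


@dataclass
class Component:
    bitmap: Bitmap       # high resolution pixels of the component
    label: str           # character representation (e.g. from OCR)
    reliability: float   # estimated probability that `label` is correct


def decimate_nonlinear(bitmap: Bitmap, factor: int) -> Bitmap:
    """Decimate by `factor`, OR-ing each block so thin strokes survive.

    Assumed example of a nonlinear filter: an output pixel is ink if
    any pixel in the corresponding factor x factor block is ink.
    """
    height = len(bitmap) // factor * factor
    width = len(bitmap[0]) // factor * factor
    return [[max(bitmap[r + i][c + j]
                 for i in range(factor)
                 for j in range(factor))
             for c in range(0, width, factor)]
            for r in range(0, height, factor)]


def downsample_component(comp: Component,
                         threshold: float,
                         factor: int,
                         render_glyph: Callable[[str], Bitmap]):
    """Use font substitution when recognition is reliable, else decimation."""
    if comp.reliability > threshold:
        # First method: draw the recognized character in a low resolution font.
        return "font_substitution", render_glyph(comp.label)
    # Second method: fall back to filtering and decimating the pixels.
    return "decimation", decimate_nonlinear(comp.bitmap, factor)
```

Here `render_glyph` stands in for whatever routine draws a character in the substitute font. The threshold trades the cleaner look of substituted glyphs against the risk of printing a wrongly recognized character.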


REFERENCES:
patent: 3496543 (1970-02-01), Greenly
patent: 5359671 (1994-10-01), Rao
patent: 5418864 (1995-05-01), Murdock et al.
Mori et al. “Historical Review of OCR Research and Development” Proceedings of the IEEE vol. 80. No. 7. Jul. 1992.*
Govindan et al. “Character Recognition—A Review” Pattern Recognition vol. 23. No. 7 pp 671-683, 1990.*
Endoh et al. “JBIG ISO JTC1/SC2/WG8 & CCITT SG VI” Jul., 1989 JBIG-N.*
O'Gorman “Image and Document Processing Techniques for the RightPages Electronic Library System” 0-8186-2915-0/92 IEEE.*
S. Mori et al., “Historical Review of OCR Research and Development,” Proceedings of the IEEE, vol. 80, No. 7, 1029-1057 (Jul., 1992).
V. K. Govindan et al., “Character Recognition—A Review,” Pattern Recognition, vol. 23, No. 7, 671-683 (1990).
C.F.N. Cowan et al., “Non-Linear System Modelling: Concept and Application,” Proc. 1984 ICASSP, 4561-4564 (1984).
T. Endoh et al., “Progressive Reduction Standard for Bi-level Images,” JBIG-ISO JTCI/SC2/WG8 CCITT SG VIII (Jul. 1989).
L. O'Gorman, “Image and Document Processing Techniques for the RightPages Electronic Library System,” Proc. 1992 Int'l Conf. on Pattern Recognition, vol. II, 260-263 (1992).
