Method of estimating at least one run-based font attribute...

Image analysis – Pattern recognition – Feature extraction

Reexamination Certificate

C382S173000

active

06178263

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to methods for improving the quality of images reproduced by Optical Character Recognition (“OCR”) systems by determining the font attributes of characters with or without character identification information.
2. Discussion of Related Art
Existing OCR systems can recognize letters printed in different fonts with acceptable accuracy. However, methods and apparatuses that reproduce letters so that they closely resemble the originals normally require information about the most important original font attributes to be available beforehand, i.e., before reproduction. These attributes include serifness (serif vs. sans serif), posture (Roman vs. Italic), x-height, ascender height, and stroke thickness.
SUMMARY OF THE INVENTION
It is an object of the invention to improve the output appearance of characters or letters reproduced by an optical character recognition system.
It is a further object of the invention to determine font attributes including character x-height, ascender height, stroke thickness, posture, serifness and proportion.
It is another object of the invention to determine original font attributes of characters with or without information regarding the original characters being available.
Thus, in accordance with one aspect of the present invention, the shortcomings of existing OCR systems are overcome by a method that estimates the font attributes of a group of characters, typically a word or a line of text. Using a group of characters instead of individual characters makes the estimates more reliable. An estimate based on a single character taken from a word or line may be inaccurate, depending on which character is chosen, because some characters change very little from one font to another and therefore give little basis for distinguishing between fonts.
In general, a character can be thought of as comprising a number of "bits". When a character is shown in its "bit-mapped" form, its many bits are depicted. Such a bitmap may then be "run-length" encoded using methodologies similar to those disclosed in CCITT Recommendation T.6, Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus (1984), incorporated herein by reference. The run-length encoded character information can in turn be used to determine the character's "strokes." Accordingly, "runs" are used in run-based embodiments of the present invention and "strokes" are used in stroke-based embodiments.
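To make the notion of a "run" concrete, here is a minimal sketch, assuming a character bitmap is given as a list of rows of 0/1 pixels (1 = black). It only extracts horizontal runs of black pixels as (row, start column, length) triples; it does not implement the two-dimensional coding defined in CCITT T.6 itself, and the function name `extract_runs` is hypothetical.

```python
from typing import List, Tuple

# A run is (row, start_col, length): a maximal horizontal span of black (1) pixels.
Run = Tuple[int, int, int]


def extract_runs(bitmap: List[List[int]]) -> List[Run]:
    """Return every maximal horizontal run of 1-pixels, scanning rows top to bottom."""
    runs: List[Run] = []
    for r, row in enumerate(bitmap):
        c, width = 0, len(row)
        while c < width:
            if row[c]:
                start = c
                while c < width and row[c]:
                    c += 1  # extend the run while pixels stay black
                runs.append((r, start, c - start))
            else:
                c += 1  # skip white pixels
    return runs


# A 3x6 bitmap of a small "T": a crossbar on top of a 2-pixel-wide stem.
glyph = [
    [1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
print(extract_runs(glyph))  # [(0, 0, 5), (1, 2, 2), (2, 2, 2)]
```

Each row of the stem contributes a short run whose length approximates the stroke width, while the crossbar contributes one long run; this is the raw material the run-based estimates work from.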
A character-based embodiment of the present invention comprises forming a histogram from heights of the characters and extracting x-height and ascender height from this histogram.
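The character-based idea can be sketched as follows, assuming each character's height is measured from the baseline to its top (so descenders in letters such as "g" or "p" do not distort it). The peak-picking rule, the `min_gap` parameter and the function name are illustrative assumptions, not the procedure claimed in the patent: the most frequent height is taken as one peak, the most frequent height at least `min_gap` pixels away from it as the other, and the smaller of the two is reported as the x-height.

```python
from collections import Counter
from typing import Iterable, Tuple


def estimate_x_and_ascender_height(heights: Iterable[int], min_gap: int = 2) -> Tuple[int, int]:
    """Estimate (x_height, ascender_height) from a histogram of character heights.

    Assumes each height is measured from the baseline to the top of the
    character. The most frequent height is one peak; the most frequent height
    at least `min_gap` pixels away from it is the other. The smaller peak is
    returned as the x-height, the larger as the ascender height.
    """
    histogram = Counter(heights)
    ranked = [h for h, _ in histogram.most_common()]  # heights, most frequent first
    if not ranked:
        raise ValueError("no character heights given")
    first = ranked[0]
    second = next((h for h in ranked[1:] if abs(h - first) >= min_gap), None)
    if second is None:
        raise ValueError("height histogram has only one peak")
    return min(first, second), max(first, second)


# Heights (in pixels) for the characters of one word or line.
heights = [10, 10, 11, 10, 14, 10, 14, 10, 15, 10, 14]
print(estimate_x_and_ascender_height(heights))  # (10, 14)
```

The sketch also illustrates why a group of characters is used: a word with no ascender letters yields only one histogram peak, and the function raises an error rather than guessing.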
Font attributes may be estimated using character-based, run-based and/or stroke-based methodologies. A run-based embodiment of the present invention comprises forming "runs" from the bitmaps of characters and estimating stroke thickness and posture from them. The stroke thickness is obtained as the median of the run lengths. The posture is determined by analyzing histograms containing slant information extracted from the runs. A stroke-based embodiment comprises decomposing each character of the group of characters into strokes, classifying the strokes, estimating font information, including stroke thickness, posture and serifness, from the parameterized and classified strokes, and storing the font information in histograms, which are then analyzed.
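The run-based estimates can be sketched as below. Stroke thickness as the median run length follows the text directly. The slant histogram is an assumed stand-in, since the text does not specify how slant is extracted: for each run, every overlapping run in the row above contributes the horizontal shift of its center, and runs longer than twice the stroke thickness (crossbars, serifs) are skipped. The length filter, the `italic_threshold` of 0.1 pixels per row and all function names are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple

Run = Tuple[int, int, int]  # (row, start_col, length); rows increase downward


def stroke_thickness(runs: List[Run]) -> float:
    """Stroke thickness estimated as the median horizontal run length."""
    if not runs:
        raise ValueError("no runs given")
    return median(length for _, _, length in runs)


def slant_histogram(runs: List[Run], max_len: float) -> Counter:
    """Histogram of doubled center shifts between overlapping runs in adjacent rows.

    Doubled center = 2*start + length - 1, which keeps half-pixel centers integral.
    A positive shift means the run one row up sits further right, as in a
    right-leaning (italic) stem. Runs longer than max_len are ignored.
    """
    by_row: Dict[int, List[Tuple[int, int]]] = {}
    for r, s, l in runs:
        if l <= max_len:
            by_row.setdefault(r, []).append((s, l))
    hist: Counter = Counter()
    for r, row_runs in by_row.items():
        for s, l in row_runs:
            for sa, la in by_row.get(r - 1, []):
                if sa < s + l and s < sa + la:  # runs overlap horizontally
                    hist[(2 * sa + la) - (2 * s + l)] += 1
    return hist


def estimate_posture(runs: List[Run], italic_threshold: float = 0.1) -> Tuple[float, float, str]:
    """Return (stroke thickness, mean shift per row, posture label)."""
    thickness = stroke_thickness(runs)
    hist = slant_histogram(runs, max_len=2 * thickness)
    total = sum(hist.values())
    if total == 0:
        return thickness, 0.0, "unknown"
    # Mean horizontal shift per row in pixels (halving undoes the doubling).
    slope = sum(k * n for k, n in hist.items()) / (2 * total)
    return thickness, slope, "italic" if slope > italic_threshold else "roman"


# A 2-pixel stem that moves one pixel right every two rows going up.
italic_stem = [(0, 5, 2), (1, 5, 2), (2, 4, 2), (3, 4, 2), (4, 3, 2), (5, 3, 2)]
print(estimate_posture(italic_stem))  # (2.0, 0.4, 'italic')
```

Pooling the shifts from every character in a word or line before taking the mean is what makes such a proxy usable; a single short stem gives only a handful of samples.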
These and other aspects of the invention are described in the following detailed description of preferred embodiments of the invention.
