Image analysis – Learning systems – Trainable classifiers or pattern recognizers
Reexamination Certificate
2000-12-28
2004-02-10
Mehta, Bhavesh M. (Department: 2625)
C382S198000, C382S200000, C382S203000, C382S242000, C382S325000
active
06690821
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of Invention
This invention relates to systems and methods for automatically processing captured document images. More particularly, this invention relates to systems and methods for automatically recognizing the font of printed text.
2. Description of Related Art
A document image may be captured and converted to digital signals (pixels) by an image capture device, such as a scanner or a facsimile machine. Subsequent processing of these digital signals may include outputting to an image output terminal such as a viewing device or printer, data compression to a more compact format, or optical character recognition. A useful step in each of these exemplary subsequent processes is the automatic determination of the text font used in the document. Examples of text fonts include PostScript 10-point Helvetica, 12-point Helvetica-Bold, 11-point Times-Roman, and the like. Such text can be considered a connected component. A connected component is an “island” of black pixels in a binary scan of a document; that is, a set of black pixels connected diagonally or orthogonally to one another and surrounded by white.
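The connected-component definition above can be illustrated with a minimal sketch. It assumes the binary scan is a list of rows of 0 (white) and 1 (black); the function name `connected_components` and this data layout are illustrative choices, not taken from the patent.

```python
from collections import deque

def connected_components(image):
    """Find islands of black pixels (value 1) joined orthogonally or diagonally.

    `image` is a list of equal-length rows of 0 (white) and 1 (black).
    Returns a list of components, each a sorted list of (row, col) pixels.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    # All eight neighbours: orthogonal and diagonal connections both count.
    neighbours = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                  if (dr, dc) != (0, 0)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in neighbours:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(sorted(pixels))
    return components
```

In this sketch two black pixels that touch only at a corner land in the same component, which matches the "diagonally or orthogonally" wording of the definition.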
SUMMARY OF THE INVENTION
The methods and systems of this invention can automatically determine the text fonts in a captured image.
The methods and systems of this invention provide automatic determination of the text fonts in a captured image in a simple, accurate, and language-independent manner, with the ability to work with smaller samples of text than previous methods.
In various exemplary embodiments of the methods and systems according to this invention, training data is used to determine characteristics of a sample of the captured image.
In one exemplary embodiment of the methods and systems according to this invention, the training data is divided into groups, according to sizes of bounding boxes.
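One way to picture this grouping is the sketch below. It assumes each connected component is given as a list of (row, col) pixels and that groups are keyed by exact bounding-box width and height; the text does not say how sizes are binned, so the exact-match key and the function names are assumptions.

```python
from collections import defaultdict

def bounding_box_size(pixels):
    """Return (width, height) of the smallest box enclosing the pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)

def group_by_box_size(components):
    """Group components by bounding-box size (exact match is an assumption)."""
    groups = defaultdict(list)
    for pixels in components:
        groups[bounding_box_size(pixels)].append(pixels)
    return dict(groups)
```

A coarser key, for example sizes rounded to a few pixels, would be a drop-in change to `bounding_box_size` if exact matching turned out to split the training data too finely.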
In one exemplary embodiment of the methods and systems according to this invention, the training data for each group is processed to give the probability of each chain code segment and the probability of each successive pair of chain code segments.
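As a rough illustration, the two kinds of probabilities can be estimated by relative frequency. The sketch assumes each chain code segment is a single symbol in a sequence (for example one direction letter) and that pairs are counted without wrapping from the last symbol back to the first; both are simplifying assumptions, and `segment_probabilities` is a hypothetical name.

```python
from collections import Counter

def segment_probabilities(sequences):
    """Estimate symbol and adjacent-pair probabilities by relative frequency.

    `sequences` is an iterable of strings (or lists) of segment symbols
    belonging to one bounding-box group of one training set.
    Returns (p_single, p_pair): dicts mapping a symbol, or a tuple of two
    successive symbols, to its observed relative frequency.
    """
    singles = Counter()
    pairs = Counter()
    for seq in sequences:
        singles.update(seq)
        # Successive pairs: (seq[0], seq[1]), (seq[1], seq[2]), ...
        pairs.update(zip(seq, seq[1:]))
    total_singles = sum(singles.values())
    total_pairs = sum(pairs.values())
    p_single = {s: n / total_singles for s, n in singles.items()}
    p_pair = {p: n / total_pairs for p, n in pairs.items()}
    return p_single, p_pair
```

Running this once per group and per training set yields one pair of tables for each font and bounding-box size.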
In various exemplary embodiments of the methods and systems according to this invention, the training data includes training sets of various font types.
In various exemplary embodiments of the methods and systems according to this invention, chain code segments for each connected component's boundary in the sample of the captured image data are determined. A chain code is a sequence of north/south/east/west directions taken while traversing the boundary of a connected component.
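The chain code described above can be sketched as a walk along pixel edges. The sketch assumes a component is a list of (row, col) pixels with rows increasing southward, traces only the outer boundary clockwise (holes are ignored), and starts at the top-left corner of the topmost, leftmost pixel; these conventions and the name `chain_code` are illustrative, not taken from the patent.

```python
def chain_code(pixels):
    """Return the outer-boundary chain code of a component as a string of
    N/S/E/W steps, walking clockwise along pixel edges."""
    black = set(pixels)
    # Directed boundary edges, keyed by start vertex and then by heading.
    # Every edge is oriented so its black pixel lies on the right-hand side.
    edges = {}
    for r, c in black:
        sides = [
            ((r - 1, c), (r, c), "E", (r, c + 1)),          # top side
            ((r, c + 1), (r, c + 1), "S", (r + 1, c + 1)),  # right side
            ((r + 1, c), (r + 1, c + 1), "W", (r + 1, c)),  # bottom side
            ((r, c - 1), (r + 1, c), "N", (r, c)),          # left side
        ]
        for neighbour, start, heading, end in sides:
            if neighbour not in black:  # side faces white, so it is boundary
                edges.setdefault(start, {})[heading] = end
    # Left turn for each heading, with south pointing down the page.
    left_of = {"E": "N", "N": "W", "W": "S", "S": "E"}
    start = min(black)  # top-left corner of the topmost, leftmost pixel
    vertex, heading = start, "E"
    code = []
    while True:
        code.append(heading)
        vertex = edges[vertex][heading]
        if vertex == start:
            return "".join(code)
        options = edges[vertex]
        # Where two pixels touch only at a corner there are two exits;
        # turning left keeps diagonally connected pixels on one boundary.
        if left_of[heading] in options:
            heading = left_of[heading]
        else:
            heading = next(iter(options))
```

For a single pixel the walk yields `ESWN`; for two pixels touching at a corner, `[(0, 0), (1, 1)]`, it yields `ESESWNWN`, so both pixels share one boundary as the connected-component definition requires.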
In one exemplary embodiment of the methods and systems according to this invention, for each training set, training data is grouped according to bounding boxes, with each group having a bounding box size.
In various exemplary embodiments of the methods and systems according to this invention, the font type of the captured image data is determined from the determined probabilities of the training data.
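How the probabilities are combined is not spelled out at this point in the text. One plausible reading, shown here only as a sketch, scores each sample sequence as a first-order Markov chain under every font's tables and picks the font with the highest total log-likelihood. The scoring rule, the `floor` value for unseen symbols, and the font names in the usage lines are all assumptions.

```python
import math

def log_likelihood(sequence, p_single, p_pair, floor=1e-6):
    """Score a symbol sequence under one font's probability tables.

    Uses log P(first) plus the sum of log(P(prev, cur) / P(prev)) over
    successive pairs; unseen symbols or pairs fall back to `floor`.
    """
    if not sequence:
        return 0.0
    score = math.log(p_single.get(sequence[0], floor))
    for prev, cur in zip(sequence, sequence[1:]):
        joint = p_pair.get((prev, cur), floor)
        prior = p_single.get(prev, floor)
        score += math.log(joint / prior)
    return score

def most_likely_font(sequences, models):
    """Return the font whose tables give the sample the highest total score.

    `models` maps a font name to a (p_single, p_pair) tuple.
    """
    totals = {
        font: sum(log_likelihood(s, p_single, p_pair) for s in sequences)
        for font, (p_single, p_pair) in models.items()
    }
    return max(totals, key=totals.get)

# Toy usage with two hypothetical fonts and made-up probabilities.
models = {
    "font-a": ({"E": 0.5, "S": 0.5}, {("E", "S"): 0.9, ("S", "E"): 0.1}),
    "font-b": ({"E": 0.5, "S": 0.5}, {("E", "S"): 0.1, ("S", "E"): 0.9}),
}
best = most_likely_font(["ES"], models)
```

Working in log space avoids numeric underflow when many segment probabilities are multiplied together.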
These and other features and advantages of this invention are described in or are apparent from the following detailed description of various exemplary embodiments.
REFERENCES:
patent: 5091976 (1992-02-01), Murayama
patent: 5182777 (1993-01-01), Nakayama et al.
patent: 5245674 (1993-09-01), Cass et al.
patent: 5253307 (1993-10-01), Wayner et al.
patent: 5315668 (1994-05-01), O'Hair
patent: 6327385 (2001-12-01), Kamitani
patent: 6337924 (2002-01-01), Smith
patent: 6552728 (2003-04-01), Moore et al.
English Abstract of JP 58-222384, Dec. 24, 1983.*
G.E. Kopec, Least-squares font metric estimation from images, IEEE Transactions on Image Processing, Oct. 1993, vol 2, iss 4, p 510-519.*
Hochberg et al, Automatic script identification from images using cluster-based templates, Proceedings of the Third International Conference on Document Analysis and Recognition, Aug. 14-16, 1995, vol 1, p 378-381.*
Zramdini et al, A Study of document image degradation effects on font recognition, Proceedings of the Third International Conference on Document Analysis and Recognition, Aug. 14-16, 1995, vol 2, p 740-743.*
Tanprasert et al, Thai type style recognition, Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, Jul. 1999, vol 4, p 336-339.*
La Manna et al, Optical font recognition for multi-font OCR and document processing, Proceedings of the Tenth International Workshop on Database and Expert Systems Applications, Sep. 1-3, 1999, p 549-553.*
Yong Zhu et al, Font recognition based on global texture analysis, Proceedings of the Fifth International Conference on Document Analysis and Recognition, Sep. 20-22, 1999, p 349-352.*
Min-Chul Jung et al, Multifont classification using typographical attributes, Proceedings of the Fifth International Conference on Document Analysis and Recognition, Sep. 20-22, 1999, p 353-356.*
Spitz, “Determination of the Script and Language Content of Document Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Inc., New York, vol. 19, No. 3, Mar. 1, 1997, pp. 235-245.
Kimura et al., “Improvement of Handwritten Japanese Character Recognition Using Weighted Direction Code Histogram”, Pattern Recognition, Pergamon Press, Inc., New York, vol. 30, No. 8, Aug. 1, 1997, pp. 1329-1337.
Hongwei et al., “Font Recognition and Contextual Processing For More Accurate Text Recognition”, Proceedings of the 4th International Conference on Document Analysis and Recognition, Germany, Aug. 1997, Proceedings of the ICDAR, Los Alamitos, IEEE Comp. Soc. US, vol. II, Aug. 18, 1997, pp. 39-44.
Zramdini et al., “Optical Font Recognition Using Typographical Features”, IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Inc., New York, vol. 20, No. 8, Aug. 1, 1998, pp. 877-882.
Atici et al., “A Heuristic Algorithm for Optical Character Recognition of Arabic Script”, Signal Processing European Journal Devoted to the Methods and Applications of Signal Processing, Elsevier Science Publishers B.V., vol. 62, No. 1, Oct. 1, 1997, pp. 87-99.
Bern Marshall W.
Goldberg David
Mehta Bhavesh M.
Oliff & Berridge, PLC
Sukhaphadhana Christopher
Xerox Corporation
Determining the font of text in an image