Image analysis – Image segmentation – Distinguishing text from other regions
Reexamination Certificate
1999-11-17
2003-09-02
Jones, Andrew W. (Department: 2621)
C382S180000, C382S173000, C382S224000, C382S264000
active
06614930
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to systems that recognize patterns in digitized images and more particularly to such systems that isolate symbols such as text characters in video data streams.
Real-time broadcast, analog tape, and digital video are important for education, entertainment, and a host of multimedia applications. With video collections running to millions of hours, technology is needed to interpret video data so that this material can be used and accessed more effectively. Various such enhanced uses have been proposed. For example, the use of text and sound recognition can lead to the creation of a synopsis of an original video and the automatic generation of keys for indexing video content. Another range of applications relies on rapid real-time classification of text and/or other symbols in broadcast (or multicast, etc.) video data streams. Text recognition in such streams can serve any suitable purpose, for example video content indexing.
Various text recognition techniques have been used to recognize digitized patterns. The most common example is document optical character recognition (OCR). The general model for all of these techniques is that an input vector is derived from an image, the input vector characterizing the raw pattern. The vector is mapped to one of a fixed number or range of symbol classes to “recognize” the image. For example, the pixel values of a bitmap image may serve as an input vector and the corresponding classification set may be an alphabet, for example, the English alphabet. No particular technique for pattern recognition has achieved universal dominance. Each recognition problem has its own set of application difficulties: the size of the classification set, the size of the input vector, the required speed and accuracy, and other issues. Reliability, moreover, needs improvement in nearly every area of application.
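The general model described above (an input vector derived from a bitmap, mapped to one of a fixed set of symbol classes) can be illustrated with a minimal sketch. The 3x3 glyph templates, the three-letter class set, and the use of Hamming distance as the matching rule are all assumptions chosen for illustration; they are not the method of any system discussed here.

```python
# Hypothetical 3x3 binary templates, flattened row by row into vectors.
# The glyph shapes and the class set are illustrative assumptions.
TEMPLATES = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
    "T": (1, 1, 1,
          0, 1, 0,
          0, 1, 0),
}


def hamming_distance(a, b):
    """Count the positions at which two equal-length pixel vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify(input_vector, templates=TEMPLATES):
    """Map an input vector to the symbol class with the nearest template."""
    return min(
        templates,
        key=lambda label: hamming_distance(input_vector, templates[label]),
    )


# An "L" with one corrupted pixel (bottom-right corner flipped to 0).
noisy_l = (1, 0, 0,
           1, 0, 0,
           1, 1, 0)
print(classify(noisy_l))
```

Even this toy case shows the trade-offs named in the text: the cost of a comparison grows with the size of the input vector and of the classification set, and a few corrupted pixels can move a pattern closer to the wrong class.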
As a result of the foregoing shortcomings, pattern recognition is a field of continuous active research, the various applications receiving varying degrees of attention based on their respective perceived merits, such as utility and practicability. Probably the most mature of these technologies is the application of pattern recognition to text characters, or optical character recognition (OCR). This technology has developed because of the desirability and practicality of converting printed subject matter to computer-readable characters. From a practicality standpoint, printed documents offer a data source that is relatively clear and consistent. Such documents are generally characterized by high-contrast patterns set against a uniform background and are storable with high resolution. For example, printed documents may be scanned at arbitrary resolution to form a binary image of the printed characters. Also, there is a clear need for such an application of pattern recognition in that the conversion of documents to computer-based text avoids the labor of keyboard transcription, realizes economy in data storage, permits documents to be searched, etc.
Some application areas have received scant attention because of the attending difficulty of performing symbol or character classification. For example, the recognition of patterns in video streams is an area that is difficult due to at least the following factors. Characters in a video stream tend to be presented against spatially non-uniform (sometimes, temporally variable) backgrounds, with poor resolution, and low contrast. Recognizing characters in a video stream is therefore difficult and no reliable methods are known. In addition, for some applications, as disclosed in the foregoing related applications at least, fast recognition speeds are highly desirable.
Systems and methods for indexing and classifying video have been described in numerous publications, including: M. Abdel-Mottaleb et al., “CONIVAS: Content-based Image and Video Access System,” Proceedings of ACM Multimedia, pp. 427-428, Boston (1996); S-F. Chang et al., “VideoQ: An Automated Content Based Video Search System Using Visual Cues,” Proceedings of ACM Multimedia, pp. 313-324, Seattle (1994); M. Christel et al., “Informedia Digital Video Library,” Comm. of the ACM, Vol. 38, No. 4, pp. 57-58 (1995); N. Dimitrova et al., “Video Content Management in Consumer Devices,” IEEE Transactions on Knowledge and Data Engineering (November 1998); U. Gargi et al., “Indexing Text Events in Digital Video Databases,” International Conference on Pattern Recognition, Brisbane, pp. 916-918 (August 1998); M. K. Mandal et al., “Image Indexing Using Moments and Wavelets,” IEEE Transactions on Consumer Electronics, Vol. 42, No. 3 (August 1996); and S. Pfeiffer et al., “Abstracting Digital Movies Automatically,” Journal on Visual Communications and Image Representation, Vol. 7, No. 4, pp. 345-353 (1996).
The extraction of characters by a method that uses local thresholding and the detection of image regions containing characters by evaluating gray-level differences between adjacent regions has been described in “Recognizing Characters in Scene Images,” Ohya et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, pp. 214-224 (February 1994). Ohya et al. further discloses the merging of detected regions having close proximity and similar gray levels in order to generate character pattern candidates.
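To make the idea of local thresholding concrete, the following is a minimal sketch, not the procedure of Ohya et al. It assumes a simple rule: a pixel is marked as foreground when it exceeds the mean of its surrounding window by a fixed offset. The function name `local_threshold`, the `radius` and `offset` parameters, and the sample image are hypothetical.

```python
def local_threshold(image, radius=1, offset=0):
    """Binarize a grayscale image (list of rows) against local window means.

    A pixel becomes 1 when it is brighter than the mean of the
    (2*radius+1)-square window around it by more than `offset`;
    windows are clipped at the image border.
    """
    height, width = len(image), len(image[0])
    result = []
    for r in range(height):
        row = []
        for c in range(width):
            r0, r1 = max(0, r - radius), min(height, r + radius + 1)
            c0, c1 = max(0, c - radius), min(width, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            local_mean = sum(window) / len(window)
            row.append(1 if image[r][c] > local_mean + offset else 0)
        result.append(row)
    return result


# Background brightens from left to right (0..100); two "stroke" pixels sit
# 60 gray levels above their local background. The left stroke (80) is darker
# than the right-hand background (100), so no single global threshold
# separates both strokes from the background.
image = [
    [0, 20, 40, 60, 80, 100],
    [0, 80, 40, 60, 140, 100],
    [0, 20, 40, 60, 80, 100],
]
binary = local_threshold(image, radius=1, offset=10)
for row in binary:
    print(row)
```

In this sample only the two stroke pixels are marked, which is the behavior that makes local rules attractive for characters set against non-uniform backgrounds.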
Using the spatial context and high contrast characteristics of video text to merge regions with horizontal and vertical edges in close proximity to one another in order to detect text has been described in “Text, Speech, and Vision for Video Segmentation: The Informedia Project,” by A. Hauptmann et al., AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision (1995). R. Lienhart and F. Suber discuss a non-linear color system for reducing the number of colors in a video image in “Automatic Text Recognition for Video Indexing,” SPIE Conference on Image and Video Processing (January 1996). The reference describes a split-and-merge process to produce homogeneous segments having similar color. Lienhart and Suber use various heuristic methods to detect characters in homogeneous regions, including foreground characters, monochrome or rigid characters, size-restricted characters, and characters having high contrast in comparison to surrounding regions.
The use of multi-valued image decomposition for locating text and separating images into multiple real foreground and background images is described in “Automatic Text Location in Images and Video Frames,” by A. K. Jain and B. Yu, Proceedings of IEEE Pattern Recognition, pp. 2055-2076, Vol. 31 (Nov. 12, 1998). J-C. Shim et al. describe using a generalized region-labeling algorithm to find homogeneous regions and to segment and extract text in “Automatic Text Extraction from Video for Content-Based Annotation and Retrieval,” Proceedings of the International Conference on Pattern Recognition, pp. 618-620 (1998). Identified foreground images are clustered in order to determine the color and location of text.
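Finding homogeneous regions by labeling can be sketched as follows. This is plain 4-connected component labeling over pixels of equal value, offered only as an illustration; it is not the generalized region-labeling algorithm of Shim et al., and the function name `label_regions` and the sample image are hypothetical.

```python
from collections import deque


def label_regions(image):
    """Label 4-connected regions of equal pixel value.

    Returns (labels, count): `labels` has the same shape as `image`, with
    region numbers starting at 1 in raster-scan order of first appearance.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if labels[r][c]:
                continue  # already assigned to an earlier region
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the new region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# A small two-level image: one background region and two separate
# foreground regions that share the same pixel value.
image = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
labels, count = label_regions(image)
for row in labels:
    print(row)
print(count)
```

The two foreground regions receive different labels even though their pixel values match, so later stages can measure each region's size, position, and color separately.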
Other useful algorithms for image segmentation are described by K. V. Mardia et al. in “A Spatial Thresholding Method for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 10, pp. 919-927 (1988), and by A. Perez et al. in “An Iterative Thresholding Method for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 9, pp. 742-751 (1987).
Various techniques for locating text in a digitized bitmap are known. Also known are techniques for binarizing character data to form an image that can be characterized as black-on-white and for performing character recognition on bitmap images. Text and other patterns in video streams range from the predictable, large, and clear, which are easy to classify, to the crude, fleeting, and unpredictably oriented and positioned, which contain insufficient information, even in principle, to classify without assistance from auxiliary contextual data. There is also on-going research to incre
Agnihotri Lalitha
Dimitrova Nevenka
Elenbaas Jan Herman
Alavi Amir
Goodman Edward W.
Jones Andrew W.
Koninklijke Philips Electronics, N.V.
Video stream classifiable symbol isolation method and system