Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
1997-05-27
2001-06-05
Kelley, Chris (Department: 2613)
active
06243419
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a scheme for detecting captions in video data, and more particularly, to a scheme for detecting captions in coded video data as well as a video retrieval, a video content indication display, and a video display based on the coded video caption detection.
2. Description of the Background Art
As a method for extracting information indicative of the video content from the video, for the purpose of carrying out processing based on the video content such as video retrieval or video editing, a method for extracting caption regions from the video has been known. Here, the captions generally include texts, photographs, symbols, patterns, markings, icons, etc., which are made to appear in the video by using a technique such as superimposition, and the caption region is a pixel or a set of pixels which contains such a caption.
The conventionally known methods for automatically extracting caption regions from the video include a method which utilizes the property that the caption region has a relatively high intensity compared with the background region so that its edge can be easily detected (see R. Lienhart et al.: “Automatic text recognition in digital videos”, Image and Video Processing IV, Proc. SPIE 2660-20, January 1996, for example), and a method which utilizes the fact that the caption region has large intensity differences at its periphery (see M. A. Smith et al.: “Video Skimming for Quick Browsing based on Audio and Image Characterization”, Carnegie Mellon University, Technical Report CMU-CS-95-186, July 1995, for example).
In Lienhart et al., the frame image is segmented by the split and merge algorithm, and a caption region is detected according to a size of a region and its motion between frames. In this method, the segmentation utilizes the fact that the caption has a uniform pixel value so that the caption and the background are effectively separated according to a difference in intensities.
In Smith et al., a caption region is detected by obtaining and smoothing an edge of the image. This method utilizes the fact that the caption has a relatively high contrast compared with the background so that the edge of the caption becomes sharp.
As a modification of the latter type of the conventionally known method, there is also a proposition for improving the precision of the caption extraction by averaging several frames that contain the caption so as to emphasize the caption while reducing an influence of background fluctuations.
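The frame averaging idea described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as lists of rows of equal size; the function name average_frames and the sample values are hypothetical and are not taken from the cited methods.

```python
def average_frames(frames):
    """Pixel-wise mean of equally sized grayscale frames (lists of rows)."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    count = len(frames)
    return [
        [sum(frame[y][x] for frame in frames) / count for x in range(width)]
        for y in range(height)
    ]


# Hypothetical data: the middle pixel is a stationary bright caption pixel,
# while the outer pixels belong to a fluctuating background.
frames = [
    [[10, 200, 90]],
    [[90, 200, 10]],
    [[50, 200, 50]],
]
averaged = average_frames(frames)
```

In this sketch the stationary caption pixel keeps its value after averaging, while the background fluctuations are pulled toward their mean, which is the effect the averaging is intended to produce.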
Now, in order to extract caption regions from the coded video which is coded by utilizing the inter-frame correlation, if any of the conventionally known methods as described above is to be used, it would be necessary to decode the coded video completely once so as to restore the original frame images, and then carry out the extraction processing as described above with respect to the restored original frame images. However, this provision requires the image decoding processing in addition to the caption region extraction processing, so that the processing cost would be high and the high speed caption region extraction would be difficult.
In addition, in a case of applying the above described method for averaging a plurality of frames to the coded video, it is necessary to carry out the averaging after a plurality of frame images are all decoded, so that the processing cost would be even higher.
Now, the conventional methods for detecting captions from the video have been based on local characteristics obtained from one to several frame images.
For instance, there is a conventional method which utilizes the fact that the caption region has large intensity differences on its edge, in which the caption is detected by finding a frame in which the caption appears, and taking differences of intensity and color with respect to frames before and after the caption appearance.
Also, there is a conventional method which utilizes the property that the caption region has a relatively high intensity compared with the background region so that its edge can be easily detected, in which the caption is detected by using the edge detection based on the first order derivative of the image and the projections of the edge image into vertical and horizontal directions.
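As a rough illustration of this kind of processing, the following sketch marks edge pixels using first-order differences and then projects the edge image onto the vertical and horizontal axes. The difference operator, the threshold, and the names edge_map and projections are assumptions made for illustration only; the conventional method may use a different derivative filter.

```python
def edge_map(image, threshold):
    """Mark pixels whose horizontal or vertical first difference exceeds threshold."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Forward differences; treated as zero at the right and bottom borders.
            dx = abs(image[y][x + 1] - image[y][x]) if x + 1 < width else 0
            dy = abs(image[y + 1][x] - image[y][x]) if y + 1 < height else 0
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


def projections(edges):
    """Return (edge count per row, edge count per column)."""
    row_profile = [sum(row) for row in edges]
    col_profile = [sum(col) for col in zip(*edges)]
    return row_profile, col_profile


# Hypothetical image: a short bright stroke on a dark background.
image = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
edges = edge_map(image, threshold=100)
row_profile, col_profile = projections(edges)
```

Rows and columns with high counts in the two profiles would then be candidates for the extent of a caption region.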
Also, there is a conventional method which utilizes the fact that the caption is stationary and has a high intensity, in which a text portion is detected by obtaining a portion which has no motion between two frames and an intensity greater than or equal to a prescribed value (see Japanese Patent Application No. 8-331456 (1996)).
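A minimal sketch of such a two-frame test is given below. It assumes that "no motion" is approximated by a small absolute pixel difference between the two frames; the tolerance motion_tol, the function name stationary_bright_mask, and the sample values are hypothetical and are not taken from the cited application.

```python
def stationary_bright_mask(prev_frame, curr_frame, motion_tol, min_intensity):
    """Return a 0/1 mask of pixels that are both stationary and bright."""
    return [
        [
            1 if abs(curr - prev) <= motion_tol and curr >= min_intensity else 0
            for prev, curr in zip(prev_row, curr_row)
        ]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Hypothetical pair of consecutive grayscale frames.
prev_frame = [[250, 40, 240], [30, 250, 35]]
curr_frame = [[250, 90, 120], [30, 248, 36]]
mask = stationary_bright_mask(prev_frame, curr_frame, motion_tol=4, min_intensity=200)
```

Because the decision uses only two frames, any stationary bright object passes the same test as a caption.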
As such, the conventional methods for detecting captions from the video utilize time-wise localized information such as one or two frame images. For this reason, these conventional methods have been associated with a problem that an imaged object other than the caption which has characteristics similar to those of the caption, such as being stationary, having a high intensity, and having large high frequency components, could be erroneously detected as the caption.
On the other hand, there has also been a problem that a caption which appears on the video for a long period of time would not be correctly detected as the caption when there is a temporal movement or a contour blurring due to an influence of image degradation, noises, etc. As a consequence, there have been cases in which a single continuous caption is erroneously detected multiple times as different captions over a plurality of time sections.
In other words, the conventional methods judge the existence of the caption according to a certain short time section, so that it is difficult to avoid an erroneous detection of an imaged object other than the caption or an erroneous overlooking of the caption due to noises. Consequently, when any of the conventional methods is used for the purpose of obtaining a list of captions from the video, there are cases in which an imaged object other than the caption is erroneously displayed or a single caption is displayed more than once as duplicates.
Now, in conjunction with increasing activities in video distribution such as television broadcasting, digital satellite broadcasting, laser disks, digital video disks, and video-on-demand, there are increasing demands for flexible handling of video data. To this end, there have been propositions of techniques which attach various kinds of contents or index information to the video so as to enable the retrieval of and/or the random access to the video. As information which characterizes the video, the captions, which generally include texts, photographs, symbols, patterns, markings, icons, etc., are important as they reflect the meanings or the contents of the video. For this reason, there have been propositions of methods for automatically detecting captions from the video.
For example, there is a conventional method disclosed in Japanese Patent Application No. 8-331456 (1996) mentioned above, which utilizes the fact that the caption is stationary and has a high intensity, in which a text portion is detected by obtaining a portion which has no motion between two frames and an intensity greater than or equal to a prescribed value.
Also, there is a conventional method which utilizes the property that the caption has a sharp edge and a high intensity, in which a text portion is detected by obtaining a block for which both the edge sharpness and the intensity of the frame image are greater than prescribed thresholds (see Japanese Patent Application No. 8-212231 (1996)).
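The block-wise test can be sketched as follows, under stated assumptions: edge sharpness is taken here as the largest horizontal difference between neighbouring pixels inside a block, and intensity as the block mean. Both measures, the block size, and the name caption_blocks are hypothetical choices for illustration; the cited application may define them differently.

```python
def caption_blocks(image, block, sharp_thr, intensity_thr):
    """Return (row, column) origins of blocks exceeding both thresholds.

    Assumes block >= 2 and ignores partial blocks at the image border.
    """
    height, width = len(image), len(image[0])
    hits = []
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            rows = [image[y][bx:bx + block] for y in range(by, by + block)]
            mean = sum(map(sum, rows)) / (block * block)
            # Assumed sharpness measure: largest horizontal neighbour difference.
            sharp = max(
                abs(row[i + 1] - row[i]) for row in rows for i in range(block - 1)
            )
            if sharp > sharp_thr and mean > intensity_thr:
                hits.append((by, bx))
    return hits


# Hypothetical 2x4 image: a flat dark block on the left, a bright block
# with a sharp transition on the right.
image = [
    [20, 20, 255, 0],
    [20, 20, 255, 255],
]
hits = caption_blocks(image, block=2, sharp_thr=100, intensity_thr=150)
```

Only the right-hand block satisfies both conditions in this example, so it alone is reported as a candidate text block.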
As such, the conventional methods for detecting captions from the video detect the caption by utilizing a property of the caption itself, such as its edge sharpness or its intensity, so that there has been a problem that the ability to detect a switching point between captions has been low.
For instance, in Japanese Patent Application No. 8-212231 mentioned above, the frame image is segmented into blocks
Akutsu Akihito
Hamada Hiroshi
Niikura Yasuhiro
Satou Takashi
Taniguchi Yukinobu
Banner & Witcoff , Ltd.
Kelley Chris
Nippon Telegraph and Telephone Corporation
Philippe Gims