Image analysis – Pattern recognition – Classification
Reexamination Certificate
1998-10-13
2002-05-14
Chen, Wenpeng (Department: 2624)
C382S232000, C348S700000, C345S215000
Reexamination Certificate
active
06389168
ABSTRACT:
TECHNICAL FIELD
The present invention relates generally to processing digital video information and, more specifically, to parsing and indexing compressed video streams.
DESCRIPTION OF THE RELATED ART
Digitized video provides significant improvements over analog video media with regard to video image manipulation. Digital video data can be readily compressed and decompressed, thereby enabling efficient transmission between remote sites. Efficient compression and decompression also enhance performance in the storage and retrieval of video data. As computer networks improve and video libraries become more accessible to homes and offices via the Internet, the importance of sufficient bandwidth to support video transmission and efficient methods for video indexing, retrieval, and browsing becomes more acute. However, the effective utilization of video resources is hampered by sometimes inadequate organization of video information and a need for further improvements in the performance of retrieval systems for video information.
The time-dependent nature of video makes it a uniquely challenging medium to process. Several compression standards have been developed and implemented within the last two decades for video compression, including MPEG-1 and MPEG-2. Numerous techniques for video indexing and retrieval have been developed within the parameters defined by MPEG-1 and MPEG-2. U.S. Pat. No. 5,719,643 to Nakajima describes a method for detecting scene cuts in which an input image and a reference image are entered into an image processing unit and both are converted to contracted images. The contracted input image is compared to the contracted reference image to determine an interframe difference in luminance signals of the input and reference frames and temporal changes between the input and reference frames. Based on the comparison, a determination is made as to whether the input frame is a cut frame, a non-cut frame, or a cut-frame candidate.
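The contracted-image comparison described above can be illustrated with a minimal sketch. It is not the method of the cited patent. It assumes that contraction is done by block averaging, that the interframe difference is a mean absolute luminance difference, and that two hypothetical thresholds (`low`, `high`) separate the three outcomes. The temporal-change test is omitted, and all function names are illustrative.

```python
from typing import List

Image = List[List[float]]  # luminance values, one inner list per row


def contract(image: Image, factor: int) -> Image:
    """Shrink an image by averaging factor x factor blocks.

    Assumes both dimensions are divisible by `factor`.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mean_abs_difference(a: Image, b: Image) -> float:
    """Mean absolute luminance difference between two same-sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def classify_frame(input_image: Image, reference_image: Image,
                   factor: int = 2, low: float = 10.0,
                   high: float = 40.0) -> str:
    """Label the input frame as 'cut', 'candidate', or 'non-cut'.

    The two thresholds are assumed values chosen for illustration only.
    """
    diff = mean_abs_difference(contract(input_image, factor),
                               contract(reference_image, factor))
    if diff >= high:
        return "cut"
    if diff >= low:
        return "candidate"
    return "non-cut"
```

Comparing contracted images rather than full frames reduces the number of values compared by the square of the contraction factor, at the cost of ignoring fine detail.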
It is also known in the art to select key frames from video sequences in order to use the selected frames as representative frames to convey the content of the video sequences from which they are chosen. The key frames are extracted from the video sequences in a manner which is similar to the determination of scene cuts, otherwise known as shot boundaries. A reference frame is compared to an input frame to determine whether the two frames are sufficiently different that a preselected difference threshold has been exceeded. Key frames can be used to enable users of retrieval systems to efficiently browse an entire video sequence by viewing only key frames. Key frames can also be utilized in video retrieval so that only key frames of a video sequence will be searched instead of searching all frames within a video sequence.
The current methods for detecting shot boundaries, extracting key frames, and video retrieval all rely on dissimilarities or similarities between video frames. However, reliance on global descriptions of video frames does not always provide the desired precision in video indexing and retrieval. For example, users of a video retrieval system might have particular subject matter within a video frame which they desire to retrieve without knowledge of any background information which might accompany the subject matter in any particular shot. Utilizing the current video retrieval methods, which rely on global descriptions of video frames, users might well be unable to locate a video shot containing the desired subject matter.
What is needed is a method and system which enables efficient indexing, retrieval, and browsing of compressed video at a higher level of detail than is available in the current art.
SUMMARY OF THE INVENTION
A method for parsing, indexing and retrieving compressed video data includes indexing video frames within a compressed video stream based on a comparison of video objects within frames of the compressed video stream. A first configuration of video objects in a first video frame and a second configuration of video objects in a second video frame are identified, wherein each first frame video object and each second frame video object has an associated quantitative attribute. A comparison of a first quantitative attribute associated with a first frame video object to a second quantitative attribute associated with a second frame video object is performed to ascertain whether a difference between the first and second quantitative attributes exceeds a predetermined threshold. If the predetermined threshold is exceeded, a video frame is selected from a sequence of video frames bounded by the first and second video frames, and the selected frame is used for indexing purposes.
In a preferred embodiment, the method is performed to identify shot boundaries and key instances of video objects, extract key video frames, detect camera operations, detect special effects video editing, and to enable video retrieval and browsing.
The quantitative attributes of video objects relate to at least one of size, shape, motion, or texture. Shot boundaries are detected within a video sequence by selecting the first video frame, which might be an initial video frame in a sequence, and the second video frame such that the first video frame is separated by some predetermined number of video frames from the second video frame. First quantitative attributes associated with the first frame video objects are calculated and compared to second quantitative attributes associated with second frame video objects to determine a quantitative attribute differential between the first and second frames. Alternatively, a quantity of first frame video objects is calculated and compared to a quantity of second frame video objects to determine a video object quantity differential between the first and second video frames. In a first embodiment, the quantitative attribute differential is compared to a shot boundary threshold. If the quantitative attribute differential exceeds the shot boundary threshold, a shot boundary is indexed in the video sequence bounded by first and second video frames. In a second embodiment, the video object quantity differential is compared to a shot boundary threshold and, if the threshold is exceeded, a shot boundary is indexed. This process is repeated by selecting subsequent first video frames and subsequent second video frames for shot boundary analysis to identify additional shot boundaries in the video sequence.
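The two shot boundary embodiments can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation. Each frame is assumed to be a mapping from an object label to a tuple of attribute values. The attribute differential is assumed to be a sum of absolute differences, with an object missing from one frame treated as all zeros. The boundary is recorded at the index of the second frame as a simplification. The names `Frame`, `attribute_differential`, `quantity_differential`, and `find_shot_boundaries` are hypothetical.

```python
from typing import Dict, List, Tuple

Attributes = Tuple[float, ...]   # e.g. (size, shape, motion, texture)
Frame = Dict[str, Attributes]    # object label -> attribute values


def attribute_differential(first: Frame, second: Frame) -> float:
    """Sum of absolute attribute differences over all objects.

    Assumption: an object absent from one frame counts as all zeros there.
    """
    total = 0.0
    for label in first.keys() | second.keys():
        a = first.get(label)
        b = second.get(label)
        if a is None:
            a = (0.0,) * len(b)
        if b is None:
            b = (0.0,) * len(a)
        total += sum(abs(x - y) for x, y in zip(a, b))
    return total


def quantity_differential(first: Frame, second: Frame) -> int:
    """Absolute difference in the number of objects in the two frames."""
    return abs(len(first) - len(second))


def find_shot_boundaries(frames: List[Frame], gap: int, threshold: float,
                         use_quantity: bool = False) -> List[int]:
    """Compare each frame with the frame `gap` positions later.

    Returns the indices of second frames whose differential exceeds
    the threshold (a simplification of where the boundary is indexed).
    """
    boundaries = []
    for i in range(len(frames) - gap):
        first, second = frames[i], frames[i + gap]
        if use_quantity:
            diff = quantity_differential(first, second)
        else:
            diff = attribute_differential(first, second)
        if diff > threshold:
            boundaries.append(i + gap)
    return boundaries


# Toy sequence: one object in frames 0-1, two different objects in frames 2-3.
frames = [
    {"a": (10.0, 1.0)},
    {"a": (10.5, 1.0)},
    {"b": (40.0, 2.0), "c": (5.0, 1.0)},
    {"b": (41.0, 2.0), "c": (5.0, 1.0)},
]
attribute_cuts = find_shot_boundaries(frames, gap=1, threshold=5.0)
quantity_cuts = find_shot_boundaries(frames, gap=1, threshold=0.5,
                                     use_quantity=True)
```

In this toy sequence both embodiments flag the same transition, because the change of objects between frames 1 and 2 alters both the attribute values and the object count.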
Within the video shots defined by the shot boundaries, key instances of objects, key frames, camera operations, and special effects video edits are identified. Key instances of video objects are selected within the shot boundaries by calculating first quantitative attributes of a first instance of a video object in a first frame and second quantitative attributes of a second instance of the video object in a second frame and calculating a quantitative attribute differential between the first and second instances of the video object. The quantitative attribute differential is compared to a key instance threshold and, if the differential exceeds the threshold, a key instance of the object is selected from the video sequence bounded by the first and second video frames. The calculation of the quantitative attribute differential captures a wide variety of instance-to-instance transitions which can trigger selections of key instances of video objects. For example, a sequence in which the camera zooms in on the video object rapidly results in a size differential between first and second instances of the video object, which alone is sufficient to exceed the threshold. Alternatively, a combination of changes in quantitative attributes for a video object, such as size and shape, might exceed the threshold, even though none of the quantitative attribute changes in isolation would be sufficient to exceed the threshold.
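The difference between a single-attribute trigger and a combined trigger can be shown with a small sketch. It assumes the differential is a weighted sum of absolute attribute changes. The weights, the threshold value, and the function names are hypothetical and chosen only to make the two cases visible.

```python
from typing import Dict


def instance_differential(first: Dict[str, float], second: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Weighted sum of absolute changes in each named attribute."""
    return sum(w * abs(second[name] - first[name])
               for name, w in weights.items())


def is_key_instance(first: Dict[str, float], second: Dict[str, float],
                    weights: Dict[str, float], threshold: float) -> bool:
    """True when the combined differential exceeds the key instance threshold."""
    return instance_differential(first, second, weights) > threshold


weights = {"size": 1.0, "shape": 1.0}   # assumed equal weighting
threshold = 1.0                          # assumed key instance threshold

first = {"size": 1.0, "shape": 0.25}
zoomed = {"size": 2.5, "shape": 0.25}    # size change alone exceeds threshold
combined = {"size": 1.75, "shape": 1.0}  # each change is 0.75, below threshold
steady = {"size": 1.25, "shape": 0.25}   # small change, no key instance

zoom_selected = is_key_instance(first, zoomed, weights, threshold)
combined_selected = is_key_instance(first, combined, weights, threshold)
steady_selected = is_key_instance(first, steady, weights, threshold)
```

In the `combined` case neither the size change nor the shape change exceeds the threshold on its own, but their sum does, which matches the second scenario described above.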
Key frames are extracted from the various shots of the video sequence defined by shot boundaries by calcula
Altunbasak Yucel
Zhang Hong-Jiang
Chen Wenpeng
Hewlett-Packard Company