Method for detecting scene changes in a digital video stream

Television – Image signal processing circuitry specific to television – Motion dependent key signal generation or scene change...

Reexamination Certificate

Details

C348S701000, C348S699000

active

06738100

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to video management systems. More specifically, the invention is directed to a system for automatically processing a video sequence to extract metadata that provides an adequate visual representation of the video.
2. Description of the Related Technology
The management of video data is a critical information management problem. The value of video footage can be effectively utilized only when it can be reused and repurposed in many different contexts. One of the key requirements to effectively access video from a large collection is the ability to retrieve video information by content. Content-based retrieval of video data demands a computer-readable representation of video. This representation of the original video data is called metadata. The metadata includes a representation of the visual, audio and semantic content. In other words, a good representation of a video should effectively capture the look of the video, its sound and its meaning. An effective representation of the video captures the essence of the video in as small a representation as possible. Such representations of the video can be stored in a database. A user trying to access video from a collection can query the database to perform a content-based search of the video collection to locate the specific video asset of interest.
FIG. 1 illustrates a block diagram of a video database system 100. Such a system is described in Designing Video Data Management Systems, Arun Hampapur, University of Michigan, 1995, which is herein incorporated by reference. Video data 102 is input into a Metadata Extraction module 104. The resultant metadata is stored in a database system 106 by use of an insertion interface 108.
The extraction (104) of metadata from the actual video data 102 is a very tedious process called video logging or manual annotation. Typically, this process requires labor averaging eight times the length of the video. What is desired is a system which would automatically process a video so as to extract the metadata from a video sequence of frames that provides a good visual representation of the video.
Some of the terminology used in the description of the invention will now be discussed. This terminology is explained with reference to a set of example images or frames shown in FIG. 2. Image one shows a brown building 120 surrounded by a green lawn 122 with a blue sky 124 as a background. Image two shows a brown car 126 on a green lawn 128 with a blue sky 130 as a background. Let us assume that these two frames are taken from adjacent shots in a video. These two frames can be compared based on several different sets of image properties, such as color properties, distribution of color over the image space, structural properties, and so forth. Since each image property represents only one aspect of the complete image, a system for generating an adequate representation by extracting orthogonal properties from the video is needed. The two images in FIG. 2 would appear similar in terms of their chromatic properties (both have approximately the same amount of blue, green and brown colors) but would differ significantly in terms of their structural properties (the location of edges, how the edges are distributed and connected to each other, and so forth).
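The difference between chromatic and structural similarity can be illustrated with a minimal sketch. The tiny frames, the color labels, and the helper names `color_histogram` and `edge_map` are assumptions made purely for illustration and do not come from the patent; the "edge map" is only a crude marker of horizontal color changes.

```python
from collections import Counter

def color_histogram(image):
    """Count how many pixels of each color label appear in the image."""
    return Counter(pixel for row in image for pixel in row)

def edge_map(image):
    """Mark positions where a pixel differs from its right-hand neighbor."""
    return [[int(row[x] != row[x + 1]) for x in range(len(row) - 1)]
            for row in image]

# Two hypothetical 2x4 frames with the same colors in different layouts.
frame_one = [["blue", "blue", "brown", "brown"],
             ["green", "green", "green", "green"]]
frame_two = [["blue", "brown", "blue", "brown"],
             ["green", "green", "green", "green"]]

# Same amounts of each color, so the chromatic comparison sees no difference.
same_colors = color_histogram(frame_one) == color_histogram(frame_two)
# Color boundaries fall in different places, so the structural comparison differs.
same_edges = edge_map(frame_one) == edge_map(frame_two)
print(same_colors, same_edges)
```

In this toy case the two frames are chromatically identical but structurally different, which is the situation described for the two images of FIG. 2.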
An alternate scenario is where the two images differ in their chromatic properties but are similar in terms of their structural properties. An example of such a scenario occurs when there are two images of the same scene under different lighting conditions. This scenario also occurs when edit effects are introduced during the film or video production process like when a scene fades out to black or fades in from black.
Given any arbitrary video, the process used for generating an adequate visual representation of the video must be able to effectively deal with the situations outlined in the above discussion. The use of digital video editors in the production process is increasing the fraction of frames which are subjected to digital editing effects. Thus an effective approach to generating adequate visual representations of videos is desired that uses both chromatic and structural measurements from the video.
Several prior attempts at providing an adequate visual representation of the visual content of a video have been made: Arun Hampapur, Designing Video Data Management Systems, The University of Michigan, 1995; Behzad Shahraray, Method and apparatus for detecting abrupt and gradual scene changes in image sequences, AT&T Corp, 32 Avenue of the Americas, New York, N.Y. 10013-2412, 1994, European Patent Application number 066327 A2; Hong Jiang Zhang, Stephen W Smoliar and Jian Hu Wu, A system for locating automatically video segment boundaries and for extracting key-frames, Institute of System Science, Kent Ridge, Singapore 0511, 1995, European Patent Application number 0 690413 A2; and Akio Nagasaka and Yuzuru Tanaka, “Automatic Video Indexing and Full-Video Search for Object Appearances”, Proceedings of the 2nd Working Conference on Visual Database Systems, pp. 119-133, 1991. Most existing techniques have focused on detecting abrupt and gradual scene transitions in video. However, the more essential problem to be solved is deriving an adequate visual representation of the visual content of the video.
Most of the existing scene transition detection techniques, including Shahraray and Zhang et al., use the following measurements for gradual and abrupt scene transitions: 1) Intensity based difference measurements wherein the difference between two frames from the video which are separated by some time interval “T”, is extracted. Typically, the difference measures include pixel difference measures, gray level global histogram measures, local pixel and histogram difference measures, color histogram measures, and so forth. 2) Thresholding of difference measurements wherein the difference measures are thresholded using either a single threshold or multiple thresholds.
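As a rough illustration of this family of techniques, the sketch below computes a gray-level global histogram difference between frames separated by an interval and applies a single threshold. It is not the method of Shahraray or Zhang et al.; the function names, the bin count, the normalization, and the sample frames are all assumptions chosen for the example, and frames are assumed to have equal size.

```python
def gray_histogram(frame, bins=4, max_level=256):
    """Global gray-level histogram of a frame given as rows of 0..255 values."""
    hist = [0] * bins
    for row in frame:
        for value in row:
            hist[value * bins // max_level] += 1
    return hist

def histogram_difference(frame_a, frame_b, bins=4):
    """Sum of absolute bin differences, normalized to the range 0..1."""
    hist_a = gray_histogram(frame_a, bins)
    hist_b = gray_histogram(frame_b, bins)
    total = sum(hist_a)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b)) / (2 * total)

def detect_transitions(frames, interval=1, threshold=0.5):
    """Indices i where frames[i - interval] and frames[i] differ by more than threshold."""
    return [i for i in range(interval, len(frames))
            if histogram_difference(frames[i - interval], frames[i]) > threshold]

# Hypothetical 2x2 frames: two dark frames followed by two bright frames.
dark = [[10, 20], [30, 40]]
bright = [[200, 210], [220, 230]]
frames = [dark, dark, bright, bright]
print(detect_transitions(frames))
```

The result depends directly on `threshold`: lowering it flags more frame pairs as transitions, and raising it flags fewer.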
However, to generate an adequate visual representation of the visual content of the video, a system is needed wherein the efficacy of the existing techniques is not critically dependent on the threshold or decision criteria used to declare a scene break or scene transition. Using existing techniques, a low value of the threshold would result in an oversampled representation of the video, whereas a higher value would result in the loss of information. What is needed is a system wherein the choice of the decision criteria is a non-critical factor.
SUMMARY OF THE INVENTION
One embodiment of the present invention includes a computer-based system for identifying keyframes or a visual representation of a video by use of a two stage measurement process. Frames from a user-selected video segment or sequence are processed to identify the keyframes. The first stage preferably includes a chromatic difference measurement to identify a potential set of keyframes. To be considered a potential frame, the measurement result must exceed a user-selectable chromatic threshold. The potential set of keyframes is then passed to the second stage which preferably includes a structural difference measurement. If the result of the structural difference measurement then exceeds a user-selectable structural threshold, the current frame is identified as a keyframe. The two stage process is then repeated to identify additional keyframes until the end of the video. If a particular frame does not exceed either the first or second threshold, the next frame, after a user-selectable time delta, is processed.
The first stage is preferably computationally cheaper than the second stage. The second stage is more discriminatory since it preferably operates on a smaller set of frames. The keyframing system is extensible to additional stages or measurements as necessary.
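The control flow of the two stage process might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the text above does not say which frame the measurements are taken against, so the sketch assumes each frame is compared with the most recent keyframe and that the first frame seeds the process. The name `select_keyframes`, the toy measurement functions, and the sample frames are hypothetical.

```python
def select_keyframes(frames, chromatic_diff, structural_diff,
                     chromatic_threshold, structural_threshold, delta=1):
    """Two stage selection: chromatic test first, structural test second.

    Assumption: each frame is compared against the most recent keyframe,
    and the first frame is taken as the initial keyframe.
    """
    if not frames:
        return []
    keyframes = [0]
    reference = frames[0]
    for index in range(delta, len(frames), delta):
        current = frames[index]
        # Stage 1: the cheaper chromatic measurement filters out most frames.
        if chromatic_diff(reference, current) <= chromatic_threshold:
            continue
        # Stage 2: the structural measurement runs only on surviving candidates.
        if structural_diff(reference, current) <= structural_threshold:
            continue
        keyframes.append(index)
        reference = current
    return keyframes

# Hypothetical frames summarized as (mean_color, edge_count) pairs.
frames = [(10, 5), (12, 5), (90, 5), (95, 40), (96, 41)]

def chroma(a, b):
    """Toy chromatic measure: difference in mean color."""
    return abs(a[0] - b[0])

def structure(a, b):
    """Toy structural measure: difference in edge count."""
    return abs(a[1] - b[1])

print(select_keyframes(frames, chroma, structure, 20, 10))
```

In this toy run, frame 2 passes the chromatic test but fails the structural one, so only frame 3 is added after the initial frame. Passing the measurements in as functions reflects the statement that the system is extensible to additional stages or measurements.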
In one aspect of the invention, there is a method for detecting scene changes in a digital video data stream displayed upon a monitor coupled to a comput
