Image analysis – Applications – Motion or velocity measuring
Reexamination Certificate
1999-11-30
2003-04-15
Boudreau, Leo (Department: 2621)
C382S171000, C382S173000, C348S207110, C348S231900, C348S422100, C348S700000, C348S571000, 36, C345S215000, C358S906000, C358S909100
active
06549643
BACKGROUND
1. Technical Field
The present invention relates generally to digital video processing and analysis and, more specifically, to a system and method for selecting key-frames from video data based on quantifiable measures such as the amount of motion and/or color activity of the video data.
2. Description of Related Art
The use of digital video in multimedia systems is becoming increasingly popular, and videos play a growing role in both education and commerce. In addition to emerging services such as video-on-demand and pay television, a variety of new information services, such as digital catalogues and interactive multimedia documents combining text, audio and video, are being developed.
Some conventional digital video applications, however, rely on time-consuming fast-forward or rewind operations to search, retrieve and obtain a quick overview of the video content. Methods are therefore continuously being developed that present the visual information in compact form, so that the operator can quickly browse a video clip, retrieve content at different levels of detail and locate segments of interest. To enable time-efficient access, digital video must be analyzed and processed to provide a structure that allows the user to locate any event in the video and browse it quickly.
A widely used way to meet these needs is to generate a video summary. Conventional video summarization methods typically segment a video into an appropriate set of segments, such as video “shots”, and select one or more key-frames from each shot. A video shot is a contiguous recording of one or more video frames depicting a continuous action in time and space. Within a shot, the camera may remain fixed or exhibit one of the characteristic motions such as panning, tilting or tracking. In most videos, shot changes or cuts are created intentionally by the video or film director. Since a shot typically contains many images, it is desirable to reduce them to one or more key-frames that represent the content of the shot.
Conventional methods for selecting key-frames may be broadly classified into three categories: uniform sampling, color-based and motion-based methods. Methods based on uniform sampling select the frame(s) at predetermined positions in the shot as key-frames. For instance, one common practice is to select a single key-frame, the nth frame of the shot, where n is predetermined (typically the first frame, n=1), to represent the content of the shot. In a video shot where object and/or camera motion and visual effects are prominent, however, one representative image is typically not sufficient to represent the entire content. Other uniform-sampling methods therefore select multiple key-frames, namely the frames that occur at constant time intervals within the shot. This yields multiple key-frames irrespective of the content of the shot; when the shot is stationary, every key-frame except the first is redundant.
A further problem with uniform sampling methods is that the viewer may be misled about the video content when there is high motion and/or color activity in the shot, due to the well-known sampling problem of aliasing.
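The two uniform-sampling strategies, and the aliasing problem they suffer from, can be illustrated with a minimal Python sketch. The function names and the synthetic shot (an object whose position repeats every 30 frames) are hypothetical and serve only as an illustration; frame indices are 0-based.

```python
from typing import List


def nth_frame_keyframe(num_frames: int, n: int = 1) -> List[int]:
    """Select the n-th frame (1-based) of a shot as its only key-frame."""
    if not 1 <= n <= num_frames:
        raise ValueError("n must lie within the shot")
    return [n - 1]  # convert to a 0-based frame index


def uniform_keyframes(num_frames: int, interval: int) -> List[int]:
    """Select frames at a constant interval, regardless of the shot content."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, num_frames, interval))


# Hypothetical 100-frame shot: an object sweeps across the frame and
# returns to its starting position every 30 frames.
positions = [t % 30 for t in range(100)]
keyframes = uniform_keyframes(len(positions), interval=30)
sampled_positions = [positions[i] for i in keyframes]
print(keyframes)          # [0, 30, 60, 90]
print(sampled_positions)  # [0, 0, 0, 0]: the summary suggests a static scene
```

Because the sampling interval happens to match the period of the motion, every key-frame shows the object in the same place, which is exactly the aliasing effect: a highly dynamic shot is summarized as a static one.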
Conventional key-frame selection methods based on color typically employ histogram or wavelet analysis to measure the similarity between consecutive frames in a video shot. For example, one conventional method starts from the first frame of the shot, which is selected as the first key-frame, and compares each subsequent frame with the previous key-frame in terms of the similarity of their color histograms or luminance projections. A frame whose similarity value falls below a predetermined threshold is selected as the next key-frame, and the selection terminates when the end of the shot is reached (see “Efficient Matching and Clustering of Video Shots,” by B. Liu, et al., Proc. IEEE ICIP, Vol. 1, pp. 338-341, Washington D.C., October 1995). A similar method uses chromatic features to compute the image similarity (see “A Shot Classification Method of Selecting Effective Key-Frames for Video Browsing,” by H. Aoki, et al., Proc. ACM Multimedia, pp. 1-10, Boston, Mass., November 1996).
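The threshold-driven comparison against the previous key-frame can be sketched in Python. This is a minimal illustration, not the cited authors' implementation: it assumes grayscale frames given as rows of 0-255 intensities, a 16-bin intensity histogram, and histogram intersection as the similarity measure. The function names and the threshold value are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame: rows of 0-255 intensities


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a grayscale frame."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // 256] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("empty frame")
    return [c / total for c in counts]


def hist_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def select_keyframes_by_histogram(
    frames: Sequence[Frame], threshold: float, bins: int = 16
) -> List[int]:
    """Compare each frame with the previous key-frame and start a new
    key-frame whenever the similarity drops below the threshold."""
    if not frames:
        return []
    keyframes = [0]  # the first frame of the shot is the first key-frame
    key_hist = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        h = histogram(frames[i], bins)
        if hist_similarity(key_hist, h) < threshold:
            keyframes.append(i)
            key_hist = h  # later frames are compared with this key-frame
    return keyframes


# Hypothetical shot: dark, dark, bright, bright, dark frames.
dark = [[10] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]
shot = [dark, dark, bright, bright, dark]
print(select_keyframes_by_histogram(shot, threshold=0.5))  # [0, 2, 4]
```

The outcome depends entirely on the chosen threshold, which is the weakness discussed further below.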
One problem with histogram-based methods is that they typically fail to select key-frames when the spatial layout of the content changes while the histogram remains constant. Other conventional methods therefore use wavelet coefficients or pixel-based frame differences to compute the similarity between frames and thereby handle the spatial-layout problem.
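The spatial-layout limitation can be demonstrated with two synthetic frames that contain exactly the same pixel values in different positions. In this hypothetical Python sketch, histogram intersection reports the frames as identical, while a pixel-based frame difference (here the mean absolute intensity difference, chosen as one simple example) detects the change.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame: rows of 0-255 intensities


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram; it discards where each pixel is located."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // 256] += 1
    total = sum(counts)
    return [c / total for c in counts]


def hist_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection similarity in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def mean_abs_difference(f1: Frame, f2: Frame) -> float:
    """Pixel-based difference: compares intensities at the same positions."""
    diffs = [abs(a - b) for r1, r2 in zip(f1, f2) for a, b in zip(r1, r2)]
    return sum(diffs) / len(diffs)


# Dark region on the left vs. on the right: same histogram, different layout.
left_dark = [[0, 0, 255, 255] for _ in range(4)]
right_dark = [[255, 255, 0, 0] for _ in range(4)]

print(hist_similarity(histogram(left_dark), histogram(right_dark)))  # 1.0
print(mean_abs_difference(left_dark, right_dark))                    # 255.0
```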
Another conventional key-frame selection method is disclosed in U.S. Pat. No. 5,635,982, entitled “System For Automatic Video Segmentation and Key Frame Extraction For Video Sequences Having Both Sharp and Gradual Transitions.” In this method, starting from the first frame in the shot, the frame is compared with the subsequent frames until a significantly different frame is found; that frame is selected as a candidate against which successive frames are compared. The method suggests using one of the following metrics to compute frame similarity: color, motion, or a hybrid of color and motion. Another conventional method, disclosed in U.S. Pat. No. 5,664,227, entitled “System and Method For Skimming Digital Audio/Video Data,” employs a statistical change detection method that uses the DCT (discrete cosine transform) coefficients of the compressed images as a similarity measure to select multiple key-frames in a video shot. This method also requires selecting a threshold.
All of the above methods use some form of statistical measure to find the dissimilarity of images and depend heavily on threshold selection. One problem with such an approach is that selecting a threshold that works for every kind of video is not trivial, since the thresholds cannot be linked semantically to events in the video; they can only be used to compare statistical quantities. Although some of these conventional methods address domain-specific threshold selection, video generation techniques change over time, and there is a vast number of sources in every domain. In addition, color-based similarity measures cannot quantify the dynamic information caused by camera or object motion in the video shot.
Conventional key-frame selection methods based on motion are better suited to controlling the number of key-frames according to the temporal dynamics of the scene. Motion-based methods typically use pixel-based image differences or optical flow computation. For instance, in one conventional method, a cumulative average image difference curve is sampled non-uniformly to obtain multiple key-frames (see “Visual Search in a Smash System,” by R. L. Lagendijk, et al., Proc. IEEE ICIP, pp. 671-674, Lausanne, Switzerland, September 1996). This method, however, requires the number of key-frames for a video clip to be selected in advance. Another method uses optical flow analysis to measure the motion in a shot and selects key-frames at the local minima of motion (see “Key Frame Selection By Motion Analysis,” by W. Wolfe, Proc. IEEE ICASSP, pp. 1228-1231, Atlanta, Ga., May 1996). Another conventional method uses the cumulative amount of special motions for selection (see “Scene Change Detection and Content-Based Sampling of Video Sequences,” by B. Shahraray, Proc. SPIE Digital Video Compression: Algorithms and Technologies, Vol. 2419, pp. 2-13, San Jose, Calif., February 1995). Motion-based key-frame selection methods model the temporal changes of the scene with motion only.
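Two of the motion-based selection rules mentioned above can be sketched in Python. This is a simplified, hypothetical illustration rather than the cited authors' methods: it assumes the per-frame difference or motion values have already been computed (for example, from pixel differences or optical-flow magnitudes), places key-frames at the midpoints of k equal increments of the cumulative difference curve, and treats only strict interior local minima as minima of motion. All names and data are hypothetical.

```python
from typing import List, Sequence


def cumulative_difference_keyframes(diffs: Sequence[float], k: int) -> List[int]:
    """Non-uniformly sample the cumulative frame-difference curve.

    diffs[i] is the (assumed precomputed) difference between frames i and i + 1.
    The curve is split into k equal increments of accumulated change, and the
    first frame reaching the middle of each increment becomes a key-frame, so
    key-frames cluster where the content changes quickly. Note that k must be
    chosen in advance.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    cumulative = [0.0]
    for d in diffs:
        cumulative.append(cumulative[-1] + d)
    total = cumulative[-1]
    if total == 0:
        return [0]  # nothing changes: a single key-frame is enough
    keyframes: List[int] = []
    for j in range(k):
        level = (j + 0.5) * total / k
        idx = next(i for i, c in enumerate(cumulative) if c >= level)
        if not keyframes or idx != keyframes[-1]:
            keyframes.append(idx)  # skip duplicates caused by large jumps
    return keyframes


def local_minima_keyframes(motion: Sequence[float]) -> List[int]:
    """Select frames at strict local minima of a per-frame motion measure."""
    return [
        i
        for i in range(1, len(motion) - 1)
        if motion[i] < motion[i - 1] and motion[i] < motion[i + 1]
    ]


# Hypothetical 9-frame shot: static except for a burst of change at frames 3-5.
diffs = [0, 0, 0, 10, 10, 0, 0, 0]
print(cumulative_difference_keyframes(diffs, k=2))  # [4, 5]

motion = [5, 3, 4, 8, 2, 6]
print(local_minima_keyframes(motion))  # [1, 4]
```

Both key-frames returned by the first function fall inside the burst of change, whereas uniform sampling with the same budget would place them independently of where the change happens.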
It is believed that a good key-frame selection method is one that can:
1. exploit the dynamic
Liou Shih-Ping
Toklu Candemir
Boudreau Leo
Lu Tom Y.
Paschburg Donald B.
Siemens Corporate Research Inc.
System and method for selecting key-frames of video data