Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
2000-02-22
2004-05-25
Rao, Andy (Department: 2613)
C375S240080
active
06741655
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to techniques for searching and retrieving visual information, and, more particularly to the use of content-based search queries to search for and retrieve moving visual information.
2. Description of Related Art
During the past several years, as the Internet has reached maturity and multimedia applications have come into widespread use, the stock of readily available digital video information has grown steadily. In order to reduce bandwidth requirements to manageable levels, such video information is generally stored in the digital environment in the form of compressed bitstreams that are in a standard format, e.g., JPEG, Motion JPEG, MPEG-1, MPEG-2, MPEG-4, H.261 or H.263. At the present time, hundreds of thousands of different still and motion images, representing everything from oceans and mountains to skiing and baseball, are available over the Internet.
With the increasing wealth of video information available in a digital format, a need to meaningfully organize and search through such information has become pressing. Specifically, users are increasingly demanding a content-based video search engine that is able to search for and retrieve specific pieces of video information which meet arbitrary predetermined criteria, such as shape or motion characteristics of video objects embedded within the stored video information, in response to a user-defined query.
In response to this need, there have been several attempts to develop video search and retrieval applications. Existing techniques fall into two distinct categories: query by example (“QBE”) and visual sketching.
In the context of image retrieval, examples of QBE systems include QBIC, PhotoBook, VisualSEEk, Virage and FourEyes, some of which are discussed in T. Minka, “An Image Database Browser that Learns from User Interaction,” MIT Media Laboratory Perceptual Computing Section, TR #365 (1996). These systems work under the premise that several satisfactory matches must lie within the database. Under this premise, the search begins with an element in the database itself, with the user being guided towards the desired image over a succession of query examples. Unfortunately, such “guiding” leads to substantial wasted time as the user must continuously refine the search.
Although space partitioning schemes to precompute hierarchical groupings can speed up the database search, such groupings are static and require recomputation when a new video is inserted into the database. Likewise, although QBE is, in principle, extensible, video shots generally contain a large number of objects, each of which is described by a complex multi-dimensional feature vector. The complexity arises partly due to the problem of describing shape and motion characteristics.
The second category of search and retrieval systems, sketch-based query systems, computes the correlation between a user-drawn sketch and the edge map of each of the images in the database in order to locate video information. One such system is described in Hirata et al., “Query by Visual Example, Content Based Image Retrieval, Advances in Database Technology - EDBT,” 580 Lecture Notes in Computer Science (1992, A. Pirotte et al. eds.). In A. Del Bimbo et al., “Visual Image Retrieval by Elastic Matching of User Sketches,” 19 IEEE Trans. on PAMI, 121-132 (1997), a technique which minimizes an energy functional to achieve a match is described. In C. E. Jacobs, et al., “Fast Multiresolution Image Querying,” Proc. of SIGGRAPH, 277-286, Los Angeles (August 1995), the authors compute a distance between the wavelet signatures of the sketch and each of the images in the database.
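By way of illustration only, the correlation-based matching described above may be sketched as follows. This is a simplified sketch and not the method of any cited reference: it assumes that the sketch and each edge map are already available as equally sized grids of 0/1 values, and the function names `normalized_correlation` and `rank_by_sketch` are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[float]]  # assumption: binary edge map, rows of 0/1 values


def normalized_correlation(a: Grid, b: Grid) -> float:
    """Zero-mean normalized correlation between two equally sized grids."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("grids must have the same number of cells")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = sqrt(
        sum((x - mean_a) ** 2 for x in flat_a)
        * sum((y - mean_b) ** 2 for y in flat_b)
    )
    # A grid with no variation carries no shape information; score it 0.
    return num / den if den else 0.0


def rank_by_sketch(sketch: Grid, edge_maps: Dict[str, Grid]) -> List[str]:
    """Return image names ordered from best to worst match with the sketch."""
    scores = {name: normalized_correlation(sketch, em) for name, em in edge_maps.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    sketch = [[1, 0], [0, 1]]
    database = {
        "diag": [[1, 0], [0, 1]],
        "anti": [[0, 1], [1, 0]],
        "flat": [[1, 1], [1, 1]],
    }
    print(rank_by_sketch(sketch, database))
```

Note that every image in the database must be scored against the sketch, so the cost of a query in such a scheme grows with the size of the database.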
Although some attempts have been made to index video shots, none attempt to represent video shots as a dynamic collection of video objects. Instead, the prior techniques have utilized image retrieval algorithms for indexing video simply by assuming that a video clip is a collection of image frames.
In particular, the techniques developed by Zhang and Smoliar, as well as those developed in the QBIC project, apply image retrieval methods (such as color histograms) to video. A “key frame” is chosen from each shot, e.g., the r-frame in the QBIC method. In the case of Zhang and Smoliar, the key frame is extracted by averaging over all the frames in the shot and then choosing the single frame in the clip which is closest to the average. The key frames are then used to index video by means of conventional image searches, such as a color histogram search.
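For illustration, the key-frame selection and histogram comparison described above may be sketched as follows. The sketch rests on several assumptions not stated in the cited work: each frame is a flattened list of grayscale values in the range 0 to 255, "closest" means smallest squared Euclidean distance, and histograms are compared by histogram intersection. All function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # assumption: one frame flattened to grayscale pixel values


def mean_frame(frames: List[Frame]) -> List[float]:
    """Pixel-wise average over all frames in a shot."""
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]


def squared_distance(a: Frame, b: Frame) -> float:
    """Squared Euclidean distance (an assumed notion of 'closest')."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_key_frame(frames: List[Frame]) -> int:
    """Index of the frame closest to the average frame of the shot."""
    if not frames:
        raise ValueError("a shot must contain at least one frame")
    avg = mean_frame(frames)
    return min(range(len(frames)), key=lambda i: squared_distance(frames[i], avg))


def histogram(frame: Frame, bins: int = 4, max_value: float = 256.0) -> List[float]:
    """Normalized intensity histogram of a single frame."""
    counts = [0] * bins
    for p in frame:
        counts[min(int(p * bins / max_value), bins - 1)] += 1
    return [c / len(frame) for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    shot = [[0, 0, 0, 0], [100, 100, 100, 100], [250, 250, 250, 250]]
    key = select_key_frame(shot)
    print(key, histogram(shot[key]))
```

Because only one frame per shot is retained, an index built in this manner records nothing about how objects move within the shot.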
Likewise, in the QBIC project, the r-frame is selected by taking an arbitrary frame, such as the first frame, as the representative frame. In case the video clip has motion, the mosaicked representation is used as the representative frame for the shot. QBIC then applies its image retrieval technology to these r-frames in order to index video clips.
In order to index video clips, the Informedia project creates a transcript of video by using a speech recognition algorithm on the audio stream. Recognized words are aligned with the video frame where the word was spoken. A user may search video clips by doing a keyword search. However, the speech-to-text conversion proved to be a major stumbling block, as the accuracy of the conversion algorithm was low (around 20-30%), which had a significant impact on the quality of retrieval.
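The transcript-based indexing described above may be illustrated by a minimal sketch. It assumes that the output of speech recognition is already available as (word, frame number) pairs for each clip; the function names and the data layout are hypothetical and are not taken from the Informedia project.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: speech recognition output, as clip id -> list of (word, frame number).
Transcripts = Dict[str, List[Tuple[str, int]]]


def build_index(transcripts: Transcripts) -> Dict[str, List[Tuple[str, int]]]:
    """Map each recognized word to the (clip id, frame number) where it was spoken."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for clip_id, words in transcripts.items():
        for word, frame in words:
            index[word.lower()].append((clip_id, frame))
    return index


def search(index: Dict[str, List[Tuple[str, int]]], keyword: str) -> List[Tuple[str, int]]:
    """Return every (clip id, frame number) at which the keyword was recognized."""
    return sorted(index.get(keyword.lower(), []))


if __name__ == "__main__":
    transcripts = {
        "news": [("Baseball", 120), ("season", 135)],
        "sports": [("baseball", 30)],
    }
    index = build_index(transcripts)
    print(search(index, "baseball"))
```

A word that the recognizer transcribes incorrectly never enters the index, which is why low recognition accuracy translates directly into missed results.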
The above-described prior art techniques fail to satisfy the growing need for an effective content-based video search engine that is able to search for and retrieve specific pieces of video information which meet arbitrary predetermined criteria. The techniques are either incapable of searching motion video information or search such information only with respect to a global parameter such as panning or zooming. Likewise, the prior art fails to describe techniques for retrieving video information based on spatial and temporal characteristics. Thus, the aforementioned existing techniques cannot search for and retrieve specific pieces of video information which meet arbitrary predetermined criteria, such as shape or motion characteristics of video objects embedded within the stored video information, in response to a user-defined query.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a truly content based video search engine.
A further object of the present invention is to provide a search engine which is able to search for and retrieve video objects embedded in video information.
Another object of the invention is to provide a mechanism for filtering identified video objects so that only objects which best match a user's search query will be retrieved.
Yet another object of the present invention is to provide a video search engine that is able to search for and retrieve specific pieces of video information which meet arbitrary predetermined criteria in response to a user-defined query.
A still further object of the present invention is to provide a search engine which is able to extract video objects from video information based on integrated feature characteristics of the video objects, including motion, color, and edge information.
In order to meet these and other objects which will become apparent with reference to further disclosure set forth below, the present invention provides a system for permitting a user to search for and retrieve video objects from one or more sequences of frames of video data over an interactive network. The system advantageously contains one or more server computers including storage for one or more databases of video object attributes and storage for one or more sequences of frames of video data to which the video object attributes correspond, a communications network permitting transmission of the one or more sequences of frames of video data from the server computers, and a client computer. The client computer houses a que
Chang Shih-Fu
Chen William
Meng Horace J.
Sundaram Hari
Zhong Di
Baker & Botts L.L.P.
Rao Andy
The Trustees of Columbia University in the City of New York