Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
2000-03-14
2004-05-11
Rao, Andy (Department: 2713)
C375S240260
active
06735253
ABSTRACT:
BACKGROUND OF THE INVENTION
I. Field of the Invention
The present invention relates to techniques for editing and parsing compressed digital information, and more specifically, to editing and parsing visual information in the compressed domain.
II. Description of the Related Art
With the increasing use of local area, wide area and global networks to spread information, digital video has become an essential component of many new media applications. The inclusion of video in an application often gives the application not only increased functional utility, but also an aesthetic appeal that cannot be obtained by text or audio information alone. However, while digital video greatly increases our ability to share information, it demands special technical support in processing, communication, and storage.
In order to reduce bandwidth requirements to manageable levels, video information is generally transmitted between systems in the digital environment in the form of compressed bitstreams that are in a standard format, e.g., Motion JPEG, MPEG-1, MPEG-2, H.261 or H.263. In these compressed formats, the Discrete Cosine Transform (“DCT”) is utilized in order to transform N×N blocks of pixel data, where N typically is set to eight, into the DCT domain where quantization is more readily performed. Run-length encoding and entropy coding (i.e., Huffman coding or arithmetic coding) are applied to the quantized bitstream to produce a compressed bitstream which has a significantly lower bit rate than the original uncompressed source signal. The process is assisted by additional side information, in the form of motion vectors, which are used to construct frame or field-based predictions from neighboring frames or fields by taking into account the inter-frame or inter-field motion that is typically present.
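By way of illustration only, the transform, quantization, and run-length steps described above can be sketched as follows. This is a minimal sketch and not the procedure of any particular standard: the single quantizer step size of 16, the function names, and the "EOB" end-of-block marker are assumptions made for the example, and the entropy coding stage is omitted.

```python
import math

N = 8  # block size; the text above notes that N is typically eight


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step=16):
    """Uniform quantization; one step size for all coefficients (assumed)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def zigzag(block):
    """Scan the block along anti-diagonals, lowest frequencies first."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals run top to bottom, even diagonals bottom to top.
        return (diagonal, row if diagonal % 2 else col)

    order = sorted(((r, c) for r in range(N) for c in range(N)), key=key)
    return [block[r][c] for r, c in order]


def run_length(sequence):
    """Encode as (zero_run, value) pairs, closed by an end-of-block marker."""
    pairs, run = [], 0
    for value in sequence:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


flat_block = [[128] * N for _ in range(N)]
encoded = run_length(zigzag(quantize(dct_2d(flat_block))))
```

For a uniform block such as `flat_block`, only the DC coefficient survives quantization, so the 64 pixel values collapse to a single pair plus the end-of-block marker. This is the source of the bit rate reduction for smooth image regions.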
In order to be usable by a receiving system, such coded bitstreams must be both parsed and decoded. For example, in the case of an MPEG-2 encoded bitstream, the bitstream must be parsed into slices and macroblocks before the information contained in the bitstream is usable by an MPEG-2 decoder. Parsed bitstream information may be used directly by an MPEG-2 decoder to reconstruct the original visual information, or may be subjected to further processing.
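As a hedged illustration of the first stage of such parsing, the sketch below locates byte-aligned start codes in an MPEG-2 video elementary stream. It relies on the three-byte prefix 0x000001 followed by a one-byte code value, with values 0x01 through 0xAF denoting slices. The function name and the label strings are assumptions made for the example.

```python
def find_start_codes(data):
    """Yield (byte_offset, kind) for each 0x000001xx start code in data."""
    names = {
        0x00: "picture",
        0xB3: "sequence_header",
        0xB7: "sequence_end",
        0xB8: "group_of_pictures",
    }
    prefix = b"\x00\x00\x01"
    pos = data.find(prefix)
    while pos != -1 and pos + 3 < len(data):
        code = data[pos + 3]
        if 0x01 <= code <= 0xAF:
            kind = "slice"
        else:
            kind = names.get(code, "other")
        yield pos, kind
        # Resume the search after the code value byte.
        pos = data.find(prefix, pos + 4)
```

Parsing down to the macroblock level additionally requires decoding the variable-length codes inside each slice, which this sketch does not attempt.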
In the case of compressed digital video, further processing of video information can occur either in the normal, uncompressed domain or in the compressed domain. Indeed, there have been numerous attempts by others in the field to realize useful techniques for indexing and manipulating digital video information in both the uncompressed and compressed domains.
For example, in the article by S. W. Smoliar et al., “Content-Based Video Indexing and Retrieval,” IEEE Multimedia, summer 1994, pp. 62-72, a color histogram comparison technique is proposed to detect scene cuts in the spatial (uncompressed) domain. In the article by B. Shahraray, “Scene Change Detection and Content-Based Sampling of Video Sequences,” SPIE Conf. Digital Image Compression: Algorithms and Technologies 1995, Vol. 2419, a block-based match and motion estimation algorithm is presented.
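The general idea of histogram-based cut detection in the pixel domain can be sketched as follows. This is a generic illustration and not the specific method of the Smoliar et al. article: the 64-bin quantization, the normalized absolute-difference measure, and the threshold of 0.5 are all assumptions. Frames are taken to be equal-length lists of 8-bit (r, g, b) tuples.

```python
def color_histogram(frame, bits=2):
    """Count pixels per coarse color bin, keeping `bits` bits per channel."""
    bins = [0] * (1 << (3 * bits))
    shift = 8 - bits
    for r, g, b in frame:
        index = (((r >> shift) << (2 * bits))
                 | ((g >> shift) << bits)
                 | (b >> shift))
        bins[index] += 1
    return bins


def histogram_difference(h1, h2):
    """Normalized difference in [0, 1] for frames with equal pixel counts."""
    total = sum(h1)
    return sum(abs(a - b) for a, b in zip(h1, h2)) / (2.0 * total)


def detect_cuts(frames, threshold=0.5):
    """Return indices of frames whose histogram differs sharply from the previous one."""
    cuts = []
    previous = color_histogram(frames[0])
    for i in range(1, len(frames)):
        current = color_histogram(frames[i])
        if histogram_difference(previous, current) > threshold:
            cuts.append(i)
        previous = current
    return cuts
```

Note that every frame must be fully decoded to pixels before its histogram can be computed, which is the cost that compressed-domain techniques seek to avoid.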
For compressed video information, the article by F. Arman et al., “Image Processing on Compressed Data for Large Video Databases,” Proceedings of ACM Multimedia '93, June 1993, pp. 267-272, proposes a technique for detecting scene cuts in JPEG compressed images by comparing the DCT coefficients of selected blocks from each frame. Likewise, the article by J. Meng et al., “Scene Change Detection in a MPEG Compressed Video Sequence,” IS&T/SPIE Symposium Proceedings, Vol. 2419, February 1995, San Jose, Calif., provides a methodology for the detection of direct scene cuts based on the distribution of motion vectors, and a technique for the location of transitional scene cuts based on DCT DC coefficients. Algorithms disclosed in the article by M. M. Yeung et al., “Video Browsing using Clustering and Scene Transitions on Compressed Sequences,” IS&T/SPIE Symposium Proceedings, February 1995, San Jose, Calif., Vol. 2417, pp. 399-413, enable the browsing of video shots after scene cuts are located. However, the Smoliar et al., Shahraray, and Arman et al. references are limited to scene change detection, and the Meng et al. and Yeung et al. references do not provide any functions for editing compressed video.
Others in the field have attempted to address problems associated with camera operation and moving objects in a video sequence. For example, in the spatial domain, H. S. Sawhney, et al., “Model-Based 2D & 3D Dominant Motion Estimation for Mosaicking and Video Representation,” Proc. Fifth Int'l Conf. Computer Vision, Los Alamitos, Calif., 1995, pp. 583-590, proposes to find parameters of an affine matrix and to construct a mosaic image from a sequence of video images. In a similar vein, the work by A. Nagasaka et al., “Automatic Video Indexing and Full-Video Search for Object Appearances,” in E. Knuth and L. M. Wegner, editors, Video Database Systems, II, Elsevier Science Publishers B.V., North-Holland, 1992, pp. 113-127, proposes searching for object appearances and using them in a video indexing technique.
In the compressed domain, the detection of certain camera operations, e.g., zoom and pan, based on motion vectors has been proposed in both A. Akutsu et al., “Video Indexing Using Motion Vectors,” SPIE Visual Communications and Image Processing 1992, Vol. 1818, pp. 1522-1530, and Y. T. Tse et al., “Global Zoom/Pan Estimation and Compensation For Video Compression,” Proceedings of ICASSP 1991, pp. 2725-2728. In these proposed techniques, simple three-parameter models are employed which require two assumptions, i.e., that camera panning is slow and focal length is long. However, such restrictions make the algorithms unsuitable for general video processing.
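To make the notion of a three-parameter model concrete, the sketch below fits one zoom factor and two pan components to a field of motion vectors by least squares. The model form (u, v) = (z·x + p_x, z·y + p_y), with block positions measured from the image center, is an assumption chosen for illustration and is not taken from the Akutsu et al. or Tse et al. papers. The function and variable names are likewise hypothetical.

```python
def estimate_pan_zoom(positions, vectors):
    """Least-squares fit of u = z*x + px, v = z*y + py.

    positions: list of (x, y) block centers relative to the image center.
    vectors:   list of (u, v) motion vectors, one per position.
    Returns (z, px, py).
    """
    n = len(positions)
    mean_x = sum(x for x, _ in positions) / n
    mean_y = sum(y for _, y in positions) / n
    mean_u = sum(u for u, _ in vectors) / n
    mean_v = sum(v for _, v in vectors) / n

    numerator = 0.0
    denominator = 0.0
    for (x, y), (u, v) in zip(positions, vectors):
        dx, dy = x - mean_x, y - mean_y
        numerator += dx * (u - mean_u) + dy * (v - mean_v)
        denominator += dx * dx + dy * dy
    if denominator == 0.0:
        raise ValueError("positions must not all coincide")

    zoom = numerator / denominator
    # Pan is whatever mean motion remains after the zoom term is removed.
    return zoom, mean_u - zoom * mean_x, mean_v - zoom * mean_y
```

A model of this kind has no term for rotation or perspective change, which suggests why such models hold only under restrictive assumptions about camera motion.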
There have also been attempts to develop techniques aimed specifically at digital video indexing. For example, in the aforementioned Smoliar et al. article, the authors propose using finite state models in order to parse and retrieve specific domain video, such as news video. Likewise, in A. Hampapur, et al., “Feature Based Digital Video Indexing,” IFIP2.6 Visual Database Systems, III, Switzerland, March 1995, a feature-based video indexing scheme is presented that uses low-level, machine-derivable indices to map into the set of application-specific video indices.
One attempt to enable users to manipulate image and video information was proposed by J. Swartz, et al., “A Resolution Independent Video Language,” Proceedings of ACM Multimedia '95, pp. 179-188, as a resolution independent video language (Rivl). However, although Rivl uses group of pictures (GOP) level direct copying whenever possible for “cut and paste” operations on MPEG video, it does not use operations in the compressed domain at frame and macroblock levels for special effects editing. Instead, most video effects in Rivl are done by decoding each frame into the pixel domain and then applying image library routines.
The techniques proposed by Swartz et al. and others which rely on performing some or all video data manipulation functions in the uncompressed domain do not provide a useful, truly comprehensive technique for indexing and manipulating digital video. As explained in S.-F. Chang, “Compressed-Domain Techniques for Image/Video Indexing and Manipulation,” IEEE Intern. Conf. on Image Processing, ICIP 95, Special Session on Digital Image/Video Libraries and Video-on-demand, October 1995, Washington, D.C., the disclosure of which is incorporated by reference herein, the compressed-domain approach offers several powerful benefits.
First, implementation of the same manipulation algorithms in the compressed domain is much cheaper than in the uncompressed domain because the data rate is greatly reduced in the compressed domain (e.g., a typical 20:1 to 50:1 compression ratio for MPEG). Second, given that most existing images and videos are stored in compressed form, specific manipulation algorithms can be applied to the compressed streams without full decoding of the compre
Chang Shih-Fu
Meng Horace J.
Rao Andy
The Trustees of Columbia University in the City of New York