Reexamination Certificate
1999-12-06
2004-10-05
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S256000, C704S233000, C704S500000, C704S246000, C348S468000
active
06801895
ABSTRACT:
FIELD OF THE INVENTION
The present invention is directed to audio classification. More particularly, the present invention is directed to a method and apparatus for classifying and separating different types of multi-media events based upon an audio signal.
BACKGROUND
Multi-media presentations simultaneously convey both audible and visual information to their viewers. This simultaneous presentation of information in different media has proven to be an efficient, effective, and well-received communication method. Multi-media presentations date back to the first “talking pictures” of a century ago and have grown, developed, and improved not only into the movies of today but also into other common and prevalent communication methods including television and personal computers.
Multi-media presentations can vary in length from a few seconds or less to several hours or more. Their content can vary from a single uncut video recording of a tranquil lake scene to a well-edited and fast-paced television news broadcast containing a multitude of scenes, settings, and backdrops.
When a multi-media presentation is long, and only a small portion of the presentation is of interest to a viewer, the viewer can, unfortunately, spend an inordinate amount of time searching for and finding the portion of the presentation that is of interest to them. The indexing or segmentation of a multi-media presentation can, consequently, be a valuable tool for the efficient and economical retrieval of specific segments of a multi-media presentation.
In a news broadcast on commercial television, stories and features are interrupted by commercials interspersed throughout the program. A viewer interested in viewing only the news programs would, therefore, also be required to view the commercials located within the individual news segments. Viewing these interposed and unwanted commercials prolongs the entire process for the viewer by increasing the time required to search through the news program in order to find the desired news pieces. Conversely, some viewers may instead be interested in viewing and indexing the commercials rather than the news programs. These viewers would similarly be forced to wade through the lengthy news programs in order to find the commercials that they sought to review. Thus, in both of these examples, it would benefit the user if the commercials and the news segments could be easily separated, identified, and indexed, so that the segments of the news program that were of specific interest to a viewer could be easily identified and located.
Various attempts have been made to identify and index the commercials placed within a news program. In one known labor-intensive process the news program is indexed through the manual observation and indexing of the entire program, an inefficient and expensive endeavor. In another known process researchers have utilized the introduction or re-introduction of an anchor person in the news program to provide a cue for each segment of the broadcast. In other words, every time the anchor person was introduced a different news segment was thought to begin. This method has proven to be a complex and inaccurate process relying upon the individual intricacies of the various news stations and their various news anchor people; one that cannot be implemented on a widespread basis but is, rather, confined to a restricted number of channels and anchor people due to the time required in establishing the system.
It is, therefore, desirable to provide a simpler process for identifying and indexing commercials in a television news program: one that does not rely on the individual cues of a particular news network or reporter; one that can be efficiently and accurately implemented over a wide range of news programs and commercials; one that overcomes the shortcomings of the processes used today.
SUMMARY OF THE INVENTION
The present invention includes a method and apparatus for segmenting a multi-media program based upon audio events. In one embodiment a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling the audio stream at a predetermined rate, and then combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and are analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
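The clip-and-classify flow described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the linear approximation algorithm is a linear decision function (a weighted sum of the clip features plus a bias, compared against zero); this summary does not state the exact form. The function names `split_into_clips`, `linear_score`, and `classify_clip`, the labels "commercial" and "news", and all weights are hypothetical.

```python
from typing import List, Sequence


def split_into_clips(samples: Sequence[float], clip_len: int) -> List[List[float]]:
    """Group consecutive samples into fixed-size clips, dropping an incomplete tail."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    n_clips = len(samples) // clip_len
    return [list(samples[i * clip_len:(i + 1) * clip_len]) for i in range(n_clips)]


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Weighted sum of the features plus a bias (assumed form of the linear analysis)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features)) + bias


def classify_clip(features: Sequence[float], weights: Sequence[float], bias: float) -> str:
    """Label a clip from the sign of its linear score (labels are illustrative)."""
    return "commercial" if linear_score(features, weights, bias) > 0 else "news"
```

In such a sketch the weights and bias would be obtained from labeled training clips; how that fitting is done is outside what this summary describes.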
In an alternative embodiment of the present invention a computer-readable medium is provided. This medium has stored thereon instructions that are adapted to be executed by a processor and, when executed, define a series of steps to identify commercial segments of a television news program. These steps include selecting samples of an audio stream at a preselected interval and then grouping these samples into clips, which are then analyzed to determine if a commercial is present within the clip. This analysis includes determining: the non-silence ratio of the clip; the standard deviation of the zero crossing rate of the clip; the volume standard deviation of the clip; the volume dynamic range of the clip; the volume undulation of the clip; the 4 Hz modulation energy of the clip; the smooth pitch ratio of the clip; the non-pitch ratio of the clip; and the energy ratio in the sub-band of the clip.
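Four of the listed features can be sketched as follows. The definitions used here are assumptions, since this summary names the features without defining them: each clip is divided into non-overlapping frames, frame volume is taken as the root mean square of the samples, a frame counts as non-silent when its volume exceeds a threshold, and the volume dynamic range is taken as (max - min) / max of the frame volumes. The names `frame_volume`, `zero_crossing_rate`, and `clip_features` and the parameters `frame_len` and `silence_threshold` are hypothetical.

```python
import math
from statistics import pstdev
from typing import Dict, Sequence


def frame_volume(frame: Sequence[float]) -> float:
    """Root-mean-square amplitude of one frame (assumed volume measure)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def clip_features(clip: Sequence[float], frame_len: int,
                  silence_threshold: float) -> Dict[str, float]:
    """Compute four illustrative clip-level features from non-overlapping frames."""
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    frames = [clip[i:i + frame_len]
              for i in range(0, len(clip) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("clip is shorter than one frame")
    volumes = [frame_volume(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    peak = max(volumes)
    return {
        "non_silence_ratio": sum(v > silence_threshold for v in volumes) / len(volumes),
        "zcr_std": pstdev(zcrs),
        "volume_std": pstdev(volumes),
        "volume_dynamic_range": (peak - min(volumes)) / peak if peak > 0 else 0.0,
    }
```

For example, a clip made of one loud alternating frame followed by one silent frame, `clip_features([1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0], 4, 0.1)`, gives a non-silence ratio of 0.5 and a volume dynamic range of 1.0. The remaining features (volume undulation, 4 Hz modulation energy, the pitch ratios, and the sub-band energy ratio) require pitch tracking or spectral analysis and are not sketched here.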
Huang Qian
Liu Zhu
AT&T Corp.
Han Qi