Reexamination Certificate
Filed: 2000-04-07
Issued: 2003-10-14
Examiner: To, Doris H. (Department: 2655)
Classification: Data processing: speech signal processing, linguistics, language; Speech signal processing; Recognition
U.S. Classes: 704/256; 704/272; 704/235; 700/214; 400/116; 84/609
Status: Active
Patent Number: 6,633,845
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to multimedia applications, databases and search engines, and more specifically, to a computer-based system and method for automatically generating a summary of a song.
2. Background Information
Governmental, commercial and educational enterprises often utilize database systems to store information. Much of this information is often in text format, and can thus be easily searched for key phrases or for specified search strings. Due to recent advances in both storage capacity and processing power, many database systems now store audio files in addition to the more conventional text-based files. For example, digital juke boxes that can store hundreds, if not thousands, of songs have been developed. A user can select any of these songs for downloading and/or playback. Several commercial entities have also begun selling music, such as CD-ROMs, over the Internet. These entities allow users to search for, select and purchase the CD-ROMs using a browser application, such as Internet Explorer from Microsoft Corp. of Redmond, Wash. or Netscape Navigator from America Online, Inc. of Dulles, Va.
Since there is currently no mechanism for efficiently searching the content of audio files, system administrators typically append conventional, text-based database fields, such as title, author, date, format, keywords, etc. to the audio files. These conventional database fields can then be searched textually by a user to locate specific audio files. For Internet or on-line music systems, the generation of such database fields for each available CD-ROM and/or song can be time-consuming and expensive. It is also subject to data entry and other errors.
For users who do not know the precise title or the artist of the song they are interested in, such text-based search techniques are of limited value. Additionally, a search of database fields for a given search string may identify a number of corresponding songs, even though the user may only be looking for one specific song. In this case, the user may have to listen to substantial portions of the songs to identify the specific song he or she is interested in. Users may also wish to identify the CD-ROM on which a particular song is located. Again, the user may have to listen to significant portions of each song on each CD-ROM in order to locate the particular song that he or she wants. Rather than force users to listen to the first few minutes of each song, a short segment of each song or CD-ROM could be manually extracted and made available to the user for review. The selection of such song segments, however, would be highly subjective and again would be time-consuming and expensive to produce.
Systems have been proposed that allow a user to search audio files by humming or whistling a portion of the song he or she is interested in. These systems process this user input and return the matching song(s). Viable commercial systems employing such melodic query techniques, however, have yet to be demonstrated.
SUMMARY OF THE INVENTION
One aspect of the present invention is the recognition that many songs, especially songs in the rock and popular (“pop”) genres, have specific structures, including repeating phrases or structural elements, such as the chorus or refrain, that are relatively short in duration. These repeating phrases, moreover, are often well known, and can be used to quickly identify specific songs. That is, a user can identify a song just by hearing this repeating phrase or element. Nonetheless, these repeating phrases often do not occur at the beginning of a song. Instead, the first occurrence of such a repeating phrase may not take place for some time, and the most memorable example of the phrase may be its third or fourth occurrence within the song. The present invention relates to a system for analyzing songs and identifying a relatively short, identifiable “key phrase”, such as a repeating phrase that may be used as a summary for the song. This key phrase or summary may then be used as an index to the song.
According to the invention, the song, or a portion thereof, is digitized and converted into a sequence of feature vectors. In the illustrative embodiment, the feature vectors correspond to mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed in a novel manner in order to decipher the song's structure. Those sections that correspond to different structural elements can then be marked with identifying labels. Once the song is labeled, various rules or heuristics are applied to select a key phrase for the song. For example, the system may determine which label appears most frequently within the song, and then select at least some portion of the longest occurrence of that label as the summary.
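The key-phrase heuristic described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes the song has already been reduced to one structural label per frame, and the function name `select_key_phrase` is hypothetical.

```python
from itertools import groupby
from collections import Counter

def select_key_phrase(labels):
    """Pick a key-phrase span from a per-frame label sequence.

    Heuristic (one of several the text suggests): choose the label
    that covers the most frames, then return the (start, end) frame
    indices of its longest contiguous occurrence.
    """
    # Most frequent label across the whole song.
    top = Counter(labels).most_common(1)[0][0]
    # Scan contiguous runs, keeping the longest run of that label.
    best = (0, 0)
    pos = 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label == top and n > best[1] - best[0]:
            best = (pos, pos + n)
        pos += n
    return top, best

# Toy frame labels: verse (V), chorus (C), bridge (B).
labels = list("VVVVCCCVVVCCCCCBBVVCCCC")
print(select_key_phrase(labels))  # ('C', (10, 15))
```

Here the chorus label "C" is the most frequent, and its longest run (frames 10-15) would be offered as the summary segment.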
The deciphering of a song's structure may be accomplished by dividing the song into fixed-length segments, analyzing the feature vectors of the corresponding segments and combining like segments into clusters by applying a distortion algorithm. Alternatively, the system may employ a Hidden Markov Model (HMM) approach in which a specific number of HMM states are selected so as to correspond to the song's labels. After training the HMM, the song is analyzed and an optimization technique is used to determine the most likely HMM state for each frame of the song.
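The first alternative, clustering fixed-length segments by a distortion measure, can be illustrated as follows. This is a sketch under stated assumptions: the distortion measure (Euclidean distance between segment-mean feature vectors and cluster centroids) and the function name `cluster_segments` are choices made here for illustration, not details taken from the patent.

```python
import numpy as np

def cluster_segments(features, seg_len, threshold):
    """Greedily cluster fixed-length segments of a song by distortion.

    `features` is a (frames x dims) matrix of feature vectors (e.g. MFCCs).
    Each fixed-length segment is summarized by its mean vector; a segment
    joins an existing cluster when its distance to that cluster's first
    centroid falls below `threshold`, otherwise it starts a new cluster.
    Returns one cluster label per segment.
    """
    n_segs = len(features) // seg_len
    means = [features[i * seg_len:(i + 1) * seg_len].mean(axis=0)
             for i in range(n_segs)]
    centroids, labels = [], []
    for m in means:
        if centroids:
            # Distortion of this segment against each existing cluster.
            d = [np.linalg.norm(m - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] < threshold:
                labels.append(j)
                continue
        # No sufficiently close cluster: open a new one.
        centroids.append(m)
        labels.append(len(centroids) - 1)
    return labels

# Toy song: quiet section, loud section, quiet section (10 frames each).
song = np.concatenate([np.zeros((10, 2)), np.full((10, 2), 5.0),
                       np.zeros((10, 2))])
print(cluster_segments(song, seg_len=10, threshold=1.0))  # [0, 1, 0]
```

The repeated structure surfaces as repeated labels ([0, 1, 0]), which is exactly what the key-phrase selection step then operates on. The HMM alternative replaces this greedy assignment with trained states and a per-frame optimization (e.g. Viterbi decoding) but yields the same kind of label sequence.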
REFERENCES:
patent: 5038658 (1991-08-01), Tsuruta et al.
patent: 5521324 (1996-05-01), Dannenberg et al.
patent: 5537488 (1996-07-01), Menon et al.
patent: 5625749 (1997-04-01), Goldenthal et al.
patent: 5649234 (1997-07-01), Klappert et al.
patent: 5703308 (1997-12-01), Tashiro et al.
patent: 5918223 (1999-06-01), Blum
patent: 5929857 (1999-07-01), Dinallo et al.
patent: 5937384 (1999-08-01), Huang et al.
patent: 6023673 (2000-02-01), Bakis
patent: 6064958 (2000-05-01), Takahashi et al.
patent: 6195634 (2001-02-01), Dudemaine et al.
patent: 6226612 (2001-05-01), Srenger et al.
patent: 6233545 (2001-05-01), Datig
patent: 6304674 (2001-10-01), Cass et al.
SpeechBot (COMPAQ, Internet product announcement, Dec. 1999).*
K. Martin, Automatic Transcription of Simple Polyphonic Music: Robust Front End Processing, M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 399, Dec. 1996, pp. 1-11.
J. Foote, “Content-Based Retrieval of Music and Audio”, pp. 1-10.
E. Wold, T. Blum, D. Keislar and J. Wheaton, “Content-Based Classification, Search, and Retrieval of Audio”, IEEE Multimedia 1996, pp. 27-36.
A. Ghias, J. Logan, D. Chamberlin and B. Smith, “Query By Humming—Musical Information Retrieval in an Audio Database”, ACM Multimedia '95-Electronic Proceedings, Nov. 5-9, 1995, pp. 1-11.
R. McNab, L. Smith, I. Witten, C. Henderson and S. Cunningham, “Towards the Digital Music Library: Tune Retrieval from Acoustic Input”, ACM 1996, pp. 11-18.
M. Brand, “Structure learning in conditional probability models via an entropic prior and parameter extinction”, Oct. 19, 1997 revised Aug. 24, 1998, pp. 1-27.
M. Brand, “Pattern discovery via entropy minimization”, Mar. 8, 1998 revised Oct. 29, 1998, pp. 1-10.
K. Kashino and H. Murase, “Music Recognition Using Note Transition Context”, pp. 1-4.
M. Siegler, U. Jain, B. Raj and R. Stern, “Automatic Segmentation, Classification and Clustering of Broadcast News Audio”, pp. 1-3.
S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev and P. Woodland, “The HTK Book”, Version 2.2, Dec. 1995, pp. 3-20, 67-76, 113-153 and Table of Contents.
Y. Zhuang, Y. Rui, T. Huang and S. Mehrotra, “Adaptive Key Frame Extraction Using Unsupervised Clustering”, pp. 1-5.
K. Martin, E. Scheirer and B. Vercoe, “Music Content Analysis through Models of Audition”, ACM Multimedia '98 Workshop on Content Processing of Music for Multimedia Applications, Sep. 12, 1998.
J. Brown, “Musical fundamental frequency tracking using a pattern recognition method”, J. Acoust. Soc. Am., Sep. 1992, pp. 1394-1402.
J. Brown and B. Zhang, Musical frequency tracking using the methods of conventional and “narrowed” autocorrelation, J. Acoust. Soc. Am., May
Inventors: Chu, Stephen Mingyu; Logan, Beth Teresa; Nolan, Daniel A.
Title: Music summarization system and method