Reexamination Certificate
2001-10-15
2002-06-11
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S243000, C704S247000
active
06405166
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of Invention
This invention is directed to a multimedia search apparatus and methods for searching multimedia content using speaker detection to segment the multimedia content.
2. Description of Related Art
In one known method for speaker identification and verification, Gaussian Mixture Models (GMMs) are used to model the spectral shapes of the speaker's voice. This method is described in "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," Douglas A. Reynolds and R.C. Rose, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, January 1995 (Reynolds), which is incorporated herein by reference. This method uses Gaussian Mixture Models to verify the identity of a speaker, for example when conducting financial transactions. However, the method assumes that a single speaker is the source of the audio input for all samples, so it is only practical for identifying one speaker. Therefore, there is a need for technology that provides more reliable speaker detection when more than one speaker may be present in multimedia information.
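As a rough illustration of the Reynolds-style approach described above, the sketch below scores feature vectors against one GMM per speaker and picks the best-scoring model. It assumes diagonal covariances and feature vectors (for example, cepstral coefficients) that have already been extracted; model training (EM) is not shown, and the names `DiagonalGMM` and `identify_speaker` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagonalGMM:
    """Hypothetical per-speaker Gaussian mixture with diagonal covariances."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x | model) for one feature vector, via log-sum-exp over components."""
        log_terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log density of a diagonal Gaussian is a sum of per-dimension terms.
            log_gauss = -0.5 * sum(
                math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            log_terms.append(math.log(w) + log_gauss)
        peak = max(log_terms)  # subtract the max for numerical stability
        return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

    def average_log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """Mean per-frame log-likelihood, treating frames as independent."""
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def identify_speaker(frames: Sequence[Sequence[float]],
                     models: Dict[str, DiagonalGMM]) -> str:
    """Closed-set identification: name of the model with the highest average score."""
    return max(models, key=lambda name: models[name].average_log_likelihood(frames))
```

Because every frame is scored against every speaker model under the assumption that one speaker produced the whole input, this approach by itself cannot say where one speaker stops and another starts, which is the limitation the paragraph above points out.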
SUMMARY OF THE INVENTION
This invention provides multimedia search apparatus and methods for searching multimedia content using speaker detection to segment the multimedia content. The multimedia search apparatus and methods may aid in browsing multimedia content and may be used in conjunction with known browsing techniques such as word spotting, topic spotting, image classification, and the like.
The multimedia search apparatus receives a search request from a user device. The search request includes information regarding the target speaker for which the search is to be conducted. Based on the search request, the multimedia search apparatus retrieves the multimedia content from a multimedia database.
In one embodiment of the invention, the multimedia search apparatus retrieves, from a Gaussian Mixture Model storage device, the Gaussian Mixture Models (GMMs) corresponding to the target speaker and to background data. Based on the retrieved GMMs, the multimedia search apparatus searches the multimedia data of the multimedia content and segments it. The segments are identified by computing an average normalized score for each block of frames of the multimedia data and determining whether that average normalized score exceeds one or more predetermined thresholds. If the average normalized score exceeds the one or more thresholds, the frames in the block may be part of a target speaker segment. If the average normalized score falls below one or more of the thresholds, the frames may be considered part of a background segment.
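The block-scoring step above might look roughly like the following sketch. It assumes that the "normalized score" of a frame is the log-likelihood ratio between the target-speaker GMM and the background GMM (a common normalization in GMM speaker detection, not confirmed by the text here) and that two hypothetical thresholds, `upper` and `lower`, are used, leaving blocks between them undecided. The per-frame log-likelihoods are taken as inputs, and the function names are illustrative only.

```python
from typing import List, Sequence

# Hypothetical label strings for each block of frames.
TARGET, BACKGROUND, UNDECIDED = "target", "background", "undecided"


def block_scores(target_ll: Sequence[float], background_ll: Sequence[float],
                 block_size: int) -> List[float]:
    """Average per-frame log-likelihood ratio over consecutive blocks of frames.

    target_ll[i] and background_ll[i] are log p(frame i | model).
    The last block may be shorter than block_size.
    """
    if len(target_ll) != len(background_ll):
        raise ValueError("score sequences must have the same length")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ratios = [t - b for t, b in zip(target_ll, background_ll)]
    scores = []
    for start in range(0, len(ratios), block_size):
        block = ratios[start:start + block_size]
        scores.append(sum(block) / len(block))
    return scores


def label_blocks(scores: Sequence[float], upper: float, lower: float) -> List[str]:
    """Label each block by comparing its average score with two thresholds (lower <= upper)."""
    labels = []
    for s in scores:
        if s > upper:
            labels.append(TARGET)
        elif s < lower:
            labels.append(BACKGROUND)
        else:
            labels.append(UNDECIDED)
    return labels
```

Averaging over a block rather than deciding frame by frame smooths out isolated frames that happen to match the wrong model; with a single threshold, `upper` and `lower` would simply be set equal.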
Once the multimedia search apparatus has identified the segments, it may provide them to the user device as the results of the search. The user may then select, through the user device, which of the identified multimedia content and segments to play back.
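To return segments that a user device can play back, labeled blocks have to be turned into time ranges. The sketch below merges runs of consecutive target blocks into (start, end) times in seconds. It assumes a fixed frame shift, uses the hypothetical label strings "target", "background", and "undecided", and treats an undecided block as ending a target run; the function names are illustrative and not taken from the patent.

```python
from typing import List, Optional, Sequence, Tuple


def target_segments(labels: Sequence[str], block_size: int,
                    frame_shift_s: float, num_frames: int) -> List[Tuple[float, float]]:
    """Merge runs of consecutive "target" blocks into (start, end) times in seconds."""
    segments = []
    run_start: Optional[int] = None  # index of the first block in the current run
    for i, label in enumerate(labels):
        if label == "target" and run_start is None:
            run_start = i
        elif label != "target" and run_start is not None:
            segments.append(_span(run_start, i, block_size, frame_shift_s, num_frames))
            run_start = None
    if run_start is not None:  # a run that lasts until the final block
        segments.append(_span(run_start, len(labels), block_size, frame_shift_s, num_frames))
    return segments


def _span(first_block: int, end_block: int, block_size: int,
          frame_shift_s: float, num_frames: int) -> Tuple[float, float]:
    """Convert a half-open range of block indices into start and end times."""
    start_frame = first_block * block_size
    end_frame = min(end_block * block_size, num_frames)  # the last block may be short
    return (start_frame * frame_shift_s, end_frame * frame_shift_s)
```

A different policy could absorb short undecided gaps between two target runs into a single segment, trading precision at segment boundaries for fewer, longer playback segments.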
REFERENCES:
patent: 4773093 (1988-09-01), Higgins et al.
patent: 5271088 (1993-12-01), Bahler
patent: 5522012 (1996-05-01), Mammone et al.
patent: 5548647 (1996-08-01), Naik et al.
patent: 6317710 (2001-11-01), Huang et al.
D. Roy and C. Malamud, "Speaker identification based text to audio alignment for an audio visual retrieval system," Proc. ICASSP 97, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Munich, pp. 1099-1102, 1997.
M.-H. Siu, G. Yu, and H. Gish, "An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers," Proc. ICASSP 92, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, San Francisco, vol. II, pp. 189-192, 1992.
"Speech segmentation and clustering based on speaker features," Proc. ICASSP 93, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Minneapolis, pp. 395-398, 1993.
C. Montacie and Marie-Jose Caraty, "Sound Channel Video Indexing," ESCA Eurospeech 97, Rhodes, Greece, ISSN 1018-4074, pp. 2359-2362, 1997.
L. Wilcox, F. Chen, D. Kimber, and V. Balasubramanian, "Segmentation of speech using speaker identification," Proc. ICASSP 94, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Adelaide, pp. 161-164, 1994.
D.A. Reynolds and R.C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Trans. on Speech and Audio Processing, vol. 3, pp. 72-83, 1995.
Foote et al., "Finding Presentations in Recorded Meetings Using Audio and Video Features," 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 1999.
Huang Qian
Magrin-Chagnolleau Ivan
Parthasarathy Sarangarajan
Rosenberg Aaron Edward
AT&T Corp.
Dorvil Richemond
Nolan Daniel A.
Oliff & Berridge, PLC