Multimedia search apparatus and method for searching...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

C231S004000, C231S004000

active

06317710

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of Invention
This invention is directed to a multimedia search apparatus and methods for searching multimedia content using speaker detection to segment the multimedia content.
2. Description of Related Art
In one known method for speaker identification and verification, Gaussian Mixture Models (GMMs) are used to model the spectral shapes of the speaker's voice. This method is described in “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models,” Douglas A. Reynolds, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, January 1995 (Reynolds), which is incorporated herein by reference. This method uses Gaussian Mixture Models to verify the identity of a speaker such as when conducting financial transactions. However, the above-described speaker identification and verification method assumes that only one speaker is the source of the audio input for all samples. Thus, this method is only practical for identifying a single speaker. Therefore, there is a need for new technology to provide more reliable speaker detection when more than one speaker may be present in multimedia information.
SUMMARY OF THE INVENTION
This invention provides multimedia search apparatus and methods for searching multimedia content using speaker detection to segment the multimedia content. The multimedia search apparatus and methods may aid in browsing multimedia content and may be used in conjunction with known browsing techniques such as word spotting, topic spotting, image classification, and the like.
The multimedia search apparatus receives a search request from a user device. The search request includes information regarding the target speaker for which the search is to be conducted. Based on the search request, the multimedia search apparatus retrieves the multimedia content from a multimedia database.
In one embodiment of the invention, the multimedia search apparatus retrieves, from a Gaussian Mixture Model storage device, the Gaussian Mixture Models (GMMs) corresponding to the target speaker and to background data. Based on the retrieved Gaussian Mixture Models, the multimedia search device searches the audio data of the multimedia content and segments the audio data. The segments are identified by determining an average normalized score for blocks of frames of the audio data and determining whether the average normalized score exceeds one or more predetermined thresholds. If the average normalized score exceeds the one or more thresholds, the frame may be part of a target speaker segment. If the average normalized score falls below one or more of the thresholds, the frame may be considered to be in a background segment.
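The block-scoring step described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the "normalized score" of a frame is the log-likelihood of the frame under the target speaker GMM minus its log-likelihood under the background GMM, that the GMMs have diagonal covariances, that blocks do not overlap, and that a single threshold is used. The function names gmm_log_likelihood and label_blocks, and the (weights, means, variances) tuple layout, are hypothetical.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over the mixture components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def label_blocks(frames, target_gmm, background_gmm, block_size, threshold):
    """Label each non-overlapping block of frames as target or background.

    Assumption: the normalized score of a frame is the target log-likelihood
    minus the background log-likelihood, averaged over the block.
    """
    labels = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        scores = [
            gmm_log_likelihood(f, *target_gmm)
            - gmm_log_likelihood(f, *background_gmm)
            for f in block
        ]
        average = sum(scores) / len(scores)
        labels.append("target" if average > threshold else "background")
    return labels
```

In this sketch each GMM is passed as a tuple of weights, per-component mean vectors, and per-component variance vectors. A practical system would more likely compute the features and likelihoods with a numerical library; the pure-Python form is used only to keep the example self-contained.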
Once the segments are identified by the multimedia search device, the segments may be provided to the user device as results of the search. Accordingly, the user device may choose from the identified multimedia content and multimedia segments for playback.
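As a rough illustration of how identified segments might be packaged as search results, the sketch below merges consecutive blocks labeled as target speaker into time ranges that a user device could list for playback. The function name blocks_to_segments, the "target" label string, and the fixed block duration in seconds are assumptions made for this example and do not come from the patent.

```python
def blocks_to_segments(labels, block_seconds):
    """Merge runs of consecutive "target" blocks into (start, end) times."""
    segments = []
    start = None  # index of the first block in the current target run
    for index, label in enumerate(labels):
        if label == "target" and start is None:
            start = index
        elif label != "target" and start is not None:
            segments.append((start * block_seconds, index * block_seconds))
            start = None
    if start is not None:
        # The audio ended while still inside a target speaker run.
        segments.append((start * block_seconds, len(labels) * block_seconds))
    return segments
```

Each returned pair is a start time and an end time in seconds, measured from the beginning of the audio data.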


REFERENCES:
patent: 4773093 (1988-09-01), Higgins et al.
patent: 5271088 (1993-12-01), Bahler
patent: 5522012 (1996-05-01), Mammone et al.
patent: 5548647 (1996-08-01), Naik et al.
Reynolds and Rose (“Robust Text-Independent Speaker Identification using Gaussian Mixture Speaker Models,” ©1995, IEEE Log #9406779).*
Roy & Malamud (“Speaker Identification Based Text to Audio Alignment for an Audio Retrieval System,” ©Apr. 1997 IEEE).*
Foote et al (“Finding Presentations in Recorded Meetings using Audio and Video Features,” 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 1999).*
Wilcox et al (“Segmentation of Speech using Speaker Identification,” IEEE International Conference on Acoustics, Speech, and Signal Processing, ©Apr. 1994).*
D. Roy and C. Malamud, Speaker identification based text to audio alignment for an audio visual retrieval system, Proc. ICASSP 97, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Munich, 1099-1102, 1997.
M-H. Siu, G. Yu, and H. Gish, An unsupervised, sequential learning algorithm for the segmentation of speech waveforms with multiple speakers, Proc. ICASSP 92, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, San Francisco, vol. 11, 189-192.
Speech segmentation and clustering based on speaker features, Proc. ICASSP 93 IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Minneapolis 395-398, 1993.
C. Montacie and Marie-Jose Caraty, Sound Channel Video Indexing, ESCA, Eurospeech97, Rhodes, Greece ISSN 1018-4074, pp. 2359-2362.
L. Wilcox, F. Chen, D. Kimber, and V. Balasubramanian, Segmentation of speech using speaker identification, Proc. ICASSP 94, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, Adelaide, 161-164, 1994.
D. A. Reynolds & R. C. Rose, “Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models,” IEEE Trans. on Speech and Audio Processing, vol. 3, 1995, pp. 72-83.
