Reexamination Certificate
1999-02-12
2002-06-25
Chawan, Vijay B (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S245000, C704S236000, C704S251000
active
06411930
ABSTRACT:
FIELD OF THE INVENTION
This invention relates generally to methods and apparatus for use in performing speaker identification.
BACKGROUND OF THE INVENTION
In systems that provide for identification of a speaker, a general technique is to score the speaker's enunciation of a test phrase against each one of a number of individual Gaussian mixture models (GMMs) and to select, or identify, the speaker as the person associated with the individual GMM, or set of GMMs, achieving the best score above a certain threshold using, e.g., a maximum likelihood technique. Typically, these systems generate individual GMMs by independently training, a priori, on small (e.g., 30 millisecond (ms)) speech samples of training phrases spoken by the respective person.
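The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the method of any cited system: it assumes diagonal-covariance GMMs, uses the average per-frame log-likelihood as the score, and the names `gmm_log_likelihood`, `identify_speaker`, the toy models for "alice" and "bob", and the threshold value are all hypothetical.

```python
import math

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log of (mixture weight * diagonal Gaussian density)
            lp = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                lp += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp_logs.append(lp)
        # log-sum-exp over components for numerical stability
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(l - m) for l in comp_logs))
    return total / len(frames)

def identify_speaker(frames, models, threshold):
    """Score the frames against every speaker's GMM and return the
    best-scoring speaker, or None if no score reaches the threshold."""
    scores = {
        name: gmm_log_likelihood(frames, *params)
        for name, params in models.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Hypothetical two-component, two-dimensional models:
# (weights, means, variances) per speaker.
models = {
    "alice": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "bob": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
speaker, scores = identify_speaker(test_frames, models, threshold=-10.0)
```

Note that each model here is scored independently of the others; the decision depends only on which score is highest and whether it clears the threshold.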
Unfortunately, such systems do not perform well when attempting to discriminate the true speaker from people that merely sound like the true speaker. As such, in an attempt to improve discrimination these systems increase the number of GMMs to include “cohort” or “background” models, i.e., people that sound like the true speaker but are not (e.g., see Herbert Gish and Michael Schmidt, “Text-independent speaker identification,” IEEE Signal Processing Magazine, pages 18-32, 1994).
Alternatively, for both the speech and speaker recognition problems, a different approach has recently been proposed which uses a discriminative cost function (which measures the empirical risk) during training in place of the maximum likelihood estimation, giving significantly improved generalization performance (e.g., see Biing-Hwang Juang, Wu Chou, and Chin-Hui Lee, “Minimum Classification Error Rate Methods for Speech Recognition,” IEEE Transactions on Speech and Audio Processing, 5(3):257-265, 1997; and Chi-Shi Lui, Chin-Hui Lee, Wu Chou, Biing-Hwang Juang, and Aaron E. Rosenberg, “A study on minimum error discriminative training for speaker recognition,” Journal of the Acoustical Society of America, 97(1):637-648, 1995). However, here the underlying model (a set of hidden Markov models) is left unchanged, and in the speaker recognition case, only the small vocabulary case of isolated digits was considered.
In providing speaker identification systems such as described above, support vector machines (SVMs) have been used for the speaker identification task directly, by training one-versus-rest and one-versus-another classifiers on the preprocessed data (e.g., see M. Schmidt, “Identifying speaker with support vector networks,” Interface '96 Proceedings, Sydney, 1996). However, in such SVM-based speaker identification systems, training and testing are both orders of magnitude slower than, and the resulting performance is similar to, that of competing systems (e.g., see also, National Institute for Standards and Technology, Speaker recognition workshop, Technical Report, Maritime Institute of Technology, Mar. 27-28, 1996).
SUMMARY OF THE INVENTION
Unfortunately, the above-described approaches to speaker identification are not inherently discriminative, in that a given speaker's model(s) are trained only on that speaker's data, and effective discrimination relies to a large extent on finding effective score normalization and thresholding techniques. Therefore, I have developed an alternative approach that adds explicit discrimination to the GMM method. In particular, and in accordance with the invention, I have developed a way to perform speaker identification that uses a single Gaussian mixture model (GMM) for multiple speakers, referred to herein as a Discriminative Gaussian mixture model (DGMM).
In an illustrative embodiment of the invention, a DGMM comprises a single GMM that is used for all speakers. A likelihood sum of the GMM is factored into two parts, one of which depends only on the Gaussian mixture model, and the other of which is a discriminative term. The discriminative term allows for the use of a binary classifier, such as a support vector machine (SVM).
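One way to read this factoring, offered only as a sketch: the symbols below are notation assumed here and do not appear in the text above. Let x be a feature vector, s a speaker identity, and c_i the i-th of N Gaussian components of the shared GMM. By the chain rule of probability, the joint density of x and s can be written as a sum over components in which the first two factors depend only on the shared GMM, and the last factor is a per-component speaker posterior, which is the kind of quantity a binary classifier could be trained to estimate.

```latex
p(x, s) \;=\; \sum_{i=1}^{N} p(x, s, c_i)
        \;=\; \sum_{i=1}^{N}
              \underbrace{P(c_i)\, p(x \mid c_i)}_{\text{depends only on the GMM}}
              \;\underbrace{P(s \mid x, c_i)}_{\text{discriminative term}}
```

The identity itself is general. Whether the illustrative embodiment uses exactly this form, and how a classifier output such as an SVM margin would be mapped to the discriminative term, is not stated in this section.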
In another embodiment of the invention, a voice messaging system incorporates a DGMM. The voice messaging system comprises a private branch exchange (PBX) and a plurality of user terminals, e.g., telephones, personal computers, etc.
REFERENCES:
patent: 5271088 (1993-12-01), Bahler
patent: 5638487 (1997-06-01), Chigier
patent: 5806032 (1998-09-01), Sproat
patent: 5839103 (1998-11-01), Mammone et al.
patent: 5862519 (1999-01-01), Sharma et al.
patent: 5960397 (1999-09-01), Rahim
patent: 6029124 (2000-02-01), Gillick et al.
patent: 6173260 (2001-01-01), Slaney
Slomka et al., (“A comparison of Gaussian Mixture and Multiple Binary Classifier Models for Speaker Verification”, Australian, New Zealand Conference on Intelligent Information Systems, 1996, Nov. 18-20, 1996, pp. 316-319).*
Del Alamo et al., (“Discriminative training of GMM for speaker identification”, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 89-92).
Chawan Vijay B
Lucent Technologies - Inc.
Troutman Sanders LLP
Discriminative Gaussian mixture models for speaker verification