Reexamination Certificate
1999-02-25
2001-08-07
Šmits, Tālivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S243000
active
06272462
ABSTRACT:
BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to speech recognition systems. More particularly, the invention relates to speech model adaptation in a supervised system employing a corrective adaptation procedure that weights correct and incorrect models by a log likelihood ratio between current and best hypotheses.
Speech recognizers in popular use today employ speech models that contain data derived from training speakers. In many cases, training speech from these speakers is collected in advance and used to generate speaker independent models representing a cross section of the training speaker population. Later, when the speech recognizer is used, data extracted from speech of a new speaker is compared with the speaker independent models and the recognizer identifies the words in its lexicon that represent the best match between the new speech and the existing speech models.
If the new speaker's speech patterns are sufficiently similar to those of the training population, then the recognizer will do a reasonably good job of recognizing the new speaker's speech. However, if the new speaker has a strong regional accent or other speech idiosyncrasies that are not reflected in the training population, then recognition accuracy falls off significantly.
To enhance the reliability of the speech recognizer, many recognition systems implement an adaptation process whereby adaptation speech is provided by the new speaker, and that adaptation speech is used to adjust the speech model parameters so that they more closely represent the speech of the new speaker. Some systems require a significant quantity of adaptation speech. New speakers are instructed to read long passages of text, so that the adaptation system can extract the necessary adaptation data to adapt the speech models.
Where the content of the adaptation speech is known in advance, the adaptation system is referred to as performing “supervised” adaptation. Where the content of the adaptation speech is not known in advance, the adaptation process is referred to as “unsupervised” adaptation. In general, supervised adaptation will provide better results than unsupervised adaptation. Supervised techniques rely on knowledge of the adaptation data transcriptions, whereas unsupervised techniques determine the transcriptions of the adaptation data automatically, using the best models available, and consequently often provide limited improvements as compared to supervised techniques.
Among the techniques available to perform adaptation, transformation-based adaptation (e.g., Maximum Likelihood Linear Regression, or MLLR) and Bayesian adaptation (e.g., Maximum A Posteriori, or MAP) are the most popular. While transformation-based adaptation provides a solution for dealing with unseen models, Bayesian adaptation uses a priori information from speaker independent models. Bayesian techniques are particularly useful in dealing with problems posed by sparse data. In practical applications, depending on the amount of adaptation data available, transformation-based techniques, Bayesian techniques, or a combination of both may be chosen.
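As an illustration of how a Bayesian technique uses a priori information from a speaker independent model, the following is a minimal sketch of the standard MAP update for a single Gaussian mean. It is not taken from the patent. The function name `map_adapt_mean`, the prior weight `tau`, and the per-frame occupation probabilities `posteriors` are assumptions introduced here for illustration only.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """Interpolate a speaker independent mean with adaptation data.

    prior_mean : mean of the speaker independent Gaussian (the prior)
    frames     : adaptation feature vectors from the new speaker
    posteriors : hypothetical occupation probability of this Gaussian
                 for each frame (assumed to be computed elsewhere)
    tau        : assumed prior weight; larger values trust the prior more
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    # With little data (small occupancy) the result stays near the prior.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


adapted = map_adapt_mean([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], tau=2.0)
unchanged = map_adapt_mean([0.5, -0.5], [], [], tau=2.0)
```

When no adaptation frames are observed, the estimate equals the prior mean, which is why this kind of update behaves well with sparse data.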
Given a small amount of adaptation data, one of the common challenges of supervised adaptation is to provide adapted models that accurately match a user's speaking characteristics and are discriminative. On the other hand, unsupervised adaptation has to deal with inaccuracy of the transcriptions and the selection of reliable information to perform adaptation. For both sets of techniques it is important to adjust the adaptation procedure to the amount of adaptation data available.
The present invention addresses the foregoing issue by providing a corrective adaptation procedure that employs discriminative training. The technique pushes incorrect models away from the correct model, rendering the recognition system more discriminative for the new speaker's speaking characteristics. The corrective adaptation procedure will work with essentially any adaptation technique, including transformation-based adaptation techniques and Bayesian adaptation techniques.
The corrective adaptation procedure of the invention weights correct and incorrect speech models by a log likelihood ratio between the current model and the best hypothesis model. The system generates a set of N-best models and then analyzes these models to generate the log likelihood ratios. Because supervised adaptation is performed, and the correct label sequence is known, the N-best information is exploited by the system in a discriminative way. In the preferred system a positive weight is applied to the correct label and a negative weight is applied to all other labels.
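The weighting idea can be sketched as follows. This is only an illustrative reading of the paragraph above, not the formula claimed in the patent: the exponential form, the function name `corrective_weights`, and the constants `kappa`, `rho`, and `eta` are all assumptions made here for the example.

```python
import math
from typing import Dict, List, Tuple


def corrective_weights(
    nbest: List[Tuple[str, float]],
    correct_label: str,
    kappa: float = 2.0,
    rho: float = 0.5,
    eta: float = 0.1,
) -> Dict[str, float]:
    """Assign a weight to each N-best hypothesis.

    nbest         : (label sequence, log likelihood) pairs from the decoder
    correct_label : the known transcription (supervised adaptation)
    kappa         : assumed positive weight for the correct hypothesis
    rho, eta      : assumed scale factors for the incorrect hypotheses
    """
    best_ll = max(ll for _, ll in nbest)
    weights: Dict[str, float] = {}
    for label, ll in nbest:
        if label == correct_label:
            weights[label] = kappa
        else:
            # Log likelihood ratio against the best hypothesis is <= 0,
            # so the magnitude lies in (0, rho]; closer competitors are
            # pushed away more strongly.
            weights[label] = -rho * math.exp(eta * (ll - best_ll))
    return weights


hypotheses = [("call home", -100.0), ("call Rome", -98.0), ("tall home", -110.0)]
w = corrective_weights(hypotheses, correct_label="call home")
```

In this sketch the weights would then scale each hypothesis's contribution to whichever adaptation technique is in use, with the negative weights moving the incorrect models away from the adaptation data.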
In comparison with other discriminative methods, the corrective adaptation technique of the invention has several advantages. It is computationally inexpensive, and it is easy to implement. Moreover, the technique carries out discrimination that is specific to a given speaker, such that convergence is not an issue.
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating the adaptation system of the invention in its single-pass form;
FIG. 2 is a block diagram of the adaptation system illustrating how a multiple pass system may be implemented using iteration;
FIG. 3 is a flowchart diagram illustrating the corrective N-best decoding process of the invention.
REFERENCES:
patent: 5970239 (1999-11-01), Bahl et al.
C.J. Leggetter and P.C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, Computer Speech and Language, 1995, pp. 171-185.
Jean-Luc Gauvain and Chin-Hui Lee, Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains, IEEE Transactions on Speech and Audio Processing, vol. 2, No. 2, Apr. 1994, pp. 291-298.
Tomoko Matsui and Sadaoki Furui, N-Best-Based Instantaneous Speaker Adaptation Method for Speech Recognition, NTT Human Interface Laboratories, 3-9-11, Midori-cho, Musashino-shi, Tokyo, Japan, pp. 973-975.
Chen et al., An N-best candidates-based discriminative training for speech recognition applications, IEEE, Jan. 1994, pp. 206-216.*
Chow, Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm, IEEE, 1990, pp. 701-704.*
Korkmazsky et al., Discriminative training of the pronunciation networks, IEEE, 1997, pp. 223-229.*
Juang et al., Discriminative learning for minimum error classification, IEEE, Dec. 2, 1992, pp. 3043-3054.*
Seyed Mohammad Ahadi-Sarkani, Bayesian and Predictive Techniques for Speaker Adaptation, Jan. 1996.
M.J.F. Gales & P.C. Woodland, Variance Compensation Within The MLLR Framework, Feb. 1996.
Gelin Philippe
Junqua Jean-Claude
Nguyen Patrick
Harness & Dickey & Pierce P.L.C.
Panasonic Technologies Inc.
Šmits Tālivaldis Ivars
Profile ID: LFUS-PAI-O-2546697