Reexamination Certificate
2000-04-06
2002-10-22
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S236000, C704S239000, C704S243000
active
06470314
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to speech recognition systems and, more particularly, to methods and apparatus for rapidly adapting such speech recognition systems to new acoustic conditions via cumulative distribution function matching techniques.
BACKGROUND OF THE INVENTION
A real-world speech recognition system encounters several acoustic conditions in the course of its application. For instance, a speech recognition system that handles telephony transactions can be reached through a regular handset, a cellular phone or a speakerphone. Each represents a different acoustic environment. Currently, it is well known that a system trained only for a particular acoustic condition degrades drastically when it encounters a different acoustic condition. To avoid this problem, one normally trains a system with data representing all the possible acoustic environments. However, it is often difficult to anticipate all the different acoustic and channel conditions and, moreover, such a pooled system often becomes too large and, hence, computationally burdensome.
Earlier techniques to adapt the acoustic models to a specific environment may be roughly classified into “model transformation” and “feature space transformation” techniques. In these techniques, the test utterance is first decoded with a generic speaker independent system (first pass), and the transcription with errors is used to compute the extent of the mismatch between the generic model and the specific environment.
A specific example of “model transformation” is MLLR (Maximum Likelihood Linear Regression) as described in C. J. Leggetter and P. C. Woodland, “Speaker Adaptation of Continuous Density HMM's Using Multivariate Linear Regression,” ICSLP 1994, pp. 451-454, the disclosure of which is incorporated by reference herein. MLLR is based on the assumption that the model that is most suitable for transcribing the test speech is related to the generic model by means of a linear transform, i.e., the means and covariances of the Gaussians in the transformed model are related to the means and covariances of the Gaussians in the generic model by a linear transform. The parameters of the transformation are computed so that the likelihood of the test speech is maximized with the use of the transformed system, assuming that the first pass transcription is the correct transcription of the test speech.
In “feature space transformation” techniques, the feature space of the test utterance is assumed to be related to the generic feature space through a linear transformation, and the linear transformation is computed, as before, to maximize the likelihood of the test speech under the assumption that the first pass transcription is correct, see, e.g., A. Sankar and C. H. Lee, “A Maximum-likelihood Approach to Stochastic Matching for Robust Speech Recognition,” IEEE Trans., ASSP, 1995, the disclosure of which is incorporated by reference herein.
Other techniques to implement “feature space transformation” also exist, for example, see L. Neumeyer and M. Weintraub, “Probabilistic Optimum Filtering for Robust Speech Recognition,” ICASSP, 1994, pp. 417-420; and F. H. Liu, A. Acero and R. M. Stern, “Efficient Joint Compensation of Speech for the Effect of Additive Noise and Linear Filtering,” ICASSP, 1992, the disclosures of which are incorporated by reference herein. These techniques do not require a first pass decoding, but they do have the computational overhead of vector quantizing the acoustic space, and finding the center that is closest to each test feature vector.
SUMMARY OF THE INVENTION
The present invention provides rapid, computationally inexpensive, nonlinear transformation methods and apparatus for adaptation of speech recognition systems to new acoustic conditions. The methodologies of the present invention may be considered as falling under the category of “feature space transformation” techniques. Such inventive techniques have the advantage of being computationally much less expensive than the conventional techniques described above, as the techniques of the invention do not require a first pass decoding or a vector quantization computation.
Generally, the invention provides equalization via cumulative distribution function matching between training acoustic data and test acoustic data. The acoustic data is preferably in the form of cepstral vectors, although spectral vectors or even raw speech samples may be used. The present invention represents a more powerful and flexible transformation as the mapping of the test feature to the space of the training features is not constrained to be linear.
In an illustrative aspect of the invention, a method of adapting a speech recognition system to one or more acoustic conditions comprises the steps of: (i) computing cumulative distribution functions based on dimensions of speech vectors associated with training speech data provided to the speech recognition system; (ii) computing cumulative distribution functions based on dimensions of speech vectors associated with test speech data provided to the speech recognition system; (iii) computing a nonlinear transformation mapping based on the cumulative distribution functions associated with the training speech data and the cumulative distribution functions associated with the test speech data; and (iv) applying the nonlinear transformation mapping to speech vectors associated with the test speech data prior to recognition, wherein the speech vectors transformed in accordance with the nonlinear transformation mapping are substantially similar to speech vectors associated with the training speech data.
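Steps (i) through (iv) can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes empirical (rank-based) cumulative distribution functions computed independently for each vector dimension, and it assumes a batch of test vectors is available before mapping. The function names `sorted_columns`, `match_value` and `cdf_match` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right


def sorted_columns(vectors):
    """Steps (i)/(ii): sort each dimension on its own.

    A sorted column is a compact form of that dimension's empirical CDF:
    the CDF at x is (number of entries <= x) / (number of entries).
    """
    dims = len(vectors[0])
    return [sorted(v[d] for v in vectors) for d in range(dims)]


def match_value(x, test_sorted, train_sorted):
    """Step (iii): map x so its test CDF value equals a training CDF value."""
    n_test, n_train = len(test_sorted), len(train_sorted)
    k = bisect_right(test_sorted, x)  # test CDF at x is k / n_test
    # Smallest training rank r with r / n_train >= k / n_test,
    # computed with integer ceiling division to avoid float rounding.
    rank = (k * n_train + n_test - 1) // n_test
    rank = max(rank, 1)  # guard for x below every test sample
    return train_sorted[rank - 1]


def cdf_match(test_vectors, train_vectors):
    """Step (iv): apply the per-dimension mapping to every test vector."""
    train_cols = sorted_columns(train_vectors)
    test_cols = sorted_columns(test_vectors)
    return [
        [match_value(x, test_cols[d], train_cols[d]) for d, x in enumerate(vec)]
        for vec in test_vectors
    ]


# Toy data (made up): dimension 0 of the test set is a cubed copy of the
# training values, dimension 1 is shifted and scaled.
train = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
test = [[27.0, 10.0], [1.0, 12.0], [8.0, 14.0]]
mapped = cdf_match(test, train)
```

Because the mapping depends only on ranks, it undoes any monotonic distortion of a dimension, linear or not, which is the sense in which the transformation is not constrained to be linear.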
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
REFERENCES:
patent: 4897878 (1990-01-01), Boll et al.
patent: 5548647 (1996-08-01), Naik et al.
patent: 5864810 (1999-01-01), Digalakis et al.
patent: 6026359 (2000-02-01), Yamaguchi et al.
patent: 6151574 (2000-11-01), Lee et al.
patent: 6230125 (2001-05-01), Vainio
patent: 6263334 (2001-07-01), Fayyad et al.
Bellegarda et al (“Robust Speaker Adaptation using a Piecewise Linear Acoustic Mapping”, 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 1992 pp. 445-448 vol. 1).*
Reynolds et al (“The Effects Of Telephone Transmission Degradations On Speaker Recognition Performance”, 1995 International Conference on Acoustics, Speech, and Signal Processing, May 1995).*
Yuk et al (“Telephone Speech Recognition Using Neural Networks And Hidden Markov Models”, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Mar. 1999).*
A. Sankar, “A Maximum-Likelihood Approach to Stochastic Matching for Robust Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 4, No. 3, pp. 190-202, 1996.
C.J. Leggetter et al., “Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models,” Computer Speech and Language, Academic Press Limited, pp. 173-185, 1995.
L. Neumeyer et al., “Probabilistic Optimum Filtering for Robust Speech Recognition,” IEEE, pp. I-417-I-420, 1994.
F-H. Liu et al., “Efficient Joint Compensation of Speech for the Effects of Additive Noise and Linear Filtering,” ICASSP, 4 pages, 1992.
Dharanipragada Satyanarayana
Padmanabhan Mukund
Dorvil Richemond
International Business Machines Corporation
Nolan Daniel
Otterstedt Paul J.
Ryan & Mason & Lewis, LLP