Reexamination Certificate
2000-06-01
2003-08-19
Knepper, David D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S243000, C382S190000
active
06609093
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to pattern recognition systems and, more particularly, to methods and apparatus for performing discriminant feature space analysis in pattern recognition systems such as, for example, speech recognition systems.
BACKGROUND OF THE INVENTION
State-of-the-art speech recognition systems use cepstral features augmented with dynamic information from the adjacent speech frames. The standard MFCC+Δ+ΔΔ scheme (Mel-Frequency Cepstral Coefficients plus first and second derivatives, or delta and double delta), while performing relatively well in practice, has no real basis from a discriminant analysis point of view. The same argument applies for the computation of the cepstral coefficients from the spectral features: it is not clear that the discrete cosine transform, among all linear transformations, has the best discriminatory properties even if its use is motivated by orthogonality considerations.
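As an illustration of the MFCC+Δ+ΔΔ scheme described above, the following minimal sketch appends first and second time differences to a matrix of cepstral frames. The function name `add_deltas`, the two-frame central difference, and the edge padding are assumptions made for this example; practical front ends often use a longer regression window.

```python
import numpy as np

def add_deltas(cepstra: np.ndarray) -> np.ndarray:
    """Append delta and double-delta columns to a (frames, coeffs) array."""

    def delta(x: np.ndarray) -> np.ndarray:
        # Central difference over the adjacent frames; the first and last
        # frames are repeated at the edges (an assumption of this sketch).
        padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
        return (padded[2:] - padded[:-2]) / 2.0

    d1 = delta(cepstra)   # first derivative (delta)
    d2 = delta(d1)        # second derivative (double delta)
    return np.hstack([cepstra, d1, d2])

# Hypothetical input: 5 frames of 2 coefficients each.
frames = np.arange(10, dtype=float).reshape(5, 2)
augmented = add_deltas(frames)
```

The output has three times as many columns as the input, which is the fixed, hand-designed expansion that a discriminant transform is meant to replace or refine.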
Linear discriminant analysis (LDA) is a standard technique in statistical pattern classification for dimensionality reduction with a minimal loss in discrimination, see, e.g., R. O. Duda et al., “Pattern Classification and Scene Analysis,” Wiley, New York, 1973; and K. Fukunaga, “Introduction to Statistical Pattern Recognition,” Academic Press, New York, 1990, the disclosures of which are incorporated by reference herein. Its application to speech recognition has shown consistent gains for small vocabulary tasks and mixed results for large vocabulary applications, see, e.g., R. Haeb-Umbach et al., “Linear Discriminant Analysis for Improved Large Vocabulary Continuous Speech Recognition,” Proceedings of ICASSP '92, Volume 1, pp. 13-16, 1992; E. G. Schukat-Talamazzini et al., “Optimal Linear Feature Space Transformations for Semi-Continuous Hidden Markov Models,” Proceedings of ICASSP '95, pp. 369-372, 1994; and N. Kumar et al., “Heteroscedastic Discriminant Analysis and Reduced Rank HMMs for Improved Speech Recognition,” Speech Communication, 26:283-297, 1998, the disclosures of which are incorporated by reference herein.
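For orientation, classical LDA can be sketched as follows: the projection is taken from the leading eigenvectors of the product of the inverse within-class scatter and the between-class scatter. This is the textbook formulation, not code from the cited references; the name `lda_projection` and the synthetic data are assumptions of the example.

```python
import numpy as np

def lda_projection(X: np.ndarray, labels: np.ndarray, p: int) -> np.ndarray:
    """Return a (p, n) LDA matrix from samples X of shape (N, n)."""
    n = X.shape[1]
    mean = X.mean(axis=0)
    W = np.zeros((n, n))  # pooled within-class scatter
    B = np.zeros((n, n))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        B += len(Xc) * np.outer(mc - mean, mc - mean)
    # Leading eigenvectors of W^{-1} B span the most discriminative directions.
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(evals.real)[::-1][:p]
    return evecs.real[:, order].T

# Synthetic two-class data separated along the first axis (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 3)),
               rng.normal(size=(200, 3)) + [5.0, 0.0, 0.0]])
labels = np.repeat([0, 1], 200)
theta = lda_projection(X, labels, p=1)
```

Because a single pooled within-class scatter `W` is used, this formulation implicitly assumes that all classes share the same covariance.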
One reason for the mixed results may be the diagonal modeling assumption that is imposed on the acoustic models in most systems: if the dimensions of the projected subspace are highly correlated, then a diagonal covariance modeling constraint will result in distributions with large overlap and low sample likelihood. In this case, a maximum likelihood feature space transformation which aims at minimizing the loss in likelihood between full and diagonal covariance models is known to be very effective, see, e.g., R. A. Gopinath, “Maximum Likelihood Modeling with Gaussian Distributions for Classification,” Proceedings of ICASSP '98, Seattle, 1998; and M. J. F. Gales, “Semi-tied Covariance Matrices for Hidden Markov Models,” IEEE Transactions on Speech and Audio Processing, 7:272-281, 1999, the disclosures of which are incorporated by reference herein.
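The loss in likelihood mentioned above can be made concrete for a single Gaussian. Assuming both the full and the diagonal model are maximum likelihood fits to the same samples, the average per-sample log-likelihood gap reduces to half the difference between the log-determinant of the diagonal of the sample covariance and the log-determinant of the full sample covariance. The sketch below computes that gap; the function name and the example matrix are illustrative.

```python
import numpy as np

def diagonal_likelihood_loss(cov: np.ndarray) -> float:
    """Per-sample log-likelihood lost by keeping only the diagonal of cov."""
    _, logdet_full = np.linalg.slogdet(cov)
    logdet_diag = np.sum(np.log(np.diag(cov)))
    # Non-negative by Hadamard's inequality; zero when cov is diagonal.
    return 0.5 * (logdet_diag - logdet_full)

# Two highly correlated dimensions versus two uncorrelated ones.
correlated = np.array([[1.0, 0.9],
                       [0.9, 1.0]])
loss_correlated = diagonal_likelihood_loss(correlated)
loss_uncorrelated = diagonal_likelihood_loss(np.eye(2))
```

The gap grows as the correlation between dimensions increases, which is why a decorrelating transform helps diagonal covariance models.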
Secondly, it is not clear what the best definition for the classes should be: phone, subphone, allophone or even prototype-level classes can be considered, see, e.g., R. Haeb-Umbach et al., “Linear Discriminant Analysis for Improved Large Vocabulary Continuous Speech Recognition,” Proceedings of ICASSP '92, Volume 1, pp. 13-16, 1992, the disclosure of which is incorporated by reference herein. Related to this argument, the class assignment procedure has an impact on the performance of LDA; EM-based (Expectation Maximization algorithm based) approaches which aim at jointly optimizing the feature space transformation and the model parameters have been proposed, see, e.g., the above-referenced E. G. Schukat-Talamazzini et al. article; the above-referenced N. Kumar et al. article; and the above-referenced M. J. F. Gales article.
Chronologically, the extension of LDA to Heteroscedastic Discriminant Analysis (HDA) under the maximum likelihood framework appears to have been proposed first by E. G. Schukat-Talamazzini in the above-referenced article (called maximum likelihood rotation). N. Kumar, in the above-referenced N. Kumar et al. article, studied the case for diagonal covariance modeling and general (not necessarily orthogonal) transformation matrices and made the connection with LDA. Following an argument of Campbell, in N. A. Campbell, “Canonical Variate Analysis—A General Model Formulation,” Australian Journal of Statistics, 26(1):86-96, 1984, the disclosure of which is incorporated by reference herein, N. Kumar showed that HDA is a maximum likelihood solution for normal populations with common covariances in the rejected subspace. In R. A. Gopinath, “Maximum Likelihood Modeling with Gaussian Distributions for Classification,” Proceedings of ICASSP '98, Seattle, 1998, the disclosure of which is incorporated by reference herein, a maximum likelihood linear transformation (MLLT) was introduced which turns out to be a particular case of Kumar's HDA when the dimensions of the original and the projected space are the same. Interestingly, M. J. F. Gales' global transform for semi-tied covariance matrices, in the above-referenced M. J. F. Gales article, is identical to MLLT but applied in the model space (all other cases are feature space transforms). Finally, Demuynck in K. Demuynck, et al., “Optimal Feature Sub-space Selection Based On Discriminant Analysis,” Proceedings of Eurospeech '99, Budapest, Hungary, 1999, the disclosure of which is incorporated by reference herein, uses a minimum divergence criterion between posterior class distributions in the original and transformed space to estimate an HDA matrix.
Thus, as suggested above, LDA is known to be inappropriate for the case of classes with unequal sample covariances. While, in recent years, there has been an interest in generalizing LDA to HDA by removing the equal within-class covariance constraint, as mentioned above, no substantially satisfactory approach has been developed. One main reason is that existing approaches deal with objective functions related to the rejected dimensions, which are irrelevant to the discrimination of the classes in the final projected space. Thus, a need exists for an improved HDA approach for use in pattern recognition systems.
SUMMARY OF THE INVENTION
The present invention provides a new approach to heteroscedastic discriminant analysis (HDA) by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Accordingly, in one aspect of the invention, a method for use in a pattern recognition system of processing feature vectors extracted from a pattern signal input to the system, comprises the following steps. First, a projection matrix is formed based on a heteroscedastic discriminant objective function which, when applied to the feature vectors extracted from the pattern signal, maximizes class discrimination in a resulting subspace associated with the feature vectors, while ignoring one or more rejected dimensions in the objective function. The projection matrix is then applied to the feature vectors extracted from the pattern signal to generate transformed feature vectors for further processing in the pattern recognition system. For example, further processing may comprise classifying the transformed features associated with the input pattern signal. It may also include filtering, re-ranking or sorting the output of the classification operation.
In addition, we present a link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained maximum likelihood (ML) projection for a full covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume.
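The objective function itself is not given in this portion of the text. As a hedged illustration only, one objective that depends solely on projected quantities, and so ignores the rejected dimensions, is N·log|θBθᵀ| − Σⱼ Nⱼ·log|θΣⱼθᵀ|, where θ is the p×n projection matrix, B the between-class scatter, and Σⱼ and Nⱼ the covariance and sample count of class j. The sketch below evaluates this assumed form and applies a projection to feature vectors; all names and values are hypothetical.

```python
import numpy as np

def hda_objective(theta, class_covs, class_counts, between_scatter):
    """Evaluate N*log|theta B theta^T| - sum_j N_j*log|theta Sigma_j theta^T|.

    This form is an assumption made for illustration; it uses only
    quantities measured in the projected p-dimensional subspace.
    """
    _, between = np.linalg.slogdet(theta @ between_scatter @ theta.T)
    value = sum(class_counts) * between
    for cov, count in zip(class_covs, class_counts):
        _, within = np.linalg.slogdet(theta @ cov @ theta.T)
        value -= count * within
    return value

def project(theta, features):
    """Apply a (p, n) projection to feature vectors stored as rows."""
    return features @ theta.T

# Hypothetical two-class statistics in two dimensions.
theta = np.array([[1.0, 0.0]])
covs = [np.eye(2), 2.0 * np.eye(2)]
counts = [10, 10]
B = np.diag([4.0, 1.0])
value = hda_objective(theta, covs, counts, B)
transformed = project(theta, np.ones((3, 2)))
```

In this assumed form the value is unchanged when θ is rescaled, so only the subspace spanned by its rows matters. A numerical optimizer would search over θ to maximize it.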
The present invention also provides that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (e.g., a maximum likelihood linear transformation, MLLT) to the HDA space results in an increased classification accuracy.
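To illustrate why a diagonalizing transform helps under diagonal covariance modeling, the sketch below evaluates a commonly used MLLT-style criterion, N·log|det A| − ½ Σⱼ Nⱼ·log|diag(AΣⱼAᵀ)|, for a square transform A. The criterion is written from general knowledge of MLLT rather than from this text, and the function name and values are assumptions of the example.

```python
import numpy as np

def mllt_objective(A, class_covs, class_counts):
    """Evaluate N*log|det A| - 0.5 * sum_j N_j*log|diag(A Sigma_j A^T)|."""
    _, logdet = np.linalg.slogdet(A)
    value = sum(class_counts) * logdet
    for cov, count in zip(class_covs, class_counts):
        diag_vars = np.diag(A @ cov @ A.T)
        value -= 0.5 * count * np.sum(np.log(diag_vars))
    return value

# One class with strongly correlated dimensions (illustrative values).
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
_, eigvecs = np.linalg.eigh(cov)
rotation = eigvecs.T  # orthogonal transform that diagonalizes cov

identity_score = mllt_objective(np.eye(2), [cov], [100])
rotated_score = mllt_objective(rotation, [cov], [100])
```

With a single class, the rotation onto the eigenvectors scores higher than the identity because it removes the correlation that a diagonal model cannot represent. With several classes, no single rotation diagonalizes every covariance, so A is found by numerical optimization.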
In another embodiment
Gopinath Ramesh Ambat
Padmanabhan Mukund
Saon George Andrei
Dang Tax Ann
International Business Machines Corporation
Knepper David D.
Ryan, Mason & Lewis, LLP