Reexamination Certificate
1999-11-29
2003-02-25
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S256000, C704S255000, C704S246000, C704S243000, C704S244000
active
06526379
ABSTRACT:
BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to statistical model-based speech recognition systems. More particularly, the invention relates to a system and method for improving the accuracy of acoustic models used by the recognition system while at the same time controlling the number of parameters. The discriminative clustering technique allows robust recognizers of small size to be constructed for resource-limited applications such as in embedded systems and consumer products.
Much of today's automatic speech recognition technology relies upon a Hidden Markov Model (HMM) representation of features extracted from digitally recorded speech. A Hidden Markov Model is represented by a set of states, a set of vectors defining transitions between certain pairs of states, probabilities that apply to state-to-state transitions, and further sets of probabilities characterizing observed output symbols and initial conditions. Frequently the probabilities associated with the Hidden Markov Model are represented as Gaussians, with the mean and variance of each Gaussian stored as floating point numbers.
Hidden Markov Models can become quite complex, particularly as the number of states representing each speech unit is increased and as more complex Gaussian mixture density components are used. Complexity is further compounded by the need to have additional sets of models to support context-dependent recognition. For example, to support context-dependent recognition in a recognizer that models phonemes, different sets of Gaussians are typically required to represent the different allophones of each phoneme.
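To make the growth in model size concrete, the following is a rough sketch of the arithmetic, assuming diagonal-covariance Gaussian mixtures in which each Gaussian stores one mean and one variance per feature dimension plus one mixture weight. The function name and every figure in the example (40 phonemes, 1000 allophone models, 3 states, 4 mixture components, 39 features) are illustrative assumptions, not values taken from the specification.

```python
def gaussian_param_count(num_models, states_per_model, mixtures_per_state, feature_dim):
    """Count floating point parameters held in the Gaussians of a set of HMMs.

    Assumes diagonal covariances: each Gaussian stores `feature_dim` means,
    `feature_dim` variances, and one mixture weight.
    """
    per_gaussian = 2 * feature_dim + 1
    return num_models * states_per_model * mixtures_per_state * per_gaussian


# Hypothetical context-independent recognizer: 40 phoneme models.
context_independent = gaussian_param_count(40, 3, 4, 39)    # 37920

# Hypothetical context-dependent recognizer: 1000 allophone models.
context_dependent = gaussian_param_count(1000, 3, 4, 39)    # 948000
```

Under these assumed figures, moving to context-dependent models multiplies the parameter count by 25, which illustrates why the number of Gaussians dominates the memory footprint.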
The above complexity carries a price. Recognizers with more sophisticated, and hence more robust, models typically require a large amount of memory and processing power. This places a heavy burden on embedded systems and speech-enabled consumer products, because these typically do not have much memory or processing power to spare. What is needed, therefore, is a technique for reducing the number of Gaussians needed to represent speech, while retaining as much accuracy as possible. For the design of memory-restricted embedded systems and computer products, the most useful solution would give the system designer control over the total number of parameters used.
The present invention provides a technique for improving modeling power while reducing the number of parameters. In its preferred embodiment, the technique takes a bottom-up approach to defining clusters of Gaussians that are sufficiently close to one another to warrant being merged. In its preferred form, the technique begins with as many clusters as there are Gaussians used to represent the states of the Hidden Markov Models. Clusters are then agglomerated, in tree fashion, to minimize the dispersion inside each cluster and to maximize the separation between clusters. The agglomerative process proceeds until the desired number of clusters is reached. The system designer may specify the desired number based on memory footprint and processing architecture. A Lloyd-Max clustering algorithm is then performed to move Gaussians from one cluster to another in order to further decrease the dispersion within clusters.
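The two stages described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented procedure: Gaussians are taken to be diagonal-covariance (mean, variance) pairs, closeness is measured with the Bhattacharyya distance that this summary names as its metric, the merge criterion is simply "closest pair of cluster representatives", and the cluster representative is a plain moment-matched Gaussian standing in for the minimum mean Bhattacharyya center. The function names `bhattacharyya`, `merge`, `agglomerate`, and `refine` are hypothetical.

```python
import math
from itertools import combinations

# A Gaussian is a (mean, variance) pair of equal-length lists (diagonal covariance).

def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    d = 0.0
    for a, s, b, t in zip(m1, v1, m2, v2):
        avg = 0.5 * (s + t)
        d += 0.125 * (a - b) ** 2 / avg + 0.5 * math.log(avg / math.sqrt(s * t))
    return d

def merge(members):
    """Stand-in representative: moment-matched Gaussian of equally weighted members."""
    n, dim = len(members), len(members[0][0])
    mean = [sum(m[i] for m, _ in members) / n for i in range(dim)]
    var = [sum(v[i] + m[i] ** 2 for m, v in members) / n - mean[i] ** 2
           for i in range(dim)]
    return mean, var

def agglomerate(gaussians, target):
    """Start with one cluster per Gaussian; merge the closest pair until `target` remain."""
    clusters = [[g] for g in gaussians]
    while len(clusters) > target:
        cents = [merge(c) for c in clusters]
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: bhattacharyya(cents[p[0]], cents[p[1]]))
        merged = clusters.pop(j)          # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

def refine(clusters, iterations=10):
    """Lloyd-style pass: move each Gaussian to its nearest representative, then re-estimate."""
    for _ in range(iterations):
        cents = [merge(c) for c in clusters]
        new = [[] for _ in cents]
        for c in clusters:
            for g in c:
                k = min(range(len(cents)), key=lambda k: bhattacharyya(g, cents[k]))
                new[k].append(g)
        new = [c for c in new if c]       # a cluster that loses all members is dropped
        if new == clusters:               # no Gaussian moved: converged
            break
        clusters = new
    return clusters
```

A call such as `refine(agglomerate(gaussians, target=256))` would reduce a list of Gaussians to at most 256 clusters, where 256 is an arbitrary example of a designer-chosen budget.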
Unlike conventional systems that tend to merely average Gaussian mean and variance values together, the method of the present invention employs a set of equations that provides the parameters representative of each cluster (e.g., its centroid), so that the Bhattacharyya distance is minimized inside the cluster. This provides a far better way of estimating the parameters representative of the cluster, because it is consistent with the metric used to associate the Gaussians with the cluster itself. In the preferred implementation, the Bhattacharyya distance is minimized through an iterative procedure that we call the minimum mean Bhattacharyya center algorithm.
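For reference, the standard closed form of the Bhattacharyya distance between two Gaussians is given below, followed by one way to write the centroid criterion the paragraph above describes. The symbols (the means, covariances, and the cluster size N) are notation assumed here for illustration; the specification's own equations are not reproduced in this summary, and the second expression is only a reading of "minimum mean Bhattacharyya center".

```latex
% Bhattacharyya distance between N(\mu_1, \Sigma_1) and N(\mu_2, \Sigma_2)
D_B = \frac{1}{8} (\mu_1 - \mu_2)^{\top} \Sigma^{-1} (\mu_1 - \mu_2)
    + \frac{1}{2} \ln \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \, \det \Sigma_2}},
\qquad \Sigma = \frac{\Sigma_1 + \Sigma_2}{2}

% Assumed form of the cluster center: the Gaussian minimizing the mean
% distance to the N member Gaussians of the cluster
(\mu_c, \Sigma_c) = \arg\min_{\mu, \Sigma} \;
    \frac{1}{N} \sum_{i=1}^{N}
    D_B\bigl( \mathcal{N}(\mu_i, \Sigma_i), \, \mathcal{N}(\mu, \Sigma) \bigr)
```

The first term measures the separation of the means and the second the mismatch between the covariances, so simply averaging means and variances generally does not minimize this criterion.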
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.
Junqua Jean-Claude
Rigazio Luca
Tsakam Brice
Chawan Vijay
Harness Dickey & Pierce PLC