Reexamination Certificate
1999-01-21
2001-11-13
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S257000
active
06317712
ABSTRACT:
FIELD OF INVENTION
This invention relates to phonetic modeling of speech and more particularly to phonetic modeling using acoustic decision trees.
BACKGROUND OF INVENTION
Although a language has very few phones, modeling only those few phones is not sufficient for speech recognition. The coarticulation effect makes the acoustic realization of the same phone very different in different contexts. For example, English has about 40 to 50 phones, and Spanish has a little more than 20. Training only 50 phonetic models for English is not sufficient to cover all the coarticulation effects, which is why context-dependent models are used for speech recognition. Context-dependent phonetic modeling has now become standard practice for modeling the variations in the acoustics of a phone caused by phonetic context. However, even if only the immediate left and right contexts are considered, there are $50^3 = 125{,}000$ models to be trained. This large number of models defeats the motivation for using phonetic models in the first place. Fortunately, some contexts result in large acoustic differences and some do not. Therefore, the phonetic models can be clustered, not just to reduce the number of models but also to increase training robustness.
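The model count quoted above follows from choosing a left context, a center phone, and a right context independently from the phone inventory. A minimal sketch of that arithmetic is given here; the function name `triphone_count` is hypothetical and used only for illustration.

```python
def triphone_count(num_phones: int) -> int:
    """Number of distinct (left, center, right) phone triples."""
    # One choice each for the left context, center phone, and right context.
    return num_phones ** 3


# English phone inventory sizes mentioned in the text (about 40 to 50 phones).
for inventory_size in (40, 50):
    print(inventory_size, "phones ->", triphone_count(inventory_size), "triphone models")
```

With 50 phones the count is 125,000, which is the figure the text uses to motivate clustering.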
Figuring out how to cluster phonetic models is one of the core research areas in the speech community for large vocabulary speech recognition. The clustering algorithm needs to achieve three goals: 1) maintain high acoustic resolution while achieving the most clustering, 2) ensure that all clustered units are well trainable with the available speech data, and 3) be able to predict unseen contexts with the clustered models. Decision tree clustering using phonological rules has been shown to achieve these objectives. See, for example, D. B. Paul, “Extensions to Phone-state Decision-tree Clustering: Single Tree and Tagged Clustering,” Proc. ICASSP 97, Munich, Germany, April 1997.
Previously, applicant reported on FeaturePhones, a phonetic context clustering method that defines context in terms of articulatory features and clusters the contexts at the phone level using decision trees. See Y. H. Kao et al., “Toward Vocabulary Independent Telephone Speech Recognition,” ICASSP 1994, Vol. 1, pp. 117-120, and K. Kondo et al., “Clustered Interphase or Word Context-Dependent Models for Continuously Read Japanese,” Journal of Acoustical Society of Japan, Vol. 16, No. 5, pp. 299-310, 1995. This proved to be an efficient clustering method when training data was scarce, but it was too restrictive to take advantage of significantly more training data.
SUMMARY OF INVENTION
In accordance with one embodiment of the present invention, a method of phonetic modeling applies a decision tree algorithm at the acoustic level by the steps of: training baseform monophone models; training all triphone models present in the training corpus, with the monophones as seeds for each center phone; splitting the root node into two descendant nodes; repeating the splitting procedure on all leaves; and clustering the leaves of the tree or averaging the models in each cluster to obtain seed models for each cluster.
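The splitting and averaging steps can be illustrated with a small sketch. It is not the patented procedure: the text above does not state the split criterion, so the sketch assumes a common one (the log-likelihood gain of modeling each child with its own single Gaussian) and uses a one-dimensional feature for brevity. The names `Triphone`, `make`, `loglik`, `best_split`, `grow`, and `seed_mean`, the two phonological questions, and the thresholds `min_gain` and `min_count` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Triphone:
    left: str
    center: str
    right: str
    n: float   # number of training frames
    s1: float  # sum of a 1-D feature over those frames
    s2: float  # sum of squares of that feature


def make(left, center, right, n, mean, var):
    """Build sufficient statistics from a per-triphone mean and variance."""
    return Triphone(left, center, right, n, n * mean, n * (var + mean * mean))


def loglik(items, var_floor=1e-3):
    """Log-likelihood of the pooled frames under one ML-fitted Gaussian."""
    n = sum(t.n for t in items)
    if n == 0:
        return 0.0
    mean = sum(t.s1 for t in items) / n
    var = max(sum(t.s2 for t in items) / n - mean * mean, var_floor)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def best_split(items, questions, min_count):
    """Return (gain, yes, no) for the question with the largest gain, or None."""
    base, best = loglik(items), None
    for side, phones in questions:
        yes = [t for t in items if getattr(t, side) in phones]
        no = [t for t in items if getattr(t, side) not in phones]
        # Skip splits that leave a child with too little data to train.
        if min(sum(t.n for t in yes), sum(t.n for t in no)) < min_count:
            continue
        gain = loglik(yes) + loglik(no) - base
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    return best


def grow(items, questions, min_gain=5.0, min_count=10):
    """Split recursively; return the leaves (clusters) as lists of triphones."""
    split = best_split(items, questions, min_count)
    if split is None or split[0] < min_gain:
        return [items]
    _, yes, no = split
    return (grow(yes, questions, min_gain, min_count)
            + grow(no, questions, min_gain, min_count))


def seed_mean(cluster):
    """Count-weighted average of the member models, used as a cluster seed."""
    return sum(t.s1 for t in cluster) / sum(t.n for t in cluster)


# Toy data for center phone "a": nasal right contexts shift the feature mean.
triphones = [
    make("b", "a", "n", 50, 2.0, 0.1),
    make("d", "a", "m", 40, 2.1, 0.1),
    make("b", "a", "t", 60, 0.0, 0.1),
    make("d", "a", "k", 30, 0.1, 0.1),
]
questions = [("right", {"m", "n"}), ("left", {"b"})]  # hypothetical question set
leaves = grow(triphones, questions)
for leaf in leaves:
    print([f"{t.left}-{t.center}+{t.right}" for t in leaf], round(seed_mean(leaf), 3))
```

On this toy data the tree splits once, on the nasal right-context question, and then stops because further splits gain too little. In a real system the statistics would come from the trained triphone models and the features would be multidimensional.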
REFERENCES:
patent: 5388183 (1995-02-01), Lynch
patent: 5745649 (1998-04-01), Lubensky
patent: 5794197 (1998-08-01), Alleva et al.
patent: 5812975 (1998-09-01), Komori et al.
patent: 6006186 (1999-12-01), Chen et al.
ICASSP-93. Alleva et al., “Predicting unseen triphones with senones” PP 311-314, vol. 2. Apr. 1993.*
ICSLP 96. International Conference on Spoken Language, 1996. Aubert et al., “A bottom-up approach for handling unseen triphones in vocabulary continuous speech recognition” PP 14-17 vol. 1. Oct. 199.
Kao Yu-Hung
Kondo Kazuhiro
Dorvil Richemond
Telecky Jr Frederick J.
Texas Instruments Incorporated
Troike Robert L.
Method of phonetic modeling using acoustic decision tree