Patent
1995-01-23
1998-02-03
Zele, Krista M.
395 254, 395 258, 395 259, 395 253, G10L 900
active
057153671
ABSTRACT:
A computerized system time aligns frames of spoken training data against models of the speech sounds; automatically selects different sets of phonetic context classifications which divide the speech sound models into speech sound groups aligned against acoustically similar frames; creates model components from the frames aligned against speech sound groups with related classifications; and uses these model components to build a separate model for each related speech sound group. A decision tree classifies speech sounds into such groups, and related speech sound groups descend from common tree nodes. New speech samples time aligned against a given speech sound group's model update models of related speech sound groups, decreasing the training data required to adapt the system. The phonetic context classifications can be based on knowledge of which contextual features are associated with acoustic similarity. The computerized system samples speech sounds using a first, larger, parameter set; automatically selects combinations of phonetic context classifications which divide the speech sounds into groups whose frames are acoustically similar, such as by use of a decision tree; selects a second, smaller, set of parameters based on that set's ability to separate the frames aligned with each speech sound group, such as by use of linear discriminant analysis; and then uses these new parameters to represent frames and speech sound models. Then, using the new parameters, a decision tree classifier can be used to re-classify the speech sounds and to calculate new acoustic models for the resulting groups of speech sounds.
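The abstract's step of selecting phonetic context classifications that divide speech sounds into acoustically similar groups can be illustrated with a single decision-tree split. The following is a minimal sketch, not the patented method: the abstract does not state a splitting criterion, so the sketch assumes reduction in within-group squared scatter as the measure of acoustic similarity. The `(left_phone, right_phone)` context representation, the question names, and the toy frame values are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]                 # one acoustic parameter vector
Context = Tuple[str, str]               # hypothetical (left_phone, right_phone)
Question = Callable[[Context], bool]    # a yes/no phonetic context classification


def scatter(frames: List[Frame]) -> float:
    """Sum of squared distances of the frames from their mean (0.0 if empty)."""
    if not frames:
        return 0.0
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return sum((f[d] - mean[d]) ** 2 for f in frames for d in range(dim))


def best_question(
    data: List[Tuple[Context, Frame]],
    questions: Dict[str, Question],
) -> Tuple[Optional[str], float]:
    """Return the question whose yes/no split most reduces total scatter.

    Assumed criterion: gain = scatter(parent) - scatter(yes) - scatter(no).
    Questions that leave one side empty are skipped.
    """
    parent = scatter([frame for _, frame in data])
    best_name, best_gain = None, 0.0
    for name, question in questions.items():
        yes = [frame for ctx, frame in data if question(ctx)]
        no = [frame for ctx, frame in data if not question(ctx)]
        if not yes or not no:
            continue
        gain = parent - scatter(yes) - scatter(no)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain


# Hypothetical context questions and toy two-dimensional frames.
NASALS = {"m", "n", "ng"}
VOWELS = {"a", "e", "i", "o", "u"}
questions = {
    "left_is_nasal": lambda ctx: ctx[0] in NASALS,
    "right_is_vowel": lambda ctx: ctx[1] in VOWELS,
}
data = [
    (("m", "t"), [1.0, 1.1]),
    (("n", "a"), [0.9, 1.0]),
    (("s", "a"), [3.0, 3.2]),
    (("t", "t"), [3.1, 2.9]),
]

name, gain = best_question(data, questions)
print(name, round(gain, 2))
```

In this toy data the frames with a nasal to the left cluster together, so the left-context question yields by far the larger gain. Applying the same selection recursively to each resulting group would grow a tree in which related speech sound groups descend from common nodes, as the abstract describes.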
REFERENCES:
patent: 5497447 (1996-03-01), Bahl et al.
Hwang and Huang, Shared-Distribution Hidden Markov Models for Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol. 1 (1993).
Bahl, deSouza, Gopalakrishnan, Nahamoo, and Picheny, Decision Trees for Phonological Rules in Continuous Speech, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1991.
Hwang, Huang, and Alleva, Predicting Unseen Triphones With Senones, in IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993.
Young, Odell, and Woodland, Tree-Based State Tying for High Accuracy Modelling, ARPA Human Language Technology Workshop, Mar. 8-11, 1994.
Woodland, Leggetter, Odell, Valtchev and Young, The Development of the 1994 HTK Large Vocabulary Speech Recognition System, ARPA Spoken Language System Technology Workshop, Jan. 22-25, 1995.
Bahl, Balakrishnan-Aiyer, Franz, Gopalakrishnan, Gopinath, Novak, Padmanabhan, and Roukos, Performance of the IBM Large Vocabulary Continuous Speech Recognition System On the ARPA NAB News Task, ARPA Spoken Language System Technology Workshop, Jan. 22-25, 1995.
Nguyen, Makhoul, Schwartz, Kubala, LaPre, Yuan, Zhao, Anastasakos, and Zavaliagkos, (draft) The 1994 BBN/BYBLOS Speech Recognition, ARPA Spoken Language System Technology Workshop, Jan. 22-25, 1995.
Gillick Laurence S.
Scattone Francesco
Dragon Systems, Inc.
Hunter Daniel S.
Porter Edward W.
Zele Krista M.
Apparatuses and methods for developing and using models for spee