Reexamination Certificate
2005-03-01
2008-08-05
McFadden, Susan (Department: 2626)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
active
07409346
ABSTRACT:
A structured generative model of speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonances (VTRs) are generated using prior information about the resonance targets in the phone sequence. Bi-directional temporal filtering with a finite impulse response (FIR) filter is applied to the segmental target sequence, which serves as the filter's input. At the second stage, the dynamics of speech cepstra are predicted analytically from the FIR-filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics, where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the acoustic observation probability given a phone sequence. Using this probability, different phone sequences can be compared and ranked by their respective probability values, which permits the use of the model for phonetic recognition.
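The two stages in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and the abstract does not specify the following details, which are all assumptions: the exponentially decaying symmetric kernel (gamma, half_width), the edge handling, the all-pole resonance-to-cepstrum formula, the fixed bandwidths, the 8 kHz sampling rate, the unit-variance Gaussian residual, and every function name.

```python
import math

def fir_smooth(targets, gamma=0.6, half_width=3):
    """Stage 1 (assumed form): symmetric, non-causal FIR smoothing of a
    piecewise-constant target track, so each frame sees past and future targets."""
    offsets = range(-half_width, half_width + 1)
    weights = [gamma ** abs(k) for k in offsets]
    norm = sum(weights)
    last = len(targets) - 1
    out = []
    for t in range(len(targets)):
        acc = 0.0
        for k, w in zip(offsets, weights):
            idx = min(max(t + k, 0), last)  # repeat the edge target at boundaries
            acc += w * targets[idx]
        out.append(acc / norm)
    return out

def cepstra_from_vtr(freqs_hz, bandwidths_hz, order=4, fs=8000.0):
    """Stage 2 (assumed form): cepstrum of an all-pole model whose pole pairs
    sit at the given resonance frequencies and bandwidths."""
    ceps = []
    for n in range(1, order + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            total += math.exp(-math.pi * n * b / fs) * math.cos(2.0 * math.pi * n * f / fs)
        ceps.append(2.0 * total / n)
    return ceps

def predict_cepstra(phone_targets, durations, bandwidths_hz):
    """Expand per-phone targets to frames, smooth each resonance, map to cepstra."""
    frames = [tgt for tgt, d in zip(phone_targets, durations) for _ in range(d)]
    n_res = len(frames[0])
    tracks = [fir_smooth([fr[r] for fr in frames]) for r in range(n_res)]
    return [cepstra_from_vtr([trk[t] for trk in tracks], bandwidths_hz)
            for t in range(len(frames))]

def score(phone_targets, durations, observed, bandwidths_hz, variance=1.0):
    """Log-probability of observed cepstra under an assumed isotropic Gaussian
    centered on the predicted cepstra; higher means a better-matching hypothesis."""
    total = 0.0
    for obs, pred in zip(observed, predict_cepstra(phone_targets, durations, bandwidths_hz)):
        for o, p in zip(obs, pred):
            total += -0.5 * (math.log(2.0 * math.pi * variance) + (o - p) ** 2 / variance)
    return total

# Hypothetical two-phone hypotheses, each phone given as [F1, F2] targets in Hz.
hyp_a = [[500.0, 1500.0], [300.0, 2200.0]]
hyp_b = [[700.0, 1100.0], [300.0, 2200.0]]
durations = [4, 4]          # frames per phone (assumed known)
bandwidths = [60.0, 90.0]   # fixed bandwidths in Hz (assumed)

observed = predict_cepstra(hyp_a, durations, bandwidths)  # synthetic observation
ranked = sorted([("a", hyp_a), ("b", hyp_b)],
                key=lambda h: score(h[1], durations, observed, bandwidths),
                reverse=True)
print([name for name, _ in ranked])
```

Because the filter averages over neighboring segments, a short segment never reaches its target in the smoothed track. In this sketch that is how reduction appears explicitly in the resonance space, and it carries over to the cepstra only through the stage-two mapping.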
REFERENCES:
patent: 6618699 (2003-09-01), Lee et al.
patent: 7050975 (2006-05-01), Deng et al.
patent: 7117148 (2006-10-01), Droppo et al.
patent: 7206741 (2007-04-01), Deng et al.
Bilmes, J., “Graphical Models and Automatic Speech Recognition”, Mathematical Foundations of Speech and Language Processing, Springer-Verlag New York, Inc., pp. 191-245, 2004.
Bridle et al., J. S., “An Investigation of Segmental Hidden Dynamic Models of Speech Coarticulation for Automatic Speech Recognition”, Report of a Project at the 1998 Workshop on Language Engineering, Center for Language and Speech Processing at Johns Hopkins University, pp. 1-61, 1998.
Chelba et al., C., “Structured language modeling”, Computer Speech and Language, vol. 14, pp. 283-332, Oct. 2000.
Deng, L., “A generalized hidden Markov model with state-conditioned trend functions of time for the speech signal”, Signal Processing, vol. 27, pp. 65-78, 1992.
Deng, L., “A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition”, Speech Communication, vol. 24, No. 4, pp. 299-323, Jul. 1998.
Deng, L., “Computational Models for Speech Production”, Computational Models of Speech Pattern Processing, Springer-Verlag Berlin Heidelberg, pp. 199-213, 1998.
Deng, L., “Switching Dynamic System Models for Speech Articulation and Acoustics”, Mathematical Foundations of Speech and Language Processing, Springer-Verlag New York, Inc., pp. 115-134, 2004.
Deng et al., L., “Context-dependent Markov model structured by locus equations: Applications to phonetic classification”, The Journal of the Acoustical Society of America, vol. 96, No. 4, pp. 2008-2025, Oct. 1994.
Deng et al., L., “A Structured Speech Model with Continuous Hidden Dynamics and Prediction-Residual Training for Tracking Vocal Tract Resonances”, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 557-560, May 2004.
Deng et al., L., “Tracking Vocal Tract Resonances Using a Quantized Nonlinear Function Embedded in a Temporal Constraint”, IEEE Transactions on Speech and Audio Processing, Mar. 2004.
Fiscus, J. G., “A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction”, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, pp. 347-354, 1997.
Gao et al., Y., “Multistage Coarticulation Model Combining Articulatory, Formant and Cepstral Features”, Proceedings of the ICSLP, vol. 1, pp. 25-28, 2000.
Gay, T., “Effect of speaking rate on vowel formant movements”, The Journal of the Acoustical Society of America, vol. 63, No. 1, pp. 223-230, Jan. 1978.
Siu et al., M., “Parametric Trajectory Mixtures for LVCSR”, 5th International Conference on Spoken Language Processing, Sydney, Australia, pp. 3269-3272, 1998.
Hertz, S. R., “Streams, phones and transitions: toward a new phonological and phonetic model of formant timing”, Journal of Phonetics, vol. 19, pp. 91-109, 1991.
Lindblom, B., “Spectrographic Study of Vowel Reduction”, Journal of the Acoustical Society of America, vol. 35, No. 11, pp. 1773-1781, Nov. 1963.
Holmes et al., W.J., “Probabilistic-trajectory segmental HMMs”, Computer Speech and Language, vol. 13, pp. 3-37, 1999.
Klatt, D. H., “Software for a cascade/parallel formant synthesizer”, Journal of the Acoustical Society of America, vol. 67, No. 3, pp. 971-995, Mar. 1980.
Lindblom, B., “Explaining Phonetic Variation: A Sketch of the H & H Theory”, Speech Production and Speech Modelling, Kluwer Academic Publishers, pp. 403-439, 1990.
Ma et al., J. Z., “Efficient Decoding Strategies for Conversational Speech Recognition Using a Constrained Nonlinear State-Space Model”, IEEE Transactions on Speech and Audio Processing, vol. 11, No. 6, pp. 590-602, Nov. 2003.
Moon et al., S., “Interaction between duration, context, and speaking style in English stressed vowels”, Journal of the Acoustical Society of America, vol. 96, No. 1, pp. 39-55, Jul. 1994.
Ostendorf et al., M., “From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition”, IEEE Transactions on Speech and Audio Processing, vol. 4, No. 5, pp. 360-378, Sep. 1996.
Pitermann et al., M., “Effect of speaking rate and contrastive stress on formant dynamics and vowel perception”, Journal of the Acoustical Society of America, vol. 107, No. 6, pp. 3425-3437, Jun. 2000.
Pols, L. C. W., “Psycho-acoustics and Speech Perception”, Computational Models of Speech Pattern Processing, Springer-Verlag Berlin Heidelberg, pp. 10-17, 1999.
Rose et al., R. C., “The potential role of speech production models in automatic speech recognition”, Journal of the Acoustical Society of America, vol. 99, No. 3, pp. 1699-1709, Mar. 1996.
Stevens, K. N., “On the quantal nature of speech”, Journal of Phonetics, vol. 17, 1989.
Sun et al., J., “An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition”, Journal of the Acoustical Society of America, vol. 111, No. 2, pp. 1086-1101, Feb. 2002.
van Bergem, D. R., “Acoustic vowel reduction as a function of sentence accent, word stress, and word class”, Speech Communication, vol. 12, pp. 1-23, 1993.
Wang et al., W., “The Use of a Linguistically Motivated Language Model in Conversational Speech Recognition”, 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 261-264, May 2004.
Wouters et al., J., “Control of Spectral Dynamics in Concatenative Speech Synthesis”, IEEE Transactions on Speech and Audio Processing, vol. 9, No. 1, pp. 30-38, Jan. 2001.
Zhou et al., J., “Coarticulation Modeling by Embedding a Target-Directed Hidden Trajectory Model into HMM—Model and Training”, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 744-747, Apr. 2003.
Kamm et al., T., “Vocal tract normalization in speech recognition: Compensating for systematic speaker variability”, The Journal of the Acoustical Society of America, vol. 97, No. 5, Pt. 2, pp. 3246-3247, May 1995.
Wegmann et al., S., “Speaker Normalization on Conversational Telephone Speech”, IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, pp. 339-341, May 1996.
Ma et al., J., “A mixed-level switching dynamic system for continuous speech recognition”, Computer Speech and Language, vol. 18, pp. 49-65, 2004.
Zweig, G., “Bayesian network structures and inference techniques for automatic speech recognition”, Computer Speech and Language, vol. 17, pp. 173-193, 2003.
Acero Alejandro
Deng Li
Yu Dong
Magee Theodore M.
McFadden Susan
Microsoft Corporation
Westman Champlin & Kelly P.A.