Reexamination Certificate
2005-12-20
2010-06-08
Hudspeth, David R (Department: 2626)
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
C704S010000, C704S243000, C707S793000
active
07734460
ABSTRACT:
A time-asynchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, nodes and links in the lattices developed from the model are expanded via look-ahead. Heuristics used by the search algorithm are estimated. Additionally, pruning strategies can be applied to speed up the search.
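The abstract names three ingredients: best-first expansion of lattice nodes and links, an estimated heuristic, and pruning. The following is only a minimal illustrative sketch of that general idea, not the patented method. It assumes a toy lattice whose link scores are log-probabilities, a hypothetical `context_penalty` function (never positive) standing in for a long-span model that rescores a link given the whole label history, and a heuristic taken from a backward pass over the lattice-only scores. All names and data below are invented for illustration.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Link = Tuple[str, str, float]      # (next_node, label, lattice log-score)
Lattice = Dict[str, List[Link]]    # node -> outgoing links

NEG_INF = float("-inf")


def backward_heuristic(lattice: Lattice, order: List[str], end: str) -> Dict[str, float]:
    """Best lattice-only score from each node to `end`.

    `order` must list the nodes in topological order. Because the assumed
    context penalty is never positive, this value is an upper bound on the
    true remaining score, i.e. an admissible heuristic for maximization.
    """
    h = {node: NEG_INF for node in order}
    h[end] = 0.0
    for node in reversed(order):
        for nxt, _label, score in lattice.get(node, []):
            h[node] = max(h[node], score + h[nxt])
    return h


def a_star_decode(
    lattice: Lattice,
    order: List[str],
    start: str,
    end: str,
    context_penalty: Callable[[Tuple[str, ...], str], float],
    beam: float = float("inf"),
) -> Optional[Tuple[List[str], float]]:
    """Best-first search over the lattice; hypotheses need not end at the same node."""
    h = backward_heuristic(lattice, order, end)
    # heapq is a min-heap, so the priority f = g + h is stored negated.
    heap = [(-h[start], 0.0, start, ())]
    while heap:
        neg_f, g, node, labels = heapq.heappop(heap)
        if node == end:
            return list(labels), g
        for nxt, label, score in lattice.get(node, []):
            if h[nxt] == NEG_INF:
                continue  # the end node cannot be reached from nxt
            g_new = g + score + context_penalty(labels, label)
            f_new = g_new + h[nxt]
            if f_new < -neg_f - beam:
                continue  # pruned: too far below the hypothesis being expanded
            heapq.heappush(heap, (-f_new, g_new, nxt, labels + (label,)))
    return None  # every hypothesis was pruned or the lattice has no complete path


# Toy data (hypothetical): the lattice alone prefers "yellow word",
# but the long-span penalty makes "hello world" the best full path.
toy_lattice: Lattice = {
    "s": [("a", "hello", -1.0), ("b", "yellow", -0.5)],
    "a": [("e", "world", -1.0)],
    "b": [("e", "word", -1.0), ("e", "world", -2.0)],
    "e": [],
}
toy_order = ["s", "a", "b", "e"]


def toy_penalty(history: Tuple[str, ...], label: str) -> float:
    """Penalize the label "word" when it directly follows "yellow"."""
    return -3.0 if history[-1:] == ("yellow",) and label == "word" else 0.0


result = a_star_decode(toy_lattice, toy_order, "s", "e", toy_penalty)
```

With the default infinite beam nothing is pruned, so under the stated assumption the first complete hypothesis taken from the queue is the best one. A finite `beam` trades that guarantee for speed and can return `None` if it is set too tight.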
REFERENCES:
patent: 6501833 (2002-12-01), Phillips et al.
patent: 7403941 (2008-07-01), Bedworth et al.
patent: 2005/0246172 (2005-11-01), Huang
patent: 2006/0074676 (2006-04-01), Deng
patent: 2006/0100862 (2006-05-01), Deng
patent: 2006/0200351 (2006-09-01), Acero
Seide, F. et al., “Coarticulation Modeling by Embedding a Target-Directed Hidden Trajectory Model into HMM - MAP Decoding and Evaluation,” Proc. ICASSP, 2003, pp. 748-751.
Confidence based lattice segmentation and minimum Bayes-risk decoding—Goel, Kumar, et al.—Eurospeech 2001.
Aubert 1995: X.L. Aubert, “An overview of decoding techniques for large vocabulary continuous speech recognition,” Computer Speech and Language, vol. 16, pp. 89-114, 2002.
Bilmes 2004: J. Bilmes. “Graphical models and automatic speech recognition,” in M. Johnson, M. Ostendorf, S. Khudanpur, and R. Rosenfeld (eds.), Mathematical Foundations of Speech and Language Processing, Springer-Verlag, New York, 2004, pp. 135-186.
Bridle et al. 1998: J. Bridle, L. Deng, J. Picone, et al. “An Investigation of segmental hidden dynamic models of speech coarticulation for automatic speech recognition,” Final Report for the 1998 Workshop on Language Engineering, Center for Language and Speech Processing at Johns Hopkins University, 1998, pp. 1-61.
Buzen 1973: J. Buzen, “Computational algorithms for closed queueing networks with exponential servers,” Communications of the ACM, vol. 16, No. 9, 1973, pp. 527-531.
Chelba and Jelinek 2000: C. Chelba and F. Jelinek. “Structured language modeling,” Computer Speech and Language, Oct. 2000, pp. 283-332.
Deng and Braam 1994: L. Deng and D. Braam, “Context-dependent Markov model structured by locus equations: Applications to phonetic classification,” J. Acoust. Soc. Am., vol. 96.
Deng 1998: L. Deng. “A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition,” Speech Communication, vol. 24, No. 4, 1998 pp. 299-323.
Deng 2004: L. Deng. “Switching dynamic system models for speech articulation and acoustics,” in M. Johnson, M. Ostendorf, S. Khudanpur, and R. Rosenfeld (eds.), Mathematical Foundations of Speech and Language Processing, Springer-Verlag, New York, 2004, pp. 115-134.
Deng et al. 2004a: L. Deng, L. Lee, H. Attias, and A. Acero. “A structured speech model with continuous hidden dynamics and prediction-residual training for tracking vocal tract resonances,” IEEE Proc. ICASSP, May 2004, vol. I.
Deng et al. 2004b: L. Deng, A. Acero, and I. Bazzi. “Tracking vocal tract resonances using a quantized nonlinear function embedded in a temporal constraint,” IEEE Trans. Speech & Audio Processing, accepted 2004.
Deng et al. 2004c: L. Deng, D. Yu, and A. Acero, “A quantitative model for formant dynamics and contextually assimilated reduction in fluent speech”, ICSLP 2004, Jeju, Korea, 2004.
Deng et al. 2004d: L. Deng, D. Yu, and A. Acero, “A bi-directional target-filtering model of speech coarticulation and reduction: Two-stage implementation for phonetic recognition”, IEEE Trans. Speech & Audio Processing, accepted 2004.
Deng et al. 2004e: L. Deng, D. Yu, A. Acero, “A quantitative model for formant dynamics & contextually assimilated reduction in fluent speech,” Proc. ICSLP, pp. 719-722, Jeju, Korea, 2004.
Deng et al. 2005a: L. Deng, X. Li, D. Yu, & A. Acero, “A Hidden Trajectory Model with Bi-directional Target-Filtering: Cascaded vs. Integrated Implementation for Phonetic Recognition” Proc. ICASSP, pp. 337-340, 2005, Philadelphia, PA, USA.
Deng et al. 2005c: L. Deng, D. Yu, and A. Acero “Learning statistically characterized resonance targets in a hidden trajectory model of speech coarticulation and reduction,” in Proc. Interspeech 2005, Lisbon, Sep. 2005, pp. 1097-1100.
Deng et al. 2005d: L. Deng, D. Yu, and A. Acero. “A long-contextual-span model of resonance dynamics for speech recognition: parameter learning and recognizer evaluation”, ASRU 2005 (to appear).
Eide and Gish 1996: E. Eide and H. Gish, “A parametric approach to vocal tract length normalization,” IEEE Proc. ICASSP, pp. 346-348, 1996.
Gao et al. 2000: Y. Gao, R. Bakis, J. Huang, and B. Zhang, “Multistage coarticulation model combining articulatory, formant and cepstral features”.
Glass 2003: J. Glass. “A probabilistic framework for segment-based speech recognition,” in Computer Speech and Language, vol. 17, 2003, pp. 137-152.
Goel and Byrne 1999: V. Goel and W. Byrne, “Task dependent loss functions in speech recognition: A-star search over recognition lattices”, In Proc. EUROSPEECH 99, 1999.
Kamm et al. 1995: T. Kamm, G. Andreou, and J. Cohen, “Vocal tract normalization in speech recognition: Compensating for systematic speaker variability,” Proc. of the 15th Annual Speech Research Symposium, pp. 161-167, CLSP, Johns Hopkins University, Baltimore, MD, Jun. 1995.
Van Hamme and Van Aelten 1996: H. Van Hamme and F. Van Aelten, “An adaptive-beam pruning technique for continuous speech recognition,” in Proceedings of ICSLP, 1996, pp. 2083-2086.
Holmes and Russell 1999: W. Holmes and M. Russell. “Probabilistic-trajectory segmental HMMs,” Computer Speech and Language, vol. 13, 1999, pp. 3-37.
Klatt 1980: D. Klatt. “Software for a cascade/parallel formant synthesizer,” J. Acoust. Soc. Am., vol. 67, No. 3, 1980, pp. 971-995.
Lee and Rose 1998: L. Lee and R. Rose, “A frequency warping approach to speaker normalization,” IEEE Trans. Speech & Audio Processing, vol. 6, pp. 49-60, Jan. 1998.
Little 1961: J. Little, “A proof for the queuing formula: L = λW,” Operations Research, vol. 9, No. 3, May-Jun. 1961, pp. 383-387.
Ma and Deng 2003: J. Ma and L. Deng. “Efficient decoding strategies for conversational speech recognition using a constrained nonlinear state-space model for vocal-tract-resonance dynamics,” IEEE Trans. Speech & Audio Proc., vol. 11, 2003, pp. 590-602.
McDonough et al. 1998: J. McDonough, W. Byrne, and X. Luo, “Speaker normalization with all-pass transforms,” Proc. ICSLP, vol. 6, pp. 2307-2310, 1998.
Naito et al. 2002: M. Naito, L. Deng, and Y. Sagisaka. “Speaker clustering for speech recognition using vocal-tract parameters,” Speech Communication, vol. 36, No. 3-4, Mar. 2002, pp. 305-315.
Odell 1995: J. J. Odell, “The use of context in large vocabulary speech recognition,” Ph.D. dissertation, Queens' College, 1995.
Ortmanns and Ney 2000: S. Ortmanns and H. Ney, “Look-ahead techniques for fast beam search,” Computer Speech and Language, vol. 14, pp. 15-32, 2000.
Ostendorf et al. 1996: M. Ostendorf, V. Digalakis, and J. Rohlicek. “From HMMs to segment models: A unified view of stochastic modeling for speech recognition” IEEE Trans. Speech & Audio Proc., vol. 4, 1996, pp. 360-378.
Pye and Woodland 1997: D. Pye and P.C. Woodland, “Experiments in speaker normalisation and adaptation for large vocabulary speech recognition”, IEEE Proc. ICASSP, pp. 1047-1050, 1997.
Rose et al. 1996: R. Rose, J. Schroeter, and M. Sondhi. “The potential role of speech production models in automatic speech recognition,” J. Acoust. Soc. Am., vol. 99, 1996, pp. 1699-1709.
Russell and Norvig 1995: S. Russell, and P. Norvig, “Artificial Intelligence: A Modern Approach”, Prentice Hall, Engle
Acero Alejandro
Deng Li
Yu Dong
Hudspeth David R
Kelly Joseph R.
Microsoft Corporation
Rider Justin W
Westman Champlin & Kelly P.A.
Time asynchronous decoding for long-span trajectory model