Reexamination Certificate
2007-07-31
Armstrong, Angela (Department: 2626)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
active
10601350
ABSTRACT:
Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.
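The two steps named in the abstract (selecting a data model from the visual condition, then decoding with the selected model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Frame` type, the `face_confidence` score standing in for "a condition associated with a visual environment", the fixed `threshold`, the per-frame granularity of the switch, and the stand-in decoder callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A decoder is modeled as any callable from a feature vector to a label.
# Real recognizers are far more involved; this is only a stand-in.
Decoder = Callable[[List[float]], str]


@dataclass
class Frame:
    audio: Sequence[float]    # acoustic features for this frame
    visual: Sequence[float]   # visual (e.g. mouth-region) features
    face_confidence: float    # hypothetical visual-quality score in [0, 1]


def select_model(face_confidence: float, threshold: float = 0.5) -> str:
    """Step (i): choose a data model based on the visual condition."""
    return "audio_visual" if face_confidence >= threshold else "audio_only"


def decode_utterance(
    frames: Sequence[Frame],
    audio_only: Decoder,
    audio_visual: Decoder,
    threshold: float = 0.5,
) -> List[str]:
    """Step (ii): decode each portion with whichever model was selected."""
    outputs = []
    for frame in frames:
        if select_model(frame.face_confidence, threshold) == "audio_visual":
            # Visual data looks usable: decode on the joint feature vector.
            outputs.append(audio_visual(list(frame.audio) + list(frame.visual)))
        else:
            # Degraded visual data: back off to acoustic features alone.
            outputs.append(audio_only(list(frame.audio)))
    return outputs
```

In this sketch the backoff decision is made independently for each frame, so a brief loss of the speaker's face only affects the frames where it occurs. Whether the actual system switches per frame, per segment, or with some smoothing is not stated in the abstract.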
REFERENCES:
patent: 6219640 (2001-04-01), Basu et al.
patent: 6594629 (2003-07-01), Basu et al.
patent: 2002/0116197 (2002-08-01), Erlen
patent: 2003/0018475 (2003-01-01), Basu et al.
patent: 2003/0177005 (2003-09-01), Masai et al.
Garg et al, “Frame-dependent multi-stream reliability indicators for audio-visual speech recognition,” Proceedings of International Conference on Acoustics, Speech and Signal Processing, ICASSP 2003, vol. 1, Apr. 2003, pp. 24-27.
G. Potamianos et al., “Hierarchical Discriminant Features for Audio-Visual LVCSR,” Proceedings of ICASSP 2001, 4 pages.
C.R. Rao, “Linear Statistical Inference and Its Applications,” John Wiley and Sons, New York, pp. 476-490, 1965.
J.R. Deller, Jr. et al., “Discrete-Time Processing of Speech Signals,” Prentice Hall, pp. 677-699 and 745-759, 1987.
A. Gersho et al., “Vector Quantization and Signal Compression,” Kluwer Academic Publishers, pp. 309-337 and 345-367, 1992.
A. Papoulis, “Probability, Random Variables, and Stochastic Processes,” McGraw-Hill Publishing Company, pp. 30, 46-48 and 73-74, 1984.
H.L. van Trees, “Detection, Estimation, and Modulation Theory,” Part I, John Wiley & Sons, pp. 23-46, 1968.
A. Garg et al., “Frame-Dependent Multi-Stream Reliability Indicators for Audio-Visual Speech Recognition,” Proceedings of the International Conference on Acoustics, Speech and Signal Processing, (ICASSP). vol. I, pp. 24-27, Apr. 2003.
A. Rogozan et al., “Adaptive Fusion of Acoustic and Visual Sources for Automatic Speech Recognition,” Speech Communication 26, Elsevier Science Publishers, vol. 26, No. 1-2, pp. 149-161, Oct. 1998.
J.H. Connell et al., “A Real-Time Prototype for Small-Vocabulary Audio-Visual ASR,” Proceedings of the International Conference on Multimedia and Expo (ICME), vol. II, pp. 469-472, Jul. 2003.
Connell Jonathan H.
Haas Norman
Marcheret Etienne
Neti Chalapathy Venkata
Potamianos Gerasimos
Armstrong Angela
Ryan, Mason & Lewis, LLP