Speech detection and enhancement using audio/video fusion
Reexamination Certificate
2007-09-11
Azad, Abul K. (Department: 2626)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S240000
active
10608988
ABSTRACT:
A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion is provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-modal, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30-second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.
REFERENCES:
patent: 2003/0110038 (2003-06-01), Sharma et al.
patent: 2004/0088272 (2004-05-01), Jojic et al.
J.W. Fisher III, T. Darrell, W.T. Freeman, and P. Viola. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation. In Advances in Neural Information Processing Systems 13, MIT Press, Dec. 2000.
W.H. Sumby and Irwin Pollack. Visual Contribution to Speech Intelligibility in Noise. The Journal of the Acoustical Society of America. vol. 26, No. 2, pp. 212-215, Mar. 1954.
H. Attias, A. Acero, J.C. Platt, and L. Deng, Speech Denoising and Dereverberation using Probabilistic Models, Microsoft Research, 2002, 7 pages.
M.J. Beal, H. Attias, and N. Jojic. Audio-video Sensor Fusion with Probabilistic Graphical Models, Microsoft Research, 2002. 15 pages.
V.R. De Sa and D. Ballard. Category Learning through Multi-Modality Sensing. In Neural Computation, 10(5), 1998. 24 pages.
Brendan Frey and Nebojsa Jojic. Estimating Mixture Models of Images and Inferring Spatial Transformations using the EM Algorithm. In Computer Vision and Pattern Recognition (CVPR), 1999, 7 pages.
J. Hershey and M. Casey, Audio-visual Sound Separation via Hidden Markov Models. In T.G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pp. 1173-1180, Cambridge, MA, 2002, MIT Press.
J. Hershey and J.R. Movellan, Audio Vision: Using Audio-visual Synchrony to Locate Sounds. In Advances in Neural Information Processing Systems 12. S.A. Solla, T.K. Leen, and K.R. Muller (eds.), pp. 813-819, MIT Press, 2000.
Attias Hagai
Hershey John R.
Jojic Nebojsa
Kristjansson Trausti Thor
Amin Turocy & Calvin LLP
Azad Abul K.
Microsoft Corporation