Trainable videorealistic speech animation

Education and demonstration – Language – Speech

Reexamination Certificate


Details

U.S. Classes: C434S156000, C434S169000, C434S30700R, C704S260000, C345S473000

Status: active

Application No.: 10352319

ABSTRACT:
A method and apparatus for videorealistic speech animation is disclosed. A human subject is recorded with a video camera as he or she utters a predetermined speech corpus. After the corpus is processed automatically, a visual speech model is learned from the data; it is capable of synthesizing the human subject's mouth uttering entirely novel utterances that were not recorded in the original video. The synthesized utterance is re-composited onto a background sequence containing natural head and eye movement. The final output is videorealistic in the sense that it looks like a video-camera recording of the subject. The two key components of this invention are 1) a multidimensional morphable model (MMM), which synthesizes new, previously unseen mouth configurations from a small set of mouth image prototypes; and 2) a trajectory synthesis technique based on regularization, which is trained automatically from the recorded video corpus and is capable of synthesizing trajectories in MMM space corresponding to any desired utterance.

REFERENCES:
patent: 5111409 (1992-05-01), Gasper et al.
patent: 5613056 (1997-03-01), Gasper et al.
patent: 5903892 (1999-05-01), Hoffert et al.
patent: 6181351 (2001-01-01), Merrill et al.
patent: 6250928 (2001-06-01), Poggio et al.
patent: 6377925 (2002-04-01), Greene et al.
patent: 6539354 (2003-03-01), Sutton et al.
patent: 6564263 (2003-05-01), Bergman et al.
patent: 6593936 (2003-07-01), Huang et al.
patent: 6600502 (2003-07-01), Brewster, Jr.
patent: 6622171 (2003-09-01), Gupta et al.
patent: 6654018 (2003-11-01), Cosatto et al.
patent: 6735566 (2004-05-01), Brand
patent: 6735738 (2004-05-01), Kojima
patent: 6766299 (2004-07-01), Bellomo et al.
patent: 6813607 (2004-11-01), Faruquie et al.
patent: 6919892 (2005-07-01), Cheiky et al.
patent: 2002/0194006 (2002-12-01), Challapali
patent: 2003/0040916 (2003-02-01), Major
patent: 2004/0120554 (2004-06-01), Lin et al.
Cook, Gareth, “At MIT, they can put words in our mouths,” Boston Globe, May 15, 2002, http://cuneus.ai.mit.edu:8000/research/mary101/news/bostonglobe-ezzat.shtml (downloaded Nov. 14, 2002), 3 pp.
Ezzat, T. et al., “Trainable Videorealistic Speech Animation,” Proc. of SIGGRAPH 2002, San Antonio, Texas (11 pp.).
Barron, J.L. et al., “Performance of Optical Flow Techniques,” Int. Jnl. of Computer Vision 12(1):43-77 (1994). Also, http://www.csd.uwo.ca/faculty/barron/PAPERS/ijcv94.ps.
Beymer, D. and T. Poggio, “Image representations for visual learning,” Science 272 (1996), 31 pp. Also, http://www.ai.mit.edu/projects/cbcl/publications/ps/science-beymer.ps.
Black, M. et al., “Robustly Estimating Changes in Image Appearance,” Computer Vision and Image Understanding, Special Issue on Robust Statistical Techniques in Image Understanding, pp. 8-31 (2000). Also, http://www.cs.brown.edu/people/black/Papers/cviu.1999.0825.pdf.
Blanz, V. and T. Vetter, “A Morphable Model for the Synthesis of 3D Faces,” in Proceedings of SIGGRAPH 1999, ACM Press/ACM SIGGRAPH, Los Angeles, 187-194 (1999). Also, http://www.mpi-sb.mpg.de/~blanz/publications/morphmod2.pdf.
Brand, M. and A. Hertzmann, “Style Machines,” in Proceedings of SIGGRAPH 2000, ACM Press/ACM SIGGRAPH, 183-192 (2000). Also, http://www.merl.com/papers/docs/TR2000-14.pdf.
Brand, M., “Voice Puppetry,” draft of Dec. 7, 1998; final version in Proceedings of SIGGRAPH 1999, ACM Press/ACM SIGGRAPH, Los Angeles, 21-28 (1999). Also, http://www.merl.com/papers/docs/TR99-20.pdf.
Burt, P.J. and E.H. Adelson, “The Laplacian Pyramid as a Compact Image Code,” IEEE Trans. on Communications COM-31(4):532-540 (Apr. 1983). Also, http://www-bcs.mit.edu/people/adelson/pub_pdfs/pyramid83.pdf.
Cootes, T.F. et al., “Active Appearance Models,” in Proceedings of the European Conference on Computer Vision (1998), 16 pp. Also, http://www.wiau.man.ac.uk/~bim/Models/aam.ps.gz.
Cosatto, E. and H.P. Graf, “Sample-Based Synthesis of Photorealistic Talking Heads,” in Proceedings of Computer Animation '98, pp. 103-110 (1998). Also, http://www.research.att.com/~eric/papers/CA98.pdf.
Ezzat, T. and T. Poggio, “Visual Speech Synthesis by Morphing Visemes,” International Journal of Computer Vision 38:45-57 (2000). Also, http://cuneus.ai.mit.edu:8000/publications/AIM-1658.pdf.
Girosi, F. et al., “Priors, Stabilizers, and Basis Functions: From regularization to radial, tensor and additive splines,” Tech. Rep. 1430, MIT AI Lab, Jun. 1993, 28 pp. Also, ftp://publications.ai.mit.edu/ai-publications/1000-1499/AIM-1430.ps.Z.
Guenter, B. et al., “Making Faces,” in Proceedings of SIGGRAPH 1998, ACM Press/ACM SIGGRAPH, Computer Graphics Proceedings, Annual Conference Series, ACM, 55-66 (1998). Also, http://research.microsoft.com/MSRSIGGRAPH/1998/pdf/makingfaces.pdf.
Huang, X. et al., “The SPHINX-II Speech Recognition System: An Overview,” Computer Speech and Language 7(2):137-143 (1993).
Jones, M.J. and T. Poggio, “Multidimensional Morphable Models,” in Proceedings of the International Conference on Computer Vision (1998), 7 pp. Also, http://www.ai.mit.edu/projects/cbcl/publications/ps/ICCV98-matching2.ps.
Lee, S. et al., “Image Metamorphosis Using Snakes and Free-Form Deformations,” in Proceedings of SIGGRAPH 1995, ACM Press/ACM SIGGRAPH, vol. 29 of Computer Graphics Proceedings, Annual Conference Series, ACM, pp. 439-448 (1995). Also, http://www.postech.ac.kr/~leesy/ftp/sig95.pdf.
Lee, S. et al., “Polymorph: Morphing among Multiple Images,” IEEE Computer Graphics and Applications 18:58-71 (1998). Also, http://www.postech.ac.kr/~leesy/ftp/cgna98.pdf.
Masuko, T. et al., “Text-to-visual speech synthesis based on parameter generation from HMM,” in ICASSP (1998), 4 pp. Also, http://sp-www.ip.titech.ac.jp/research/pdf/icassp98-avsynHMM.pdf.
Pighin, F. et al., “Synthesizing Realistic Facial Expressions from Photographs,” in Proceedings of SIGGRAPH 1998, ACM Press/ACM SIGGRAPH, Computer Graphics Proceedings, Annual Conference Series, ACM, 75-84 (1998). Also, http://www.ict.usc.edu/~pighin/publications/realface.pdf.
Poggio, T. and T. Vetter, “Recognition and Structure from one 2D Model View: Observations on Prototypes, Object Classes and Symmetries,” Tech. Rep. 1347, Artificial Intelligence Laboratory, MIT (1992). Also, ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-1347.pdf.
Roweis, S., “EM Algorithms for PCA and SPCA,” in Advances in Neural Information Processing Systems, The MIT Press, vol. 10 (1998). Also, http://www.cs.toronto.edu/~roweis/papers/empca.pdf.
Sjolander, K. and J. Beskow, “Wavesurfer - an open source speech tool,” in Proc. of ICSLP, vol. 4, 464-467 (2000). Also, http://www.speech.kth.se/wavesurfer/wsurf_icslp00.pdf.
Tenenbaum, J.B. et al., “A Global Geometric Framework for Nonlinear Dimensionality Reduction,” Science 290:2319-2323 (Dec. 2000).
Tipping, M.E. and C.M. Bishop, “Mixtures of Probabilistic Principal Component Analyzers,” Neural Computation 11(2):443-482 (1999). Also, ftp://ftp.research.microsoft.com/users/mtipping/msrc-mppca.ps.gz.
Waters, K., “A Muscle Model for Animating Three-Dimensional Facial Expression,” in Computer Graphics (Proceedings of ACM SIGGRAPH 87) 21(4):17-24 (Jul. 1987). Also, http://www.cs.dartmouth.edu/~cs43/papers/facial/p17-waters.pdf.
Bergen, J.R. et al., “Hierarchical Model-Based Motion Estimation,” in Proceedings of the European Conference on Computer Vision, 237-252 (1992).
Brooke, N.M. and S.D. Scott, “Computer Graphics Animations of Talking Faces Based on Stochastic Models,” in Intl. Symposium on Speech, Image Processing, and Neural Networks, 73-76 (1994).
Moulines, E. and F. Charpentier, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Communication.
