Method and system for aligning natural and synthetic video...

Data processing: speech signal processing – linguistics – language – synthesis

Reexamination Certificate


Details

Classification: C704S270000, C345S215000
Type: Reexamination Certificate
Status: active
Patent number: 06567779

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates generally to methods and systems for coding of images, and more particularly to a method and system for coding images of facial animation.
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text, and Facial Animation Parameters (FAPs). In this architecture, text input is sent to a Text-To-Speech (TTS) converter at the decoder that drives the mouth shapes of the face. FAPs are sent from the encoder to the face over the communication channel. Currently, the Verification Model (VM) assumes that synchronization between the text input and the FAP input stream is obtained by means of timing injected at the transmitter side. However, the transmitter does not know the timing of the decoder's TTS converter. Hence, the encoder cannot specify the alignment between synthesized words and the facial animation. Furthermore, timing varies between different TTS systems. Thus, there is currently no method of aligning facial mimics (e.g., smiles and expressions) with speech.
The present invention is therefore directed to the problem of developing a system and method for coding images for facial animation that enables alignment of facial mimics with speech generated at the decoder.
SUMMARY OF THE INVENTION
The present invention solves this problem by including codes (known as bookmarks) in the text string transmitted to the Text-to-Speech (TTS) converter, which bookmarks can be placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp (ETS). Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, according to the present invention, the Facial Animation Parameter (FAP) stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp (RTS) derived from the timing of its TTS converter to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference. In order to prevent conflicts between the encoder time stamps and the real-time time stamps, the encoder time stamps have to be chosen such that a wide range of decoders can operate.
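As a concrete illustration of this scheme (the escape-based bookmark syntax, the field names, and the dictionary-based ETS-to-RTS map below are assumptions made for the sketch, not taken from the patent), the encoder writes a bookmark carrying an ETS into the text and tags the corresponding FAP entry with the same ETS; the decoder, on reaching the bookmark during synthesis, records the RTS and uses the ETS as the key that pairs each FAP with that RTS:

```python
# Minimal sketch of the bookmark/time-stamp scheme described above.
# The "\x1bM<ets>\x1b" syntax and the FAP fields are illustrative
# assumptions; the patent only requires a unique escape sequence and a
# shared encoder time stamp (ETS) acting as a counter.

ESC = "\x1b"  # escape character assumed to pass through the TTS front end

def insert_bookmark(text, position, ets):
    """Encoder side: place a bookmark carrying an ETS inside the text."""
    bookmark = f"{ESC}M{ets}{ESC}"
    return text[:position] + bookmark + text[position:]

# FAP stream entries carry the same ETS found in the text bookmarks.
fap_stream = [
    {"ets": 1, "mimic": "smile"},
    {"ets": 2, "mimic": "raise_eyebrows"},
]

def align(ets_to_rts, fap_stream):
    """Decoder side: pair each FAP with the real-time time stamp (RTS)
    that the TTS converter reported for the matching ETS."""
    schedule = []
    for fap in fap_stream:
        rts = ets_to_rts.get(fap["ets"])
        if rts is not None:
            schedule.append((rts, fap["mimic"]))
    return sorted(schedule)
```

Because the ETS is only a counter, the decoder never compares it with wall-clock time; it serves purely as a join key between the text stream and the FAP stream.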
Therefore, in accordance with the present invention, a method for encoding a facial animation including at least one facial mimic and speech in the form of a text stream, comprises the steps of assigning a predetermined code to the at least one facial mimic, and placing the predetermined code within the text stream, wherein said code indicates a presence of a particular facial mimic. The predetermined code is a unique escape sequence that does not interfere with the normal operation of a text-to-speech synthesizer.
One possible embodiment of this method uses the predetermined code as a pointer to a stream of facial mimics, thereby indicating a synchronization relationship between the text stream and the facial mimic stream.
One possible implementation of the predetermined code is an escape sequence, followed by a plurality of bits, which define one of a set of facial mimics. In this case, the predetermined code can be placed in between words in the text stream, or in between letters in the text stream.
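A minimal sketch of one such code layout, assuming a single escape byte followed by an 8-bit index into a fixed mimic table (both the byte widths and the mimic set are illustrative assumptions):

```python
# Illustrative layout: an escape byte followed by one byte whose value
# selects a facial mimic from a fixed table.
ESCAPE = 0x1B
MIMICS = ["neutral", "smile", "frown", "wink"]  # example mimic set

def encode_mimic_code(index):
    """Build the predetermined code for one facial mimic."""
    return bytes([ESCAPE, index])

def scan_text(data):
    """Split a byte stream into plain text and mimic codes; the code may
    appear between words or between letters, so it is extracted wherever
    it occurs."""
    text, mimics = bytearray(), []
    i = 0
    while i < len(data):
        if data[i] == ESCAPE and i + 1 < len(data):
            mimics.append((len(text), MIMICS[data[i + 1] % len(MIMICS)]))
            i += 2
        else:
            text.append(data[i])
            i += 1
    return text.decode(), mimics
```

For example, `scan_text(b"hi" + encode_mimic_code(1) + b" there")` yields `("hi there", [(2, "smile")])`: the text is untouched and the mimic is anchored at character position 2.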
Another method according to the present invention for encoding a facial animation includes the steps of creating a text stream, creating a facial mimic stream, and inserting a plurality of pointers in the text stream pointing to a corresponding plurality of facial mimics in the facial mimic stream, wherein said plurality of pointers establish a synchronization relationship between said text and said facial mimics.
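A minimal sketch of this two-stream variant, assuming index-based pointers and a simple token syntax (the patent leaves the pointer form open, so both are assumptions):

```python
# The facial mimic stream is a separate, ordered stream; the text stream
# carries pointers that refer to its entries by index.
mimic_stream = ["smile", "nod", "blink"]

def insert_pointer(words, word_pos, mimic_index):
    """Place a pointer token (here '<ptr:i>') before the given word,
    tying that point in the text to mimic_stream[mimic_index]."""
    return words[:word_pos] + [f"<ptr:{mimic_index}>"] + words[word_pos:]
```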
According to the present invention, a method for decoding a facial animation including speech and at least one facial mimic includes the steps of monitoring a text stream for a set of predetermined codes corresponding to a set of facial mimics, and sending a signal to a visual decoder to start a particular facial mimic upon detecting the presence of one of the set of predetermined codes.
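A sketch of this decoding step, assuming a token-per-word text stream and a queue as the signaling channel to the visual decoder (both are assumptions for illustration):

```python
from queue import Queue

CODES = {"<smile>": "smile", "<frown>": "frown"}  # assumed code table

def decode(tokens, visual_decoder):
    """Monitor the text stream for predetermined codes; on a match,
    signal the visual decoder to start that mimic. The remaining text
    is passed on to the TTS converter."""
    spoken = []
    for tok in tokens:
        if tok in CODES:
            visual_decoder.put(("start_mimic", CODES[tok]))
        else:
            spoken.append(tok)
    return " ".join(spoken)
```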
According to the present invention, an apparatus for decoding an encoded animation includes a demultiplexer receiving the encoded animation, outputting a text stream and a facial animation parameter stream, wherein said text stream includes a plurality of codes indicating a synchronization relationship with a plurality of mimics in the facial animation parameter stream and the text in the text stream, a text to speech converter coupled to the demultiplexer, converting the text stream to speech, outputting a plurality of phonemes, and a plurality of real-time time stamps and the plurality of codes in a one-to-one correspondence, whereby the plurality of real-time time stamps and the plurality of codes indicate a synchronization relationship between the plurality of mimics and the plurality of phonemes, and a phoneme to video converter being coupled to the text to speech converter, synchronizing a plurality of facial mimics with the plurality of phonemes based on the plurality of real-time time stamps and the plurality of codes.
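The apparatus can be sketched structurally as follows; all class names, the token syntax, and the fixed phoneme duration are illustrative assumptions, not the patent's implementation:

```python
class Demultiplexer:
    def split(self, encoded):
        # Separate the encoded animation into its two elementary streams.
        return encoded["text"], encoded["faps"]

class TextToSpeech:
    def convert(self, text_stream):
        # Emit phonemes plus (RTS, code) pairs in one-to-one correspondence
        # with the codes encountered in the text.
        phonemes, stamps = [], []
        t = 0.0
        for token in text_stream:
            if token.startswith("<ets:"):   # code embedded in the text
                stamps.append((t, token))
            else:
                phonemes.append((t, token))
                t += 0.08                   # assumed per-phoneme duration
        return phonemes, stamps

class PhonemeToVideo:
    def synchronize(self, phonemes, stamps, fap_stream):
        # Use the (RTS, code) pairs to schedule each mimic at the real
        # time of its code, alongside the phonemes.
        rts_by_code = {code: rts for rts, code in stamps}
        return [(rts_by_code[f["code"]], f["mimic"])
                for f in fap_stream if f["code"] in rts_by_code]
```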
In the above apparatus, it is particularly advantageous if the phoneme to video converter includes a facial animator creating a wireframe image based on the synchronized plurality of phonemes and the plurality of facial mimics, and a visual decoder being coupled to the demultiplexer and the facial animator, and rendering the video image based on the wireframe image.
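Continuing the sketch above, the interior of the phoneme-to-video converter might be stubbed like this (both classes are hypothetical placeholders):

```python
class FacialAnimator:
    def animate(self, phonemes, mimics):
        # Build a wireframe model from the synchronized phonemes and mimics.
        return {"vertices": [], "phonemes": phonemes, "mimics": mimics}

class VisualDecoder:
    def render(self, wireframe):
        # Rasterize the wireframe into the output video frames (stub).
        pass
```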


REFERENCES:
patent: 4520501 (1985-05-01), Dubrucq
patent: 4841575 (1989-06-01), Welsh et al.
patent: 4884972 (1989-12-01), Gasper
patent: 5111409 (1992-05-01), Gasper et al.
patent: 5473726 (1995-12-01), Marshall
patent: 5608839 (1997-03-01), Chen
patent: 5623587 (1997-04-01), Bulman
patent: 5634084 (1997-05-01), Malsheen et al.
patent: 5657426 (1997-08-01), Waters et al.
patent: 5732232 (1998-03-01), Brush, II et al.
patent: 5793365 (1998-08-01), Tang et al.
patent: 5802220 (1998-09-01), Black et al.
patent: 5806036 (1998-09-01), Stork
patent: 5812126 (1998-09-01), Richardson et al.
patent: 5818463 (1998-10-01), Tao et al.
patent: 5826234 (1998-10-01), Lyberg
patent: 5878396 (1999-03-01), Henton
patent: 5880731 (1999-03-01), Liles et al.
patent: 5884029 (1999-03-01), Brush, II et al.
patent: 5907328 (1999-05-01), Brush, II et al.
patent: 5920835 (1999-07-01), Huzenlaub et al.
patent: 5930450 (1999-07-01), Fujita
patent: 5963217 (1999-10-01), Grayson et al.
patent: 5970459 (1999-10-01), Yang et al.
patent: 5977968 (1999-11-01), Le Blanc
patent: 5983190 (1999-11-01), Trower, II et al.
patent: 6177928 (2001-01-01), Basso et al.
patent: 6477239 (2002-11-01), Ohki et al.
Baris Uz, et al.; Realistic Speech Animation of Synthetic Faces; Proceedings Computer Animation '98, Philadelphia, PA, USA, Jun. 8-10, 1998, pp. 111-118, XP002111637, IEEE Comput. Soc., Los Alamitos, CA, ISBN: 0-8186-8541-7, section 6 (Synchronizing Speech with Expressions), pp. 115-116.
ISO/IEC JTC 1/SC 29/WG 11: "Report of the 43rd WG 11 Meeting", Coding of Moving Pictures and Audio; ISO/IEC JTC 1/SC 29/WG 11 N2114, Mar. 1998, XP002111638, International Organisation for Standardisation, p. 40, TTSI section.
Chiariglione, L.; "MPEG and Multimedia Communications"; IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, Feb. 1, 1997, pp. 5-18, XP000678876, ISSN: 1051-8215, sections VII and VIII, pp. 12-16.
