Method of and apparatus for animation, driven by an audio...

Data processing: speech signal processing, linguistics, language – Speech signal processing – Synthesis

Reexamination Certificate


Details

US classification: C704S270000, C704S277000, C345S474000


Status: active

Patent number: 06665643

ABSTRACT:

FIELD OF THE INVENTION
This invention concerns audio-visual or multimedia communication systems and in particular a method and an apparatus for the animation, driven by parameters derived from audio sources, of a synthesized human face model.
BACKGROUND OF THE INVENTION
At present, development activities for multimedia applications are considering the integration of natural and synthetic audio-visual objects with increasing interest, in order to facilitate and improve user-application interaction. In this area, the adoption of anthropomorphic models to facilitate man-machine interaction is envisaged. Such interest has also been perceived by international standardization bodies, and the ISO/IEC standard 14496, "Generic Coding of Audio-Visual Objects", is at present in its definition phase. Said standard, which is commonly known as the "MPEG-4 standard" and is hereinafter referred to by that term, is aimed, among other things, at providing a reference framework for such applications.
Regardless of the specific solutions given by the MPEG-4 standard, the anthropomorphic models are thought of as an ancillary means to other information streams and are seen as objects capable of animation, where the animation is driven, by way of example, by audio signals such as the voice. In that case it is necessary to develop animation systems that, in synchronism with the voice itself, can deform the geometry and the look of the models in such a way that the synthetic faces take on typical countenances related to speech. The requisite target is a talking head or face whose look is as close to reality as possible.
The application contexts of animated models of this kind may range from Internet applications, such as welcome messages or on-line assistance messages, to co-operative work applications (for instance, electronic mail readers), as well as to professional applications, such as the implementation of post-production effects in the film and TV industry, to video games, and so on.
Models of human faces are generally implemented starting from a geometric representation formed by a 3-D mesh structure or "wire frame". The animation is based on applying, in sequence and without interruption, appropriate deformations to the polygons forming the mesh structure (or to a subset of such polygons), in such a way as to achieve the required effect during the display phase: in this specific case, movement of the jaw and lip region.
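By way of illustration only, the following Python sketch shows one simple way such per-vertex deformations could be applied and interpolated frame by frame. The data layout, function names, and blending scheme are assumptions made for this example; they are not taken from the patent.

import numpy as np

def deform_wireframe(vertices, displacements, intensity):
    # vertices:      (N, 3) float array holding the model's neutral-face coordinates
    # displacements: dict mapping vertex index -> (dx, dy, dz) offset at full deformation
    # intensity:     scalar in [0, 1] blending the neutral and fully deformed positions
    deformed = vertices.copy()
    for idx, delta in displacements.items():
        deformed[idx] += intensity * np.asarray(delta, dtype=float)
    return deformed

def animate(vertices, keyframes, steps=4):
    # keyframes: list of (displacements, intensity) pairs, e.g. one per viseme.
    # Consecutive deformed poses are blended linearly so that the polygons are
    # deformed in sequence and without interruption, as described above.
    poses = [deform_wireframe(vertices, d, i) for d, i in keyframes]
    for pose_a, pose_b in zip(poses, poses[1:]):
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            yield (1 - t) * pose_a + t * pose_b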
The solution defined by the MPEG-4 standard envisages for this purpose the use of a set of facial animation parameters, defined independently of the model, so as to ensure the interworking of systems. This set of parameters is organized in two layers: the upper layer is formed by the so-called "visemes", which represent the positions of the speaker's mouth in correspondence with the phonemes (i.e. the elementary sound units); the lower layer represents the elementary deformations to be applied in correspondence with the different visemes. The standard precisely defines how the lower-layer parameters must be used, whereas it sets no constraints on the use of the upper-layer parameters. The standard defines a possible association between phonemes and visemes for voice-driven animation; the related parameters must thereafter be applied to the model adopted.
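As a rough illustration of the upper layer, the Python sketch below shows a few rows of such a phoneme-to-viseme association. The phoneme symbols, viseme indices, and groupings are simplified placeholders and should not be read as the normative MPEG-4 table.

# Illustrative excerpt of a phoneme-to-viseme table (MPEG-4 defines a small
# fixed set of visemes, with index 0 conventionally the neutral one; the
# groupings below are simplified for this sketch).
PHONEME_TO_VISEME = {
    "p": 1, "b": 1, "m": 1,   # bilabial closure
    "f": 2, "v": 2,           # labiodental
    "t": 4, "d": 4,           # alveolar stop
    "a": 10,                  # open vowel
}

def phonemes_to_visemes(phonemes):
    # Upper layer: map each machine-readable phoneme to a viseme index,
    # falling back to the neutral viseme for symbols not in the table.
    return [PHONEME_TO_VISEME.get(p, 0) for p in phonemes]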
Different methods of achieving animation are known from the literature. By way of example, one can mention the following papers: "Converting Speech into Lip Movements: A Multimedia Telephone for Hard of Hearing People", by F. Lavagetto, IEEE Transactions on Rehabilitation Engineering, Vol. 3, No. 1, March 1995; DIST, University of Genoa, "Description of Algorithms for Speech-to-Facial Movements Transformations", ACTS "SPLIT" Project, November 1995; TUB, Technical University of Berlin, "Analysis and Synthesis of Visual Speech Movements", ACTS "SPLIT" Project, November 1995.
The first document describes the possibility of implementing animation starting from phonemes, by identifying the associated visemes and transforming the visemes into articulatory parameters to be applied to a model; alternatively, it suggests the direct transformation of spectral information into articulatory parameters through an adequately trained neural network. However, the adopted articulatory parameters are not the facial animation parameters envisaged by the MPEG-4 standard, and therefore the suggested method is not flexible. The two papers from the ACTS "SPLIT" Project likewise do not describe the use of the facial animation parameters foreseen by the MPEG-4 standard; further, the parameters obtained are only aimed at choosing an image from a database containing images of lips in different positions (corresponding to the various visemes).
SUMMARY OF THE INVENTION
According to this invention, a method and an apparatus for animation are provided that are able to receive visemes and to apply the appropriate geometric deformations to any facial model complying with the MPEG-4 standard. Besides assuring a much higher quality, this allows the user to observe the synthetic speaker from positions other than the frontal one, to move closer to or away from it, and so on.
More particularly, the invention provides a method wherein the driving audio signal is converted into phonetic data readable by a machine and such data are transformed into parameters representative of elementary deformations to be applied to such a model, and wherein the transformation of the phonetic data includes the following steps: associating individual items of phonetic information, or groups of phonetic information items, with respective information items (visemes) representative of a corresponding position of the speaker's mouth, said visemes being selected within a set which comprises visemes independent of the language of the driving audio signal and visemes specific to such a language;
splitting each viseme into a group of macroparameters characterizing the mouth shape and the positions of lips and jaw, and associating each of the macroparameters of a given viseme with an intensity value representative of a displacement from a neutral position and selected within an interval determined in an initialization phase so as to guarantee good naturalness of the animated model;
splitting the macroparameters into said parameters representative of deformations to be applied to a face model, which parameters are selected within a group of standard facial animation parameters relating to mouth movements, and associating said parameters with intensity values which depend on the intensity values of the macroparameters and which are also selected within an interval designed to guarantee the naturalness of the animated model (a sketch of this two-stage transformation follows).
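The Python sketch below illustrates how such a two-stage, table-driven transformation could look. The macroparameter names (mouth_opening, mouth_width, jaw_drop), the viseme indices, the weights, and the FAP identifiers are illustrative placeholders, not values taken from the patent or from the initialization phase it describes.

# Stage 1 (hypothetical): each viseme splits into macroparameters describing
# mouth shape and lip/jaw positions; intensities are displacements from the
# neutral position, fixed here to single values for brevity.
VISEME_TO_MACRO = {
    0:  {"mouth_opening": 0.0,  "mouth_width": 0.0, "jaw_drop": 0.0},  # neutral
    1:  {"mouth_opening": -0.2, "mouth_width": 0.0, "jaw_drop": 0.0},  # p, b, m
    10: {"mouth_opening": 0.8,  "mouth_width": 0.3, "jaw_drop": 0.7},  # open vowel
}

# Stage 2 (hypothetical): each macroparameter splits into mouth-related
# facial animation parameters; the weights scale the macroparameter
# intensity into a FAP intensity that keeps the model natural-looking.
MACRO_TO_FAPS = {
    "mouth_opening": {"open_jaw": 0.4, "lower_t_midlip": 0.6, "raise_b_midlip": -0.6},
    "mouth_width":   {"stretch_l_cornerlip": 0.5, "stretch_r_cornerlip": 0.5},
    "jaw_drop":      {"open_jaw": 1.0},
}

def viseme_to_faps(viseme):
    # Transform one viseme into a dict of FAP name -> intensity value by
    # accumulating the weighted contributions of its macroparameters.
    faps = {}
    for macro, value in VISEME_TO_MACRO.get(viseme, {}).items():
        for fap, weight in MACRO_TO_FAPS[macro].items():
            faps[fap] = faps.get(fap, 0.0) + weight * value
    return faps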
The invention also concerns the apparatus for the implementation of the method, comprising:
means for generating phonetic information representative of the driving audio signal, readable by a machine; and means for converting the phonetic information into parameters representative of elementary deformations to be applied to such a model, said conversion means being capable of: associating individual phonetic information items or groups of phonetic information items with respective information items (visemes) representative of a corresponding mouth position in the synthesized model, the visemes being read from a memory containing visemes independent of the language of the driving audio signal and visemes specific to such a language;
splitting each viseme into a group of macroparameters characterizing mouth shape and the positions of lips and jaw in the model;
associating each of the macroparameters of a given viseme with an intensity value representative of a displacement from a neutral position and selected within an interval determined in an initialization phase so as to guarantee good naturalness of the animated model;
splitting the macroparameters into parameters representative of deformations to be applied to such a model, which parameters are selected within a group of standard facial animation parameters relating to mouth movements; and
associating said parameters with intensity values which depend on the intensity values of the macroparameters and which are also selected within an interval designed to guarantee the naturalness of the animated model.
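Tying the earlier sketches together, a minimal and purely hypothetical driver for such an apparatus might look as follows, assuming some phonetic front end has already produced the machine-readable phoneme sequence (hard-coded here for illustration):

# Hypothetical end-to-end use of the sketches above: phonemes -> visemes ->
# FAP frames to be applied to an MPEG-4 compliant face model.
phonemes = ["m", "a", "p", "a"]            # stand-in for the phonetic front end's output
for viseme in phonemes_to_visemes(phonemes):
    print(viseme, viseme_to_faps(viseme))  # one FAP frame per viseme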

