Title: Synchronizing the moveable mouths of animated characters...
Classification: Computer graphics processing and selective visual display system – Computer graphics processing – Animation
Type: Reexamination Certificate
Filed: 1998-04-13
Issued: 2001-01-30
Examiner: Nguyen, Phu K. (Department: 2772)
Status: Active
Patent Number: 06181351
ABSTRACT:
FIELD OF THE INVENTION
The invention generally relates to computer-generated animation, and more specifically to synchronizing animation with recorded speech.
BACKGROUND AND SUMMARY OF THE INVENTION
Computer animation has come into widespread use for a variety of applications. One such application is character animation. For example, a game program may present an animated character for entertainment, or an educational program may include an animated teacher character. In addition, animated characters are a useful part of social interfaces, which give a computer interface interactive, human qualities. For instance, an animated character may appear on a computer display to help a user having difficulty completing a task or to answer questions. The character's creators may give it certain human traits reflected in gestures and other behavior, and the character may be programmed to react to the user's actions.
A challenge facing computer animators is presenting a convincing animation. One element of this challenge involves presenting a speaking character. Sound output for the character can be sent to a sound device such as a computer speaker. In the character animation, some activity is performed, such as having the character's mouth move or displaying the text of the spoken words in an accompanying word balloon, such as that shown in a newspaper comic strip. The appearance of words in the balloon can be paced to provide a closed-captioning effect. In this way, the user is presented with the illusion that the character on the display is actually speaking the words sounded from the computer speaker.
However, to create a compelling simulation of a speaking character, the character's mouth should be synchronized with the audio output. Part of the human communication experience includes receiving visual cues from whoever is speaking. If a character's mouth movement does not match the spoken words, the user will not experience a realistic presentation of the character. Instead, the animation is much like a foreign film in which the spoken translation is dubbed over the original sound track. In addition, if the appearance of the words in the character's word balloon is not properly paced with the character's speech, the resulting presentation can be confusing. Poor quality animation reduces the effectiveness of the character presentation. This can be especially troublesome if the character is being used as part of a social interface that is based on presenting a convincing simulation of an interactive speaking character. A social interface can be a useful tool for placing the computer user at ease and for assisting the user with unfamiliar tasks. However, a confusing character presentation defeats the purpose of a social interface.
When animation is done without a computer, synchronization is accomplished by an animator who draws each frame of the animated character to reflect an appropriate mouth shape. Inappropriate frames in an animation are usually perceptible to the viewer and result in an inferior animation. The animator is therefore typically a highly skilled professional whose work commands substantial compensation. In addition, the process can be time consuming, as the animator often reviews the animation a small portion at a time to craft appropriate mouth shapes for each animation frame.
With the advent of computer animation systems, various tools have become available to professional animators to assist in the animation process. However, even with the aid of a computer, the professional animator still reviews and edits the animation a small portion at a time to ensure an appropriate mouth shape reflects what is being spoken in the recorded speech. Although the computer can provide some useful features, a great deal of work is still required by the animator, adding considerably to development costs. Further, computer software typically undergoes multiple revisions during its life cycle. Repeatedly involving the professional animator in each revision can become prohibitively expensive.
To avoid the expenses related to the labor-intensive task of the animator, some software developers have addressed the problem of mouth synchronization by using the amplitude of the accompanying recorded speech to control mouth movement. Throughout the animation, the size of the character's mouth opening is adjusted to match the amplitude of the speech sounded from the computer's speaker. However, this approach has the drawback of inaccurately depicting the character's mouth in many instances. For example, the amplitude of an aspirated sound such as the “h” in “hello” is typically very low. Accordingly, based on amplitude, a closed mouth might be displayed when the “h” sound is voiced. However, the human mouth must be open in order to pronounce the “h” sound. Similar problems exist for other sounds. As a result, this approach has not led to high quality presentations of animated characters.
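To make the drawback concrete, the following minimal sketch (in Python; the function name is illustrative, and a 44100 Hz sample rate at 60 animation frames per second is assumed) shows the amplitude-driven technique: per-frame RMS amplitude is scaled into a mouth-opening value, so a quiet aspirated "h" produces a nearly closed mouth.

    import math

    def mouth_openings(samples, frame_size=735, max_open=1.0):
        # Prior-art sketch: map per-frame RMS amplitude of the speech
        # samples to a mouth-opening value in [0, max_open].
        # frame_size=735 assumes 44100 Hz audio at 60 animation frames/s.
        peak = max((abs(s) for s in samples), default=1) or 1
        openings = []
        for i in range(0, len(samples), frame_size):
            frame = samples[i:i + frame_size]
            rms = math.sqrt(sum(s * s for s in frame) / len(frame))
            # A quiet sound such as the aspirated "h" yields a nearly
            # closed mouth here, even though the mouth must be open.
            openings.append(max_open * rms / peak)
        return openings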
Another approach to solving the synchronization problem is to use a synthetic voice generated by a text-to-speech ("TTS") software engine to produce the speech sound for the character animation. A TTS engine can output a synthetic voice based on a text string. For instance, if supplied with the text "hello," the TTS engine will produce a voice speaking the word "hello." As the TTS engine generates output, a system can select appropriate mouth shapes for use in the animation. The result is animation in which the character's mouth movement is synchronized with the synthetic voice. However, due to various limitations of synthetic voices, the sound output is not of the quality available from professional human vocal talent. Thus, the TTS approach does not yield high quality animated speaking characters. In addition, one of the features of a social interface is to put the user at ease by presenting human characteristics in the animated character. Typically, the user perceives a synthetic voice as that of a machine lacking familiar human characteristics. As a result, the TTS approach fails to offer the convincing presentation needed for a social interface.
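For comparison, here is a rough sketch of the mouth-shape selection step in the TTS-driven approach (the phoneme symbols and mouth-shape labels are hypothetical, not any particular engine's set): as the engine reports each phoneme it synthesizes, the system looks up a mouth shape for the animation.

    # Hypothetical phoneme-to-mouth-shape table; a real TTS engine
    # reports a much richer phoneme set.
    PHONEME_TO_MOUTH = {
        "h": "open", "eh": "mid_open", "l": "tongue_up",
        "ow": "rounded", "m": "closed", "sil": "closed",
    }

    def mouth_shapes(phoneme_events):
        # phoneme_events: (phoneme, start_ms) pairs emitted by the TTS
        # engine as it generates output. Returns (mouth_shape, start_ms)
        # pairs from which the animation can select frames.
        return [(PHONEME_TO_MOUTH.get(p, "neutral"), t)
                for p, t in phoneme_events]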
The invention provides a method and system for synchronizing computer output or processing with recorded speech. The invention is particularly suited to synchronizing the animation of a character with recorded speech while avoiding the problems described above. Although the synchronization can be performed without a professional animator, the resulting animation is of the high quality necessary for a compelling presentation of a speaking character. The invention can also be used to synchronize other computer output with recorded speech. For example, a background color or background scene can be changed based on an event in the recorded speech.
In one implementation, a system synchronizes the animation of a character with recorded speech in the form of speech sound data. The system includes a sound file tool, a speech recognition engine, and a file player. The sound file tool acquires the speech sound data and a text of the speech sound data. The speech recognition engine analyzes the speech sound data and the text to determine linguistic event values and time values. A linguistic event value indicates a linguistic event in the speech sound data, such as a spoken phoneme, a spoken word, or some other event. A time value indicates when the linguistic event occurs within the speech sound data. The sound file tool annotates the speech sound data with these values to create a linguistically enhanced sound file.
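One way to picture the annotation step is the sketch below; the record layout and names are assumptions for illustration, not the patent's actual file format. Each linguistic event pairs an event value with the time it occurs, and the enhanced file bundles those records with the original sound data.

    from dataclasses import dataclass, field

    @dataclass
    class LinguisticEvent:
        kind: str      # e.g. "phoneme" or "word" -- the linguistic event value
        value: str     # the phoneme symbol or the spoken word
        time_ms: int   # when the event occurs within the speech sound data

    @dataclass
    class EnhancedSoundFile:
        # A linguistically enhanced sound file: the speech sound data
        # annotated with the events the recognition engine determined.
        sound_data: bytes
        events: list = field(default_factory=list)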
When the character is animated, the file player plays the linguistically enhanced sound file to produce sound output and send information to the animation. The information includes events specifying that the animation perform some action to indicate the linguistic event at the time indicated by the time value. For example, a particular mouth shape associated with a spoken phoneme could be presented in a frame of the character animation, or the text of a spoken word could be added to the character's word balloon.
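A minimal sketch of the playback side, with caller-supplied callbacks standing in for the sound device and the animation (all names here are illustrative): the player starts the sound output, then delivers each annotated event to the animation when its time value arrives.

    import time

    def play(sound_data, events, start_audio, on_event):
        # events: (time_ms, event) pairs read from the enhanced file.
        # start_audio and on_event are supplied by the host application.
        start = time.monotonic()
        start_audio(sound_data)              # begin the sound output
        for time_ms, event in sorted(events, key=lambda e: e[0]):
            wait = start + time_ms / 1000 - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            on_event(event)  # e.g. show the mouth shape for a phoneme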
Inventors: John Wickens Lamb Merrill; Tandy W. Trower, II; Mark Jeffrey Weinberg
Law Firm: Klarquist Sparkman Campbell Leigh & Whinston, LLP
Assignee: Microsoft Corporation
Primary Examiner: Nguyen, Phu K.