Speech-controlled animation system

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate


C704S235000, C704S254000

active

06766299

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to animation systems and, more particularly, to a method and apparatus for generating an animated sequence having synchronized visual and audio components.
BACKGROUND OF THE INVENTION
Existing technology related to Internet communication systems includes such applications as pre-animated greetings, avatars, e-mail, web-based audio delivery and video conferencing. Originally, e-mail messages were sent through the Internet as text files. Soon, however, commercial demand for more visual stimulus, together with advances in compression technology, made graphics available to the consumer in the form of short pre-animated messages with embedded audio. For example, Microsoft Greetings Workshop allows a user to assemble a message from pre-existing graphics, short animations and sound. The results are multimedia greeting cards that can be sent over the Internet, but without the voice or gestures of the original sender.
Existing software in the area of video conferencing allows audio and video communication through the Internet. Connectix, Sony Funmail and Zap technologies have developed products that allow a video image with sound to be sent over the Internet. Video e-mail can be sent as an executable file that the receiver can open without the original software. However, video conferencing requires both sender and receiver to have the appropriate hardware and software. Although video e-mail and conferencing can be useful for business applications, many consumers have reservations about seeing their own image on the screen and prefer a more controllable form of communication.
In the area of prior art Internet messaging software, a variety of systems have been created. Hijinx Masquerade software allows text to be converted into synthetic voices and animated pictures that speak the voices. The system is designed to use Internet Relay Chat (IRC) technology. The software interface is complicated and requires the user to train the system to match text and image. The result is a very choppy animated image with mouth shapes, accompanied by a synthetic computer voice. The software is limited by its inability to relay the actual voice of its user in sync with a smooth animation. In addition, a Mitsubishi technology research group has developed a voice puppet, which allows an animation of a static image file to be driven by speech in the following manner. The software constructs a model using a limited set of the speaker's facial gestures and applies that model to any 2D or 3D face, using any text, mapping the movements onto the new features. In order to learn to mimic someone's facial gestures, the software needs several minutes of video of the speaker, which it analyzes, maps and stylizes. This software allows a computer to analyze and stylize video images, but it does not directly link a user's voice to animation for communication purposes. Geppetto software also aids professional animators in creating facial animation. The software helps professionals generate lip-sync and facial control of 3D computer characters for 3D games, real-time performance and network applications. The system inputs the movement of a live model into the computer using motion analysis and MIDI devices. Scanning and motion analysis hardware capture a face and gestures in real time and then record the information into a computer for animation of a 3D model.
Prior art software for Internet communication has also produced "avatars", which are simple characters that form the visual embodiment of a person in cyberspace and are used as communication and sales tools on the Internet. These animations are controlled by real-time commands, allowing the user to interact with others on the Internet. Microsoft's V-Chat software offers an avatar pack, which includes downloadable characters and backgrounds and which can be customized by the user with a character editor. The animated character can be represented in 3D or as a 2D comic-strip graphic with speech bubbles. V-Chat uses the Internet Relay Chat (IRC) protocol and can accommodate private or group chats. The user is required to type the message on a keyboard and, if desired, choose an expression from a menu. Accordingly, while chatting, the user must make a conscious effort to link the text with the appropriate character expression, since the system does not perform this operation automatically. In addition, the animated characters do not function with lip-synced dialogue generated by the user.
A number of techniques and systems exist for synchronizing the mouth movements of an animated character to a spoken sound track. These systems, however, are mainly oriented to the entertainment industry, since operating them generally requires considerable technical sophistication to produce the final animated sequence. For example, U.S. Pat. No. 4,360,229 discloses a system in which a recorded sound track is encoded into a sequence of phoneme codes. This sequence of phoneme codes is analyzed to produce a sequence of visual images of lip movements corresponding to the sound track. These visual images can then be overlaid onto existing image frames to yield an animated sequence. Similarly, U.S. Pat. No. 4,913,539 teaches a system that constructs a synchronized animation based upon a recorded sound track. The system disclosed therein uses linear prediction techniques, instead of phoneme recognition devices, to code the sound track. This system, however, requires that the user "train" the system by inputting so-called "training utterances", which the system compares with the recorded sound track to generate a phonetic sequence.
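To make the phoneme-code approach more concrete, the following Python sketch turns a time-aligned phoneme sequence into one mouth-shape label per video frame; such a label sequence could then be used to select the lip images overlaid on existing frames. It is a minimal illustration under stated assumptions, not the method of either cited patent: the phoneme symbols, the `PHONEME_TO_VISEME` table, the `PhonemeSegment` type and the frame rate are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-mouth-shape (viseme) table; a real system would
# use a larger, language-specific inventory.
PHONEME_TO_VISEME = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open", "AH": "open",
    "OW": "round", "UW": "round", "W": "round",
    "IY": "wide", "EH": "wide",
    "SIL": "rest",
}


@dataclass
class PhonemeSegment:
    phoneme: str
    start: float  # seconds
    end: float    # seconds


def mouth_shapes_per_frame(segments, fps=24):
    """Return one mouth-shape label per video frame (assumed frame rate fps)."""
    if not segments:
        return []
    duration = max(seg.end for seg in segments)
    n_frames = int(round(duration * fps))
    frames = []
    for i in range(n_frames):
        t = (i + 0.5) / fps  # sample the sound track at the centre of each frame
        shape = "rest"       # default when no phoneme covers this instant
        for seg in segments:
            if seg.start <= t < seg.end:
                shape = PHONEME_TO_VISEME.get(seg.phoneme, "rest")
                break
        frames.append(shape)
    return frames
```

For example, a silence followed by "M" and "AA", sampled at 8 frames per second, yields `["rest", "closed", "open", "open"]`. Sampling at frame centres is one simple choice; a production system might instead weight each frame by how much of it each phoneme occupies.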
Furthermore, speech-driven animation software has been developed to aid in the laborious task of matching specific mouth shapes to each phoneme in a spoken dialogue. LipSync Talkit and Talk Master Pro work as plugins for professional 3D animation programs such as 3D Studio Max and Lightwave 3D. These systems take audio files of dialogue, link them to phonemes and morph the 3D speech animation based on facial bone templates created by the animator. The animation team then assembles the remaining animation. These software plugins, however, require other professional developer software to implement their functionality for complete character design. In addition, they do not function as self-contained programs for creating speech-driven animations and sending them as messages through the Internet.
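The morphing step these plugins perform can be illustrated in a simplified 2D form: each mouth shape is stored as a template of control points, and in-between frames are produced by linearly blending the two templates that bracket the current time. This Python sketch is a simplification under stated assumptions (it does not model facial bones or 3D geometry), and `TEMPLATES`, `lerp_points` and `mouth_at` are hypothetical names, not part of any of the products mentioned.

```python
from bisect import bisect_right

# Hypothetical mouth templates: each is a list of (x, y) control points in the
# same order, so that corresponding points can be blended one-to-one.
TEMPLATES = {
    "rest":   [(0.0, 0.0), (1.0, 0.0), (0.5, 0.1), (0.5, -0.1)],
    "open":   [(0.1, 0.0), (0.9, 0.0), (0.5, 0.3), (0.5, -0.5)],
    "closed": [(0.0, 0.0), (1.0, 0.0), (0.5, 0.0), (0.5, 0.0)],
}


def lerp_points(a, b, w):
    """Linearly blend two point lists; w=0 gives a, w=1 gives b."""
    return [(ax + (bx - ax) * w, ay + (by - ay) * w)
            for (ax, ay), (bx, by) in zip(a, b)]


def mouth_at(keyframes, t):
    """Mouth control points at time t.

    keyframes: list of (time_s, template_name), sorted by time.
    Before the first or after the last keyframe, that keyframe's shape is held.
    """
    times = [k[0] for k in keyframes]
    i = bisect_right(times, t)  # times[i-1] <= t < times[i]
    if i == 0:
        return list(TEMPLATES[keyframes[0][1]])
    if i == len(keyframes):
        return list(TEMPLATES[keyframes[-1][1]])
    (t0, s0), (t1, s1) = keyframes[i - 1], keyframes[i]
    w = (t - t0) / (t1 - t0)  # t1 > t0 is guaranteed by bisect_right
    return lerp_points(TEMPLATES[s0], TEMPLATES[s1], w)
```

With keyframes `[(0.0, "closed"), (0.2, "open")]`, the mouth at 0.1 s lies halfway between the two templates, so the upper-lip point sits at a height of about 0.15. Linear blending is the simplest choice; easing curves or co-articulation rules would give smoother motion.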
Users of prior art speech-driven animation software generally must have an extensive background in animation and 3D modeling. In light of the foregoing, a need exists for an easy-to-use method and system for generating an animated sequence having mouth movements synchronized to a spoken sound track input by a user. The present invention substantially fulfills this need and provides a tool for automated animation of a character that does not require the end user to have prior knowledge of animation techniques.
SUMMARY OF THE INVENTION
The present invention provides methods, systems and apparatuses directed toward an authoring tool that gives users the ability to make high-quality, speech-driven animation in which the animated character speaks in the user's voice. Embodiments of the present invention allow the animation to be sent as a message over the Internet or used as a set of instructions for various applications, including Internet chat rooms. According to one embodiment, the user chooses a character and a scene from a menu, then speaks into the computer's microphone to generate a personalized message. Embodiments of the present invention use speech-recognition technology to match the audio input to the appropriate animated mouth shapes, creating a professional-looking 2D or 3D animated scene with lip-synced audio.
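As an illustration of the matching step, the sketch below assumes that a speech recognizer has already produced a time-aligned list of phonemes, and reduces that list to mouth-shape keyframes by merging consecutive phonemes that share a shape and ignoring segments too brief to be seen. The `VISEME` table, the 40 ms `min_dur` threshold and the function name are hypothetical choices, not details taken from the patent.

```python
# Hypothetical phoneme-to-mouth-shape table (an assumption, not from the patent).
VISEME = {
    "M": "closed", "B": "closed", "P": "closed",
    "AA": "open", "AH": "open",
    "OW": "round", "UW": "round",
    "SIL": "rest",
}


def keyframes_from_alignment(alignment, min_dur=0.04):
    """Reduce recognizer output to mouth-shape keyframes.

    alignment: list of (phoneme, start_s, end_s) tuples in time order.
    Returns a list of (start_s, mouth_shape) keyframes.
    """
    keys = []
    for phoneme, start, end in alignment:
        shape = VISEME.get(phoneme, "rest")  # unknown phonemes fall back to "rest"
        if keys and (keys[-1][1] == shape or end - start < min_dur):
            # Same shape as the previous keyframe, or too brief to be visible:
            # add no keyframe, so the previous shape is simply held.
            continue
        keys.append((start, shape))
    return keys
```

Dropping very short segments trades a little accuracy for steadier mouth motion; the resulting keyframes could then drive an interpolation step between mouth-shape templates.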
The present invention, in one embodiment, creates personalized animations on the fly that closely resemble the high quality of hand-finished products. For instance, one embodiment of the present invention rec
