Post-synchronizing an information stream including the...

Television – Synchronization – Locking of video or audio to reference timebase

Reexamination Certificate

C348S578000, C348S553000, C345S473000

active

06697120

ABSTRACT:

The invention relates to a method of post-synchronizing an information stream, which information stream comprises an audio signal and a video signal, the method comprising the step of: performing a translation process to obtain at least one translated audio signal.
The invention further relates to a transmitter for transmitting an information stream comprising at least one translated audio signal and a video signal.
The invention further relates to a receiver for receiving an information stream.
The invention further relates to a communication system comprising: a plurality of stations comprising means for transmitting and means for receiving an information stream, which information stream comprises an audio and a video signal; and a communication network for linking said stations.
The invention further relates to an information stream comprising a video signal and a plurality of audio signals relating to different languages, and to a storage medium.
Post-synchronizing an information stream is especially known from the field of movies and television programs. Post-synchronization means that the original audio signal is replaced by another audio signal, which is normally a translation of the original audio signal. This has the advantage that an audience that does not understand the original language can understand the movie without having to read subtitles. However, it is annoying to the audience that the movements of the lips no longer correspond to the audio signal.
It is, inter alia, an object of the invention to overcome the above-mentioned problem. To this end, a first aspect of the invention provides a method characterized in that the method comprises the steps of: tracking said video signal to obtain original lip-objects; replacing said original lip-objects with new lip-objects, said new lip-objects corresponding to said translated audio signal.
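The two claimed steps, tracking the original lip-objects and replacing them with new lip-objects that match the translated audio, can be pictured with the minimal Python sketch below. It assumes a hypothetical, heavily simplified data model in which every decoded frame already carries its lips as a separate object (as object-based coding allows); the names `LipObject`, `Frame`, `track_lips`, `post_synchronize` and the `new_lips_at` callback are illustrative inventions, not part of the patent or of MPEG-4.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class LipObject:
    """Hypothetical lip-object: a few lip parameters at one instant."""
    timestamp: float  # presentation time in seconds
    opening: float    # mouth opening (normalized, hypothetical)
    width: float      # mouth width (normalized, hypothetical)


@dataclass(frozen=True)
class Frame:
    """Hypothetical decoded frame in which the lips are already a separate object."""
    timestamp: float
    lips: LipObject


def track_lips(frames: List[Frame]) -> List[LipObject]:
    # Step 1 (tracking): with object-based coding the lips are a separate
    # object, so "tracking" here simply collects the original lip-objects.
    return [f.lips for f in frames]


def post_synchronize(frames: List[Frame],
                     new_lips_at: Callable[[float], LipObject]) -> List[Frame]:
    # Step 2 (replacing): swap each original lip-object for a new one that
    # matches the translated audio at the same presentation time.
    originals = track_lips(frames)
    return [replace(f, lips=new_lips_at(old.timestamp))
            for f, old in zip(frames, originals)]
```

Because the frames are immutable, the original stream is left untouched and a new, post-synchronized list of frames is returned; how `new_lips_at` produces its lip-objects is left open here, just as the method leaves open where the new lip-objects come from.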
The facilities to track and manipulate lip-objects are provided by an object-oriented coding technique, e.g. MPEG-4. Because of the object-oriented nature of such a coding technique, the lip-objects are regarded as separate objects that can be handled and manipulated separately. An overview of the MPEG-4 standard is given in the ISO/IEC document JTC1/SC29/WG11/N2459, October 1998, Atlantic City, further referred to as the “MPEG-4 standard”. Further information can be found in the ISO/IEC document JTC1/SC29/WG11/N2195, March 1998, Tokyo, which describes MPEG-4 applications. MPEG-4 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group). This standard provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of three fields: digital television, interactive graphics applications (synthetic content) and interactive multimedia. MPEG-4 provides ways to represent units of aural, visual or audiovisual content, called “media objects”. These media objects can be of natural or synthetic origin; that is, they can be recorded with a camera or microphone, or generated with a computer. Audiovisual scenes are composed of several media objects, e.g. audio and video objects. MPEG-4 defines the coded representation of objects such as synthetic face objects and synthetic sound. MPEG-4 provides facilities to distinguish different objects of a scene. In particular, lip-tracking makes it possible to record the lips of a person as a separate object, a so-called lip-object. This lip-object can be manipulated. From the lip-object, lip-parameters can be extracted that describe the lips on the basis of a lip-model. Such a lip-model can be stored locally, which makes it possible to reconstruct the lips by sending only the corresponding lip-parameters.
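The idea of a locally stored lip-model, where only a few lip-parameters travel over the channel, can be illustrated with the toy sketch below. The model is a hypothetical stand-in (two half-ellipses for the upper and lower lip), not the MPEG-4 face model, and `lip_outline` with its three parameters is an invented name; it only shows how a receiver could rebuild a full contour from a handful of transmitted numbers.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def lip_outline(width: float, upper: float, lower: float, n: int = 8) -> List[Point]:
    """Rebuild a closed lip contour from three transmitted parameters.

    Hypothetical model: the upper lip is a half-ellipse above the x-axis with
    semi-axes (width/2, upper); the lower lip is a half-ellipse below it with
    semi-axes (width/2, lower). Returns 2*n points, mouth corners included once.
    """
    a = width / 2.0
    pts: List[Point] = []
    # Upper contour: from the left corner (angle pi) to the right corner (angle 0).
    for i in range(n + 1):
        t = math.pi - math.pi * i / n
        pts.append((a * math.cos(t), upper * math.sin(t)))
    # Lower contour: back from right to left, skipping the two corners.
    for i in range(1, n):
        t = -math.pi * i / n
        pts.append((a * math.cos(t), lower * math.sin(t)))
    return pts
```

For example, `lip_outline(4.0, 1.0, 2.0)` yields 16 contour points from just three numbers, which is the bandwidth argument made above: the receiver holds the model, and the sender only transmits parameters.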
According to the invention, the original lip-objects are replaced with new lip-objects that correspond to the translated audio signal. In this way, a video signal is obtained wherein the lip-movements correspond better to the translated audio signal. The translation becomes more natural and, in the ideal case, the viewer will not notice that the information stream is in fact a translation of an original information stream. Lip-objects comprise the lips as well as relevant parts of the face.
According to the MPEG-4 standard, media objects can be placed anywhere in a given coordinate system. Transforms can be applied to change the geometrical or acoustical appearance of a media object. Streamed data can be applied to media objects in order to modify their attributes. Synchronization of elementary streams is achieved through time stamping of individual access units within elementary streams. Usually, the new lip-objects are synchronized with the translated audio signal.
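Synchronizing the new lip-objects with the translated audio via time stamps can be sketched as a lookup: for each video frame, select the lip access unit that is current at the frame's presentation time. The sketch below is an assumption-laden illustration, not MPEG-4 Systems syntax; `lips_for_frames`, the `(timestamp, payload)` tuples and the `offset_ms` parameter (mapping the translated audio's time base onto the video's) are hypothetical.

```python
import bisect
from typing import List, Tuple


def lips_for_frames(frame_times_ms: List[int],
                    lip_units: List[Tuple[int, str]],
                    offset_ms: int = 0) -> List[str]:
    """Pick, for every video frame, the lip access unit current at that time.

    lip_units: (timestamp_ms, payload) pairs sorted by timestamp, expressed on
    the translated audio's time base; offset_ms shifts them onto the video
    time base (both are hypothetical conventions for this sketch).
    """
    unit_times = [t + offset_ms for t, _ in lip_units]
    selected = []
    for ft in frame_times_ms:
        # Index of the last unit whose time stamp is <= the frame time.
        i = bisect.bisect_right(unit_times, ft) - 1
        # Frames before the first unit simply reuse the first unit.
        selected.append(lip_units[max(i, 0)][1])
    return selected
```

Sample-and-hold selection keeps a lip shape on screen until the next lip access unit becomes due, so the lip-objects follow the translated audio's timing rather than the original video's.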
The tools for representing natural video in the MPEG-4 visual standard aim at providing standardized core technologies allowing efficient storage, transmission and manipulation of textures, images and video data for multimedia environments. These tools allow the decoding and representation of atomic units of image and video content, called video objects. An example of a video object could be a talking person or only his lips.
The face is an object with facial geometry that is ready for rendering and animation. The shape, texture and expressions of the face are generally controlled by a bit stream containing instances of Facial Definition Parameter (FDP) sets and/or Facial Animation Parameter (FAP) sets. Frame-based and temporal-DCT coding of a large collection of FAPs can be used for accurate speech articulation.
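The principle behind temporal-DCT coding of FAPs can be illustrated on a single parameter trajectory: transform a segment of per-frame values with a DCT, keep only the low-frequency coefficients, and reconstruct. This is a sketch of the general technique only, assuming an orthonormal DCT-II and no quantization or entropy coding; it does not reproduce the MPEG-4 FAP bitstream, and `dct`, `idct` and `code_fap_segment` are hypothetical helper names.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of one FAP trajectory segment."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k)
                               for i in range(n))
            for k in range(n)]


def idct(c: List[float]) -> List[float]:
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi / n * (i + 0.5) * k)
                for k in range(n))
            for i in range(n)]


def code_fap_segment(values: List[float], keep: int) -> List[float]:
    """Keep the first `keep` DCT coefficients, zero the rest, reconstruct."""
    coeffs = dct(values)
    kept = coeffs[:keep] + [0.0] * max(len(coeffs) - keep, 0)
    return idct(kept)
```

Smooth articulatory motion concentrates its energy in the first few coefficients, which is why truncating the DCT of a FAP segment can save bits while keeping the mouth movement close to the original.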
Viseme and expression parameters are used to code specific speech configurations of the lips and the mood of the speaker. A viseme is a sequence of one or more facial feature positions corresponding to a phoneme. A phoneme is the smallest distinct unit of speech sound. Visemes form the basic units of visual articulatory mouth shapes. A viseme comprises mouth parameters that specify the mouth opening, height, width and protrusion. The face animation part of the standard allows sending parameters that calibrate and animate synthetic faces. The face models themselves are not standardized by MPEG-4; only the parameters are. The new lip-objects can always be manipulated to fit the video signal as well as possible.
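The phoneme-to-viseme relation, with each viseme described by mouth opening, height, width and protrusion, can be sketched as a lookup table. The phoneme symbols, viseme class names and numeric values below are hypothetical placeholders chosen for illustration; they are not the MPEG-4 viseme table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mouth:
    """Mouth parameters of one viseme (normalized, hypothetical values)."""
    opening: float
    height: float
    width: float
    protrusion: float


NEUTRAL = Mouth(opening=0.1, height=0.2, width=0.5, protrusion=0.0)

# Hypothetical viseme classes; several phonemes share one mouth shape.
VISEMES: Dict[str, Mouth] = {
    "closed":      Mouth(0.0, 0.1, 0.5, 0.1),  # lips pressed together
    "labiodental": Mouth(0.1, 0.2, 0.5, 0.0),  # lower lip against teeth
    "open":        Mouth(0.8, 0.8, 0.6, 0.0),  # wide open jaw
    "rounded":     Mouth(0.4, 0.5, 0.3, 0.7),  # rounded, protruding lips
    "spread":      Mouth(0.3, 0.3, 0.8, 0.0),  # lips spread sideways
}

PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "labiodental", "v": "labiodental",
    "a": "open",
    "o": "rounded", "u": "rounded",
    "i": "spread", "e": "spread",
}


def mouth_track(phonemes: List[str]) -> List[Mouth]:
    """Map a phoneme sequence to mouth shapes; unknown phonemes get NEUTRAL."""
    return [VISEMES[PHONEME_TO_VISEME[p]] if p in PHONEME_TO_VISEME else NEUTRAL
            for p in phonemes]
```

The many-to-one mapping reflects the fact that visually indistinguishable sounds such as "p", "b" and "m" share a single mouth shape, so a lip animation needs far fewer classes than there are phonemes.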
Advantageous embodiments of the invention are defined in the dependent claims. An embodiment of the invention provides a method, characterized by comprising the step of: obtaining said new lip-objects by tracking at least one further video signal, said further video signal comprising lip-movements corresponding to said translated audio signal. This embodiment describes a method to obtain the new lip-objects. Because the further video signal comprises lip-movements that correspond to the translated audio signal, the lip-objects that are derived from the further video signal correspond to the translated audio signal. Preferably, the further video signal is obtained by recording the lips of a translator or an original actor. Tracking lip-objects is performed on this further video signal to obtain the new lip-objects. It may be efficient to combine the recording of the lip-movement and the translation of the audio signal. A translator or an original actor can for example provide the translated audio signal as well as the lip-objects at the same time. The advantage of an original actor is that the correspondence of the lips is better, because the new lip-objects originate from the same lips as the original lip-objects.
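When the new lip-objects are tracked from a translator rather than the original actor, the two mouths differ in size, so the extracted lip-parameters have to be adapted before they replace the original lip-objects. The sketch below assumes four hypothetical tracked landmarks (`left`, `right`, `upper`, `lower` mouth points) and a simple width-based normalization; neither the landmark set nor the retargeting rule comes from the patent.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def lip_parameters(landmarks: Dict[str, Point]) -> Dict[str, float]:
    """Derive lip-parameters from tracked landmarks in the further video.

    Hypothetical landmark names: 'left'/'right' mouth corners and
    'upper'/'lower' lip midpoints, in pixel coordinates.
    """
    width = math.dist(landmarks["left"], landmarks["right"])
    opening = math.dist(landmarks["upper"], landmarks["lower"])
    # Expressing the opening relative to the width makes it scale-free.
    return {"width": width, "opening_ratio": opening / width}


def retarget(params: Dict[str, float], target_width: float) -> Dict[str, float]:
    """Scale the translator's mouth opening to the original actor's mouth width."""
    return {"width": target_width,
            "opening": params["opening_ratio"] * target_width}
```

For a translator whose mouth is 40 pixels wide and 10 pixels open, `retarget(..., 60.0)` gives an opening of 15 pixels on an actor whose mouth is 60 pixels wide. With the original actor as the source, this adaptation is largely unnecessary, which matches the advantage noted above.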
A further embodiment of the invention provides a method wherein said translation process comprises the steps of: converting the original audio signal into translated text; and deriving said translated audio signal and said new lip-objects from said translated text. In this embodiment, the result of a translation process is translated text. The translated text can be obtained from a translator's keyboard input or by analyzing the audio signal. A computer may, for example, first convert the audio signal into text and then translate that text into translated text. In this case, the translated text is used to derive the translated audio signal and the new lip-objects.
