Video signal processing systems and methods utilizing...

Television – Two-way video and voice communication – Transmission control

Reexamination Certificate

Details

C348S384100

active

06330023

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to processing video signals, and more specifically to systems for interpolating and coding video signals using speech analysis techniques.
2. Description of the Related Art
The progressive developments in digital electronics and digital computing since the 1960s have resulted in the conversion, from analog to digital technology, of devices for storing and processing audio and video signals. Storing, processing and transmitting signals digitally offers significant advantages. Digital signals are less sensitive to transmission noise than analog signals. Moreover, digital signals of different types can be treated in a unified way and, provided adequate decoding arrangements exist, can be mixed on the same channel. The latter approach is the main feature of the Integrated Services Digital Network (ISDN) which is currently being developed and implemented. The ISDN can handle, for example, speech, image and computer data on a single channel.
A major disadvantage of digital communication, however, is that it requires greater channel bandwidth. This can be several times the bandwidth of an equivalent analog channel. In multimedia, videotelephony, and teleconferencing applications, bandwidth and storage space limitations permit only a relatively low frame rate (typically 5-10 frames per second, but as low as 1-2 frames per second for some applications). Thus, there is currently a strong emphasis on techniques and systems which compress the channel bandwidth required to transmit the signals. In the context of speech signals, for example, a number of techniques have been proposed which are capable of efficiently coding at very low bit rates (between 4.8 and 64 kbits/s). Such techniques include logarithmic pulse code modulation (Log PCM), adaptive pulse code modulation (APCM), adaptive differential pulse code modulation (ADPCM), delta modulation (DM), and continuously variable slope delta modulation (CVSD). All of these techniques operate directly on the time domain signal and achieve reduced bit rates by exploiting the sample-to-sample correlation or redundancy in the speech signal.
While the coding techniques discussed above permit very low bit rates to be achieved for the transmission or storage of speech signals, they are less suitable for the coding of video signals. Thus, although current visual coding standards may also operate at very low bit rates, the trade-off between temporal and spatial resolution results in visually annoying motion or spatial artifacts. As such, various techniques have been proposed to interpolate between transmitted or stored frames as a means of increasing the frame rate for flicker-free and smooth motion rendition.
In the interframe coding of television pictures, for example, it is known to drop or discard information from some frames or fields by subsampling the video signal at a fraction of the normal rate. At the receiver, a reconstructed version of the information contained in the nontransmitted frames or fields is obtained by interpolation, using information derived from the transmitted fields. Simple linear interpolation may be performed by averaging the intensity information defining picture elements (pels) in the preceding and succeeding transmitted fields at fixed locations most closely related to the location of the picture element being processed. In certain instances, the interpolation may be performed adaptively, such that the pels used to form certain reconstructed or estimated intensity values are selected from two or more groups having different spatial patterns or such that the information obtained from pels in the same relative spatial positions in the prior and succeeding frames is combined in two or more different ways.
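The fixed (non-adaptive) linear interpolation described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `interpolate_frame`, the representation of a frame as a list of rows of integer intensities, and the sample values are all assumptions made for illustration.

```python
def interpolate_frame(prev_frame, next_frame):
    """Reconstruct a dropped frame by averaging co-located pels.

    Frames are assumed (hypothetically) to be lists of rows of integer
    intensities with identical dimensions.
    """
    if len(prev_frame) != len(next_frame):
        raise ValueError("frames must have the same number of rows")
    result = []
    for prev_row, next_row in zip(prev_frame, next_frame):
        if len(prev_row) != len(next_row):
            raise ValueError("rows must have the same length")
        # Each reconstructed pel is the mean of the pels at the same
        # location in the preceding and succeeding transmitted frames.
        result.append([(a + b) // 2 for a, b in zip(prev_row, next_row)])
    return result


# Illustrative 2x2 frames (made-up intensities).
prev_frame = [[0, 100], [50, 200]]
next_frame = [[100, 100], [150, 0]]
print(interpolate_frame(prev_frame, next_frame))  # -> [[50, 100], [100, 100]]
```

Because every pel is averaged at a fixed location, a moving object contributes to two different places in the result, which is the source of the blurring discussed in the following paragraph.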
Although both the fixed and the adaptive techniques described above adequately recover nontransmitted or unstored picture information when little motion occurs in a picture, their performance is less than adequate when objects in the picture are moving quickly in the field of view. That is, reconstruction by these interpolation techniques often causes blurring and other objectionable visual distortion. Thus, a more advanced interframe coding technique is proposed in U.S. Pat. No. 4,383,272 issued to Netravali et al. on May 10, 1983 and entitled VIDEO SIGNAL INTERPOLATION USING MOTION ESTIMATION. In accordance with the technique disclosed therein, information defining elements of a picture is estimated by interpolation using information from related locations in preceding and succeeding versions of the picture. The related locations are determined by forming an estimate of the displacement of objects in the picture. Displacement estimates are advantageously formed recursively, with updates being formed only in moving areas of the picture. While this coding technique is capable of eliminating the annoying distortion and flicker associated with the other prior art techniques described above, it is still incapable of reproducing the motion of a speaker's mouth in so-called talking-head (i.e., speaking-person) sequences.
Normal speech has about 13 speech sounds per second, and the positions of the lips, jaw, teeth, and tongue change at even higher rates. As such, it will be readily appreciated that at rates of 5-10 frames per second or lower, a great deal of information about mouth movements is necessarily lost. Accordingly, it is a principal object of the present invention to enable improved reconstruction of non-transmitted or non-stored fields or frames of a video signal indicative of a speaking-person sequence using information from the speaking person's utterances and at least one transmitted or stored field or frame.
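The general idea of interpolating along an estimated displacement, rather than at fixed locations, can be sketched as follows. This is a deliberately simplified illustration and not the recursive estimator of U.S. Pat. No. 4,383,272: it substitutes an exhaustive search for a single global integer shift along one row of pels. The names `estimate_shift` and `interpolate_row`, the `max_shift` parameter, the edge-clamping rule, and the sample data are all assumptions.

```python
def _clamp(index, length):
    """Keep an index inside [0, length - 1] (assumed edge handling)."""
    return max(0, min(length - 1, index))


def estimate_shift(prev_row, next_row, max_shift=3):
    """Exhaustively search for the integer shift d that best maps
    prev_row[x] onto next_row[x + d], using the sum of absolute
    differences as the matching cost."""
    n = len(prev_row)
    best_shift, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        cost = sum(abs(prev_row[x] - next_row[_clamp(x + d, n)])
                   for x in range(n))
        if cost < best_cost:
            best_shift, best_cost = d, cost
    return best_shift


def interpolate_row(prev_row, next_row, max_shift=3):
    """Build the midway row by averaging pels taken along the motion path."""
    n = len(prev_row)
    d = estimate_shift(prev_row, next_row, max_shift)
    back = d // 2          # distance from the previous row to the midpoint
    forward = d - back     # distance from the midpoint to the next row
    return [(prev_row[_clamp(x - back, n)] + next_row[_clamp(x + forward, n)]) // 2
            for x in range(n)]


# A bright two-pel object moves two positions to the right.
prev_row = [0, 0, 9, 9, 0, 0, 0, 0]
next_row = [0, 0, 0, 0, 9, 9, 0, 0]
print(estimate_shift(prev_row, next_row))   # -> 2
print(interpolate_row(prev_row, next_row))  # -> [0, 0, 0, 9, 9, 0, 0, 0]
```

In this example, averaging at fixed locations would smear the object across four pels at roughly half intensity, whereas sampling along the estimated displacement keeps it sharp at the midway position. Neither approach has any information about changes that occur entirely between the two transmitted frames, such as rapid mouth movements.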
SUMMARY OF THE INVENTION
The foregoing and additional objects, which will hereinafter become apparent to those skilled in the art, are achieved in accordance with the present invention by a method and apparatus for increasing the frame rate of an image of a speaking person transmitted or stored at very low bit rates.
The apparatus of the present invention comprises frame generating means, responsive to an audio signal associated with an utterance by the speaking person and a frame to be reconstructed and to a video signal indicative of an existing frame or field, for generating a reconstructed frame of the image. The apparatus further includes means for associating respective portions of the audio signal with facial feature information and means for inserting a reconstructed frame between consecutive existing (i.e. stored or transmitted) frames.
The apparatus also includes monitoring means for detecting the audio signal portions, each signal portion corresponding to a speaker mouth formation. The signal portions may correspond to a phoneme, a homophene, or some other speech-based criterion from which mouth formation data can be reliably predicted. The stored facial information may include visemes, as well as feature position parameters relating to the jaw, teeth, tongue, and cheeks. Accordingly, the associating means may include a memory having stored therein a speaker-independent table of feature position parameters for respective detected signal portions. In a modified embodiment, the apparatus further includes means responsive to the monitoring means for storing speaker-dependent mouth position parameter data indicative of respective mouth positions as corresponding signal portions indicative of phonemes are detected by the monitoring means.
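One way to picture the speaker-independent table is as a two-stage lookup: detected phonemes are grouped into visually equivalent classes, and each class maps to a set of feature position parameters. The Python sketch below is illustrative only. The `MouthShape` fields, the class names, the phoneme labels, and every numeric value are hypothetical and do not come from the patent; the only linguistic fact relied on is that the bilabials /p/, /b/, /m/ look alike on the lips, as do the labiodentals /f/, /v/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MouthShape:
    """Hypothetical feature position parameters, each normalised to 0..1."""
    jaw_opening: float
    lip_width: float
    lip_rounding: float


# Hypothetical speaker-independent table: one entry per mouth formation.
VISEME_TABLE = {
    "closed": MouthShape(jaw_opening=0.0, lip_width=0.5, lip_rounding=0.0),
    "lip_to_teeth": MouthShape(jaw_opening=0.1, lip_width=0.6, lip_rounding=0.0),
    "open": MouthShape(jaw_opening=0.9, lip_width=0.6, lip_rounding=0.1),
    "rounded": MouthShape(jaw_opening=0.3, lip_width=0.2, lip_rounding=0.9),
    "neutral": MouthShape(jaw_opening=0.2, lip_width=0.5, lip_rounding=0.2),
}

# Phonemes that look alike on the lips share one mouth formation.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_to_teeth", "v": "lip_to_teeth",
    "aa": "open",
    "uw": "rounded",
}


def mouth_shape_for(phoneme):
    """Return the stored parameters for a detected phoneme label,
    falling back to a neutral shape for labels not in the table."""
    viseme = PHONEME_TO_VISEME.get(phoneme.lower(), "neutral")
    return VISEME_TABLE[viseme]


for label in ("p", "m", "aa", "zz"):
    print(label, mouth_shape_for(label))
```

A speaker-dependent variant, as in the modified embodiment, could overwrite entries of such a table with parameters measured from transmitted frames whenever the corresponding phoneme is detected; that update step is not shown here.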
A method of increasing the frame rate of an image or picture of a speaking person in accordance with the present invention comprises the steps of monitoring an audio signal indicative of an utterance by the speaking person and associated with a frame to be reconstructed, monitoring a video signal indicative of an existing frame, associating individual portions of the audio signal with facial feature information, reconstructing at least one frame of the picture from the existing frame utilizing facial feature information obtained in the associating step, and inserting a rec
