Image analysis – Image compression or coding – Predictive coding
Reexamination Certificate
1999-04-27
2003-04-15
Do, Anh Hong (Department: 2721)
C382S236000
active
06549669
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to audio and video encoding and, more particularly, to scene switching in predictive audio and video encoding.
BACKGROUND OF THE INVENTION
FIG. 1 illustrates a conventional system in which audio and video inputs received at a transmitter are encoded separately, and the encoded audio and video information is transmitted across a communication channel to a receiver. The receiver decodes this information into audio and video outputs that are intended to closely match the original audio and video inputs. Examples of the transmission channel of FIG. 1 include wireless transmission channels and data networks such as the Internet.
Audio encoding in FIG. 1 can be accomplished using predictive coding, as performed, for example, by the well-known CELP (code-excited linear prediction) codec. Such a predictive codec (coder/decoder) includes digital filters that have memory. A short-term filter is used to model the vocal tract, and a long-term filter is used to model the periodic excitation produced by the vocal cords. A codebook containing a set of excitation vectors is used to describe the residual (i.e., non-predictable) data.
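The three CELP components named above can be sketched as a toy synthesis chain. This is a minimal sketch, not the CELP codec itself: it assumes a single scaled codevector as the excitation, an integer pitch lag, and an all-pole short-term filter 1/A(z) with A(z) = 1 - sum(a_k z^-k). The class name `CelpSynthesis` and all parameter values are hypothetical. The point is that both filters carry memory from one frame into the next.

```python
class CelpSynthesis:
    """Toy CELP-style decoder (hypothetical): codebook excitation ->
    long-term (pitch) filter -> short-term (LPC) synthesis filter."""

    def __init__(self, lpc, pitch_lag, pitch_gain):
        self.lpc = list(lpc)            # a_1..a_p of 1/A(z); needs p >= 1
        self.pitch_lag = pitch_lag      # T, integer lag in samples (T >= 1)
        self.pitch_gain = pitch_gain    # b
        self.lt_mem = [0.0] * pitch_lag  # last T long-term outputs, oldest first
        self.st_mem = [0.0] * len(lpc)   # last p short-term outputs, oldest first

    def synthesize(self, codevector, gain):
        out = []
        for c in codevector:
            e = gain * c  # scaled codebook excitation
            # Long-term filter: v[n] = e[n] + b * v[n - T]
            v = e + self.pitch_gain * self.lt_mem[0]
            self.lt_mem = self.lt_mem[1:] + [v]
            # Short-term filter: s[n] = v[n] + sum_k a_k * s[n - k]
            s = v + sum(a * m for a, m in zip(self.lpc, reversed(self.st_mem)))
            self.st_mem = self.st_mem[1:] + [s]
            out.append(s)
        return out
```

Because `lt_mem` and `st_mem` persist between calls to `synthesize`, the output for one frame depends on what was decoded in the frames before it.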
The input signal to such a predictive audio codec is divided into frames, each typically holding less than 40 milliseconds of audio. For each frame, digital signal processing is used to derive a set of filter parameters and the excitation vectors that describe that particular frame. The calculation of the filter parameters depends not only on the current audio frame, but also on the state of the digital filters when they begin processing that frame. For example, if the input signal suddenly becomes completely silent, the memory of the digital filters still generates a signal that extends into the silent frame. This is conventionally called ringing. When calculating the parameters for the short-term filter and the long-term filter, conventional codecs compensate for this ringing.
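Ringing, and one common way to compensate for it, can be illustrated with a first-order all-pole filter. This sketch assumes the filter y[n] = x[n] + a*y[n-1] stands in for the codec's synthesis filters, and that compensation is done, as in many analysis-by-synthesis coders, by subtracting the filter's zero-input response (its ringing) from the target before the excitation is chosen. The source does not specify the compensation method, and the names `AllPole` and `zero_input_response` are hypothetical.

```python
import copy


class AllPole:
    """First-order all-pole filter y[n] = x[n] + a*y[n-1] (one sample of memory)."""

    def __init__(self, a):
        self.a = a
        self.y1 = 0.0  # filter memory: the previous output sample

    def run(self, frame):
        out = []
        for x in frame:
            self.y1 = x + self.a * self.y1
            out.append(self.y1)
        return out


def zero_input_response(filt, n):
    """Ringing: the n samples the filter emits from its current memory with no input."""
    probe = copy.deepcopy(filt)  # work on a copy so the real state is untouched
    return probe.run([0.0] * n)


f = AllPole(0.5)
f.run([1.0, 1.0, 1.0, 1.0])          # frame k: non-silent input leaves y1 = 1.875
ringing = zero_input_response(f, 4)  # what leaks into frame k+1

# Frame k+1 is silent, so the ideal output is all zeros. Subtracting the
# ringing gives the target that the new excitation alone has to produce.
target = [0.0] * 4
compensated_target = [t - r for t, r in zip(target, ringing)]

# For this first-order filter, a single impulse of -a*y1 cancels the ringing.
excitation = [-f.a * f.y1, 0.0, 0.0, 0.0]
output_k1 = f.run(excitation)  # actual output in the silent frame
```

Here `ringing` is `[0.9375, 0.46875, 0.234375, 0.1171875]`, and choosing the excitation against `compensated_target` makes the real filter output in the silent frame exactly zero.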
It is well known that, when an audio decoder is started, audible artifacts can result if its filter states are initialized with nonrepresentative values, such as random values or even zeros. Therefore, predetermined filter states are often preloaded into the filters to initialize the audio decoder. This procedure is conventionally known as audio decoder homing.
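Decoder homing can be sketched as overwriting the filter memory with a fixed, predetermined state. In this minimal sketch the decoder is a second-order all-pole synthesis filter. Both `HOME_STATE` and the coefficients in `LPC` are hypothetical placeholders, not values from any standard. The sketch shows that, once homed, two decoders with different histories produce identical output for the same input.

```python
# Hypothetical predetermined filter memory [s[n-1], s[n-2]]; real codecs
# define their own home state.
HOME_STATE = [0.125, -0.0625]
LPC = [0.9, -0.2]  # hypothetical short-term filter coefficients a_1, a_2


class Decoder:
    """All-pole synthesis filter s[n] = e[n] + sum_k a_k * s[n-k]."""

    def __init__(self, lpc):
        self.lpc = list(lpc)
        self.mem = [0.0] * len(self.lpc)  # [s[n-1], s[n-2], ...]

    def home(self, home_state):
        """Decoder homing: preload the filter memory with known values."""
        self.mem = list(home_state)

    def decode(self, excitation):
        out = []
        for e in excitation:
            s = e + sum(a * m for a, m in zip(self.lpc, self.mem))
            self.mem = [s] + self.mem[:-1]  # newest sample first
            out.append(s)
        return out


d1, d2 = Decoder(LPC), Decoder(LPC)
d1.decode([1.0, 2.0, 3.0])  # the two decoders get different histories...
d2.decode([-4.0, 0.5])
before = (d1.mem, d2.mem)
d1.home(HOME_STATE)
d2.home(HOME_STATE)
frame = [0.5, 0.0, -0.5]
out1, out2 = d1.decode(frame), d2.decode(frame)  # ...but match after homing
```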
In conventional video encoding, the video encoder receives a video input from a camera and captures snapshots of the incoming video for encoding. Well-known conventional video encoding/decoding techniques include MPEG1 and MPEG2. MPEG1 is well suited to coding video for CD-based applications. The MPEG1 standard specifies the syntax of the coded bit stream and also describes a model decoder. Frames of video are coded as pictures, and each frame is encoded as a progressive (non-interlaced) picture. There are three main types of coded pictures in MPEG1, namely I-pictures (intra pictures), which are intraframe encoded and do not use prediction; P-pictures (forward-predicted pictures), which are interframe encoded using motion-compensated prediction from a previous I- or P-picture in the sequence; and B-pictures (bidirectionally predicted pictures), which are interframe encoded using interpolated motion-compensated prediction between a previous I- or P-picture and the next I- or P-picture in the sequence.
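The prediction structure of the three MPEG1 picture types can be made concrete by listing, for a group of pictures in display order, which pictures each one is predicted from. This sketch assumes a well-formed pattern that starts with an I-picture and ends with an I- or P-picture, and it ignores motion vectors and coding order. The function name `references` is hypothetical.

```python
def references(gop):
    """Return, per picture in display order, the indices of its reference pictures.

    I: none; P: the previous I or P picture; B: the previous and next I or P picture.
    """
    anchors = [i for i, kind in enumerate(gop) if kind in "IP"]
    refs = []
    for i, kind in enumerate(gop):
        prev = [a for a in anchors if a < i]
        nxt = [a for a in anchors if a > i]
        if kind == "I":
            refs.append([])
        elif kind == "P":
            if not prev:
                raise ValueError(f"P-picture at {i} has no earlier I/P picture")
            refs.append([prev[-1]])
        elif kind == "B":
            if not prev or not nxt:
                raise ValueError(f"B-picture at {i} needs an I/P picture on both sides")
            refs.append([prev[-1], nxt[0]])
        else:
            raise ValueError(f"unknown picture type {kind!r}")
    return refs


print(references("IBBPBBP"))
# [[], [0, 3], [0, 3], [0], [3, 6], [3, 6], [3]]
```

Every entry except the I-picture's points back at previously decoded output, which is why a P- or B-picture coded right after a scene switch starts from an unrelated reference.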
MPEG2 extends the functionality of MPEG1 to enable efficient coding of video and associated audio at a wide range of resolutions and bit rates. MPEG2 defines a set of profiles and levels that provide encoding parameters for different applications. Each profile specifies a particular set of coding features.
In the predictive video encoding techniques described above, such as MPEG1 and MPEG2, the current output of the video decoder depends on its previous output, for example for P-pictures and B-pictures. Similarly, in predictive audio encoding techniques such as CELP coding, the current output of the audio decoder depends on the state in which its digital filters were left after calculating the previous output. This reliance on a previous video decoder output, or on a previous digital filter state of an audio decoder, can dramatically degrade audio and video quality when the audio and video inputs of FIG. 1 are switched from one scene to another.
Referring now to the conventional audio/video transmitter arrangement shown in FIG. 2, a plurality of video cameras are switchably connectable to the video encoder. When the video input is switched from the camera associated with scene A to the camera associated with scene B, the input of the audio encoder is likewise switched from sound A, associated with scene A, to sound B, associated with scene B. Examples of such an arrangement include surveillance equipment, or a lecture with a teacher (scene A and sound A) and students (scene B and sound B).
When switching between scene A and scene B, the difference between the corresponding images may be quite large. If the first picture after the switch is predicted from the previous picture (as a P- or B-picture is in MPEG1 or MPEG2), the large difference between the two pictures will typically cause a very noticeable effect in the video stream, and several subsequent frames will typically be required to “catch up” after the switch. This effect is quite noticeable in conventional video conferencing tools such as VIC, especially if the scene is switched back and forth several times.
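Why a predicted picture struggles right after a switch can be shown with the size of the prediction residual. In this toy sketch each frame is a short, flattened list of hypothetical luma values, prediction is zero-motion (the co-located pixel of the previous picture), and the residual is measured as the sum of absolute differences (SAD). Real encoders search for motion vectors, but motion search helps little when the new scene shares no content with the old one.

```python
def sad(current, reference):
    """Sum of absolute differences: size of the zero-motion prediction residual."""
    if len(current) != len(reference):
        raise ValueError("frames must have the same size")
    return sum(abs(c - r) for c, r in zip(current, reference))


# Hypothetical 6-pixel luma frames.
scene_a_t0 = [100, 102, 101, 99, 100, 98]
scene_a_t1 = [101, 102, 100, 99, 101, 98]  # same camera, small change
scene_b_t1 = [20, 240, 35, 200, 15, 180]   # camera switched to scene B

within_scene = sad(scene_a_t1, scene_a_t0)  # 3
after_switch = sad(scene_b_t1, scene_a_t0)  # 552
```

At a bit rate sized for residuals like the first, the second cannot be coded accurately in a single picture, so the remaining error is worked off over several subsequent frames.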
The arrangement of FIG. 2 will not adversely affect the audio encoding process because, as mentioned above, conventional predictive audio codecs compensate for the ringing effect.
FIG. 3 illustrates another conventional audio/video transmitter arrangement in which each video stimulus (scene A, scene B) has its own video encoder and each audio stimulus (sound A, sound B) has its own audio encoder. The outputs of all the audio and video encoders are transmitted over the communication channel. In this configuration, the receiver at the other end of the channel can switch between the different scenes and their corresponding sounds. Switching between two encoded audio streams breaks the sequence of consecutive audio frames and, if there is a mismatch between the decoder's filter states and the incoming filter parameters and excitation vectors, an audible artifact may well be generated. Likewise, when the receiver switches between video streams, the video decoder generates the first picture after the switch based on an erroneous previous picture.
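The audio mismatch at the receiver can be sketched by comparing the output that stream B's encoder expects with what a receiver decoder that was previously fed stream A actually produces. The sketch assumes both sides use the same first-order all-pole filter y[n] = x[n] + a*y[n-1] with identical parameters, so the only difference is the filter memory. All values and names are hypothetical.

```python
class OnePole:
    """All-pole filter y[n] = x[n] + a*y[n-1] with one sample of memory."""

    def __init__(self, a):
        self.a = a
        self.y1 = 0.0  # previous output sample

    def run(self, frame):
        out = []
        for x in frame:
            self.y1 = x + self.a * self.y1
            out.append(self.y1)
        return out


A = 0.9
encoder_b_local = OnePole(A)  # stream B encoder's internal copy of the decoder
receiver = OnePole(A)         # receiver's decoder, which was playing stream A

encoder_b_local.run([0.2, 0.1, 0.0])  # history of stream B
receiver.run([1.0, 1.0, 1.0])         # history of stream A

frame_b = [0.5, -0.3, 0.1, 0.0]       # first stream B frame after the switch
expected = encoder_b_local.run(frame_b)
heard = receiver.run(frame_b)
mismatch = [h - e for h, e in zip(heard, expected)]
# The filter is linear, so mismatch[k] = a**(k+1) * (difference in memory):
# an error that decays but is present from the first sample after the switch.
```

In a real codec the filter parameters also change from frame to frame; even with identical parameters, the stale memory alone produces an error.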
FIG. 4 illustrates a conventional audio/video receiver arrangement that can be used with the transmitter arrangements of FIG. 2 or FIG. 3, and in which the problems described above can occur during scene switching. A single audio decoder receives an input encoded audio stream (which can include both the audio A stream and the audio B stream of FIG. 3), and a single video decoder receives an input encoded video stream (which can include both the video A stream and the video B stream of FIG. 3).
In view of the foregoing, it is desirable to provide predictive audio and video encoding that accommodates switching among multiple scenes and sounds without the disadvantages of the prior art described above.
The present invention provides for smooth switching among multiple scenes/sounds in a predictive audio/video encoding environment.
Bergquist Henrik
Sundqvist Jim
Do Anh Hong
Jenkens & Gilchrist P.C.
Telefonaktiebolaget L M Ericsson (publ)