Methods and apparatus for improved motion estimation for...

Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal

Reexamination Certificate


Details

Status: active
Patent number: 06697427


FIELD OF THE INVENTION
The present invention relates generally to improvements in video encoding and motion estimation, for example, the encoding employed in such standards as MPEG-1, MPEG-2, H.261, and H.263. More particularly, it relates to advantageous techniques for applying frequency-domain analysis to motion estimation.
BACKGROUND OF THE INVENTION
The Moving Picture Experts Group (MPEG) video compression standards, MPEG-1 (ISO 11172-2) and MPEG-2 (ISO 13818-2), employ image processing techniques at multiple levels. Of interest to the present invention is the processing of 16×16 macroblocks and 8×8 blocks. In the terminology used by the MPEG standards, a “frame” is an X by Y image of pixels, or picture elements. Each pixel represents the smallest discrete unit in an image. The “pixel”, in MPEG usage, consists of three color components, one luminance value and two chrominance values: Y, Cb, and Cr, respectively. Each frame is subdivided into 16×16 “macroblocks” of pixels. A grouping of macroblocks is called a “slice”. Each macroblock is further subdivided into 8×8 “blocks” of pixels. A macroblock typically comprises four luminance (Y) blocks and two or more chrominance (Cb and Cr) blocks. A more detailed description of luminance and chrominance is included in the MPEG-1 and MPEG-2 specifications. A sequence of frames ultimately makes up a video sequence.
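For concreteness, the following is a minimal Python sketch of this subdivision. It assumes the 4:2:0 chroma format (which gives exactly two chrominance blocks per macroblock); the frame dimensions are illustrative only.

MB_SIZE = 16   # macroblock edge length, in pixels
BLK_SIZE = 8   # block edge length, in pixels

def macroblock_layout(width: int, height: int) -> dict:
    """Count macroblocks and per-macroblock blocks for a 4:2:0 frame
    whose dimensions are multiples of 16 (an assumption for brevity)."""
    mbs_x = width // MB_SIZE
    mbs_y = height // MB_SIZE
    return {
        "macroblocks": mbs_x * mbs_y,
        "luma_blocks_per_mb": (MB_SIZE // BLK_SIZE) ** 2,  # four 8x8 Y blocks
        "chroma_blocks_per_mb": 2,                         # one Cb + one Cr in 4:2:0
    }

print(macroblock_layout(720, 576))
# {'macroblocks': 1620, 'luma_blocks_per_mb': 4, 'chroma_blocks_per_mb': 2}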
One of the key compression methods used in MPEG is the discrete cosine transform (DCT), or two-dimensional discrete cosine transform (2D-DCT), coupled with quantization. During the encoding process, each block is transformed from its spatial-domain representation, its actual pixel values, to a frequency-domain representation utilizing a two-dimensional 8×8 DCT. The quantization has the effect of deemphasizing or eliminating visual components of the block with high spatial frequencies not normally visible to the human visual system, thus reducing the volume of data needed to represent the block. The quantization values used by the MPEG protocols take the form of a quantization scale factor, included in the encoded bitstream, and quantization tables. Default tables are included in the MPEG specification; however, these can be replaced by quantization tables included in the encoded bitstream. The decision as to which scale factors and tables to use is made by the MPEG encoder.
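As an illustration of this transform-and-quantize step, the following Python sketch builds an orthonormal 8×8 DCT-II basis and quantizes the resulting coefficients. The quantization table and scale factor here are illustrative stand-ins, not the MPEG default tables.

import numpy as np

N = 8
u = np.arange(N).reshape(-1, 1)
x = np.arange(N).reshape(1, -1)
C = 0.5 * np.cos((2 * x + 1) * u * np.pi / (2 * N))
C[0, :] /= np.sqrt(2)   # orthonormal DCT-II basis matrix

def dct2(block):
    """Forward 2D-DCT of an 8x8 spatial block (separable: rows, then columns)."""
    return C @ block @ C.T

def quantize(coeffs, table, scale):
    """Divide coefficients by scale * table and round; larger table entries
    at high spatial frequencies attenuate or zero those components."""
    return np.round(coeffs / (table * scale)).astype(int)

block = np.random.randint(0, 256, (N, N)).astype(float)
table = np.fromfunction(lambda i, j: 8.0 + 2.0 * (i + j), (N, N))  # coarser at high frequencies
q = quantize(dct2(block - 128.0), table, scale=2.0)  # level shift, transform, quantize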
One of the fundamental methods used by the MPEG protocol is a mechanism whereby a macroblock within a single frame within a sequence of frames is represented in a motion vector (MV) encoded format. An MV represents the spatial location difference between that macroblock and a reference macroblock from a different, but temporally proximate, frame. Note that whereas DCT compression is performed on a block basis, the MVs are determined for macroblocks.
MPEG classifies frames as being of three types: I-frame (Intra-coded), P-frame (Predictive-coded), and B-frame (Bidirectionally predictive-coded). I-frames are encoded in their entirety. All of the information to completely decode an I-frame is contained within its encoding. I-frames can be used as the first frame in a video sequence, as the first frame of a new scene in a video sequence, as reference frames described further below, as refresh frames to prevent excessive error build-up, or as error-recovery frames, for example, after incoming bitstream corruption. They can also be convenient for special features such as fast forward and fast reverse.
P-frames depend on one previous frame. This previous frame is called a reference frame, and may be the previous I-frame or P-frame, as shown below. An MV associated with each macroblock in the P-frame points to a similar macroblock in the reference frame. During reconstruction, or decoding, the referenced macroblock is used as the starting point for the macroblock being decoded. Then a difference macroblock, preferably small, may be applied to the referenced macroblock. To understand how this reference-difference macroblock combination works, consider the encoding process of a P-frame macroblock. Given a macroblock in the P-frame, a search is performed in the previous reference frame for a similar macroblock. Once a good match is found, the reference macroblock pixel values are subtracted from the current macroblock pixel values. This subtraction results in a difference macroblock. Also, the position of the reference macroblock relative to the current macroblock is recorded as an MV. The MV is encoded and included in the encoder's output. This processing is followed by the DCT computation and quantization of the blocks comprising the difference macroblock. To decode the P-frame macroblock, the macroblock in the reference frame indicated by the MV is retrieved. Then the difference macroblock is decoded and added to the reference macroblock. The result is the original macroblock values, or values very close thereto. Note that the MPEG encoding and decoding processes are categorized as lossy compression and decompression, respectively.
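The search described above is conventionally implemented as block matching. The following Python sketch performs an exhaustive search over a square window using the sum of absolute differences (SAD) as the matching criterion; the window radius and the SAD measure are common choices rather than requirements of the standards, and the patent itself is directed at improved, frequency-domain alternatives to this kind of search.

import numpy as np

MB = 16  # macroblock size in pixels

def motion_search(cur, ref, mb_x, mb_y, radius=8):
    """Return the motion vector (dx, dy) and the difference macroblock for
    the macroblock whose top-left corner is (mb_x, mb_y) in frame `cur`."""
    target = cur[mb_y:mb_y + MB, mb_x:mb_x + MB].astype(int)
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = mb_y + dy, mb_x + dx
            if y < 0 or x < 0 or y + MB > ref.shape[0] or x + MB > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            candidate = ref[y:y + MB, x:x + MB].astype(int)
            sad = np.abs(target - candidate).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    dx, dy = best_mv
    reference = ref[mb_y + dy:mb_y + dy + MB, mb_x + dx:mb_x + dx + MB].astype(int)
    residual = target - reference  # difference macroblock, later DCT'd and quantized
    return best_mv, residual       # the MV itself is encoded into the bitstream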
The idea is that the encoding of the MV and the difference information for a given macroblock will result in a smaller number of bits in the resulting bitstream than the complete encoding of the macroblock by itself. Note that the reference frame for a P-frame is usually not the immediately preceding frame. A sample ordering is given below.
B-frames depend on two reference frames, one in each temporal direction. For each macroblock, an MV points to a similar macroblock in each of the two reference frames. In the case of B-frames, the two referenced macroblocks are averaged together before any difference information is added in the decoding process. Per the MPEG standard, a B-frame is not used as a reference frame. The use of B-frames normally results in a more compact representation of each macroblock.
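The B-frame reconstruction rule can be sketched in a few lines of Python; the rounded average and the clipping to the 8-bit range are illustrative choices.

import numpy as np

def reconstruct_b_macroblock(fwd_ref_mb, bwd_ref_mb, difference_mb):
    """Average the two referenced 16x16 macroblocks, then add the decoded
    difference macroblock, clipping to valid 8-bit pixel values."""
    prediction = (fwd_ref_mb.astype(int) + bwd_ref_mb.astype(int) + 1) // 2
    return np.clip(prediction + difference_mb, 0, 255).astype(np.uint8)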
A typical ordering of frame types would be I₁, B₂, B₃, P₄, B₅, B₆, P₇, B₈, B₉, I₁₀, and so on. Note that the subscripts refer to the temporal ordering of the frames. This temporal ordering is also the display ordering produced by the MPEG decoder. The encoded ordering of these frames, found in an MPEG bitstream, is typically different: I₁, P₄, B₂, B₃, P₇, B₅, B₆, I₁₀, B₈, B₉, and so forth. The first frame is always an I-frame. As mentioned above, an I-frame has no temporal dependencies upon other frames; therefore, an I-frame contains no MVs. Upon completion of the decoding of this frame, it is ready for display. The second frame to be decoded is P₄. It consists of MVs referencing I₁ and differences to be applied to the referenced macroblocks. After completion of the decoding of this frame, it is not displayed, but first held in reserve as a reference frame for decoding B₂ and B₃, then displayed, and then used as a reference frame for decoding B₅ and B₆. The third frame to be decoded is B₂. It consists of pairs of MVs for each macroblock that reference I₁ and P₄, as well as any difference information. Upon completion of the decoding of B₂, it is ready for display. The decoding then proceeds to B₃. B₃ is decoded in the same manner as B₂; B₃'s MVs reference I₁ and P₄. B₃ is then displayed, followed by the display of P₄. P₄ then becomes the backward-reference frame for the next set of frames. Decoding continues in this fashion until the entire set of frames, or video sequence, has been decoded and displayed.
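The relationship between the two orderings can be stated compactly: each reference frame (I or P) is emitted before the B-frames that depend on it. The following Python sketch, a hypothetical helper rather than anything specified by the patent, converts the display ordering above into the bitstream ordering.

display_order = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9", "I10"]

def to_bitstream_order(frames):
    """Reorder a display-ordered sequence so that every B-frame follows
    both of the reference frames it depends on."""
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)    # hold B-frames until their second reference arrives
        else:
            out.append(f)          # emit the reference frame (I or P) first...
            out.extend(pending_b)  # ...then the B-frames that reference it
            pending_b = []
    return out + pending_b

print(to_bitstream_order(display_order))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'I10', 'B8', 'B9']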
A video sequence generally approximates the appearance of smooth motion. In such a sequence, a given block of pixels in a given frame will be similar in content to one or more spatially proximate blocks in a range of temporally proximate frames. Given smooth real motion within a scene represented by such a sequence, and smooth apparent motion caused by changes in the orientation, point of view, and characteristics, such as field width, of the recorder of such a sequence, the positions of blocks that exhibit the greatest similarity across a number of temporally adjacent frames are very likely to be approximately…
