Reissue Patent
2001-11-19
2004-08-10
Tran, Phuoc (Department: 2621)
Image analysis
Image compression or coding
Interframe coding
C382S238000
active
RE038563
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention provides a method and apparatus for coding of digital video images such as bi-directionally predicted video object planes (B-VOPs), in particular, where the B-VOP and/or a reference image used to code the B-VOP is interlaced coded.
The invention is particularly suitable for use with various multimedia applications, and is compatible with the MPEG-4 Verification Model (VM) 8.0 standard (MPEG-4 VM 8.0) described in document ISO/IEC/JTC1/SC29/WG11 N1796, entitled “MPEG-4 Video Verification Model Version 8.01”, Stockholm, July 1997, incorporated herein by reference. The MPEG-2 standard is a precursor to the MPEG-4 standard, and is described in document ISO/IEC 13818-2, entitled “Information Technology—Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262,” Mar. 25, 1994, incorporated herein by reference.
MPEG-4 is a coding standard which provides a flexible framework and an open set of coding tools for communication, access, and manipulation of digital audio-visual data. These tools support a wide range of features. The flexible framework of MPEG-4 supports various combinations of coding tools and their corresponding functionalities for applications required by the computer, telecommunication, and entertainment (i.e., TV and film) industries, such as database browsing, information retrieval, and interactive communications.
MPEG-4 provides standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. MPEG-4 achieves efficient compression, object scalability, spatial and temporal scalability, and error resilience.
The MPEG-4 video VM coder/decoder (codec) is a block- and object-based hybrid coder with motion compensation. Texture is encoded with an 8×8 Discrete Cosine Transformation (DCT) utilizing overlapped block-motion compensation. Object shapes are represented as alpha maps and encoded using a Content-based Arithmetic Encoding (CAE) algorithm or a modified DCT coder, both using temporal prediction. The coder can handle sprites as they are known from computer graphics. Other coding methods, such as wavelet and sprite coding, may also be used for special applications.
Motion compensated texture coding is a well known approach for video coding, and can be modeled as a three-stage process. The first stage is signal processing which includes motion estimation and compensation (ME/MC) and a two-dimensional (2-D) spatial transformation. The objective of ME/MC and the spatial transformation is to take advantage of temporal and spatial correlations in a video sequence to optimize the rate-distortion performance of quantization and entropy coding under a complexity constraint. The most common technique for ME/MC has been block matching, and the most common spatial transformation has been the DCT.
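As an illustration of block matching, the following is a minimal sketch of an exhaustive (full) search that uses the sum of absolute differences as the matching error. The function names, the (dy, dx) vector convention, the block size, and the search range are assumptions made for this example and are not taken from the MPEG-4 VM.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(image, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def block_match(current, reference, top, left, size=16, search=4):
    """Full-search block matching (illustrative only).

    Returns ((dy, dx), error), where (dy, dx) is the displacement from the
    current block position to the best-matching block in the reference.
    """
    block = get_block(current, top, left, size)
    height, width = len(reference), len(reference[0])
    best_mv, best_err = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference image.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            err = sad(block, get_block(reference, y, x, size))
            if best_err is None or err < best_err:
                best_mv, best_err = (dy, dx), err
    return best_mv, best_err
```

A practical encoder would typically restrict or accelerate this search and refine the result to sub-pixel accuracy; the sketch covers only the integer-pixel case.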
However, special concerns arise for ME/MC of macroblocks (MBs) in B-VOPs when the MB is itself interlaced coded and/or uses reference images which are interlaced coded.
In particular, it would be desirable to have an efficient technique for providing motion vector (MV) predictors for a MB in a B-VOP. It would also be desirable to have an efficient technique for direct mode coding of a field coded MB in a B-VOP. It would further be desirable to have a coding mode decision process for a MB in a field coded B-VOP for selecting the reference image which results in the most efficient coding.
The present invention provides a system having the above and other advantages.
SUMMARY OF THE INVENTION
In accordance with the present invention, a method and apparatus are presented for coding of digital video images such as a current image (e.g., macroblock) in a bi-directionally predicted video object plane (B-VOP), in particular, where the current image and/or a reference image used to code the current image is interlaced (e.g., field) coded.
In a first aspect of the invention, a method provides direct mode motion vectors (MVs) for a current bi-directionally predicted, field coded image such as a macroblock (MB) having top and bottom fields, in a sequence of digital video images. A past field coded reference image having top and bottom fields, and a future field coded reference image having top and bottom fields are determined. The future image is predicted using the past image such that $MV_{top}$, a forward MV of the top field of the future image, references either the top or bottom field of said past image. The field which is referenced contains a best-match MB for a MB in the top field of the future image.
This MV is termed a “forward” MV since, although it references a past image (e.g., backward in time), the prediction is from the past image to the future image, e.g., forward in time. As a mnemonic, the prediction direction may be thought of as being opposite the direction of the corresponding MV.
Similarly, $MV_{bot}$, a forward motion vector of the bottom field of the future image, references either the top or bottom field of the past image. Forward and backward MVs are determined for predicting the top and/or bottom fields of the current image by scaling the forward MV of the corresponding field of the future image.
In particular, $MV_{f,top}$, the forward motion vector for predicting the top field of the current image, is determined according to the expression $MV_{f,top} = (MV_{top} \cdot TR_{B,top}) / TR_{D,top} + MV_D$, where $MV_D$ is a delta motion vector for a search area, $TR_{B,top}$ corresponds to a temporal spacing between the top field of the current image and the field of the past image which is referenced by $MV_{top}$, and $TR_{D,top}$ corresponds to a temporal spacing between the top field of the future image and the field of the past image which is referenced by $MV_{top}$. The temporal spacing may be related to a frame rate at which the images are displayed.
Similarly, $MV_{f,bot}$, the forward motion vector for predicting the bottom field of the current image, is determined according to the expression $MV_{f,bot} = (MV_{bot} \cdot TR_{B,bot}) / TR_{D,bot} + MV_D$, where $MV_D$ is a delta motion vector, $TR_{B,bot}$ corresponds to a temporal spacing between the bottom field of the current image and the field of the past image which is referenced by $MV_{bot}$, and $TR_{D,bot}$ corresponds to a temporal spacing between the bottom field of the future MB and the field of the past MB which is referenced by $MV_{bot}$.
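The forward direct-mode expressions for the top and bottom fields share one form, so they can be sketched with a single helper applied per field. This is a minimal sketch under stated assumptions: the function name is hypothetical, motion vectors are modeled as (x, y) tuples, and plain floating-point division is used because the text here does not state a rounding or truncation rule.

```python
def forward_direct_mv(mv_field, tr_b, tr_d, mv_delta=(0, 0)):
    """Scale a field MV of the future image into a forward MV for the current image.

    Implements MV_f = (MV * TR_B) / TR_D + MV_D per component.
    mv_field : MV_top or MV_bot of the future image, as an (x, y) tuple
    tr_b     : TR_B for that field (current field to referenced past field)
    tr_d     : TR_D for that field (future field to referenced past field)
    mv_delta : MV_D, the delta motion vector
    Floating-point division is an assumption; no rounding rule is applied.
    """
    if tr_d == 0:
        raise ValueError("TR_D must be nonzero")
    return tuple((c * tr_b) / tr_d + d for c, d in zip(mv_field, mv_delta))


# Hypothetical values: MV_top = (6, -3), TR_B,top = 2, TR_D,top = 6, MV_D = 0.
mv_f_top = forward_direct_mv((6, -3), tr_b=2, tr_d=6)
print(mv_f_top)  # (2.0, -1.0)
```

The same call with $MV_{bot}$, $TR_{B,bot}$, and $TR_{D,bot}$ gives the bottom-field vector.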
$MV_{b,top}$, the backward motion vector for predicting the top field of the current MB, is determined according to the equation $MV_{b,top} = ((TR_{B,top} - TR_{D,top}) \cdot MV_{top}) / TR_{D,top}$ when the delta motion vector $MV_D = 0$, or $MV_{b,top} = MV_{f,top} - MV_{top}$ when $MV_D \neq 0$.
$MV_{b,bot}$, the backward motion vector for predicting the bottom field of the current MB, is determined according to the equation $MV_{b,bot} = ((TR_{B,bot} - TR_{D,bot}) \cdot MV_{bot}) / TR_{D,bot}$ when the delta motion vector $MV_D = 0$, or $MV_{b,bot} = MV_{f,bot} - MV_{bot}$ when $MV_D \neq 0$.
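The two cases of the backward vector can be sketched together with the forward vector they depend on. As before, this is an illustrative sketch rather than the codec's implementation: the function name is hypothetical, vectors are (x, y) tuples, and floating-point division stands in for whatever rounding rule applies in practice.

```python
def direct_mode_mvs(mv_field, tr_b, tr_d, mv_delta=(0, 0)):
    """Return (forward, backward) direct-mode MVs for one field of the current MB.

    forward  = (MV * TR_B) / TR_D + MV_D
    backward = ((TR_B - TR_D) * MV) / TR_D   when MV_D == 0
    backward = forward - MV                  when MV_D != 0
    Floating-point division is an assumption; no rounding rule is applied.
    """
    if tr_d == 0:
        raise ValueError("TR_D must be nonzero")
    forward = tuple((c * tr_b) / tr_d + d for c, d in zip(mv_field, mv_delta))
    if all(d == 0 for d in mv_delta):
        backward = tuple(((tr_b - tr_d) * c) / tr_d for c in mv_field)
    else:
        backward = tuple(f - c for f, c in zip(forward, mv_field))
    return forward, backward


# Hypothetical values: MV_top = (6, -3), TR_B,top = 2, TR_D,top = 6.
print(direct_mode_mvs((6, -3), 2, 6))          # ((2.0, -1.0), (-4.0, 2.0))
print(direct_mode_mvs((6, -3), 2, 6, (1, 0)))  # ((3.0, -1.0), (-3.0, 2.0))
```

With exact arithmetic and $MV_D = 0$, the first backward expression equals the forward vector minus the field MV, since $((TR_B - TR_D) \cdot MV) / TR_D = (MV \cdot TR_B) / TR_D - MV$; the first example output reflects this.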
A corresponding decoder is also presented.
In another aspect of the invention, a method is presented for selecting a coding mode for a current predicted, field coded MB having top and bottom fields, in a sequence of digital video MBs. The coding mode may be a backward mode, where the reference MB is temporally after the current MB in display order, a forward mode, where the reference MB is before the current MB, or an average (e.g., bi-directional) mode, where an average of prior and subsequent reference MBs is used.
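The three modes can be sketched as three candidate predictions of the current MB, compared by a sum of absolute differences (SAD) error. This is a simplified sketch under stated assumptions: the function names are hypothetical, the best-match blocks are taken as already found, the average is rounded as (p + f + 1) // 2, and the mode with the smallest SAD is chosen with no bias terms. The decision rule of the method itself may differ.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def average_block(past, future):
    """Pixel-wise average of two blocks; the +1 rounding is an assumption."""
    return [
        [(p + f + 1) // 2 for p, f in zip(row_p, row_f)]
        for row_p, row_f in zip(past, future)
    ]


def select_mode(current, past_match, future_match):
    """Pick the mode with the smallest SAD (assumed rule, no bias terms).

    forward  : predict from the best match in the past reference
    backward : predict from the best match in the future reference
    average  : predict from the average of the two matches
    Ties go to the first mode in the order forward, backward, average.
    """
    candidates = {
        "forward": past_match,
        "backward": future_match,
        "average": average_block(past_match, future_match),
    }
    errors = {mode: sad(current, pred) for mode, pred in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


current = [[10, 20], [30, 40]]
past_match = [[8, 18], [28, 38]]
future_match = [[12, 22], [32, 42]]
print(select_mode(current, past_match, future_match)[0])  # average
```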
The method includes the step of determining a forward sum of absolute differences error, $SAD_{forward,field}$, for the current MB relative to a past reference MB, which corresponds to a forward coding mode. $SAD_{forward,field}$ indicates the error in pixel luminance values between the current MB and a best match MB in the past reference MB. A backward sum of absolute differences error, $SAD_{backward,field}$, for the current MB relative to a future reference MB, which corresponds to a backward coding mode, is also determined. $SAD_{backward,field}$ indicates the error in pixel luminance val
Chen Xuemin
Eifrig Robert O.
Luthra Ajay
Lipsitz Barry R
Tran Phuoc