Estimating rate-distortion characteristics of binary shape data

Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal

Reexamination Certificate

active

06542545

ABSTRACT:

FIELD OF THE INVENTION
This invention relates generally to estimating rate-distortion, and more particularly, to estimating the rate-distortion characteristics of binary shape data in a video sequence.
BACKGROUND OF THE INVENTION
Recently, a number of standards have been developed for communicating visual information. For digital images, the best known standard is JPEG, see Pennebaker et al., “JPEG Still Image Compression Standard,” Van Nostrand Reinhold, 1993. For video sequences, the most widely used standards include MPEG-1 (for storage and retrieval of moving pictures), MPEG-2 (for digital television) and H.263, see ISO/IEC JTC1 CD 11172, MPEG, “Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media up to about 1.5 Mbit/s—Part 2: Coding of Moving Pictures Information,” 1991, LeGall, “MPEG: A Video Compression Standard for Multimedia Applications,” Communications of the ACM, Vol. 34, No. 4, pp. 46-58, 1991, ISO/IEC DIS 13818-2, MPEG-2, “Information Technology—Generic Coding of Moving Pictures and Associated Audio Information—Part 2: Video,” 1994, ITU-T SG XV, DRAFT H.263, “Video Coding for Low Bitrate Communication,” 1996, ITU-T SG XVI, DRAFT13 H.263+Q15-A-60 rev.0, “Video Coding for Low Bitrate Communication,” 1997.
These standards are relatively low-level specifications that primarily deal with spatial compression in the case of images, and spatial and temporal compression for video sequences. As a common feature, these standards perform compression on a per frame basis. With these standards, one can achieve high compression ratios for a wide range of applications.
Newer video coding standards, such as MPEG-4 (for multimedia applications), see “Information Technology—Generic coding of audio/visual objects,” ISO/IEC FDIS 14496-2 (MPEG4 Visual), Nov. 1998, allow arbitrary-shaped objects to be encoded and decoded as separate video object planes (VOP). The objects can be visual, audio, natural, synthetic, primitive, compound or combinations thereof.
This emerging standard is intended to enable multimedia applications, such as interactive video, where natural and synthetic materials are integrated, and where access is universal. For example, one might want to “cut-and-paste” a moving figure or object from one video to another. In this type of application, it is assumed that the objects in the multimedia content have been identified through some type of segmentation process, see for example, U.S. patent application Ser. No. 09/326,750 “Method for Ordering Image Spaces to Search for Object Surfaces” filed on Jun. 4, 1999 by Lin et al.
The emergence of the MPEG-4 standard has provoked a great deal of interest in object-based encoding methodologies. One of the key requirements for object-based encoding is an efficient and flexible means for coding the shape of objects. The MPEG-4 standard has adopted a context-based arithmetic encoding (CAE) process for this purpose. For compatibility with texture coding, this process has been modified to operate at the macroblock level. A macroblock is a 16×16 group of pixels in an image or frame.
For the coding of texture, a variety of models exist. These models provide a relation between the rate and distortion that can be achieved, see for example, Chiang et al. “A new rate control scheme using quadratic rate distortion modeling,” IEEE Trans. Circuits and Systems for Video Technology, February 1997, and Hang et al. “Source model for transform video coder and its application—Part I: Fundamental theory,” IEEE Trans. Circuits and Systems for Video Technology, April 1997.
These models are most useful for rate control and have been successfully applied to frame-based video coding. Given some bit budget for a frame, one can find a quantizer value that meets a specified constraint on the rate. Additionally, such models can be used to analyze the source or sources to be encoded in an effort to optimize coding in a computationally efficient way. In the case of shape coding, however, no such models exist.
For shape, the relationship between rate and distortion is very different from that for texture. This difference is due to the techniques used to code each type of data. In the MPEG standards, texture is coded by first partitioning the data into disjoint macroblocks. The data in these macroblocks are decorrelated using the well-known Discrete Cosine Transform (DCT), which has the property of mapping the signal energy into a small number of coefficients. In this frequency domain, loss may be introduced by quantizing the DCT coefficients. In this process, some high-frequency coefficients may become zero. At this point, the 2D macroblock of quantized DCT coefficients is organized into a 1D vector using a zigzag scanning pattern. The run-lengths of these coefficients are then entropy coded using a Huffman look-up table. In this way, long zero run-lengths can be efficiently encoded. Signal variance and the quantizer value play a major role in the final energy of the DCT coefficients. Consequently, variance-like measures have been widely used as the observed data or input for rate-distortion (R-D) or rate-quantizer models.
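The quantize, zigzag-scan, and run-length steps described above can be illustrated with a minimal sketch. It assumes a small square block of already-transformed coefficients, a uniform quantizer that rounds toward zero, and hypothetical function names (`zigzag_order`, `quantize`, `run_lengths`); the DCT itself and the Huffman table are omitted, so this is not the codec defined by any MPEG standard.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd anti-diagonals are walked with the row increasing,
        # even anti-diagonals with the row decreasing.
        return (s, r if s % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def quantize(block, q):
    """Uniform quantization, rounding toward zero (an assumed rule)."""
    return [[int(v / q) for v in row] for row in block]


def run_lengths(block):
    """Scan in zigzag order and emit (zero_run, value) pairs.

    Returns the list of pairs and the count of trailing zeros.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        v = block[r][c]
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


# Illustrative coefficients only; not taken from the patent.
coeffs = [
    [52, 9, 0, 0],
    [-7, 3, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 0, 0],
]
fine = run_lengths(quantize(coeffs, 4))
coarse = run_lengths(quantize(coeffs, 16))
```

With these illustrative values, the coarser quantizer leaves fewer nonzero coefficients and a longer trailing run of zeros, which is the mechanism by which the quantizer value trades rate against distortion for texture.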
In the MPEG-4 standard, the shape data are also partitioned into disjoint macroblocks. As with texture, the macroblocks can be encoded using several modes. For simplicity, only the intra mode is described. In this mode, three different types of blocks are considered: transparent, opaque, and border blocks. Transparent and opaque blocks are signaled as a macroblock type. For the border blocks, a template of 10 pixels is used to define the causal context for predicting the shape value of a current pixel.
FIG. 1 shows an intra-context template of ten pixels (c_0, ..., c_9) 100, and a current pixel x 101. Note the specific arrangement of the ten neighborhood pixels in rows of three, five, and two pixels, and the location of the current pixel with respect to the template.
A context C for the current pixel is determined according to:
$$C = \sum_{k} c_k \cdot 2^{k}$$
Typically, the context C ranges from 0 to 1023. The context is used to index a probability table to obtain a sequence of probabilities that are used to drive an arithmetic encoder.
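As a rough illustration of the context computation, the sketch below packs ten binary template pixels into an index between 0 and 1023. The names `INTRA_TEMPLATE`, `context_from_bits`, and `pixel_context` are hypothetical. The pixel offsets follow the rows of three, five, and two pixels described for FIG. 1, but their exact ordering, and the choice to treat pixels outside the block as 0, are assumptions of this sketch.

```python
# Assumed (row_offset, col_offset) of c0..c9 relative to the current pixel x.
INTRA_TEMPLATE = [
    (0, -1), (0, -2),                               # c0, c1: same row as x
    (-1, 2), (-1, 1), (-1, 0), (-1, -1), (-1, -2),  # c2..c6: one row above
    (-2, 1), (-2, 0), (-2, -1),                     # c7..c9: two rows above
]


def context_from_bits(bits):
    """Compute C as the sum over k of c_k * 2**k for binary values c_k."""
    return sum(b << k for k, b in enumerate(bits))


def pixel_context(mask, row, col):
    """Context of the pixel at (row, col) in a binary mask (list of lists)."""
    bits = []
    for dr, dc in INTRA_TEMPLATE:
        r, c = row + dr, col + dc
        inside = 0 <= r < len(mask) and 0 <= c < len(mask[0])
        # Pixels outside the mask are treated as 0 (an assumption).
        bits.append(mask[r][c] if inside else 0)
    return context_from_bits(bits)


ones = [[1] * 5 for _ in range(5)]
center_context = pixel_context(ones, 2, 2)  # all ten neighbours are set
corner_context = pixel_context(ones, 0, 0)  # all ten neighbours fall outside
```

In an encoder, the resulting index would select an entry in the probability table that drives the arithmetic coder; that table is not reproduced here.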
When shape macroblocks are coded at full-resolution (16×16 pixels), this algorithm is able to achieve a lossless representation. To reduce the bit-rate, distortion can be introduced through successive down-sampling of the original macroblock by a factor of two, four, or more. In this case, the subsampling factor is transmitted along with the subsampled data, and at the decoder end, the data are upsampled back to the full-resolution.
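The effect of down-sampling on shape distortion can be sketched as follows. The majority-vote down-sampling rule, the pixel-replication up-sampling, and the mismatch count used as the distortion measure are all assumptions chosen for simplicity, and the function names are hypothetical; the filters actually used by a standard-compliant codec may differ.

```python
def downsample(block, factor):
    """Shrink a square binary block by `factor` per dimension.

    Each factor x factor cell becomes 1 if at least half of its pixels
    are 1 (assumed majority rule, ties go to 1).
    """
    n = len(block) // factor
    out = []
    for r in range(n):
        row = []
        for c in range(n):
            ones = sum(
                block[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            )
            row.append(1 if 2 * ones >= factor * factor else 0)
        out.append(row)
    return out


def upsample(small, factor):
    """Restore full resolution by pixel replication (assumed)."""
    size = len(small) * factor
    return [[small[r // factor][c // factor] for c in range(size)]
            for r in range(size)]


def shape_distortion(block, factor):
    """Count pixels that differ after a down-sample / up-sample round trip."""
    rebuilt = upsample(downsample(block, factor), factor)
    return sum(x != y for ra, rb in zip(block, rebuilt) for x, y in zip(ra, rb))


# Illustrative 16x16 border block with a diagonal edge.
block = [[1 if c >= r else 0 for c in range(16)] for r in range(16)]
errors = {f: shape_distortion(block, f) for f in (1, 2, 4)}
```

For this diagonal edge, the number of mismatched pixels grows with the down-sampling factor, while the number of pixels that must be context-coded shrinks from 256 to 64 to 16.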
There are two major differences between texture and shape coding. The first difference is the entropy coding process. Texture coding uses a Huffman table to assign variable length codes to quantized DCT coefficient run-lengths, while shape coding computes a context for every pixel and associates a probability that the pixel is either zero or one. The second difference is in the way that distortion is introduced. Texture coding quantizes the DCT-domain coefficients, while shape coding down-samples the data.
Because of these differences, new methods are required to estimate the rate-distortion characteristics of object shape.
SUMMARY OF THE INVENTION
The invention provides a method that estimates rate and distortion characteristics of a video object. First and second object shape features are respectively extracted at a first and second resolution of the video object. First and second rate distortion characteristics of the video object are respectively determined from the extracted first and second object shape features according to first and second modeling parameters. The extracted object shape features can be discrete, such as states of binary shape patterns of the video object, or the object shape features can be continuous such as a set of statistical moments representing a probability density function of the video object.
In one aspect of the invention the video object is segmented into macroblocks,
