Image analysis – Image compression or coding – Adaptive coding
Reexamination Certificate
1999-01-28
2002-03-19
Chen, Wenpeng (Department: 2624)
C382S236000, C375S240020
Reexamination Certificate
active
06360017
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to video encoding which utilizes motion-compensated video compression techniques, and more particularly to perceptual-based spatio-temporal segmentation for motion estimation of video signals.
BACKGROUND OF THE INVENTION
The age of digital video communications is arriving more slowly than many had anticipated. Picturephone (1970s) and videophone (1990s) were not commercial successes because they did not provide full color, full motion video and were not cost effective. Desktop video, using windows in computer monitors or TV screens, requires special purpose chips or graphics accelerators to perform the encoding operations. The chips usually come mounted on video boards that are expensive and whose installation intimidates most users. One main reason these video processors are necessary is the use of block-based motion compensation, although two-dimensional block-based transforms and lossless compression of quantized transform coefficients add to the computational burden. Motion compensation accounts for over 60% of the computational effort in most video compression algorithms. Although there are algorithms that avoid motion compensation, such as Motion-JPEG, they tend to consume some ten times more transmission bandwidth or storage space because they fail to capitalize on the interframe correlation between successive frames of video. This is especially critical in video conferencing and distance learning applications, thus rendering these algorithms uncompetitive in such applications.
Sources such as speech and video are highly correlated from sample to sample. This correlation can be used to predict each sample based on a previously reconstructed sample, and then to encode the difference between the predicted value and the current sample. A main objective of motion compensation is to reduce the redundancy between adjacent pictures. There are two kinds of redundancy well known in video compression: (i) spatial (intra-frame) redundancy; and (ii) temporal (inter-frame) redundancy. Temporal correlation usually can be reduced significantly via forward, backward, or interpolative prediction based on motion compensation. The remaining spatial correlation in the temporal prediction error images can be reduced via transform coding. In addition to spatial and temporal redundancies, perceptual redundancy has begun to be considered in video processing technology, e.g., N. S. Jayant et al., “Signal Compression Based on Models of Human Perception,” Proceedings of the IEEE, Volume 81, Number 10, October 1993.
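As a rough illustration of the predictive-coding idea just described, the following sketch compares the magnitude of a raw frame with that of its temporal prediction residual. It is a toy NumPy example written for this discussion; the frame sizes and values are arbitrary and are not taken from the patent.

```python
import numpy as np

# Toy illustration of temporal (inter-frame) redundancy: the current frame is
# predicted from the previously reconstructed frame, and only the difference
# (prediction error) would need to be coded.
rng = np.random.default_rng(0)
prev_recon = rng.integers(0, 256, (16, 16)).astype(np.int16)  # reference frame
curr = prev_recon.copy()
curr[4:8, 4:8] += 3                       # small local change between frames

residual = curr - prev_recon              # what a predictive coder transmits
print("mean |pixel| of raw frame:", np.abs(curr).mean())
print("mean |pixel| of residual: ", np.abs(residual).mean())  # far smaller
```

Because the residual carries far less energy than the raw frame, it can be quantized and entropy coded much more cheaply, which is precisely the redundancy reduction the paragraph above refers to.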
FIG. 1 illustrates a block diagram of a widely used video encoder 10 for encoding video signals for transmission, storage, and/or further processing. The encoder 10 includes a motion estimator 12 and a signal subtractor 14, both coupled to the input of the encoder. The encoder 10 also includes a transformer (e.g., a discrete cosine transform or DCT generator) 16 coupled to the signal subtractor 14, a quantizer 18 coupled to the transformer 16, and an entropy encoder 20 coupled to the quantizer 18 and the output of the encoder 10. An inverse transformer (e.g., an inverse DCT generator) 22 is also included and coupled between the quantizer 18 and the entropy encoder 20. The encoder 10 also includes a signal combiner 24 coupled to the inverse transformer 22, a delay 26 coupled to the signal combiner 24, and a motion compensator 28 coupled to the delay 26, the signal subtractor 14, the signal combiner 24, and the motion estimator 12.
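Read as a processing loop, the wiring of FIG. 1 can be summarized by the following skeleton. This is only an illustrative sketch written for this description; every function name is a placeholder rather than terminology from the patent.

```python
# Illustrative skeleton of the FIG. 1 encoder loop. Each stage is passed in as
# a stub; the names are placeholders chosen for this sketch, not an API.
def encode_frame(F_k, F_hat_prev, motion_estimate, motion_compensate,
                 transform, quantize, inverse_transform, entropy_encode):
    M_k = motion_estimate(F_k, F_hat_prev)          # motion estimator (12)
    F_tilde_k = motion_compensate(F_hat_prev, M_k)  # motion compensator (28)
    D_k = F_k - F_tilde_k                           # signal subtractor (14)
    coeffs = quantize(transform(D_k))               # transformer (16), quantizer (18)
    bitstream = entropy_encode(coeffs, M_k)         # entropy encoder (20)
    D_hat_k = inverse_transform(coeffs)             # inverse transformer (22)
    F_hat_k = F_tilde_k + D_hat_k                   # signal combiner (24)
    return bitstream, F_hat_k                       # F_hat_k is delayed (26) as the next reference
```

The remainder of this section walks through these stages in more detail.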
It is known that motion estimation and motion compensation, as described in detail in Y. Nakaya et al., “Motion Compensation Based on Spatial Transformations,” IEEE Transactions on Circuits and Systems for Video Technology, Volume 4, Number 3, Pages 339-356, June 1994, can be used to improve the inter-frame prediction by exploiting the temporal redundancy in a sequence of frames. The motion estimator 12 performs n×n (n typically equals 16) block-based matching of the k-th input frame F_k using the (k−1)-st decompressed frame F̂_{k−1} (generated by delay 26) as the reference. The matching criterion usually employed is mean absolute error (MAE), although mean square error (MSE) may alternatively be employed. For the i-th macroblock, the error measure en_i(d) for the displacement vector d between F_k and F̂_{k−1} is:

en_i(d) = \sum_{(x, y) \in B} \left\| F_k(x, y) - \hat{F}_{k-1}(x - d, y - d) \right\|   (1)
where B is the measurement block being predicted. It is evident from equation (1) that a motion vector obtained based on MSE corresponds to ‖x‖ = x², while a motion vector obtained based on MAE corresponds to ‖x‖ = |x|. MAE is usually used, rather than MSE, because MAE is free of multiplications and provides similar results in terms of predictive error. The offset between each block in F_k and the block in F̂_{k−1} that best matches it is called the motion vector for that block. That is, the motion vector mv_i for macroblock i is:

mv_i = \arg\min_{d \in S} en_i(d)   (2)
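For concreteness, the following NumPy sketch implements equations (1) and (2) for a single n×n macroblock with an exhaustive integer-pel search. It is a hypothetical illustration written for this description; the ±7 pel search range and the helper names are arbitrary choices, not part of the patent.

```python
import numpy as np

def mae(a: np.ndarray, b: np.ndarray) -> float:
    """Equation (1) with the MAE norm: sum of absolute pixel differences."""
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def motion_vector(F_k, F_hat_prev, top, left, n=16, search=7):
    """Equation (2): exhaustive integer-pel search over S = [-search, search]^2
    for the n-by-n macroblock whose top-left corner is at (top, left)."""
    block = F_k[top:top + n, left:left + n]
    best_d, best_err = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top - dy, left - dx      # reference block at (x - d, y - d), as in eq. (1)
            if r < 0 or c < 0 or r + n > F_hat_prev.shape[0] or c + n > F_hat_prev.shape[1]:
                continue                    # candidate falls outside the reference frame
            err = mae(block, F_hat_prev[r:r + n, c:c + n])
            if err < best_err:
                best_err, best_d = err, (dy, dx)
    return best_d, best_err
```

Because every displacement in S is evaluated for every macroblock, the cost of this exhaustive scan grows quickly with the search range, which is why practical encoders restrict S and use faster search strategies.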
where S is the search area. Interpolation schemes allow the motion vectors to achieve fractional-pel accuracy, as described in ITU-T Recommendation H.263, “Video Coding For Low Bit Rate Communication,” December 1995. Motion estimation is computationally demanding in that both signals, F_k and F̂_{k−1}, entering the motion estimator 12 are high rate and, thus, the operations that have to be performed on them are computationally intensive even if the search for the best-matching block is performed only hierarchically rather than exhaustively. The result of the motion estimation is the set of motion vectors M_k for the k-th frame.
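The exhaustive scan shown earlier can be replaced by a coarse-to-fine search of the kind alluded to in the passage above. The sketch below is a schematic three-step-style search constructed for this description (not an algorithm specified in the patent); it reuses the hypothetical mae helper from the previous sketch.

```python
def coarse_to_fine_mv(F_k, F_hat_prev, top, left, n=16, step=4):
    """Schematic coarse-to-fine search for the minimizer of the equation (1)
    MAE: test 9 displacements around the current center, move the center to
    the best one, halve the step, and repeat down to integer-pel spacing.
    Reuses the mae() helper defined in the exhaustive-search sketch above."""
    center, best_err = (0, 0), float("inf")
    while step >= 1:
        best_cand = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand = (center[0] + dy, center[1] + dx)
                r, c = top - cand[0], left - cand[1]   # sign convention of eq. (1)
                if r < 0 or c < 0 or r + n > F_hat_prev.shape[0] or c + n > F_hat_prev.shape[1]:
                    continue
                err = mae(F_k[top:top + n, left:left + n],
                          F_hat_prev[r:r + n, c:c + n])
                if err < best_err:
                    best_err, best_cand = err, cand
        center = best_cand
        step //= 2
    return center, best_err
```

A search like this inspects only a few dozen candidates instead of the full window, at the cost of possibly missing the global minimum of en_i(d).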
The M_k are usually losslessly compressed and then conveyed to the transmission channel for immediate or eventual access by the decoder. Also, the M_k are fed back to the motion compensator 28 in the prediction loop of the encoder. The M_k constitute a recipe for building a complete frame, herein referred to as F̃_k, by translating the blocks of F̂_{k−1}. The motion compensated frame F̃_k is subtracted pixel-wise from the current input frame F_k, in signal subtractor 14, to produce a difference frame D_k, often referred to as the displaced frame difference (DFD), as further described in T. Ebrahimi et al., “New Trends in Very Low Bit Rate Video Coding,” Proceedings of the IEEE, Volume 83, Number 6, Pages 877-891, June 1995; and W. P. Li et al., “Vector-based Signal Processing and Quantization For Image and Video Compression,” Proceedings of the IEEE, Volume 83, Number 2, Pages 317-335, February 1995. The remaining spatial correlation in D_k is reduced by the transformer 16 and the quantizer 18. The transformer may, for example, be a discrete cosine transform (DCT) generator which generates DCT coefficients for macroblocks of frames. The quantizer then quantizes these coefficients. The lossy version of D_k, denoted as D̂_k and generated by inverse transformer 22, and the motion compensated frame F̃_k are used in the compressor/feedback loop to reconstruct the reference frame F̂_k used for the next input frame F_{k+1}. Finally, the Huffman (or arithmetic) coded lossy compressed version of D_k, generated by the entropy encoder 20, is transmitted to the decoder. It is to be appreciated that FIG. 1 represents a generic coder architecture described in the current video codec (coder/decoder) standards of H.261, H.263, MPEG-1, and MPEG-2. Further details on these standards are respectively described in: M. Liou, “Overview of the P*64 Kbit/s Video Coding Standard,” Communications of the ACM, Volume 34, Number 4, Pages 59-63, April 1991; ITU-T Rec
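To make the feedback path concrete, the following is a small numeric sketch of how a lossy D̂_k is produced and combined with F̃_k to rebuild the reference F̂_k. It is an illustration written for this description, using SciPy's DCT on a single 8×8 block and an arbitrary uniform quantizer step of 16; none of these specific choices are prescribed by the patent or by the cited standards.

```python
import numpy as np
from scipy.fft import dctn, idctn

def lossy_residual(D_k: np.ndarray, q_step: float = 16.0) -> np.ndarray:
    """Transform a residual block, uniformly quantize the coefficients, then
    dequantize and inverse-transform to obtain the lossy residual D̂_k."""
    coeffs = dctn(D_k, norm="ortho")
    quantized = np.round(coeffs / q_step)       # what the entropy coder would see
    return idctn(quantized * q_step, norm="ortho")

# Toy feedback-loop step on one 8x8 block with arbitrary values.
rng = np.random.default_rng(0)
F_tilde_k = rng.integers(0, 256, (8, 8)).astype(float)  # motion compensated block
F_k = F_tilde_k + rng.normal(0, 4, (8, 8))               # current block (small DFD)
D_k = F_k - F_tilde_k                                    # displaced frame difference
D_hat_k = lossy_residual(D_k)                            # lossy version of D_k
F_hat_k = F_tilde_k + D_hat_k                            # reference for frame k + 1
print("max reconstruction error:", np.abs(F_hat_k - F_k).max())
```

The key point is that the encoder reconstructs F̂_k from the same lossy data the decoder will receive, so encoder and decoder stay synchronized despite the quantization loss.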
Chiu Yi-Jen
Hartung John
Jacquin Arnaud Eric
Safranek Robert James
Chen Wenpeng
Lucent Technologies, Inc.
Ryan, Mason & Lewis, LLP