Image analysis – Color image processing – Compression of color images
Reexamination Certificate
2001-07-20
2004-10-26
Johns, Andrew W. (Department: 2621)
C348S700000
active
06810144
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to detection of video data and, more particularly, to detection of a cartoon in a common video data stream.
BACKGROUND OF THE INVENTION
Identifying a specific kind of genre, e.g. cartoons, motion pictures, commercials, etc., in a video data signal through automated means has been a challenge through the years, dating back to the inception of digital media.
Typically, analyzing video data for the purpose of detecting their content involves examining a video signal, which may have been encoded. The encoding of video signals for storage or transmission, which in this case involves compression, and the subsequent decoding are well known. One of the video compression standards is MPEG, which stands for Moving Picture Experts Group. MPEG is a standard of ISO, the International Organization for Standardization. “MPEG video” actually consists at the present time of two finalized standards, MPEG-1 and MPEG-2, with a third standard, MPEG-4, in the process of being finalized.
MPEG video compression is used in many current and emerging products. MPEG is at the heart of digital television set-top boxes, DSS, HDTV decoders, DVD players, video conferencing, Internet video, and other applications. These applications benefit from video compression by requiring less storage space for archived video information, less bandwidth for the transmission of the video information from one point to another, or a combination of both.
While color is typically represented by three color components, red (R), green (G) and blue (B), in the video compression world it is represented by luminance and chrominance components. Research into the human visual system has shown that the eye is more sensitive to changes in luminance and less sensitive to variations in chrominance. MPEG operates on a color space that takes advantage of the eye's different sensitivity to luminance and chrominance information. Thus, MPEG uses the YCbCr color space instead of RGB to represent the data values, where Y is the luminance component, experimentally determined to be Y=0.299R+0.587G+0.114B, Cb is the blue color difference component, where Cb=B−Y, and Cr is the red color difference component, where Cr=R−Y.
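The color-space relations above can be sketched in a few lines of Python. This is only an illustration of the unscaled definitions given in the text (Y=0.299R+0.587G+0.114B, Cb=B−Y, Cr=R−Y); the function names are hypothetical, and practical codecs additionally scale and offset the color difference values, which is not shown here.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB sample to (Y, Cb, Cr) using the unscaled
    definitions from the text. Function name is illustrative only."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    cb = b - y                             # blue color difference
    cr = r - y                             # red color difference
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Invert the relations above to recover (R, G, B)."""
    b = cb + y
    r = cr + y
    # Solve Y = 0.299R + 0.587G + 0.114B for G.
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```

For a neutral gray sample such as (128, 128, 128), the luminance equals the gray level and both color differences are close to zero, which is why chrominance can be stored at reduced resolution with little visible loss.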
MPEG video is arranged into a hierarchy of layers to help with error handling, random search and editing, and synchronization, for example with an audio bit-stream. The first layer, or top layer, is known as the video sequence layer, and is any self-contained bitstream, for example a coded movie, advertisement or a cartoon.
The second layer, below the first layer, is the group of pictures (GOP), which is composed of one or more groups of intra (I) frames and/or non-intra (P and/or B) pictures, as illustrated in FIG. 1. I frames are strictly intra compressed. Their purpose is to provide random access points to the video. P frames are motion-compensated forward-predictive-coded frames. They are inter-frame compressed, and typically provide more compression than I frames. B frames are motion-compensated bidirectionally-predictive-coded frames. They are inter-frame compressed, and typically provide the most compression.
The third layer, below the second layer, is the picture layer itself. The fourth layer, beneath the third layer, is called the slice layer. Each slice is a contiguous sequence of raster-ordered macroblocks, most often on a row basis in typical video applications. The slice structure is intended to allow decoding in the presence of errors. Each slice consists of macroblocks, which are 16×16 arrays of luminance pixels, or picture data elements, with two 8×8 arrays (depending on format) of associated chrominance pixels. The macroblocks can be further divided into distinct 8×8 blocks for further processing such as transform coding, as illustrated in FIG. 2. A macroblock can be represented in several different manners when referring to the YCbCr color space. The three formats commonly used are known as 4:4:4, 4:2:2 and 4:2:0 video. 4:2:2 contains half as much chrominance information as 4:4:4, which is full bandwidth YCbCr video, and 4:2:0 contains one quarter of the chrominance information. As illustrated in FIG. 3, because of the efficient manner of luminance and chrominance representation, the 4:2:0 representation allows immediate data reduction from 12 blocks/macroblock to 6 blocks/macroblock.
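The block counts quoted above follow from simple arithmetic, sketched here in Python. The sketch assumes that a 16×16 luminance array always contributes four 8×8 blocks and that each of the two chrominance components contributes four, two or one 8×8 blocks for 4:4:4, 4:2:2 and 4:2:0 respectively; the function and table names are hypothetical.

```python
# Number of 8x8 blocks per chrominance component (Cb or Cr) in one
# macroblock, assuming full, half and quarter chrominance resolution.
CHROMA_BLOCKS_PER_COMPONENT = {"4:4:4": 4, "4:2:2": 2, "4:2:0": 1}


def blocks_per_macroblock(chroma_format):
    """Return the total count of 8x8 blocks in one macroblock."""
    luma_blocks = 4  # a 16x16 luminance array is four 8x8 blocks
    chroma_blocks = 2 * CHROMA_BLOCKS_PER_COMPONENT[chroma_format]
    return luma_blocks + chroma_blocks


for fmt in ("4:4:4", "4:2:2", "4:2:0"):
    print(fmt, blocks_per_macroblock(fmt))
```

Under these assumptions the three formats give 12, 8 and 6 blocks per macroblock, matching the reduction from 12 to 6 stated for 4:2:0.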
Because of high correlation between neighboring pixels in an image, the Discrete Cosine Transform (DCT) has been used to concentrate randomness into fewer, decorrelated parameters. The DCT decomposes the signal into underlying spatial frequencies, which then allow further processing techniques to reduce the precision of the DCT coefficients. The DCT and the Inverse DCT transform operations are defined by Equations 1 and 2 respectively:
\[
F(\mu, v) = \frac{1}{4}\, C(\mu)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x, y) \cos\left[\frac{(2x+1)\mu\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right] \qquad \text{[Equation 1]}
\]
\[
C(\mu) = \frac{1}{\sqrt{2}} \ \text{for} \ \mu = 0, \qquad C(\mu) = 1 \ \text{for} \ \mu = 1, 2, \ldots, 7
\]
\[
f(x, y) = \frac{1}{4} \sum_{\mu=0}^{7} \sum_{v=0}^{7} C(\mu)\, C(v)\, F(\mu, v) \cos\left[\frac{(2x+1)\mu\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right] \qquad \text{[Equation 2]}
\]
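Equations 1 and 2 can be evaluated directly, as in the following Python sketch. It is a straightforward, unoptimized transcription of the 8×8 DCT and inverse DCT, assuming the usual normalization C(0)=1/√2 and C(k)=1 otherwise; the code variable `u` stands for μ, and the function names are hypothetical. Real encoders use fast factorizations rather than this quadruple loop.

```python
import math

N = 8  # block size used by the transform


def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(n, k):
    """Cosine term cos[(2n+1) k pi / 16] shared by both equations."""
    return math.cos((2 * n + 1) * k * math.pi / 16)


def dct_8x8(f):
    """Forward DCT (Equation 1): spatial block f[x][y] -> F[u][v]."""
    F = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(f[x][y] * basis(x, u) * basis(y, v)
                    for x in range(N) for y in range(N))
            F[u][v] = 0.25 * c(u) * c(v) * s
    return F


def idct_8x8(F):
    """Inverse DCT (Equation 2): coefficients F[u][v] -> f[x][y]."""
    f = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = sum(c(u) * c(v) * F[u][v] * basis(x, u) * basis(y, v)
                    for u in range(N) for v in range(N))
            f[x][y] = 0.25 * s
    return f
```

For a block in which every sample equals 100, only the DC coefficient is non-zero (it equals 800 up to rounding error), and applying `idct_8x8` to the output of `dct_8x8` recovers the original samples to within floating-point precision.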
As illustrated in FIG. 2, a block is first transformed from the spatial domain into a frequency domain using the DCT, which separates the signal into independent frequency bands. The lower frequency DCT coefficients toward the upper left corner of the coefficient matrix correspond to smoother spatial contours, while the DC coefficient corresponds to a solid luminance or color value of the entire block. The higher frequency DCT coefficients toward the lower right corner of the coefficient matrix correspond to finer spatial patterns, or even noise within the image. At this point, the data is quantized. The quantization process allows the high energy, low frequency coefficients to be coded with a greater number of bits, while using fewer or zero bits for the high frequency coefficients. Retaining only a subset of the coefficients substantially reduces the total number of parameters needed for representation. The quantization process also helps the encoder to output bitstreams at a specified bitrate.
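The effect of quantization can be illustrated with a minimal Python sketch. The step sizes used here, which grow linearly with frequency, are an assumption made purely for illustration and are not the quantization matrices defined by MPEG; the function names and the `base` and `slope` parameters are likewise hypothetical.

```python
def make_step_matrix(base=8, slope=4):
    """Illustrative step sizes: coarser steps at higher frequencies.
    This is NOT an MPEG quantization matrix."""
    return [[base + slope * (u + v) for v in range(8)] for u in range(8)]


def quantize(F, steps):
    """Divide each coefficient by its step and round to an integer."""
    return [[int(round(F[u][v] / steps[u][v])) for v in range(8)]
            for u in range(8)]


def dequantize(Q, steps):
    """Approximate reconstruction of the coefficients."""
    return [[Q[u][v] * steps[u][v] for v in range(8)] for u in range(8)]


# A coefficient block with a large DC term and a small
# high-frequency term in the lower right corner.
F = [[0.0] * 8 for _ in range(8)]
F[0][0] = 800.0
F[7][7] = 20.0
steps = make_step_matrix()
Q = quantize(F, steps)
```

With these assumed steps the DC term survives as a non-zero level, while the small high-frequency term is rounded to zero, so it costs no bits to represent and is lost on reconstruction.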
The DCT coefficients are coded using a combination of two special coding schemes: run length and Huffman. Since most of the non-zero DCT coefficients will typically be concentrated in the upper left corner of the matrix, it is apparent that a zigzag scanning pattern, as illustrated in FIG. 2, will tend to maximize the probability of achieving long runs of consecutive zero coefficients.
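A zigzag scan followed by run-length grouping can be sketched in Python as follows. This is a simplified illustration: it assumes the conventional zigzag order that starts at the upper left corner and moves first to the right, it treats all 64 coefficients uniformly, and it simply drops trailing zeros. The function names are hypothetical, and the variable length code lookup is not shown.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """Group scanned coefficients into (zero_run, value) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1  # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros are dropped in this sketch


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 100, 5, -3
print(run_level_pairs(block))
```

For this example block the scan visits (0,0), (0,1), (1,0) and (2,0) in that order, so the three non-zero coefficients are described by the pairs (0, 100), (0, 5) and (1, -3), and the remaining 60 zeros need no further pairs.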
MPEG-2 provides an alternative scanning method, which may be chosen by the encoder on a frame basis, and has been shown to be effective on interlaced video images. Each non-zero coefficient is associated with a pair of pointers: first, the coefficient's position in the block, which is indicated by the number of zeroes between itself and the previous non-zero coefficient, and second, the coefficient value. Based on these two pointers, the coefficient is given a variable length code from a lookup table. This is done so that a highly probable combination gets a code with fewer bits, while the unlikely ones get longer codes. However, since spatial redundancy is limited, the I frames provide only moderate compression. The P and B frames are where MPEG derives its maximum compression efficiency. The efficiency is achieved through a technique called motion compensation based
Agnihotri Lalitha
Jasinschi Radu S.
McGee Tom
Nesvadba Jan
Alavi Amir
Johns Andrew W.
Koninklijke Philips Electronics , N.V.
Vodopia John F.