Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
2001-04-06
2004-04-13
An, Shawn S. (Department: 2613)
C375S240230, C375S240020, C375S240200, C375S240120, C375S240260, C375S240240, C382S251000, C382S250000, C382S246000, C382S238000
active
06721359
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to video coding, and in particular, to pre-quantization of motion compensated blocks for video coding at very low bit rates. The present invention provides a method and an apparatus for significantly reducing the number of computations at a video encoder.
2. Background Art
FIG. 1 illustrates the general structural blocks that are used for, and the steps involved in, the conventional digital coding of a sequence of video images. In particular, the video image is made up of a sequence of video frames 10 that are captured, such as by a digital camera, and transmitted to a video encoder 12. The video encoder 12 receives the digital data on a frame-by-frame and macroblock-by-macroblock basis, and applies a video encoding algorithm to compress the video data. In some applications, the video encoding algorithm can also be implemented in hardware. The video encoder 12 generates an output which consists of a binary bit stream 14 that is processed by a modulator 16. The modulator 16 modulates the binary bit stream 14 and provides the appropriate error protection. The modulated binary bit stream 14 is then transmitted over an appropriate transmission channel 18, such as through a wireless connection (e.g., radio frequency), a wired connection, or via the Internet. The transmission can be done in an analog format (e.g., over phone lines or via satellite) or in a digital format (e.g., via ISDN or cable). The transmitted binary bit stream 14 is then demodulated by a demodulator 20 and provided to a video decoder 22. The video decoder 22 takes the demodulated binary bit stream 24 and converts or decodes it into sequential video frames. These video frames are then provided to a display 26, such as a television screen or monitor, where they can be viewed. If the transmission channel 18 utilizes an analog format, a digital-to-analog converter is provided at the modulator 16 to convert the digital video data to analog form for transmission, and an analog-to-digital converter is provided at the demodulator 20 to convert the analog signals back into digital form for decoding and display.
The video encoding can be embodied in a variety of ways. For example, the actual scene or image can be captured by a camera and provided to a chipset for video encoding. This chipset could take the form of an add-on card that is added to a personal computer (PC). As another example, the camera can include an on-board chip that performs the video encoding. This on-board chip could take the form of an add-on card that is added to a PC, or of a separate stand-alone video phone. As yet another example, the camera could be provided on a PC and the images provided directly to the processor on the PC, which performs the video encoding.
Similarly, the video decoder 22 can be embodied in the form of a chip that is incorporated either into a PC or into a video box that is connected to a display unit, such as a monitor or television set.
Each digital video frame 10 is made up of x columns and y rows of pixels (also known as “pels”). In a typical frame 10 (see FIG. 2), there could be 720 columns and 640 rows of pels. Since each pel contains 8 bits of data (for luminance data), each frame 10 could have over three million bits of data (for luminance data). If we include chrominance data, each pel has up to 24 bits of data, so that this number is even greater. This large quantity of data is unsuitable for data storage or transmission because most applications have limited storage (i.e., memory) or limited channel bandwidth. To respond to the large quantity of data that has to be stored or transmitted, techniques have been provided for compressing the data from one frame 10 or a sequence of frames 10 to provide an output that contains a minimal amount of data. This process of compressing large amounts of data from successive video frames is called video compression, and is performed in the video encoder 12.
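The raw-data figures quoted above follow directly from the frame dimensions; a quick sketch of the arithmetic (the 720x640 geometry and bit depths are the ones stated in the text):

```python
# Back-of-the-envelope uncompressed frame sizes for the 720x640 frame
# described above: 8 bits per pel for luminance only, up to 24 bits per
# pel once chrominance data is included.

def frame_bits(columns, rows, bits_per_pel):
    """Total bits needed to store one uncompressed frame."""
    return columns * rows * bits_per_pel

luma_bits = frame_bits(720, 640, 8)    # luminance data only
full_bits = frame_bits(720, 640, 24)   # with chrominance data

print(luma_bits)  # 3686400 bits -- over three million, as stated
print(full_bits)  # 11059200 bits
```

At roughly 3.7 megabits per frame for luminance alone, even a short sequence of frames quickly exceeds the capacity of a low-bit-rate channel, which is why the compression described next is needed.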
During conventional video encoding, the video encoder 12 will take each frame 10 and divide it into blocks. In particular, each frame 10 can be first divided into macroblocks MB, as shown in FIG. 2. Each of these macroblocks MB can have, for example, 16 rows and 16 columns of pels. Each macroblock MB can be further divided into four blocks B, each block having 8 rows and 8 columns of pels. Once each frame 10 has been divided into blocks B, the video encoder 12 is ready to compress the data in the frame 10.
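A minimal sketch of this partitioning (not the patent's implementation; `split_into_blocks` is an illustrative helper) shows how a frame decomposes into 16x16 macroblocks and each macroblock into four 8x8 blocks:

```python
# Illustrative partitioning of a frame into square tiles: first into
# 16x16 macroblocks MB, then each macroblock into four 8x8 blocks B.

def split_into_blocks(frame, size):
    """Split a 2D list of pels into (row, col)-indexed square tiles."""
    rows, cols = len(frame), len(frame[0])
    tiles = {}
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            tiles[(r // size, c // size)] = [
                row[c:c + size] for row in frame[r:r + size]
            ]
    return tiles

# A tiny 32x32 "frame" of zero pels: it splits into 2x2 = 4 macroblocks,
# and each macroblock splits into 4 blocks of 8x8 pels.
frame = [[0] * 32 for _ in range(32)]
macroblocks = split_into_blocks(frame, 16)
blocks_in_first_mb = split_into_blocks(macroblocks[(0, 0)], 8)

print(len(macroblocks))         # 4 macroblocks
print(len(blocks_in_first_mb))  # 4 blocks per macroblock
```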
FIG. 3 illustrates the different steps, and the possible hardware components, that are used by the conventional video encoder 12 to carry out the video compression. Each frame 10 is provided to a motion estimation engine 30 which performs motion estimation. Since each frame 10 contains a plurality of blocks B, the following steps will process each frame 10 on a block-by-block basis.
Motion estimation calculates the displacement of one frame in a sequence with respect to the previous frame. By calculating the displacement on a block basis, a displaced frame difference can be computed which is easier to code, thereby reducing temporal redundancies. For example, since the background of a picture or image usually does not change, the entire frame does not need to be encoded, and only the moving objects within that frame (i.e., representing the differences between sequential frames) need to be encoded. Motion estimation will predict how much the moving object will move in the next frame based on certain motion vectors, and will then take the object and move it from a previously reconstructed frame to form a predicted frame. At the video decoder 22, the previously reconstructed frame, together with the motion vectors used for that frame, will reproduce the predicted frame at the video decoder 22 (also known as “motion compensation”). The predicted frame is then subtracted from the previously reconstructed frame to obtain an “error” frame. This “error” frame will contain zeros at the pels where the background did not move from the previously reconstructed frame to the predicted frame. Since the background makes up a large part of the picture or image, the “error” frame will typically contain many zeros.
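The displacement search at the heart of motion estimation can be sketched as a full search over candidate offsets. The sum-of-absolute-differences (SAD) criterion used here is one common matching cost; the text above does not specify which criterion the conventional encoder uses, and the function names are illustrative:

```python
# Hedged sketch of block motion estimation: exhaustive search in a
# reference frame for the displacement that minimizes the sum of
# absolute differences (SAD) against the current block.

def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size 2D blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def motion_search(cur_block, ref_frame, top, left, radius):
    """Return the (dy, dx) displacement and cost of the best match."""
    n = len(cur_block)
    best_mv, best_cost = None, float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + n > len(ref_frame) or c + n > len(ref_frame[0]):
                continue  # candidate block falls outside the frame
            cand = [row[c:c + n] for row in ref_frame[r:r + n]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

# Reference frame with a bright 2x2 "object" at rows 2-3, columns 2-3.
ref = [[0] * 8 for _ in range(8)]
for r in (2, 3):
    for c in (2, 3):
        ref[r][c] = 9

# In the current frame the object sits one pel to the right (column 3),
# so the best match in the reference frame is at displacement (0, -1).
cur_block = [[9, 9], [9, 9]]
best_mv, best_cost = motion_search(cur_block, ref, top=2, left=3, radius=2)
print(best_mv, best_cost)  # (0, -1) 0
```

A zero cost means the block is perfectly predicted, so its contribution to the “error” frame is all zeros, which is exactly the property that makes the error frame cheap to code.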
Each frame 10 can be either an “intraframe” (also known as an “I” frame) or an “interframe” (also known as a “P” frame). Each I frame is coded independently, while each P frame depends on previous frames. In other words, a P frame uses temporal data from previous P frames to remove temporal redundancies. An example of a temporal redundancy can be the background of an image that does not move from one frame to another, as described above. For example, the “error” frame described above would be a P frame. In addition to I and P frames, there also exists another type of frame, known as a “B” frame, which uses both previous and future frames for prediction purposes.
Now, referring back to FIG. 3, all digital frames 10 received from the motion estimation engine 30 are provided to a frame-type decision engine 40, which operates to divide all the incoming frames 10 into I frames, P frames and B frames. Whether a frame 10 becomes an I, P or B frame is determined by the amount of motion experienced by that frame 10, the degradation of distortion, the type of channel decisions, and desired user parameters, among other factors. From this point onward, all I, P and B frames are processed in the same manner.
Each block B from each frame 10 is now provided to a QP decision engine 50 which determines a QP, or quantization step size number, for the block or groups of blocks. This QP number is determined by a rate control mechanism which divides a fixed bit budget of a frame among different blocks, and is used by the quantization engine 80 to carry out quantization as described below.
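The effect of the QP number can be sketched with uniform scalar quantization. The step rule `step = 2 * QP` is an assumption borrowed from H.263-style codecs for illustration; the text above only says that QP sets the quantization step size:

```python
# Illustrative uniform scalar quantization driven by QP. The rule
# step = 2 * QP is an assumption (H.263-style), not taken from the text.

def quantize(coeffs, qp):
    """Map coefficients to integer levels; larger QP -> coarser levels."""
    step = 2 * qp
    return [int(c / step) for c in coeffs]  # truncation toward zero

def dequantize(levels, qp):
    """Approximate reconstruction of the coefficients from the levels."""
    step = 2 * qp
    return [level * step for level in levels]

coeffs = [100, -37, 12, 3, 0]
levels = quantize(coeffs, qp=8)
print(levels)  # [6, -2, 0, 0, 0]
```

Note how a coarse QP drives most small coefficients to zero; those zero levels are what the later entropy-coding stages exploit to stay within the frame's bit budget.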
Each block B is now provided to a DCT engine 60. DCT of individual blocks helps in removing the spatial redundancy by bringing down the most relevant information into the lower frequencies.
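This energy-compaction property can be seen with a small pure-Python DCT-II (the standard transform used on 8x8 blocks; this sketch is one-dimensional for brevity):

```python
# 8-point DCT-II (orthonormal scaling), illustrating how the DCT
# concentrates a smooth block's energy in the low-frequency coefficients.
import math

def dct_1d(x):
    """Orthonormal type-II DCT of a sequence of pel values."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

# A slowly varying row of pels: almost all energy lands in coefficient 0
# (the DC term), while the higher-frequency terms stay small.
row = [100, 101, 102, 103, 104, 105, 106, 107]
coeffs = dct_1d(row)
print(round(coeffs[0], 1))                 # large DC term
print([round(c, 1) for c in coeffs[1:]])   # small high-frequency terms
```

Because the high-frequency coefficients of a smooth block are near zero, the quantization step that follows can discard them cheaply, which is the spatial-redundancy removal the paragraph describes.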
Bist Anurag
Hsueh Albert A-Chuan
Wu Wei
Farjami & Farjami LLP
Skyworks Solutions Inc.
Method and apparatus for motion compensated video coding