Image analysis – Image compression or coding – Interframe coding
Reexamination Certificate
1999-03-31
2003-06-17
Mehta, Bhavesh M. (Department: 2621)
C382S278000, C348S456000, C375S240130
active
06580829
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to image processing, and, in particular, to video compression processing.
2. Description of the Related Art
The primary goal in video compression processing is to reduce the number of bits used to represent sequences of video images while still maintaining an acceptable level of image quality during playback of the resulting compressed video bitstream. Another goal in many video compression applications is to maintain a relatively uniform bit rate, for example, to satisfy transmission bandwidth and/or playback processing constraints. Video compression processing often involves the tradeoff between bit rate and playback quality. This tradeoff typically involves reducing the average number of bits used to encode images in the original video sequence by selectively decreasing the playback quality of each image that is encoded into the compressed video bitstream.
Many video compression systems, such as those based on an MPEG (Moving Picture Experts Group) standard, gain much of their compression capability by making predictions from other, previously coded pictures. Although the term “frame” is used throughout this specification, those skilled in the art will understand that the teachings of this specification apply generally to video pictures, a term that covers both video frames and video fields.
MPEG coders have three main types of frames: I, P, and B. An I frame is coded independently without reference to any other frames. A P frame is coded as the motion-compensated difference between itself and a reference frame derived from the previously coded P or I frame. I and P frames are referred to as anchor frames, because they can be used to generate reference frames for coding other frames. Each macroblock in a B frame is coded as the difference between itself and either (1) the previous anchor frame (i.e., forward prediction), (2) the next anchor frame (i.e., backward prediction), or (3) the average of the previous and next anchor frames (i.e., interpolated or bidirectional prediction). B frames are non-anchor frames that are never used to predict other frames. Thus, errors in B frames do not propagate to other frames and are one picture in duration. Note that the human visual system objects less to errors of very short duration.
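The three B-frame prediction modes described above can be illustrated with a minimal sketch. It treats a macroblock as a flat list of pixel values, omits motion compensation entirely, and picks the mode with the smallest sum of absolute differences; that selection rule and the function names (`residual`, `sad`, `predict_b_block`) are assumptions made for illustration, not something the specification prescribes.

```python
def residual(block, prediction):
    """Pixel-wise difference between a block and its prediction."""
    return [b - p for b, p in zip(block, prediction)]


def sad(values):
    """Sum of absolute values, used here as a simple cost (an assumption)."""
    return sum(abs(v) for v in values)


def predict_b_block(block, prev_anchor, next_anchor):
    """Return (mode, residual) for the cheapest of the three B-frame modes.

    prev_anchor and next_anchor are the co-located blocks of the previous
    and next anchor frames; no motion search is performed in this sketch.
    """
    candidates = {
        "forward": prev_anchor,                       # previous anchor only
        "backward": next_anchor,                      # next anchor only
        "interpolated": [(p + n) / 2                  # average of both anchors
                         for p, n in zip(prev_anchor, next_anchor)],
    }
    mode = min(candidates, key=lambda m: sad(residual(block, candidates[m])))
    return mode, residual(block, candidates[mode])


# A block lying halfway between the two anchors is best coded as interpolated.
mode, res = predict_b_block([10, 20, 30, 40], [8, 18, 28, 38], [12, 22, 32, 42])
print(mode, res)
```

A real coder would make this choice per macroblock after motion estimation and would typically weigh the bit cost of the motion vectors as well.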
Although the MPEG standards place no restrictions on a particular sequence of frame types, many coders simply use a repeating pattern of I, P, and B frames. Since B frames can be predicted from not only a previous frame but also a future frame, B frames must be sent to the decoder after the anchor frames that surround them. To make this “out-of-order” decoding efficient, the frames are encoded into the corresponding compressed video bitstream out of temporal order.
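The reordering rule stated above (each B frame follows both of its surrounding anchor frames in the bitstream) can be sketched as a small function. The name `coded_order` and the handling of trailing B frames at the end of the input are assumptions for illustration only.

```python
def coded_order(frame_types):
    """Map a display-order string of frame types to coding order.

    frame_types is a string such as "IBBPBBP". B frames are held back
    until the next anchor (I or P) frame has been emitted.
    Returns a list of (display_index, type) pairs in coding order.
    """
    order = []
    pending_b = []
    for index, ftype in enumerate(frame_types):
        if ftype == "B":
            pending_b.append((index, ftype))   # wait for the next anchor
        else:
            order.append((index, ftype))       # anchor goes out first
            order.extend(pending_b)            # then the B frames it closes
            pending_b = []
    # Assumption: B frames with no later anchor in this input go out last.
    order.extend(pending_b)
    return order


print(" ".join(f"{t}{i}" for i, t in coded_order("IBBPBBP")))
```

For the input `"IBBPBBP"` this yields the order I0, P3, B1, B2, P6, B4, B5.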
FIG. 1 shows a block diagram of a conventional video compression system 100 for reordering and encoding a stream of video frames into a compressed video bitstream. System 100 implements a video coding scheme that is based on a repeating frame pattern having two B frames between each pair of consecutive anchor frames (e.g., IBBPBBPBBPBBPBB for a 15-frame GOP (group of pictures)). Table I in FIG. 2 shows the relationship between the temporal order of frames (as they appear in the input video stream) and the order in which those frames are coded into a compressed video bitstream by system 100. Table I also shows the tap position of switch 104 used to reorder the video frames in order to generate the bitstream.
Frames are presented at the video input of system 100 in temporal order starting with Frame 0, then Frame 1, etc. As each new frame is presented at the video input, the frame stored in frame-delay buffer 102c is made available at tap T0 and the new frame is made available at tap T3. Depending on the position selected for two-position switch 104, encoder 106 codes either the frame at tap T0 or the frame at tap T3. As encoder 106 codes the selected frame, the frame stored in frame-delay buffer 102b is moved into frame-delay buffer 102c, the frame stored in frame-delay buffer 102a is moved into frame-delay buffer 102b, and the new frame is stored into frame-delay buffer 102a.
At the beginning of a video stream, when Frame 0 is presented at the video input and therefore at tap T3, switch 104 is positioned at tap T3 to enable encoder 106 to encode Frame 0 as an I frame (i.e., I0 in Table I). Processing of encoder 106 is then temporarily suspended until all the frame-delay buffers 102 are filled, such that Frame 0 is stored in buffer 102c and presented at tap T0, Frame 1 is stored in buffer 102b, Frame 2 is stored in buffer 102a, and Frame 3 is presented at the video input and at tap T3. At this time, switch 104 is again positioned at tap T3 so that Frame 3 can be coded as a P frame (i.e., P3 in Table I).
In the next processing cycle, Frame 1 is stored in buffer 102c and presented at tap T0, Frame 2 is stored in buffer 102b, Frame 3 is stored in buffer 102a, and Frame 4 is presented at the video input and at tap T3. At this time, switch 104 is positioned at tap T0 so that Frame 1 can be coded as a B frame (i.e., B1 in Table I).
In the next processing cycle, Frame 2 is stored in buffer 102c and presented at tap T0, Frame 3 is stored in buffer 102b, Frame 4 is stored in buffer 102a, and Frame 5 is presented at the video input and at tap T3. At this time, switch 104 is again positioned at tap T0 so that Frame 2 can be coded as a B frame (i.e., B2 in Table I).
In the next processing cycle, Frame 3 is stored in buffer 102c and presented at tap T0, Frame 4 is stored in buffer 102b, Frame 5 is stored in buffer 102a, and Frame 6 is presented at the video input and at tap T3. At this time, switch 104 is repositioned at tap T3 so that Frame 6 can be coded as a P frame (i.e., P6 in Table I).
This processing is continued for each frame in each 15-frame GOP in the video stream with switch 104 positioned at tap T0 to code a B frame and at tap T3 to code an anchor (I or P) frame according to the GOP pattern (IBBPBBPBBPBBPBB), as indicated in Table I.
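The processing cycles described above for system 100 can be traced with a small simulation. This is only a sketch under stated assumptions: frames are represented by their frame numbers, the three frame-delay buffers 102a, 102b, and 102c are modeled as a fixed-length deque, and the switch rule (tap T3 whenever the incoming frame is scheduled as an anchor, otherwise tap T0 once the buffers are full) is inferred from the walk-through rather than taken from Table I, which is not reproduced here. End-of-stream flushing is not modeled.

```python
from collections import deque

GOP_PATTERN = "IBBPBBPBBPBBPBB"  # 15-frame GOP from the description


def simulate(num_frames, pattern=GOP_PATTERN):
    """Return a list of (tap, coded_frame_label), one per coding cycle.

    Cycles in which the encoder is suspended (buffers still filling)
    produce no entry.
    """
    # Index 0 models buffer 102a, index 1 models 102b, index 2 models 102c.
    buffers = deque([None, None, None], maxlen=3)
    log = []
    for frame in range(num_frames):
        tap_t3 = frame          # the new frame at the video input
        tap_t0 = buffers[2]     # the frame held in buffer 102c
        ftype = pattern[frame % len(pattern)]
        if ftype in ("I", "P"):
            # Incoming frame is an anchor: switch at T3, code it now.
            log.append(("T3", f"{ftype}{tap_t3}"))
        elif tap_t0 is not None:
            # Otherwise code the delayed frame at T0, which is a B frame.
            delayed_type = pattern[tap_t0 % len(pattern)]
            log.append(("T0", f"{delayed_type}{tap_t0}"))
        # else: encoder suspended while the buffers fill.
        # Shift: 102b -> 102c, 102a -> 102b, new frame -> 102a.
        buffers.appendleft(frame)
    return log


for tap, label in simulate(10):
    print(tap, label)
```

Running `simulate(10)` reproduces the sequence given in the text: I0 and P3 from tap T3, B1 and B2 from tap T0, P6 from tap T3, then B4, B5, and P9.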
Some video streams contain flash frames. For purposes of this specification, a sequence of flash frames is defined as a set of one or more consecutive frames that are relatively poorly correlated to both the frame immediately preceding the flash sequence and the frame immediately following the flash sequence, where the frames immediately before and after the flash sequence are themselves relatively well-correlated to each other. A common example of a flash sequence is the phenomenon produced by still picture photographers at events, such as basketball games. A photographer's flash usually produces, in a video stream, a single frame that is mostly white, or at least with an intensity much higher than the frames both before and after. Such a flash frame (i.e., a one-frame flash sequence) will be poorly correlated to the temporally surrounding frames.
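The definition of a flash sequence given above can be expressed as a simple predicate. The following is a sketch of the definition only, not of any detection method claimed in this specification. It assumes frames are flat lists of pixel intensities, uses a plain mean absolute difference without motion compensation as the measure of (lack of) correlation, and relies on two hypothetical thresholds, `low` and `high`.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference; larger means less correlated."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def is_flash_sequence(frames, start, length, low=10.0, high=50.0):
    """Check frames[start:start + length] against the flash definition.

    low and high are illustrative thresholds (assumptions): a difference
    at or below `low` counts as well correlated, at or above `high`
    as poorly correlated.
    """
    if length < 1 or start < 1 or start + length >= len(frames):
        return False  # a frame is needed on each side of the sequence
    before = frames[start - 1]
    after = frames[start + length]
    # The surrounding frames must be well correlated with each other.
    if mean_abs_diff(before, after) > low:
        return False
    # Every frame in the sequence must be poorly correlated with both.
    return all(
        mean_abs_diff(f, before) >= high and mean_abs_diff(f, after) >= high
        for f in frames[start:start + length]
    )


# Toy stream: frame 2 is a bright single-frame flash.
stream = [[100] * 4, [102] * 4, [250] * 4, [101] * 4, [103] * 4]
print(is_flash_sequence(stream, 2, 1))
```

In the toy stream the frames on either side of frame 2 differ by 1 on average, while frame 2 differs from each by well over the `high` threshold, so the predicate holds.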
Some encoders are able to detect “scene cuts” by looking for a pair of consecutive frames that are highly uncorrelated to one another, where the degree of correlation may be characterized using a distortion measure, such as the mean absolute difference (MAD) of the motion-compensated interframe pixel differences. In response, such encoders may insert an I frame at the next scheduled anchor frame time (i.e., potentially replacing a regularly scheduled P frame with an I frame). Such encoders will mistakenly identify a flash sequence as a scene cut, based on the large distortion between the first frame in the flash sequence and its immediately preceding frame. Such a scene cut will be detected for individual, isolated flash frames as well as multi-frame flash sequences.
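The misidentification described above can be seen in a minimal sketch of a consecutive-frame scene-cut detector. The sketch assumes frames are flat lists of pixel intensities, computes the mean absolute difference directly on co-located pixels (no motion compensation), and uses a hypothetical threshold; the names `mad` and `detect_scene_cuts` are illustrative.

```python
def mad(frame_a, frame_b):
    """Mean absolute difference of co-located pixels (no motion compensation)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def detect_scene_cuts(frames, threshold=50.0):
    """Return indices i where frame i differs strongly from frame i - 1.

    threshold is an illustrative value, not one taken from the text.
    """
    return [
        i for i in range(1, len(frames))
        if mad(frames[i - 1], frames[i]) > threshold
    ]


# A genuine cut: the content changes at frame 2 and stays changed.
real_cut = [[100] * 4, [101] * 4, [200] * 4, [201] * 4]
# A single flash frame at index 2: the content returns at frame 3.
flash = [[100] * 4, [102] * 4, [250] * 4, [101] * 4, [103] * 4]

print(detect_scene_cuts(real_cut))
print(detect_scene_cuts(flash))
```

The detector reports a cut at frame 2 in both streams, and for the flash stream it also reports frame 3, because it compares only consecutive frames and cannot tell that the content after the flash matches the content before it.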
Assuming that the events that cause single flash frames (e.g., photographers' flashes) occur randomly with respect to the timing of the repeating GOP pattern, on average, a f
Hurst, Jr. Robert Norman
Lee Jung-woo
Burke William J.
Chawan Sheela
Sarnoff Corporation