Classification: Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Type: Reexamination Certificate
Filed: 2000-02-03
Issued: 2003-12-30
Examiner: Kelley, Chris (Department: 2613)
US Classes: C375S240140, C375S240210, C375S240270
Status: active
Patent Number: 06671319
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates generally to methods and apparatus for motion estimation for video image processing and, in particular, to improved methods and apparatus for determining motion vectors between video image pictures with a hierarchical motion estimation technique using block-matching and integral projection data.
Advancements in digital technology have produced a number of digital video applications. Digital video is currently used in digital and high definition TV, camcorders, videoconferencing, computer imaging, and high-quality video tape recorders. Uncompressed digital video signals constitute a huge amount of data and therefore require a large amount of bandwidth and memory to store and transmit. Many digital video systems, therefore, reduce the amount of digital video data by employing data compression techniques that are optimized for particular applications. Digital compression devices are commonly referred to as “encoders”; devices that perform decompression are referred to as “decoders”. Devices that perform both encoding and decoding are referred to as “codecs”.
In the interest of standardizing methods for motion picture video compression, the Moving Picture Experts Group (MPEG) issued a number of standards. MPEG-1 is a compression algorithm intended for video devices having intermediate data rates. MPEG-2 is a compression algorithm for devices using higher data rates, such as digital high-definition TV (HDTV), direct broadcast satellite systems (DBSS), cable TV (CATV), and serial storage media such as digital video tape recorders (VTR). The Digital Video (DV) format is another format used widely in consumer video products, such as digital camcorders. The DV format is further explained in the SD Specifications of Consumer-Use Digital VCRs dated December 1994.
A video sequence is composed of a series of still pictures taken at closely spaced intervals in time that are sequentially displayed to provide the illusion of continuous motion. Each picture may be described as a two-dimensional array of samples, or “pixels”. Each pixel describes a specific location in the picture in terms of brightness and hue. Each horizontal line of pixels in the two-dimensional picture is called a raster line. A picture may be composed of a single frame or of two fields.
When sampling or displaying a frame of video, the video frame may be “interlaced” or “progressive.” Progressive video consists of frames in which the raster lines are sequential in time, as shown in FIG. 1A. The MPEG-1 standard allows only progressive frames. Alternatively, each frame may be divided into two interlaced fields, as shown in FIG. 1B. Each field has half the lines of the full frame, and the fields are interleaved such that alternate lines in the frame belong to alternate fields. In an interlaced frame composed of two fields, one field is referred to as the “top” field, while the other is called the “bottom” field. The MPEG-2 standard allows both progressive and interlaced video.
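As an illustration of that interleaving (this sketch and its even/odd-line convention are assumptions for the example, not part of the original text), the two fields of a frame can be separated as follows:

```c
#include <string.h>

/* Separate an interlaced frame into its two fields. For this sketch the
 * "top" field is assumed to hold the even-numbered raster lines
 * (0, 2, 4, ...) and the "bottom" field the odd-numbered lines. */
void split_fields(const unsigned char *frame, int width, int height,
                  unsigned char *top_field, unsigned char *bottom_field)
{
    for (int line = 0; line < height; line++) {
        const unsigned char *src = frame + (size_t)line * width;
        unsigned char *dst = (line % 2 == 0) ? top_field : bottom_field;
        memcpy(dst + (size_t)(line / 2) * width, src, (size_t)width);
    }
}
```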
One of the ways MPEG applications achieve data compression is to take advantage of the redundancy between neighboring pictures of a video sequence. Since neighboring pictures tend to contain similar information, describing the difference between neighboring pictures typically requires less data than describing the new picture. If there is no motion between neighboring pictures, for example, coding the difference (zero) requires less data than recoding the entire new picture.
An MPEG video sequence is composed of one or more groups of pictures, each of which contains one or more pictures of type I, P, or B. Intra-coded pictures, or “I-pictures,” are coded independently without reference to any other pictures. Predictive-coded pictures, or “P-pictures,” use information from preceding reference pictures, while bidirectionally predictive-coded pictures, or “B-pictures,” may use information from preceding pictures, upcoming pictures, both, or neither.
Motion estimation is the process of estimating the displacement of a portion of an image between neighboring pictures. For example, a moving soccer ball will appear in different locations in adjacent pictures. Displacement is described by the motion vectors that give the best match between a specified region, e.g., the ball, in the current picture and the corresponding displaced region in a preceding or upcoming reference picture. The difference between the specified region in the current picture and the corresponding displaced region in the reference picture is referred to as the “residue”.
In general, two known types of motion estimation methods used to estimate the motion vectors are pixel-recursive algorithms and block-matching algorithms. Pixel-recursive techniques predict the displacement of each pixel iteratively from corresponding pixels in neighboring frames. Block-matching algorithms, on the other hand, estimate the displacement between frames on a block-by-block basis and choose vectors that minimize the difference.
In conventional block-matching processes, the current image to be encoded is divided into equal-sized blocks of pixel information. In the MPEG-1 and MPEG-2 video compression standards, for example, the pixels are grouped into “macroblocks,” each consisting of a 16×16 array of luminance samples together with one 8×8 block of samples for each of the two chrominance components. The 16×16 array of luminance samples further comprises four 8×8 blocks that are typically used as input blocks to the compression models.
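A compact way to picture that grouping (a sketch only; the type names and the 4:2:0 sampling arrangement are assumptions for illustration, not taken from the patent text):

```c
/* One 8x8 block of samples, the basic unit fed to the compression model. */
typedef struct {
    unsigned char sample[8][8];
} Block8x8;

/* A macroblock as described above: a 16x16 luminance area stored as four
 * 8x8 blocks, plus one 8x8 block for each of the two chrominance
 * components (4:2:0 sampling assumed). */
typedef struct {
    Block8x8 luma[4];   /* Y: top-left, top-right, bottom-left, bottom-right */
    Block8x8 cb;        /* chrominance, blue-difference */
    Block8x8 cr;        /* chrominance, red-difference */
} Macroblock;
```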
FIG. 2 illustrates one iteration of a conventional block-matching process. Current picture 220 is shown divided into blocks. Each block can be any size; however, in an MPEG device, for example, current picture 220 would typically be divided into 16×16-sized macroblocks. To code current picture 220, each block in current picture 220 is coded in terms of its difference from a block in a previous picture 210 or upcoming picture 230. In each iteration of a block-matching process, current block 200 is compared with similar-sized “candidate” blocks within search range 215 of preceding picture 210 or search range 235 of upcoming picture 230. The candidate block of the preceding or upcoming picture that is determined to have the smallest difference with respect to current block 200 is selected as the reference block, shown in FIG. 2 as reference block 250. The motion vectors and residues between reference block 250 and current block 200 are computed and coded. Current picture 220 can be restored during decompression using the coding for each block of reference picture 210 as well as the motion vectors and residues for each block of current picture 220. The motion vectors associated with the preceding reference picture are called forward motion vectors, whereas those associated with the upcoming reference picture are called backward motion vectors.
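A minimal sketch of the search just described, assuming a sum-of-absolute-differences criterion, full-pixel accuracy, and row-major 8-bit luminance pictures (the function names and these specifics are illustrative assumptions, not taken from the patent):

```c
#include <limits.h>
#include <stdlib.h>

#define BLOCK 16   /* macroblock size assumed for this sketch */

/* Sum of absolute differences between the 16x16 current block at (cx, cy)
 * and a candidate block of the reference picture at (rx, ry). */
static long block_sad(const unsigned char *cur, const unsigned char *ref,
                      int stride, int cx, int cy, int rx, int ry)
{
    long sad = 0;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            sad += labs((long)cur[(cy + y) * stride + cx + x] -
                        (long)ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* One iteration of block matching: scan every candidate position within
 * +/-range pixels of (cx, cy) in the reference picture and return the
 * displacement of the best (lowest-difference) candidate. */
void match_block(const unsigned char *cur, const unsigned char *ref,
                 int width, int height, int cx, int cy, int range,
                 int *best_dx, int *best_dy)
{
    long best = LONG_MAX;
    *best_dx = 0;
    *best_dy = 0;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + BLOCK > width || ry + BLOCK > height)
                continue;  /* candidate falls outside the reference picture */
            long sad = block_sad(cur, ref, width, cx, cy, rx, ry);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
}
```

In such a sketch, the residue would be the element-wise difference between the current block and the block at the returned displacement, and the (dx, dy) pair plays the role of a forward or backward motion vector depending on whether the reference is the preceding or the upcoming picture.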
The difference between blocks may be calculated using any one of several known criteria; however, most methods generally minimize error or maximize correlation. Because most correlation techniques are computationally intensive, error-calculating methods are more commonly used. Examples of error-calculating measures include mean square error (MSE), mean absolute distortion (MAD), and sum of absolute distortions (SAD). These criteria are described in Joan L. Mitchell et al., MPEG Video Compression Standard, International Thomson Publishing (1997), pp. 284-86.
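Written out for an N×N block with current-picture samples c(i, j) and displaced reference samples r(i, j) (notation chosen here for convenience, not quoted from the cited text), the three measures take their standard forms:

$$\mathrm{MSE} = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\bigl(c(i,j)-r(i,j)\bigr)^2$$

$$\mathrm{MAD} = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\bigl|c(i,j)-r(i,j)\bigr|$$

$$\mathrm{SAD} = \sum_{i=1}^{N}\sum_{j=1}^{N}\bigl|c(i,j)-r(i,j)\bigr|$$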
A block-matching algorithm that compares the current block to every candidate block within the search range is called a “full search”. Larger search areas generally produce a more accurate displacement vector; however, the computational complexity of a full search is proportional to the size of the search area and is too slow for some applications. A full-search block-matching algorithm applied to a macroblock of size 16×16 pixels over a search range of ±N pixels with one-pixel accuracy, for example, requires (2N+1)² block comparisons, each involving 16×16, or 256, pixel calculations.
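For instance, with a search range of ±15 pixels (a value chosen here purely for illustration), there are (2·15+1)² = 961 candidate positions, and evaluating a 16×16 block difference at each one costs 961 × 256 = 246,016 per-pixel calculations for a single macroblock, before any of the other coding steps are performed.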
Inventors: Chang, Ching-Fang; Yanagihara, Naofumi
Attorney/Agent: Finnegan, Henderson, Farabow, Garrett & Dunner, L.L.P.
Examiners: Kelley, Chris; Parsons, Charles
Assignee: Sony Corporation