Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
2000-12-28
2003-09-23
Le, Vu (Department: 2613)
Pulse or digital communications
Bandwidth reduction or expansion
Television or motion video signal
C375S240090
Reexamination Certificate
active
06625213
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to the field of video encoders and, more particularly, to an encoding method based on an adaptation of the so-called matching pursuit method (MP) and provided to be implemented in a signal-to-noise ratio (SNR) scalable video encoder.
BACKGROUND OF THE INVENTION
In the multimedia domain, new interactive applications such as Internet video streaming, video database browsing or multi-quality video services are becoming widespread. This leads to an expansion of video services over heterogeneous networks (the Internet, mobile nets and In Home Digital Networks), which has raised new issues in terms of varying transport conditions (bandwidth, error rate . . . ) as well as varying consumer demands and terminal decoding capabilities (CPU, display size, application . . . ). Consequently, there is a need for new flexible and intrinsically robust video compression systems which can handle these new constraints, especially at very low bit rates. The notion of scalability is the expected functionality to address these issues. For instance, only one part of an embedded bitstream may be sent to a particular user so as to avoid network congestion, or the entire bitstream can be partially decoded according to the targeted level of quality. Current standards such as MPEG-4 are working toward video coding schemes that provide this functionality.
MPEG-4 lies on a classical block-based predictive hybrid scheme in which the prediction error coding after motion compensation is realized by means of a Discrete Cosine Transform (DCT). However, DCT appears to suffer from very annoying visual artifacts at low bit rates (especially block edge artifacts), because the number of coded coefficients becomes very low and the quantization is coarse. An alternative to DCT, better suited to low bit rate and tested in MEPG-4, is based on the so-called Matching Pursuit (MP) algorithm, described for instance in “Matching pursuits with time-frequency dictionaries”, by S. Mallat and Z. Zhang, IEEE Transactions on Signal Processing, vol.41, n°12, December 1993, pp.3397-3415.
As described in said document, MP is a time-frequency transform that expands the motion residual signal in an iterative manner, using an overcomplete dictionary of bidimensional (2D) functions called atoms. This algorithm iteratively calculates the atom minimizing the mean-squared error of the signal that has to be encoded. The multistage algorithm proceeds according to the following steps:
the highest-energy image block on a rectangular search window is computed; around this block, the projection that maximizes the inner product between the basis functions and the residual image at the current iteration is calculated: a high inner product value indicates that the corresponding atom fits the signal structure;
this weighted atom is then subtracted from the current error image to produce the new residual image for the next stage.
The method complexity comes from the fact that the algorithm searches the best atom among all atoms in the dictionary at each iteration. Table 1 shows the atom dictionary that is traditionally used in the classical block-based Matching Pursuit implementation (the parameters S
i
, F
i
and P
i
respectively designating the scale, the frequency and the phase of the atom):
i
S
i
F
i
P
i
1
1.0
0.0
0
2
3.0
0.0
0
3
5.0
0.0
0
4
7.0
0.0
0
5
9.0
0.0
0
6
12.0
0.0
0
7
14.0
0.0
0
8
17.0
0.0
0
9
20.0
0.0
0
10
1.4
1.0
&pgr;/2
11
5.0
1.0
&pgr;/2
12
12.0
1.0
&pgr;/2
13
16.0
1.0
&pgr;/2
14
20.0
1.0
&pgr;/2
15
4.0
2.0
0
16
4.0
3.0
0
17
8.0
3.0
0
18
4.0
4.0
0
19
4.0
2.0
&pgr;/4
20
4.0
4.0
&pgr;/4
The redundancy of the dictionary allows a larger degree of freedom in the choice of the signal decomposition. It is all the more interesting in the case of low bit rate coding, since MP only employs the required functions to represent accurately the signal, thus resulting in a better compression efficiency. Moreover, MP is iterative, and the residual energy therefore decreases at each iteration. The procedure is iterated until this residual energy falls down below a given threshold or the available amount of bits is exhausted. The representation is adaptive in the sense that the most significant coefficients are first extracted. Therefore, MP encodes the residual error in decreasing order of importance, which is attractive for quality (SNR) scalability. Additional coefficients corresponding to finer details in the signal structure may be generated during the next step of the algorithm.
The dictionary is generally composed of bidimensional (2D) separable Gabor functions represented by parameters governing the scale of a modulated Gaussian function, and the frequency and phase of a modulating function. The choice of the dictionary is important since it is strongly related to the convergence speed of the decomposition and thus to the efficiency of the algorithm. Gabor functions are chosen for their smooth decay at edges, which limits visual artifacts. Moreover, atoms may be positioned at any pixel location of the entire error image in order to match the signal structures. Atom overlapping is enabled, which increases the expansion smoothness.
Although the superiority of this MP method over the DCT has been clearly shown by several studies, the DCT remains the only texture coding method used in MPEG-4 (the reason is that the DCT is a very well mature method, for which many software and hardware optimizations exist). Moreover, MP is an iterative algorithm, for which no parallelism techniques exist and which is complex at the encoder side. However, MP seems well suited to texture description inside arbitrarily shaped objects and provides a progressive representation of information in an embedded bitstream, thus allowing scalability.
As current video standards have not been originally designed to provide scalability (they have been upgraded so as to include this new functionality while preserving their existing architecture), this scalability is generally achieved through a layer structure that delivers two or more separated bitstreams. This approach suffers from a lack of flexibility and a low coding efficiency of the enhancement layer. Moreover, the basic types of scalability (spatial, temporal, SNR) cannot be reached at the same time. As a result, ITU and MPEG committee experts are looking, in the domain of very low bit rate, for fully scalable video coding schemes that allow a greater interactivity. A promising approach deals with the use of a triangular mesh-based representation (i.e. a subdivision of the image domain into elementary patches, most currently triangular because this configuration is very flexible) for object-based compression and composition. Such an operation aims at defining homogeneous regions with a limited number of elements, according to the targeted accuracy of the scene description, and can even be hierarchical. The hierarchy is then initialized to an arbitrary coarse mesh that is successively refined according to a specific criterion, thus allowing a description whose accuracy increases from level to level.
It has then be proposed to exploit the main advantages provided by meshes: spatial adaptability for efficient object contour encoding, deformation capacity for object tracking, compact and robust motion estimation even at low bit rates without the visual artifacts generally encountered with other representations. Triangular meshes are attractive for functionalities such as texture animation and deformation, content-based retrieval of video-objects, augmented reality, transfiguration, etc. Moreover, their extension to hierarchical meshes is a possible response to scalability. However, only few known works are related to the subject of texture coding inside meshes, these existing studies being based on adaptations of the DCT, originally designed only for rectangular decompositions.
SUMMARY OF THE INVENTION
It is therefore an object of the invention to propose a video encoding method that uses and adapts the MP algorithm to a
Benetiere Marion
Bottreau Vincent
Pesquet-Popescu Béatrice
Gross Russell
Le Vu
LandOfFree
Video encoding method based on the matching pursuit algorithm does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Video encoding method based on the matching pursuit algorithm, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Video encoding method based on the matching pursuit algorithm will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3039335