Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal
Reexamination Certificate
2000-07-06
2003-09-30
Kelley, Chris (Department: 2613)
C375S240260
active
06628710
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to a method for automatically extracting the structure of a video sequence that corresponds to successive frames, comprising the following steps:
(1) a shot detection step, provided for detecting the boundaries between consecutive shots—a shot being a set of contiguous frames without editing effects—and using a similarity criterion based on a computation of the mean displaced frame difference curve and the detection of the highest peaks of said curve;
(2) a partitioning step, provided for splitting each shot into sub-entities, called micro-segments;
(3) a clustering step, provided for creating a final hierarchical structure of the processed video sequence.
The invention also relates to a corresponding method for indexing data, to a device for carrying out said method, and to an image retrieval system in which said method is implemented. The technique of the invention is particularly well suited to applications related to the MPEG-7 standard.
BACKGROUND OF THE INVENTION
The future MPEG-7 standard is intended to specify a standard set of descriptors that can be used to describe various types of multimedia information. The description thus associated with a given content allows fast and efficient searching for material of interest to a user. The invention relates more specifically to the representation of video sequences, intended to provide users with ways of searching for information. For a video sequence, the goal of a table-of-contents description is to define the structure of the sequence in a hierarchical fashion, similarly to what is done for books, in which texts are divided into chapters and paragraphs: the original sequence is subdivided into sub-sequences, which may be further divided into shorter sub-entities.
A method for defining such a structure is described in a European patent application previously filed by the Applicant under the number 99402594.8 (PHF99593). According to said document, the method is divided into three steps, which are, as shown in FIG. 1: a shot detection step 11 (in a sequence of pictures, a video shot is a particular sequence which shows a single background, i.e. a set of contiguous frames without editing effects), a partitioning step 12, for the segmentation of the detected shots into entities exhibiting consistent camera motion characteristics, and a shot clustering step 13.
Concerning the shot detection step, several solutions have already been proposed in the document “A survey on the automatic indexing of video data”, R. Brunelli et al., Journal of Visual Communication and Image Representation, volume 10, number 2, June 1999, pp. 78-112. In the method described in the cited document, the first step 11 detects the transitions between consecutive shots by means of two main sub-steps: a computation sub-step 111, which determines a mean Displaced Frame Difference (mDFD) curve, and a segmentation sub-step 112.
The mDFD curve computed during the sub-step 111 is obtained taking into account both luminance and chrominance information. With, for a frame at time t, the following definitions:
luminance
$$Y = \{f_k(i, j, t)\}_{k=Y} \qquad (1)$$
chrominance components
$$(U, V) = \{f_k(i, j, t)\}_{k=U,V} \qquad (2)$$
the DFD is given by
$$DFD_k(i, j;\, t-1, t+1) = f_k(i, j, t+1) - f_k(i - d_x(i, j),\; j - d_y(i, j),\; t-1) \qquad (3)$$
and the mDFD by:
$$mDFD(t) = \frac{1}{I_x I_y} \sum_{k}^{Y,U,V} w_k \sum_{i,j}^{I_x I_y} \left| DFD_k(i, j;\, t-1, t+1) \right| \qquad (4)$$
where $I_x$, $I_y$ are the image dimensions and $w_k$ the weights for the Y, U, V components. An example of the obtained curve (and of the corresponding filtered one), showing ten shots $s_1$ to $s_{10}$, is illustrated in FIG. 2, with weights that have been set, for instance, to $\{w_Y, w_U, w_V\} = \{1, 3, 3\}$. In this example, the highest peaks of the curve correspond to the abrupt transitions from one frame to the following one (frames 21100, 21195, 21633, 21724), while, on the other hand, the oscillation from frame 21260 to frame 21279 corresponds to a dissolve (a gradual change from one camera record to another one by simple linear combination of the frames involved in this dissolve process), and the presence of large moving foreground objects in frames 21100-21195 and 21633-21724 creates high-level oscillations of the mDFD curve.
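As an illustration of relations (3) and (4), the following minimal Python sketch computes one mDFD value from the frames at t−1 and t+1. It is not the patented implementation: the frame layout (a dict of `"Y"`, `"U"`, `"V"` planes of equal size), the per-pixel integer displacement fields `dx` and `dy`, the clamping of displaced coordinates at the image border, and the function name `mean_dfd` are all assumptions made for the example. The default weights reproduce the {1, 3, 3} setting quoted above.

```python
from typing import Dict, List, Optional

Plane = List[List[float]]   # one colour component, indexed [i][j]
Frame = Dict[str, Plane]    # assumed layout: keys "Y", "U", "V", all the same size


def mean_dfd(prev: Frame, nxt: Frame,
             dx: List[List[int]], dy: List[List[int]],
             weights: Optional[Dict[str, float]] = None) -> float:
    """Relation (4): weighted sum of |DFD_k| over pixels and components,
    divided by the number of pixels I_x * I_y."""
    if weights is None:
        weights = {"Y": 1.0, "U": 3.0, "V": 3.0}
    rows = len(nxt["Y"])
    cols = len(nxt["Y"][0])
    total = 0.0
    for k, w in weights.items():
        for i in range(rows):
            for j in range(cols):
                # Displaced position in the frame at t-1 (relation (3)).
                # Clamping to the image border is an assumption of this sketch.
                pi = min(max(i - dx[i][j], 0), rows - 1)
                pj = min(max(j - dy[i][j], 0), cols - 1)
                total += w * abs(nxt[k][i][j] - prev[k][pi][pj])
    return total / (rows * cols)
```

Calling `mean_dfd` once per time index t yields the curve that the segmentation sub-step then analyses. How the displacement fields are estimated is outside the scope of this sketch.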
The sub-step 112, provided for detecting the video editing effects and segmenting the mDFD curve into shots, uses a threshold-based segmentation to extract the highest peaks of the mDFD curve (or of another type of one-dimensional curve), as described for instance in the document “Hierarchical scene change detection in an MPEG-2 compressed video sequence”, T. Shin et al., Proceedings of the 1998 IEEE International Symposium on Circuits and Systems, ISCAS'98, vol. 4, March 1998, pp. 253-256.
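The idea of threshold-based peak extraction can be sketched as follows. This is a deliberately simple stand-in, not the method of the cited paper: it keeps local maxima of the curve that exceed a fixed threshold and treats each one as the first frame of a new shot. The function names, the fixed threshold, and the cut-position convention are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def detect_cuts(mdfd: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of local maxima of the curve that exceed `threshold`."""
    cuts = []
    for t in range(1, len(mdfd) - 1):
        is_peak = mdfd[t] >= mdfd[t - 1] and mdfd[t] > mdfd[t + 1]
        if is_peak and mdfd[t] > threshold:
            cuts.append(t)
    return cuts


def split_into_shots(n_frames: int, cuts: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn cut positions into (first_frame, last_frame) intervals.
    Assumed convention: a cut at index c means frame c starts a new shot."""
    shots = []
    start = 0
    for c in cuts:
        shots.append((start, c - 1))
        start = c
    shots.append((start, n_frames - 1))
    return shots
```

A fixed threshold of this kind would also respond to the strong oscillations caused by large moving objects, so a practical detector would likely need filtering or an adaptive threshold.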
The second step 12 is a temporal segmentation provided for splitting each detected shot into sub-entities presenting a very high level of homogeneity in their camera motion parameters. It consists of two sub-steps: an oversegmentation sub-step 121, intended to divide each shot into so-called micro-segments, which must show a very high level of homogeneity, and a merging sub-step 122.
In order to carry out the first sub-step 121, it is first necessary to define what will be called a distance (the distances thus defined allow the micro-segments to be compared), and also a parameter for assessing the quality of a micro-segment or of a partition (i.e. a set of micro-segments). In both cases, a motion histogram is used, in which each bin shows the percentage of frames with a specific type of motion and which is defined by the following relation (5):
$$H_s[i] = \frac{N_i}{L_s} \qquad (5)$$
where s represents the label of the concerned micro-segment inside the shot, i the motion type (these motions are called trackleft, trackright, boomdown, boomup, tiltdown, tiltup, panleft, panright, rollleft, rollright, zoomin, zoomout, fixed), $L_s$ the length of the micro-segment s, and $N_i$ the number of frames of the micro-segment s with motion type i (it is possible that $\sum_i H_s[i] > 1$, since different motions can appear concurrently).
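Relation (5) can be illustrated with a short Python sketch. The input representation (one set of motion labels per frame) and the function name `motion_histogram` are assumptions made for the example. The motion type names are those listed in the text.

```python
from typing import Dict, List, Set

MOTION_TYPES = [
    "trackleft", "trackright", "boomdown", "boomup", "tiltdown", "tiltup",
    "panleft", "panright", "rollleft", "rollright", "zoomin", "zoomout", "fixed",
]


def motion_histogram(frame_motions: List[Set[str]]) -> Dict[str, float]:
    """Relation (5): H_s[i] = N_i / L_s for every motion type i.

    `frame_motions` holds, for each frame of the micro-segment, the set of
    motion types detected in that frame (an assumed input format).
    """
    length = len(frame_motions)  # L_s
    if length == 0:
        raise ValueError("a micro-segment must contain at least one frame")
    return {
        motion: sum(1 for labels in frame_motions if motion in labels) / length
        for motion in MOTION_TYPES
    }
```

Because a frame may carry several motion labels at once, the bins of such a histogram can sum to more than 1, as noted above.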
A micro-segment is assumed to be perfectly homogeneous (or to have a very high level of homogeneity) when it presents a single combination of camera motion parameters along all its frames, and to be non-homogeneous when it presents significant variations in these parameters. The micro-segment homogeneity is computed from its histogram (relation (5)): if a micro-segment is perfectly homogeneous, the histogram bins are equal either to 0 (the considered motion does not appear at all) or to 1 (the motion appears over the whole segment), while if it is not, the bins can take intermediate values. The measure of the micro-segment homogeneity is then obtained by measuring how much its histogram differs from the ideal one (i.e. by computing how much the bins of the histogram differ from 1 or 0). The distance corresponding to bins with high values is the difference between the bin value and 1; analogously, for bins with small values, the distance is the bin value itself. An example of a histogram is shown in FIG. 3, the axes of which indicate for each motion type its proportion (i.e. motion presence): two motion types introduce some error because the motion does not appear in all the frames of the micro-segment (panleft PL and zoomin ZI), and two other ones (boomdown BD and rollright RR) introduce some error for the opposite reason.
Mathematically, the homogeneity of a micro-segment s is given by the relation (6):
$$H(s) = \sum_i e(i) \qquad (6)$$
where:
$$e(i) = 1 - H_s[i] \quad \text{if } H_s[i] \geq 0.5$$
$$e(i) = H_s[i] \quad \text{if } H_s[i] < 0.5$$
Hs[i&r
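The homogeneity measure of relation (6) can be sketched in a few lines of Python. The histogram is assumed to be a mapping from motion type to bin value in [0, 1], and the function names are hypothetical. Treating a bin exactly equal to 0.5 as a "high" bin is an assumption of this sketch, since the comparison operator for that case is not legible in the text.

```python
from typing import Dict


def bin_error(h: float) -> float:
    """e(i): distance of one bin from its nearest ideal value (0 or 1)."""
    # Bins at or above 0.5 are compared with 1, the others with 0.
    # The handling of exactly 0.5 is an assumption of this sketch.
    return 1.0 - h if h >= 0.5 else h


def homogeneity(histogram: Dict[str, float]) -> float:
    """Relation (6): H(s) is the sum of e(i) over all motion types."""
    return sum(bin_error(h) for h in histogram.values())
```

With this definition a perfectly homogeneous micro-segment, whose bins are all 0 or 1, scores 0, and the value grows as the bins move toward intermediate values.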
Llach-Pinsach Joan
Salembier Philippe
Czekaj David
Kelley Chris
Koninklijke Philips Electronics , N.V.
Russell Gross
Profile ID: LFUS-PAI-O-3013416