Reexamination Certificate
1999-07-12
2002-02-26
Korzuch, William (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S205000, C704S203000, C704S226000, C704S207000, C381S094300, C324S076240
active
06351729
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to methods for the spectral analysis of time-sampled signals. More particularly, the invention relates to methods for producing spectrograms of human speech or other time-varying signals.
ART BACKGROUND
It is useful, in many fields of technology, to determine the changing frequency content of time-dependent signals. For example, the spectral analysis of speech is useful both for automatic speech recognition and for speech coding. As a further example, the spectral analysis of marine sounds is useful for acoustically aided undersea navigation.
When an acoustic signal, or other signal of interest, is sampled at discrete intervals, a time series is produced. A time series is said to be stationary if its statistical properties are invariant under displacements of the series in time. Although few of the signals of interest are truly stationary, many change slowly enough that, for purposes of spectral analysis, they can be treated as locally stationary over a limited time interval.
The spectral analysis of stationary time series has been a subject of research for one hundred years. The earliest attempts to obtain a representation, or periodogram, of the power spectral density of the time series $x(0), x(1), \ldots, x(n), \ldots, x(N-1)$ involved summing N terms of the form $x(n)\,e^{in\omega}$ and then taking the squared magnitude of the result. (The symbol ω represents frequency in radians per second. The symbol ƒ, used below, represents frequency in cycles per second. Thus, ω=2πƒ.) This operation was performed for each of N/2+1 discrete frequencies ƒ. This was unsatisfactory for several reasons. One reason is that the result is not statistically consistent. That is, the variance of the resulting periodogram does not decrease as the sample size N is increased. A second reason is that the result can be severely biased by truncation effects, leading to inaccurate representation of processes having continuous spectra.
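The periodogram described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any cited reference; the function name `periodogram`, the frequency grid 0, 1/N, ..., 1/2 (cycles per sample), and the white-noise test signal are assumptions made for the example.

```python
import numpy as np

def periodogram(x):
    """Raw periodogram at the N/2 + 1 frequencies 0, 1/N, ..., 1/2 (cycles per sample)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    n = np.arange(N)
    freqs = np.arange(N // 2 + 1) / N
    # For each omega = 2*pi*f, sum the N terms x(n) * exp(i*n*omega),
    # then take the squared magnitude of the result.
    phases = np.exp(1j * 2.0 * np.pi * np.outer(freqs, n))
    return freqs, np.abs(phases @ x) ** 2

# Assumed test signal: Gaussian white noise, whose true spectrum is flat.
# The ratio printed is the scatter of the periodogram relative to its mean.
rng = np.random.default_rng(0)
for N in (512, 2048):
    _, P = periodogram(rng.standard_normal(N))
    print(N, P[1:-1].std() / P[1:-1].mean())
```

If the periodogram were statistically consistent, the printed ratio would shrink as N grows; for white noise it should instead stay roughly constant, which is the inconsistency noted in the text.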
An improved spectrum estimate (it is an estimate because it is derived from a finite sample of the original signal) is obtained from the following method, which is conveniently described in two steps:
First, form the spectrum estimate $\tilde{S}_D(\omega)$ using a data window $D_0, D_1, \ldots, D_n, \ldots, D_{N-1}$ to taper the sampled data sequence, according to:

$$\tilde{S}_D(\omega) = \left| \sum_{n=0}^{N-1} x(n)\, D_n\, e^{-i\omega n} \right|^2. \tag{1}$$
The primary purpose of the data window is to control bias. That is, by tapering the sampled sequence, it is possible to mitigate the tendency of the frequency components where the power is highest to dominate the spectrum estimate.
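The effect of the data window in Equation (1) can be illustrated numerically. The sketch below is an assumption-laden example: the text does not name a particular window, so a Hann window (`np.hanning`) stands in for $D_n$, and the test tone at 0.1003 cycles per sample and the probe frequency near 0.35 are arbitrary choices.

```python
import numpy as np

def direct_estimate(x, window, omega):
    """Equation (1): |sum_n x(n) D_n exp(-i*omega*n)|^2 for each angular frequency."""
    x = np.asarray(x, dtype=float)
    D = np.asarray(window, dtype=float)
    n = np.arange(x.size)
    kernel = np.exp(-1j * np.outer(omega, n))
    return np.abs(kernel @ (x * D)) ** 2

N = 256
n = np.arange(N)
x = np.cos(2 * np.pi * 0.1003 * n)            # strong tone, deliberately off the 1/N grid
omega = 2 * np.pi * np.arange(N // 2 + 1) / N

boxcar = direct_estimate(x, np.ones(N), omega)     # no taper (all D_n = 1)
hann = direct_estimate(x, np.hanning(N), omega)    # assumed Hann taper

# Power that leaks to a frequency far from the tone, in dB relative to the peak.
far = np.argmin(np.abs(omega / (2 * np.pi) - 0.35))
leak_boxcar_db = 10 * np.log10(boxcar[far] / boxcar.max())
leak_hann_db = 10 * np.log10(hann[far] / hann.max())
print(leak_boxcar_db, leak_hann_db)
```

The tapered estimate should show much less power far from the tone than the untapered one, which is the bias control that the data window is intended to provide.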
Then, smooth the estimate $\tilde{S}_D(\omega)$ by convolving it with a spectral window $G(\omega)$ to form the smoothed spectrum estimate $\tilde{S}(\omega)$ according to $\tilde{S}(\omega) = \tilde{S}_D(\omega) * G(\omega)$,
where * represents the convolution operation. The primary purpose of the spectral window is to make the spectrum estimate consistent. The spectral window is generally pulse-shaped in frequency space, and the width of this pulse is approximately the bandwidth of the spectrum estimate. Increasing the bandwidth decreases the variance of the resulting estimate, but it also reduces the frequency resolution of the estimate.
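The trade-off between bandwidth and variance can be seen in a small experiment. The following is a sketch under stated assumptions: the spectral window $G$ is taken to be a flat, unit-area pulse `width` frequency bins wide, the data window is a Hann window, and the signal is Gaussian white noise. None of these choices comes from the text, and `smooth_spectrum` is a hypothetical helper name.

```python
import numpy as np

def smooth_spectrum(S_D, width):
    """Convolve a spectrum estimate with a flat, unit-area spectral window `width` bins wide."""
    G = np.ones(width) / width
    return np.convolve(S_D, G, mode="same")

rng = np.random.default_rng(1)
N = 2048
x = rng.standard_normal(N) * np.hanning(N)     # white noise tapered by an assumed Hann window
S_D = np.abs(np.fft.rfft(x)) ** 2              # Equation (1) evaluated at omega = 2*pi*k/N

for width in (1, 5, 21):
    S = smooth_spectrum(S_D, width)
    core = S[50:-50]                           # skip edge bins affected by zero padding
    print(width, core.std() / core.mean())     # relative scatter of the smoothed estimate
```

A wider pulse should give a smaller relative scatter, at the cost of blurring any feature narrower than the pulse.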
Although useful, the smoothed spectrum estimate $\tilde{S}(\omega)$ as described above has several drawbacks. The smoothing operation may obscure the presence of spectral lines. Moreover, the data window tends to give different weights to equally valid data points. The data window also tends to reduce statistical efficiency. That is, the amount of data needed to obtain a reliable estimate may exceed the theoretical ideal by a factor of two or more.
Recently, a new spectrum estimate having improved properties was proposed. This estimate is described, e.g., in D. J. Thomson, “Spectrum Estimation and Harmonic Analysis,” Proc. IEEE 70 (September 1982) 1055-1096 (hereafter, “Thomson (1982)”). This estimate is computed using a sequence of window functions referred to as Slepian functions when expressed as functions of frequency, and as Slepian sequences when expressed as sequences in the time domain. Slepian functions are related to Slepian sequences through the Fourier transform. Because multiple window functions are used, such an estimate is referred to as a multitaper spectrum estimate, or occasionally as a multiple-window spectrum estimate.
The properties of Slepian functions and Slepian sequences are described in Thomson (1982), cited above, and in D. Slepian, “Prolate Spheroidal Wave Functions, Fourier Analysis, and Uncertainty—V: The Discrete Case,” Bell System Tech. J. 57 (1978) 1371-1430, hereafter referred to as Slepian (1978). Briefly, the Slepian sequences depend parametrically on the size N of the data sample and on the chosen bandwidth W. (From practical considerations, the bandwidth is generally chosen to lie between 1/N and 20/N, and at least as a starting value it is typically about 5/N.) It should be noted that throughout this discussion, the well-known convention is used wherein all frequencies are normalized such that the Nyquist frequency equals 0.5.
Given values for these parameters, each Slepian sequence $v^{(k)}(N,W)$ is a k'th solution to a matrix eigenvalue equation $M v = \lambda_k v$, where the element in the n'th row and m'th column of the matrix is given by:

$$\frac{\sin 2\pi W (n-m)}{\pi (n-m)}, \qquad n = 1, 2, \ldots, N, \quad m = 1, 2, \ldots, N.$$
If the eigenvalues $\lambda_k$ of this equation are arranged in descending order, approximately the first K of them are very close to (but less than) unity. K is the greatest integer less than or equal to 2NW. At least for moderate values of N, the solutions are readily computed using standard techniques. (For such purpose, it is advantageous to use an alternative representation of these sequences which uses a matrix in tridiagonal form. For further information, see Slepian (1978), which is hereby incorporated by reference.)
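For moderate N, the eigenvalue equation can be solved directly with a dense symmetric eigensolver. The sketch below takes this direct route rather than the tridiagonal representation mentioned above; the function name `slepian_sequences` and the values N = 64, W = 4/64 are assumptions chosen for illustration.

```python
import numpy as np

def slepian_sequences(N, W):
    """Solve M v = lambda v with M[n, m] = sin(2*pi*W*(n - m)) / (pi*(n - m))."""
    idx = np.arange(N)
    d = idx[:, None] - idx[None, :]
    # np.sinc(t) = sin(pi*t)/(pi*t), so 2W*sinc(2W*d) equals the matrix element
    # and supplies the limiting value 2W on the diagonal (n = m).
    M = 2.0 * W * np.sinc(2.0 * W * d)
    lam, V = np.linalg.eigh(M)            # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]         # re-sort into descending order
    return lam[order], V[:, order].T      # row k holds the k'th sequence

N, W = 64, 4.0 / 64                       # illustrative values, giving 2NW = 8
lam, v = slepian_sequences(N, W)
K = int(np.floor(2 * N * W))
print(K)
print(lam[:K + 4])                        # leading eigenvalues, then the drop toward zero
```

The printed eigenvalues should stay near unity for the leading indices and fall off quickly once k exceeds roughly 2NW. The sign of each returned sequence is arbitrary, as with any eigenvector. For larger N a tridiagonal formulation is better conditioned; SciPy provides `scipy.signal.windows.dpss` for computing these sequences.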
The Slepian functions $U_k(N,W;f)$ are computed from corresponding Slepian sequences through the formula

$$U_k(N,W;f) = \epsilon_k \sum_{n=0}^{N-1} v_n^{(k)}(N,W)\, e^{i 2\pi f \left[ n - \frac{N-1}{2} \right]}, \tag{2}$$

where $\epsilon_k$ is 1 when k is even, and i when k is odd.
Of any function which is the Fourier transform of an index limited sequence, the k=0 Slepian function has the greatest fractional energy concentration within the frequency range between −W and W. More generally, the k'th eigenvalue $\lambda_k$ expresses the fraction of energy retained within this frequency range by the corresponding Slepian function. As noted, this fraction is very close to unity for the first K Slepian functions.
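The relation between the eigenvalues and the in-band energy can be checked numerically. The following sketch evaluates Equation (2) on a fine frequency grid and compares the fraction of energy between −W and W with the corresponding eigenvalue. The grid size, the values N = 64 and W = 4/64, the helper name `slepian_function`, and the simple Riemann-sum integration are all assumptions of the example.

```python
import numpy as np

N, W = 64, 4.0 / 64
idx = np.arange(N)
M = 2.0 * W * np.sinc(2.0 * W * (idx[:, None] - idx[None, :]))
lam, V = np.linalg.eigh(M)
lam, V = lam[::-1], V[:, ::-1]            # descending eigenvalues; column k is v^(k)

def slepian_function(v_k, k, f):
    """Equation (2): U_k(N, W; f) for a sequence v_k on an array of frequencies f."""
    n = np.arange(v_k.size)
    eps = 1.0 if k % 2 == 0 else 1j       # epsilon_k: 1 for even k, i for odd k
    phase = np.exp(1j * 2.0 * np.pi * np.outer(f, n - (v_k.size - 1) / 2.0))
    return eps * (phase @ v_k)

f = np.linspace(-0.5, 0.5, 8193)          # one full period of normalized frequency
inside = np.abs(f) <= W
fractions = {}
for k in (0, 6, 7):
    energy = np.abs(slepian_function(V[:, k], k, f)) ** 2
    # Riemann-sum ratio: energy within [-W, W] over energy in one full period.
    fractions[k] = energy[inside].sum() / energy[:-1].sum()
    print(k, fractions[k], lam[k])
```

Each printed fraction should agree with the matching eigenvalue to within the accuracy of the numerical integration.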
The spectrum estimate of Thomson (1982) is computed from K eigencoefficients $y_0(f), y_1(f), \ldots, y_{K-1}(f)$, wherein the k'th such eigencoefficient is computed through the formula,

$$y_k(f) = \sum_{n=0}^{N-1} x(n)\, \frac{v_n^{(k)}(N,W)}{\epsilon_k}\, e^{-i 2\pi f \left( n - \frac{N-1}{2} \right)}. \tag{3}$$
At a given frequency $f = f_0$, the spectrum estimate, denoted $\bar{S}(f)$, is band limited to a frequency range of ±W about $f_0$. The spectrum estimate is computed from the eigencoefficients according to,

$$\bar{S}(f) = \frac{1}{2NW} \sum_{k=0}^{K-1} \frac{1}{\lambda_k(N,W)} \left| y_k(f) \right|^2. \tag{4}$$
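Equations (3) and (4) can be combined into a short routine. This is a minimal sketch, not the implementation of Thomson (1982): the Slepian sequences are obtained by dense eigen-decomposition of the matrix defined earlier, and the function name `multitaper_estimate`, the white-noise input, and the choice W = 4/N are assumptions for illustration.

```python
import numpy as np

def multitaper_estimate(x, W, freqs):
    """Equations (3) and (4) evaluated at the normalized frequencies in `freqs`."""
    x = np.asarray(x, dtype=float)
    N = x.size
    idx = np.arange(N)
    # Slepian sequences and eigenvalues, sorted into descending eigenvalue order.
    M = 2.0 * W * np.sinc(2.0 * W * (idx[:, None] - idx[None, :]))
    lam, V = np.linalg.eigh(M)
    lam, V = lam[::-1], V[:, ::-1]
    K = int(np.floor(2 * N * W))
    # Equation (3): one eigencoefficient per taper k and per frequency.
    phase = np.exp(-1j * 2.0 * np.pi * np.outer(freqs, idx - (N - 1) / 2.0))
    eps = np.where(np.arange(K) % 2 == 0, 1.0, 1j)
    y = (phase @ (x[:, None] * V[:, :K])) / eps      # shape (len(freqs), K)
    # Equation (4): eigenvalue-weighted sum of |y_k(f)|^2, scaled by 1/(2NW).
    return (np.abs(y) ** 2 / lam[:K]).sum(axis=1) / (2.0 * N * W)

rng = np.random.default_rng(2)
N = 512
x = rng.standard_normal(N)                 # assumed test signal: unit-variance white noise
freqs = np.arange(N // 2 + 1) / N
S_bar = multitaper_estimate(x, 4.0 / N, freqs)
print(S_bar.mean(), S_bar.std() / S_bar.mean())
```

For unit-variance white noise the estimate should fluctuate about a level near one. Its relative scatter should be well below that of a single-taper estimate, because K approximately independent terms are averaged.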
It will be appreciated that each term in this summation is individually a spectrum estimate of the usual kind, as represented, e.g., by Equation (1), in which a respective Slepia
Chawan Vijay B
Finston Martin I.
Korzuch William
Lucent Technologies - Inc.