Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition
Reexamination Certificate
2001-04-27
2003-10-14
Šmits, Tālivaldis Ivars (Department: 2655)
C704S228000
active
06633843
ABSTRACT:
FIELD OF INVENTION
This invention relates to speech recognition and more particularly to compensation of Gaussian mean vectors for noisy speech recognition.
BACKGROUND OF INVENTION
A speech recognition system comprises a recognizer for comparing input speech to speech models such as Hidden Markov Models (HMMs), as illustrated in FIG. 1. The recognition system is often called upon to operate in noisy environments, such as in a car with all the road sounds. Speech models such as HMMs are often trained in a quiet environment. It is therefore desirable to take a set of speech models (HMMs) trained with speech collected in a quiet environment and use them to recognize speech utterances recorded in a noisy background. In such a case a mismatch exists between the environment of the models and that of the utterances, and this mismatch may substantially degrade recognition performance. (See Y. Gong. Speech recognition in noisy environments: A survey. Speech Communication, 16(3):261-291, April 1995.) This problem is important in applications where it is too expensive to collect training speech in the noisy environment, or where the changing nature of the noisy background makes it impossible to have a collection covering all situations.
Hands-free speech recognition in an automobile is a typical case. Parallel model combination (PMC) can be used to reduce the mismatch. (See M. J. F. Gales and S. J. Young. HMM recognition in noise using parallel model combination. In Proceedings of European Conference on Speech Communication and Technology, volume II, pages 837-840, Berlin, 1993.) PMC uses the HMM distributions of the clean speech models and the noise distribution to give a maximum likelihood estimate of the corrupted-speech models.
FIG. 2 illustrates the process of obtaining a “noisy” HMM by taking an original quiet HMM and modifying its models to accommodate the noise.
PMC has two advantages. First, no speech data is required for compensation. Second, each model is compensated individually.
Because exact PMC has no closed-form expression, simplifying assumptions must be made in implementation. The results can be applied directly to feature parameters that are linear transforms of log-spectral parameters, such as MFCC (obtained by DCT) and PFV3B (obtained by KLT).
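The carry-over to linearly transformed features follows from the linearity of expectation: if $c = Cx$, then $E\{c\} = C\,E\{x\}$ (and the covariance becomes $C\Sigma C^{T}$). The pure-Python sketch below checks the mean relation, assuming an orthonormal DCT-II as the transform and a few hypothetical log-spectral frames. Real MFCC front ends usually keep only the leading cepstral coefficients, in which case the mapping back to the log-spectral domain is no longer exact; the sketch keeps all coefficients.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds basis vector k over n log-spectral channels."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (m + 0.5) / n) for m in range(n)])
    return rows


def matvec(a, x):
    """Multiply matrix a (list of rows) by vector x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical log-spectral frames (3 frames x 4 channels), for illustration only.
frames = [[1.0, 2.0, 0.5, -1.0], [1.5, 1.0, 0.0, -0.5], [0.5, 3.0, 1.0, -1.5]]
C = dct_matrix(len(frames[0]))

mean_log = mean(frames)                           # log-spectral mean
mean_cep = mean([matvec(C, f) for f in frames])   # cepstral (MFCC-like) mean
mean_cep_from_log = matvec(C, mean_log)           # DCT of the log-spectral mean
mean_log_back = matvec(transpose(C), mean_cep)    # inverse of an orthonormal DCT is its transpose
print(mean_cep)
print(mean_cep_from_log)
```

Because the two printed vectors agree, a mean compensated in the log-spectral domain can be transformed to the cepstral domain afterwards, rather than deriving separate compensation formulas for cepstra.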
PMC adaptation of dynamic parameters (i.e., ΔMFCC) can be approached from two different directions. In the first direction, a mismatch function for (difference-based) dynamic parameters is established. (See M. J. F. Gales and S. J. Young. Robust continuous speech recognition using parallel model compensation. IEEE Trans. on Speech and Audio Processing, 4:352-359, 1996.) It can be shown that the adapted dynamic parameters at time t are then a function of the static parameters at time t-w, an undesirable requirement for practical applications. Moreover, the results do not apply to dynamic parameters obtained by linear regression. A solution to this problem that sums several difference-based compensated dynamic parameters has been proposed. (See R. Yang, M. Majaniemi, and P. Haavisto. Dynamic parameter compensation for speech recognition in noise. In Proc. of IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, pages 469-472, Detroit, 1995.) However, only a small improvement due to dynamic coefficients was reported.
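To make the two kinds of dynamic parameters mentioned above concrete, the following is a minimal sketch that computes a difference-based and a linear-regression dynamic parameter from a sequence of static vectors. The difference form $c_{t+w} - c_{t-w}$, the widely used HTK-style regression formula, and the clamping of frame indices at the sequence edges are assumptions for illustration; none of them is taken from this document.

```python
def difference_delta(c, t, w):
    """Difference-based dynamic parameter c[t+w] - c[t-w]; frame indices are clamped to the sequence."""
    last = len(c) - 1
    fwd, bwd = c[min(t + w, last)], c[max(t - w, 0)]
    return [a - b for a, b in zip(fwd, bwd)]


def regression_delta(c, t, theta_max=2):
    """Linear-regression dynamic parameter:
    sum_{th=1..theta_max} th * (c[t+th] - c[t-th]) / (2 * sum_{th=1..theta_max} th^2),
    with frame indices clamped to the sequence."""
    last = len(c) - 1
    denom = 2.0 * sum(th * th for th in range(1, theta_max + 1))
    out = [0.0] * len(c[t])
    for th in range(1, theta_max + 1):
        fwd, bwd = c[min(t + th, last)], c[max(t - th, 0)]
        for i in range(len(out)):
            out[i] += th * (fwd[i] - bwd[i])
    return [v / denom for v in out]


# Static parameters that change linearly in time (slopes 2 and -1 per frame).
static = [[2.0 * t, -1.0 * t] for t in range(10)]
print(regression_delta(static, 5))     # recovers the per-frame slope: [2.0, -1.0]
print(difference_delta(static, 5, 1))  # spans two frames: [4.0, -2.0]
```

The regression form uses several neighbouring frames, which is why a compensation derived only for the difference form does not carry over to it directly.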
In the second direction, a continuous time derivative of the static parameters is used as the dynamic parameters. (See M. J. F. Gales. “Nice” model-based compensation schemes for robust speech recognition. In Proc. ESCA-NATO Workshop on Robust Speech Recognition for Unknown Communication Channels, pages 55-64, Pont-à-Mousson, France, 1997.) This continuous derivative is an approximation to the dynamic parameters, which are discrete in nature. We pursue this direction in the present application.
PMC deals with Gaussian distributions. FIG. 3 illustrates a Gaussian distribution, defined by its mean vector and covariance matrix parameters, for the one-dimensional case. The wider the distribution, the larger the covariance (variance) value. In theory both the mean vector and the covariance matrix need to be modified; although changing both is theoretically desirable, it has been determined that changing the mean vector is enough. In a second prior art assumption, and in the assumption according to the present invention, nothing is done with respect to the covariance. In PMC, an independent noise model is estimated from noise samples collected in the new environment. Distribution by distribution, the clean speech model and the noise model are then combined using a mismatch function to obtain a corrupted speech model matched to the new environment. The mismatch function assumes that speech and noise are independent and additive in the time domain. The mismatch function for computing the mean of the corrupted model in the log DFT domain has the form:
$$\hat{\mu}^{\log} = E\left\{\log\left(\exp\left(\mu^{\log} + h^{\log}\right) + \exp\left(\tilde{\mu}^{\log}\right)\right)\right\} \qquad (1)$$
where $\mu^{\log}$ and $\tilde{\mu}^{\log}$ represent speech and noise observations in the log DFT domain, whose statistics are obtained from the appropriate speech and noise state pair. $h^{\log}$ is a convolutive (in the time domain) noise representing channel, transducer, and some speaker characteristics; it is omitted in this study. The quantities in Eq. (1) are in the log scale. In words, Eq. (1) computes the mean of the corrupted speech as follows: the log-domain speech and noise are converted to the linear scale by exponentiation, the resulting linear terms are added together, the log is taken again, and the expectation is taken over the result. Since Eq. (1) has no closed form, it cannot be computed directly and must be simplified. Approximations that trade off accuracy against hardware requirements have been used; the prior art includes the log-normal approximation and the log-add approximation. In the following sections, we derive the PMC formulas for each of these two prior art cases, using the notation:
$\hat{X}$ denotes the estimate (adapted value) of parameters $X$; $\tilde{X}$ denotes parameters $X$ of the noise.
lin denotes linear-domain parameters; log denotes log-spectral-domain parameters.
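Because Eq. (1) has no closed form, one way to obtain a reference value for a single channel is Monte Carlo sampling. This is a swapped-in technique for illustration, not one of the prior-art approximations discussed here: the sketch assumes independent Gaussian speech and noise log-DFT observations for one channel, omits $h^{\log}$ as the text does, and the function names are hypothetical.

```python
import math
import random


def log_add(s, n):
    """Numerically stable log(exp(s) + exp(n))."""
    return max(s, n) + math.log1p(math.exp(-abs(s - n)))


def mismatch_mean_mc(mu_s, var_s, mu_n, var_n, n_samples=100_000, seed=0):
    """Monte Carlo estimate of Eq. (1) for one log-DFT channel, with h omitted:
    E{log(exp(s) + exp(n))} with s ~ N(mu_s, var_s) (speech) and
    n ~ N(mu_n, var_n) (noise) drawn independently."""
    rng = random.Random(seed)
    sd_s, sd_n = math.sqrt(var_s), math.sqrt(var_n)
    total = 0.0
    for _ in range(n_samples):
        total += log_add(rng.gauss(mu_s, sd_s), rng.gauss(mu_n, sd_n))
    return total / n_samples


# With zero variances the expectation collapses to log(exp(mu_s) + exp(mu_n)).
print(mismatch_mean_mc(1.0, 0.0, 0.0, 0.0, n_samples=10))
# With nonzero variances the expectation exceeds log(exp(mu_s) + exp(mu_n)),
# because log(exp(s) + exp(n)) is a convex function of (s, n).
print(mismatch_mean_mc(0.0, 1.0, 0.0, 1.0))
```

The gap between the two cases shows why simply adding the means in the linear domain is not exact once the variances are taken into account.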
The prior art offers two approximations for the adaptation of log-spectral parameters: the log-normal approximation and the log-add approximation. The mean vector has two kinds of parameters: static parameters and dynamic parameters. The dynamic parameter is the time derivative of the static parameter.
The log-normal approximation for the static parameter is based on the assumption that the sum of two log-normally distributed random variables is itself log-normally distributed. In the linear domain, the mean and covariance of the compensated model are computed as
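$$\hat{\mu}_i^{\mathrm{lin}} = g\,\mu_i^{\mathrm{lin}} + \tilde{\mu}_i^{\mathrm{lin}} \qquad (2)$$

$$\hat{\Sigma}_{i,j}^{\mathrm{lin}} = g^{2}\,\Sigma_{i,j}^{\mathrm{lin}} + \tilde{\Sigma}_{i,j}^{\mathrm{lin}} \qquad (3)$$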
where i and j index the feature-vector dimensions, g accounts for the gain of speech produced in noise relative to clean speech, and, for both speech and noise:
$$\mu_i^{\mathrm{lin}} = \exp\left(\mu_i^{\log} + \frac{1}{2}\Sigma_{i,i}^{\log}\right) \qquad (4)$$
$$\Sigma_{i,j}^{\mathrm{lin}} = \mu_i^{\mathrm{lin}}\,\mu_j^{\mathrm{lin}}\left[\exp\left(\Sigma_{i,j}^{\log}\right) - 1\right] \qquad (5)$$
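Eqs. (2) to (5) can be sketched as two small functions: one maps a log-domain Gaussian to its linear-domain (log-normal) moments, the other adds speech and noise in the linear domain. This is a minimal pure-Python sketch, assuming full covariance matrices stored as lists of lists; the function names and the example values are hypothetical.

```python
import math


def log_to_lin(mu_log, cov_log):
    """Eqs. (4)-(5): linear-domain mean and covariance of a log-normal vector
    whose log-domain mean is mu_log and covariance is cov_log (list of lists)."""
    n = len(mu_log)
    mu_lin = [math.exp(mu_log[i] + 0.5 * cov_log[i][i]) for i in range(n)]
    cov_lin = [[mu_lin[i] * mu_lin[j] * (math.exp(cov_log[i][j]) - 1.0) for j in range(n)]
               for i in range(n)]
    return mu_lin, cov_lin


def combine_lin(mu_s, cov_s, mu_n, cov_n, g=1.0):
    """Eqs. (2)-(3): independent, additive speech (scaled by gain g) and noise
    combined in the linear domain."""
    n = len(mu_s)
    mu_hat = [g * mu_s[i] + mu_n[i] for i in range(n)]
    cov_hat = [[g * g * cov_s[i][j] + cov_n[i][j] for j in range(n)] for i in range(n)]
    return mu_hat, cov_hat


# Hypothetical one-channel example: speech and noise statistics given in the log domain.
speech_mu, speech_cov = log_to_lin([2.0], [[0.2]])
noise_mu, noise_cov = log_to_lin([0.5], [[0.1]])
print(combine_lin(speech_mu, speech_cov, noise_mu, noise_cov, g=1.0))
```

Note that the gain g enters the covariance squared, as in Eq. (3), because scaling a random variable by g scales its variance by $g^2$.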
The adapted mean and variance in the log domain can be obtained by inverting the above equations:
$$\mu_i^{\log} = \log\left(\mu_i^{\mathrm{lin}}\right) - \frac{1}{2}\log\left(\frac{\Sigma_{i,i}^{\mathrm{lin}}}{\left(\mu_i^{\mathrm{lin}}\right)^{2}} + 1\right) \qquad (6)$$
$$\Sigma_{i,j}^{\log} = \log\left(\frac{\Sigma_{i,j}^{\mathrm{lin}}}{\mu_i^{\mathrm{lin}}\,\mu_j^{\mathrm{lin}}} + 1\right) \qquad (7)$$
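Putting Eqs. (2) to (7) together gives the complete static log-normal compensation: map speech and noise to the linear domain, combine them, and map the result back. The self-contained sketch below is illustrative only; the names `log_to_lin`, `lin_to_log`, `pmc_lognormal` and the example statistics are hypothetical.

```python
import math


def log_to_lin(mu_log, cov_log):
    """Eqs. (4)-(5): log-domain Gaussian -> linear-domain (log-normal) mean and covariance."""
    n = len(mu_log)
    mu = [math.exp(mu_log[i] + 0.5 * cov_log[i][i]) for i in range(n)]
    cov = [[mu[i] * mu[j] * (math.exp(cov_log[i][j]) - 1.0) for j in range(n)] for i in range(n)]
    return mu, cov


def lin_to_log(mu_lin, cov_lin):
    """Eqs. (6)-(7): invert the mapping above to return to the log domain."""
    n = len(mu_lin)
    mu = [math.log(mu_lin[i]) - 0.5 * math.log(cov_lin[i][i] / mu_lin[i] ** 2 + 1.0)
          for i in range(n)]
    cov = [[math.log(cov_lin[i][j] / (mu_lin[i] * mu_lin[j]) + 1.0) for j in range(n)]
           for i in range(n)]
    return mu, cov


def pmc_lognormal(mu_s, cov_s, mu_n, cov_n, g=1.0):
    """Static log-normal PMC: map speech and noise to the linear domain,
    add them with Eqs. (2)-(3), and map the result back with Eqs. (6)-(7)."""
    ms, cs = log_to_lin(mu_s, cov_s)
    mn, cn = log_to_lin(mu_n, cov_n)
    n = len(ms)
    m_hat = [g * ms[i] + mn[i] for i in range(n)]
    c_hat = [[g * g * cs[i][j] + cn[i][j] for j in range(n)] for i in range(n)]
    return lin_to_log(m_hat, c_hat)


# Hypothetical two-channel example: noise is weak in channel 0 and strong in channel 1.
mu_hat, cov_hat = pmc_lognormal([5.0, 1.0], [[0.1, 0.0], [0.0, 0.1]],
                                [-5.0, 4.0], [[0.1, 0.0], [0.0, 0.1]])
print(mu_hat, cov_hat)
```

In the weak-noise channel the compensated mean stays close to the speech mean, while in the strong-noise channel it moves towards the noise mean. With all variances set to zero, the compensated mean reduces to $\log\left(g\,e^{\mu_i^{\log}} + e^{\tilde{\mu}_i^{\log}}\right)$ per channel.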
Dynamic parameter
To derive the adaptation equation for dynamic parameters under the log-normal approximation, we further assume that, on average:
$$\frac{\partial \bar{\mu}_i^{\mathrm{lin}}}{\partial t} = 0 \qquad (8)$$
Following the idea of Eq. (2) for the static part, the adapted dynamic log-spectral vector is:
$$\Delta\hat{\mu}_i^{\log} \triangleq \frac{\partial \hat{\mu}_i^{\log}}{\partial t} = g\,\frac{\beta_i}{\beta_i + 1}\,\cdots$$
Brady III W. James