Method of model adaptation for noisy speech recognition by...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S231000, C704S255000

active

06449594

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of speech recognition and, more particularly, to a method of model adaptation for noisy speech recognition.
2. Description of Related Art
In a conventional automatic speech recognition system, as shown in FIG. 2, time-domain speech signals, denoted by $\{x_t\}$, are entered for an end point detection and feature extraction process that separates the speech from the background noise, so as to extract the desired speech signals. The extracted speech signals are then pattern-matched against speech reference models 21 to produce possible results, and, finally, a decision rule is applied to the possible results to obtain the recognition results, denoted by $\{W_n\}$.
Generally, the speech reference models 21 are preferably the well-known Hidden Markov Models (HMMs). Such statistical models represent the relevant feature distribution and time-variable transformation characteristics of the speech spectrum. In order to have reliable statistical models, it is required to record speech data from a great number of people before performing the process of training the model parameters. In such a speech data collecting process, the recording of speech data is generally performed in an ideal quiet environment, so as to obtain statistical models indicative of a noiseless environment. However, in practical applications, it is impossible to have a completely noiseless environment. On the contrary, noise is present everywhere and at all times. Furthermore, the types of noise and their intensity are not predictable. As such, noise is likely to add an extra spectral component to the original clean speech signals, which significantly degrades the speech recognition rate.
As is well known to those skilled in the art, a better speech recognition rate can be achieved if the environmental factors of the training speech data and of the speech to be recognized are matched; a description of this can be found in Juang, B. H. “Speech recognition in adverse environments”, Computer Speech and Language 5, pp. 275-294, 1991, which is hereby incorporated herein by reference. Therefore, it is possible to improve the recognition rate of noisy speech by training the statistical models on speech data containing the same noise as the noisy speech. Although it is theoretically possible to retrain the model parameters whenever the environmental noise changes, this can hardly be achieved in practical applications. One major reason is that the required speech database is relatively large, and thus the cost of a speech recognizer with such a database is too high. Furthermore, the computation amount is large and the time required to train the parameters is long, so that dynamic adaptation to changes in the environment is difficult to achieve. Therefore, efforts are devoted to obtaining noisy speech statistical models without a repetitive training process. As is known, in the HMMs, the speech probability density is the parameter most susceptible to external noise. Therefore, the speech recognition rate can be significantly improved if the speech probability density function is adjusted to match the noise condition of the test utterance. However, the speech density is generally expressed in the cepstral domain, while the effect of noise is additive in the linear spectral domain. As a result, it is theoretically impossible to adjust the speech probability density function directly in the cepstral domain.
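The domain mismatch described above can be illustrated with a small numerical sketch. The per-band power values below are hypothetical and chosen only for illustration. Noise adds band by band in the linear spectral domain, but the logarithm of that sum is not the sum of the logarithms. Because the cepstrum is a linear (DCT) transform of the log spectrum, the same non-additivity carries over to the cepstral domain.

```python
import math

# Hypothetical per-band linear power values (illustrative only).
speech_power = [4.0, 1.0, 0.25]
noise_power = [0.5, 0.5, 0.5]

# In the linear spectral domain, noise adds band by band.
noisy_power = [s + n for s, n in zip(speech_power, noise_power)]

# Log spectrum of the noisy signal: log(speech + noise).
log_of_sum = [math.log(p) for p in noisy_power]

# Naively adding the log spectra gives a different quantity, log(speech * noise).
sum_of_logs = [math.log(s) + math.log(n)
               for s, n in zip(speech_power, noise_power)]

for band, (a, b) in enumerate(zip(log_of_sum, sum_of_logs)):
    print(f"band {band}: log(s + n) = {a:.3f}, log(s) + log(n) = {b:.3f}")
```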
To eliminate the aforementioned problem, a Parallel Model Combination (PMC) method is proposed to combine the statistical data of speech and noise in the linear spectral domain by means of transformation between cepstral domain and linear spectral domain, thereby obtaining the cepstral means and variances of the noisy speech. The description of such a PMC method can be found in Gales, M. J. F. & Young, S. J. “Cepstral parameter compensation for HMM recognition in noise”, Speech Communication 12, pp. 231-239, 1993, which is hereby incorporated by reference into this patent application. Accordingly, speech models can be adjusted based on the change of the environmental noise by detecting the background noise in the speech inactive period and determining the statistical data of noise.
FIG. 3 shows an automatic speech recognition system utilizing such a PMC method. As shown, speech signals, denoted by $\{x_t\}$, are entered for an end point detection and feature extraction process that determines the background noise and obtains the extracted speech signals. The background noise is provided for noise model estimation. The estimation results and the reference speech models 21 are applied together for PMC adaptation to obtain adapted speech models 31 that vary with the environmental noise. The extracted speech signals are then pattern-matched against the adapted speech models 31 to produce possible results and, finally, to determine the recognition results $\{W_n\}$.
In executing the above PMC method, for simplicity of expression, it is assumed that the speech probability density function is represented by a Gaussian function $f(x \mid \mu^c, \Sigma^c)$, where $x$ represents a cepstral observation vector, $\mu^c$ represents a cepstral mean vector, and $\Sigma^c$ represents a cepstral covariance matrix. The method first transforms the $\mu^c$ and $\Sigma^c$ of the speech model from the cepstral domain to the log-spectral domain by performing inverse discrete cosine transform (IDCT) operations as follows:
$$\mu^l = C^{-1}\mu^c \quad\text{and}\quad \Sigma^l = C^{-1}\,\Sigma^c\,(C^{-1})^T,$$
where the superscript $l$ indicates a parameter in the log-spectral domain, $C^{-1}$ is the IDCT matrix, and the superscript $T$ indicates the matrix transpose. Each component of the mean vector and covariance matrix in the linear spectral domain can then be obtained as follows:
$$\mu_i = \exp\!\left(\mu_i^l + \sigma_{ii}^l/2\right) \quad\text{and}\quad \sigma_{ij} = \mu_i\,\mu_j\left[\exp(\sigma_{ij}^l) - 1\right].$$
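The mapping from log-spectral to linear-spectral statistics can be sketched as follows. This is a minimal illustration of the two component formulas above, not the patent's implementation; the function name `log_to_linear` and the nested-list representation of the covariance matrix are assumptions made for the example.

```python
import math

def log_to_linear(mu_l, sigma_l):
    """Map log-spectral Gaussian statistics to linear-spectral statistics.

    mu_l    -- list of log-spectral means, one per band
    sigma_l -- log-spectral covariance matrix as a list of rows
    (Hypothetical helper; names are not from the source.)
    """
    n = len(mu_l)
    # mu_i = exp(mu_i^l + sigma_ii^l / 2)
    mu = [math.exp(mu_l[i] + sigma_l[i][i] / 2.0) for i in range(n)]
    # sigma_ij = mu_i * mu_j * (exp(sigma_ij^l) - 1)
    sigma = [[mu[i] * mu[j] * (math.exp(sigma_l[i][j]) - 1.0)
              for j in range(n)]
             for i in range(n)]
    return mu, sigma

# Usage with illustrative values for a two-band model.
mu, sigma = log_to_linear([0.0, -1.0], [[0.5, 0.1], [0.1, 0.3]])
```

With zero log-spectral variance the mapping reduces to a plain exponential of the mean, which is a quick sanity check on the formulas.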
After the mean vectors and covariance matrices of speech and noise are respectively obtained, the corresponding statistic of noisy speech can be obtained by performing parameter combination operations as follows:
$$\hat{\mu}_i = g\,\mu_i + \tilde{\mu}_i \quad\text{and}\quad \hat{\sigma}_{ij} = g^2\,\sigma_{ij} + \tilde{\sigma}_{ij},$$
where $g$ is a scaling factor that provides the power matching between the training data and the test utterance, $\tilde{\mu}_i$ is the $i$th component of the noise mean vector, and $\tilde{\sigma}_{ij}$ is the $ij$th component of the noise covariance matrix. Thereafter, the log-spectral mean vector and covariance of the noisy speech can be obtained by taking the inverse transformation as follows:
$$\hat{\mu}_i^l = \log(\hat{\mu}_i) - 0.5\,\hat{\sigma}_{ii}^l \quad\text{and}\quad \hat{\sigma}_{ij}^l = \log\!\left(\frac{\hat{\sigma}_{ij}}{\hat{\mu}_i\,\hat{\mu}_j} + 1\right).$$
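The combination and inverse transformation steps can be sketched together. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `combine_speech_and_noise` and `linear_to_log` are hypothetical, matrices are plain nested lists, and the inputs are assumed to be valid linear-spectral statistics so that the argument of each logarithm is positive.

```python
import math

def combine_speech_and_noise(mu_s, sigma_s, mu_n, sigma_n, g=1.0):
    """Combine linear-spectral speech and noise statistics.

    mu_hat_i = g * mu_i + noise_mu_i
    sigma_hat_ij = g^2 * sigma_ij + noise_sigma_ij
    """
    n = len(mu_s)
    mu_hat = [g * mu_s[i] + mu_n[i] for i in range(n)]
    sigma_hat = [[g * g * sigma_s[i][j] + sigma_n[i][j] for j in range(n)]
                 for i in range(n)]
    return mu_hat, sigma_hat

def linear_to_log(mu_hat, sigma_hat):
    """Map linear-spectral statistics back to the log-spectral domain."""
    n = len(mu_hat)
    # sigma_hat_ij^l = log(sigma_hat_ij / (mu_hat_i * mu_hat_j) + 1)
    sigma_l = [[math.log(sigma_hat[i][j] / (mu_hat[i] * mu_hat[j]) + 1.0)
                for j in range(n)]
               for i in range(n)]
    # mu_hat_i^l = log(mu_hat_i) - 0.5 * sigma_hat_ii^l
    mu_l = [math.log(mu_hat[i]) - 0.5 * sigma_l[i][i] for i in range(n)]
    return mu_l, sigma_l

# Usage with illustrative single-band values.
mu_hat, sigma_hat = combine_speech_and_noise([2.0], [[12.0]], [1.0], [[4.0]])
mu_hat_l, sigma_hat_l = linear_to_log(mu_hat, sigma_hat)
```

The covariance is computed first because the mean formula depends on the diagonal of the log-spectral covariance.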
Finally, the cepstral mean vector and covariance matrix of noisy speech can be obtained by taking the discrete cosine transform (DCT) as follows:
$$\hat{\mu}^c = C\,\hat{\mu}^l \quad\text{and}\quad \hat{\Sigma}^c = C\,\hat{\Sigma}^l\,C^T.$$
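The IDCT step at the start of the procedure and the DCT step at its end can be sketched as a pair of matrix transforms. The source does not specify the DCT normalization, so this example assumes a square orthonormal DCT-II matrix, for which $C^{-1} = C^T$. In practice the cepstral vector is often truncated, in which case $C$ is not square and this shortcut does not apply. All function names below are hypothetical.

```python
import math

def dct_matrix(n):
    """Build an n x n orthonormal DCT-II matrix (assumed normalization)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (j + 0.5) * k / n)
                     for j in range(n)])
    return rows

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def cepstral_to_log(mu_c, sigma_c, c):
    """mu^l = C^-1 mu^c and Sigma^l = C^-1 Sigma^c (C^-1)^T."""
    c_inv = transpose(c)  # valid only because c is orthonormal
    mu_l = matvec(c_inv, mu_c)
    sigma_l = matmul(matmul(c_inv, sigma_c), transpose(c_inv))
    return mu_l, sigma_l

def log_to_cepstral(mu_l, sigma_l, c):
    """mu^c = C mu^l and Sigma^c = C Sigma^l C^T."""
    mu_c = matvec(c, mu_l)
    sigma_c = matmul(matmul(c, sigma_l), transpose(c))
    return mu_c, sigma_c

# Usage: a round trip with illustrative values should recover the input.
C = dct_matrix(3)
mu_c = [1.0, -0.5, 0.25]
sigma_c = [[1.0, 0.1, 0.0], [0.1, 0.5, 0.0], [0.0, 0.0, 0.2]]
mu_l, sigma_l = cepstral_to_log(mu_c, sigma_c, C)
mu_back, sigma_back = log_to_cepstral(mu_l, sigma_l, C)
```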
From the aforementioned process, it is known that the PMC method obtains the noisy speech models by estimating the statistics of the background noise in the speech inactive period, so as to decrease the computation amount. However, in practice, the computation required to adjust all the probability density functions with the PMC method is still large, especially when the number of models is large. In order to effectively reduce the time for model adaptation, an improved PMC method is proposed to reduce the number of PMC processing times by introducing the distribution composition wit
