On-line background noise adaptation of parallel model...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

C704S233000

active

06188982

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a speech recognition method and, more particularly, to a two stage Hidden Markov Model (HMM) adaptation method utilizing an "on-line" Parallel Model Combination (PMC) and a discriminative learning process to achieve accurate and robust results in real world applications without having to collect environment background noise in advance.
BACKGROUND OF THE INVENTION
Many electronic devices need to determine a “most likely” path of a received signal. For example, in speech, text, or handwriting recognition devices, a recognized unit (i.e., sound, syllable, letter, or word) of a received signal is determined by identifying the greatest probability that a particular sequence of states was received. This determination may be made by viewing the received signal as generated by a hidden Markov model (HMM). A discussion of Markov models and hidden Markov models is found in Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proceedings of the IEEE, Vol. 77, No. 2, February 1989. Also, this signal may be viewed as generated by a Markov model observed through a “noisy” process. This is discussed in Forney, “The Viterbi Algorithm”, Proceedings of the IEEE, Vol. 61, No. 3, March 1973. The contents of these articles are incorporated herein by reference.
Briefly, a Markov model is a system which may be described as being in any one of a set of N distinct states (while in a hidden Markov model the states are unknown). At regularly spaced time intervals, the system makes a transition between states (or remains in the same state) according to a set of transition probabilities. A simple three state Markov model is illustrated in FIG. 1.
FIG. 1 shows a three state transition model 15. In this model, it is assumed that any state may follow any other state, including the same state repeated. For each state, there is a known probability indicating the likelihood that it will be followed by any other state. For example, in the English language, this probability may be statistically determined by determining how often each letter is followed by another letter (or itself). In this illustration, assume that state 1 [indicated as S1] is the letter A, state 2 [indicated as S2] is the letter B, and state 3 [indicated as S3] is the letter C. Probabilities are assigned to the likelihood that any one of these letters will follow the same or another letter. In this example, an illustrative probability of 0.1 has been assigned to the likelihood that A will be followed by another A, 0.4 that A will be followed by a B, and 0.5 that A will be followed by a C. The same is done for the letters B and C, resulting in a total of nine probabilities. In this model, the state is apparent from the observation, that is, the state is either A, B, or C in the English language.
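The nine probabilities described above form a 3-by-3 stochastic matrix, one row per current letter. A minimal sketch follows; the A row (0.1, 0.4, 0.5) is given in the text, while the B and C rows are illustrative assumptions, since the document does not state them:

```python
# Three-state Markov model of FIG. 1 over the letters A, B, C.
# The A row is stated in the text; the B and C rows are assumed values.
TRANSITIONS = {
    "A": {"A": 0.1, "B": 0.4, "C": 0.5},
    "B": {"A": 0.3, "B": 0.3, "C": 0.4},  # assumed
    "C": {"A": 0.6, "B": 0.2, "C": 0.2},  # assumed
}

def sequence_probability(sequence):
    """Probability of the transitions in `sequence`, given its first letter."""
    prob = 1.0
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= TRANSITIONS[prev][cur]
    return prob
```

For example, sequence_probability("ABC") multiplies P(A to B) by P(B to C), i.e. 0.4 times 0.4 under these assumed rows.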
Often the states of the model generating the observations cannot be observed, but may only be ascertained by determining the probabilities that the observed states were generated by a particular model. For example, in the example of FIG. 1, assume that due to "noise", there is a known probability that in state A the symbol may be corrupted to appear to be a B, and a known probability that in state A the symbol will be corrupted to appear as a C. The same is true for B and C. To determine the best state sequence associated with the observations of this "noisy" state sequence, the text recognition device must determine, through probabilities, which letters are most likely to be in the sequence.
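Recovering the most likely underlying state sequence from such corrupted observations is what the Viterbi algorithm of the Forney reference computes. A minimal sketch, with illustrative (assumed) confusion and transition probabilities:

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path for `observations`.

    Probabilities are accumulated in the log domain to avoid underflow.
    """
    # best[s] = (log-probability of the best path ending in state s, that path)
    best = {s: (math.log(start_p[s] * emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            logp, prev = max(
                (best[p][0] + math.log(trans_p[p][s] * emit_p[s][obs]), p)
                for p in states
            )
            new_best[s] = (logp, best[prev][1] + [s])
        best = new_best
    return max(best.values())[1]

# Illustrative model (assumed values): each letter is observed correctly
# 80% of the time and confused with each other letter 10% of the time.
STATES = ["A", "B", "C"]
START = {s: 1.0 / 3.0 for s in STATES}
TRANS = {s: {t: 1.0 / 3.0 for t in STATES} for s in STATES}  # assumed uniform
EMIT = {s: {o: (0.8 if o == s else 0.1) for o in STATES} for s in STATES}
```

With uniform transitions, the decoder simply trusts the dominant emission for each observation; with the skewed transitions of FIG. 1, transition and emission evidence are traded off against each other.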
With respect to speech recognition, current technologies have produced fairly good results in recognizing speech in an ideal noiseless environment. However, when speech recognition is conducted in real-life environments, the results have been far less desirable. One of the main causes of this phenomenon is the interference of background noise in the environment. Since background noise may be considered additive in nature, one can either filter the noise from the signal source or compensate a recognition model by transferring the model parameters obtained through clean speech training data to the speech model having noise interference (as will be described below with reference to the conventional parallel model combination (PMC) approach). In other words, an approach is necessary that separates actual speech from background noise.
The current speech signal processing methods can be generally divided into three categories: 1) seeking robust features, known as discriminative measurement similarity, 2) speech enhancement, and 3) model compensation.
The first category, seeking robust features, compares the background noises with a known databank of noises so that the detected noises may be canceled out. However, this method is quite impractical, since it is impossible to predict every noise; noises vary across environments. Further, the similarity between different noises, and between noises at particular signal-to-noise ratios (SNR), also makes this method inadequate.
The second category, speech enhancement, preprocesses the input speech signals, prior to the pattern matching stage, so as to increase the SNR. However, an enhanced signal-to-noise ratio does not necessarily increase the recognition rate, since the enhanced signals can still be distorted to some degree. For this reason, the methods of the speech enhancement category usually cannot deliver acceptable results.
The third category, model compensation, deals with recognition models. In particular, it compensates recognition models to adapt to the noisy environment. The most direct approach of this category is to separately collect the speech signals with the interference noise in the application environment and then train the recognition models. It is, however, difficult to accurately collect these kinds of training materials, thereby rendering this approach impractical. However, a recent model compensation method, parallel model combination (PMC), developed by Gales and Young, avoids the necessity to collect the training material in advance and is therefore very popular.
PMC assumes that the speech to be recognized is modeled by a set of continuous density hidden Markov models (CDHMM) which have been trained using clean speech data. Similarly, the background noise can also be modeled using a single state CDHMM. Accordingly, speech that is interfered with by additive noise can be composed of a clean speech model and a noise model. The parallel model combination is shown in FIG. 2.
In brief, the symbols μ^c and Σ^c, discussed below, represent the mean vector and the covariance matrix, respectively, of any state output distribution in the cepstral domain. Cepstral parameters are derived from the log spectrum via a discrete cosine transform, which is represented by a matrix C. Since the discrete cosine transform is linear, the corresponding mean vector and covariance matrix in the log spectral domain (represented by μ^l and Σ^l, respectively) can be expressed with the following equations:

μ^l = C^{-1} μ^c,  Σ^l = C^{-1} Σ^c (C^{-1})^T   (1)
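Equation (1) can be sketched numerically. This sketch assumes an orthonormal DCT-II matrix for C, so that C^{-1} is simply the transpose of C; the document itself only identifies C as the discrete cosine transform matrix:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; for this form, the inverse is the transpose."""
    return [
        [math.sqrt((1.0 if k == 0 else 2.0) / n)
         * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

# Eq. (1) for the mean: mu_log = C^{-1} mu_cep, with C^{-1} = C^T here.
C = dct_matrix(3)
mu_cep = [1.0, 0.5, -0.2]   # illustrative cepstral mean vector
mu_log = matvec(transpose(C), mu_cep)
```

Applying C to mu_log recovers mu_cep, confirming the inverse relationship for this choice of normalization.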
If a Gaussian distribution is assumed in both the cepstral and log spectral domains, then the mean vector and covariance matrix of the i-th component in the linear domain can be expressed as:

μ_i = exp(μ_i^l + Σ_ii^l / 2),  Σ_ij = μ_i μ_j [exp(Σ_ij^l) − 1]   (2)
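For diagonal covariances (a common simplification in PMC implementations, assumed here), equation (2) reduces to the per-dimension log-normal moment map:

```python
import math

def log_to_linear(mu_l, var_l):
    """Map a log-spectral mean/variance pair to the linear domain.

    Per-dimension (diagonal) form of eq. (2): only Sigma_ii is used.
    """
    mu = math.exp(mu_l + var_l / 2.0)
    var = mu * mu * (math.exp(var_l) - 1.0)
    return mu, var
```

With zero log-domain variance the map degenerates to plain exponentiation of the mean, as expected for a point mass.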
If the speech signal and the noise signal are assumed to be independent of each other and additive in the linear domain, then the combined mean vector and covariance matrix can be expressed as:

μ̄ = g μ + μ̃,  Σ̄ = g² Σ + Σ̃   (3)

where (μ, Σ) are the speech model parameters and (μ̃, Σ̃) are the noise model parameters. The factor g is a gain matching term introduced to account for the fact that the level of
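The combination step of equation (3) is then just a weighted sum in the linear domain. A per-dimension sketch (diagonal covariances assumed; the gain matching term g defaults to 1):

```python
def pmc_combine(mu_speech, var_speech, mu_noise, var_noise, g=1.0):
    """Eq. (3): combine clean-speech and noise statistics in the linear
    domain, where g is the gain matching term."""
    mu_bar = g * mu_speech + mu_noise
    var_bar = g * g * var_speech + var_noise
    return mu_bar, var_bar
```

A full PMC pass would map each Gaussian's statistics to the linear domain via equations (1) and (2), combine them with a function like this, and map the result back to the cepstral domain for recognition.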

Profile ID: LFUS-PAI-O-2570898