Method for transforming HMMs for speaker-independent...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate


Details

Classification: C704S256000
Status: active
Patent number: 06658385

ABSTRACT:

FIELD OF INVENTION
This invention relates to speaker-independent speech recognition and more particularly to speaker-independent speech recognition in a noisy environment.
BACKGROUND OF THE INVENTION
Speech recognition under matched conditions has achieved low recognition error rates. Matched conditions are those in which training and testing are performed under the same acoustic conditions. A word error rate (WER) of 1% has been reported for connected digits over a telephone network. Results such as this are achieved using a large amount of training data collected under conditions as close as possible to the testing conditions. It is highly desirable to provide speech recognition in a noisy environment. One such environment is hands-free speech recognition in a car, where the microphone is often placed somewhere remote from the user, such as in the corner of the windshield. The road noise, the wind noise, and the speaker's remoteness from the microphone cause severely mismatched conditions for recognition. For such recognition tasks, a collection of large databases is required to train speaker-independent Hidden Markov Models (HMMs), which is very expensive. If HMMs are used in cross-condition recognition, such as training with a close-talking microphone in a quiet office and then testing on hands-free recognition in a car, the mismatch will degrade recognition performance substantially. In terms of power spectral density, the mismatch can be characterized by a linear filter and an additive noise:

  |Y(ω)| = |H(ω)|² · |X(ω)| + |N(ω)|

where Y(ω) represents the speech to be recognized, H(ω) the linear filter, X(ω) the training speech, and N(ω) the noise. In the log spectral domain, this equation can be written as:
  log|Y(ω)| = log|X(ω)| + ψ(N(ω), X(ω), H(ω))  (1)

with

  ψ(N(ω), X(ω), H(ω)) ≜ log|H(ω)|² + log(1 + |N(ω)| / (|X(ω)| · |H(ω)|²))  (2)
ψ can be used to characterize the mismatch, which depends on the linear filter, the noise source, and the signal itself.
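As a minimal numerical sketch (not part of the patent), the decomposition in equations (1) and (2) can be checked directly: for any power spectral densities |X(ω)| and |N(ω)| and channel gain |H(ω)|², the mismatch term ψ accounts exactly for the gap between the clean and corrupted log power spectra. All variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 8
X = rng.uniform(1.0, 10.0, n_bins)   # |X(w)|: training-speech PSD per frequency bin
N = rng.uniform(0.1, 1.0, n_bins)    # |N(w)|: additive-noise PSD
H2 = rng.uniform(0.5, 2.0, n_bins)   # |H(w)|^2: channel power gain

# PSD mismatch model: |Y(w)| = |H(w)|^2 * |X(w)| + |N(w)|
Y = H2 * X + N

def psi(N, X, H2):
    # Eq. (2): psi = log|H|^2 + log(1 + |N| / (|X| * |H|^2))
    return np.log(H2) + np.log1p(N / (X * H2))

# Eq. (1): log|Y| = log|X| + psi(N, X, H) holds bin by bin
assert np.allclose(np.log(Y), np.log(X) + psi(N, X, H2))
```

The check works because log(|H|²|X| + |N|) = log|X| + log|H|² + log(1 + |N|/(|X||H|²)), which is exactly the factorization the patent uses to separate the convolutive term from the noise-dependent term.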
To overcome the mismatch, several types of solutions have been reported. For example, Cepstral Mean Normalization (CMN) is known for its ability to remove the first term of ψ (i.e., the stationary bias) in cepstra. See, for example, S. Furui, “Cepstral Analysis Technique for Automatic Speaker Verification,” IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-29(2):254-272, April 1981. It has been shown that, using CMN, telephone-quality speech models can be trained with high-quality speech. See L. G. Neumeyer, V. V. Digalakis, and M. Weintraub, “Training Issues and Channel Equalization Techniques for the Construction of Telephone Acoustic Models Using a High-Quality Speech Corpus,” IEEE Trans. on Speech and Audio Processing, 2(4):590-597, October 1994. However, this is not effective against the second term, which is caused by additive noise and cannot be assumed constant within the utterance. Two-level CMN alleviates this problem by introducing a speech mean vector and a background mean vector. See, for example, S. K. Gupta, F. Soong, and R. Haimi-Cohen, “High-Accuracy Connected Digit Recognition for Mobile Applications,” in Proc. of IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, pages 57-60, Atlanta, May 1996. Other, more detailed models of the mismatch include joint additive and convolutive bias compensation (see M. Afify, Y. Gong, and J.-P. Haton, “A Unified Maximum Likelihood Approach to Acoustic Mismatch Compensation: Application to Noisy Lombard Speech Recognition,” in Proc. of IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, Germany, 1997) and channel and noise estimation (see D. Matrouf and J. L. Gauvain, “Model Compensation for Noises in Training and Test Data,” in Proc. of IEEE Internat. Conf. on Acoustics, Speech and Signal Processing, Germany, 1997).
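To illustrate why CMN removes a stationary convolutive bias (not the patent's own code; feature shapes and names are hypothetical): a fixed linear channel adds a constant vector to every cepstral frame, so subtracting the per-utterance mean cancels it.

```python
import numpy as np

def cmn(cepstra):
    # Cepstral Mean Normalization: subtract the per-utterance mean
    # from every frame (axis 0 = frames, axis 1 = cepstral coefficients).
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
clean = rng.normal(size=(100, 13))       # hypothetical 13-dim cepstra, 100 frames
channel_bias = rng.normal(size=(1, 13))  # stationary channel term (constant in cepstra)

# The constant channel offset vanishes after CMN:
assert np.allclose(cmn(clean + channel_bias), cmn(clean))
```

This also shows CMN's limitation noted above: an additive-noise term that varies across frames is not a constant cepstral offset, so mean subtraction cannot remove it.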
SUMMARY OF THE INVENTION
In accordance with one embodiment of the present invention, an improved transformation method comprises providing an initial set of HMMs trained on a large amount of speech recorded in one condition, which provides rich information on co-articulation and speaker variation, and a much smaller speech database collected in the target environment, which provides information on the test condition, including channel, microphone, background noise, and reverberation.


REFERENCES:
patent: 5715367 (1998-02-01), Gillick et al.
patent: 5727124 (1998-03-01), Lee et al.
patent: 5787394 (1998-07-01), Bahl et al.
patent: 5793891 (1998-08-01), Takahashi et al.
patent: 5924065 (1999-07-01), Eberman et al.
patent: 6067513 (2000-05-01), Ishimitsu
patent: 0 691 640 (1996-10-01), None
Chien, J.-T. and H.-C. Wang, “Adaptation of Hidden Markov Model for Telephone Speech Recognition and Speaker Adaptation,” IEE Proc. Vision, Image, and Signal Processing, vol. 144(3), June 1997, pp. 129-135.
Angelini, B., F. Brugnara, D. Falavigna, D. Giuliani, R. Gretter, and M. Omologo, “Speaker Independent Continuous Speech Recognition Using an Acoustic-Phonetic Italian Corpus,” Proc. Int. Conf. Speech Language Proc. ICSLP-94, v. 3, pp. 1391-1394, Sept. 1994.
