Single distribution and mixed distribution model conversion...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S256000

active

06266636

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
It is an object of the invention to perform speech recognition by using a Hidden Markov Model (HMM).
Another object of the invention is to remove additive noise from input speech.
2. Related Background Art
When speech recognition is performed in a real environment, noise is one of the major problems. Such noise is additive noise, which is added to the spectral characteristics of the speech, and the Parallel Model Combination (PMC) method is effective against additive noise.
The PMC method has been described in detail in M. J. Gales and S. Young, “An Improved Approach to the Hidden Markov Model Decomposition of Speech and Noise”, Proc. of ICASSP'92, I-233-236, 1992.
The PMC method adds and synthesizes an HMM trained on speech collected and recorded in a noiseless environment (speech HMM) and an HMM trained on noise (noise HMM), thereby bringing the model closer to the noise-superimposed environment. This noise-adding conversion is executed on all of the models. The noise processing in PMC presumes that noise and speech are additive in the linear spectrum region. In an HMM, on the other hand, parameters of a logarithmic spectrum system, such as the cepstrum, are often used as the feature quantities of the speech. In the PMC method, therefore, the feature parameters derived from the speech HMM and the noise HMM are converted into the linear spectrum region, where they are added and synthesized. After the speech and the noise have been synthesized, an inverse conversion returns the synthesized value from the linear spectrum region to the cepstrum region, thereby obtaining a noise-superimposed speech HMM.
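The conversion chain described above (cepstrum region, linear spectrum region, addition, inverse conversion) can be illustrated with a minimal sketch. It is not the procedure of the cited paper or of this patent: it combines mean vectors only, using the simplified "log-add" form and ignoring variances, and it assumes an untruncated cepstrum obtained from the log spectrum by an orthonormal DCT. The function names dct_matrix, matvec and pmc_combine_means are hypothetical and chosen for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed cepstral transform); its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                     for i in range(n)])
    return rows


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def pmc_combine_means(speech_cep, noise_cep):
    """Combine a speech mean and a noise mean, both given in the cepstrum region.

    Assumption: means only (log-add simplification); variances are not handled.
    """
    n = len(speech_cep)
    c = dct_matrix(n)
    c_inv = [list(col) for col in zip(*c)]  # inverse of an orthonormal matrix

    # Cepstrum region -> log spectrum region.
    speech_log = matvec(c_inv, speech_cep)
    noise_log = matvec(c_inv, noise_cep)

    # Log spectrum -> linear spectrum, add speech and noise, return to log spectrum.
    combined_log = [math.log(math.exp(s) + math.exp(x))
                    for s, x in zip(speech_log, noise_log)]

    # Log spectrum region -> cepstrum region (the inverse conversion).
    return matvec(c, combined_log)


if __name__ == "__main__":
    speech_mean = [1.0, 0.5, -0.2, 0.1]   # illustrative values, not real data
    noise_mean = [0.2, 0.0, 0.0, 0.0]     # illustrative values, not real data
    print(pmc_combine_means(speech_mean, noise_mean))
```

Even in this reduced form, every mean vector requires two matrix products plus an exponential and a logarithm per spectral bin, and the conversion has to be repeated for each distribution of each model.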
By using the foregoing PMC method, additive noise such as internal noise and background noise can be handled. However, because a nonlinear conversion is executed on all of the models, the PMC method requires a large amount of calculation and a very long processing time. It is therefore unsuitable for instantaneous environment adaptation, in which adaptation to noise is performed simultaneously with recognition.
SUMMARY OF THE INVENTION
According to the invention, the adaptation time can be greatly reduced compared with the conventional case in which noise adaptation by PMC is applied to all of the models. In particular, even when the number of models increases, as with phoneme models of the environment-dependent type, the adaptation can be completed in a short time. Moreover, since the adapting process is executed in the recognizing step, adaptation and recognition can be performed simultaneously. Further, by using a high-speed PMC method, even an increase in the number of mixed models can be handled with a small amount of calculation.


REFERENCES:
patent: 5208863 (1993-05-01), Sakurai et al.
patent: 5220629 (1993-06-01), Kosaka et al.
patent: 5369728 (1994-11-01), Kosaka et al.
patent: 5621849 (1997-04-01), Sakurai et al.
patent: 5778340 (1998-07-01), Hattori
patent: 5787396 (1998-07-01), Komori et al.
patent: 5797116 (1998-08-01), Yamada et al.
patent: 5839105 (1998-11-01), Ostendorf et al.
patent: 0 847 041 (1998-06-01), None
patent: 10-161692 (1998-06-01), None
Gales et al., “Robust continuous speech recognition using parallel model combination”, Cambridge University, 2-17, Mar. 1994.*
Gales et al., “Parallel model combination for speech recognition in noise”, Cambridge University, 2-13, Jun. 1993.*
“An Improved Approach to the Hidden Markov Model Decomposition . . . ”, Gales, et al., Proc. of ICASSP '92, I-233-236, 1992.
“A Tree-Trellis Based Fast Search for Finding the N Best Sentence . . . ”, Soong, et al., Proc. of ICASSP91, pp. 705-708, May 1991.
“The Forward-Backward Search Algorithm,” Schwartz, et al., Proc. of ICASSP91, pp. 697-700, May 1991.
R. Schwartz, et al., “A Comparison of Several Approximate Algorithms for Finding Multiple (N-Best) Sentence Hypotheses”, ICASSP 91, vol. 1, May 1991, Toronto, Ontario, Canada, S10.4, pp. 701-704.
F.K. Soong, et al., “A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition”, ICASSP 91, vol. 1, May 1991, Toronto, Ontario, Canada, S10.5, pp. 705-708.
M.J.F. Gales, et al., “An Improved Approach to the Hidden Markov Model Decomposition of Speech and Noise”, ICASSP-92, vol. 1, Mar. 1992, San Francisco, California, pp. I-233-I-236.
Matsui, T., et al., “N-Best-Based Instantaneous Speaker Adaptation Method for Speech Recognition,” Proceedings ICSLP 96, Fourth International Conference on Spoken Language Processing, Philadelphia, PA, Oct. 3-6, 1996, vol. 2, pp. 973-976.
Gales, M.J.F., et al., “An Improved Approach to the Hidden Markov Model Decomposition of Speech and Noise”, Speech Processing 1, San Francisco, Mar. 23-26, 1992, pp. 233-236.
