Reexamination Certificate
1995-02-02
2001-08-14
Knepper, David D. (Department: 2645)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S237000
active
06275799
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a reference pattern learning system in speech recognition based on pattern matching with a reference pattern wherein a plurality of parameters which characterize reference patterns of each category are determined on the basis of a plurality of learning utterance data.
A Hidden Markov Model (to be referred to as an HMM hereinafter) is the most popular system for recognizing a pattern represented as a feature vector time series of, e.g., speech signals. Details of the HMM are described in “Speech Recognition by Probability Model”, Seiichi Nakagawa, the Institute of Electronics and Communication Engineers of Japan, 1988 (to be referred to as Reference 1 hereinafter). Further background on the HMM, as well as on dynamic programming matching (hereinafter DP matching), is found in “Structural Methods in Automatic Speech Recognition” by Stephen E. Levinson, Proceedings of the IEEE, 1985, Vol. 73, No. 11, pp. 1625-50 (to be referred to as Reference 2 hereinafter).
In the HMM, modeling is performed on the assumption that a feature vector time series is generated by a Markov probability process. An HMM reference pattern is represented by a plurality of states and transitions between these states. Each state outputs a feature vector in accordance with a predetermined probability density profile, and each transition between the states has a predetermined transition probability. A likelihood value representing the degree of matching between an input pattern and a reference pattern is given by the probability that the Markov probability model serving as the reference pattern generates the input pattern vector sequence. The interstate transition probabilities characterizing each reference pattern and the parameters defining the probability density profile functions can be determined by the “Baum-Welch algorithm” using a plurality of learning utterance data.
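The likelihood described above, i.e. the probability that an HMM reference pattern generates an input sequence, can be illustrated with the standard forward recursion. The following is a minimal sketch only: it assumes scalar features and one Gaussian output density per state, and the names `gaussian_pdf` and `forward_likelihood` are hypothetical and do not come from the patent.

```python
import math


def gaussian_pdf(x, mean, var):
    """1-D Gaussian density; an assumed stand-in for a state's output density."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def forward_likelihood(obs, initial, trans, means, variances):
    """Probability that the HMM generates the observation sequence `obs`.

    initial[j]   : probability of starting in state j
    trans[i][j]  : transition probability from state i to state j
    means, variances : per-state Gaussian output density parameters
    """
    n = len(initial)
    # alpha[j] = probability of the observations seen so far, ending in state j
    alpha = [initial[j] * gaussian_pdf(obs[0], means[j], variances[j]) for j in range(n)]
    for x in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n))
            * gaussian_pdf(x, means[j], variances[j])
            for j in range(n)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical two-state left-to-right reference pattern.
    initial = [1.0, 0.0]
    trans = [[0.6, 0.4], [0.0, 1.0]]
    means = [0.0, 3.0]
    variances = [1.0, 1.0]
    print(forward_likelihood([0.1, 0.2, 2.9, 3.1], initial, trans, means, variances))
```

For long sequences the products underflow, so a practical implementation would typically work with scaled or log-domain values; that refinement is omitted here for brevity.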
The “Baum-Welch algorithm”, as a statistical learning algorithm, requires a large volume of learning data to determine model parameters. A new user must therefore utter a large number of speech inputs, which is inconvenient and makes applications impractical. In order to reduce the load on a new user, several speaker adaptive systems are available that adapt a recognition apparatus to a new speaker by using a relatively small number of utterances by that speaker. Details of the speaker adaptive systems are described in “Speaker Adaptive Techniques for Speech Recognition”, Sadaoki Furui, The Journal of the Television Society, Vol. 43, No. 9, 1989, pp. 929-934 (to be referred to as Reference 3 hereinafter). See also “Speaker Adaptation for Demi-Syllable Based Continuous Density HMM” by Koichi Shinoda et al., Proc. ICASSP 1991, pp. 857-860.
The most important point in a speaker adaptive modeling system is how to estimate the parameters of a model representing an acoustic event not contained in the small number of adaptive utterances by a new user, and how to perform adaptive modeling using these parameters. In each of the existing speaker adaptive modeling systems, reference patterns prepared in advance and adaptive utterance data of a new user are used as follows: a similarity between acoustic events is basically defined using a physical distance between feature vectors as a measure, parameters of a model representing acoustic events not appearing in the adaptive utterances are estimated on the basis of the similarity, and adaptive modeling is performed using these parameters.
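The distance-based estimation described above can be illustrated with a small sketch. It is an assumption-laden example, not the method of any cited reference: each category is reduced to a single mean vector, the similarity weight is taken as exp(-distance / scale), and the function name `adapt_means` and the parameter `scale` are hypothetical.

```python
import math


def adapt_means(initial_means, observed_means, scale=1.0):
    """Adapt per-category mean vectors to a new speaker.

    initial_means  : dict category -> mean vector of the prepared reference pattern
    observed_means : dict category -> mean vector measured from the adaptive
                     utterances (only the categories that actually appeared)
    Unseen categories receive a shift interpolated from the seen ones, weighted
    by physical distance between the initial mean vectors (assumed weighting).
    """
    # Shift observed for each category that appears in the adaptive utterances.
    shifts = {
        k: [o - m for o, m in zip(observed_means[k], initial_means[k])]
        for k in observed_means
    }
    adapted = {}
    for name, mean in initial_means.items():
        if name in observed_means:
            adapted[name] = list(observed_means[name])
            continue
        if not shifts:
            adapted[name] = list(mean)  # no adaptation data at all
            continue
        # Closer categories (smaller distance) get larger weights.
        weights = {k: math.exp(-math.dist(mean, initial_means[k]) / scale) for k in shifts}
        total = sum(weights.values())
        shift = [
            sum(weights[k] * shifts[k][d] for k in shifts) / total
            for d in range(len(mean))
        ]
        adapted[name] = [m + s for m, s in zip(mean, shift)]
    return adapted


if __name__ == "__main__":
    initial = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [10.0, 0.0]}
    observed = {"a": [1.0, 0.0], "c": [9.0, 0.0]}
    print(adapt_means(initial, observed))
```

In this sketch the unseen category "b" is moved mostly in the direction observed for its nearest neighbor "a", which is the kind of purely distance-driven estimate that the text goes on to criticize.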
Adaptive modeling based on estimation using the above physical distance improves recognition precision compared with that prior to adaptive modeling. However, the recognition result falls far short of the performance obtained with reference patterns of a specific speaker built from a sufficient amount of utterance data, as is apparent from the experimental results described in the above references.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a reference pattern learning system capable of estimating a high-precision reference pattern very close to a reference pattern generated from a large amount of utterance data by a specific speaker. To this end, the system uses, in addition to adaptive utterance data obtained from a small number of utterances by a new user, data on the correlation between all acoustic events obtained in advance from a large number of utterances by a large number of speakers.
In order to achieve the above object, there is provided a reference pattern learning system in which a first parameter set constituting the reference patterns of each category, used in speech recognition based on pattern matching with a reference pattern, is determined from a plurality of learning utterance data. The first parameter set is determined so as to maximize a third evaluation function represented by the sum of a first evaluation function, which represents a matching degree between all learning utterances and the corresponding reference patterns, and a second evaluation function, which represents a matching degree between elements of the first parameter set.
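The idea of maximizing a sum of two evaluation functions can be illustrated with a toy example. The sketch below rests on assumptions that are not stated in the text: the parameter set is just two scalar means (m1, m2), the first evaluation function is a unit-variance Gaussian log-likelihood of utterance data available only for the first category, and the second is the log of a correlated Gaussian density over (m1, m2) standing in for correlation data gathered from many speakers. All function and parameter names are hypothetical.

```python
def first_evaluation(m1, data):
    """Assumed form: unit-variance Gaussian log-likelihood (constants dropped)."""
    return -0.5 * sum((x - m1) ** 2 for x in data)


def second_evaluation(m1, m2, mu, s1, s2, rho):
    """Assumed form: log of a correlated 2-D Gaussian over (m1, m2), constants dropped."""
    z1 = (m1 - mu[0]) / s1
    z2 = (m2 - mu[1]) / s2
    return -0.5 * (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)


def third_evaluation(m1, m2, data, mu, s1, s2, rho):
    """Sum of the first and second evaluation functions."""
    return first_evaluation(m1, data) + second_evaluation(m1, m2, mu, s1, s2, rho)


def maximize_third(data, mu, s1, s2, rho):
    """Closed-form maximizer of third_evaluation for this quadratic toy case."""
    n = len(data)
    xbar = sum(data) / n
    # m1 is pulled from its prior mean toward the data average.
    d1 = n * s1 ** 2 / (1.0 + n * s1 ** 2) * (xbar - mu[0])
    # m2 has no data of its own; it moves only through its correlation with m1.
    d2 = rho * s2 / s1 * d1
    return mu[0] + d1, mu[1] + d2


if __name__ == "__main__":
    data = [2.0, 2.2, 1.8]  # hypothetical utterance features for category 1 only
    m1, m2 = maximize_third(data, mu=(0.0, 0.0), s1=1.0, s2=1.0, rho=0.8)
    print(m1, m2)
```

In this toy setting the second category is updated even though the new user never uttered it, because the second evaluation function ties its parameter to the first one. With rho set to 0 the unseen parameter stays at its prior value.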
REFERENCES:
patent: 3816722 (1974-06-01), Sakoe et al.
patent: 4394538 (1983-07-01), Warren et al.
patent: 4401851 (1983-08-01), Nitta et al.
patent: 4581756 (1986-04-01), Togawa et al.
patent: 4601054 (1986-07-01), Watari et al.
patent: 4618984 (1986-10-01), Das et al.
patent: 4651289 (1987-03-01), Maeda et al.
patent: 4751737 (1988-06-01), Gerson et al.
patent: 4797929 (1989-01-01), Gerson et al.
patent: 4827522 (1989-05-01), Matsuura et al.
patent: 4914703 (1990-04-01), Gillick
patent: 4918731 (1990-04-01), Muroi
patent: 4937870 (1990-06-01), Bossemeyer, Jr.
patent: 5293451 (1994-03-01), Brown et al.
patent: 5479523 (1995-12-01), Gaborski et al.
S.E. Levinson, L.R. Rabiner, and M.M. Sondhi, “An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition”, The Bell System Technical Journal, vol. 62, No. 4, pp. 1035-1074, 1983.
K. Shikano, K.F. Lo, and R. Roddy, “Speech Adaptive Through Vector Quantization”, Proceedings of 1986 Conference on Acoustics, Speech, and Signal Processing, pp. 2643-2646.
NEC Corporation
Sughrue Mion Zinn Macpeak & Seas, PLLC