Speaker-independent model generation apparatus and speech recogn

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Patent


Details

704231, G10L 702

Patent

active

058391053

ABSTRACT:
Provided are a speaker-independent model generation apparatus and a speech recognition apparatus that require less memory capacity in the processing unit, and less computation time, than their conventional counterparts. A single Gaussian HMM is first generated with the Baum-Welch training algorithm from speech data spoken by a plurality of specific speakers. A search is then made for the state whose splitting, in either the contextual or the temporal domain, yields the maximum increase in likelihood. That state is split in whichever domain, contextual or temporal, gives the maximum increase. A single Gaussian HMM is then generated again with the Baum-Welch training algorithm, and these steps are iterated until no state within the single Gaussian HMM can be split any further or until a predetermined number of splits is reached. The result is a speaker-independent HMM. Speech is then recognized with reference to the generated speaker-independent HMM.
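The search-split-retrain loop described in the abstract can be illustrated with a deliberately simplified sketch. It is not the patented procedure: there are no HMM transitions and no Baum-Welch training, each "state" is just a list of scalar samples modelled by one maximum-likelihood Gaussian, and re-estimation uses hard assignments. The sample layout `(value, context, time)`, the two partition rules, and all names (`gaussian_loglik`, `candidate_splits`, `best_split`, `successive_split`, `VAR_FLOOR`) are assumptions made for illustration only.

```python
import math

VAR_FLOOR = 1e-3  # assumed variance floor so log-likelihoods stay finite


def gaussian_loglik(values):
    """Log-likelihood of values under their own ML single Gaussian."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    var = max(ss / n, VAR_FLOOR)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def candidate_splits(samples):
    """Yield (domain, left, right) partitions of one state's samples.

    Both rules are illustrative stand-ins: contextual splits partition
    by context label, the temporal split partitions at time 0.5.
    """
    contexts = sorted({c for _, c, _ in samples})
    for i in range(1, len(contexts)):
        left_ctx = set(contexts[:i])
        yield ("contextual",
               [s for s in samples if s[1] in left_ctx],
               [s for s in samples if s[1] not in left_ctx])
    yield ("temporal",
           [s for s in samples if s[2] < 0.5],
           [s for s in samples if s[2] >= 0.5])


def best_split(samples, min_count=2):
    """Return (gain, domain, left, right) for the best split, or None."""
    parent = gaussian_loglik([s[0] for s in samples])
    best = None
    for domain, left, right in candidate_splits(samples):
        if len(left) < min_count or len(right) < min_count:
            continue  # too little data to estimate a Gaussian on each side
        gain = (gaussian_loglik([s[0] for s in left])
                + gaussian_loglik([s[0] for s in right]) - parent)
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, domain, left, right)
    return best


def successive_split(samples, max_splits):
    """Greedily split the state with the largest likelihood gain.

    Stops when no state can be split or max_splits is reached.
    """
    states = [samples]
    history = []
    for _ in range(max_splits):
        scored = [(best_split(s), i) for i, s in enumerate(states)]
        scored = [(b, i) for b, i in scored if b is not None]
        if not scored:
            break
        (gain, domain, left, right), idx = max(scored, key=lambda x: x[0][0])
        states[idx:idx + 1] = [left, right]  # replace parent by its children
        history.append((domain, gain))
    return states, history


if __name__ == "__main__":
    # Toy data: context "a" clusters near 0, context "b" near 5.
    demo = [(0.0, "a", 0.1), (0.2, "a", 0.9), (-0.1, "a", 0.4), (0.1, "a", 0.6),
            (5.0, "b", 0.2), (5.2, "b", 0.8), (4.9, "b", 0.3), (5.1, "b", 0.7)]
    final_states, split_history = successive_split(demo, max_splits=10)
    print(len(final_states), [d for d, _ in split_history])
```

On the toy data the contextual split is chosen first, because separating the two context clusters raises the likelihood far more than a temporal cut does. The loop ends on its own once every remaining state is too small to split, which mirrors the abstract's two stopping conditions.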

REFERENCES:
Sagayama et al., ATREUS: a Speech Recognition Front-end for a Speech Translation System, Proceedings of European Conference on Speech Communication and Technology, (1993), pp. 1287-1290.
Kosaka et al., Tree-Structured Speaker Clustering for Speaker-Independent Continuous . . . , ICSLP, (1994), pp. 1375-1378.
Takami et al., Automatic Generation of Speaker-Common Hidden . . . Proceedings of Acoustic Society in Japan (partial English translation), (1992), pp. 155-156.
Bahl et al., Decision Trees for Phonological Rules in Continuous Speech, IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1991), pp. 185-188.
Lee et al., Allophone Clustering For Continuous Speech Recognition, IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1990), pp. 749-752.
Huang et al., An Overview of the SPHINX-II Speech Recognition System, Proceedings of ARPA Workshop on Human Language Technology, pp. 81-86.
Kannan et al., Maximum Likelihood Clustering of Gaussians for Speech Recognition, IEEE Transactions on Speech and Audio Processing, vol. 2, No. 3, (1994), pp. 453-455.
Young et al., Tree-Based State Tying for High Accuracy Acoustic Modelling, pp. 286-291.
Bahl et al., Context Dependent Vector Quantization for Continuous Speech Recognition, IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1993), pp. II-632-II-635.
Dempster et al., Maximum Likelihood from Incomplete Data . . . , Royal Statistical Society, Journal, Series B. vol. 39, No. 1, (1977), pp. 1-38.
Anderson, An Introduction To Multivariate Statistical Analysis, 2nd Ed., John Wiley & Sons, (1984), pp. 404-411.
Breiman et al., Classification And Regression Trees, Wadsworth, Inc., (1984), pp. 266-271.
Bahl et al., A Tree-Based Statistical Language Model . . . , IEEE Transactions on Acoustic Speech and Signal Processing, vol. 37, No. 7, (1989), pp. 507-514.
Chou, Optimal Partitioning for Classification And Regression Trees, IEEE Transactions On Pattern Analysis and Machine Intelligence, vol. 13, No. 4, Apr. 1991, pp. 340-354.
Linde et al., An Algorithm for Vector Quantizer Design, IEEE Transactions On Communications, vol. COM-28, No. 1, Jan. 1980, pp. 84-95.
Nadas et al., An Iterative "Flip-Flop" Approximation Of the Most Informative . . . , IEEE Proceedings of the International Conference on Acoustic Speech and Signal Processing, (1991), pp. 565-568.
Kurematsu et al., ATR Japanese Speech Database As A Tool Of Speech . . . , Speech Communication 9, Elsevier Science Publishers B.V. (North-Holland), (1990), pp. 357-363.
Singer et al., Speech Recognition Without Grammar Or Vocabulary Constraints, ICSLP, (1994), pp. 2207-2210.
Takami et al., A Successive State Splitting Algorithm for Efficient Allophone Modelling, IEEE, (1992), pp. I-573-I-576.
Nagai et al., The SSS-LR Continuous Speech Recognition System . . . , Proceedings of International Conference on Spoken Language Processing, (1992), pp. 1511-1514.
Nagai et al., Atreus: A Comparative Study of Continuous Speech . . . , 1993 IEEE ICASSP-93 reprint, pp. II-139-II-142.

