Specific task composite acoustic models


Reexamination Certificate


Details

C704S255000, C704S231000


active

06260014

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition models and, more particularly, to composite acoustic models for task specific speech recognition.
2. Description of the Related Art
Speech recognition systems are used in many areas today to transcribe speech into text. The success of this technology in simplifying man-machine interaction is encouraging its use in a growing number of applications, such as dictation transcription, voicemail, home banking and directory assistance. Though it is possible to design a generic speech recognition system and use it in a variety of different applications, a system is generally tailored to the particular application being addressed. In this way, a more efficient system having better performance is realized.
Typical speech recognition systems include three components: an acoustic model that models the characteristics of speech, a language model that models the characteristics of language, and a vocabulary that includes the words relevant to the application. Some applications, for example voicemail transcription, require a large vocabulary because a voicemail message could relate to any topic. Other applications, such as home banking, are likely to need a smaller vocabulary because only a limited set of transactions, and therefore of words, is involved. Depending on the application, some words may be more important than others. In home banking, for example, digit recognition is especially important because personal identification numbers and transaction amounts must be recognized correctly. Hence, the word error rate on digits matters more than the word error rate on the remainder of the vocabulary.
Therefore, a need exists for a speech recognition system and method providing improved performance on an application specific subset of words. A further need exists for a system and method capable of recognizing non-task specific speech along with task specific speech by means of a task specific composite model. A still further need exists for a task specific model that is easily constructed and needs only a limited amount of training data to train its parameters.
SUMMARY OF THE INVENTION
A method for recognizing speech, in accordance with the present invention, includes the steps of providing a generic model having a baseform representation of a vocabulary of words, identifying a subset of words relating to an application, constructing task specific models for the subset of words, constructing a composite model by combining the generic model and the task specific model and modifying the baseform representation of the subset of words such that the task specific models are used when recognizing the subset of words.
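The baseform modification step can be illustrated with a minimal Python sketch. It assumes a hypothetical dictionary-based lexicon, a hypothetical whole-word unit naming scheme (`ONE_1`, `ONE_2`, ...) and a fixed number of task specific units per word; none of these names or choices come from the patent. Only the baseforms of the task words are rewritten, so the rest of the vocabulary keeps its generic phonetic baseforms.

```python
# Hypothetical generic lexicon: each word maps to a phonetic baseform,
# i.e. a sequence of generic phone units shared by the whole vocabulary.
GENERIC_LEXICON = {
    "one": ["W", "AH", "N"],
    "two": ["T", "UW"],
    "pay": ["P", "EY"],
    "bill": ["B", "IH", "L"],
}


def task_baseform(word, n_units=3):
    """Hypothetical whole-word baseform: WORD_1 ... WORD_n task specific units."""
    return [f"{word.upper()}_{i}" for i in range(1, n_units + 1)]


def build_composite_lexicon(generic_lexicon, task_words, n_units=3):
    """Copy the generic lexicon, rewriting only the baseforms of the task words."""
    missing = set(task_words) - set(generic_lexicon)
    if missing:
        raise KeyError(f"task words not in vocabulary: {sorted(missing)}")
    composite = {}
    for word, baseform in generic_lexicon.items():
        if word in task_words:
            # Modified baseform: the word is now decoded with task specific units.
            composite[word] = task_baseform(word, n_units)
        else:
            # Unchanged baseform: the word still uses the generic phone models.
            composite[word] = list(baseform)
    return composite


if __name__ == "__main__":
    lexicon = build_composite_lexicon(GENERIC_LEXICON, {"one", "two"})
    for word, units in lexicon.items():
        print(word, units)
```

The generic lexicon is copied rather than edited in place, so the same generic model can later be combined with a different subset of task words.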
Another method for recognizing speech includes the steps of constructing a generic model having a phonetic baseform representation of a vocabulary of words, constructing task specific models for a subset of words, constructing a composite model by combining the generic model and the task specific models, and modifying the phonetic baseform representation of the subset of words such that the task specific models are used when recognizing the subset of words and the generic model is used when recognizing all other words.
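One way to picture the composite model is as a single inventory of acoustic units that pools the generic phone models with the task specific units. The sketch below assumes each unit is described by a hypothetical 1-D Gaussian `(mean, variance)` pair and that unit names never collide; the helper `model_family` (also hypothetical) reports which model a word is decoded with once its baseform has been modified.

```python
from typing import Dict, List, Tuple

# Hypothetical unit parameters: (mean, variance) of a 1-D Gaussian per unit.
Params = Tuple[float, float]


def build_composite_model(
    generic_units: Dict[str, Params], task_units: Dict[str, Params]
) -> Dict[str, Params]:
    """Pool generic phone models and task specific unit models into one inventory."""
    clash = generic_units.keys() & task_units.keys()
    if clash:
        # A collision would let a task unit silently replace a generic phone model.
        raise ValueError(f"unit names used by both models: {sorted(clash)}")
    return {**generic_units, **task_units}


def model_family(word: str, lexicon: Dict[str, List[str]], task_units: Dict[str, Params]) -> str:
    """Report which model family supplies the units in a word's baseform."""
    in_task = [unit in task_units for unit in lexicon[word]]
    if all(in_task):
        return "task specific"
    if not any(in_task):
        return "generic"
    return "mixed"


GENERIC_UNITS = {"W": (0.0, 1.0), "AH": (0.5, 1.0), "N": (1.0, 1.0), "P": (-1.0, 1.0), "EY": (-0.5, 1.0)}
TASK_UNITS = {"ONE_1": (1.0, 0.2), "ONE_2": (2.0, 0.2), "ONE_3": (3.0, 0.2)}
# Composite lexicon after baseform modification: "one" now points at task units.
LEXICON = {"one": ["ONE_1", "ONE_2", "ONE_3"], "pay": ["P", "EY"]}

if __name__ == "__main__":
    model = build_composite_model(GENERIC_UNITS, TASK_UNITS)
    print(len(model))
    for word in LEXICON:
        print(word, model_family(word, LEXICON, TASK_UNITS))
```

Keeping the two unit sets disjoint means adding task units never changes how generic words are scored.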
In alternate methods, the step of identifying a subset of words may include the step of identifying a subset of words pertinent to a particular task. The step of constructing a task specific model may include the step of constructing the task specific model by utilizing a mixture of Gaussians, non-Gaussian densities or a neural network. The method may further include the step of estimating parameters of the composite model using an estimation technique. The estimation technique for the generic model may be different from the estimation technique for the task specific model. The step of constructing a task specific model may include the step of constructing the task specific model by utilizing a construction technique different from the one used for the generic model. The generic model and the task specific model preferably include parametric models, and the method may include the step of modeling task specific words and generic words based on a probability density function. The generic model and the task specific model may have different probability density functions. The method may also include the step of interchanging the task specific model with a different task specific model to create a new composite model.
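The alternatives above allow the generic and task specific models to use different probability density functions. As an illustrative sketch only, the generic units below use a single diagonal-covariance Gaussian and the task specific units a Gaussian mixture; both class names are hypothetical, and parameter estimation (for example by EM) is omitted. Because both expose the same `log_likelihood` method, a recognizer can score either kind of unit without knowing which family it belongs to.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class SingleGaussian:
    """Hypothetical generic-unit pdf: one diagonal Gaussian."""

    def __init__(self, mean, var):
        self.mean, self.var = mean, var

    def log_likelihood(self, x) -> float:
        return log_gauss_diag(x, self.mean, self.var)


class GaussianMixture:
    """Hypothetical task-unit pdf: weighted sum of diagonal Gaussians."""

    def __init__(self, weights, means, variances):
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("mixture weights must sum to 1")
        self.components = list(zip(weights, means, variances))

    def log_likelihood(self, x) -> float:
        # log p(x) = log sum_k w_k N(x; mu_k, Sigma_k), computed in the log domain.
        return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in self.components])


if __name__ == "__main__":
    generic = SingleGaussian([0.0, 0.0], [1.0, 1.0])
    task = GaussianMixture([0.7, 0.3], [[0.0, 0.0], [2.0, 2.0]], [[0.5, 0.5], [0.5, 0.5]])
    x = [0.1, -0.2]
    print(generic.log_likelihood(x), task.log_likelihood(x))
```

A mixture gives the task units more modelling capacity per unit, which is one plausible reason to spend extra parameters only on the small task vocabulary.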
A system for recognizing speech, in accordance with the present invention, includes a composite model, which includes a generic model having a baseform representation of a vocabulary of words and a task specific model for recognizing a subset of words relating to an application, wherein the subset of words is recognized using a modified baseform representation. The system also includes a recognizer for comparing input words with the generic model for words other than the subset of words and with the task specific model for the subset of words.
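The recognizer's role can be sketched for isolated words: each candidate word is scored by walking its (possibly modified) baseform through the composite model, so task words are automatically scored with task specific units and all other words with generic ones. This is a simplified sketch under stated assumptions: scalar features, hypothetical 1-D Gaussian unit models, and a uniform split of frames across units in place of the Viterbi alignment a real decoder would use.

```python
import math
from typing import Dict, List, Sequence, Tuple

Params = Tuple[float, float]  # hypothetical (mean, variance) per unit


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def uniform_segments(n_frames: int, n_units: int) -> List[range]:
    """Split frame indices into n_units contiguous, nearly equal segments."""
    bounds = [i * n_frames // n_units for i in range(n_units + 1)]
    return [range(bounds[k], bounds[k + 1]) for k in range(n_units)]


def word_score(frames: Sequence[float], baseform: List[str], model: Dict[str, Params]) -> float:
    """Log-likelihood of the frames under the word's baseform (uniform alignment)."""
    if len(frames) < len(baseform):
        return -math.inf  # too few frames to visit every unit once
    total = 0.0
    for unit, segment in zip(baseform, uniform_segments(len(frames), len(baseform))):
        mean, var = model[unit]  # task unit or generic phone, whichever the baseform names
        total += sum(log_gauss(frames[t], mean, var) for t in segment)
    return total


def recognize(frames: Sequence[float], lexicon: Dict[str, List[str]], model: Dict[str, Params]) -> str:
    """Return the vocabulary word whose baseform best explains the frames."""
    return max(lexicon, key=lambda word: word_score(frames, lexicon[word], model))


MODEL = {"ONE_1": (1.0, 0.1), "ONE_2": (2.0, 0.1), "ONE_3": (3.0, 0.1), "P": (-1.0, 0.1), "EY": (-2.0, 0.1)}
LEXICON = {"one": ["ONE_1", "ONE_2", "ONE_3"], "pay": ["P", "EY"]}

if __name__ == "__main__":
    print(recognize([1.0, 1.1, 2.0, 1.9, 3.0, 3.1], LEXICON, MODEL))
    print(recognize([-1.0, -1.1, -2.0, -1.9], LEXICON, MODEL))
```

The recognizer itself never branches on "task" versus "generic"; the routing is carried entirely by the modified baseforms.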
In alternate embodiments of the system, the recognizer preferably includes a processor. The generic model and the task specific model may use different parametric models for modeling probability density functions. The generic model and the task specific model may be constructed using different construction techniques. The generic model and the task specific model may use different probability estimation techniques for recognizing speech. A plurality of task specific models for a plurality of applications may also be included.
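Supporting several applications amounts to keeping the generic model fixed and swapping the task specific part. A minimal sketch, assuming hypothetical `TaskModel` and `CompositeModel` classes and toy parameter values: `with_task` builds a new composite model for a different application while reusing the generic lexicon and unit models unchanged.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Params = Tuple[float, float]  # hypothetical (mean, variance) per unit


@dataclass
class TaskModel:
    """Hypothetical bundle for one application: task word baseforms plus their unit models."""
    name: str
    baseforms: Dict[str, List[str]]
    units: Dict[str, Params]


@dataclass
class CompositeModel:
    generic_lexicon: Dict[str, List[str]]
    generic_units: Dict[str, Params]
    task: TaskModel

    def lexicon(self) -> Dict[str, List[str]]:
        """Generic baseforms, overridden by the current task's modified baseforms."""
        merged = {word: list(bf) for word, bf in self.generic_lexicon.items()}
        merged.update({word: list(bf) for word, bf in self.task.baseforms.items()})
        return merged

    def units(self) -> Dict[str, Params]:
        return {**self.generic_units, **self.task.units}

    def with_task(self, task: TaskModel) -> "CompositeModel":
        """Interchange the task specific model; the generic model is shared, not retrained."""
        return CompositeModel(self.generic_lexicon, self.generic_units, task)


GENERIC_LEXICON = {"one": ["W", "AH", "N"], "boston": ["B", "AO", "S", "T", "AH", "N"]}
GENERIC_UNITS = {p: (0.0, 1.0) for p in ["W", "AH", "N", "B", "AO", "S", "T"]}
BANKING = TaskModel("home banking", {"one": ["ONE_1", "ONE_2"]},
                    {"ONE_1": (1.0, 0.1), "ONE_2": (2.0, 0.1)})
DIRECTORY = TaskModel("directory assistance", {"boston": ["BOSTON_1", "BOSTON_2"]},
                      {"BOSTON_1": (0.5, 0.1), "BOSTON_2": (1.5, 0.1)})

if __name__ == "__main__":
    banking = CompositeModel(GENERIC_LEXICON, GENERIC_UNITS, BANKING)
    directory = banking.with_task(DIRECTORY)
    print(banking.lexicon()["one"], directory.lexicon()["one"])
```

Since only the small task component changes, each new application needs training data only for its own task words.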
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.


REFERENCES:
patent: 5819221 (1998-10-01), Kondo et al.
patent: 5825978 (1998-10-01), Digalakis et al.
patent: 5875426 (1999-02-01), Bahl et al.
patent: 5953701 (1999-09-01), Neti et al.
patent: 5963903 (1999-10-01), Hon et al.
patent: 5995931 (1999-11-01), Bahl et al.
patent: 6029124 (2000-02-01), Gillick et al.
patent: 6061653 (2000-05-01), Fisher et al.
patent: 6067517 (2000-05-01), Bahl et al.
patent: 6070139 (2000-05-01), Miyazawa et al.
patent: 6073096 (2000-06-01), Gao et al.
patent: 6076056 (2000-06-01), Huang et al.
Bahl et al., “Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition,” Proceedings of the ICASSP, pp. 49-52, IEEE 1986.
Juang et al., “Minimum Classification Error Rate Methods for Speech Recognition,” IEEE Transactions on Speech and Audio Processing, vol. 5, No. 3, pp. 257-265, May 1997.
Dempster et al., “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society (B), No. 1, pp. 1-22, 1977.
Bahl et al., “A New Algorithm for the Estimation of Hidden Markov Model Parameters,” IEEE, pp. 493-496, 1988.
Viterbi, “Error Bounds for Convolutional Codes and an Asymptotically Optimal Decoding Algorithm,” IEEE Transactions on Information Theory, vol. IT-13, pp. 260-269, Apr. 1967.
