Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition
Reexamination Certificate
2000-08-14
2004-09-07
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S235000, C704S256000, C704S231000, C704S251000
active
06789061
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to speech recognition systems and, more particularly, to computerized methods and systems for automatically generating, from a first speech recognizer, a second speech recognizer tailored to a certain application.
BACKGROUND OF THE INVENTION
To achieve a good acoustic resolution across different speakers, domains, or applications, a general-purpose large vocabulary continuous speech recognizer, for instance one based on the Hidden Markov Model (HMM), usually employs several thousand states and several tens of thousands of elementary probability density functions (pdfs), e.g., Gaussian mixture components, to model the observation likelihood of a speech frame. While this allows for an accurate representation of the many variations of sounds in naturally spoken human speech, the storage and evaluation of several tens of thousands of multidimensional pdfs during recognition is a computationally expensive task with respect to both computing time and memory footprint.
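To make the cost concrete, the following is a minimal sketch of how the observation likelihood of a single speech frame might be evaluated for one HMM state whose output distribution is a Gaussian mixture. It assumes diagonal covariances; the function names and the (weight, mean, variance) tuple layout are illustrative assumptions and are not taken from the patent.

```python
import math

def log_gaussian_diag(frame, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at `frame`."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
        for x, m, v in zip(frame, mean, var)
    )

def state_log_likelihood(frame, mixture):
    """Log observation likelihood of `frame` for one HMM state.

    `mixture` is a list of (weight, mean, var) tuples, one per elementary pdf
    (hypothetical layout chosen for this sketch).
    """
    logs = [math.log(w) + log_gaussian_diag(frame, m, v) for w, m, v in mixture]
    top = max(logs)
    # log-sum-exp keeps the sum stable when individual densities are tiny
    return top + math.log(sum(math.exp(value - top) for value in logs))
```

Every component costs one pass over the feature dimensions per frame, so a recognizer with tens of thousands of components repeats this work for each frame and each active state, which is the expense the paragraph above describes.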
The total number of context-dependent states and the total number of Gaussian mixture components are both usually limited by some upper bound to avoid computationally very expensive optimization methods, e.g., the use of a Bayesian information criterion.
However, this has the disadvantage that some acoustic models are poorly trained because of a mismatch between the collected training data and the task domain, or because of a lack of training data for certain pronunciations. In contrast, other models may be more complex than is necessary to achieve a good recognition performance and, in any case, the reliable estimation of several million parameters needs a large amount of training data and is a very time-consuming process. Whereas applications such as large vocabulary continuous dictation systems (e.g., IBM Corporation's ViaVoice) can rely on today's powerful desktop computers, this is clearly infeasible in many applications that need to deal with limited hardware resources, e.g., in the embedded systems or consumer devices market. However, such applications often need to perform a limited task only, e.g., the (speaker-dependent) recognition of a few names from a speaker's address book, or the recognition of a few command words.
A state-of-the-art method dealing with the reduction of resources and computing time for large vocabulary continuous speech recognizers is the teaching of Curtis D. Knittle, “Method and System for Limiting the Number of Words Searched by a Voice Recognition System,” U.S. Pat. No. 5,758,319, issued in 1998, the disclosure of which is incorporated by reference herein. As a severe drawback, however, this method achieves a resource reduction only by limiting, at runtime, the number of candidate words in the active vocabulary by means of precomputed word sequence probabilities (the speech recognizer's language model). Such an approach is not acceptable, as it imposes an undesirable limitation on the recognition scope.
SUMMARY OF THE INVENTION
An objective of the present invention is to provide a technology for fast and easy customization of a general speech recognizer to a given application. A further objective is to provide a technology for producing specialized speech recognizers that require reduced computation resources, for instance, in terms of computing time and memory footprint.
In one aspect of the invention, a computerized method and system are provided for automatically generating, from a first speech recognizer, a second speech recognizer tailored to a certain application and requiring reduced resources compared to the first speech recognizer.
The invention exploits the first speech recognizer's set of states s_i and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in the states s_i.
The invention teaches a first step of generating a set of states of the second speech recognizer, reduced to a subset of the states of the first speech recognizer that is distinctive of the certain application.
The invention teaches a second step of generating a set of probability density functions of the second speech recognizer, reduced to a subset of the probability density functions of the first speech recognizer that is distinctive of the certain application.
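The two steps can be illustrated with a small sketch. The summary does not state here how "distinctive of the certain application" is decided, so the criterion below (keep states and pdfs that are observed often enough in application-specific data) is purely an assumption for illustration, as are the function name, the input format, and the thresholds.

```python
from collections import Counter

def squeeze(alignment, min_state_count=1, min_pdf_count=1):
    """Return (kept_states, kept_pdfs) for a reduced recognizer.

    `alignment` is an iterable of (state_id, pdf_id) pairs, one per frame of
    application-specific speech: the state the frame was aligned to and the
    pdf that scored best for it. Both this input format and the count
    threshold criterion are assumptions made for this sketch.
    """
    alignment = list(alignment)
    # Step 1: keep only states seen often enough in the application data.
    state_counts = Counter(state for state, _ in alignment)
    kept_states = {s for s, c in state_counts.items() if c >= min_state_count}
    # Step 2: among frames of kept states, keep only pdfs used often enough.
    pdf_counts = Counter(pdf for state, pdf in alignment if state in kept_states)
    kept_pdfs = {p for p, c in pdf_counts.items() if c >= min_pdf_count}
    return kept_states, kept_pdfs
```

For example, with `alignment = [("s1", "g1"), ("s1", "g2"), ("s1", "g1"), ("s2", "g3")]` and both thresholds set to 2, only state `s1` and pdf `g1` survive.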
The teachings of the present invention allow for the rapid development of new data files for recognizers in specific environments and for specific applications. The generated speech recognizers require significantly reduced resources, without decreasing the recognition accuracy or the scope of recognizable words.
The invention makes it possible to achieve a scalable recognition accuracy for the generated application-specific speech recognizer; the generation process can be executed repeatedly until the generated application-specific speech recognizer meets the required resource and accuracy targets.
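The repeated generation described above could be organized as a simple loop. The sketch below is one possible reading, not the patented procedure: `build` and `measure` are hypothetical callables supplied by the caller (one generates a recognizer for a given setting, the other returns its size and accuracy), and the settings are tried in the order given.

```python
def generate_until_targets(settings, build, measure, max_size, min_accuracy):
    """Try each setting in turn; return the first result meeting both targets.

    `build(setting)` returns a candidate recognizer and `measure(recognizer)`
    returns a (size, accuracy) pair. Both are hypothetical callables used only
    for this sketch. Returns (setting, recognizer), or None if no setting
    satisfies the resource target and the accuracy target together.
    """
    for setting in settings:
        recognizer = build(setting)
        size, accuracy = measure(recognizer)
        if size <= max_size and accuracy >= min_accuracy:
            return setting, recognizer
    return None
```

Returning `None` when no candidate qualifies leaves it to the caller to relax one of the targets or to supply further settings.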
REFERENCES:
patent: 5384892 (1995-01-01), Strong
patent: 5719996 (1998-02-01), Chang et al.
patent: 5758319 (1998-05-01), Knittle
patent: 6070140 (2000-05-01), Tran
patent: 6122613 (2000-09-01), Baker
patent: 6260013 (2001-07-01), Sejnoha
patent: 6260014 (2001-07-01), Bahl et al.
patent: 6463413 (2002-10-01), Applebaum et al.
Fischer Volker
Kunzmann Siegfried
Waast-Ricard Claire
Dang Thu Ann
Dorvil Richemond
Han Qi
International Business Machines Corporation
Ryan & Mason & Lewis, LLP