Confidence measure system using a near-miss pattern

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S243000, C704S256000

active

06571210

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to computer speech recognition. More particularly, the present invention relates to a confidence measure system using a near-miss pattern or a plurality of possible words.
Speech recognition systems are generally known. During speech recognition, speech is provided as input to the system in the form of an audible voice signal, such as through a microphone. The microphone converts the audible speech signal to an analog electronic signal. An analog-to-digital converter receives the analog signal and produces a sequence of digital signals. A conventional array processor performs spectral analysis on the digital signals and computes a magnitude value for each frequency band of a frequency spectrum. In one embodiment, the digital signal received from the analog-to-digital converter is divided into frames. The frames are encoded into feature vectors that reflect spectral characteristics for a plurality of frequency bands. In the case of discrete and semi-continuous hidden Markov modeling, the feature vectors are encoded into one or more code words using vector quantization techniques and a code book derived from training data. Output probability distributions are then preferably computed against hidden Markov models using the feature vector (or code words) of the particular frame being analyzed. These probability distributions are later used in executing a Viterbi or similar type of processing technique. Stored acoustic models (such as hidden Markov models), a lexicon, and a language model are used to determine the most likely representative word for the utterance received by the system.
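The front-end steps described above (framing, per-band magnitude computation, and vector quantization against a code book) can be illustrated with a minimal Python sketch. The function names, the naive DFT, the frame length, and the toy code book are all assumptions made for illustration; the source does not specify any of them, and a real recognizer would use an optimized transform and a trained code book.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Cut a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def band_magnitudes(frame, num_bands):
    """Magnitude of the DFT at bins 1..num_bands (naive, for illustration)."""
    n = len(frame)
    mags = []
    for k in range(1, num_bands + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def quantize(vector, codebook):
    """Index of the code book entry nearest to vector (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])))

# Toy input: a sinusoid that falls exactly on DFT bin 2 of a 16-sample frame.
samples = [math.sin(2 * math.pi * 2 * t / 16) for t in range(64)]
frames = split_into_frames(samples, frame_len=16, hop=8)

# Hypothetical 3-entry code book; a real one is derived from training data.
codebook = [[8.0, 0.0, 0.0, 0.0], [0.0, 8.0, 0.0, 0.0], [0.0, 0.0, 8.0, 0.0]]
code_words = [quantize(band_magnitudes(f, 4), codebook) for f in frames]
```

Each frame becomes a four-element feature vector and then a single code word index, which is the form of input a discrete hidden Markov model would consume.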
While modern speech recognition systems generally produce good search results for utterances actually present in the recognition inventory, such systems have no way of discarding the erroneous search results produced for out-of-vocabulary (OOV) input utterances. In such cases, a confidence measure applied to the recognition results can indicate how far the results obtained can be trusted. Confidence measures have been used in many forms of speech recognition applications, including supervised and unsupervised adaptation, recognition error rejection, OOV word detection, and keyword spotting. A method that has been used for confidence modeling is the comparison of the score of the hypothesized word with the score of a “filler” model. One such system is described by R. C. Rose and D. B. Paul, in “A Hidden Markov Model Based Key Word Recognition System,” published in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 129-132, 1990.
It is believed by many that the confidence measure should be based on the ratio between the recognition score and the score of the “filler model” (usually used to model OOV words). Filler models are often one of the following two types: (1) a context independent (CI) phone network where every phone is connected to every other phone; or (2) a large context dependent vocabulary system where phone connections represent almost all the possible words in a particular language. While the context independent phone network approach is very efficient, the performance is mediocre at best because of the use of imprecise CI models. The context dependent approach can generate decent confidence measures, but suffers from two shortcomings. First, the approach considers only the ratio of the scores of the best recognized word and the best “filler-model” word. Second, because everything rests on a single ratio comparison, the OOV network would need to contain all words, which is not practical, and the system remains ineffective for rejecting noise sources other than OOV words.
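The filler-model ratio described above can be sketched as follows, working in the log domain where the ratio of two likelihoods becomes a difference of log scores. The function names, the per-frame normalization, and the threshold value are assumptions for illustration and are not taken from the source.

```python
def filler_confidence(word_log_score, filler_log_score, num_frames):
    """Per-frame log-likelihood ratio between the hypothesized word and the filler model.

    Dividing by the frame count (an assumption here) keeps long and short
    utterances on a comparable scale.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (word_log_score - filler_log_score) / num_frames

def accept_hypothesis(word_log_score, filler_log_score, num_frames, threshold=0.0):
    """Accept the recognized word only if it beats the filler model by the threshold."""
    return filler_confidence(word_log_score, filler_log_score, num_frames) >= threshold

# Hypothetical log scores over a 100-frame utterance.
in_vocabulary = accept_hypothesis(-120.0, -150.0, 100, threshold=0.1)
out_of_vocabulary = accept_hypothesis(-150.0, -148.0, 100, threshold=0.1)
```

Because the decision depends on one number, this sketch shows the limitation noted above: only the best word and the best filler path contribute, and the remaining competing hypotheses are ignored.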
SUMMARY OF THE INVENTION
A method and system for performing confidence measurement in speech recognition systems includes receiving an utterance of input speech and creating a near-miss pattern or a near-miss list of possible word entries for the input utterance. Each word entry includes an associated value of probability that the utterance corresponds to the word entry. The near-miss list of possible word entries is compared with corresponding stored near-miss confidence templates. Each near-miss confidence template includes a list of word entries and each word entry in each list includes an associated value. The confidence measure for a particular hypothesis word is computed based on the comparison of the values in the near-miss list of possible word entries with the values of the corresponding near-miss confidence template.
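A minimal sketch of the comparison step, assuming the recognizer returns log scores for its top candidates. The summary does not state which comparison function is used, so the softmax-style normalization, the sum-of-squared-differences distance, and the mapping of distance to a confidence value are all illustrative assumptions, as are the function names and the example words.

```python
import math

def near_miss_list(log_scores, top_n=5):
    """Build a near-miss list: the top_n words with probabilities summing to 1."""
    if not log_scores:
        raise ValueError("log_scores must not be empty")
    best = sorted(log_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    peak = best[0][1]  # subtract the peak before exponentiating to avoid underflow
    weights = [(word, math.exp(score - peak)) for word, score in best]
    total = sum(weight for _, weight in weights)
    return {word: weight / total for word, weight in weights}

def template_distance(near_miss, template):
    """Sum of squared differences over the union of word entries (assumed metric)."""
    words = set(near_miss) | set(template)
    return sum((near_miss.get(w, 0.0) - template.get(w, 0.0)) ** 2 for w in words)

def confidence(near_miss, template):
    """Map the distance into (0, 1]; 1.0 means the patterns are identical."""
    return 1.0 / (1.0 + template_distance(near_miss, template))

# Hypothetical recognizer output for one utterance.
observed = near_miss_list({"writing": math.log(0.6),
                           "riding": math.log(0.3),
                           "raiding": math.log(0.1)})

# Hypothetical stored templates for two candidate hypothesis words.
template_writing = {"writing": 0.6, "riding": 0.3, "raiding": 0.1}
template_biting = {"writing": 0.1, "biting": 0.9}

score_match = confidence(observed, template_writing)
score_mismatch = confidence(observed, template_biting)
```

The point of the sketch is that the whole pattern of competing words contributes to the score, so an utterance whose near-miss list does not resemble the stored template for the hypothesized word receives a lower confidence.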
Another aspect of the present invention is a system and method for generating word-based, near-miss confidence templates for a collection of words in a speech recognition system. Each near-miss confidence template is generated from multiple near-miss lists produced by a recognizer on multiple acoustic data for the same word. Each near-miss confidence template of the set of near-miss confidence templates includes a list of word entries, each having an associated probability value related to acoustic similarity.
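One plausible way to combine several near-miss lists for the same word into a template is to average the probability each word entry receives across the lists. The summary does not say how the lists are combined, so the averaging rule, the truncation to the strongest entries, and the names below are assumptions for illustration only.

```python
from collections import defaultdict

def build_template(near_miss_lists, top_n=5):
    """Average word probabilities across near-miss lists and keep the top_n entries.

    A word missing from a given list is treated as having probability 0 there.
    """
    if not near_miss_lists:
        raise ValueError("at least one near-miss list is required")
    totals = defaultdict(float)
    for near_miss in near_miss_lists:
        for word, prob in near_miss.items():
            totals[word] += prob
    count = len(near_miss_lists)
    averaged = {word: total / count for word, total in totals.items()}
    kept = sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return dict(kept)

# Hypothetical near-miss lists from two recordings of the word "writing".
lists_for_writing = [
    {"writing": 0.6, "riding": 0.4},
    {"writing": 0.8, "riding": 0.1, "raiding": 0.1},
]
template = build_template(lists_for_writing, top_n=2)
```

Words that are acoustically similar to the target tend to recur across recordings and therefore keep a high averaged value, while one-off confusions are diluted or dropped by the truncation.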


REFERENCES:
patent: RE31188 (1983-03-01), Pirz et al.
patent: 4783803 (1988-11-01), Baker et al.
patent: 4797929 (1989-01-01), Gerson et al.
patent: 4802231 (1989-01-01), Davis
patent: 5241619 (1993-08-01), Schwartz et al.
patent: 5509104 (1996-04-01), Lee et al.
patent: 5566272 (1996-10-01), Brems et al.
patent: 5613037 (1997-03-01), Sukkar
patent: 5625748 (1997-04-01), McDonough et al.
patent: 5649057 (1997-07-01), Lee et al.
patent: 5675706 (1997-10-01), Lee et al.
patent: 5677990 (1997-10-01), Junqua
patent: 5710864 (1998-01-01), Juang et al.
patent: 5710866 (1998-01-01), Alleva et al.
patent: 5712957 (1998-01-01), Waibel et al.
patent: 5749069 (1998-05-01), Komori et al.
patent: 5795123 (1998-08-01), Lovgren
patent: 5797123 (1998-08-01), Chou et al.
patent: 5805772 (1998-09-01), Chou et al.
patent: 5842163 (1998-11-01), Weintraub
patent: 5937384 (1999-08-01), Huang et al.
patent: 5983177 (1999-11-01), Wu et al.
patent: 6029124 (2000-02-01), Gillick et al.
Chen et al., “Discriminative training . . . using N-best candidates”, IEEE, 1994, pp. 625-628.*
Rohlicek et al., “Continuous Hidden Markov Modeling for Speaker-Independent Word Spotting”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 627-630, 1989.
Rose et al., “A Hidden Markov Model Based Keyword Recognition System”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 129-132, 1990.
Alleva et al., “Confidence Measures and Their Application to Automatic Speech Recognition”, IEEE Automatic Speech Recognition Workshop, (Snowbird, Utah), pp. 173-174, 1995.
Cox et al., “Confidence Measures for the Switchboard Database”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 511-514, 1996.
Jeanrenaud et al., “Large Vocabulary Word Scoring as a Basis for Transcription Generation”, Proceedings of Eurospeech, vol. 3, pp. 2149-2152, 1995.
Weintraub, “LVCSR Log-Likelihood Ratio Scoring for Keyword Spotting”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 297-300, 1995.
Neti et al., “Word-Based Confidence Measures as a Guide for Stack Search in Speech Recognition”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 883-886, 1997.
Huang et al., “Microsoft Windows Highly Intelligent Speech Recognizer: Whisper”, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 93-96, 1995.
Huang et al., “Whistler: A Trainable Text-To-Speech System”, International Conference on Spoken Language Processing, vol. 4, pp. 2387-2390, 1996.
Tatsuya Kawahara et al., “Combining Key-Phrase Detection and Subword-Based Verification for Flexible Speech Understanding”, Proc. IEEE ICASSP 1997, vol. 2, pp. 1159-1162, Apr. 1997.
Tatsuya Kawahara et al., “Flexible Speech Understanding Based on Combined Key-Phrase Detection and Verification”, IEEE Trans. On Speech and Audio Processing vol. 6, pp. 558-568, Nov. 1998.
Asadi, A. et al., “Automatic Modeling of Adding New Words to a
