Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition
Reexamination Certificate
2000-12-26
2004-06-22
McFadden, Susan (Department: 2655)
C704S251000, C704S255000, C704S257000, C704S270000
active
06754625
ABSTRACT:
BACKGROUND
1. Technical Field
The present invention generally relates to speech recognition systems and, in particular, to a method for augmenting alternate word lists from which a correct word is selected in place of a word wrongly decoded by a speech recognition system. The method employs an acoustic confusability criterion to augment such alternate word lists.
2. Background Description
Conventional speech recognition systems generally include facilities that allow a user to correct decoding errors. In particular, when a user determines that a word has been wrongly decoded, the user may query the system for a list of alternative words corresponding to that word. In general, such a list contains high-probability alternatives to the word decoded at each position of an audio stream. These alternatives are computed live from the audio stream in question, and reflect the normal operation of the speech recognition engine, which must typically choose, from among several possible decodings of each segment of the audio stream, the preferred word to transcribe.
By “normal operation of the speech recognition engine”, we mean the following. Let $h = w_1, w_2, \ldots, w_{i-1}$ represent some sequence of decoded words, corresponding to some portion of the audio stream $a(w_1, w_2, \ldots, w_{i-1})$. Typically, the exact end time of word $w_{i-1}$ is not known, and the system proceeds by considering a range of possible end times of this word, and therefore start times of the next word.
The system must now guess the identity of the next word $w_i$ based upon consideration of the acoustic signal $a(w_i, w_{i+1}, \ldots)$ and likewise consideration of the words decoded up to that point, which is the sequence $h$ defined above. There is a principled way of making this guess, which is to consider the product $p(a(w_i) \mid x) \cdot p(x \mid h)$, as $x$ runs over various words in the recognizer vocabulary. In this expression, the first factor, $p(a(w_i) \mid x)$, is known as the acoustic model probability, and the second factor, $p(x \mid h)$, is known as the language model probability. In general, these raw values may be geometrically or otherwise weighted before being combined. However, to simplify this discussion, the acoustic model probability and the language model probability will be combined by simply computing their product, as indicated above.
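The product rule described above can be illustrated with a minimal sketch. The function names and the probability values are hypothetical, chosen only for illustration; a real recognizer would obtain both factors from its acoustic and language models and might weight them before combining.

```python
def combined_score(acoustic_prob, lm_prob):
    """Unweighted product p(a(w_i)|x) * p(x|h), as in the simplified discussion."""
    return acoustic_prob * lm_prob


def best_word(acoustic, lm):
    """Return the word x that maximizes p(a(w_i)|x) * p(x|h).

    `acoustic` maps word -> p(a(w_i)|x); `lm` maps word -> p(x|h).
    Only words present in both tables are considered.
    """
    shared = acoustic.keys() & lm.keys()
    return max(shared, key=lambda x: combined_score(acoustic[x], lm[x]))


# Hypothetical values for one audio segment (not taken from the patent).
acoustic = {"two": 0.30, "too": 0.32, "to": 0.28}
lm = {"two": 0.05, "too": 0.10, "to": 0.60}

print(best_word(acoustic, lm))
```

In this made-up example the acoustic scores alone slightly favor "too", but the language model factor dominates the product and "to" is chosen.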
Although in principle this product could be evaluated for every word $x$ of the recognizer's vocabulary, this is seldom done in practice. Instead, some short list of candidates is first computed. For instance, only the top $N$ words of the vocabulary may be retained for further consideration, when ranked according to the language model score $p(x \mid h)$. Let us refer to this as the list of language model candidates $C$. Typically, acoustic model scores $p(a(w_i) \mid x)$ are then computed only for $x \in C$. Thereafter, a further winnowing of the elements of $C$ will occur, retaining, for example, only the top $M$ words of $C$ when ranked according to the product $p(a(w_i) \mid x) \cdot p(x \mid h)$. Alternatively, the system may retain only those words $x'$ such that the product $p(a(w_i) \mid x') \cdot p(x' \mid h)$ lies within some fixed fraction of the maximal value $p(a(w_i) \mid \hat{x}) \cdot p(\hat{x} \mid h)$.
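The two-stage winnowing described above might be sketched as follows. This is an illustration under stated assumptions, not the recognizer's actual implementation: the function names are hypothetical, the probabilities are invented, and "within some fixed fraction of the maximal value" is read here as "at least `fraction` times the best product".

```python
def language_model_candidates(lm, n):
    """Candidate list C: the top-n vocabulary words ranked by p(x|h)."""
    return sorted(lm, key=lm.get, reverse=True)[:n]


def score_candidates(candidates, acoustic_fn, lm):
    """Compute p(a(w_i)|x) * p(x|h), evaluating the acoustic model only for x in C."""
    return {x: acoustic_fn(x) * lm[x] for x in candidates}


def alternates_top_m(scored, m):
    """Keep the top-m candidates ranked by the combined product."""
    return sorted(scored, key=scored.get, reverse=True)[:m]


def alternates_within_fraction(scored, fraction):
    """Keep candidates whose product is at least `fraction` of the best product."""
    best = max(scored.values())
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [x for x in ranked if scored[x] >= fraction * best]


# Hypothetical model outputs for one segment (not taken from the patent).
lm = {"to": 0.60, "too": 0.10, "two": 0.05, "tube": 0.01}
acoustic = {"to": 0.28, "too": 0.32, "two": 0.30, "tube": 0.02}

C = language_model_candidates(lm, n=3)
scored = score_candidates(C, acoustic.get, lm)
print(alternates_top_m(scored, m=2))
print(alternates_within_fraction(scored, fraction=0.5))
```

With these invented numbers the fraction-based rule keeps only the single best word, which shows how an alternate word list can end up with one element.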
The resulting set of candidates or hypotheses then comprises the list of alternate words for the given segment of the acoustic signal. Note that it is entirely possible that this set may contain only one single element, $\hat{x}$. It is also possible that this word may be wrong, and the correct word may not be included within the alternate word list.
The system retains in memory this list of possibilities, associated with the given segment. The system typically computes and retains as well the product $p(a(w_i) \mid x) \cdot p(x \mid h)$ cited above, or some other figure of merit for each word in the list. When the user determines that an error has been made in a particular position of the audio stream, the system presents this list of possible words to the user; the user may then select the correct word from among the list of possible words if the correct word is present, or type in a completely different word if the correct word is not present. It is of course much more convenient if the correct word appears in the list. Unfortunately, this is not always the case; indeed, frequently no alternatives are presented. The invention is a method for augmenting such alternate word lists, increasing the odds that the correct word will be presented to the user.
Accordingly, it would be desirable and highly advantageous to have a method for augmenting such alternate word lists, to increase the probability that the correct word is presented to the user. Such a method should also increase the convenience of using a speech recognition system employing the same.
SUMMARY OF THE INVENTION
The problems stated above, as well as other related problems of the prior art, are solved by the present invention, a method for augmenting alternate word lists generated by a speech recognition system. The alternate word lists are used to provide words from which a user may select a correct word in place of a word wrongly decoded by the system. The method employs an acoustic confusability criterion to augment such alternate word lists.
The use of augmented alternate word lists according to the invention significantly increases the number of times that the alternate word lists contain the correct word. Thus, the convenience of using a speech recognition system is increased.
According to a first aspect of the invention, there is provided a method for augmenting an alternate word list generated by a speech recognition system. The alternate word list includes at least one potentially correct word for replacing a wrongly decoded word. The method includes the step of identifying at least one acoustically confusable word with respect to the wrongly decoded word. The alternate word list is augmented with the at least one acoustically confusable word.
According to a second aspect of the invention, the augmenting step includes the step of adding the at least one acoustically confusable word to the alternate word list.
According to a third aspect of the invention, the system includes a vocabulary having a plurality of words included therein, and the identifying step includes the steps of: respectively determining a similarity between pronunciations of each of at least one of the plurality of words included in the vocabulary with respect to the wrongly decoded word; and respectively expressing the similarity by a score.
According to a fourth aspect of the invention, the identifying step identifies the at least one acoustically confusable word based on the score.
According to a fifth aspect of the invention, the at least one acoustically confusable word includes a plurality of acoustically confusable words, and the augmenting step includes the steps of: ranking each of the plurality of acoustically confusable words based on the score; and adding at least one of the plurality of acoustically confusable words to the alternate word list, in descending order with respect to the score.
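The scoring and ranking steps of the third through fifth aspects can be sketched as follows. The text here does not say how pronunciation similarity is measured, so this sketch substitutes a normalized edit distance over phoneme sequences purely as a stand-in; the lexicon, the phoneme symbols, and all function names are hypothetical. No cap is placed on the number of words added.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Stand-in confusability score in [0, 1]; 1.0 means identical pronunciations."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def augment(alternates, decoded_word, lexicon):
    """Append vocabulary words to the alternate list in descending score order."""
    target = lexicon[decoded_word]
    scored = [
        (similarity(target, pron), word)
        for word, pron in lexicon.items()
        if word != decoded_word and word not in alternates
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return list(alternates) + [word for _, word in scored]


# Hypothetical pronunciation lexicon (ARPAbet-like symbols, for illustration only).
lexicon = {
    "bat": ["B", "AE", "T"],
    "bad": ["B", "AE", "D"],
    "bats": ["B", "AE", "T", "S"],
    "pat": ["P", "AE", "T"],
    "butter": ["B", "AH", "T", "ER"],
    "dog": ["D", "AO", "G"],
}

print(augment(["bad"], "bat", lexicon))
```

Because nothing limits the additions in this sketch, even a dissimilar word such as "dog" is appended at the end of the list.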
According to a sixth aspect of the invention, the augmenting step further includes the step of restricting a number of words added to the alternate word list based on a predefined threshold.
According to a seventh aspect of the invention, the predefined threshold corresponds to a maximum number of words to be added to the alternate word list.
According to an eighth aspect of the invention, the predefined threshold corresponds to a maximum size of the alternate word list.
According to a ninth aspect of the invention, the predefined threshold corresponds to a minimum score for words to be added to the alternate word list.
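The three kinds of predefined threshold named in the sixth through ninth aspects (a maximum number of added words, a maximum list size, and a minimum score) can be illustrated with one small function. This is a sketch under assumptions: the function name, parameter names, and the scored word pairs are hypothetical, and the scores are assumed to have been computed already by some confusability measure.

```python
def restrict_additions(alternates, scored_confusables,
                       max_added=None, max_list_size=None, min_score=None):
    """Add confusable words in descending score order, subject to optional thresholds.

    `scored_confusables` is an iterable of (word, score) pairs.
    Each threshold is ignored when left as None.
    """
    result = list(alternates)
    added = 0
    ranked = sorted(scored_confusables, key=lambda pair: (-pair[1], pair[0]))
    for word, score in ranked:
        if min_score is not None and score < min_score:
            break  # all remaining words score lower still
        if max_added is not None and added >= max_added:
            break
        if max_list_size is not None and len(result) >= max_list_size:
            break
        if word in result:
            continue  # already offered as an alternate
        result.append(word)
        added += 1
    return result


# Hypothetical scored confusable words for a wrongly decoded "bat".
scored = [("bats", 0.75), ("pat", 0.67), ("butter", 0.50), ("dog", 0.0)]

print(restrict_additions(["bad"], scored, max_added=2))
print(restrict_additions(["bad"], scored, max_list_size=2))
print(restrict_additions(["bad"], scored, min_score=0.6))
```

The thresholds can also be combined in a single call, in which case whichever limit is reached first stops the additions.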
According to a tenth aspect of the invention, the at least one potentially correct word includes a plurality of potentially correct words and the at least one acoustically confusable word includes a plurality of acoustically confusable word
Olsen Peder Andreas
Picheny Michael Alan
Printz Harry W.
Visweswariah Karthik
Dang Thu A.
F.Chau & Associates LLC
McFadden Susan