Selection of decoys for non-vocabulary utterances rejection

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S239000, C704S243000, C704S251000

active

06195634

ABSTRACT:

BACKGROUND TO THE INVENTION
1. Field of the Invention
The invention relates to methods of assessing decoys for use in an audio recognition process, to methods of audio recognition for identifying predetermined sounds in an unknown input audio signal, using decoys, to apparatus and to software for such methods.
2. Background Art
It is known to perform pattern matching such as speech recognition, using steps of:
1) matching an unknown input against a number of models of known speech (a lexicon); and
2) classifying the results (termed tokens), e.g. determining if the closest match is likely to be correct, with or without a positive rejection step.
Classifying recognition results without rejection is usually simple—the recognizer's top choice is either correct or wrong. With rejection, things are a little more complicated. Rejection attempts to detect when the recognition result is incorrect, either because the person said something that is outside the lexicon or because the recognizer has made an error. If the person has said something that is outside the lexicon, the utterance is called a non-vocabulary utterance, referred to herein at times as an imposter utterance. For example, a typical speech recognition application could have about 10% non-vocabulary utterances, which means that 10% of the time, the person says something that is not in the recognizer's vocabulary. The result, after rejection, is classified as one of:
correct acceptance (CA): The recognizer's top choice will lead to performing the correct action, and the rejection algorithm accepts the result (note that this does not mean that the recognizer has gotten every word correct, but just that it has gotten all the important ones correct. For example, in the locality task, it has gotten the locality correct but may have the wrong prefix or suffix).
false acceptance (FA): The top choice is incorrect, either because of a recognition error, or because the token is a non-vocabulary utterance but is not rejected.
correct rejection (CR): The token is an imposter and it is rejected.
false rejection (FR): The token is not a non-vocabulary utterance, but it is rejected (note that if the top choice of the recognizer is wrong, the rejection algorithm is correct to reject the result, but it is still referred to as a false rejection because the notions of correct and false are relative to what the speaker intended, and not to the recognizer).
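The four outcomes above can be sketched as a small decision function. This is only an illustration of the definitions in the text; the names `Outcome` and `classify` and the three boolean inputs are assumptions introduced here, not part of the patent.

```python
from enum import Enum


class Outcome(Enum):
    CA = "correct acceptance"
    FA = "false acceptance"
    CR = "correct rejection"
    FR = "false rejection"


def classify(in_vocabulary: bool, top_choice_correct: bool, accepted: bool) -> Outcome:
    """Label one token (hypothetical helper, not the patent's algorithm).

    in_vocabulary:      the speaker said something inside the lexicon
    top_choice_correct: the recognizer's top choice leads to the correct action
    accepted:           the rejection step let the result through
    """
    if accepted:
        # Accepting is only correct when the token is in vocabulary and
        # the top choice is right; everything else is a false acceptance.
        if in_vocabulary and top_choice_correct:
            return Outcome.CA
        return Outcome.FA
    # Rejected tokens are judged relative to what the speaker intended,
    # so an in-vocabulary token counts as FR even if the top choice was wrong.
    return Outcome.FR if in_vocabulary else Outcome.CR
```

For instance, `classify(True, False, False)` yields `Outcome.FR`, matching the note that rejecting a misrecognized in-vocabulary token is still called a false rejection.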
The other commonly used term is “forced choice accuracy,” which refers to the number of times the recognizer's top choice is correct, without considering rejection. The maximum value for forced choice accuracy is 100% minus the non-vocabulary utterance rate. The forced choice accuracy is the maximum possible value for CA, which occurs when the rejection algorithm accepts all correct recognitions. Typically, however, a (hopefully) small percentage of the correct recognitions are rejected, so that CA is lower than the forced choice accuracy (typically by on the order of 10%).
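The relationship between the non-vocabulary rate, forced choice accuracy, and CA can be checked with a short calculation. The sketch below assumes each token is described by three booleans; the function name `summarize` and the sample data are hypothetical and chosen only to illustrate the inequalities stated above.

```python
def summarize(tokens):
    """tokens: list of (in_vocabulary, top_choice_correct, accepted) booleans."""
    n = len(tokens)
    non_vocab_rate = sum(1 for v, _, _ in tokens if not v) / n
    # Forced choice accuracy ignores the rejection step entirely.
    forced_choice = sum(1 for v, c, _ in tokens if v and c) / n
    # CA additionally requires that the correct result was accepted.
    ca = sum(1 for v, c, a in tokens if v and c and a) / n
    return {
        "non_vocab_rate": non_vocab_rate,
        "ceiling": 1.0 - non_vocab_rate,  # maximum forced choice accuracy
        "forced_choice_accuracy": forced_choice,
        "ca": ca,
    }


# Made-up sample: 1 non-vocabulary token, 8 correct recognitions
# (one of which is rejected), and 1 misrecognition that is accepted.
sample = (
    [(False, False, False)]
    + [(True, True, True)] * 7
    + [(True, True, False)]
    + [(True, False, True)]
)
stats = summarize(sample)
```

With this sample, CA is below the forced choice accuracy, which in turn is below the ceiling set by the non-vocabulary rate.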
Classification of a token as a CR or FR is sometimes altered by the definition of a non-vocabulary utterance, because of the notion of word spotting. The goal of a true word-spotting system is to pick out the important words, regardless of what the speaker may say before, between, or after them. Technically, if a person says something with a valid core but an invalid prefix or suffix (where invalid means that it is not among the supported prefixes or suffixes), the token is a non-vocabulary utterance. In the past, such a token has been considered correctly accepted if the recognizer gets the core right, but also correctly rejected if the token is rejected. To be consistent, one definition should be used, and the trend is towards considering a token to be a non-vocabulary utterance only if it does not have a core, or the core is outside of the supported vocabulary, since the goal is a true word-spotting system. More precisely, the goal is to improve the automation rate, which is achieved by having a recognizer that gets all the important words correct and realizes when it has made an error on an important word.
Rejection using decoys is known, for example from U.S. Pat. No. 5,097,509 (Lennig). Some non-vocabulary utterances may occur much more frequently than others. For example, non-vocabulary utterance tokens could be “Hello”, “Ah”, or nothing but telephone noise (the person said nothing at all, but there was enough noise on the line so that the end-pointer did not detect the lack of speech). The most effective way to reject these tokens is to use decoys. A decoy is simply a model for the non-vocabulary utterance that is added to the recognizer's lexicon. If a decoy is the top choice returned by the recognizer, the token is rejected, regardless of the result of any classification algorithm.
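The decoy rule described above can be sketched in a few lines. This is a simplified illustration, not the method of the cited patent: the function `decide`, the ranked list of model names, and the `classifier_accepts` flag are all assumptions made for the example.

```python
def decide(ranked_choices, decoys, classifier_accepts=True):
    """Return ("accept", word) or ("reject", None) for one token.

    ranked_choices:     model names from the recognizer, best match first
    decoys:             set of lexicon entries that model non-vocabulary sounds
    classifier_accepts: verdict of any separate classification algorithm
    """
    top = ranked_choices[0]
    if top in decoys:
        # A decoy as top choice forces rejection, whatever the classifier says.
        return ("reject", None)
    if not classifier_accepts:
        return ("reject", None)
    return ("accept", top)


# Hypothetical lexicon entries: two decoys alongside real vocabulary words.
decoys = {"<hello>", "<line-noise>"}
result = decide(["<hello>", "Halifax"], decoys, classifier_accepts=True)
```

Here `result` is `("reject", None)`, because the decoy outranks the vocabulary word even though the classifier would have accepted.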
However, it is possible that decoys can reduce the effectiveness or speed of the classification, if they produce close matches to utterances that are within the vocabulary. Accordingly, decoys need to be carefully selected to suit the application, or the lexicon. This task requires expert input and may be time consuming, thus limiting the breadth of applicability or the ease of installation of new systems.
It is known from U.S. Pat. No. 4,802,231 (Davis) to generate error templates for a pattern matching process such as speech recognition, derived from words input to the recogniser, and erroneously recognised as matching a word in the vocabulary of the recogniser. Composite error templates may be generated by combining error templates.
It is known from U.S. Pat. No. 5,649,057 (Lee et al.) to generate statistical models of extraneous speech and background noise, for use in an HMM (Hidden Markov Model) based speech recognition system. The system involves representing a given speech input as a keyword preceded and followed by sequences of such unwanted sounds. A grammar driven continuous word recognition system is used to determine the best-matching sequence of unwanted sounds and keywords. The model or models of the unwanted noises are refined by an iterative training process, i.e. varying the parameters of the HMM until the difference in likelihoods in consecutive iterations is sufficiently small. The iterative process starts with manual input of the keywords, the most important unwanted words, and noise samples, but may be performed automatically thereafter.
SUMMARY OF THE INVENTION
It is an object of the invention to provide improved methods and apparatus.
According to a first aspect of the invention there is provided a method of assessing decoys for use in an audio recognition process for identifying predetermined sounds in an unknown input audio signal, the method comprising the steps of:
carrying out a test recognition process by matching known training audio signals to models representing the predetermined sounds and the decoys; and
determining for each of the decoys, from the results of the test recognition process, a score representing the effect of the respective decoy in the recognition of any of the known training audio signals.
An advantage arising from generating scores for decoys is that the chance of a poor selection of decoys can be reduced. Thus the possibility of poor recognition performance arising from poorly selected decoys can be reduced. Furthermore, the requirement for expert input into the decoy creation process, which may be time consuming, can be reduced. This can make it easier, or quicker, or less expensive to install or adapt to particular circumstances. Also, better rejection, or less false acceptance may be achieved if some decoys are identified which are unexpectedly good.
Preferably, the known training audio signals comprise known non-vocabulary utterances, and the score additionally represents the effect of the respective decoy on the rejection of any of the non-vocabulary utterances. An advantage arising from this is that decoys suited to rejecting given non-vocabulary utterances could achieve good scores and be included in the final dictionary.
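One way to picture a per-decoy score is sketched below. The text shown here does not specify how the score is computed, so the formula is purely an assumption for illustration: a decoy gains a point each time it is the top choice for a known non-vocabulary training token (it helped rejection) and loses a point each time it is the top choice for an in-vocabulary token (it harmed recognition). The names `score_decoys` and `test_results` are hypothetical.

```python
def score_decoys(test_results, decoys):
    """Score each decoy from a test recognition run (illustrative formula only).

    test_results: list of (in_vocabulary, top_choice) pairs, one per training token
    decoys:       iterable of candidate decoy names
    """
    scores = {decoy: 0 for decoy in decoys}
    for in_vocabulary, top_choice in test_results:
        if top_choice in scores:
            # Assumed weighting: +1 for rejecting an imposter,
            # -1 for stealing a token that was in the vocabulary.
            scores[top_choice] += -1 if in_vocabulary else 1
    return scores


# Made-up results from a test recognition run over known training tokens.
test_results = [
    (False, "<hello>"),      # imposter caught by a decoy
    (False, "<hello>"),
    (True, "<hello>"),       # in-vocabulary token lost to a decoy
    (True, "<line-noise>"),
    (True, "Halifax"),       # ordinary recognition, no decoy involved
]
scores = score_decoys(test_results, ["<hello>", "<line-noise>"])
```

Under this assumed weighting, `<hello>` ends with a positive score and `<line-noise>` with a negative one, so only the former would be a candidate to keep.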
Preferably, the method comprises th
