Reexamination Certificate
2000-03-27
2002-10-29
McFadden, Susan (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S270000
active
06473734
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to automatic speech recognition and, more particularly, to a method for improved vocabulary additions to speech interfaces.
BACKGROUND OF THE INVENTION
The object of automatic speech recognition (ASR) systems is to capture an acoustic signal representative of speech and determine the words that were spoken. Speech recognizers typically have a set of stored acoustic and language models, represented as patterns in a computer database, which are the result of stored rules for interpreting the language. These stored models are then compared to the captured signals. The contents of the computer database and the techniques used to determine the best match are distinguishing features of the various types of ASR systems available for use.
An ASR system, however, is only as effective as its ability to recognize words spoken by a user of the system. In small-vocabulary applications, typically of fewer than 50 words, in which the words to be spoken are readily ascertainable and can therefore be rigidly defined, a model can be generated and stored for each of the words in the recognition vocabulary. For large-vocabulary applications, such as applications of more than 1000 words, however, models and recognition algorithms require impracticably large computations. Large-vocabulary applications, therefore, will typically have models for a smaller number of sub-word speech segments, referred to as phonemes, that can be concatenated to produce a model of one or more words. Even so, in most large-vocabulary applications it is not feasible to store models for every conceivable word that a user might use. This is especially true where the words that might be uttered by a user can be expected to be specific to that user, such as names or places that are of particular interest to only that user.
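The concatenation of sub-word models described above can be illustrated with a minimal sketch. The phoneme symbols, the placeholder model strings, and the names `PHONEME_MODELS`, `LEXICON`, and `build_word_model` are hypothetical and serve only as illustration; a real recognizer would store acoustic models rather than strings.

```python
# Hypothetical phoneme inventory: each sub-word unit maps to a placeholder
# standing in for a stored acoustic model.
PHONEME_MODELS = {p: f"model<{p}>" for p in ["k", "ah", "m", "iy", "l", "b", "aa"]}

# Hypothetical pronunciation lexicon: each word is a sequence of phonemes.
LEXICON = {
    "camille": ["k", "ah", "m", "iy", "l"],
    "bob": ["b", "aa", "b"],
}


def build_word_model(word):
    """Concatenate the stored phoneme models into a model for one word."""
    return [PHONEME_MODELS[p] for p in LEXICON[word]]
```

In this sketch seven phoneme models suffice for any word the lexicon can spell, but a word absent from `LEXICON` still cannot be modeled, which mirrors the limitation discussed above.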
In applications subject to some degree of personalization by the user, such as large-vocabulary applications, in which it is anticipated and even expected that the user will at some point utter a word that is not recognized by the ASR system, there needs to be a mechanism in place for the user to dynamically add, at will, the desired, unknown word to the vocabulary of the ASR system. Often, the user can enter the word to be added through some type of user interface other than voice recognition, such as a touch screen or a keypad of the device running the application. This approach may not be acceptable, however, in applications in which the only user interface to the application is via the ASR system. Such is the case in applications, such as radio wireless devices and other communication devices, that have undergone miniaturization to the point where communication with the user through traditional, non-speech recognition-based means like keyboards, keypads, etc. is no longer feasible or desirable. For these applications the user must use speech to add the new word to the ASR vocabulary, except in the unlikely event that the user has access to a computer with a grammar utility. The typical approach for a user to add a new word or words to an ASR-based application using speech is for the user to spell the new word by saying individual letters. This approach, however, is tedious, time-consuming, and prone to error.
The difficulty of this approach for adding a word that is not part of the grammar of the ASR system of the application being used is illustrated by the following example. Suppose that a user of a radio wireless communication device having ASR capabilities receives a call from a person named Camille; Camille, perhaps the name of a friend of the user, is a word not recognized by the ASR system of the radio wireless communication device. After terminating the call, the user wishes to add Camille and her phone number to the phone book function of the wireless communication device by saying something like, “Add Camille to my phone book.” This is not possible, however, since the word “Camille” is not yet part of the speech recognizer's vocabulary and thus the communication device will not recognize the word “Camille.” The recognizer will most likely either erroneously choose a name already in the vocabulary that sounds similar to Camille or reject the word as unrecognized. And while the user may add the word “Camille” to the recognizer's vocabulary by spelling the word, this is a time-consuming, error-prone, and tedious task that is not particularly user-friendly.
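The two failure modes described in this example, choosing a similar-sounding entry or rejecting the word, can be sketched as follows. This is an illustration under stated assumptions: string similarity from Python's `difflib` stands in for acoustic matching, and the function name `recognize`, the threshold of 0.8, and the vocabularies are hypothetical.

```python
from difflib import SequenceMatcher


def recognize(heard, vocabulary, threshold=0.8):
    """Return the closest vocabulary entry, or None to signal rejection.

    String similarity is only a stand-in for acoustic similarity here.
    """
    best_word, best_score = None, 0.0
    for word in vocabulary:
        score = SequenceMatcher(None, heard, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    # Accept the best candidate only if it is similar enough.
    return best_word if best_score >= threshold else None
```

With the hypothetical vocabulary `["camilla", "carl", "bob"]`, `recognize("camille", ...)` returns `"camilla"`, a similar but wrong entry; with `["carl", "bob"]` no entry clears the threshold and the function returns `None`.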
U.S. Pat. No. 5,724,481 to Garberg et al. describes a method for automatic speech recognition of arbitrarily spoken words in which the user must enter a piece of information related to the information to be added to the recognizer's vocabulary. If the user wanted to enter the name “Camille,” for instance, and the user knew that the application maintained a relationship between Camille's name and her phone number, then the user could say “Add 555 1212 to my phone book.” This would cause the application to add the name “Camille” to the phone book, assuming that 555 1212 is the phone number of Camille. A drawback of this approach, however, is that it requires user interaction and knowledge. The user could not have caused the name “Camille” to be added to the vocabulary if he had not known her phone number. Additionally, the related information required, in this case, the phone number, is different for each word or phrase to be added and so this approach is difficult to remember and use. Moreover, this approach requires that the related information for each word that could possibly be added must already be in the recognizer's vocabulary. This requirement, of course, significantly increases the vocabulary size.
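The related-information approach, as summarized in the paragraph above, can be sketched as a simple lookup. This is not the implementation of the Garberg et al. patent; the table `RELATED_INFO` and the function `add_by_related_info` are hypothetical names used only to make the stated drawbacks concrete.

```python
# Hypothetical table of related information that the application already
# maintains; every key must itself be recognizable by the recognizer.
RELATED_INFO = {"555 1212": "Camille"}


def add_by_related_info(spoken_info, vocabulary):
    """Add the word tied to `spoken_info`; return it, or None if unknown."""
    word = RELATED_INFO.get(spoken_info)
    if word is not None:
        vocabulary.add(word)
    return word
```

A user who does not know the key "555 1212" has no way to add "Camille" in this sketch, and each new word needs its own key in the table.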
U.S. Pat. No. 5,797,116 to Yamada et al. describes a method for adding unknown words by performing natural language processing on an input sentence to determine the part of speech of the word to be added. This information is used to create a query to the user for information that more narrowly defines the scope of the unknown word. The user's response to the query is contained within a supplemental database that has a high probability of containing the word to be added. Next, acoustical matching is performed to determine the entry in the supplemental database that is closest to the unknown word, and this entry is used for the unknown word. The approach of the Yamada et al. patent has several shortcomings. First, natural language processing is required, which increases the overhead associated with the speech recognition process. Second, a supplemental database containing the user's answers to queries must be maintained. Third, the method requires human intervention by the user in order to add a word.
In light of the foregoing, there is an unmet need in the art for a user of an application that utilizes a speech recognition interface to be able to add new words to a vocabulary of the interface in a manner that overcomes these various shortcomings, is user-friendly, minimizes error, and is not time-consuming.
REFERENCES:
patent: 5724481 (1998-03-01), Garberg et al.
patent: 5797116 (1998-08-01), Yamada et al.
patent: 5867495 (1999-02-01), Elliott et al.
Young, Steve, “A Review of Large-Vocabulary Continuous-Speech Recognition,” IEEE Signal Processing Magazine, Sep. 1996.
“Language Presents Devilishly Tough Challenges for Computers,” Wildfire Communications, Inc., Lexington, MA 02421, Sep. 7, 1999.
McFadden Susan
Miller Johnson Snell & Cummiskey, P.L.C.
Motorola Inc.