Reexamination Certificate
1998-08-14
2001-07-31
Hudspeth, David (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S276000, C704S239000
active
06269335
ABSTRACT:
CROSS REFERENCE TO RELATED APPLICATIONS
This application is being filed concurrently with U.S. patent application Ser. No. 09/134,582 entitled “APPARATUS AND METHODS FOR IDENTIFYING POTENTIAL ACOUSTIC CONFUSIBILITY AMONG WORDS IN A SPEECH RECOGNITION SYSTEM” and U.S. patent application Ser. No. 09/134,259 entitled “APPARATUS AND METHODS FOR REJECTING CONFUSIBLE WORDS DURING TRAINING ASSOCIATED WITH A SPEECH RECOGNITION SYSTEM”.
BACKGROUND OF THE INVENTION
The invention relates to speech recognition and, more particularly, to apparatus and methods for identifying homophones among words in a speech recognition system.
It is generally very difficult to identify which words in an existing vocabulary of a speech recognition engine are or may be confusible with other words in the vocabulary. That is, when a user utters one word that the speech recognizer has been trained to decode, it is possible that the speech recognizer will output the wrong decoded word. This may happen for a variety of reasons, but one typical reason is that the word uttered by the speaker is acoustically similar to other words considered by the speech recognition engine. Mistakes are committed at the level of the output of the recognizer, by misrecognizing a word or dropping a word from an N-best list which, as is known, contains the top N hypotheses for the uttered word.
In addition, with the advent of large vocabulary name recognition employing speech (e.g., a voice telephone dialing application), the problem of resolving which particular spelling of a word was intended by the speaker, when many possible spellings exist within the vocabulary, has added to the difficulty. For example, the two spellings of “Gonzalez” and “Gonsalez” result in similar but perhaps not the same baseforms, as shown below:
GONZALEZ | G AO N Z AO L EH Z
GONSALEZ | G AO N S AO L EH Z
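To make "similar but perhaps not the same" concrete, the two baseforms can be compared phone by phone. The following is a minimal sketch that assumes a plain Levenshtein (edit) distance over phone symbols; the text above does not name this measure, and the function name `phone_edit_distance` is hypothetical.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance computed over phone symbols rather than characters.

    This is an illustrative stand-in for an acoustic similarity measure,
    not the measure used by the invention.
    """
    prev = list(range(len(b) + 1))
    for i, phone_a in enumerate(a, start=1):
        curr = [i]
        for j, phone_b in enumerate(b, start=1):
            cost = 0 if phone_a == phone_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Baseforms copied from the example above.
GONZALEZ = "G AO N Z AO L EH Z".split()
GONSALEZ = "G AO N S AO L EH Z".split()

print(phone_edit_distance(GONZALEZ, GONSALEZ))  # 1 (only Z vs. S differs)
```

A symbol-level distance treats every substitution as equally costly; a real recognizer would more plausibly weight Z/S as closer than, say, Z/AO.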
Furthermore, many words result in the same baseforms, which are somewhat arbitrarily treated by the speech recognizer. This creates a problem that is often tackled by hand editing the entire vocabulary file, prior to any real-time decoding session, to attempt to remove such potential problems. However, this hand-editing method is not possible if large lists of names are to be automatically incorporated into the vocabulary of the speech recognizer.
This problem exists in other speech recognition areas and up to now has largely been corrected by using the manual approach or using the context to resolve the correct spelling. For example, the words “to”, “two” and “too” are familiar examples of homonyms, i.e., words which have the same sound and/or spelling but have different meanings. The approach to detect which one of these words was actually meant when uttered by a speaker has traditionally been to use the context around the word. Some recognizers may even be capable of intelligently noting that the distance of the spoken speech to all of these words will be the same and thus may prevent such extra scoring by first noting that all three may have the same baseform.
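The idea of "first noting that all three may have the same baseform" can be sketched as a grouping step over the lexicon. This is an illustration only: the phone strings (`T UW`, `T EH N`) and the function name `group_by_baseform` are assumptions, not taken from the text.

```python
from collections import defaultdict


def group_by_baseform(lexicon):
    """Return only those baseforms that are shared by more than one spelling."""
    groups = defaultdict(list)
    for word, baseform in lexicon.items():
        groups[tuple(baseform.split())].append(word)
    return {bf: words for bf, words in groups.items() if len(words) > 1}


# Hypothetical lexicon; the baseforms are illustrative assumptions.
lexicon = {"to": "T UW", "two": "T UW", "too": "T UW", "ten": "T EH N"}

shared = group_by_baseform(lexicon)
print(shared)  # {('T', 'UW'): ['to', 'two', 'too']}
```

With such a table, a recognizer could score the shared baseform once and defer the choice among spellings to context or to the user.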
U.S. Pat. No. 4,468,756 to Chan discloses a method for processing a spoken language of words corresponding to individual, transcribable character codes of complex configuration which includes displaying a set of homonyms corresponding to a set of homonym set identifying codes. However, these homonyms and related codes are previously classified and stored in files in accordance with known rules of the particular spoken language (e.g., it is known that in Chinese, approximately 230 characters, among the approximately 2700 basic characters, are classified as homonyms). Then, whenever the spoken word corresponds to a word which was previously classified as a homonym, the method discloses using the code to access the homonym file and then displaying the known homonyms from that file. However, the Chan method is disadvantageously inflexible in that it is limited to the pre-stored classified homonyms. Therefore, among other deficiencies, the Chan method cannot perform real-time identification of words in a vocabulary that are acoustically similar to an uttered word and thus cannot display words that are not otherwise pre-classified and stored as homonyms.
Accordingly, it would be highly advantageous to provide methods and apparatus for substantially lowering the decoding error rate associated with a speech recognizer by providing an automatic real-time homophone identification facility for resolving the intended word in cooperation with the user without regard to known homophone rules of any particular spoken language. It would also be highly advantageous if the results of the homophone identification facility could be used in an off-line correction mode.
Further, it would be highly advantageous to use the output of the homophone identification facility to add homophones to the N-best list produced by the speech recognizer. The list could then be used for re-scoring, both acoustic and language model, or error correction in dictation applications.
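Adding homophones to the N-best list might look like the sketch below. It assumes the homophone identification facility has already produced a word-to-homophones table; the name `extend_nbest`, the table layout, and the sample words are hypothetical.

```python
def extend_nbest(nbest, homophone_table):
    """Append homophones of each hypothesis, keeping order and avoiding duplicates."""
    extended = list(nbest)
    seen = set(nbest)
    for word in nbest:
        for homophone in homophone_table.get(word, ()):
            if homophone not in seen:
                seen.add(homophone)
                extended.append(homophone)
    return extended


# Hypothetical recognizer output and homophone table.
nbest = ["two", "tool"]
homophone_table = {"two": ["to", "too"], "tool": ["two"]}

print(extend_nbest(nbest, homophone_table))  # ['two', 'tool', 'to', 'too']
```

The original hypotheses keep their ranking; the appended entries are simply candidates that a later acoustic or language model re-scoring pass could reorder.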
SUMMARY OF THE INVENTION
This invention provides methods and apparatus for automatically identifying homophones in a speech recognition engine vocabulary in response to a word uttered by a speaker and preferably providing means for a user (e.g., speaker) to resolve the intended word from the identified homophones. It is to be appreciated that the present invention applies to the identification not only of homonyms (acoustically similar words) but also of the more general category of acoustically similar sounds known as homophones. Accordingly, it is to be understood that the term homophone, as referred to herein, includes acoustically similar single and multiple phone words as well as individual phones themselves, whereby the words or phones may or may not have meanings.
In one aspect of the invention, a method of identifying homophones of a word uttered by a user from at least a portion of existing words of a vocabulary of a speech recognition engine comprises the steps of: decoding the uttered word to yield a decoded word; computing respective measures between the decoded word and at least a portion of the other existing vocabulary words, the respective measures indicative of acoustic similarity between the decoded word and the at least a portion of other existing words; and identifying, as homophones of the uttered word, the other existing words associated with measures which correspond to a threshold range.
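The compute-and-threshold steps of this method can be sketched as follows, starting from an already decoded word. This is not the claimed implementation: `toy_measure` is a deliberately crude stand-in for the acoustic similarity measure, the threshold range is assumed to be "distance at most `max_distance`", and the SMITH baseform is an illustrative assumption.

```python
def toy_measure(a, b):
    """Positional phone mismatches plus length difference.

    A stand-in only; the source does not define its measure this way.
    """
    mismatches = sum(pa != pb for pa, pb in zip(a, b))
    return mismatches + abs(len(a) - len(b))


def identify_homophones(decoded_word, lexicon, measure=toy_measure, max_distance=1):
    """Return the other vocabulary words whose measure falls in [0, max_distance]."""
    target = lexicon[decoded_word]
    return [word for word, baseform in lexicon.items()
            if word != decoded_word and measure(target, baseform) <= max_distance]


# Hypothetical vocabulary; the first two baseforms come from the Background example.
LEXICON = {
    "GONZALEZ": "G AO N Z AO L EH Z".split(),
    "GONSALEZ": "G AO N S AO L EH Z".split(),
    "SMITH": "S M IH TH".split(),
}

print(identify_homophones("GONZALEZ", LEXICON))  # ['GONSALEZ']
```

Passing the measure in as a parameter keeps the thresholding logic independent of how similarity is actually computed.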
The method also preferably includes the step of indicating, in real-time, to the user the identified homophones. The manner of indicating the identified homophones may include displaying the results to the user on a CRT display or speech synthesizing the results via a text-to-speech (TTS) system in order to produce a spoken version of the results. The user then preferably makes a selection depending on the word the user intended to utter. He may choose the word he uttered, one of the homophones, or he may choose to utter a new word. The selection may be accomplished in a variety of manners. For example, if the results are displayed on a CRT display, the user may make his selection on the screen using any type of input device (e.g., mouse, keyboard, touchscreen). The input device may also be a microphone which permits the user to utter his selections.
It is to be appreciated that the TTS embodiment is preferable in speech recognition applications in telephony environments. For instance, such an embodiment is adaptable to IVR (interactive voice response) and directed initiative systems, where a prompted dialog naturally exists between a user and a machine, e.g., a call center IVR for order taking or form filling, like a retail catalog/ordering system. In a voice name dialing application, the user may provide a request and the TTS system permits the recognition system to provide a response such as: “Do you mean John Smith from Manhattan or John Schmidt from Manhattan?”
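A prompt of this kind could be assembled from the decoded word and its identified homophones before being handed to a TTS system. The sketch below covers only the text assembly; the function name `confirmation_prompt` is hypothetical and no particular TTS interface is assumed.

```python
def confirmation_prompt(candidates):
    """Build a spoken-style disambiguation question from candidate entries."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    if len(candidates) == 1:
        return f"Do you mean {candidates[0]}?"
    head = ", ".join(candidates[:-1])
    return f"Do you mean {head} or {candidates[-1]}?"


prompt = confirmation_prompt(
    ["John Smith from Manhattan", "John Schmidt from Manhattan"]
)
print(prompt)
# Do you mean John Smith from Manhattan or John Schmidt from Manhattan?
```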
Also, rather than provide the user with the results of the homophone identification process at the time of utterance, the present invention preferably provides storing the resu
Ittycheriah Abraham
Maes Stephane Herman
Monkowski Michael Daniel
Sorensen Jeffrey Scott
F. Chau & Associates LLP
Hudspeth David
International Business Machines - Corporation
Wieland Susan