Reexamination Certificate
1999-09-13
2001-12-25
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S257000
active
06334102
ABSTRACT:
CROSS REFERENCE TO RELATED APPLICATIONS
(Not Applicable)
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
(Not Applicable)
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to the field of speech recognition software and more particularly to an improved method of adding vocabulary to a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Improvements to speech dictation systems provide an important way to enhance user productivity.
Currently within the art, speech recognition systems possess a finite set of recognizable vocabulary words. These systems model and classify acoustic signals to form acoustic models, which are representations of basic linguistic units such as phonemes. From an acoustic analysis, speech recognition systems derive a list of potential word candidates for a given series of acoustic models. The potential word candidates are ordered from the most likely user intended word to the least likely. Next, the speech recognition system performs a contextual analysis between a language model, each potential word candidate, and the most recent words derived by the speech recognition system. The system may determine that although the first word candidate is the closest acoustic match to the user utterance, it does not fit the context of the text being dictated. The second word candidate, though not a perfect acoustic match to the user utterance, may more closely match the context of the text being dictated by the user. The system then makes a determination as to which word candidate is the correct user intended word.
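The contextual analysis described above can be illustrated with a minimal sketch. The source does not state how acoustic and contextual evidence are combined, so the additive score, the `lm_weight` parameter, the function name `pick_candidate`, and all numeric values below are assumptions chosen only for illustration.

```python
def pick_candidate(candidates, context, trigram_prob, lm_weight=1.0):
    """Pick the word whose combined acoustic and contextual score is highest.

    candidates   -- list of (word, acoustic_score) pairs, best acoustic match first
    context      -- tuple holding the two most recently recognized words
    trigram_prob -- dict mapping 3-word tuples to a probability (toy values)
    lm_weight    -- hypothetical weight given to the language model term
    """
    def combined(item):
        word, acoustic = item
        # Trigrams absent from the model contribute a probability of zero.
        return acoustic + lm_weight * trigram_prob.get(context + (word,), 0.0)

    return max(candidates, key=combined)[0]


candidates = [("hear", 0.6), ("here", 0.5)]       # "hear" is the closer acoustic match
context = ("come", "over")
trigram_prob = {("come", "over", "here"): 0.4}    # "come over hear" was never observed
chosen = pick_candidate(candidates, context, trigram_prob)
```

In this toy case the second candidate wins because it fits the context, even though the first candidate is the closer acoustic match.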
The language model used within the speech recognition system comprises statistical models. Such statistical models, or language model statistics, are one-, two-, and three-word groupings called unigrams, bigrams, and trigrams respectively, wherein each unigram, bigram, and trigram has an associated frequency. For example, trigrams can be formed by taking each word in a large corpus of text, called a training corpus, and constructing all possible three-word sequences. The system can observe the frequency of each trigram that appears in the training corpus. This observed frequency is a measure of trigram probability. Trigrams that do not appear in the training corpus result in a trigram probability of zero. Unigrams, bigrams, and trigrams that do appear in the training corpus can be assigned corresponding frequency values.
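As a minimal sketch of how such language model statistics might be gathered, the following counts unigrams, bigrams, and trigrams in a tokenized training corpus. It assumes that the groupings are contiguous word sequences, which is the usual practice; the function name `count_ngrams` and the sample sentence are illustrative and do not come from the source.

```python
from collections import Counter


def count_ngrams(tokens, max_n=3):
    """Count every contiguous n-gram of length 1..max_n in a list of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = "the flight to Heathrow in England left Heathrow in time".split()
stats = count_ngrams(corpus)

# A Counter reports zero for any grouping that never appeared in the corpus,
# which mirrors the zero probability assigned to unseen trigrams.
unseen = stats[("Laguardia", "in", "England")]
```

Here `stats[("Heathrow",)]` and `stats[("Heathrow", "in")]` are both 2, while the trigram `("Heathrow", "in", "England")` is observed once.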
A word has no language model statistics when it has been left out of the training corpus. To add such a word to a speech recognition system, the user can analyze another training corpus to develop unigrams, bigrams, trigrams, and frequency data for the word. The user must develop these language model statistics before adding the word to the speech recognition system vocabulary. Alternatively, the user can edit each document that will contain the word by manually inserting the word in the document. Although this process can function relatively well when editing a small file or a small number of files, it is cumbersome for persons who build specialized speech recognition vocabularies for different topics such as medical, legal, and travel. Such users deal with thousands of files. Moreover, the files can be too large for conventional editors.
The disadvantage is further compounded when the word to be added to the system behaves in the same or similar manner as another word recognizable to the system. In this case, developing language model statistics wastes time because the resulting information will differ only slightly from the language model statistics corresponding to the recognizable word. For example, if a user wants to add the word “Laguardia” to reference the airport located in New York, the user must develop language model statistics for “Laguardia”. In this case, rather than developing completely new statistical information, the language model statistics for “Laguardia” can be based upon existing language model statistics for the word “Heathrow” in reference to the airport located in London.
Currently, a method of adding new words to speech recognition systems utilizing class files exists in the art. Class files allow the user to generate a file of words with similar properties. An example of a class file is a list of airport names. After the class file is created, the speech recognition system removes each word of the class file from the language model, replacing it with a reference to the class file. For example, if a class file called “airport” contained “O'Hare”, “Heathrow”, and “Laguardia”, the system would remove all occurrences of those specific airport names contained in the class file “airport” from the language model. Each occurrence of a member of the class file would be replaced with the reference “[airport]”. As a result, the trigram “Heathrow in England” would be changed to “[airport] in England”.
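The class file substitution described above can be sketched as follows. The source does not say how frequencies of merged entries are handled, so summing the counts of n-grams that become identical after substitution is an assumption, as are the function name `apply_class` and the sample counts.

```python
from collections import Counter


def apply_class(ngram_counts, class_name, members):
    """Replace every class member in each n-gram with a "[class_name]" reference."""
    tag = f"[{class_name}]"
    merged = Counter()
    for ngram, count in ngram_counts.items():
        replaced = tuple(tag if word in members else word for word in ngram)
        merged[replaced] += count  # n-grams that collapse together share one entry
    return merged


airport = {"O'Hare", "Heathrow", "Laguardia"}
language_model = {
    ("Heathrow",): 5,
    ("O'Hare",): 2,
    ("Heathrow", "in", "England"): 3,
}
classed = apply_class(language_model, "airport", airport)
```

After substitution the trigram reads `("[airport]", "in", "England")`, and the separate unigram counts for "Heathrow" and "O'Hare" collapse into a single `("[airport]",)` entry, so the per-word information is no longer recoverable from the model.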
Although words can be added to the speech recognition system vocabulary in this manner, class files neither incorporate frequency data nor ensure contextual accuracy. Consequently, although the context of a trigram may clearly indicate an airport in England, the airport “Laguardia” located in New York is as likely a candidate as “Heathrow” to the speech recognition system. The lack of word frequency data and the lack of a method of ensuring contextual accuracy within class files can result in nonsensical trigrams such as “Laguardia in England”. The user has no way of avoiding such a nonsensical outcome and no way to check for contextual accuracy. As a result, there has arisen a need for a more efficient way to add new vocabulary words to speech recognition systems.
SUMMARY OF THE INVENTION
The invention concerns a method and a system for adding new vocabulary by using language model statistics corresponding to an existing vocabulary word. The method of the invention involves a plurality of steps. First, the system receives a user input identifying a first word for which no language model statistics exist in the speech recognition system. The first word is for inclusion within the existing vocabulary of the speech recognition system. In response to a second user input identifying a second word for which language model statistics exist in the speech recognition system, the system recalls from a computer memory the language model statistics for the second word. Then the system automatically creates language model statistics for the first word by duplicating the language model statistics of the second word and replacing each occurrence of the second word in the duplicated language model statistics with the first word.
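The duplication and replacement step can be sketched in a few lines. This is only an illustration under stated assumptions: the language model is represented as a dictionary from word tuples to frequency values, and the function name `clone_statistics` and the sample data are hypothetical.

```python
def clone_statistics(language_model, existing_word, new_word):
    """Create statistics for new_word by copying those of existing_word.

    Every n-gram that contains existing_word is duplicated, and each occurrence
    of existing_word in the copy is replaced with new_word. The original
    language model is left unchanged.
    """
    created = {}
    for ngram, frequency in language_model.items():
        if existing_word in ngram:
            copy = tuple(new_word if w == existing_word else w for w in ngram)
            created[copy] = frequency
    return created


language_model = {
    ("Heathrow",): 10,
    ("to", "Heathrow"): 4,
    ("flight", "to", "Heathrow"): 2,
    ("to", "London"): 7,
}
new_stats = clone_statistics(language_model, "Heathrow", "Laguardia")
```

The result holds one unigram, one bigram, and one trigram for "Laguardia", each carrying the frequency value of its "Heathrow" counterpart. Entries that do not mention "Heathrow" are not copied.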
The speech recognition system receives a user input specifying a relative frequency of the first word in relation to the second word. Next the system automatically updates the language model statistics for the first word by modifying frequency values in the language model statistics for the first word according to the user specified relative frequency of the first word.
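One plausible reading of this update is a simple multiplication of each frequency value by the user-specified ratio. The source does not give the formula, so the multiplicative rule, the function name `scale_frequencies`, and the sample values are assumptions.

```python
def scale_frequencies(stats, relative_frequency):
    """Scale each frequency value by the new word's frequency relative to the existing word."""
    return {ngram: freq * relative_frequency for ngram, freq in stats.items()}


# Statistics for the new word, as copied from the existing word (toy values).
copied = {("Laguardia",): 10, ("to", "Laguardia"): 4}

# The user states that the new word occurs half as often as the existing word.
updated = scale_frequencies(copied, 0.5)
```

With a relative frequency of 0.5, the unigram frequency drops from 10 to 5.0 and the bigram frequency from 4 to 2.0.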
In one aspect of the invention, the system presents the user with at least one of a bigram and trigram from the language model statistics for the first word in a user readable format. Then the system receives user input specifying modifications to the bigrams and trigrams from the language model statistics for the first word, for inclusion in a language model of the speech recognition system.
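A minimal sketch of this review step follows. The presentation format, the function names `format_for_review` and `drop_ngrams`, and the choice to model a user modification as the rejection of an n-gram are all assumptions made for illustration; the source does not specify them.

```python
def format_for_review(stats):
    """Render the bigrams and trigrams as numbered, human-readable lines."""
    items = sorted((ng, f) for ng, f in stats.items() if len(ng) in (2, 3))
    return [
        f"{i}: {' '.join(ng)} (frequency {f})"
        for i, (ng, f) in enumerate(items, 1)
    ]


def drop_ngrams(stats, rejected):
    """Return the statistics without the n-grams the user rejected."""
    return {ng: f for ng, f in stats.items() if ng not in rejected}


stats = {
    ("Laguardia",): 5,
    ("Laguardia", "in", "England"): 1,
    ("to", "Laguardia"): 2,
}
lines = format_for_review(stats)
reviewed = drop_ngrams(stats, {("Laguardia", "in", "England")})
```

In this example the user sees the copied trigram "Laguardia in England" in the list and rejects it, so it is not carried into the language model.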
Each of the user inputs received by the system can be in the form of a spoken utterance, and the first word and the second word can be related in meaning. Within the system, the language model statistics for the second word are comprised of each unigram, bigram, and trigram containing the second word and a frequency value for each
Lewis James R.
Ortega Kerry A.
Akerman & Senterfitt
Dorvil Richemond
International Business Machines Corp.