Method and apparatus for expanding the vocabulary of a...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S251000, C704S270000

active

06801893

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates in general to improved speech systems. More particularly the present invention relates to a method and apparatus for adding new words with yet unseen spellings and pronunciations to the vocabulary of a speech system.
BACKGROUND OF THE INVENTION
Today's speech recognition systems, such as “command and control” or “dictation” systems, all typically contain predefined vocabularies, consisting of words, their pronunciations and some model of the usage of these words described by a language model. State-of-the-art systems may contain several tens of thousands of such entries which are used at runtime to determine what is being said.
Regardless of the size of the vocabulary, all systems suffer from the fact that they offer only a limited, fixed vocabulary to the user. The fact that commercially available systems typically contain only full-form vocabularies (i.e., they do not separately model the morphology of the language) further limits the effective scope of today's vocabularies. This is especially limiting for highly inflected languages such as French, German or the Slavic languages. Consequently, almost every user will need to add their own special terms, names or expressions to this vocabulary to fit their individual needs. Being able to extend the base vocabulary with specific terms thus becomes an important issue and a frequent activity when using speech recognition systems. In principle, language vocabularies have to be viewed as “open or living systems” which can never comprise all possible words of a given language; in addition, technical limitations (storage requirements and processing load) put this goal even further out of reach. Thus the methodology and quality of the process for extending a vocabulary with new words is an important success factor for speech systems.
The pronunciations of words in a vocabulary are typically stored as phonetic transcriptions (be it phonemes, sub-phonemes or combinations of phonemes). Adding new words to the vocabulary requires the generation of such phonetic transcriptions (pronunciations) to allow for the subsequent recognition of these words. It is imperative that a speech recognition system build adequate acoustic models for these new words, as recognition accuracy is strongly dependent on the quality of these models. Generating inadequate models is likely to result in degraded overall performance and lower recognition accuracy of the system. Therefore, any improvement of the methodology and quality of this extension process is of great importance.
According to the current state of the art, a word is typically added to the system by having the user type in the new word and constructing, from the spelling (and most often a sound sample, i.e., the user pronouncing the new word), a new acoustic pattern to be used in future recognition. An algorithmic or statistical system, broadly called a “Letter-to-Sound System” (LTS), is used to derive the most likely pronunciation(s) of the sequence of letters composing the orthographic representation of the word. In general, a Letter-to-Sound System maps individual letters or combinations of letters to a sequence of phonemes which match their pronunciation. Frequently, a statistical approach is used to generate such systems. Important examples of the statistical approach are CARTs (classification and regression trees). The results generated by an LTS are then combined with the acoustics provided by the user to generate the actual pronunciation(s). A detailed description of one example of how a statistical system may be employed for this task is taught by J. M. Lucassen and R. L. Mercer, “An Information Theoretic Approach to the Automatic Determination of Phonemic Baseforms,” Proc. of ICASSP-84, 42.5.1-42.5.4, 1982, the disclosure of which is incorporated by reference herein.
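The letter-to-phoneme mapping described above can be illustrated with a minimal sketch. It is not the statistical (CART-based) approach mentioned in the text; it substitutes a small hand-written rule table with greedy longest-match lookup, purely to show what "mapping letters or combinations of letters to phonemes" looks like in practice. The rule table, the phoneme symbols and the function name `letters_to_phonemes` are all illustrative assumptions.

```python
# Hypothetical toy rule table: letter sequences -> phoneme symbols.
# A real LTS would learn such mappings statistically and in context.
RULES = {
    "ch": ["CH"], "sh": ["SH"], "th": ["TH"], "ee": ["IY"],
    "a": ["AE"], "c": ["K"], "e": ["EH"], "h": ["HH"], "i": ["IH"],
    "p": ["P"], "s": ["S"], "t": ["T"],
}


def letters_to_phonemes(word, rules=RULES):
    """Greedy longest-match conversion of a spelling into phoneme symbols."""
    word = word.lower()
    max_len = max(len(key) for key in rules)
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest letter group first, then shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                phonemes.extend(rules[chunk])
                i += n
                break
        else:
            # No rule covers this letter: mark it as unknown.
            phonemes.append("?")
            i += 1
    return phonemes
```

With this table, `letters_to_phonemes("speech")` yields `["S", "P", "IY", "CH"]`, a plausible result for a regularly spelled word. An acronym such as "IEEE" yields `["IH", "IY", "EH"]`, which does not correspond to how the acronym is actually spoken; spellings of this kind are where purely spelling-driven conversion produces poor pronunciations.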
Frequently, however, the words added are words derived from a foreign language, customers' names, acronyms, or technical terms that generally do not obey the pronunciation rules of the language per se. This is likely to result in inferior pronunciations being generated, which will cause frequent misrecognitions when running the system, thus degrading the overall performance and quality of the speech system. Sophisticated systems may detect that the acoustics provided (for instance, by the user pronouncing the word) do not match the generated candidate pronunciations and prompt the user for some additional input. However, since users of these systems usually are not phoneticians or even versed in phonetics, it is important, from both a usability and an efficacy point of view, to limit their involvement in the generation of these pronunciations to a minimum.
Some systems allow the user to specify a “sounds-like-spelling” (SLS) pattern (a pseudo-spelling of the word that corresponds to the way the word is pronounced in the given language, like “eye-triple-ee” in English for the word “IEEE”) to support this process. This approach puts the onus on the user to determine whether or not the word to be added follows the standard pronunciation rules, and to provide an alternative spelling that does. These rules are not clearly defined and may even vary within subdomains of a language. This approach tends to break down with users who are not very careful, not very familiar with the language and/or domain, or not very well versed in phonetics.
Letter-to-Sound Systems are also used in various other applications of speech systems, such as speech synthesis of words that are not in the basic lexicon. Like speech recognition systems, these “text-to-speech” synthesis systems (TTS) are faced with a similar difficulty when trying to generate the pronunciation of a word that is not in their basic lexicon.
To demonstrate the urgency of improvements in this area, reference is made for instance to the description of the “Angie” framework (an example of a Letter-to-Sound System) in Aarati D. Parmar, Master's Thesis, MIT 97, A Semi-Automatic System for the Syllabification and Stress Assignment of Large Lexicons, available at: http://www.sls.lcs.mit.edu/sls/publications/index.html. In this experiment, on the TIMIT database, 10 words out of 2500 (0.4%) failed to generate a correct pronunciation because of “irregular spelling” or “failed letter rules.” And this test set does not even include acronyms or similar items, which are likely to be encountered in everyday business environments.
SUMMARY OF THE INVENTION
The present invention provides an improved method and apparatus for adding new words with yet unseen spellings and pronunciations to a vocabulary of a speech system.
In one aspect of the invention, a computerized method is provided for adding a new word to a vocabulary of a speech system, the vocabulary comprising words and corresponding acoustic patterns for a language or language domain. Within a determination step for the new word, a regularity value is determined which measures the conformity with respect to the pronunciation in the language or language domain. In a comparison step, the regularity value is compared to a threshold value to decide whether the conformity is insufficient. Only in the affirmative case of insufficient conformity, a prompting step is performed, prompting for additional information on the pronunciation of the new word. Finally, in an extension step, the new word and an acoustic pattern of the new word are added to the vocabulary.
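The four steps just described (determination, comparison, prompting, extension) can be sketched as follows. The text above does not say how the regularity value is computed, so this sketch assumes a stand-in measure: the average log-probability of a word's letter transitions under a smoothed letter-bigram model trained on a reference word list, where higher values mean more regular. The function names, the `alphabet_size` default, the threshold and the `prompt_fn` / `build_pattern` callbacks are all hypothetical.

```python
import math
from collections import Counter


def train_letter_bigrams(words):
    """Count letter bigrams, with '^' and '$' as word boundary markers."""
    pair_counts, context_counts = Counter(), Counter()
    for word in words:
        padded = "^" + word.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    return pair_counts, context_counts


def regularity(word, model, alphabet_size=28):
    """Assumed regularity value: mean log-probability per letter transition.

    Uses add-one smoothing; alphabet_size=28 assumes 26 letters plus the
    two boundary markers.
    """
    pair_counts, context_counts = model
    padded = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        p = (pair_counts[(a, b)] + 1) / (context_counts[a] + alphabet_size)
        total += math.log(p)
    return total / (len(padded) - 1)


def add_word(word, vocabulary, model, threshold, prompt_fn, build_pattern):
    """Determination, comparison, optional prompting, then extension."""
    score = regularity(word, model)            # determination step
    extra = None
    if score < threshold:                      # comparison step
        extra = prompt_fn(word)                # prompting step (only if needed)
    vocabulary[word] = build_pattern(word, extra)  # extension step
    return score, extra
```

In use, `prompt_fn` might ask the user for a sounds-like spelling or a spoken sample, and `build_pattern` would produce the acoustic pattern from the spelling plus any extra information. The point of the structure is that the user is only asked for additional input when the measured conformity falls below the threshold.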
The present invention provides an automatic determination of the regularity of a proposed word with respect to the standard pronunciation of the language. This lowers the requirement for attention and skills on the user's part in the extension process of a vocabulary. It is neither left up to the user when additional information concerning the pronunciation of a new word is to be introduced to the speech system, nor is this additional information omitted when it is needed. Otherwise, in both cases, the construction of inferior pronunciation models would be the consequence. As the recognition accura
