Assigning meanings to utterances in a speech recognition system

Data processing: speech signal processing – linguistics – language – Recognition

Reexamination Certificate


Details

C704S243000, C704S244000, C704S251000

Reexamination Certificate

active

06311157

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to speech recognition systems. More specifically, this invention relates to the generation of language models and to the interpretation of speech based upon specified sets of those language models.
2. Background of Related Art
To increase the utility of computer systems, many manufacturers have sought to achieve speaker-independent speech recognition. This technology would allow a computer system to recognize and respond to words spoken by virtually anyone who uses it. Unfortunately, the performance of processors in personal computer systems and the techniques used to implement the technology have typically been inadequate to handle the complexity of such speech recognition tasks.
One problem is simply the complexity of the algorithms used for speech recognition. Even the fastest personal computers have difficulty performing all of the computation required for speech recognition in real time (the time it takes for a human to speak the utterance being recognized), so that there is a noticeable delay between the time the user has finished speaking and the time the computer generates a response. If that time delay is too large, the usefulness and acceptance of the computer system will be greatly diminished.
Another problem with speech recognition systems is accuracy. In general, as the number of utterances that a speech recognition system is programmed to recognize increases, the computation required to perform that recognition also increases, and the accuracy with which it distinguishes among those utterances decreases.
One problem is due to the large vocabulary required for interpreting spoken commands. Such tasks typically require a search of the entire vocabulary in order to determine the words being spoken. For example, this vocabulary may comprise all the words in a specified language, including any specialized words. Such vocabularies must also include plurals and all conjugations of verbs (regular and irregular), among other items, creating a very large vocabulary to be recognized. This requires a very large database search. It also mandates either a very high performance processor or the use of special search techniques. Even assuming all these things, typical prior art search techniques and processors have been inadequate for full “natural language” speech recognition, that is, recognizing speech in the manner in which people normally speak to each other. It is desirable to provide a system which offers some natural language capabilities (e.g., allowing people to speak in the manner in which they might normally speak) yet avoids the overhead associated with full natural language systems.
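The vocabulary-restriction idea described above can be sketched briefly. The fragment below is purely illustrative (the command set and function names are assumptions, not part of the patent): it builds the set of words that can occur in the currently active commands, so a recognizer need only score that small set rather than search an entire language's lexicon.

```python
def build_active_vocabulary(commands):
    """Collect only the words that can appear in the active commands."""
    vocab = set()
    for phrase in commands:
        vocab.update(phrase.split())
    return vocab

# A hypothetical set of currently active spoken commands.
COMMANDS = ["open the file", "close the window", "print the document"]
ACTIVE = build_active_vocabulary(COMMANDS)

def in_active_vocabulary(word):
    # The recognizer would score only words in this restricted set,
    # avoiding a search over the full natural-language vocabulary.
    return word in ACTIVE
```

In this toy form the active vocabulary has only seven distinct words, whereas a full natural-language lexicon would contain tens of thousands.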
Another problem posed by speech recognition systems is dynamically adding words to the recognizable vocabulary based on data contained within the computer. In other words, prior art speech recognition systems have not provided a means for recognizing additional words whose pronunciations are unknown to the system.
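One conceivable approach to the unknown-pronunciation problem is to guess a phoneme sequence from a word's spelling and register it in the active lexicon. The sketch below is a deliberately toy letter-to-sound table (real systems use far richer grapheme-to-phoneme models); all names and the phoneme set are illustrative assumptions, not the patent's method.

```python
def guess_pronunciation(word):
    """Derive a rough phoneme sequence from spelling (toy rules only)."""
    table = {"s": "S", "m": "M", "i": "IH", "t": "T", "h": "HH"}
    # Unknown letters map to "?" so the gap is visible rather than hidden.
    return [table.get(ch, "?") for ch in word.lower()]

lexicon = {}

def add_word(word):
    # Dynamically extend the recognizable vocabulary at run time.
    lexicon[word] = guess_pronunciation(word)

add_word("smith")
```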
Another prior art problem posed by speech recognition systems is the transformation of the spoken commands being recognized into data to be used by the system, or actions to be performed. For example, a person may speak a date as a sequence of many words such as “the third Friday of next month”, while the computer system requires a specific numeric representation of that date, e.g., the number of seconds since Jan. 1, 1900. In summary, prior art speech recognition systems suffer from many deficiencies that prohibit incorporating such technology into non-dedicated devices such as a personal computer.
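The date example above — turning the spoken phrase "the third Friday of next month" into a concrete numeric representation — can be made concrete with standard calendar arithmetic. The function below is an illustrative sketch of that transformation, not the patent's implementation:

```python
import datetime

def third_friday_of_next_month(today=None):
    """Resolve 'the third Friday of next month' into a concrete date."""
    today = today or datetime.date.today()
    # First day of next month (handling the December-to-January rollover).
    year, month = (today.year + 1, 1) if today.month == 12 else (today.year, today.month + 1)
    d = datetime.date(year, month, 1)
    # Advance to the first Friday (weekday() == 4 means Friday).
    d += datetime.timedelta(days=(4 - d.weekday()) % 7)
    # Two more weeks gives the third Friday.
    return d + datetime.timedelta(weeks=2)
```

Once the date is resolved, converting it to a system representation such as seconds since an epoch is straightforward arithmetic.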
SUMMARY AND OBJECTS OF THE INVENTION
One of the objects of the present invention is to provide a means for associating meanings with spoken utterances in a speech recognition system.
Another of the objects of the present invention is to provide an improved method for associating expressions (e.g., actions and variable values) with speech rules in a speech recognition system.
These and other objects of the present invention are provided for by a method and apparatus for assigning meanings to spoken utterances in a speech recognition system. A plurality of speech rules is generated, each of the speech rules comprising a language model and an expression associated with the language model. Upon the detection of speech in the speech recognition system, a current language model is generated from each language model in the speech rules for use by a recognizer. When a sequence of words is received from the recognizer, a set of speech rules which match that sequence of words is determined. Each expression associated with the language model in each of the set of speech rules is evaluated, and actions are performed in the system according to the expressions associated with each language model in the set of speech rules. In various embodiments, language models may reference other language models which also have associated expressions. Each of the expressions for referenced language models is evaluated first, and then the expressions for the language models comprising the speech rules are evaluated. Thus, actions such as variable assignments and commands may be performed according to these speech rules.
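The rule-matching flow described above can be sketched in miniature: each "speech rule" pairs a word-sequence pattern (standing in here for a language model) with an expression evaluated on a match. The pattern syntax, class names, and commands below are illustrative assumptions, not the patent's actual representation.

```python
import re

class SpeechRule:
    def __init__(self, pattern, expression):
        self.pattern = re.compile(pattern)   # toy stand-in for a language model
        self.expression = expression         # action evaluated when the rule matches

    def try_match(self, words):
        m = self.pattern.fullmatch(" ".join(words))
        return self.expression(m) if m else None

# Hypothetical speech rules: a pattern plus an expression producing
# an action or a variable value.
rules = [
    SpeechRule(r"open (\w+)", lambda m: ("OPEN", m.group(1))),
    SpeechRule(r"set volume to (\d+)", lambda m: ("SET_VOLUME", int(m.group(1)))),
]

def interpret(words):
    """Evaluate the expressions of every rule matching the recognized words."""
    return [r for r in (rule.try_match(words) for rule in rules) if r is not None]
```

For example, the recognized word sequence "set volume to 7" would match the second rule and yield a numeric variable assignment rather than raw text.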


REFERENCES:
patent: 4618984 (1986-10-01), Das et al.
patent: 4827520 (1989-05-01), Zeinstra
patent: 4994983 (1991-02-01), Landell et al.
patent: 5027406 (1991-06-01), Roberts et al.
patent: 5033087 (1991-07-01), Bahl et al.
patent: 5046099 (1991-09-01), Nishimura
patent: 5315689 (1994-05-01), Kanazawa et al.
patent: 5384892 (1995-01-01), Strong
patent: 5390279 (1995-02-01), Strong
patent: 0293259 (1988-11-01), None
patent: 0299572 (1989-01-01), None
patent: 0327408 (1989-06-01), None
Schmandt et al., "Augmenting a Window System with Speech Input," Computer, Aug. 1990, 23(8):50-56.
"Integrated Audio-Graphics User Interface," IBM Technical Disclosure Bulletin, Apr. 1991, 33(11):368-71.
Holmes, Speech Synthesis and Recognition, Chapman & Hall, London, UK, 1988, pp. 129-135, 152-153.
Alan Roskiewicz, "BackTalk: Lip Service," A+ Magazine, pp. 60-61 (Feb. 1984).
Mountford, S. Joy, et al., "Talking and Listening to Computers," The Art of Human-Computer Interface Design, pp. 310-334, Addison-Wesley Publishing Co., Inc., Reading, MA (1990).
"Speech Editor," IBM Technical Disclosure Bulletin, vol. 29, no. 10, Mar. 1987, pp. 4512-4514.
Murveit et al., "Integrating Natural Language Constraints into HMM-based Speech Recognition," International Conference on Acoustics, Speech and Signal Processing 90, vol. 1, Apr. 3, 1990, pp. 573-576.
Kitano, "PhiDM-Dialog," Computer, vol. 24, no. 6, Jun. 1992, pp. 36-50.
"Speech Recognition with Hidden Markov Models of Speech Waveforms," IBM Technical Disclosure Bulletin, vol. 34, no. 1, Jun. 1991.
Kai-Fu Lee, Large-Vocabulary Speaker-Independent Continuous Speech Recognition: The SPHINX System, Ph.D. thesis, Carnegie Mellon University, Apr. 18, 1988.
