Reexamination Certificate
2001-03-27
2004-08-03
McFadden, Susan (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S236000, C704S255000
active
06772116
ABSTRACT:
CROSS REFERENCE TO RELATED APPLICATIONS
(Not Applicable)
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
(Not Applicable)
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to the field of speech recognition, and more particularly, to detecting and decoding telegraphic speech within a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted to a set of text words, numbers, or symbols by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Improvements to speech recognition systems provide an important way to enhance user productivity.
Speech recognition systems can model and classify acoustic signals to form acoustic models, which are representations of basic linguistic units referred to as phonemes. Upon receipt of the acoustic signal, the speech recognition system can analyze the acoustic signal, identify a series of acoustic models within the acoustic signal and derive a list of potential word candidates for the given series of acoustic models.
Subsequently, the speech recognition system can contextually analyze the potential word candidates using a language model as a guide. Specifically, the language model can express restrictions imposed on the manner in which words can be combined to form sentences. The language model can express the likelihood of a word appearing immediately adjacent to another word or words. Language models used within speech recognition systems typically are statistical models. A common example of a language model can be an n-gram model. In particular, the bigram and trigram models are exemplary n-gram models typically used within the art.
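As an illustration of the bigram model mentioned above, the following minimal sketch estimates the likelihood of a word given the immediately preceding word by relative frequency. The function name `train_bigram`, the sentence boundary markers, and the two-sentence corpus are assumptions made for this example only; practical systems also apply smoothing, which is omitted here.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function prob(word, prev) estimated by relative frequency."""
    context_counts = Counter()  # how often each word appears as a left context
    bigram_counts = Counter()   # how often each (prev, word) pair appears
    for sentence in sentences:
        # <s> and </s> are hypothetical sentence boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        # Unseen contexts get probability 0.0 (no smoothing in this sketch).
        if context_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob


# Toy grammatical training corpus, invented for illustration.
corpus = ["the boy has pushed the girl", "the girl has pushed the boy"]
p = train_bigram(corpus)

print(p("boy", "the"))     # 0.5
print(p("pushed", "has"))  # 1.0
print(p("pushed", "boy"))  # 0.0
```

Because the toy corpus is grammatical, the model assigns zero likelihood to "pushed" directly following "boy", which is the word sequence a telegraphic utterance would produce.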
Conventional speech recognition system language models are derived from an analysis of a grammatical training corpus of text. A grammatical training corpus contains text which reflects the ordinary grammatical manner in which human beings speak. The training corpus can be processed to determine the statistical and grammatical language models used by the speech recognition system for converting speech to text, also referred to as decoding speech. It should be appreciated that such methods are known in the art and are disclosed in Statistical Methods for Speech Recognition by Frederick Jelinek (The MIT Press, 1997), which is incorporated herein by reference.
Telegraphic expressions are commonly used as newspaper headlines, as bulleted lists in presentations, or in any other place where brevity may be desired. A telegraphic expression is speech that is limited in meaning and produced without inflections or function words. Function words, also called closed-class words, can include determiners such as “a” and “the” and demonstratives such as “this” or “that”. Other closed-class words can include pronouns, except for nominative case pronouns such as “he” and “she”, auxiliary verbs such as “have”, “be”, “will”, and auxiliary verb derivatives. Closed-class words serve the functional purpose of tying open-class words, called content words, together. For example, the closed-class words within the grammatical text phrase, “the boy has pushed the girl”, are “the” and “has”. By removing these closed-class words, the resulting text, “boy pushed girl”, is said to be a telegraphic expression. Notably, closed-class word categories, such as demonstratives and pronouns, typically have a limited number of members. Such words are said to be closed-class words because new function words are rarely added to a language. Accordingly, the number of closed-class words remains fairly constant.
In contrast to closed-class words, open-class words can contain an infinite number of members. Open-class words can include nouns, verbs, adverbs, and adjectives. These words can be invented and added to a language as a need arises, for example when a new technology is invented.
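The distinction between closed-class and open-class words can be sketched in a few lines. The example below removes closed-class words from a grammatical phrase to obtain its telegraphic form. The word list `CLOSED_CLASS` and the function name `to_telegraphic` are assumptions for illustration; the list is deliberately small and not exhaustive.

```python
# Illustrative, non-exhaustive closed-class word list (an assumption of this
# sketch): determiners, demonstratives, and auxiliary verbs with derivatives.
CLOSED_CLASS = frozenset({
    "a", "an", "the",                       # determiners
    "this", "that", "these", "those",       # demonstratives
    "have", "has", "had",                   # auxiliary "have" and derivatives
    "be", "is", "are", "was", "were", "been",
    "will", "would",
})


def to_telegraphic(text):
    """Drop closed-class words, keeping the open-class (content) words in order."""
    return " ".join(w for w in text.split() if w.lower() not in CLOSED_CLASS)


print(to_telegraphic("the boy has pushed the girl"))  # boy pushed girl
```

Because the closed-class inventory stays fairly constant, a fixed lookup set of this kind is a reasonable first approximation, whereas open-class words could not be enumerated this way.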
Human beings can easily and naturally read and speak in terms of telegraphic expressions. Conventional speech recognition systems using grammatical language models, however, can be inaccurate when converting telegraphic speech to text and often introduce errors into the text output. Specifically, because conventional speech recognition systems rely on grammatically based language models, such systems often insert unwanted function words into the textual representation of a received telegraphic user spoken utterance. The unwanted words result in inaccurate decoding of user spoken utterances to text.
SUMMARY OF THE INVENTION
The invention disclosed herein concerns a method and a system for use in a speech recognition system for applying a telegraphic language model to a received user spoken utterance. The user spoken utterance can be converted to text, or decoded, using the telegraphic language model. The invention also can include generating the telegraphic language model from an existing training corpus.
In particular, subsequent to generating a telegraphic language model, the speech recognition system can enable or disable decoding using the telegraphic language model, referred to as telegraphic decoding. The speech recognition system can continually calculate a running average of closed-class word confidence scores. If that average falls below a predetermined threshold value, the speech recognition system can begin decoding received user spoken utterances with a conventional grammatically based language model, referred to as a conventional language model, and a telegraphic language model. The resulting text having the highest confidence score can be provided as output text. If the running average later exceeds the threshold value, the speech recognition system can disable the telegraphic decoding. It should be appreciated that if the system has sufficient computational resources, the mechanism for engaging and disabling telegraphic decoding is not necessary. In that case, for example, the speech recognition system can process all received user spoken utterances using both language models, selecting the resulting text having the highest confidence score. Briefly, a confidence score reflects the likelihood that a particular word candidate accurately reflects the user spoken utterance from which the word candidate was derived.
One aspect of the invention can include a method of selecting a language model in a speech recognition system for decoding received user spoken utterances. The method can include the steps of computing confidence scores for identified closed-class words and computing a running average of the confidence scores for a predetermined number of decoded closed-class words. Based upon the running average, the step of selectively enabling telegraphic decoding to be performed can be included. Notably, telegraphic decoding can be enabled in addition to conventional decoding. Also included can be the step of selectively disabling telegraphic decoding based upon the running average.
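The enabling and disabling of telegraphic decoding based upon the running average can be sketched as follows. This is a minimal sketch under stated assumptions: the class name `TelegraphicSwitch`, the window size, the threshold value, and the choice to wait until the window is full before switching are all hypothetical and are not taken from the specification.

```python
from collections import deque


class TelegraphicSwitch:
    """Tracks a running average of closed-class word confidence scores."""

    def __init__(self, window=5, threshold=0.5):
        # window: the predetermined number of decoded closed-class words.
        # threshold: the predetermined threshold value (hypothetical default).
        self.scores = deque(maxlen=window)
        self.threshold = threshold
        self.telegraphic_enabled = False

    def update(self, confidence):
        """Record one closed-class confidence score and return the new state."""
        self.scores.append(confidence)
        # Assumption: no switching until a full window has been observed.
        if len(self.scores) == self.scores.maxlen:
            average = sum(self.scores) / len(self.scores)
            if average < self.threshold:
                # Low confidence in closed-class words: enable telegraphic
                # decoding in addition to conventional decoding.
                self.telegraphic_enabled = True
            elif average > self.threshold:
                self.telegraphic_enabled = False
        return self.telegraphic_enabled


# Invented confidence scores for successive closed-class words.
switch = TelegraphicSwitch(window=3, threshold=0.5)
states = [switch.update(s) for s in [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.9, 0.9]]
print(states)
```

A bounded `deque` keeps only the most recent scores, so the average follows the speaker's current style rather than the whole session.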
Another embodiment of the invention can include a method of decoding received user spoken utterances in a speech recognition system. In that case, the method can include decoding the received user spoken utterance with a conventional language model resulting in a first word candidate and decoding the received user spoken utterance with an alternate language model resulting in a second word candidate. The alternate language model can be a telegraphic language model. Also included can be the steps of computing a confidence score for the first word candidate and the second word candidate. The step of selecting the word candidate having the highest confidence score also can be included. The first word candidate and the second word candidate can be the same word, but have different confidence scores. Also, if the first word candidate and the second word candidate are not the same word but have the same confidence scores, either the first or the second word candidate can be selected.
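The selection between the two word candidates can be sketched as below. The `Candidate` type, the function name `select_candidate`, and the confidence values are hypothetical; the decoders themselves are not modeled. Where the scores are equal, either candidate may be selected, and this sketch arbitrarily keeps the first.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    word: str          # decoded word candidate
    confidence: float  # likelihood that the word reflects the utterance
    model: str         # language model that produced the candidate


def select_candidate(first, second):
    """Return the candidate with the higher confidence score.

    On a tie either candidate is acceptable; this sketch returns the first.
    """
    return second if second.confidence > first.confidence else first


# Invented scores standing in for real decoder output.
conventional = Candidate("the", 0.42, "conventional")
telegraphic = Candidate("boy", 0.81, "telegraphic")

best = select_candidate(conventional, telegraphic)
print(best.word, best.model)  # boy telegraphic
```

The same function also covers the case where both models produce the same word with different scores, since only the confidence values are compared.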
Another aspect of the invention can include a method
Akerman & Senterfitt
International Business Machines Corporation