Augmented-word language model
U.S. Patent No. 6,606,597 (Reexamination Certificate; status: active)
Filed: 2000-09-08
Issued: 2003-08-12
Examiner: McFadden, Susan (Department 2654)
Classification: Data processing: speech signal processing, linguistics, language – Speech signal processing – Application
U.S. Classes: 704/275; 704/4; 707/793
FIELD OF THE INVENTION
The present invention relates generally to language models used in automatic systems to represent word sequences and their probability of occurrence. More particularly, the present invention relates to a language model built from augmented words, that is, words augmented with lexical (for example, linguistic) information regarding the corresponding word.
BACKGROUND OF THE INVENTION
Language models are employed in various automatic systems, such as speech recognition systems, handwriting recognition systems, spelling correction systems, and other word-oriented pattern recognition systems. A language model represents word sequences and the probability of that sequence occurring in a given context. Although the systems and methods of the present invention are applicable to any word-oriented pattern recognition problem, the invention will be discussed herein with respect to speech recognition, as that is a common application of language models.
Speech recognition systems employ models of typical acoustic patterns and of typical word patterns in order to determine a word-by-word transcript of a given acoustic utterance. The word patterns used by a speech recognition system are collectively referred to as a language model; the acoustic patterns are referred to as an acoustic model.
Many current speech recognition systems use language models that are statistical in nature. Such language models are typically constructed using known techniques from a large amount of textual training data that is presented to a language model builder. An n-gram language model may use known statistical “smoothing” techniques for assigning probabilities to n-grams that were not seen in the construction/training process. In using these techniques, the language models estimate the probability that a word $w_n$ will follow a sequence of words $w_1, w_2, \ldots, w_{n-1}$. These probability values collectively form the n-gram language model.
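As a concrete illustration, the sketch below estimates bigram (n = 2) probabilities by relative frequency, with add-alpha smoothing standing in for the “known smoothing techniques” mentioned above; the function name and toy corpus are illustrative assumptions, not taken from the patent.

```python
from collections import Counter

def bigram_probabilities(tokens, alpha=1.0):
    """Estimate P(w_n | w_{n-1}) from raw counts, using add-alpha
    ("Laplace") smoothing so that bigrams never seen in training
    still receive a small nonzero probability."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def prob(prev, word):
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)

    return prob

# Toy usage: probability that "model" follows "language".
tokens = "the language model estimates the probability of each word sequence".split()
p = bigram_probabilities(tokens)
print(p("language", "model"))
```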
There are many known methods that can be used to estimate these probability values from a large text corpus presented to the language model builder, and the exact methods for computing these probabilities are not of importance to the present invention. Suffice it to say that the language model plays an important role in improving the accuracy and speed of the recognition process by allowing the recognizer to use information about the likelihood, grammatical permissibility, or meaningfulness of sequences of words in the language. In addition, language models that capture more information about the language lead to faster and more accurate speech recognition systems.
Current approaches to language modeling consider words to be equivalent to their orthographic (written) form. However, in many cases, the orthographic form is not sufficient for drawing distinctions that have an impact on the way the word is spoken. Often, the meaning of a word, including its syntactic role, determines its pronunciation. The pronunciations used in the following examples employ a phonetic notation known as the “ARPABET.” The numbers attached to vocalic phonemes indicate syllabic stress. A favorite example is the word “object”. The syntactic role (in this case, part of speech) for “object” can be noun or verb:
OBJECT/N    /AA1 B JH EH0 K T/
OBJECT/V    /AH0 B JH EH1 K T/
Accordingly, the pronunciation of the word depends on the syntactic role. In the case of the noun “object,” the stress is on the first syllable, and for the verb “object,” the stress is on the second syllable.
Another favorite example is the word “wind”. Again, the syntactic role (part of speech again here) determines the pronunciation:
WIND/N    /W IH N D/
WIND/V    /W AY N D/
A final favorite example is the word “read”. Here the syntactic role that affects pronunciation is the tense of the verb (present or past):
READ/V+PRES    /R IY D/
READ/V+PAST    /R EH D/
Words with different syntactic properties, such as those in the above examples, tend to appear in different contexts. Thus, statistical language models that do not distinguish between words with identical orthography but different senses or syntactic roles will model those words and their contexts poorly.
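The distinction can be made explicit by keying a pronunciation lexicon on the pair (orthography, tag) rather than on the orthography alone. The following is a minimal sketch using the three examples above; the dictionary layout and function name are illustrative assumptions, not the patent's data structures.

```python
# Hypothetical augmented-word pronunciation lexicon: the key is the
# (orthography, tag) pair, so homographs with different syntactic
# roles map to different ARPABET pronunciations.
PRONUNCIATIONS = {
    ("OBJECT", "N"):     "AA1 B JH EH0 K T",
    ("OBJECT", "V"):     "AH0 B JH EH1 K T",
    ("WIND", "N"):       "W IH N D",
    ("WIND", "V"):       "W AY N D",
    ("READ", "V+PRES"):  "R IY D",
    ("READ", "V+PAST"):  "R EH D",
}

def pronounce(word: str, tag: str) -> str:
    """Look up the pronunciation of an augmented word."""
    return PRONUNCIATIONS[(word.upper(), tag)]

print(pronounce("object", "N"))  # AA1 B JH EH0 K T
print(pronounce("object", "V"))  # AH0 B JH EH1 K T
```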
Class-based language models deal with training-data sparseness by first grouping words into classes and then using these classes as the basis for computing n-gram probabilities. Classes can be determined either by automatic clustering, or they can be domain-specific semantic categories or syntactic categories (e.g., parts of speech (POS)). Although the latter approach has the advantage of capturing some linguistic information in the language model, using syntactic classes in traditional formulations has a major drawback: the POS tags hide too much of the specific lexical information needed for predicting the next word.
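For reference, a standard class-based bigram factorization (not spelled out in the text, but consistent with the description above) reduces the word history to its class:

$$P(w_i \mid w_{i-1}) \approx P(w_i \mid c_i)\, P(c_i \mid c_{i-1})$$

where $c_i$ is the class (e.g., the POS tag) of $w_i$. Because $w_{i-1}$ enters the model only through $c_{i-1}$, the identity of the preceding word is lost, which is precisely the loss of lexical information noted above.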
An alternative approach has been proposed in which part-of-speech (POS) tags are viewed as part of the output of the speech recognizer, rather than as intermediate objects, as in class-based approaches. However, in this approach the words and tags are viewed as being produced by separate processes.
The present invention addresses these and other problems and offers other advantages over the prior art.
SUMMARY OF THE INVENTION
The present invention relates to a speech recognition system (or any other word-oriented pattern recognition system) that employs a language model that includes augmented words, that is, words augmented with lexical information regarding the corresponding word.
One embodiment of the present invention is directed to a computer-readable medium having stored thereon a data structure that includes a first data field, optional previous-word data fields, and a probability data field. The first data field contains data representing a first word and includes an orthography subfield and a tag subfield. The orthography subfield contains data representing the orthographic representation (written form) of the word. The tag subfield contains data representing a tag that encodes lexical information regarding the word. Each of the previous-word data fields contains data representing a potentially preceding word and includes an orthography subfield and a tag subfield. The orthography subfield contains data representing the orthographic representation of the word. The tag subfield contains data representing a tag that encodes lexical information regarding the word. The probability data field contains data representing the probability of the first word and tag occurring (possibly after the optional preceding words and accompanying tags) in a word sequence, which may comprise a sentence or a conversational utterance.
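A minimal sketch of such a data structure, assuming Python dataclasses; the type and field names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class AugmentedWord:
    orthography: str  # written form of the word (orthography subfield)
    tag: str          # lexical-information tag (tag subfield)

@dataclass(frozen=True)
class NGramEntry:
    word: AugmentedWord                       # the first data field
    previous: Tuple[AugmentedWord, ...] = ()  # optional previous-word data fields
    probability: float = 0.0                  # probability data field

# Illustrative entry: P(read/V+PAST | I/PRON); the tags and the
# probability value are made up for the example.
entry = NGramEntry(
    word=AugmentedWord("read", "V+PAST"),
    previous=(AugmentedWord("I", "PRON"),),
    probability=0.42,
)
```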
Another embodiment of the present invention is directed to a method of building a language model. Pursuant to this embodiment, a training corpus comprising a body of text is received. Words in the training corpus are each augmented with a tag encoding lexical information regarding the corresponding word. A plurality of sequences of n augmented words are selected, n being a positive integer. Each selected sequence includes a sub-sequence made up of the first n−1 augmented words of the selected sequence. For each selected sequence of n augmented words, the method computes the probability that, given an occurrence of the sub-sequence in a block of text, the immediately following word will be the nth augmented word of the selected sequence.
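A rough sketch of this build procedure, assuming the training corpus is already tagged and using unsmoothed relative frequencies for brevity (a real builder would smooth, as noted in the background):

```python
from collections import Counter

def build_augmented_ngram_model(tagged_corpus, n=3):
    """Count sequences of n augmented words -- (word, tag) pairs -- and
    estimate, for each n-gram, the probability that its nth augmented
    word follows its first n-1 augmented words."""
    ngram_counts = Counter()
    history_counts = Counter()
    for sentence in tagged_corpus:  # each sentence: list of (word, tag) pairs
        for i in range(len(sentence) - n + 1):
            ngram = tuple(sentence[i:i + n])
            ngram_counts[ngram] += 1
            history_counts[ngram[:-1]] += 1
    return {ngram: count / history_counts[ngram[:-1]]
            for ngram, count in ngram_counts.items()}

# Toy usage with bigrams (n = 2):
corpus = [[("I", "PRON"), ("read", "V+PAST"), ("the", "DET"), ("book", "N")]]
model = build_augmented_ngram_model(corpus, n=2)
```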
Another embodiment of the invention is directed to a method of automatically recognizing speech. Pursuant to this embodiment, a language model having a plurality of n-grams is provided. Each n-gram includes a sequence of n augmented words. Each augmented word includes a word and a tag encoding lexical information regarding the word. The language model further includes a probability indicator for each n-gram. Each probability indicator is indicative of a probability that, given an occurrence of the first n−1 words of the corresponding n-gram in a block of text, the immediately following word in the block of text will be the nth word of the n-gram. The speech recognition process hypothesizes m
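A minimal sketch of how a recognizer might score one hypothesized sequence of augmented words under such a model; the floor value for unseen n-grams and the example probabilities are illustrative assumptions, not the patent's method:

```python
import math

def score_hypothesis(augmented_words, model, n=2, floor=1e-9):
    """Sum log P(nth augmented word | preceding n-1 augmented words)
    over the hypothesis; unseen n-grams fall back to a small floor."""
    log_prob = 0.0
    for i in range(n - 1, len(augmented_words)):
        ngram = tuple(augmented_words[i - n + 1:i + 1])
        log_prob += math.log(model.get(ngram, floor))
    return log_prob

hypothesis = [("I", "PRON"), ("read", "V+PAST"), ("the", "DET"), ("book", "N")]
model = {  # e.g., as produced by the build sketch above
    (("I", "PRON"), ("read", "V+PAST")): 0.5,
    (("read", "V+PAST"), ("the", "DET")): 0.4,
    (("the", "DET"), ("book", "N")): 0.2,
}
print(score_hypothesis(hypothesis, model))
```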
Inventors: Galescu, Lucian; Ringger, Eric K.
Examiner: McFadden, Susan
Assignee: Microsoft Corporation
Attorney: Kelly, Joseph R.; Westman Champlin & Kelly P.A.