Method and apparatus for adapting the language model's size in a

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Patent

Details

704257, 704248, 704238, G10L 900

Patent

active

058999736

DESCRIPTION:

BRIEF SUMMARY
TECHNICAL FIELD

The present invention concerns speech recognition systems implemented on a digital computer, or in speech recognition devices such as dictaphones or translation devices for telephone installations. In particular, the invention is directed to a mechanism for decreasing the size of the statistical language model in such speech recognition systems in order to reduce the resources, such as storage, needed to run them. The size of the language model can also be adapted to the system environment conditions or to user-specific speech properties.


BACKGROUND OF THE INVENTION

In speech recognition systems based on a statistical language model approach rather than a knowledge-based one, for example the English speech recognition system TANGORA developed by F. Jelinek et al. at the IBM Thomas J. Watson Research Center in Yorktown Heights, USA (published in Proceedings of IEEE 73(1985)11, pp. 1616-24, under the title "The development of an experimental discrete dictation recognizer"), the recognition process can be subdivided into several steps. The tasks of these steps, depicted in FIG. 1 (from the article by K. Wothke, U. Bandara, J. Kempf, E. Keppel, K. Mohr, G. Walch (IBM Scientific Center Heidelberg), entitled "The SPRING Speech Recognition System for German", in Proceedings of Eurospeech 89, Paris 26.-28.IX.1989), are: extraction of a sequence of acoustic labels from the speech signal by a signal processor; acoustic matching to find the words most likely to produce the observed label sequence; and computation of the probability of a word sequence occurring in the language by means of a statistical language model.
The whole system can be implemented either on a digital computer, for example a personal computer (PC), or on a portable dictaphone or a telephone device. The speech signal is amplified and digitized, and the digitized data are then read into a buffer memory contained, for example, in the signal processor. From the resulting frequency spectrum a vector of a number of elements is taken, and the spectrum is adjusted to account for an ear model.
Each vector is compared with a number (say 200) of speaker-dependent prototype vectors. The identification number of the most similar prototype vector, called an acoustic label, is taken and sent to the subsequent processing stages. The speaker-dependent prototype vectors are generated from language-specific prototype vectors during a training phase in which the system is given a speech sample.
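
The labeling step can be sketched as a nearest-prototype search. This is a minimal illustration and not the patent's implementation: the squared Euclidean distance, the function name `acoustic_label`, and the tiny two-dimensional prototypes are assumptions chosen for the example. The source only states that the most similar prototype vector is selected.

```python
import math

def acoustic_label(vector, prototypes):
    """Return the index of the prototype closest to `vector`.

    The index plays the role of the acoustic label. Squared Euclidean
    distance is an assumed similarity measure.
    """
    best_label, best_dist = None, math.inf
    for label, proto in enumerate(prototypes):
        dist = sum((v - p) ** 2 for v, p in zip(vector, proto))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical data: three prototypes instead of the ~200 mentioned in the text.
prototypes = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3]]

# One label per spectral vector; this sequence feeds the later stages.
labels = [acoustic_label(frame, prototypes) for frame in frames]
print(labels)
```

Each incoming vector is reduced to a single integer, so the later stages work on a sequence of discrete symbols rather than on raw spectra.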
The fast acoustic match determines, for every word of a reference vocabulary, the probability with which it would have produced the sequence of acoustic labels observed from the speech signal. The probability of a word is calculated until either the end of the word is reached or the probability drops below a pre-specified level. As reference units for determining this probability, the fast match uses a so-called phonetic transcription for each word in the reference vocabulary, including relevant pronunciation variants, and a hidden Markov model for each allophone used in the phonetic transcription. The phonetic transcriptions are generated by use of a set of phoneticization rules (l.c.).
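
The stopping rule of the fast match (score a word until its end is reached or its probability drops below a preset level) can be illustrated with a heavily simplified sketch. Everything specific here is an assumption made for illustration: each word is modeled as one label distribution per step, exactly one label is consumed per step, and the names `fast_match_score`, `UNSEEN_PROB`, and the threshold value are hypothetical. A real fast match works on hidden Markov models rather than on this one-to-one alignment.

```python
import math

UNSEEN_PROB = 1e-6  # assumed floor for labels a step has never produced

def fast_match_score(word_steps, labels, floor_log_prob):
    """Accumulate a log-probability step by step.

    Returns None as soon as the running score falls below
    `floor_log_prob`, otherwise the score at the end of the word.
    """
    log_prob = 0.0
    for step_probs, label in zip(word_steps, labels):
        log_prob += math.log(step_probs.get(label, UNSEEN_PROB))
        if log_prob < floor_log_prob:
            return None  # pruned before the end of the word
    return log_prob

# Hypothetical two-word vocabulary and observed label sequence.
vocabulary = {
    "yes": [{3: 0.8}, {7: 0.7}, {7: 0.6}],
    "no": [{1: 0.9}, {2: 0.9}, {7: 0.5}],
}
observed = [3, 7, 7]
floor = math.log(1e-3)

scores = {w: fast_match_score(s, observed, floor) for w, s in vocabulary.items()}
survivors = [w for w, s in scores.items() if s is not None]
print(survivors)
```

The word "no" is abandoned after its first step, so no further work is spent on it. That early exit is the point of the pre-specified level.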
The hidden Markov model of an allophone describes the probability with which a substring of the sequence of acoustic labels corresponds to the allophone. The Markov models are language-specific, and their output and transition probabilities are trained for individual speakers. The Markov model of the phonetic transcription of a word is the chain of the Markov models of its allophones.
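
As a rough sketch of how a word model can be built as a chain of allophone models, the example below concatenates per-allophone states and scores a label sequence with the forward algorithm. The model shape is an assumption: every allophone is reduced to a single state with a self-loop probability and a discrete output distribution, and the names `chain_models` and `forward_probability` as well as all numbers are hypothetical. The source does not specify the topology of the allophone models.

```python
def chain_models(transcription, allophone_models):
    """Concatenate the state lists of the allophones in transcription order."""
    states = []
    for allophone in transcription:
        states.extend(allophone_models[allophone])
    return states

def forward_probability(states, labels):
    """P(labels | word) for a left-to-right chain of states.

    The path must start in the first state and leave the last state
    after the final label. `labels` is assumed to be non-empty.
    """
    # alpha[i]: probability of the labels seen so far, ending in state i
    alpha = [0.0] * len(states)
    alpha[0] = states[0]["emit"].get(labels[0], 0.0)
    for label in labels[1:]:
        new_alpha = [0.0] * len(states)
        for i, state in enumerate(states):
            stay = alpha[i] * state["self_loop"]
            advance = 0.0
            if i > 0:
                advance = alpha[i - 1] * (1.0 - states[i - 1]["self_loop"])
            new_alpha[i] = (stay + advance) * state["emit"].get(label, 0.0)
        alpha = new_alpha
    return alpha[-1] * (1.0 - states[-1]["self_loop"])

# Hypothetical allophone inventory with two acoustic labels (0 and 1).
allophone_models = {
    "a": [{"self_loop": 0.5, "emit": {0: 0.9, 1: 0.1}}],
    "b": [{"self_loop": 0.5, "emit": {0: 0.2, 1: 0.8}}],
}
observed = [0, 0, 1, 1]

p_ab = forward_probability(chain_models(["a", "b"], allophone_models), observed)
p_ba = forward_probability(chain_models(["b", "a"], allophone_models), observed)
print(p_ab > p_ba)
```

Because the word model is just the concatenation of shared allophone models, adding a word to the vocabulary only requires its phonetic transcription, not new acoustic training for that word.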
The statistical language model is one of the most essential parts of a speech recognizer. It is complementary to the acoustic model in that it supplies additional language-based information to the system in order to resolve the uncertainty associated with the word hypotheses proposed by the acoustic side. In practice, the acoustic side proposes a set of possible word candidates, with a probability attached to each candidate. The language model, on the other hand, predicts the possible candidates with corresponding probabilities. The system applies maximum likelihood techniques to find the most probable candidate out of these two sets of candidates.
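
A minimal sketch of how the two candidate sets might be combined is given below. It assumes that the combined score of a word is the product of its acoustic probability and its language model probability, which is a common formulation but is not spelled out in the text above. The function name `best_candidate` and all probabilities are hypothetical.

```python
def best_candidate(acoustic_probs, lm_probs):
    """Pick the word with the highest combined score.

    Assumption: combined score = acoustic probability * language model
    probability. Words the language model does not predict score zero.
    """
    scores = {
        word: p_acoustic * lm_probs.get(word, 0.0)
        for word, p_acoustic in acoustic_probs.items()
    }
    return max(scores, key=scores.get)

# Hypothetical example: three homophones the acoustic side cannot separate well.
acoustic_probs = {"two": 0.40, "too": 0.35, "to": 0.25}
# Hypothetical language model predictions given the preceding words.
lm_probs = {"to": 0.60, "too": 0.30, "two": 0.10}

winner = best_candidate(acoustic_probs, lm_probs)
print(winner)
```

In this example the acoustically weakest candidate wins because the language model strongly favors it, which is the kind of uncertainty the language model is meant to resolve.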

REFERENCES:
patent: 5072452 (1991-12-01), Brown et al.
patent: 5127043 (1992-06-01), Hunt et al.
patent: 5444617 (1995-08-01), Merialdo
patent: 5680511 (1997-10-01), Baker et al.
patent: 5710866 (1998-01-01), Alleva et al.
