Patent
1996-11-18
1998-08-18
Hudspeth, David R.
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
704252, G10L 506
active
057971226
DESCRIPTION:
BRIEF SUMMARY
FIELD OF THE INVENTION
The present invention relates to a speech recognition method for compound words that can be employed for either discrete or continuous dictation and is particularly suited to real-time speech recognition. The invention also relates to a speech recognition system that uses this method.
BACKGROUND OF THE INVENTION
The invention is based on the TANGORA speech recognition system developed by the Applicant. TANGORA is a real-time speech recognition system for large vocabularies of more than 20,000 word forms that can be speaker-trained at little cost to the user.
These known systems start from a division of the speech recognition process into a part based on acoustic data (decoding) and a language statistics part that draws on bodies of language or text from a specific area of application (language model). The decision on candidate words is thus derived from both the decoder and the language model probability. For the user, fitting the vocabulary processed by the recognition system to the specific field, or even to individual requirements, is of particular significance.
In this speech recognition system, acoustic decoding first supplies hypothetical words. Competing hypothetical words are then evaluated further using the language model, which represents estimates of word string frequencies obtained from application-specific bodies of text, that is, from a collection of text samples from the desired field of application. The most frequent word forms and the statistics on word sequences are generated from these text samples.
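The gathering of word form and word sequence statistics from text samples can be illustrated with a minimal Python sketch. The toy corpus and the helper name `ngram_counts` are hypothetical and chosen only for illustration; they are not taken from the TANGORA system.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive word forms in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical toy corpus; a real system would use a large collection of
# text samples from the desired field of application.
corpus = "the patient was admitted and the patient was discharged".split()

unigrams = ngram_counts(corpus, 1)   # word form frequencies
bigrams = ngram_counts(corpus, 2)    # statistics on word pairs
trigrams = ngram_counts(corpus, 3)   # statistics on word triples

# The most frequent word forms are candidates for the recognition vocabulary.
most_frequent = unigrams.most_common(3)
```

In this toy corpus the trigram ("the", "patient", "was") is seen twice, while every other trigram is seen once.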
In the method used here for estimating the frequency of word sequences, the frequency of occurrence of so-called word form trigrams in a given text is estimated (see, inter alia, Nadas, A., "On Turing's Formula for Word Probabilities", IEEE Proc. ASSP, 33, 6, 1985, pp. 1414-1416). With a vocabulary of 20,000 word forms, as currently employed in the TANGORA speech recognition system, however, about 8 trillion (20,000^3 = 8 x 10^12) trigrams are possible. The corpora collected in practice are therefore always several orders of magnitude too small for all trigrams to be observed.
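The size of the trigram space follows directly from the vocabulary size, since each of the three positions can hold any of the 20,000 word forms. The short calculation below is a sketch of that arithmetic; the corpus size of 100 million tokens is a hypothetical figure used only to show the scale of the gap.

```python
vocabulary_size = 20_000
possible_trigrams = vocabulary_size ** 3  # 8 * 10**12 distinct word form triples

# Hypothetical corpus size, assumed only for illustration.
corpus_tokens = 100_000_000

# A text of N tokens contains N - 2 trigram positions, so it can show
# at most N - 2 distinct trigrams.
max_observable = corpus_tokens - 2
coverage_upper_bound = max_observable / possible_trigrams

print(f"possible trigrams: {possible_trigrams:,}")
print(f"upper bound on observed share: {coverage_upper_bound:.2e}")
```

Even under this generous assumption, the corpus could cover only a tiny fraction of the possible trigrams.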
This problem of the limited vocabulary is tackled, inter alia, by creating so-called object classes whose members appear in the corpus of the language with equal frequency. The estimate is based on the assumption of a binomial distribution of a random variable that describes the drawing of an object from a frequency class.
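One well-known estimate built on such frequency classes is Turing's formula, named in the title of the Nadas paper cited above. The sketch below assumes that the object classes correspond to the groups of objects seen exactly r times; that reading, the helper names, and the toy data are assumptions made for illustration and are not details confirmed by the text.

```python
from collections import Counter

def frequency_classes(counts):
    """Map r -> n_r, the number of distinct objects observed exactly r times."""
    return Counter(counts.values())

def turing_adjusted_count(r, n_r):
    """Turing's estimate r* = (r + 1) * n_{r+1} / n_r, or None if class r is empty."""
    if n_r.get(r, 0) == 0:
        return None
    return (r + 1) * n_r.get(r + 1, 0) / n_r[r]

def unseen_mass(n_r, total):
    """Probability mass reserved for unseen objects: n_1 / total."""
    return n_r.get(1, 0) / total

# Hypothetical toy sample of 10 observations.
sample = "a a a b b c c d e f".split()
counts = Counter(sample)
n_r = frequency_classes(counts)          # one object seen 3x, two seen 2x, three seen 1x
adjusted_twice = turing_adjusted_count(2, n_r)
reserved = unseen_mass(n_r, len(sample))
```

The raw formula gives an adjusted count of zero for the highest occupied class, because no object is seen more often; practical systems smooth the class sizes before applying it.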
In known speech recognition systems the so-called hidden Markov model is frequently used for estimating the probabilities. Here, several frequencies observed in the text are set down. For a trigram "uvw" these are a nullgram term $f_0$, a unigram term $f(w)$, a bigram term $f(w \mid v)$ and a trigram term $f(w \mid uv)$. These terms correspond to the relative frequencies observed in the text, where the nullgram term has only a corrective significance.
If these terms are interpreted as probabilities of the word w under various conditions, a so-called latent variable can be introduced whose value selects which of the four conditions produces the word w. If the transfer probabilities for the corresponding terms are designated $\lambda_0$, $\lambda_1$, $\lambda_2$, $\lambda_3$, then we obtain the following expression for the trigram probability sought:

$$p(w \mid uv) = \lambda_0 f_0 + \lambda_1 f(w) + \lambda_2 f(w \mid v) + \lambda_3 f(w \mid uv) \qquad (1)$$
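The weighted combination of the four frequency terms can be sketched in Python as follows. The class name, the toy text, and the weight values are hypothetical. Taking the nullgram term as the uniform value 1/V over the observed vocabulary is an assumption, since the text says only that this term has a corrective significance.

```python
from collections import Counter

class InterpolatedTrigramModel:
    """Sketch of a trigram probability formed from four weighted frequency terms."""

    def __init__(self, tokens, lambdas):
        if abs(sum(lambdas) - 1.0) > 1e-9:
            raise ValueError("the four weights must sum to 1")
        self.lambdas = lambdas
        self.n = len(tokens)
        self.uni = Counter(tokens)
        self.bi = Counter(zip(tokens, tokens[1:]))
        self.tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
        # Counts of v and of (u, v) as histories, i.e. followed by another word.
        self.hist1 = Counter(tokens[:-1])
        self.hist2 = Counter(zip(tokens[:-2], tokens[1:-1]))
        # Assumption: the nullgram term is uniform over the observed vocabulary.
        self.f0 = 1.0 / len(self.uni)

    def terms(self, u, v, w):
        """Return (f0, f(w), f(w|v), f(w|uv)) as relative frequencies."""
        f_w = self.uni[w] / self.n
        f_w_v = self.bi[(v, w)] / self.hist1[v] if self.hist1[v] else 0.0
        f_w_uv = self.tri[(u, v, w)] / self.hist2[(u, v)] if self.hist2[(u, v)] else 0.0
        return (self.f0, f_w, f_w_v, f_w_uv)

    def prob(self, u, v, w):
        """Weighted sum of the four terms for the word w following u, v."""
        return sum(l * f for l, f in zip(self.lambdas, self.terms(u, v, w)))

# Hypothetical toy text and weights, for illustration only.
tokens = "a b c a b d".split()
model = InterpolatedTrigramModel(tokens, lambdas=(0.1, 0.2, 0.3, 0.4))
p_c = model.prob("a", "b", "c")
total = sum(model.prob("a", "b", w) for w in model.uni)
```

Because each of the four terms sums to one over the vocabulary for this history and the weights sum to one, the combined values again form a probability distribution.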
The transfer probabilities themselves are estimated by means of the so-called "deleted estimation" method (see Jelinek, F. and Mercer, R., "Interpolated Estimation of Markov Source Parameters from Sparse Data", in Pattern Recognition in Practice, Amsterdam, North Holland, 1980, pp. 381-397). In this method, several smaller random text samples are produced by omitting portions of the text. Each random sample is then evaluated by the above-mentioned method with respect to its word sequence statistics.
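A minimal sketch of this idea follows. It assumes that the text is cut into equal contiguous parts, that each part in turn is omitted while the frequency terms are computed from the remainder, and that the weights are then fitted to the omitted parts by expectation-maximization. The partitioning scheme, the iteration count, the uniform nullgram term, and all names are assumptions made for illustration, not details given in the text.

```python
from collections import Counter

def term_function(tokens, vocab_size):
    """Return a function giving (f0, f(w), f(w|v), f(w|uv)) measured on `tokens`."""
    n = len(tokens)
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    hist1 = Counter(tokens[:-1])
    hist2 = Counter(zip(tokens[:-2], tokens[1:-1]))

    def terms(u, v, w):
        f_w = uni[w] / n if n else 0.0
        f_w_v = bi[(v, w)] / hist1[v] if hist1[v] else 0.0
        f_w_uv = tri[(u, v, w)] / hist2[(u, v)] if hist2[(u, v)] else 0.0
        return (1.0 / vocab_size, f_w, f_w_v, f_w_uv)  # uniform nullgram (assumption)

    return terms

def deleted_estimation(tokens, num_parts=4, iterations=20):
    """Estimate the four interpolation weights from omitted portions of the text."""
    vocab_size = len(set(tokens))
    part_len = len(tokens) // num_parts
    held_out_terms = []
    for k in range(num_parts):
        start, end = k * part_len, (k + 1) * part_len
        # Joining the two remainders creates one artificial n-gram at the seam;
        # this is ignored here for brevity.
        retained = tokens[:start] + tokens[end:]
        held_out = tokens[start:end]
        terms = term_function(retained, vocab_size)
        for u, v, w in zip(held_out, held_out[1:], held_out[2:]):
            held_out_terms.append(terms(u, v, w))
    if not held_out_terms:
        raise ValueError("text too short for the requested number of parts")

    lambdas = [0.25, 0.25, 0.25, 0.25]
    for _ in range(iterations):
        expected = [0.0] * 4
        for f in held_out_terms:
            total = sum(l * x for l, x in zip(lambdas, f))
            for i in range(4):
                # Share of this trigram's probability attributed to term i.
                expected[i] += lambdas[i] * f[i] / total
        norm = sum(expected)
        lambdas = [e / norm for e in expected]
    return lambdas

# Hypothetical toy text, for illustration only.
text = ("a b c a b d " * 10).split()
weights = deleted_estimation(text)
```

Fitting the weights on text that did not contribute to the frequency terms keeps the trigram term from being overweighted, which would happen if the same text were used for both steps.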
The known speech recognition systems have the disadvantag
REFERENCES:
P. Geutner, "Using Morphology Towards Better Large-Vocabulary Speech Recognition Systems," Proc. ICASSP 95, pp. 445-448, May 1995.
Wayne Ward and Sunil Issar, "A Class Based Language Model for Speech Recognition," Proc. ICASSP 96, pp. 416-418, Jun. 1996.
Andre Breton, Pablo Fetter, and Peter Regel-Brietzmann, "Compound Words in Large-Vocabulary German Speech Recognition Systems," Proc. Fourth International Conference on Spoken Language Processing (ICSLP 96), Oct. 1996.
Kai Hubener, Uwe Jost, and Henrik Heine, "Speech Recognition for Spontaneously Spoken German Dialogues," Proc. Fourth International Conference on Spoken Language Processing (ICSLP 96), Oct. 1996.
Hudspeth David R.
International Business Machines Corporation
Murray James E.
Smits Talivaldis Ivars