Reexamination Certificate
2000-06-26
2004-03-30
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S270000, C704S231000, C704S241000
active
06714910
ABSTRACT:
BACKGROUND OF THE INVENTION
The invention relates to a method of training an automatic speech recognizer.
Automatic speech recognizers are based on acoustic models and, as is customary with speaker-independent dictating systems, also on speech models. For acoustic modeling, so-called HMMs (Hidden Markov Models) are normally used, whose model parameters can be determined for the respective application. For example, special transition probabilities and output probability functions can be determined for each HMM. The HMM parameters are normally initialized in a training phase before the speech recognizer is actually taken into operation. Speech data that are input during the speech mode are then frequently used to adapt the speech recognizer, more particularly to a certain speaker or to certain background noises, in order to further improve the acoustic models.
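To illustrate how transition probabilities and output probabilities together determine how well an HMM explains an input, the following is a minimal sketch of the standard forward algorithm for a discrete-output HMM. The two-state model, its numbers and the function name `forward_likelihood` are illustrative assumptions and are not taken from the patent.

```python
def forward_likelihood(initial, transition, emission, observations):
    """Return P(observations | model) for a discrete-output HMM."""
    n_states = len(initial)
    # alpha[s]: probability of the observations seen so far, ending in state s
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states))
            * emission[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state left-to-right model over two output symbols (0 and 1).
initial = [1.0, 0.0]                      # always start in state 0
transition = [[0.5, 0.5], [0.0, 1.0]]     # transition[p][s] = P(next = s | current = p)
emission = [[0.9, 0.1], [0.2, 0.8]]       # emission[s][k] = P(symbol k | state s)

likelihood = forward_likelihood(initial, transition, emission, [0, 1, 1])
```

In a recognizer with one such model per word, the word whose model yields the highest likelihood for the input would be the recognition result.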
In the training phase of a speech recognizer, a user is requested by the speech recognition system to input predefined speech utterances which, for example, are to be pronounced several times when a speaker-dependent speech recognizer is used. The inputted speech utterances are evaluated and the associated HMMs are determined accordingly. The training phase usually takes a rather long time, possibly several hours, and is often experienced by the user as annoying, boring and/or tiring.
SUMMARY OF THE INVENTION
It is an object of the invention to provide a more pleasant training for the user.
The object is achieved in that, during the training, speech utterances are presented to a user by means of a game, which utterances are provided for training the speech recognizer.
A game as such is an amusing activity carried out according to certain rules, which people basically engage in because they enjoy it (pastime, entertainment). By incorporating a game into the training of a speech recognizer, the user is entertained while, in parallel, the user's speech inputs are processed and automatically used for the models (particularly HMMs) implemented in the speech recognizer. In the respective phases of the game, the user is presented with speech utterances, i.e. words, word components and word combinations, especially by visually displaying them, so that the user actually produces the speech utterances predefined for him and enters them into the speech recognition system. Alternatively, the user can be requested by acoustic signals (instead of visually displayed signals) to enter certain speech utterances.
In an embodiment of the invention, the user is shown at least a first speech utterance to which the speech recognizer has already been trained, and is furthermore shown a further speech utterance to which the speech recognizer is still to be trained. This is an advantageous method especially for speaker-dependent speech recognizers with a small vocabulary. The number of trained words is thus enlarged successively (word by word). The speech recognizer then recognizes an entered speech utterance either as an already trained word or as an unknown word, which is still to be trained during the respective game period. For the word to be trained, both the optimum number of HMM states and the associated HMM parameters can be determined. In this respect, a variant of the invention proposes that the speech utterances predefined for the user are marks for assigned picture screen areas, which marks are shifted over the picture screen when the respective speech utterance is entered, so as to generate as a target a predefinable structure on the picture screen, which is shown anew after the target has been achieved.
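The word-by-word enlargement of the vocabulary can be sketched as follows. This is a simplified illustration only: instead of HMMs it uses one stored feature vector per word and a distance threshold to decide between "already trained word" and "unknown word". The class name `IncrementalVocabulary`, the threshold `reject_distance` and the toy feature vectors are all hypothetical.

```python
import math


class IncrementalVocabulary:
    """Toy recognizer whose vocabulary grows by one word per game round."""

    def __init__(self, reject_distance=1.0):
        self.templates = {}                  # word -> reference feature vector
        self.reject_distance = reject_distance

    def recognize(self, features):
        """Return the closest trained word, or None if the input is unknown."""
        best_word, best_dist = None, math.inf
        for word, reference in self.templates.items():
            dist = math.dist(features, reference)
            if dist < best_dist:
                best_word, best_dist = word, dist
        return best_word if best_dist <= self.reject_distance else None

    def play_round(self, new_word, features):
        """One game round in which `new_word` is shown next to the trained words."""
        recognized = self.recognize(features)
        if recognized is not None:
            return recognized                # trained word: carry out its game move
        # Unknown input is taken as the displayed word that is still to be trained.
        self.templates[new_word] = list(features)
        return new_word


vocab = IncrementalVocabulary()
first = vocab.play_round("up", [0.0, 0.0])     # nothing trained yet, so "up" is learned
second = vocab.play_round("down", [0.1, 0.0])  # close to "up", recognized as trained word
third = vocab.play_round("down", [3.0, 3.0])   # far from "up", so "down" is learned
```

A real system along the lines of the text would estimate an HMM for the new word at the point where this sketch merely stores a feature vector.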
Another embodiment of the invention provides that when a speech utterance to which the speech recognizer has already been trained is inputted, the speech recognizer is adapted by means of this speech input. In this way, a user's speech inputs relating to an already trained speech utterance are also utilized, namely for further improvement of the associated HMMs, for which parameter values have already been determined during a training phase.
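As a rough illustration of adapting an already trained model with a further input, the sketch below folds one new observation into a running mean. The text does not specify the adaptation rule, so this update, the function name `adapt_mean` and the numbers are assumptions chosen for simplicity.

```python
def adapt_mean(mean, count, features):
    """Fold one new observation into a running mean.

    `mean` is the current parameter vector, `count` the number of
    observations it is based on. Returns (new_mean, new_count).
    """
    new_count = count + 1
    new_mean = [(m * count + x) / new_count for m, x in zip(mean, features)]
    return new_mean, new_count


# A mean estimated from 3 earlier utterances is refined by a fourth one.
mean, count = adapt_mean([1.0, 2.0], 3, [2.0, 6.0])
```

The more utterances a parameter is already based on, the less a single new input shifts it, which keeps the model stable during continued play.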
Furthermore, the invention may be expanded in that the classification of a speech input as one to be used for the training depends on a confidence measure that indicates how well the speech utterance entered by the user corresponds to a speech utterance predefined by the speech recognizer. In this way it can be avoided that training is performed on acoustic signals which are received by the speech recognizer during the training phase but which are not suitable as speech inputs for the training. For example, it can be avoided that background noise (for example, the opening or closing of a door) is used for training the speech recognizer. Instead of an evaluation with confidence measures, in another variant of embodiment an evaluation by means of so-called garbage modeling can be used. For this purpose, reference is made to the article “Robust Rejection Modeling for a Small-Vocabulary Application”, D. Langmann, R. Haeb-Umbach, T. Eisele, S. Gamm, Proc. ITG-Fachtagung Sprachkommunikation, Frankfurt am Main, 17/18 September 1996.
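The confidence-based selection can be sketched as a simple threshold test. The function name `select_training_inputs`, the threshold value of 0.7 and the example scores are hypothetical; how the confidence value itself is computed is not shown here.

```python
def select_training_inputs(scored_inputs, threshold=0.7):
    """Split (label, confidence) pairs into inputs used for training and rejected ones."""
    accepted, rejected = [], []
    for label, confidence in scored_inputs:
        if confidence >= threshold:
            accepted.append(label)       # matches the predefined utterance well enough
        else:
            rejected.append(label)       # e.g. background noise, not used for training
    return accepted, rejected


accepted, rejected = select_training_inputs(
    [("left", 0.92), ("door slam", 0.15), ("right", 0.71)]
)
```

Only the accepted inputs would be passed on to training or adaptation; the rejected ones are discarded.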
The invention also relates to a method of adapting an automatic speech recognizer to a speaker in which speech utterances are presented to a user by means of a game, which speech utterances are provided for adapting the speech recognizer to the user. The speaker adaptation is particularly provided for speaker-independent speech recognition systems such as, for example, dictating systems. The variants of embodiment mentioned above with respect to a training of a speech recognizer can accordingly be used for speaker adaptation.
The invention also relates to a speech recognition system for implementing one of the methods described above and an electrical device, more particularly a home entertainment device including a speech recognition system arranged in this manner.
REFERENCES:
patent: 5025471 (1991-06-01), Scott et al.
patent: 5502774 (1996-03-01), Bellegarda et al.
patent: 5675706 (1997-10-01), Lee et al.
patent: 5710864 (1998-01-01), Juang et al.
patent: 5710866 (1998-01-01), Alleva et al.
patent: 5812972 (1998-09-01), Juang et al.
patent: 5832063 (1998-11-01), Vysotsky et al.
patent: 5832430 (1998-11-01), Lleida et al.
patent: 5857173 (1999-01-01), Beard et al.
patent: 5893064 (1999-04-01), Kudirka et al.
patent: 6015344 (2000-01-01), Kelly et al.
patent: 6085160 (2000-07-01), D'hoore et al.
patent: 6125345 (2000-09-01), Modi et al.
patent: 6226612 (2001-05-01), Srenger et al.
patent: 6374221 (2002-04-01), Haimi-Cohen
D. Langmann et al., “Robust Rejection Modeling for a Small-Vocabulary Application”, ITG-Fachtagung Sprachkommunikation, Frankfurt am Main, Sep. 17/18, 1996, pp. 55-17.
J.G.A. Dolfing et al., “Combination of Confidence Measures in Isolated Word Recognition”, ICSLP 1998, pp. S.5-S.8.
Eggen Joseph Hubertus
Rose Georg
Van Der Sluis Bartel Marinus