Reexamination Certificate
1998-04-09
2004-08-03
Knepper, David D. (Department: 2645)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
active
06772117
ABSTRACT:
OBJECT OF THE INVENTION
The present invention relates to speech recognition and particularly to a method for modifying the feature vectors determined in speech recognition. The invention also relates to a device that applies the method according to the present invention for improving speech recognition.
BACKGROUND OF THE INVENTION
The invention is related to automatic speech recognition, particularly to speech recognition based on Hidden Markov Models (HMM). HMM-based speech recognition relies on statistical models of the words to be recognised. At the recognition phase, observations and state transitions, based on Markov chains, are calculated for a pronounced word and, based on probabilities, the model that was stored in the training phase of the speech recognition device and that corresponds to the pronounced word is determined. The operation of speech recognition based on Hidden Markov Models has been described, for example, in the reference: L. Rabiner, “A tutorial on Hidden Markov Models and selected applications in speech recognition”, Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
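The recognition step described above can be sketched as follows. This is a minimal illustration, not the method of the invention or of the cited tutorial's full treatment: it assumes discrete observation symbols, uses the standard forward algorithm to score each word model, and picks the best-scoring word. The names `forward_log_likelihood`, `recognise`, and `MODELS`, and all probabilities in the toy models, are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete-observation HMM.

    Uses the forward algorithm with per-frame scaling to avoid underflow.
    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_like


def recognise(obs, models):
    """Pick the word whose stored model gives the observations the highest likelihood."""
    return max(models, key=lambda word: forward_log_likelihood(obs, *models[word]))


# Toy two-state, two-symbol word models (illustrative values only).
MODELS = {
    "yes": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}

print(recognise([0, 0, 1, 1], MODELS))
print(recognise([1, 1, 0, 0], MODELS))
```

In a practical recogniser the observations would be continuous feature vectors rather than discrete symbols; the discrete case is used here only to keep the sketch short.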
The problem in current speech recognition devices is that the recognition accuracy decreases considerably in a noisy environment. In particular, the performance of a speech recognition device decreases if the noise conditions during its operation differ from the noise conditions of its training phase. This is, indeed, one of the most difficult problems to solve in practical speech recognition systems, because it is impossible to take into consideration the effects of all noise environments in which a speech recognition device can be used. Typically, the training of the speech recognition device is carried out in an almost noiseless environment, whereas in the operating environment, e.g., when the device is used in a car, the background noise caused by surrounding traffic and the vehicle itself differs considerably from the nearly silent background of the training phase.
A further problem in current speech recognition devices is that the performance of a speech recognition device is dependent on the microphone used. Especially when a different microphone is used at the training phase of the speech recognition device than at the actual speech recognition phase, the performance of the speech recognition device decreases substantially.
Several different methods have been developed for eliminating the effect of noise in the calculation of feature vectors. However, the speech recognition devices that utilise these methods can only be used in fixed computer/work station applications, wherein speech is recognised in an off-line manner. It is typical of these methods that the speech to be recognised is stored in a memory of a computer. Typically, the length of the speech signal to be stored is several seconds. After this, the feature vectors are modified utilising, in the calculation, parameters defined from the contents of the entire file. Due to the length of the speech signal to be stored, these kinds of methods are not applicable to real-time speech recognition.
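The off-line approach described above can be sketched as follows. This is a minimal illustration under stated assumptions: the normalisation parameters are taken to be a per-component mean and standard deviation computed over the whole stored utterance, and the function name `normalise_offline` and the `eps` guard are hypothetical. The point of the sketch is that no frame can be output until every frame has been stored.

```python
import math


def normalise_offline(frames, eps=1e-8):
    """Normalise every frame using statistics of the entire stored utterance.

    frames : list of feature vectors (lists of floats), one per frame
    eps    : small constant guarding against division by zero (assumption)
    """
    n = len(frames)
    dim = len(frames[0])
    # Both passes below need the complete utterance, hence the off-line delay.
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    stds = [
        math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / n)
        for i in range(dim)
    ]
    return [
        [(f[i] - means[i]) / (stds[i] + eps) for i in range(dim)]
        for f in frames
    ]


utterance = [[1.0, 10.0], [3.0, 30.0]]
print(normalise_offline(utterance))
```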
In addition, a normalisation method is known, wherein speech and noise each have their own normalisation coefficients, which are updated adaptively using a voice activity detector (VAD). Due to the adaptive updating, the normalisation coefficients are updated with a delay, whereupon the normalisation process is not carried out quickly enough in practice. In addition, this method requires a VAD, the operation of which is often too inaccurate for speech recognition applications with low signal-to-noise ratio (SNR) values. Due to said delay, this method does not meet the real-time requirements either.
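The update delay mentioned above can be illustrated with a small sketch. The actual update rule of the prior-art method is not given here, so the following assumes exponential smoothing with a hypothetical factor `alpha`, a single scalar feature (frame energy), and a crude energy-threshold VAD; all names and values are illustrative.

```python
def adaptive_means(energies, vad_threshold=5.0, alpha=0.9):
    """Track separate running means for speech and noise frames.

    A frame is labelled speech when its energy exceeds vad_threshold
    (a stand-in for a real VAD). Each mean is updated recursively, so it
    approaches the true level only gradually.
    """
    speech_mean, noise_mean = 0.0, 0.0
    history = []
    for e in energies:
        if e > vad_threshold:
            speech_mean = alpha * speech_mean + (1.0 - alpha) * e
        else:
            noise_mean = alpha * noise_mean + (1.0 - alpha) * e
        history.append((speech_mean, noise_mean))
    return history


# Five speech frames at a constant level of 10.0.
for speech_mean, noise_mean in adaptive_means([10.0] * 5):
    print(round(speech_mean, 4), noise_mean)
```

With these assumed settings the speech mean is still below half of the true level after five frames, which is the kind of lag the paragraph above refers to. A misclassification by the VAD would additionally push a frame into the wrong estimate.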
SHORT SUMMARY OF THE INVENTION
Now, a method and an apparatus for speech recognition have been invented, by means of which the problems presented above are avoided and the feature vectors determined in speech recognition are modified to compensate for the effects of noise. The modification of the feature vectors is carried out by defining mean values and standard deviations for the feature vectors and by normalising each feature vector using these parameters. According to a preferred embodiment of the present invention, the feature vectors are normalised using a sliding normalisation buffer. By means of the invention, the updating of the normalisation parameters of the feature vector is carried out almost without delay, and the delay in the actual normalisation process is sufficiently small to enable a real-time speech recognition application to be implemented.
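A sliding normalisation buffer of this kind might be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the buffer length, the choice to normalise the newest frame (rather than, say, a frame in the middle of the buffer), and the `eps` guard are not specified in this section, and the class name `SlidingNormaliser` is hypothetical.

```python
import math
from collections import deque


class SlidingNormaliser:
    """Normalise each incoming frame with the mean and standard deviation
    of the frames currently held in a fixed-length sliding buffer."""

    def __init__(self, buffer_len=4, eps=1e-8):
        # deque with maxlen drops the oldest frame automatically
        self.buffer = deque(maxlen=buffer_len)
        self.eps = eps  # guards against division by zero (assumption)

    def push(self, frame):
        """Add a frame to the buffer and return its normalised version."""
        self.buffer.append(list(frame))
        n = len(self.buffer)
        out = []
        for i, x in enumerate(frame):
            mean = sum(f[i] for f in self.buffer) / n
            var = sum((f[i] - mean) ** 2 for f in self.buffer) / n
            out.append((x - mean) / (math.sqrt(var) + self.eps))
        return out


norm = SlidingNormaliser(buffer_len=2)
for frame in ([1.0], [3.0], [5.0]):
    print(norm.push(frame))
```

Because the statistics depend only on the few frames in the buffer, the parameters follow changes in the signal after at most one buffer length, and memory use is bounded regardless of utterance length.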
In addition, by means of the method according to the present invention, it is possible to make the performance of a speech recognition device less dependent on the microphone used. By means of the invention, almost as high a performance of the speech recognition device is achieved when a different microphone is used at the training phase than at the recognition phase as when the same microphone is used at both the training and recognition phase.
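One way to see why mean normalisation can reduce microphone dependence is sketched below. It rests on an assumption not stated in this section: that the features are log-spectral or cepstral, in which case a fixed linear channel such as a microphone appears approximately as a constant additive offset on each feature component. The function name `mean_std_normalise`, the offset values, and the frames are all hypothetical.

```python
import math


def mean_std_normalise(frames, eps=1e-8):
    """Subtract the per-component mean and divide by the standard deviation."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[i] for f in frames) / n for i in range(dim)]
    stds = [
        math.sqrt(sum((f[i] - means[i]) ** 2 for f in frames) / n)
        for i in range(dim)
    ]
    return [
        [(f[i] - means[i]) / (stds[i] + eps) for i in range(dim)]
        for f in frames
    ]


# Frames as seen through microphone A (illustrative values).
mic_a = [[1.0, 2.0], [2.0, 0.0], [4.0, 1.0]]
# Microphone B modelled as a constant per-component offset (assumption).
offset = [0.5, -3.0]
mic_b = [[x + o for x, o in zip(frame, offset)] for frame in mic_a]

norm_a = mean_std_normalise(mic_a)
norm_b = mean_std_normalise(mic_b)
largest_gap = max(
    abs(a - b) for fa, fb in zip(norm_a, norm_b) for a, b in zip(fa, fb)
)
print(largest_gap < 1e-9)
```

Under this assumed channel model the offset cancels in the mean subtraction, so both microphones yield essentially the same normalised features.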
The invention is characterised in what has been presented in the characterising parts of claims 1 and 4.
REFERENCES:
patent: 4227176 (1980-10-01), Moshier
patent: 4713778 (1987-12-01), Baker
patent: 5131043 (1992-07-01), Fujii
patent: 5293588 (1994-03-01), Satoh et al.
patent: 5369726 (1994-11-01), Kroecker et al.
patent: 5640485 (1997-06-01), Ranta
patent: 0 301 199 (1989-02-01), None
patent: 0 586 996 (1994-03-01), None
patent: 0 694 906 (1996-01-01), None
“Real-Time Recognition of Broadcast Radio Speech”, Cook et al., 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 141-144 vol. 1.
“A Recursive Feature Vector Normalization Approach For Robust Speech Recognition in Noise”, Viikki et al., 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 733-736 vol. 2.
“A tutorial on Hidden Markov Models and selected applications in speech recognition”, L. Rabiner, Proceedings of the IEEE, vol. 77, No. 2, Feb. 1989.
“Signal modeling techniques in speech recognition”, J. Picone, Proceedings of the IEEE, vol. 81, No. 9, pp. 1215-1247, Sep. 1993.
Laurila Kari
Viikki Olli
Nokia Mobile Phones Limited
Perman & Green LLP
Method and a device for recognizing speech