Patent
1998-03-26
2000-06-20
Hudspeth, David R.
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
704/256, 704/259, 704/248, G10L 15/08, G10L 15/20
active
060788844
DESCRIPTION:
BRIEF SUMMARY
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to pattern recognition systems for instance speech recognition or image recognition systems.
2. Related Art
Practical speech recognition systems need to be capable of operating in the range of environmental conditions that may be encountered in everyday use. In general, the best performance of such a system is worse than that of an equivalent recogniser tailored to a particular environment; however, the performance of a tailored recogniser falls off severely as background conditions move away from the environment for which it was designed. High levels of ambient noise are one of the main problems for automatic speech recognition processors. Sources of ambient noise include background speech, office equipment, traffic and the hum of machinery. A particularly problematic source of noise for mobile phones is the car in which the phone is being used. These noise sources often generate enough acoustic noise to cause severe degradation in the performance of a speech recognition processor.
In image processing, for instance handwriting recognition, a user usually has to write very clearly for the system to recognise the handwriting. Anomalies in a person's writing may cause the system to misrecognise it repeatedly.
It is common in speech recognition processing to input speech data, typically in digital form, to a processor that derives from the input stream a more compact, perceptually significant set of data referred to as a feature set or feature vector. For example, speech is typically input via a microphone, sampled (e.g. at 8 kHz), digitised and segmented into frames 10-20 ms long, and a set of coefficients is calculated for each frame. In speech recognition, the speaker is normally assumed to be speaking one of a known set of words or phrases, the recogniser's so-called vocabulary. A stored representation of each word or phrase, known as a template or model, comprises a reference feature matrix for that word, previously derived from multiple speakers in the case of speaker-independent recognition. The input feature vector is matched with the model and a measure of similarity between the two is produced.
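The front end and matching step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the recogniser of this patent: the 16 ms frame length (128 samples at 8 kHz) is one value picked from the 10-20 ms range, the feature set is assumed to be log energies in four equal-width DFT bands, and template matching is assumed to use dynamic time warping (DTW) with a Euclidean frame distance. The names `frame_signal`, `band_log_energies`, `dtw_distance` and `recognise` are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000  # example sampling rate from the text
FRAME_MS = 16          # assumed frame length within the 10-20 ms range


def frame_signal(samples, sample_rate=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Split samples into non-overlapping frames; a trailing partial frame is dropped."""
    n = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def band_log_energies(frame, num_bands=4):
    """Hypothetical feature set: log10 power in equal-width DFT bands of one frame."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):  # naive DFT over the non-negative frequency bins
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(re * re + im * im)
    width = half // num_bands
    # A small floor avoids log10(0) for silent bands.
    return [math.log10(sum(power[b * width:(b + 1) * width]) + 1e-10)
            for b in range(num_bands)]


def dtw_distance(seq_a, seq_b):
    """Cumulative DTW distance between two sequences of feature vectors."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    d = [[inf] * (cols + 1) for _ in range(rows + 1)]
    d[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cost = math.dist(seq_a[i - 1], seq_b[j - 1])  # Euclidean frame distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[rows][cols]


def recognise(features, templates):
    """Return the vocabulary word whose template is closest (smallest DTW distance)."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```

For example, a 250 Hz tone framed this way puts nearly all of its energy in the lowest band of every frame. DTW is used in this sketch because an input utterance and a stored template rarely contain the same number of frames; warping lets frames be stretched or compressed in time before their distances are summed, and the smallest total distance is taken as the greatest similarity.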
In the presence of broadband noise, the lower-level regions of the speech spectrum are more affected by the noise than others. Noise masking techniques have been developed to remove spurious differences caused by different background noise levels. As described in "A digital filter bank for spectral matching" by D H Klatt, Proceedings ICASSP 1976, pages 573-576, this is achieved by comparing the level of each extracted feature of an input signal with an estimate of the noise; if the level of an input feature is lower than that of the corresponding feature of the noise estimate, the level of that feature is set to the noise level. The technique described by Klatt relies on the user speaking a predetermined phrase at the beginning of each session. The spectrum derived from the input is compared with a model spectrum for that phrase, and a normalisation spectrum is calculated which is added to all spectral frames of the utterances for the rest of the session.
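The masking rule and the session normalisation can be sketched as follows, assuming every feature is a log-domain (e.g. dB) channel level, so that "setting to the noise level" becomes a per-channel maximum and "adding" a normalisation spectrum becomes a per-channel offset. Deriving the normalisation spectrum as the time-averaged model spectrum minus the time-averaged input spectrum for the calibration phrase is an assumption made for illustration, not necessarily Klatt's exact comparison. All function names are hypothetical.

```python
def mask_with_noise(frame, noise_estimate):
    """Noise masking: any channel below the noise estimate is raised to the noise level."""
    return [max(level, noise) for level, noise in zip(frame, noise_estimate)]


def mean_spectrum(frames):
    """Per-channel mean over a list of equal-length spectral frames."""
    return [sum(channel) / len(frames) for channel in zip(*frames)]


def normalisation_spectrum(input_frames, model_frames):
    """Assumed form: mean model spectrum minus mean input spectrum for the known phrase."""
    return [m - x for m, x in zip(mean_spectrum(model_frames), mean_spectrum(input_frames))]


def apply_normalisation(frames, norm):
    """Add the normalisation spectrum to every frame for the rest of the session."""
    return [[level + offset for level, offset in zip(frame, norm)] for frame in frames]
```

Because masking is applied per channel, a frame such as `[10.0, 2.0, 7.0]` against a flat noise estimate of `5.0` keeps its two strong channels and has only the weak middle channel raised to the noise level.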
Klatt also states that a common noise floor should be calculated before the normalisation spectrum. This is achieved by recording a one-second sample of background noise at the beginning of each session. However, this arrangement relies on the user knowing to keep silent during the noise-floor estimation period and then to utter the predetermined phrase for calculation of the normalisation spectrum.
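A noise floor estimated from a one-second background sample might be computed as below. This sketch assumes the input is already a sequence of per-frame, per-channel log levels, assumes a 10 ms frame (so one second is 100 frames), and takes the floor to be the per-channel mean over that second; `estimate_noise_floor` is a hypothetical name.

```python
FRAME_MS = 10  # assumed frame length; the text gives a 10-20 ms range


def estimate_noise_floor(noise_frames, frame_ms=FRAME_MS, duration_ms=1000):
    """Average per-channel level over the first duration_ms of background-only frames."""
    needed = duration_ms // frame_ms
    if len(noise_frames) < needed:
        raise ValueError(f"need {needed} noise frames, got {len(noise_frames)}")
    window = noise_frames[:needed]
    return [sum(channel) / needed for channel in zip(*window)]
```

The estimate treats every frame in the window as background noise, so any speech the user produces during that second biases the floor upward, which is the weakness of the arrangement noted above.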
In the article "Noise compensation for speech recognition using probabilistic models" by J N Holmes and N C Sedgwick, Proceedings ICASSP 1986, it is suggested that features of the input signal be "masked" by the noise level only when the resulting masked input feature is greater than the level of the corresponding feature of the template(s) of the system.
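A literal, per-channel reading of this conditional masking rule is sketched below, again assuming log-domain channel levels and a single template frame. It illustrates the sentence above only and does not reproduce Holmes and Sedgwick's probabilistic model; `conditional_mask` is a hypothetical name.

```python
def conditional_mask(frame, noise, template):
    """Mask a channel with the noise level only if the masked value exceeds the template's level."""
    result = []
    for level, noise_level, template_level in zip(frame, noise, template):
        masked = max(level, noise_level)  # ordinary noise masking
        # Keep the masked value only when it exceeds the template; otherwise leave the input as is.
        result.append(masked if masked > template_level else level)
    return result
```

With input `[1.0, 1.0, 8.0]`, noise `[5.0, 5.0, 5.0]` and template `[3.0, 6.0, 2.0]`, the first channel is masked up to 5.0 because that exceeds the template level of 3.0, while the second stays at 1.0 because a masked value of 5.0 would not exceed the template level of 6.0.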
Both
Abebe Oauiei
British Telecommunications public limited company