Feature extraction for automatic speech recognition

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

Classification: C704S255000, C704S251000, C704S231000

Status: active

Patent number: 06308155

ABSTRACT:

BACKGROUND OF THE INVENTION
Automatic speech recognition has only recently emerged from the research laboratory as a viable technology. Among the automatic speech recognition systems that have been developed are large-vocabulary document dictation systems and automated telephone directory systems. Large-vocabulary dictation systems typically require a head-mounted close-talking microphone, a relatively quiet operating environment, and a considerable amount of speaker adaptation. Telephone-based systems, on the other hand, operate over a wide range of telephone channel conditions with little or no speaker adaptation; however, they typically limit the kinds of speech input that can be recognized. For example, such systems typically require discontinuous speech input or restrict the grammar or vocabulary of the recognizable speech.
The performance of all automatic speech recognition systems degrades when acoustic interference is present in the input speech signal. Such interference may include one or more of the following: extraneous sounds (additive noise) received from the speaker's environment or the communication channel, spectral shaping or nonlinear distortion imposed by the microphone or communication channel, and reverberation from the room in which the speaker is talking.
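The three kinds of interference can be illustrated with a simple discrete-time degradation model. This is a minimal sketch under stated assumptions, not part of the invention: spectral shaping and reverberation are both modeled as convolution with an impulse response `h`, additive noise is scaled to a target SNR, and nonlinear distortion is not modeled. The function names (`convolve`, `scale_noise_to_snr`, `degrade`) and the demo room response are hypothetical.

```python
import math
import random


def convolve(x, h):
    """Causal FIR convolution truncated to len(x): y[n] = sum_k h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k < 0:
                break
            acc += hk * x[n - k]
        y.append(acc)
    return y


def power(x):
    """Mean-square power of a non-empty sequence."""
    return sum(v * v for v in x) / len(x)


def scale_noise_to_snr(signal, noise, snr_db):
    """Rescale `noise` so that 10*log10(P_signal / P_noise) equals snr_db."""
    target = power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target / power(noise))
    return [gain * v for v in noise]


def degrade(speech, h, noise, snr_db):
    """Hypothetical model y = (h * s) + n: h covers spectral shaping and
    reverberation, n is additive noise set to the requested SNR.
    Nonlinear distortion is not modeled."""
    shaped = convolve(speech, h)
    scaled = scale_noise_to_snr(shaped, noise[: len(shaped)], snr_db)
    return [s + n for s, n in zip(shaped, scaled)]


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # assumed sampling rate
    speech = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 4)]
    # Hypothetical room response: direct path plus an exponentially decaying random tail.
    rir = [1.0] + [0.3 * math.exp(-k / 200.0) * rng.gauss(0.0, 1.0) for k in range(1, 800)]
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    y = degrade(speech, rir, noise, snr_db=10.0)
    print(len(y), round(power(y), 3))
```

A short `h` stands for fixed channel or microphone coloration; a long, decaying `h` is a crude stand-in for room reverberation.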
SUMMARY OF THE INVENTION
The invention features an automatic speech recognition apparatus and method with a front end feature extractor that improves recognition performance under adverse acoustic conditions. The inventive feature extractor is characterized by a critical bandwidth spectral resolution, an emphasis on slow changes in the spectral structure of the speech signal, and adaptive automatic gain control. The use of critical-band-like frequency resolution reduces the recognizer's sensitivity to speaker-dependent signal characteristics and enhances the recognizer's sensitivity to speech-dependent signal characteristics. The emphasis on slow changes in the spectral structure of the speech signal focuses the recognizer on the primary carrier of linguistic information in the speech signal, thereby improving the accuracy of the recognizer. The use of adaptive automatic gain control reduces the recognizer's sensitivity to unknown spectral shaping imposed on the speech signal. The combination of these features improves the reliability of the recognizer in the presence of acoustic interference (e.g., reverberation, additive noise, and unknown spectral shaping).
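Critical-band-like resolution means frequency bands whose width grows with center frequency, roughly uniform on a perceptual (Bark) scale. The text does not say which frequency warping is used; the sketch below assumes the asinh Bark approximation used in PLP analysis, 6·asinh(f/600), and 1-Bark band spacing. All function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Hz -> Bark using the asinh warping from PLP analysis (assumed; the patent does not specify one)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def bark_to_hz(z_bark):
    """Inverse of hz_to_bark."""
    return 600.0 * math.sinh(z_bark / 6.0)


def critical_band_edges(f_max_hz, spacing_bark=1.0):
    """Band edges in Hz, equally spaced on the Bark scale from 0 Hz up to f_max_hz."""
    n_bands = int(hz_to_bark(f_max_hz) // spacing_bark)
    return [bark_to_hz(i * spacing_bark) for i in range(n_bands + 1)]


if __name__ == "__main__":
    edges = critical_band_edges(4000.0)  # e.g. telephone bandwidth at 8 kHz sampling
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    # Bands are about 100 Hz wide at the bottom and several hundred Hz wide near 4 kHz.
    print(len(widths), [round(w) for w in widths])
```

Summing short-term spectral energy within such bands, rather than keeping every DFT bin, is one way to obtain the coarse, critical-band-like resolution described above.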
In one aspect, the invention features an apparatus for generating a parametric representation of a speech signal, comprising: a feature generator configured to compute short-term parameters of the speech signal; a filter system configured to filter the time sequences of the short-term parameters; and a normalizer configured to normalize the filtered parameters with respect to one or more previous values of the filtered parameters.
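The feature generator's output can be pictured as a matrix of short-term parameters, one row per frame; each column is a time sequence that the filter system then processes. The minimal sketch below assumes a 25 ms Hamming window and 10 ms hop (100 frames per second), which the text does not specify, and uses a naive DFT so the example needs only the standard library; a real front end would use an FFT and then integrate bins into critical bands. The function names are hypothetical.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames (no padding; trailing samples are dropped)."""
    if len(x) < frame_len:
        return []
    n_frames = 1 + (len(x) - frame_len) // hop
    return [x[i * hop : i * hop + frame_len] for i in range(n_frames)]


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [a * b for a, b in zip(frame, w)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xw))
        spec.append(abs(acc) ** 2)
    return spec


def short_term_parameters(x, fs, win_s=0.025, hop_s=0.010):
    """Frame-by-frame spectral parameters: result[t][k] for frame t, bin k."""
    frame_len = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    return [power_spectrum(f) for f in frame_signal(x, frame_len, hop)]


def time_sequences(params):
    """Transpose to one time sequence per parameter; these are what gets filtered."""
    return [list(col) for col in zip(*params)]
```

At 8 kHz sampling, for example, each frame has 200 samples, giving 101 spectral bins spaced 40 Hz apart.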
Embodiments may include one or more of the following features.
The feature generator preferably is configured to compute short-term spectral parameters of the speech signal. The feature generator preferably is configured to compute parameters of an auditory-like spectrum. The filter system preferably includes one or more linear filters. In one embodiment, the filter system includes a lowpass filter and a bandpass filter configured to operate in parallel. The lowpass filter may be characterized by a cutoff frequency of about 8 Hz and the bandpass filter may be characterized by a passband of about 8-16 Hz. The lowpass filter may be characterized by a moderate degree (e.g., 5 dB) of DC attenuation.
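The parallel lowpass (about 8 Hz) and bandpass (about 8-16 Hz) filters act on each parameter's time sequence, i.e. on the modulation spectrum rather than the acoustic spectrum. The sketch below assumes a frame rate of 100 frames per second and substitutes Hamming-windowed-sinc FIR filters, a design choice of this example only; the text says just "linear filters". The bandpass is formed as the difference of two lowpasses, and the moderate (about 5 dB) DC attenuation of the lowpass is omitted. Function names are hypothetical.

```python
import cmath
import math


def windowed_sinc_lowpass(cutoff_hz, frame_rate_hz, n_taps=101):
    """Linear-phase FIR lowpass (Hamming-windowed sinc) with unit DC gain."""
    if n_taps % 2 == 0:
        raise ValueError("n_taps must be odd")
    m = n_taps - 1
    fc = cutoff_hz / frame_rate_hz  # cutoff in cycles per frame
    taps = []
    for n in range(n_taps):
        t = n - m / 2
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def bandpass_from_lowpasses(lo_hz, hi_hz, frame_rate_hz, n_taps=101):
    """Bandpass as the difference of two lowpasses; DC gain is zero up to rounding."""
    hi = windowed_sinc_lowpass(hi_hz, frame_rate_hz, n_taps)
    lo = windowed_sinc_lowpass(lo_hz, frame_rate_hz, n_taps)
    return [a - b for a, b in zip(hi, lo)]


def fir_filter(x, taps):
    """Causal FIR filtering; the output has the same length as x."""
    return [sum(taps[k] * x[n - k] for k in range(min(len(taps), n + 1))) for n in range(len(x))]


def magnitude_response(taps, freq_hz, frame_rate_hz):
    """|H(f)| of an FIR filter evaluated directly from its DTFT."""
    w = 2 * math.pi * freq_hz / frame_rate_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))


def modulation_filter_bank(sequence, frame_rate_hz=100.0):
    """Run one parameter time sequence through the parallel lowpass and bandpass."""
    lp = windowed_sinc_lowpass(8.0, frame_rate_hz)
    bp = bandpass_from_lowpasses(8.0, 16.0, frame_rate_hz)
    return fir_filter(sequence, lp), fir_filter(sequence, bp)
```

With 101 taps at 100 frames per second the filters add about 0.5 s of delay; a shorter FIR or an IIR design would trade selectivity for latency.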
The normalizer preferably is configured to normalize the filtered parameters with respect to an average of preceding parameter values. The normalizer may include one or more feedback automatic gain control (AGC) networks. In one embodiment, each feedback network includes a feedback loop with a feedback lowpass filter. The feedback lowpass filter preferably is a single-pole IIR filter. In one embodiment, the normalizer includes two or more series-connected feedback AGC networks, each having a single-pole IIR filter, with cutoff frequencies that do not increase along the chain: the cutoff frequency of the IIR filter in any one AGC network is less than or equal to the cutoff frequencies of the IIR filters in the preceding AGC networks and greater than or equal to those in the succeeding AGC networks. In accordance with this embodiment, the normalizer preferably includes first and second series-connected feedback AGC networks, the first having a single-pole lowpass IIR filter with a cutoff frequency of about 1 Hz and the second having a single-pole lowpass IIR filter with a cutoff frequency of about 0.5 Hz.
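A feedback AGC network with a single-pole lowpass in its loop can be sketched as follows. The text does not say exactly how the feedback signal is formed, so this example assumes the common arrangement in which each input value is divided by a single-pole lowpass estimate of the output magnitude, at an assumed 100 frames per second. The pole is set as exp(-2π·fc/frame_rate). Function names are hypothetical.

```python
import math


def one_pole_coefficient(cutoff_hz, frame_rate_hz):
    """Pole a for g[n] = a*g[n-1] + (1-a)*u[n] with the given nominal cutoff."""
    return math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)


def feedback_agc(x, cutoff_hz, frame_rate_hz=100.0, initial_gain=1.0, floor=1e-8):
    """Divide each input value by a single-pole lowpass estimate of |output|.

    Because the lowpass sits in the feedback loop, a constant positive input c
    settles where y = c / y, i.e. y = sqrt(c).
    """
    a = one_pole_coefficient(cutoff_hz, frame_rate_hz)
    gain = initial_gain
    y = []
    for v in x:
        out = v / max(gain, floor)
        gain = a * gain + (1.0 - a) * abs(out)  # feedback lowpass on the output
        y.append(out)
    return y


def agc_chain(x, cutoffs_hz=(1.0, 0.5), frame_rate_hz=100.0):
    """Series-connected feedback AGCs whose cutoffs must not increase along the chain."""
    if any(b > a for a, b in zip(cutoffs_hz, cutoffs_hz[1:])):
        raise ValueError("cutoffs must be non-increasing along the chain")
    for fc in cutoffs_hz:
        x = feedback_agc(x, fc, frame_rate_hz)
    return x
```

Under this assumed loop structure, a fixed gain c on a steady input is compressed to about c^(1/2) after one stage and c^(1/4) after two, so unknown spectral shaping is reduced rather than removed outright, while faster changes pass through the slower loops.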
In another aspect, the invention features a method for generating a parametric representation of a speech signal, comprising: computing short-term parameters of the speech signal; filtering time sequences of the short-term parameters; and normalizing the filtered parameters with respect to one or more previous values of the filtered parameters.
The steps of filtering and normalizing preferably are performed independently of one another.
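Keeping filtering and normalization as separate stages means each can be changed without touching the other. The minimal sketch below makes that concrete by passing the two stages in as functions. `moving_average` is a hypothetical placeholder for the modulation filter, and `normalize_by_past_mean` is a feedforward version of normalizing "with respect to an average of preceding parameter values", not the feedback AGC embodiment. All names are hypothetical.

```python
from functools import partial
from typing import Callable, List

Seq = List[float]


def moving_average(x: Seq, width: int = 5) -> Seq:
    """Hypothetical stand-in for the modulation filter: causal moving average."""
    out, acc = [], 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= width:
            acc -= x[n - width]
        out.append(acc / min(n + 1, width))
    return out


def normalize_by_past_mean(x: Seq, history: int = 50, eps: float = 1e-12) -> Seq:
    """Divide each value by the mean magnitude of up to `history` preceding values."""
    out = []
    for n, v in enumerate(x):
        past = x[max(0, n - history):n]
        ref = sum(abs(p) for p in past) / len(past) if past else abs(v)
        out.append(v / max(ref, eps))
    return out


def process(sequences: List[Seq], filter_fn: Callable[[Seq], Seq],
            normalize_fn: Callable[[Seq], Seq]) -> List[Seq]:
    """Filter each parameter time sequence, then normalize it; neither stage
    depends on the other's settings, so each can be tuned on its own."""
    return [normalize_fn(filter_fn(seq)) for seq in sequences]


if __name__ == "__main__":
    seq = [1.0 + 0.5 * ((n * 7) % 11) / 11 for n in range(200)]
    # Change the filter width or the normalization window independently.
    a = process([seq], partial(moving_average, width=3), normalize_by_past_mean)
    b = process([[3.0 * v for v in seq]], partial(moving_average, width=3), normalize_by_past_mean)
    print(max(abs(p - q) for p, q in zip(a[0], b[0])))  # a fixed gain cancels
```

Because the placeholder filter is linear and the normalizer divides by a running average of its own input, a constant gain applied to one parameter's sequence (such as fixed spectral shaping in that band) cancels in the output.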
Among the advantages of the invention are the following. The invention improves speech recognition performance by combining modulation filtering (in the amplitude domain) and automatic gain control processing in the front-end feature generator. The separation of these operations enables each step to be independently optimized, leading to better recognition performance. The invention reduces the error rate of automatic speech recognizers under degraded acoustic conditions, including reverberant conditions, additive noise and unknown spectral shaping. The invention may be applied to small-vocabulary and large-vocabulary recognizers to improve performance under degraded acoustic conditions.
Other features and advantages will become apparent from the following description, including the drawings and the claims.


