Reexamination Certificate
1999-03-31
2004-07-27
Knepper, David D. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
active
06768979
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to electronic speech recognition systems and relates more particularly to an apparatus and method for noise attenuation in a speech recognition system.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Automatic speech recognition is one promising technique that allows a system user to effectively communicate with selected electronic devices, such as digital computer systems. Speech typically consists of one or more spoken utterances, each of which may include a single word or a series of closely-spaced words forming a phrase or a sentence.
Conditions with significant ambient background noise levels present additional difficulties when implementing a speech recognition system. Examples of such noisy conditions may include speech recognition in automobiles or in certain manufacturing facilities. To accurately analyze a particular utterance in such user applications, a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
Referring now to FIG. 1(a), an exemplary waveform diagram for one embodiment of noisy speech 112 is shown. In addition, FIG. 1(b) depicts an exemplary waveform diagram for one embodiment of speech 114 without noise. Similarly, FIG. 1(c) shows an exemplary waveform diagram for one embodiment of noise 116 without speech 114. In practice, noisy speech 112 of FIG. 1(a) therefore is typically comprised of several components, including speech 114 of FIG. 1(b) and noise 116 of FIG. 1(c). In FIGS. 1(a), 1(b), and 1(c), waveforms 112, 114, and 116 are presented for purposes of illustration only. The present invention may readily incorporate various other embodiments of noisy speech 112, speech 114, and noise 116.
The two main sources that typically create acoustic distortion are the presence of additive noise (such as car noise, music or background speakers), and convolutive distortions due to the use of various different microphones, use of a telephone channel, or reverberation effects. Different types of additive noise will have different signal characteristics. A speech recognition system designed to reduce one type of additive noise may not be robust to other types of additive noise, thereby reducing the effectiveness of the system.
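The two distortion sources named above are commonly written as a single signal model. The following is a standard textbook formulation rather than an equation from this patent; the symbols $x$ (clean speech), $h$ (channel or microphone impulse response), $n$ (additive noise), and $y$ (observed noisy speech) are assumed here for illustration.

```latex
% Time domain: convolutive distortion h, additive noise n
y(t) = h(t) * x(t) + n(t)

% Frequency domain: convolution becomes multiplication
Y(\omega) = H(\omega)\, X(\omega) + N(\omega)
```

Under this model, additive noise shifts spectral amplitudes while convolutive distortion scales them, which is one reason the two are usually handled by different techniques.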
From the foregoing discussion, it therefore becomes apparent that noise attenuation is a significant consideration for designers and manufacturers of speech recognition systems.
SUMMARY OF THE INVENTION
In accordance with the present invention, an apparatus and method are disclosed for noise attenuation in a speech recognition system. The invention includes a noise suppressor configured to attenuate noise in a noisy speech signal, and a processor coupled to the system to control the noise suppressor. The noise suppressor utilizes statistical characteristics of the noise signal to attenuate amplitude values of the noisy speech signal that have a probability of containing noise.
In one embodiment, a Fast Fourier transformer generates amplitude energy values for the noisy speech signal in units of frames. The Fast Fourier transformer also generates amplitude energy values for a noise signal in units of frames. The amplitude energy values may be magnitude energy values or power energy values.
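As a rough illustration of frame-wise amplitude energy values, the sketch below splits a signal into frames and computes either magnitude or power per frequency bin. It uses a naive discrete Fourier transform from the Python standard library in place of a Fast Fourier transform so that the example stays self-contained; the function names, frame length, and hop size are assumptions made for illustration and do not come from the patent.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping frames (sizes are hypothetical)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def dft(frame):
    """Naive O(N^2) DFT over bins 0..N/2; an FFT yields the same values faster."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def amplitude_energy(frame, power=False):
    """Per-bin magnitude |X(k)|, or power |X(k)|^2 when power=True."""
    return [abs(x) ** 2 if power else abs(x) for x in dft(frame)]


# Example input: a pure tone that falls exactly on bin 2 of an 8-sample frame.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
frames = frame_signal(tone, frame_len=8, hop=4)
magnitudes = [amplitude_energy(f) for f in frames]
powers = [amplitude_energy(f, power=True) for f in frames]
```

The same routine would be applied to noise-only frames to obtain the noise amplitude energy values from which noise statistics are later estimated.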
The noise suppressor preferably utilizes an attenuation function having a shape determined in part by a noise average and a noise standard deviation. The shape of the attenuation function as the function increases is an inverse of the shape of a probability density curve of a noise signal. The noise average determines where the attenuation function begins to increase from a maximum attenuation level, which is determined by an attenuation coefficient. The noise standard deviation determines the shape of the attenuation function as the function increases from the maximum attenuation level to unity, or full transmission.
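The summary does not give the attenuation function in closed form, so the following is only a sketch of one function with the described properties. It assumes a Gaussian noise density, a gain floor of 1/alpha as the maximum attenuation level, and a rise toward unity along an inverted Gaussian; these choices, and the names `noise_statistics`, `attenuation_gain`, and `suppress`, are illustrative assumptions.

```python
import math


def noise_statistics(noise_frames):
    """Per-bin mean and standard deviation over noise-only frames."""
    n = len(noise_frames)
    bins = len(noise_frames[0])
    means = [sum(f[k] for f in noise_frames) / n for k in range(bins)]
    stds = [math.sqrt(sum((f[k] - means[k]) ** 2 for f in noise_frames) / n)
            for k in range(bins)]
    return means, stds


def attenuation_gain(y, mu, sigma, alpha):
    """Gain in [1/alpha, 1] for amplitude value y (assumed functional form).

    At or below the noise average mu the gain stays at the floor 1/alpha.
    Above mu it rises toward 1 along an inverted Gaussian of width sigma,
    so values unlikely to be noise pass through nearly unchanged.
    """
    floor = 1.0 / alpha
    if y <= mu:
        return floor
    if sigma <= 0.0:
        return 1.0
    bell = math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))
    return 1.0 - (1.0 - floor) * bell


def suppress(frame, means, stds, alpha):
    """Apply the per-bin gain to one frame of amplitude energy values."""
    return [y * attenuation_gain(y, m, s, alpha)
            for y, m, s in zip(frame, means, stds)]
```

With `alpha = 4`, a value equal to the noise average is scaled by 0.25, while a value many standard deviations above it is passed at essentially full amplitude.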
In a further embodiment, the noise suppressor also utilizes an adaptive attenuation coefficient that depends on signal-to-noise conditions in the speech recognition system. The adaptive attenuation coefficient will typically be larger for high noise conditions, and smaller for low noise conditions. The adaptive attenuation coefficient also depends on frequency because noise typically does not affect the speech signal equally at all frequencies.
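How the adaptive coefficient is computed is not stated in this summary. The sketch below shows one plausible mapping, assuming a per-bin signal-to-noise estimate in decibels and a linear interpolation between a large coefficient at low SNR and a small one at high SNR. The estimator, the thresholds, and the coefficient range are all hypothetical.

```python
import math


def snr_db(noisy_power, noise_power):
    """Rough per-bin SNR estimate in dB from power values (hypothetical estimator)."""
    eps = 1e-12
    signal = max(noisy_power - noise_power, eps)
    return 10.0 * math.log10(signal / max(noise_power, eps))


def adaptive_alpha(snr, alpha_min=1.0, alpha_max=8.0, snr_low=0.0, snr_high=20.0):
    """Map SNR to an attenuation coefficient: large at low SNR, small at high SNR."""
    if snr <= snr_low:
        return alpha_max
    if snr >= snr_high:
        return alpha_min
    frac = (snr - snr_low) / (snr_high - snr_low)
    return alpha_max - frac * (alpha_max - alpha_min)


def per_bin_alphas(noisy_powers, noise_powers):
    """One coefficient per frequency bin, since noise varies with frequency."""
    return [adaptive_alpha(snr_db(y, n)) for y, n in zip(noisy_powers, noise_powers)]
```

Computing the coefficient separately for each bin reflects the statement above that noise does not affect the speech signal equally at all frequencies.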
The noise suppressor of the present invention provides attenuated noisy speech energy to a filter bank. The filter bank filters the attenuated noisy speech energy into channel energy, and then provides the channel energy to a logarithmic compressor to be converted to logarithmic channel energy. A frequency cosine transformer then converts the logarithmic channel energy into corresponding static features that are separately provided to a normalizer, a first time cosine transformer, and a second time cosine transformer.
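The chain from attenuated energy to static features resembles a conventional cepstral front end, which can be sketched as follows. The triangular filters with linear spacing, the logarithm floor, and the DCT-II form of the frequency cosine transform are assumptions chosen for simplicity; the summary does not specify the filter shapes or spacing, and a mel-style spacing is common in practice.

```python
import math


def triangular_filter_bank(num_channels, num_bins):
    """Linearly spaced triangular filters over the spectrum bins (assumed layout)."""
    edges = [i * (num_bins - 1) / (num_channels + 1) for i in range(num_channels + 2)]
    bank = []
    for c in range(num_channels):
        lo, mid, hi = edges[c], edges[c + 1], edges[c + 2]
        weights = []
        for k in range(num_bins):
            if lo < k <= mid:
                weights.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                weights.append((hi - k) / (hi - mid))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def channel_energies(spectrum, bank):
    """Weighted sum of spectrum energy under each filter."""
    return [sum(w * e for w, e in zip(weights, spectrum)) for weights in bank]


def log_compress(energies, floor=1e-10):
    """Natural log of each channel energy, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


def frequency_cosine_transform(log_energies, num_features):
    """DCT-II across channels, giving cepstral-style static features."""
    p = len(log_energies)
    return [sum(log_energies[c] * math.cos(math.pi * i * (c + 0.5) / p)
                for c in range(p))
            for i in range(num_features)]


# Example: a flat 17-bin spectrum through 3 channels.
bank = triangular_filter_bank(num_channels=3, num_bins=17)
energies = channel_energies([1.0] * 17, bank)
static = frequency_cosine_transform(log_compress(energies), num_features=2)
```

For a flat spectrum every channel receives the same energy, so only the zeroth static feature is nonzero; spectral shape shows up in the higher-order features.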
The first time cosine transformer converts the static features into delta features that are provided to the normalizer. Similarly, the second time cosine transformer converts the static features into delta-delta features that are also provided to the normalizer. The normalizer performs a normalization procedure on the static features and provides the resulting normalized static features to a recognizer. The normalizer also performs a normalization procedure on the delta features and delta-delta features and provides the resulting normalized delta features and normalized delta-delta features, respectively, to the recognizer.
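One way to read the time cosine transformers is as a projection of each static feature's trajectory, over a short window of frames, onto first-order and second-order cosine basis functions. The sketch below takes that reading. The window length, the edge handling, and the use of simple mean subtraction as the normalization procedure are assumptions, since the summary does not specify them.

```python
import math


def time_cosine_transform(static_frames, order, window=5):
    """Project feature trajectories onto a cosine basis along the time axis.

    order=1 gives delta-like features, order=2 gives delta-delta-like features.
    Frames beyond either end repeat the first or last frame (an assumption).
    """
    num_frames = len(static_frames)
    dim = len(static_frames[0])
    half = window // 2
    basis = [math.cos(math.pi * order * (m + 0.5) / window) for m in range(window)]
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for m in range(window):
                idx = min(max(t + m - half, 0), num_frames - 1)
                acc += basis[m] * static_frames[idx][d]
            row.append(acc)
        out.append(row)
    return out


def mean_normalize(frames):
    """Subtract the per-dimension mean over the utterance (assumed procedure)."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Example: a one-dimensional feature that rises linearly over seven frames.
ramp = [[float(t)] for t in range(7)]
delta = mean_normalize(time_cosine_transform(ramp, order=1))
delta_delta = mean_normalize(time_cosine_transform(ramp, order=2))
```

Because both cosine bases sum to zero over the window, a constant trajectory produces zero delta and delta-delta values, and the second-order projection also vanishes on a purely linear trajectory away from the edges.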
The recognizer analyzes the normalized static features, the normalized delta features, and the normalized delta-delta features to generate a speech recognition result, according to the present invention. The present invention thus efficiently and effectively implements an apparatus and method for noise attenuation in a speech recognition system.
REFERENCES:
patent: 4592085 (1986-05-01), Watari et al.
patent: 5003601 (1991-03-01), Watari et al.
patent: 5204874 (1993-04-01), Falconer et al.
patent: 5319736 (1994-06-01), Hunt
patent: 5390278 (1995-02-01), Gupta et al.
patent: 5513298 (1996-04-01), Stanford et al.
patent: 5604839 (1997-02-01), Acero et al.
patent: 5615296 (1997-03-01), Stanford et al.
patent: 5621859 (1997-04-01), Schwartz et al.
patent: 5715367 (1998-02-01), Gillick et al.
patent: 5742694 (1998-04-01), Eatwell
patent: 5991718 (1999-11-01), Malah
patent: 6098040 (2000-08-01), Petroni et al.
patent: 6173258 (2001-01-01), Menendez-Pidal et al.
Mischa Schwartz, Information Transmission, Modulation, and Noise, 1959, McGraw-Hill Book Company, Inc., pp. 362-373.*
O'Shaughnessy, Douglas, “Speech Communication, Human and Machine,” 1990, pp. 422-23.
Proakis, John and Dimitris Manolakis, “Digital Signal Processing,” 1992, pp. 706-08.
Milner, Ben & Saeed Vaseghi, “Analysis of Cepstral-Time Matrices for Noise and Channel Robust Speech Recognition,” 1995, pp. 519-22, ESCA EUROSPEECH'95.
Davis, Steven & Paul Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” pp. 353-60, IEEE Trans on ASSP, No. 4, Aug. 1980.
Iwahashi, N. et al., “Stochastic Features for Noise Robust Speech Recognition,” IEEE 1998, pp. 633-36.
Neumeyer, Leonardo et al., “Training Issues and Channel Equalization Techniques for the Construction of Telephone Acoustic Models Using a High-Quality Speech Corpus,” pp. 590-97, IEEE Trans on Speech and Audio Processing, vol. 2, No. 4, Oct. 1994.
Tibrewala, Sangita & Hynek Hermansky, “Multi-Band and Adaptation Approaches to Robust Speech Recognition,” 1997, pp. 2619-22, ESCA Eurospeech 97, Rhodes, Greece.
Nolazco Flores, J.A. & S.J. Young, “Adapting a HMM-Based Recognizer for Noisy Speech Enhanced by Spectral Subtraction,” 1993, pp. 1-30.
Hanson, Brian et al., Speech Technology Laboratory, Panason
Chen Ruxin
Menendez-Pidal Xavier
Tanaka Miyuki
Knepper David D.
Koerner Gregory J.
Simon & Koerner LLP