Method and apparatus for enhancing noise-corrupted speech

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S226000, C381S094200

active

06415253

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to a method and an apparatus for enhancing noise-corrupted speech through noise suppression. More particularly, the invention is directed to improving the speech quality of a noise suppression system employing a spectral subtraction technique.
2. Description of the Related Art
With the advent of digital cellular telephones, it has become increasingly important to suppress noise in solving speech processing problems, such as speech coding and speech recognition. This increased importance results not only from customer expectation of high performance even in high car noise situations, but also from the need to move progressively to lower data rate speech coding algorithms to accommodate the ever-increasing number of cellular telephone customers.
The speech quality from these low-rate coding algorithms tends to degrade drastically in high noise environments. Although noise suppression is important, it should not introduce undesirable artifacts, speech distortions, or significant loss of speech intelligibility. Many researchers and developers have attempted to achieve these performance goals for noise suppression for many years, but these goals have now come to the forefront in the digital cellular telephone application.
In the literature, a variety of speech enhancement methods potentially involving noise suppression have been proposed. Spectral subtraction is one of the traditional methods that has been studied extensively. See, e.g., Lim, “Evaluations of Correlation Subtraction Method for Enhancing Speech Degraded by Additive White Noise,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 26, No. 5, pp. 471-472 (1978); and Boll, “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 27, No. 2, pp. 113-120 (April, 1979). Spectral subtraction is popular because it can suppress noise effectively and is relatively straightforward to implement.
In spectral subtraction, an input signal (e.g., speech) in the time domain is first converted to individual components in the frequency domain using a bank of band-pass filters, typically implemented as a Fast Fourier Transform (FFT). The spectral components are then attenuated according to their noise energy.
The filter used in spectral subtraction for noise suppression utilizes an estimate of the power spectral density of the background noise, thereby generating a signal-to-noise ratio (SNR) for the speech in each frequency component. Here, the SNR means the ratio of the magnitude of the speech signal contained in the input signal to the magnitude of the noise signal in the input signal. The SNR in each frequency component is used to determine a gain factor for that component. Undesirable frequency components are then attenuated based on the determined gain factors. An inverse FFT recombines the filtered frequency components with the corresponding phase components, thereby generating the noise-suppressed output signal in the time domain. Usually, there is no change in the phase components of the signal because the human ear is not sensitive to such phase changes.
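The processing chain described above (transform, per-component gain, inverse transform with unchanged phase) can be sketched for a single frame as follows. This is a minimal illustration, not the method of any cited reference: the function names, the naive DFT standing in for an FFT, and the power-domain gain rule sqrt(max(1 - N/|X|^2, 0)) are all assumptions chosen for brevity, and `noise_power` is assumed to be a per-component noise power estimate obtained elsewhere.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stands in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the time-domain signal."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract_frame(frame, noise_power):
    """Attenuate each frequency component of one frame.

    noise_power[k] is an assumed estimate of the noise power in component k.
    """
    filtered = []
    for value, n_pow in zip(dft(frame), noise_power):
        power = abs(value) ** 2
        # Illustrative gain rule: remove the estimated noise power and
        # never let the gain go negative.
        gain = math.sqrt(max(1.0 - n_pow / power, 0.0)) if power > 0.0 else 0.0
        # Multiplying a complex component by a real gain keeps its phase.
        filtered.append(gain * value)
    return idft(filtered)
```

With a zero noise estimate every gain is 1 and the frame passes through unchanged; with a very large noise estimate every component is removed.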
This spectral subtraction method can cause so-called “musical noise.” The musical noise is composed of tones at random frequencies, and has an increased variance, resulting in a perceptually annoying noise because of its unnatural characteristics. The noise-suppressed signal can be even more annoying than the original noise-corrupted signal.
Thus, there is a strong need for techniques for reducing musical noise. Various researchers have proposed changes to the basic spectral subtraction algorithm for this purpose. For example, Berouti et al., “Enhancement of Speech Corrupted by Acoustic Noise,” Proc. IEEE ICASSP, pp. 208-211 (April, 1979) relates to clamping the gain values at each frequency so that the values do not fall below a minimum value. In addition, Berouti et al. propose increasing the noise power spectral estimate artificially, by a small margin. This is often referred to as “oversubtraction.”
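The two modifications described above can be illustrated with a small gain function. This is a sketch under stated assumptions rather than the formulation of Berouti et al.: the function name, the parameter names `oversubtraction` and `gain_floor`, and the power-domain gain rule are hypothetical choices made for illustration.

```python
import math


def subtraction_gain(noisy_power, noise_power, oversubtraction=1.0, gain_floor=0.0):
    """Gain for one frequency component with oversubtraction and clamping.

    oversubtraction > 1 artificially inflates the noise estimate;
    gain_floor is the minimum value below which the gain may not fall.
    """
    if noisy_power <= 0.0:
        return gain_floor
    residual = 1.0 - oversubtraction * noise_power / noisy_power
    # Clamp: the gain never drops below the configured floor.
    return max(math.sqrt(max(residual, 0.0)), gain_floor)
```

Raising `oversubtraction` lowers the gain in every component, while a nonzero `gain_floor` keeps components that would otherwise be zeroed at a small constant level, which reduces frame-to-frame fluctuation of the gain.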
Both clamping and oversubtraction are directed to reducing the time-varying nature of the computed gain modification values. Arslan et al., “New Methods for Adaptive Noise Suppression,” Proc. IEEE ICASSP, pp. 812-815 (May, 1995), relates to using smoothed versions of the FFT-derived estimates of the noisy speech spectrum and the noise spectrum, instead of using the FFT coefficient values directly. Tsoukalas et al., “Speech Enhancement Using Psychoacoustic Criteria,” Proc. IEEE ICASSP, pp. 359-362 (April, 1993), and Azirani et al., “Optimizing Speech Enhancement by Exploiting Masking Properties of the Human Ear,” Proc. IEEE ICASSP, pp. 800-803 (May, 1995), relate to psychoacoustic models of the human ear.
Clamping and oversubtraction significantly reduce musical noise, but at the cost of degraded intelligibility of speech. Therefore, a large degree of noise reduction has tended to result in low intelligibility. The attenuation characteristics of spectral subtraction typically lead to a de-emphasis of unvoiced speech and high frequency formants, thereby making the speech sound muffled.
There have been attempts in the past to provide spectral subtraction techniques without the musical noise, but such attempts have met with limited success. See, e.g., Lim et al., “All-Pole Modeling of Degraded Speech,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 26, pp. 197-210 (June, 1978); Ephraim et al., “Speech Enhancement Using a Minimum Mean Square Error Short-Time Spectral Amplitude Estimator,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 32, pp. 1109-1120 (1984); and McAulay et al., “Speech Enhancement Using a Soft-Decision Noise Suppression Filter,” IEEE Trans. Acoustics, Speech and Signal Processing, Vol. 28, pp. 137-145 (April, 1980).
In spectral subtraction techniques, the gain factors are adjusted by SNR estimates. The SNR estimates are determined by the speech energy in each frequency component, and the current background noise energy estimate in each frequency component. Therefore, the performance of the entire noise suppression system depends on the accuracy of the background noise estimate. The background noise is estimated when only background noise is present, such as during pauses in human speech. Accordingly, spectral subtraction with high precision requires accurate and robust speech/noise discrimination, or voice activity detection, in order to determine when only noise exists in the signal.
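The dependence of the noise estimate on the speech/noise decision can be sketched as a gated update. This is an illustrative assumption, not a procedure taken from the text: the function name, the first-order recursive average, and the smoothing constant `alpha` are hypothetical, and `speech_present` is assumed to come from some external voice activity detector.

```python
def update_noise_estimate(noise_power, frame_power, speech_present, alpha=0.9):
    """Update a per-component noise power estimate for one frame.

    The estimate is frozen while speech is present and tracks the frame
    power (recursive average, assumed form) when only noise is present.
    """
    if speech_present:
        # Speech would contaminate the estimate, so leave it unchanged.
        return list(noise_power)
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(noise_power, frame_power)]
```

If the detector wrongly reports a pause while speech is active, speech energy leaks into the noise estimate and the subsequent gains become too small, which is why the accuracy of the speech/noise decision matters.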
Existing voice activity detectors utilize combinations of energy estimation, zero crossing rate, correlation functions, LPC coefficients, and signal power change ratios. See, e.g., Yatsuzuka, “Highly Sensitive Speech Detector and High-Speed Voiceband Data Discriminator in DSI-ADPCM Systems,” IEEE Trans. Communications, Vol. 30, No. 4 (April, 1982); Freeman et al., “The Voice Activity Detector for the Pan-European Digital Cellular Mobile Telephone Service,” IEEE Proc. ICASSP, pp. 369-372 (February, 1989); and Sun et al., “Speech Enhancement Using a Ternary-Decision Based Filter,” IEEE Proc. ICASSP, pp. 820-823 (May, 1995).
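Two of the features listed above, frame energy and zero crossing rate, are simple enough to sketch. The decision rule below is a deliberately crude illustration and does not reproduce any of the cited detectors; the function names and both thresholds are hypothetical and would need tuning for real signals.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_threshold, zcr_threshold):
    """Assumed rule: speech frames are energetic and cross zero rarely."""
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)
```

A rule of this kind treats voiced speech as high in energy and low in zero crossing rate; it would misclassify unvoiced sounds and loud noise, which is one reason practical detectors combine several features.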
However, in very noisy environments, speech detectors based on the above-mentioned approaches may suffer serious performance degradation. In addition, hybrid or acoustic echo, which enters the system at significantly lower levels, may corrupt the noise spectral density estimates if the speech detectors are not robust to echo conditions.
Furthermore, spectral subtraction assumes the noise source to be statistically stationary. However, speech may be contaminated by colored non-stationary noise, such as the noise inside the compartment of a running car. The main sources of the noise are the engine and the fan at low car speeds, or the road and wind at higher speeds, as well as passing cars. These non-stationary noise sources degrade the performance of speech enhancement systems using spectral subtraction. This is because the non-stationary noise corrupts th
