Method and apparatus for adaptively suppressing noise

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C704S226000, C704S228000, C704S233000, C704S268000

Reexamination Certificate

active

06591234

ABSTRACT:

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
BACKGROUND OF THE INVENTION
The present invention relates to suppressing noise in telecommunications systems. In particular, the present invention relates to suppressing noise in single channel systems or single channels in multiple channel systems.
Speech quality enhancement is an important feature in speech communication systems. Cellular telephones, for example, are often operated in the presence of high levels of environmental background noise present in moving vehicles. Background noise causes significant degradation of the speech quality at the far end receiver, making the speech barely intelligible. In such circumstances, speech enhancement techniques may be employed to improve the quality of the received speech, thereby increasing customer satisfaction and encouraging longer talk times.
Past noise suppression systems typically utilized some variation of spectral subtraction.
FIG. 1
shows an example of a noise suppression system
100
that uses spectral subtraction. A spectral decomposition of the input noisy speech-containing signal
102
is first performed using the filter bank
104
. The filter bank
104
may be a bank of bandpass filters such as, for example, the bandpass filters disclosed in R. J. McAulay and M. L. Malpass, “Speech Enhancement Using a Soft-Decision Noise Suppression Filter,”
IEEE Trans. Acoust., Speech, Signal Processing
, vol. ASSP-28, no. 2, (April 1980), pp. 137-145. In this context, noise refers to any undesirable signal present in the speech signal including: 1) environmental background noise; 2) echo such as due to acoustic reflections or electrical reflections in hybrids: 3) mechanical and/or electrical noise added due to specific hardware such as tape hiss in a speech playback system; and 3) non-linearities due to, for example, signal clipping or quantization by speech compression.
The filter bank
104
decomposes the signal into separate frequency bands. For each band, power measurements are performed and continuously updated over time in the noisy signal power & noise power estimator
106
. These power measures are used to determine the signal-to-noise ratio (SNR) in each band. The voice activity detector
108
is used to distinguish periods of speech activity from periods of silence. The noise power in each frequency band is updated only during silence while the noisy signal power is tracked at all times. For each frequency band, a gain (attenuation) factor is computed in the gain computer I
10
based on the SNR of the band to attenuate the signal in the gain multiplier
112
. Thus, each frequency band of the noisy input speech signal is attenuated based on its SNR. In this context, speech signal refers to an audio signal that may contain speech, music or other information bearing audio signals (e.g., DTMF tones, silent pauses, and noise).
A more sophisticated approach may also use an overall SNR level in addition to the individual SNR values to compute the gain factors for each band. The overall SNR is estimated in the overall SNR estimator
114
. The gain factor computations for each band are performed in the gain computer
110
. The attenuation of the signals in different bands is accomplished by multiplying the signal in each band by the corresponding gain factor in the gain multiplier. Low SNR bands are attenuated more than the high SNR bands. The amount of attenuation is also greater if the overall SNR is low. The possible dynamic range of the SNR of the input signal is large. As such, the speech enhancement system must be capable of handling both very clean speech signals from wireline telephones as well as very noisy speech from cellular telephones. After the attenuation process, the signals in the different bands are recombined into a single, clean output signal
116
. The resulting output signal
116
will have an improved overall perceived quality.
In this context, speech enhancement system refers to an apparatus or device that enhances the quality of a speech signal in terms of human perception or in terms of another criteria such as accuracy of recognition by a speech recognition device, by suppressing, masking, canceling or removing noise or otherwise reducing the adverse effects of noise. Speech enhancement systems include apparatuses or devices that modify an input signal in ways such as, for example: 1) generating a wider bandwidth speech signal from a narrow bandwidth speech signal; 2) separating an input signal into several output signals based on certain criteria, e.g., separation of speech from different speakers where a signal contains a combination of the speakers' speech signals; 3) and processing (for example by scaling) different “portions” of an input signal separately and/or differently, where a “portion” may be a portion of the input signal in time (e.g., in speaker phone systems) or may include particular frequency bands (e.g., in audio systems that boost the base), or both.
The decomposition of the input noisy speech-containing signal can also be performed using Fourier transform techniques or wavelet transform techniques.
FIG. 2
shows the use of discrete Fourier transform techniques (shown as the Windowing & FFT block
202
). Here a block of input samples is transformed to the frequency domain. The magnitude of the complex frequency domain elements are attenuated at the attenuation unit
208
based on the spectral subtraction principles described above. The phase of the complex frequency domain elements are left unchanged. The complex frequency domain elements are then transformed back to the time domain via an inverse discrete Fourier transform in the IFFT block
204
, producing the output signal
206
. Instead of Fourier transform techniques, wavelet transform techniques may be used to decompose the input signal.
A voice activity detector may be used with noise suppression systems. Such a voice activity detector is presented in, for example, U.S. Pat. No. 4,351,983 to Crouse et al. In such detectors, the power of the input signal is compared to a variable threshold level. Whenever the threshold is exceeded, the system assumes speech is present. Otherwise, the signal is assumed to contain only background noise.
For most implementations of speech enhancement, it is desirable to minimize processing delay. As such, the use of Fourier or wavelet transform techniques for spectral decomposition is undesirable because these techniques introduce large delays when accumulating a block of samples for processing.
Low computational complexity is also desirable as the network noise suppression system may process multiple independent voice channels simultaneously. Furthermore, limiting the types of computations to addition, subtraction and multiplication is preferred to facilitate a direct digital hardware implementation as well as to minimize processing in a fixed-point digital signal processor-based implementation. Division is computationally intensive in digital signal processors and is also cumbersome for direct digital hardware implementation. Finally, the memory storage requirements for each channel should be minimized due to the need to process multiple independent voice channels simultaneously.
Speech enhancement techniques must also address information tones such as DTMF (dual-tone multi-frequency) tones. DTMF tones are typically generated by push-button/tone-dial telephones when any of the buttons are pressed. The extended touch-tone telephone keypad has 16 keys: (1,2,3,4,5,6,7,8,9,0,*,#,A,B,C,D). The keys are arranged in a four by four array. Pressing one of the keys causes an electronic circuit to generate two tones. As shown in Table
1
, there is a low frequency tone for each row and a high frequency tone for each column. Thus, the row frequencies are referred to as the Low Group and the column frequencies, the High Group. In this way, sixteen unique combinations of tones can be generated using only eight unique tones. Table 1 shows the keys and the corresponding nominal frequencies. (Although discussed with re

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method and apparatus for adaptively suppressing noise does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method and apparatus for adaptively suppressing noise, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for adaptively suppressing noise will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3073807

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.