Reexamination Certificate
1999-02-18
2002-03-26
Dorvil, Richemond (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S233000, C704S205000
active
06363345
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to noise cancellation and reduction and, more specifically, to noise cancellation and reduction using spectral subtraction.
BACKGROUND OF THE INVENTION
Ambient noise added to speech degrades the performance of speech processing algorithms such as dictation, voice activation, and voice compression. In such systems, it is desirable to reduce the noise and improve the signal-to-noise ratio (S/N ratio) without affecting the speech and its characteristics.
Near-field noise-canceling microphones provide a satisfactory solution but require that the microphone be in close proximity to the voice source (e.g., the mouth). In many cases, this is achieved by mounting the microphone at the end of a headset boom that places it near the wearer's mouth. However, headsets have proven to be either uncomfortable to wear or too restrictive for use in, for example, an automobile.
Microphone array technology in general, and adaptive beamforming arrays in particular, handle strong directional noise very efficiently. These systems map the noise field and steer nulls toward the noise sources; the number of nulls is limited by the number of microphone elements and the available processing power. Such arrays also allow hands-free operation without a headset.
However, when the noise sources are diffuse, the performance of the adaptive system degrades to that of a regular delay-and-sum microphone array, which is not always satisfactory. This is the case in highly reverberant environments, where the noise is strongly reflected from the walls of a room and reaches the array from effectively every direction. It is also the case in a car for some of the noise radiated from the chassis.
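To make the comparison concrete, here is a minimal delay-and-sum sketch in Python with NumPy. It assumes integer-sample steering delays and, in the check below, noise that is independent at each microphone, which is the best case for a delay-and-sum array; the function name `delay_and_sum` is illustrative. Averaging M aligned channels leaves the desired signal unchanged and reduces independent noise power by about a factor of M, but no nulls are formed toward any noise source.

```python
import numpy as np

def delay_and_sum(channels: np.ndarray, delays: list[int]) -> np.ndarray:
    """Steer an array by advancing each channel by its delay and averaging.

    channels: shape (num_mics, num_samples)
    delays:   assumed integer arrival delay of the desired source at each mic
    Returns the beamformed signal, trimmed to the span all channels cover.
    """
    _, num_samples = channels.shape
    length = num_samples - max(delays)
    # Advance each channel so the desired source lines up across microphones
    aligned = np.stack([channels[m, d:d + length] for m, d in enumerate(delays)])
    # Coherent source adds in phase; independent noise partly cancels
    return aligned.mean(axis=0)
```

In a reverberant room, diffuse noise is partly correlated between closely spaced microphones, particularly at low frequencies, so the real gain is typically smaller than the M-fold figure this idealized setup gives.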
OBJECTS AND SUMMARY OF THE INVENTION
The spectral subtraction technique offers a way to reduce the noise further by estimating the noise magnitude spectrum of the corrupted signal. It measures the magnitude spectral level of the noise during non-speech intervals detected by a voice switch and then subtracts this noise magnitude spectrum from the signal. The method, described in detail in S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, April 1979, achieves good results for stationary diffuse noise that is uncorrelated with the speech signal. However, uncontrolled spectral subtraction creates artifacts, sometimes described as musical noise, that may reduce the performance of downstream speech algorithms such as vocoders or voice activation. In addition, the method erroneously assumes that the voice switch accurately detects the presence of speech and locates the non-speech intervals. This assumption is reasonable for off-line systems but difficult to satisfy in real-time systems.
More particularly, the noise magnitude spectrum is estimated by performing a 256-point FFT on the non-speech intervals and computing the energy of each frequency bin. Before the FFT, the time-domain signal is multiplied by a shading window (Hanning or other), with 50% overlap between successive frames. The energy of each frequency bin is then averaged over neighboring FFT frames. The number of frames is not fixed; it depends on the stability of the noise. For stationary noise, averaging many frames is preferred because it yields a better noise estimate; for non-stationary noise, long averaging may be harmful. The difficulty is that there is no way to know a priori whether the noise is stationary or non-stationary.
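The estimation step can be sketched as follows, assuming the non-speech samples are already known (that is, assuming a perfect voice switch, which is exactly the assumption questioned in this section). The helper names and the `n_avg` parameter, which selects how many of the most recent frames are averaged, are hypothetical.

```python
import numpy as np

def windowed_frames(x, n_fft=256, hop=128):
    """Split x into 50%-overlapping frames, each multiplied by a Hanning window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])

def estimate_noise_energy(noise, n_fft=256, n_avg=None):
    """Per-bin noise energy from non-speech samples (hypothetical helper).

    n_avg: number of most recent frames to average; None averages all frames.
    """
    spectra = np.fft.rfft(windowed_frames(noise, n_fft, n_fft // 2), axis=1)
    energy = np.abs(spectra) ** 2        # energy of each frequency bin, per frame
    if n_avg:
        energy = energy[-n_avg:]         # keep only the most recent frames
    return energy.mean(axis=0)           # average across time frames
```

A large `n_avg` suits stationary noise; a small one tracks non-stationary noise but gives a noisier estimate, which is the trade-off described above.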
Once the noise magnitude spectrum has been estimated, the input signal is multiplied by a shading window (Hanning or other), an FFT is performed (256 points or other) with 50% overlap, and the magnitude of each bin is averaged over 2-3 FFT frames. The noise magnitude spectrum is then subtracted from the signal magnitude. If the result is negative, it is replaced by zero (half-wave rectification). It is further recommended to reduce the residual noise present during non-speech intervals by replacing low values with a minimum value (or zero) or by attenuating the residual noise by 30 dB. The result is the noise-free magnitude spectrum.
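The subtraction step can be sketched as below, assuming magnitude spectra laid out as a `(frames, bins)` array. The residual-noise handling is one possible reading of the recommendation above: after half-wave rectification, any bin below the input magnitude attenuated by `|floor_db|` dB (30 dB by default) is raised to that floor rather than left at zero. Function and parameter names are illustrative.

```python
import numpy as np

def smooth_magnitude(mag_frames, n=3):
    """Causal average of each bin over the current and up to n-1 previous frames."""
    out = np.empty_like(mag_frames, dtype=float)
    for t in range(len(mag_frames)):
        out[t] = mag_frames[max(0, t - n + 1):t + 1].mean(axis=0)
    return out

def spectral_subtract(sig_mag, noise_mag, floor_db=-30.0):
    """Subtract the noise magnitude and apply half-wave rectification.

    If floor_db is not None, each bin is kept at least at the input magnitude
    attenuated by |floor_db| dB (assumed residual-noise treatment).
    """
    clean = np.maximum(sig_mag - noise_mag, 0.0)  # half-wave rectification
    if floor_db is not None:
        floor = 10 ** (floor_db / 20) * sig_mag   # -30 dB is a gain of about 0.0316
        clean = np.maximum(clean, floor)
    return clean
```

Leaving a small attenuated floor instead of hard zeros is a common way to make isolated residual peaks (musical noise) less conspicuous.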
The complex spectrum is reconstructed by combining the noise-free magnitude of each bin with the phase of the corresponding bin of the signal's FFT. An IFFT is then performed on the complex data to obtain the noise-free time-domain signal. The time-domain results are overlapped and summed with the previous frame's results to compensate for the overlap used in the FFT analysis.
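Reconstruction and overlap-add can be sketched as follows. The sketch uses a periodic Hann window (an assumption; the text says only "Hanning or other") because, with a hop of half the frame length, its shifted copies sum exactly to one, so overlap-add restores the signal when nothing is subtracted. The phase is taken from the noisy FFT, as described above; `analyze` and `synthesize` are illustrative names.

```python
import numpy as np

def periodic_hann(n):
    """Periodic Hann window; with a hop of n/2, shifted copies sum to exactly 1."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def analyze(x, n_fft=256):
    """Windowed FFT frames with 50% overlap, shape (frames, n_fft // 2 + 1)."""
    hop = n_fft // 2
    w = periodic_hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    return np.stack([np.fft.rfft(x[i * hop:i * hop + n_fft] * w) for i in range(n_frames)])

def synthesize(clean_mag, noisy_spec, n_fft=256):
    """Attach the noisy phase to the clean magnitude, IFFT, and overlap-add."""
    hop = n_fft // 2
    phase = np.angle(noisy_spec)                  # phase of each bin of the noisy FFT
    spec = clean_mag * np.exp(1j * phase)         # noise-free magnitude + original phase
    frames = np.fft.irfft(spec, n=n_fft, axis=1)  # back to the time domain
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame     # overlap and add with previous frame
    return out
```

The first and last half-frames are covered by only one window, so they are not fully restored; a streaming implementation would keep the tail of each frame and add it to the next.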
The system described has several problems. First, it assumes prior knowledge of the speech and non-speech intervals, and a voice switch is not a practical way to detect them. In principle, a voice switch detects the presence of speech by measuring the energy level and comparing it with a threshold. If the threshold is too high, some speech intervals may be classified as non-speech, and the system will treat voice information as noise; the result is voice distortion, especially at poor signal-to-noise ratios. If, on the other hand, the threshold is too low, the detected non-speech intervals may be too short, especially at poor signal-to-noise ratios and when the speech is continuous with few pauses.
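The energy-threshold voice switch described above can be sketched in a few lines. The function name, frame length, and any thresholds used with it are assumptions for illustration. With a quiet noise segment followed by a louder voiced segment, a threshold between the two energies separates them, a threshold above both labels every speech frame as noise, and a threshold below both leaves no non-speech frames from which to estimate the noise.

```python
import numpy as np

def voice_switch(x, frame_len, threshold):
    """One flag per frame: True (speech) if mean-square energy exceeds threshold."""
    n_frames = len(x) // frame_len
    energy = np.array([np.mean(x[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n_frames)])
    return energy > threshold
```

Because a single broadband threshold must serve every frame, there is no setting that is safe both at poor signal-to-noise ratios and during continuous speech.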
Another problem is that computing the magnitude of the FFT result is costly: it involves squaring and square-root operations, which are expensive in terms of computational load. Yet another problem is associating the phase information with the noise-free magnitude spectrum to obtain the input to the IFFT. This requires calculating the phase, storing it, and applying it to the magnitude data, all of which are expensive in computation and memory. A further problem is the estimation of the noise spectral magnitude. The FFT is a poor and unstable estimator of energy, and averaging over time frames improves stability only slightly. Shortening the FFT widens the bandwidth of each bin and improves stability, but reduces the performance of the system. Moreover, averaging over time smears the data and therefore cannot be extended beyond a few frames. Consequently, the proposed noise estimation process is not sufficiently stable.
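The instability of the FFT energy estimate is easy to demonstrate. For white Gaussian noise the true spectrum is flat, so any spread of the estimate across bins is pure estimation error. In the sketch below (function names and the 1024-point frame are assumptions, chosen to give enough bins for a stable statistic), a single frame yields a per-bin spread comparable to the mean itself, and averaging K frames shrinks it only roughly as 1/sqrt(K).

```python
import numpy as np

def per_bin_energy(x, n_fft=1024):
    """Hanning-windowed, 50%-overlap FFT energy, shape (frames, bins)."""
    hop = n_fft // 2
    w = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    segs = np.stack([x[i * hop:i * hop + n_fft] * w for i in range(n_frames)])
    return np.abs(np.fft.rfft(segs, axis=1)) ** 2

def relative_spread(energy_frames, n_avg):
    """Std/mean across interior bins after averaging the first n_avg frames."""
    est = energy_frames[:n_avg].mean(axis=0)[1:-1]  # drop DC and Nyquist bins
    return est.std() / est.mean()
```

The bandwidth trade-off follows from the bin spacing fs/N: at an assumed 8 kHz sampling rate, a 256-point FFT gives 31.25 Hz bins and a 128-point FFT gives 62.5 Hz bins, so the same stretch of signal yields twice as many frames to average, at the cost of coarser frequency resolution.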
It is therefore an object of this invention to provide a spectral subtraction system with a simple yet efficient mechanism for estimating the noise magnitude spectrum, even at poor signal-to-noise ratios and during continuous fast speech.
It is another object of this invention to provide an efficient mechanism that performs the magnitude estimation at low cost and overcomes the problem of phase association.
It is yet another object of this invention to provide a stable mechanism to estimate the noise spectral magnitude without the smearing of the data.
In accordance with the foregoing objectives, the present invention provides a system that correctly determines the non-speech segments of the audio signal, thereby preventing erroneous processing of the noise-canceling signal during speech segments. In the preferred embodiment, the invention obviates the need for a voice switch by determining the non-speech segments with a separate threshold detector for each frequency bin. The threshold detector locates the noise elements, even within continuous speech segments, by determining whether the frequency spectrum elements, or bins, of the input signal fall within a threshold set according to a minimum value of those elements.
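The per-bin detection idea can be illustrated with a minimal sketch. The text states only that each bin is compared with a threshold set according to a minimum value of the spectrum elements; the specific rule below (a bin is noise if its magnitude is within `factor` times its own minimum over the last `window` frames), the recursive noise update, and all names and parameter values are assumptions for illustration, not the patented detector.

```python
import numpy as np

def classify_noise_bins(mag_frames, window=32, factor=2.0):
    """Label each (frame, bin) as noise (True) or speech (False).

    Hypothetical rule: a bin counts as noise when its magnitude is at most
    `factor` times the minimum that bin reached over the last `window` frames.
    """
    is_noise = np.empty(mag_frames.shape, dtype=bool)
    for t in range(mag_frames.shape[0]):
        recent_min = mag_frames[max(0, t - window + 1):t + 1].min(axis=0)
        is_noise[t] = mag_frames[t] <= factor * recent_min
    return is_noise

def update_noise_estimate(noise_est, mag, is_noise, alpha=0.9):
    """Recursive averaging applied only to bins currently classified as noise."""
    updated = alpha * noise_est + (1 - alpha) * mag
    return np.where(is_noise, updated, noise_est)  # speech bins keep old estimate
```

Because the decision is made per bin, a bin without speech energy can keep updating its noise estimate while neighboring bins carry speech. The window length is a trade-off: if a bin carries speech for longer than `window` frames, its tracked minimum rises and the speech may be misclassified as noise.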
Berdugo Baruch
Marash Joseph
Andrea Electronics Corporation
Frommer Lawrence & Haug
Kowalski Thomas J.
System, method and apparatus for cancelling noise