Noise spectrum tracking for speech enhancement
Reexamination Certificate
1999-12-15
2001-09-11
Tsang, Fan (Department: 2645)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S226000, C704S268000, C704S227000, C704S217000, C381S094400, C381S094300
active
06289309
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates generally to noise suppression apparatus and methods and in particular to a noise suppression system and method which detects and removes background noise from a voice signal using spectral subtraction.
Voice control of devices and appliances is becoming more and more prevalent. One example of this technology is in “hands free” control of mobile telephones. This application is especially important as it allows a driver to make calls or answer a telephone while keeping both hands on the steering wheel and both eyes on the traffic. Although the invention is described below in terms of a noise reduction method and system for a hands-free mobile telephone, it is contemplated that it may be practiced with any system which would benefit from noise reduction in a voice signal.
Voice control of mobile telephones, however, is complicated by the ambient noise in the automobile. Engine noise, windshield wiper noise, construction noise and the noise of passing cars and trucks can interfere with the voice recognition process, making the mobile telephone difficult to control using just vocal signals. Existing voice control systems typically employ some form of speech enhancement to reduce the level of noise in the signal and then apply the noise-reduced signal to a voice recognition system. Because the vocabulary recognized by a typical voice control system is relatively small, a speaker-independent voice recognition system may be used. Such a speaker-independent system is disclosed, for example, in U.S. Pat. No. 5,799,276 entitled KNOWLEDGE-BASED SPEECH RECOGNITION SYSTEM AND METHODS HAVING FRAME LENGTH COMPUTED BASED ON ESTIMATED PITCH PERIOD OF VOCALIC INTERVALS. Alternatively, it is contemplated that other speech recognition systems, such as a conventional dynamic time warping system, may be used.
In addition to reducing noise in user commands that control the mobile telephone, the speech enhancement system may also be used to reduce noise in the voice signal that is delivered through the telephone and, thus, enhance the speech signal that is received by the person being called.
Low-complexity spectrum-based speech enhancement systems are generally based on the spectral subtraction principle: the noise power spectrum, which has been estimated (and averaged) during noise-only periods, is subtracted from the “speech-plus-noise” spectrum in order to estimate the power spectrum of the clean speech signal. The enhanced speech waveform makes use of the unaltered noisy phase. Formally, the enhanced speech spectrum can be expressed as $\hat{S}_k(f) = G_k(f)\,X_k(f)$, where $X_k(f)$ is the (discrete) Fourier transform (DFT) of the noisy speech signal $x(n)$ at frame index $k$, $\hat{S}_k(f)$ is the estimated clean speech spectrum, and $G_k(f)$ is the gain factor. In the case of (power) spectral subtraction, the gain factor, $G$, is a vector given by

$$G = \sqrt{1 - \frac{\hat{P}_n}{P_x}},$$

where $\hat{P}_n$ and $P_x$ are the estimated noise power spectrum and the speech-plus-noise power spectrum, respectively.
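A minimal sketch of power spectral subtraction for one frame, in plain Python over a list of complex DFT bins, using the gain $G = \sqrt{1 - \hat{P}_n / P_x}$ per bin. It assumes the noise estimate `noise_psd` is on the same per-bin scale as $|X_k(f)|^2$. Clamping negative values of $1 - \hat{P}_n / P_x$ to a floor, and the function names, are illustrative assumptions rather than details from the text. Because the gain is real and non-negative, each bin keeps its noisy phase.

```python
import math


def spectral_subtraction_gain(noise_psd, noisy_psd, floor=0.0):
    """Per-bin power spectral subtraction gain G = sqrt(1 - P_n / P_x).

    Bins where the noise estimate exceeds the observed power would give a
    negative argument, so it is clamped to `floor` (assumed >= 0) before the
    square root. The clamp is a common practical choice, assumed here.
    """
    gains = []
    for p_n, p_x in zip(noise_psd, noisy_psd):
        ratio = 1.0 - p_n / p_x if p_x > 0 else 0.0
        gains.append(math.sqrt(max(ratio, floor)))
    return gains


def enhance_frame(noisy_dft, noise_psd, floor=0.0):
    """Return S_hat(f) = G(f) * X(f) for one frame of DFT bins X(f).

    G is real and non-negative, so each bin keeps the unaltered noisy phase.
    """
    noisy_psd = [abs(x) ** 2 for x in noisy_dft]
    gains = spectral_subtraction_gain(noise_psd, noisy_psd, floor)
    return [g * x for g, x in zip(gains, noisy_dft)]
```

For example, a bin with $|X|^2 = 25$ and $\hat{P}_n = 9$ gets $G = \sqrt{16/25} = 0.8$, while a bin whose noise estimate exceeds its observed power is zeroed by the clamp.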
Before a speech enhancement system reduces the noise in a noisy voice signal, therefore, it first identifies the noise and estimates its power spectrum. Next, the noisy voice signal is processed according to the determined gain factor to selectively reduce the amplitude of the speech-plus-noise signal for the frequency bands in which noise dominates.
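The noise estimation step can be sketched as a plain average of $|X_k(f)|^2$ over the frames labeled noise-only. How those frames are identified is outside this sketch: `is_noise_only` is a hypothetical per-frame flag (for example, from a voice activity detector), and `estimate_noise_psd` is an illustrative name.

```python
def estimate_noise_psd(frames_dft, is_noise_only):
    """Average the per-bin power |X_k(f)|^2 over frames flagged as noise-only.

    frames_dft: list of frames, each a list of complex DFT bins of equal length.
    is_noise_only: one boolean per frame (hypothetical detector output).
    """
    n_bins = len(frames_dft[0])
    total = [0.0] * n_bins
    count = 0
    for frame, noise_only in zip(frames_dft, is_noise_only):
        if noise_only:
            for i, x in enumerate(frame):
                total[i] += abs(x) ** 2
            count += 1
    if count == 0:
        raise ValueError("no noise-only frames to average")
    return [t / count for t in total]
```

The returned per-bin estimate plays the role of $\hat{P}_n$ when the gain is computed for each subsequent frame.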
Even though these systems are “low complexity,” they typically require a relatively large number of calculations and may not be appropriate for implementation in a mobile telephone environment. One method for reducing the complexity of the speech enhancement process is to assume that the noise component of the signal is stationary, that is to say, it does not change significantly over short time intervals. This assumption, however, is not appropriate for a hands-free mobile telephone controller, as engine noise and traffic noise are poorly modeled by a stationary noise source.
SUMMARY OF THE INVENTION
The present invention is embodied in a low complexity speech enhancement method and system that employs a noise tracking method and system that effectively tracks non-stationary noise. The noise tracking system employs a time-varying forgetting factor that reduces the contribution, over time, of data frames that exhibit rapid changes in signal power.
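The text does not specify how the forgetting factor varies, so the following is only a hypothetical illustration of the idea: a first-order recursive noise estimate $\hat{P}_n[k] = \lambda_k \hat{P}_n[k-1] + (1 - \lambda_k) P_x[k]$ whose factor $\lambda_k$ rises toward 1 when the frame power changes abruptly, so such frames contribute little. The log-power-ratio change measure, the initialization from the first frame, and the parameters `lam_min` and `sensitivity` are assumptions chosen for illustration.

```python
import math


def forgetting_factor(frame_power, prev_power, lam_min=0.9, sensitivity=1.0, eps=1e-12):
    """Hypothetical time-varying forgetting factor between lam_min and 1.

    The change measure is the absolute log ratio of consecutive frame powers.
    No change gives lam_min; a large change pushes the factor toward 1, so the
    current frame gets a weight (1 - lam) close to 0 in the noise update.
    """
    change = abs(math.log((frame_power + eps) / (prev_power + eps)))
    return lam_min + (1.0 - lam_min) * (1.0 - math.exp(-sensitivity * change))


def track_noise(frame_psds, lam_min=0.9, sensitivity=1.0):
    """Recursive noise PSD estimate P_n[k] = lam_k * P_n[k-1] + (1 - lam_k) * P_x[k].

    The estimate is initialised from the first frame, which is assumed to be
    noise only. Returns the estimate after every frame.
    """
    noise = list(frame_psds[0])
    prev_power = sum(frame_psds[0])
    history = [list(noise)]
    for psd in frame_psds[1:]:
        power = sum(psd)
        lam = forgetting_factor(power, prev_power, lam_min, sensitivity)
        noise = [lam * n + (1.0 - lam) * p for n, p in zip(noise, psd)]
        prev_power = power
        history.append(list(noise))
    return history
```

With steady frames the factor stays at `lam_min`. A sudden 100-fold jump in power pushes it to about 0.999, so the estimate moves only from 1 to about 1.1, whereas a fixed factor of 0.9 would move it to about 10.9 in one frame. If the new level persists, consecutive frames are steady again, the factor drops back to `lam_min`, and the estimate adapts.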
According to one aspect of the invention, the noise tracking system tracks Teager energy to more effectively distinguish noise from unvoiced consonant sounds.
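The discrete Teager energy operator is $\Psi[x(n)] = x^2(n) - x(n-1)\,x(n+1)$. For a sinusoid $A\cos(\Omega n + \phi)$ it equals $A^2 \sin^2 \Omega$ at every sample, so it weights high-frequency content more heavily than plain signal energy does, which is one reason it can help separate weak, high-frequency unvoiced consonants from noise. The sketch computes the operator and a per-frame mean; the frame averaging and the `sinusoid` test helper are illustrative choices, and the modified Teager measure of Ying et al. listed in the references may differ.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Defined for interior samples only, so the output is two samples shorter.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teager_energy(frame):
    """Mean Teager energy over one frame (the averaging is an illustrative choice)."""
    psi = teager_energy(frame)
    return sum(psi) / len(psi)


def sinusoid(omega, n_samples, amplitude=1.0, phase=0.0):
    """Test signal A * cos(omega * n + phase), n = 0 .. n_samples - 1."""
    return [amplitude * math.cos(omega * n + phase) for n in range(n_samples)]
```

Two unit-amplitude sinusoids at $\Omega = 0.1\pi$ and $\Omega = 0.8\pi$ have roughly the same mean-square energy, but their mean Teager energies are $\sin^2(0.1\pi) \approx 0.095$ and $\sin^2(0.8\pi) \approx 0.345$.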
According to another aspect of the invention, the noise tracking system identifies noise-only intervals in the input signal using a recognizer that classifies signals as containing only noise or containing a mixture of voice and noise. In the exemplary system, the transition between a noise-only signal and a voice-and-noise signal is made gradual to reduce the sensitivity of the noise tracking system to system parameters.
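One hypothetical way to make the noise/voice decision gradual is to replace a hard noise-only flag with a weight $w \in [0, 1]$ produced by a logistic function of the frame's a posteriori SNR, and to scale the noise update by $w$. The logistic mapping, the 3 dB threshold, the 1.5 dB width, and the function names are assumptions for illustration; the text does not say how the recognizer's output is smoothed.

```python
import math


def noise_only_weight(frame_snr_db, threshold_db=3.0, width_db=1.5):
    """Hypothetical soft decision in [0, 1]: near 1 for noise-only frames,
    near 0 for voice-plus-noise frames, with a logistic transition."""
    z = (frame_snr_db - threshold_db) / width_db
    z = max(min(z, 60.0), -60.0)  # avoid math.exp overflow for extreme SNRs
    return 1.0 / (1.0 + math.exp(z))


def soft_noise_update(noise_psd, frame_psd, lam=0.9,
                      threshold_db=3.0, width_db=1.5, eps=1e-12):
    """Noise update scaled by the noise-only weight w:
    P_n <- P_n + w * (1 - lam) * (P_x - P_n)."""
    # A posteriori SNR of the whole frame in dB: about 0 dB for a noise-only frame.
    snr_db = 10.0 * math.log10((sum(frame_psd) + eps) / (sum(noise_psd) + eps))
    w = noise_only_weight(snr_db, threshold_db, width_db)
    return [n + w * (1.0 - lam) * (p - n) for n, p in zip(noise_psd, frame_psd)]
```

Near the threshold, shifting `threshold_db` by a fraction of a decibel changes `w` only slightly instead of flipping a frame between "noise" and "speech", which is one way to read the reduced parameter sensitivity described above.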
According to yet another aspect of the invention, the speech enhancement system employs a simplified gain factor calculation system based on a posteriori audible signal-to-noise ratios.
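The text does not give the simplified gain formula itself. As a reference point only, the power spectral subtraction gain $G = \sqrt{1 - \hat{P}_n / P_x}$ can be rewritten in terms of the a posteriori SNR $\gamma_k(f) = P_x(f) / \hat{P}_n(f)$, which shows that the gain depends on the noisy and noise spectra only through their ratio. Any perceptual ("audible") weighting is not modeled here.

```latex
G_k(f) = \sqrt{1 - \frac{\hat{P}_n(f)}{P_x(f)}}
       = \sqrt{1 - \frac{1}{\gamma_k(f)}}
       = \sqrt{\frac{\gamma_k(f) - 1}{\gamma_k(f)}},
\qquad \gamma_k(f) = \frac{P_x(f)}{\hat{P}_n(f)} \ge 1
```

For example, $\gamma = 4$ (about 6 dB) gives $G = \sqrt{3/4} \approx 0.87$, while $\gamma = 1$ (a noise-only bin) gives $G = 0$.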
REFERENCES:
patent: 5757937 (1998-05-01), Itoh et al.
patent: 5781883 (1998-07-01), Wynn
patent: 5799276 (1998-08-01), Komissarchik et al.
patent: 5839101 (1998-11-01), Vahatalo et al.
patent: 5842162 (1998-11-01), Fineberg
patent: 5893059 (1999-04-01), Raman
patent: 5943429 (1999-08-01), Handel
patent: 5974373 (1999-10-01), Chan et al.
patent: 5991718 (1999-11-01), Malah
patent: WO 97/22116 (1997-06-01), None
patent: WO 97/28527 (1997-08-01), None
Gulzow, T., Engelsberg, A., and Heute, U., “Comparison of a Discrete Wavelet Transformation and a Nonuniform Polyphase Filterbank Applied to Spectral-Subtraction Speech Enhancement”, Signal Processing, vol. 64, No. 1, pp. 5-19, 1998.
Ying, G. S., Mitchell, C. D., and Jamieson, L. H., “Endpoint Detection of Isolated Utterances Based on a Modified Teager Energy Measurement”, Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 732-735, 1993.
Doblinger, G., “Computationally Efficient Speech Enhancement by Spectral Minima Tracking in Subbands”, Proc. Eurospeech '95, Madrid, Spain, 1995, pp. 1513-1516.
Ephraim, Y. and Malah, D., “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-32, No. 6, pp. 1109-1121, 1984.
McAulay, R. J. and Malpass, M. L., “Speech Enhancement Using a Soft-Decision Noise Suppression Filter”, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-28, No. 2, pp. 137-145, 1980.
Johnston, James D., “Estimation of Perceptual Entropy Using Noise Masking Criteria”, Proc. ICASSP '88, pp. 2524-2527, 1988.
PCT International Search Report, Application No. PCT/US99/29901, Jun. 6, 2000.
Burke William J.
Sarnoff Corporation