Noise suppression and channel equalization preprocessor for...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Class: C704S228000
Status: active
Patent No.: 06266633

FIELD OF THE INVENTION
This invention relates to speech recognition generally, and more particularly to a signal pre-processor for enhancing the quality of a speech signal before further processing by a speech or speaker recognition device.
BACKGROUND OF THE INVENTION
Speech and speaker recognition devices must often operate on speech signals corrupted by noise and channel distortions. This is the case, for example, when using “far-field” microphones placed on a desktop near computers or other office equipment. Noise, such as that originating from disk drives or cooling fans, can be transmitted both mechanically, through direct contact of the microphone with the computer equipment or through the furniture it rests on, and acoustically, through the air. Noise can also be picked up through electrical or magnetic coupling, as in the case of power line “hum”.
The “channel” through which speech is measured includes the processes of acoustic propagation from the speaker's mouth, transduction by the microphone, analog signal processing, and analog-to-digital conversion. The distortion introduced by this composite channel may be modeled as a linear process and characterized by its frequency response. Factors affecting the channel frequency response include microphone type, distance and off-axis angle of the speaker relative to the microphone, room acoustics, and the characteristics of the analog electronic circuits and anti-aliasing filter.
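In the usual textbook form of this model, stated here as an assumption (the text specifies only that the channel is linear), the channel is a filter with impulse response h, and noise n is treated as adding to its output. For analysis frame t, with clean speech s and measured signal x:

```latex
\begin{aligned}
x(t) &= (h \ast s)(t) + n(t) \\
X_t(f) &\approx H(f)\,S_t(f) + N_t(f) \\
\overline{|X_t(f)|^2} &\approx |H(f)|^2\,\overline{|S_t(f)|^2} \quad \text{(noise-free case)}
\end{aligned}
```

The overbar denotes an average over frames, and the frame-wise product holds only approximately, for analysis windows long compared with the channel impulse response. Noise suppression targets the additive term N_t(f), while channel equalization targets the multiplicative term H(f); the last line shows why a long-term average exposes |H(f)|^2 once the long-term speech spectrum is known.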
Speech and speaker recognition systems operate by comparing the input speech with acoustic models derived from prior “training” speech material. Accuracy is lost when the input speech is corrupted by noise or by a channel frequency response that differs significantly from those affecting the training speech. The present invention addresses this problem by suppressing noise and equalizing channel distortions in an input speech signal.
Certain methods for noise suppression are well known. One such method is spectral subtraction (SS). SS requires an estimate of the noise magnitude spectrum, which is assumed to be stationary over time. This estimate is subtracted from the measured magnitude spectrum of the noisy speech input at each time interval, or “frame”, to obtain an estimate of the magnitude spectrum of the speech in the absence of noise. Further details regarding this method may be obtained from the publication by S. F. Boll entitled “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, pp. 113-120, IEEE, New York, N.Y., 1979, incorporated herein by reference.
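The subtraction step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes magnitude spectra have already been computed for each frame (for example by an FFT), that some noise-only frames are available for the noise estimate, and it clamps negative differences to a spectral floor, a common practical choice the text does not specify. The function names and the `floor` parameter are hypothetical.

```python
from typing import List, Sequence


def estimate_noise_spectrum(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the magnitude spectra of noise-only frames, bin by bin.

    Relies on the stationarity assumption: the average is taken as the
    noise magnitude spectrum for every later frame.
    """
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(frame[k] for frame in noise_frames) / n_frames for k in range(n_bins)]


def spectral_subtract(
    frame_mag: Sequence[float], noise_mag: Sequence[float], floor: float = 0.0
) -> List[float]:
    """Subtract the noise magnitude estimate from one frame's magnitude spectrum.

    Results below floor * noise are clamped to that value; with floor=0.0
    this is half-wave rectification (negative magnitudes become zero).
    """
    return [max(x - n, floor * n) for x, n in zip(frame_mag, noise_mag)]


# Hypothetical 4-bin example: a stationary "hum" line sits in bin 1.
noise = estimate_noise_spectrum([[0.1, 2.0, 0.1, 0.1], [0.1, 2.2, 0.1, 0.1]])
noisy_frame = [1.1, 2.6, 0.5, 0.1]
print(spectral_subtract(noisy_frame, noise))
```

Only magnitudes are modified here; in Boll's formulation the enhanced magnitude is typically recombined with the noisy phase before resynthesis.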
Methods for channel equalization are also known. One such method, known as blind deconvolution (BD), estimates the spectrum of the input signal over its whole duration and applies a linear filter designed to make that spectrum equal to the long-term spectrum of speech. This method effectively compensates for the channel when the input speech material is long enough that its spectrum approximates the long-term spectrum of speech. Further details regarding blind deconvolution may be obtained from the publication by T. G. Stockham, T. M. Cannon, and R. B. Ingebretsen, entitled “Blind deconvolution through digital signal processing,” Proceedings of the IEEE, vol. 63, no. 4, pp. 678-692, 1975, incorporated herein by reference.
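The equalization idea described above can be sketched under simplifying assumptions: the input is already split into frames with magnitude spectra, the long-term speech power spectrum (`reference_power`) is known in advance, and the equalizer is a per-bin gain rather than a designed time-domain filter. It follows the description in the text (matching long-term power spectra) and is not a reproduction of the method in the cited publication; all names are hypothetical.

```python
import math
from typing import List, Sequence


def long_term_power(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average power (squared magnitude) per frequency bin over all frames."""
    n_frames = len(frames)
    return [sum(f[k] ** 2 for f in frames) / n_frames for k in range(len(frames[0]))]


def blind_equalize(
    frames: Sequence[Sequence[float]],
    reference_power: Sequence[float],
    eps: float = 1e-12,
) -> List[List[float]]:
    """Scale every bin so the input's long-term power matches reference_power.

    A fixed linear channel multiplies every frame by the same |H(f)|, so it
    scales the long-term power by |H(f)|^2 and is cancelled by these gains.
    """
    measured = long_term_power(frames)
    gains = [math.sqrt(r / (m + eps)) for r, m in zip(reference_power, measured)]
    return [[g * x for g, x in zip(gains, frame)] for frame in frames]


# Hypothetical example: known speech frames observed through an unknown channel.
speech = [[1.0, 2.0, 0.5], [3.0, 1.0, 1.5]]
reference = long_term_power(speech)  # stand-in for a long-term speech spectrum
channel = [0.5, 2.0, 0.1]            # |H(f)| per bin, unknown to the equalizer
observed = [[h * x for h, x in zip(channel, f)] for f in speech]
print(blind_equalize(observed, reference))  # approximately equal to `speech`
```

The example works exactly only because the reference is the input's own long-term spectrum; with a generic long-term speech spectrum, as in the text, the result is accurate only when the input is long enough to approximate that spectrum.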
In addition, a publication by D. Hardt and K. Fellbaum, entitled “Spectral Subtraction and RASTA Filtering in Text-Dependent HMM-Based Speaker Verification,” IEEE Doc. No. 0-8186-7919-0/97, ICASSP 97, Munich, Germany, April 1997, incorporated herein by reference, compares speaker verification performance using “internal” versus “external” spectral subtraction. Internal SS, integrated with an existing verifier front end, was found to be inferior to external SS, implemented as an independent processing step before input to the verifier. With external SS, verification accuracy was found to improve with increasing spectral analysis window size, up to 128 milliseconds. These findings were confirmed in a set of experiments involving the SpeakerKey voice verifier system, described in commonly assigned copending patent application Ser. No. 08/960,509, entitled “VOICE AUTHENTICATION SYSTEM,” filed Oct. 29, 1997 by Blais et al. and incorporated herein by reference, and a specially collected database recorded with far-field microphones. In our experiments, the improvement with increasing window size was found to be related to the nature of the noise. The loudest noise components in the data are stationary, narrow-bandwidth spectral lines, for which estimation accuracy increases with window length. High spectral resolution is therefore needed to reject this type of noise, and analysis windows 128 ms long are sufficient to provide it.
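The link between window length and spectral resolution can be made concrete with a little arithmetic. For an N-point DFT over an N-sample window (no zero padding), the bin spacing is the sample rate divided by N, which equals the reciprocal of the window duration; window shape widens the effective resolution somewhat, so this is a rough guide. Assuming an 8 kHz sampling rate (not stated in the text), a 20 ms window gives 160 samples and 50 Hz bins, while a 128 ms window gives 1024 samples, a power of two convenient for a radix-2 FFT, and 7.8125 Hz bins. The helper name is hypothetical.

```python
from typing import Tuple


def analysis_grid(window_ms: float, sample_rate_hz: float) -> Tuple[int, float]:
    """Return (samples per window, DFT bin spacing in Hz) for a window length."""
    n_samples = round(sample_rate_hz * window_ms / 1000.0)
    bin_spacing_hz = sample_rate_hz / n_samples  # equals 1 / window duration
    return n_samples, bin_spacing_hz


SAMPLE_RATE_HZ = 8000.0  # assumption; the text does not give a sampling rate

for window_ms in (20.0, 128.0):
    n, spacing = analysis_grid(window_ms, SAMPLE_RATE_HZ)
    print(f"{window_ms:5.0f} ms -> {n:4d} samples, {spacing:.4f} Hz per bin")
```

A stationary line narrower than one bin shares that bin with whatever speech energy falls inside it, so narrower bins let the subtraction remove the line while discarding less of the surrounding speech.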
In another publication, by C. Avendano and H. Hermansky, entitled “On the Effects of Short-Term Spectrum Smoothing in Channel Normalization,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 4, p. 372, July 1997, an improvement to the performance of blind deconvolution was reported in the context of a speech recognition system. The system used measurements of the power spectrum in critical bands, each derived by integrating the fast Fourier transform (FFT) power spectrum over the frequencies within the critical band. BD was reported to perform better when applied before critical-band integration (i.e., to the FFT power spectrum) than after it (i.e., to the critical-band measurements). The disparity in performance was greatest for channels whose magnitude response varies within the frequency limits of the individual critical-band filters. In the present invention, it was found that increasing the analysis window size from 20 ms (typical of speech and speaker recognition systems) to 128 ms led to additional performance improvements. The reason is similar to that given above for narrow-bandwidth noise. Reverberant environments are known to introduce sharp spectral nulls, as narrow as 10 Hz, into the frequency response of acoustic transmission from the talker to the microphone; these nulls are caused by interference between the direct and reflected signal paths. Such nulls cannot be adequately compensated if BD is applied to critical bands, whose bandwidths greatly exceed 10 Hz. When BD is applied before critical-band integration, spectral nulls in the channel can be resolved, provided the analysis windows are long enough: windows at least 100 ms long are required to provide the needed 10 Hz frequency resolution.
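The argument about spectral nulls can be illustrated with a toy example. Suppose one hypothetical critical band spans eight fine-resolution bins, the long-term speech power is flat across them, and the channel has a deep null confined to a single bin. Equalizing bin by bin (BD before critical-band integration) restores the flat spectrum, whereas a single gain per band (BD after integration) matches the band's total power but leaves the null in place. The numbers, band edges and helper names are illustrative assumptions, not data from the cited study. The last helper inverts the resolution requirement: a resolution of Δf needs a window of roughly 1/Δf seconds, i.e. 100 ms for 10 Hz.

```python
from typing import List, Sequence, Tuple


def per_bin_gains(measured: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Power gains that make each fine-resolution bin match the reference."""
    return [r / m for m, r in zip(measured, reference)]


def per_band_gains(
    measured: Sequence[float],
    reference: Sequence[float],
    bands: Sequence[Tuple[int, int]],
) -> List[float]:
    """One power gain per band (bins start..stop-1), shared by every bin in it."""
    gains = [1.0] * len(measured)
    for start, stop in bands:
        g = sum(reference[start:stop]) / sum(measured[start:stop])
        for k in range(start, stop):
            gains[k] = g
    return gains


def min_window_ms(resolution_hz: float) -> float:
    """Window length (ms) whose reciprocal equals the desired resolution."""
    return 1000.0 / resolution_hz


reference = [1.0] * 8          # flat long-term speech power across the toy band
channel = [1.0] * 8
channel[3] = 0.01              # narrow null in the channel power response (bin 3)
measured = [c * r for c, r in zip(channel, reference)]
band = [(0, 8)]                # the whole toy band as one "critical band"

fine = [g * m for g, m in zip(per_bin_gains(measured, reference), measured)]
coarse = [g * m for g, m in zip(per_band_gains(measured, reference, band), measured)]
print("per-bin: ", [round(x, 3) for x in fine])
print("per-band:", [round(x, 3) for x in coarse])
print("window for 10 Hz resolution:", min_window_ms(10.0), "ms")
```

Per-band equalization slightly over-amplifies the undamaged bins to make up for the missing energy in bin 3, which is one way a coarse equalizer can distort the spectrum it is meant to correct.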
However, none of the prior art combines noise suppression with channel equalization, including channel frequency-response normalization and signal-level normalization, in a signal preprocessor apparatus that accepts as input a noisy speech signal, such as one received from a microphone, and produces an enhanced output speech signal for subsequent processing.
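A preprocessor of the kind described in the preceding paragraph can be sketched as a chain of three stages operating on per-frame magnitude spectra: noise suppression by spectral subtraction, channel frequency-response normalization, and signal-level normalization. The stage order, the shape-only equalizer (which rescales the reference so overall long-term power is unchanged and leaves absolute level to the last stage), and all names and parameters such as `target_rms` are assumptions made for illustration; this is not the claimed apparatus.

```python
import math
from typing import List, Sequence

Frames = List[List[float]]


def suppress_noise(frames: Sequence[Sequence[float]], noise: Sequence[float]) -> Frames:
    """Magnitude spectral subtraction with half-wave rectification."""
    return [[max(x - n, 0.0) for x, n in zip(f, noise)] for f in frames]


def normalize_response(
    frames: Sequence[Sequence[float]], reference_power: Sequence[float], eps: float = 1e-12
) -> Frames:
    """Match the long-term spectral shape to reference_power, keeping total power."""
    n_frames = len(frames)
    measured = [sum(f[k] ** 2 for f in frames) / n_frames for k in range(len(frames[0]))]
    # Rescale the reference so only the spectral shape changes, not the overall level.
    scale = sum(measured) / sum(reference_power)
    gains = [math.sqrt(scale * r / (m + eps)) for r, m in zip(reference_power, measured)]
    return [[g * x for g, x in zip(gains, f)] for f in frames]


def normalize_level(frames: Sequence[Sequence[float]], target_rms: float) -> Frames:
    """Scale all frames by one factor so the RMS over every bin equals target_rms."""
    values = [x for f in frames for x in f]
    rms = math.sqrt(sum(x * x for x in values) / len(values))
    factor = target_rms / rms if rms > 0.0 else 0.0
    return [[factor * x for x in f] for f in frames]


def preprocess(
    frames: Sequence[Sequence[float]],
    noise: Sequence[float],
    reference_power: Sequence[float],
    target_rms: float = 1.0,
) -> Frames:
    """Noise suppression, then channel response normalization, then level normalization."""
    cleaned = suppress_noise(frames, noise)
    shaped = normalize_response(cleaned, reference_power)
    return normalize_level(shaped, target_rms)


# Hypothetical two-frame, two-bin input with a flat reference spectrum.
out = preprocess([[3.0, 1.0], [5.0, 3.0]], noise=[1.0, 1.0], reference_power=[1.0, 1.0])
print(out)
```

Suppressing noise before equalizing keeps the additive noise estimate from being reshaped by the channel gains; the opposite order would require transforming the noise estimate through the equalizer as well.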


REFERENCES:
Stockham, Jr., Thomas G., Cannon, Thomas M., and Ingebretsen, Robert B., “Blind Deconvolution through Digital Signal Processing,” Proceedings of the IEEE, vol. 63, no. 4, Apr. 1975, pp. 678-692.
Boll, Steven F., “Suppression of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, no. 2, Apr. 1979, pp. 113-120.
Avendano, Carlos, and Hermansky, Hynek, “On the Effects of Short-Term Spectrum Smoothing in Channel Normalization,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 4, Jul. 1997, pp. 372-374.
Hermansky, Hynek, et al., “RASTA Processing of Speech,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, Oct. 1994, pp. 578-589.
de Veth, Johan, et al., “Comparison of Channel Normalisation Techniques for Automatic Speech Recognition over the Phone,” Proc. Intl. Conf. on Spoken Language, ICSLP 96, vol.
