Method for reducing noise distortions in a speech...


Filed: 1998-10-22
Issued: 2001-01-09
Examiner: Zele, Krista (Department: 2748)
Classification: Data processing: speech signal processing, linguistics, language; Speech signal processing; Recognition
Class codes: C704S226000, C704S234000
Type: Reexamination Certificate
Status: active
Patent number: 06173258
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to electronic speech recognition systems, and relates more particularly to a method for reducing noise distortions in a speech recognition system.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Automatic speech recognition is one promising technique that allows a system user to effectively communicate with selected electronic devices, such as digital computer systems. Speech typically consists of one or more spoken utterances which each may include a single word or a series of closely-spaced words forming a phrase or a sentence.
Conditions with significant ambient background-noise levels present additional difficulties when implementing a speech recognition system. Examples of such noisy conditions may include speech recognition in automobiles or in certain manufacturing facilities. In such user applications, in order to accurately analyze a particular utterance, a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
Referring now to FIG. 1(a), an exemplary waveform diagram for one embodiment of noisy speech 112 is shown. In addition, FIG. 1(b) depicts an exemplary waveform diagram for one embodiment of speech 114 without noise. Similarly, FIG. 1(c) shows an exemplary waveform diagram for one embodiment of noise 116 without speech 114. In practice, noisy speech 112 of FIG. 1(a) therefore is typically comprised of several components, including speech 114 of FIG. 1(b) and noise 116 of FIG. 1(c). In FIGS. 1(a), 1(b), and 1(c), waveforms 112, 114, and 116 are presented for purposes of illustration only. The present invention may readily incorporate various other embodiments of noisy speech 112, speech 114, and noise 116.
An automatic speech recognizer typically builds a comparison database for performing speech recognition when a potential user “trains” the recognizer by providing a set of sample speech. Speech recognizers tend to significantly degrade in performance when a mismatch exists between training conditions and actual operating conditions. Such a mismatch may result from various types of acoustic distortion.
The two main sources of acoustic distortion are typically additive noise (such as car noise, music, or background speakers) and convolutive distortion due to the use of different microphones, the use of a telephone channel, or reverberation effects. From the foregoing discussion, it is apparent that reducing noise distortions is a significant consideration for designers and manufacturers of speech recognition systems.
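The two distortion sources named above can be illustrated with a minimal sketch. It assumes the common signal model in which clean speech is convolved with a channel impulse response and noise is then added; the function name `distort`, the test tone, the impulse response, and the noise level are all hypothetical values chosen for illustration and do not come from the patent.

```python
import numpy as np

def distort(speech, channel, noise):
    """Apply a convolutive channel, then add noise (output has len(speech) samples)."""
    # Convolutive distortion: filter the speech with the channel impulse response.
    filtered = np.convolve(speech, channel)[: len(speech)]
    # Additive distortion: sum the background noise sample by sample.
    return filtered + noise[: len(speech)]

rng = np.random.default_rng(0)
sample_rate = 8000                                        # assumed sampling rate in Hz
t = np.arange(1600) / sample_rate
speech = np.sin(2 * np.pi * 200 * t)                      # stand-in for clean speech
channel = np.array([1.0, 0.5, 0.25])                      # hypothetical impulse response
noise = 0.1 * rng.standard_normal(len(speech))            # hypothetical background noise
noisy = distort(speech, channel, noise)
```

With a unit impulse as the channel and an all-zero noise signal, `distort` returns the clean speech unchanged, which is a quick way to check the model.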
SUMMARY OF THE INVENTION
In accordance with the present invention, a method is disclosed for reducing noise distortions in a speech recognition system. In one embodiment of the present invention, a feature extractor includes a fast Fourier transform, a noise suppressor, a filter bank, a logarithmic compressor, a frequency cosine transform, a first time cosine transform, a second time cosine transform, and a normalizer. In alternate embodiments, the feature extractor may readily be implemented using various other appropriate configurations.
In operation, the feature extractor initially receives source speech data and provides it to a fast Fourier transform (FFT), which responsively generates frequency-domain speech data by converting the source speech data from the time domain to the frequency domain to facilitate subsequent noise compensation. The FFT then provides the frequency-domain speech data to a noise suppressor, which preferably performs a spectral subtraction procedure on the received data to generate noise-suppressed speech data that is provided to a filter bank.
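The FFT and spectral subtraction steps might be sketched as follows. The text above does not give the subtraction rule, so this example assumes the textbook form: subtract a noise power estimate from the frame's power spectrum and floor the result at a small fraction of the original power. The Hamming window, the `floor` parameter, and the way the noise estimate is obtained (averaging noise-only frames) are assumptions, not details taken from the patent.

```python
import numpy as np

def spectral_subtraction(frame, noise_power, floor=0.01):
    """Return the noise-suppressed power spectrum of one time-domain frame."""
    # Time domain to frequency domain (one-sided FFT of the windowed frame).
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    power = np.abs(spectrum) ** 2
    # Subtract the noise estimate; the floor keeps every bin non-negative.
    return np.maximum(power - noise_power, floor * power)

rng = np.random.default_rng(0)
n, sample_rate = 256, 8000                                # assumed frame size and rate
# Hypothetical noise estimate: average power spectrum of 50 noise-only frames.
noise_frames = 0.1 * rng.standard_normal((50, n))
noise_power = np.mean(
    np.abs(np.fft.rfft(noise_frames * np.hamming(n), axis=1)) ** 2, axis=0
)
# A 1000 Hz tone in noise stands in for a frame of noisy speech.
frame = np.sin(2 * np.pi * 1000 * np.arange(n) / sample_rate)
frame = frame + 0.1 * rng.standard_normal(n)
clean_power = spectral_subtraction(frame, noise_power)
```

The output has `n // 2 + 1` bins, and the tone's bin remains the dominant peak after subtraction.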
The filter bank responsively filters the noise-suppressed speech data into channel energy, and then provides the filtered channel energy to a logarithmic compressor to be converted into logarithmic channel energy. A frequency cosine transform then converts the logarithmic channel energy into corresponding static features that are separately provided to a normalizer, to a first time cosine transform, and to a second time cosine transform.
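A minimal sketch of the filter bank, logarithmic compressor, and frequency cosine transform chain follows. The filter shapes and spacing are not specified in the text above; this example assumes overlapping triangular filters spaced linearly in frequency (a mel-spaced bank is a common alternative) and a DCT-II for the frequency cosine transform. The function names and the small constant added before the logarithm are hypothetical.

```python
import numpy as np

def triangular_filter_bank(num_filters, num_bins):
    """Build overlapping triangular filters, linearly spaced over the FFT bins."""
    edges = np.linspace(0, num_bins - 1, num_filters + 2)
    bins = np.arange(num_bins)
    bank = np.zeros((num_filters, num_bins))
    for i in range(num_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bins - left) / (center - left)
        falling = (right - bins) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def static_features(power_spectrum, bank, num_coeffs):
    """Channel energy -> log channel energy -> cosine transform across channels."""
    energy = bank @ power_spectrum                 # filter bank output
    log_energy = np.log(energy + 1e-10)            # logarithmic compression
    n = len(log_energy)
    k = np.arange(num_coeffs)[:, None]
    j = np.arange(n)[None, :]
    dct = np.cos(np.pi * k * (j + 0.5) / n)        # DCT-II basis (unnormalized)
    return dct @ log_energy

bank = triangular_filter_bank(num_filters=15, num_bins=129)
features = static_features(np.ones(129), bank, num_coeffs=13)
```

For a flat input spectrum every channel carries the same energy in this configuration, so all coefficients above the zeroth come out at zero, which is the expected behavior of a cosine transform applied to a constant vector.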
The first time cosine transform preferably operates in a centered mode to convert the received static features into delta features that are provided to the normalizer. Similarly, the second time cosine transform operates in a centered mode to convert the received static features into delta-delta features that are also provided to the normalizer.
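One way to picture a centered time cosine transform is sketched below. It assumes that each output frame is the projection of a window of static feature frames, centered on the current frame, onto a cosine basis function: order 1 for delta features and order 2 for delta-delta features. The window half-width, the exact basis, and the edge handling (repeating the first and last frames) are assumptions for illustration and are not taken from the text above.

```python
import numpy as np

def time_cosine_transform(static, order, half_width=3):
    """Project a centered window of frames onto one cosine basis function.

    static: float array of shape (num_frames, num_coeffs).
    order:  1 for delta features, 2 for delta-delta features (assumed mapping).
    """
    length = 2 * half_width + 1
    n = np.arange(length)
    basis = np.cos(np.pi * order * (n + 0.5) / length)
    # Repeat edge frames so that every frame has a full centered window.
    padded = np.pad(static, ((half_width, half_width), (0, 0)), mode="edge")
    out = np.empty_like(static, dtype=float)
    for t in range(static.shape[0]):
        out[t] = basis @ padded[t : t + length]
    return out

# A linear ramp over time: constant slope, no curvature.
ramp = np.arange(20.0)[:, None] * np.ones((1, 4))
delta = time_cosine_transform(ramp, order=1)
delta_delta = time_cosine_transform(ramp, order=2)
```

Away from the padded edges, the ramp yields a constant nonzero delta and a zero delta-delta, because the order-1 basis is antisymmetric about the window center and the order-2 basis is symmetric with zero sum.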
The normalizer responsively performs an effective normalization procedure on the received static features to generate normalized static features, which are provided to a recognizer, in accordance with the present invention. Similarly, the normalizer performs a normalization process on the received delta features and on the received delta-delta features to generate normalized delta features and normalized delta-delta features, which are likewise provided to the recognizer.
The normalizer performs the normalization procedure by calculating and utilizing normalization values, including mean values, left variances, and right variances. The recognizer then analyzes the normalized static features, normalized delta features, and normalized delta-delta features to generate a speech recognition result, in accordance with the present invention. The present invention thus efficiently and effectively reduces noise distortions in a speech recognition system.
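The normalization step can be sketched under stated assumptions. The text above names the normalization values (mean values, left variances, and right variances) but does not define them, so this example assumes that the left variance is the mean squared deviation of the values below the mean, that the right variance is the same quantity for the values above the mean, and that each centered value is divided by the square root of the variance on its own side. The function name `normalize` and the small constant `eps` are hypothetical.

```python
import numpy as np

def normalize(features, eps=1e-10):
    """Center each feature dimension and scale each side of the mean separately.

    features: float array of shape (num_frames, num_coeffs).
    """
    mean = features.mean(axis=0)
    centered = features - mean
    below = centered < 0
    above = centered > 0
    # Assumed definitions: mean squared deviation on each side of the mean.
    left_var = (np.where(below, centered, 0.0) ** 2).sum(axis=0) / np.maximum(
        below.sum(axis=0), 1
    )
    right_var = (np.where(above, centered, 0.0) ** 2).sum(axis=0) / np.maximum(
        above.sum(axis=0), 1
    )
    scale = np.where(below, np.sqrt(left_var) + eps, np.sqrt(right_var) + eps)
    return centered / scale

rng = np.random.default_rng(0)
skewed = rng.exponential(size=(500, 3))      # hypothetical asymmetric feature values
normalized = normalize(skewed)
```

Scaling the two sides independently gives both sides unit root-mean-square deviation even when the feature distribution is skewed, which a single variance cannot do. The same function would be applied separately to the static, delta, and delta-delta features.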

REFERENCES:
patent: 4284846 (1981-08-01), Marley
patent: 4592085 (1986-05-01), Watari et al.
patent: 5003601 (1991-03-01), Watari et al.
patent: 5390278 (1995-02-01), Gupta et al.
patent: 5513298 (1996-04-01), Stanford et al.
patent: 5604839 (1997-02-01), Acero et al.
patent: 5615296 (1997-03-01), Stanford et al.
patent: 5621859 (1997-04-01), Schwartz et al.
patent: 5715367 (1998-02-01), Gillick et al.
patent: 5742927 (1998-04-01), Crozier et al.
O'Shaughnessy, Douglas, “Speech Communication, Human and Machine,” 1990, pp. 422-423.
Proakis, John and Dimitris Manolakis, “Digital Signal Processing,” 1992, pp. 706-708.
Milner, Ben & Saeed Vaseghi, “Analysis of Cepstral-Time Matrices for Noise and Channel Robust Speech Recognition,” 1995, pp. 519-522.
Davis, Steven & Paul Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” 1980, pp. 353-360.
Iwahashi, N. et al., “Stochastic Features for Noise Robust Speech Recognition,” 1998, pp. 633-636.
Milner, Ben, “Inclusion of Temporal Information into Features For Speech Recognition,” pp. 256-259.
Hanson, Brian et al., “Spectral Dynamics for Speech Recognition under Adverse Conditions,” pp. 331-356.
Neumeyer, Leonardo et al., “Training Issues and Channel Equalization Techniques for the Construction of Telephone Acoustic Models Using a High-Quality Speech Corpus,” 1994, pp. 590-597.
Tibrewala, Sangita & Hynek Hermansky, “Multi-Band and Adaptation Approaches to Robust Speech Recognition,” 1997, pp. 2619-2622.
Viikki, Olli & Kari Laurila, “Noise Robust HMM-Based Speech Recognition Using Segmental Cepstral Feature Vector Normalization,” pp. 1-4.
Nolazco Flores, J.A. & S.J. Young, “Adapting a HMM-Based Recognizer for Noisy Speech Enhanced by Spectral Subtraction,” 1993, pp. 1-30.
Chen, Ruxin et al., “A Parameter Sharing, Discrete and Continuous HMM Unified, Speech Recognition System.”

Affiliated with

Chen Ruxin (Inventor)
Menendez-Pidal Xavier (Inventor)
Tanaka Miyuki (Inventor)
Wu Duanpei (Inventor)

Also associated with

Koerner Gregory J. (Attorney)
Sax Robert Louis (Examiner)
Simon & Koerner LLP (Law Firm)
Sony Corporation (Corporate Assignee)
Zele Krista (Examiner)
