Adaptive Wiener filtering using line spectral frequencies

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate


Details

U.S. classifications: C704S205000, C704S230000
Type: Reexamination Certificate
Status: active
Patent number: 06263307

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to electronic devices, and, more particularly, to speech analysis and synthesis devices and systems.
Human speech consists of a stream of acoustic signals with frequencies ranging up to roughly 20 KHz; but the band of 100 Hz to 5 KHz contains the bulk of the acoustic energy. Telephone transmission of human speech originally consisted of conversion of the analog acoustic signal stream into an analog electrical voltage signal stream (e.g., microphone) for transmission and reconversion to an acoustic signal stream (e.g., loudspeaker) for reception.
The advantages of digital electrical signal transmission led to a conversion from analog to digital telephone transmission beginning in the 1960s. Typically, digital telephone signals arise from sampling analog signals at 8 KHz and nonlinearly quantizing the samples with 8-bit codes according to the μ-law (pulse code modulation, or PCM). A clocked digital-to-analog converter and companding amplifier reconstruct an analog electrical signal stream from the stream of 8-bit samples. Such signals require transmission rates of 64 Kbps (kilobits per second). Many communications applications, such as digital cellular telephone, cannot handle such a high transmission rate, and this has inspired various speech compression methods.
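The patent gives no code, but the μ-law companding it mentions follows the standard formula F(x) = sgn(x)·ln(1 + μ|x|)/ln(1 + μ) with μ = 255. A minimal sketch of 8-bit μ-law encode/decode (the uniform 256-level quantizer of the companded value is an illustrative simplification of the G.711 segment encoding):

```python
import numpy as np

MU = 255.0  # standard North American mu-law constant

def mulaw_compress(x):
    """Map samples in [-1, 1] to [-1, 1] with mu-law companding."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

def encode_8bit(x):
    """Quantize the companded value to one of 256 levels (8 bits per sample)."""
    return np.round((mulaw_compress(x) + 1.0) / 2.0 * 255.0).astype(np.uint8)

def decode_8bit(code):
    """Reconstruct an approximate sample from an 8-bit code."""
    return mulaw_expand(code.astype(np.float64) / 255.0 * 2.0 - 1.0)
```

At 8000 samples per second this is exactly the 64 Kbps rate cited above; the nonlinear spacing keeps quantization error roughly proportional to signal level.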
The storage of speech information in analog format (e.g., on magnetic tape in a telephone answering machine) can likewise be replaced with digital storage. However, the memory demands can become overwhelming: 10 minutes of 8-bit PCM sampled at 8 KHz would require about 5 MB (megabytes) of storage. This demands speech compression analogous to digital transmission compression.
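The 5 MB figure above is simple arithmetic, one byte per sample:

```python
# 10 minutes of 8-bit PCM at 8 KHz: one byte per sample.
seconds = 10 * 60
sample_rate = 8000
bytes_per_sample = 1
total_bytes = seconds * sample_rate * bytes_per_sample  # 4,800,000 bytes, roughly 5 MB
```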
One approach to speech compression models the physiological generation of speech and thereby reduces the necessary information transmitted or stored. In particular, the linear speech production model presumes excitation of a variable filter (which roughly represents the vocal tract) by either a pulse train for voiced sounds or white noise for unvoiced sounds followed by amplification or gain to adjust the loudness. The model produces a stream of sounds simply by periodically making a voiced/unvoiced decision plus adjusting the filter coefficients and the gain. Generally, see Markel and Gray, Linear Prediction of Speech (Springer-Verlag 1976).
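The source-filter model of this paragraph can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the all-pole feedback convention (out[i] = gain·excitation[i] + Σ a_k·out[i−k]) and the 180-sample frame length are assumptions drawn from the surrounding text.

```python
import numpy as np

def synthesize_frame(lpc, gain, voiced, pitch_period, n=180, rng=None):
    """Drive an all-pole vocal-tract filter with a pulse train (voiced)
    or white noise (unvoiced), then scale by the gain."""
    if rng is None:
        rng = np.random.default_rng(0)
    if voiced:
        excitation = np.zeros(n)
        excitation[::pitch_period] = 1.0      # impulse every pitch period
    else:
        excitation = rng.standard_normal(n)   # white-noise excitation
    p = len(lpc)
    out = np.zeros(n)
    for i in range(n):
        acc = gain * excitation[i]
        for k in range(1, p + 1):             # feedback through past outputs
            if i - k >= 0:
                acc += lpc[k - 1] * out[i - k]
        out[i] = acc
    return out
```

Per frame, the coder then needs only the voiced/unvoiced flag, the pitch period, the gain, and the handful of filter coefficients.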
More particularly, the linear prediction method partitions a stream of speech samples s(n) into “frames” of, for example, 180 successive samples (22.5 msec intervals for an 8 KHz sampling rate); and the samples in a frame then provide the data for computing the filter coefficients for use in coding and synthesis of the sound associated with the frame. Each frame generates coded bits for the linear prediction filter coefficients (LPC), the pitch, the voiced/unvoiced decision, and the gain. This approach of encoding only the model parameters represents far fewer bits than encoding the entire frame of speech samples directly, so the transmission rate may be only 2.4 Kbps rather than the 64 Kbps of PCM. In practice, the LPC coefficients must be quantized for transmission, and the sensitivity of the filter behavior to the quantization error has led to quantization based on the Line Spectral Frequencies (LSF) representation.
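Computing the LPC coefficients from a frame is conventionally done with the autocorrelation method and the Levinson-Durbin recursion; the patent does not spell this out, so the following is a textbook sketch, not its specific procedure:

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns a with a[0] = 1, so A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p,
    plus the final prediction-error energy."""
    n = len(frame)
    # autocorrelation lags r[0..order] of the windowed frame
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction error
        k = -np.dot(a[:i], r[i:0:-1]) / err
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

For a frame dominated by a single decaying resonance s(n) = 0.9 s(n−1), the recursion recovers a[1] ≈ −0.9, i.e., the predictor ŝ(n) = 0.9 s(n−1).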
To improve the sound quality, further information may be extracted from the speech, compressed, and transmitted or stored along with the LPC coefficients, pitch, voicing, and gain. For example, the codebook excitation linear prediction (CELP) method first analyzes a speech frame to find the LPC filter coefficients, and then filters the frame with the LPC filter. Next, CELP determines a pitch period from the filtered frame and removes this periodicity with a comb filter to yield a noise-like excitation signal. Lastly, CELP encodes the excitation signals using a codebook. Thus CELP transmits the LPC filter coefficients, pitch, gain, and the codebook index of the excitation signal.
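The long-term (pitch) prediction step described above, finding the pitch period in the LPC-filtered signal and combing out that periodicity, can be sketched roughly as follows; the one-tap predictor and the 20–147 sample lag search range (typical for 8 KHz speech) are assumptions, not details from the patent:

```python
import numpy as np

def remove_pitch(residual, min_lag=20, max_lag=147):
    """Pick the lag maximizing the normalized correlation, then subtract the
    optimally scaled delayed signal (a one-tap comb / long-term predictor)."""
    r = residual
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        num = np.dot(r[lag:], r[:-lag])
        den = np.dot(r[:-lag], r[:-lag])
        score = num * num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    den = np.dot(r[:-best_lag], r[:-best_lag])
    b = np.dot(r[best_lag:], r[:-best_lag]) / den   # optimal one-tap gain
    excitation = r.copy()
    excitation[best_lag:] -= b * r[:-best_lag]      # comb out the periodicity
    return excitation, best_lag, b
```

What remains after this step is the noise-like excitation that CELP matches against codebook entries.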
The advent of digital cellular telephones has emphasized the role of noise suppression in speech processing, both coding and recognition. Customer expectation of high performance even in extreme car noise situations, plus the demand to move to progressively lower data rate speech coding in order to accommodate the ever-increasing number of cellular telephone customers, have contributed to the importance of noise suppression. While higher data rate speech coding methods tend to maintain robust performance even in high noise environments, that typically is not the case with lower data rate speech coding methods. The speech quality of low data rate methods tends to degrade drastically with high additive noise. Noise suppression to prevent such speech quality losses is important, but it must be achieved without introducing any undesirable artifacts or speech distortions or any significant loss of speech intelligibility. These performance goals for noise suppression have existed for many years, and they have recently come to the forefront due to the digital cellular telephone application.
FIG. 1a schematically illustrates an overall system 100 of modules for speech acquisition, noise suppression, analysis, transmission/storage, synthesis, and playback. A microphone converts sound waves into electrical signals, and sampling analog-to-digital converter 102 typically samples at 8 KHz to cover the speech spectrum up to 4 KHz. System 100 may partition the stream of samples into frames with smooth windowing to avoid discontinuities. Noise suppression 104 filters a frame to suppress noise, and analyzer 106 extracts LPC coefficients, pitch, voicing, and gain from the noise-suppressed frame for transmission and/or storage 108. The transmission may be any type used for digital information transmission, and the storage may likewise be any type used to store digital information. Of course, types of encoding analysis other than LPC could be used. Synthesizer 110 combines the LPC coefficients, pitch, voicing, and gain information to synthesize frames of sampled speech, which digital-to-analog converter (DAC) 112 converts to analog signals to drive a loudspeaker or other playback device to regenerate sound waves.
FIG. 1b shows an analogous system 150 for voice recognition with noise suppression. The recognition analyzer may simply compare input frames with frames from a database, or it may analyze the input frames and compare parameters with known sets of parameters. Matches found between input frames and stored information provide recognition output.
One approach to noise suppression in speech employs spectral subtraction and appears in Boll, Suppression of Acoustic Noise in Speech Using Spectral Subtraction, 27 IEEE Tr. ASSP 113 (1979), and Lim and Oppenheim, Enhancement and Bandwidth Compression of Noisy Speech, 67 Proc. IEEE 1586 (1979). Spectral subtraction proceeds roughly as follows. Presume a sampled speech signal s(j) with uncorrelated additive noise n(j) to yield an observed windowed noisy speech y(j) = s(j) + n(j). These are random processes over time. Noise is assumed to be a stationary process in that the process's autocorrelation depends only on the difference of the variables; that is, there is a function r_N(.) such that:

E{n(j)n(i)} = r_N(i − j)
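The autocorrelation r_N is not observable directly; in practice it is estimated from frames believed to contain only noise (e.g., detected silence intervals). A minimal sketch of the standard biased sample estimate:

```python
import numpy as np

def noise_autocorrelation(noise, max_lag):
    """Biased sample estimate of r_N(k) = E{n(j)n(j+k)} from a frame
    assumed to contain noise only."""
    n = len(noise)
    return np.array([np.dot(noise[:n - k], noise[k:]) / n
                     for k in range(max_lag + 1)])
```

For white noise the estimate is close to the variance at lag 0 and near zero elsewhere, matching the stationarity assumption above.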
where E is the expectation. The Fourier transform of the autocorrelation is called the power spectral density, P_N(ω). If speech were also a stationary process with autocorrelation r_S(j) and power spectral density P_S(ω), then the power spectral densities would add due to the lack of correlation:

P_Y(ω) = P_S(ω) + P_N(ω)

Hence, an estimate for P_S(ω), and thus s(j), could be obtained from the observed noisy speech y(j) and the noise observed during intervals of (presumed) silence in the observed noisy speech. In particular, take P_Y(ω) as the squared magnitude of the Fourier transform of y(j) and P_N(ω) as the squared magnitude of the Fourier transform of the observed noise.
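The subtraction just described can be sketched as follows. The spectral floor and the reuse of the noisy phase are standard simplifications in the spectral-subtraction literature (e.g., Boll), not details specified here:

```python
import numpy as np

def spectral_subtraction(noisy, noise, floor=0.01):
    """Estimate P_S(w) = P_Y(w) - P_N(w), clamp negatives to a small spectral
    floor, and rebuild a time signal using the noisy phase."""
    Y = np.fft.rfft(noisy)
    P_y = np.abs(Y) ** 2                                 # observed power spectrum
    P_n = np.abs(np.fft.rfft(noise, n=len(noisy))) ** 2  # noise power estimate
    P_s = np.maximum(P_y - P_n, floor * P_y)             # subtracted speech power
    S = np.sqrt(P_s) * np.exp(1j * np.angle(Y))          # keep the noisy phase
    return np.fft.irfft(S, n=len(noisy))
```

Because the subtracted power never exceeds the observed power, the output energy cannot exceed that of the noisy input; the clamping avoids the negative power estimates that a raw subtraction can produce.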
Of course, speech is not a stationary process.
