Spectral magnitude modeling and quantization in a frequency...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

C704S221000, C704S265000

active

06493664

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to novel techniques for modeling, quantization and error concealment of the components of the prototype waveform (PW) representation of the speech prediction residual signal, and more particularly to improved coding of the spectral magnitudes of the slowly evolving waveform (SEW) and rapidly evolving waveform (REW) components. Encoding of the other components of the PW representation, such as the PW gain vector and the SEW and REW phase spectra, is also discussed for completeness, but these are the subjects of separate inventions. These techniques are applicable to low bit rate speech coders operating in the range of 2-4 kbit/s. In this invention, novel techniques are proposed for the quantization of the variable-dimension SEW and REW spectral magnitudes.
2. Background and Description of Related Art
The present invention describes techniques for efficient encoding of the speech signal, applicable to speech coders typically operating at bit rates in the range of 2-4 kbit/s. In particular, such techniques are applicable to a representation of the speech prediction error (residual) signal known as the prototype waveform (PW) representation; see, e.g., W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, “Encoding Speech Using Prototype Waveforms”, IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399, 1993. The prototype waveforms are a sequence of complex Fourier transforms, evaluated at the pitch harmonic frequencies, of pitch-period-wide segments of the residual taken at a series of points along the time axis. The PW sequence therefore contains information about the spectral characteristics of the residual signal as well as the temporal evolution of these characteristics. High speech quality can be achieved at low coding rates by efficiently quantizing the important aspects of the PW sequence. In PW-based coders, the PW is separated into a shape component and a level component by computing the RMS (gain) value of the PW and normalizing the PW to unity RMS value. The normalized PW is decomposed into a slowly evolving waveform (SEW), which contains the periodic component of the residual, and a rapidly evolving waveform (REW), which contains the aperiodic component of the residual. As the pitch frequency varies, the dimensions of the PW, SEW and REW vectors also vary, typically in the range 11-61.
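The PW analysis steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the harmonic convention (k = 0 .. floor(P/2) for a P-sample pitch period, with 1/P scaling), the function names `prototype_waveform`, `gain_shape` and `sew_rew`, and the use of a 5-tap moving average along the time axis as the SEW low-pass filter are all hypothetical choices. The sketch also assumes the PW shapes have already been time-aligned and share one dimension, which real coders must arrange before filtering.

```python
import numpy as np

def prototype_waveform(segment):
    """Complex Fourier coefficients of one pitch-period-wide residual segment,
    evaluated at the pitch harmonics k = 0 .. floor(P/2) (assumed convention)."""
    segment = np.asarray(segment, dtype=float)
    P = len(segment)
    n = np.arange(P)
    k = np.arange(P // 2 + 1)
    # DFT evaluated only at the harmonic frequencies 2*pi*k/P, scaled by 1/P
    return np.exp(-2j * np.pi * np.outer(k, n) / P) @ segment / P

def gain_shape(pw):
    """Split a PW vector into its RMS gain (level) and a unity-RMS shape."""
    gain = np.sqrt(np.mean(np.abs(pw) ** 2))
    return gain, pw / gain

def sew_rew(shapes, taps=5):
    """Split a (frames x harmonics) sequence of normalized, already aligned
    PW shapes into a slowly evolving part (moving-average low-pass along the
    time axis, hypothetical filter choice) and the rapidly evolving remainder.
    Requires at least `taps` frames so that mode="same" keeps the length."""
    kernel = np.ones(taps) / taps
    sew = np.apply_along_axis(
        lambda track: np.convolve(track, kernel, mode="same"), 0, shapes)
    rew = shapes - sew
    return sew, rew

# Usage: one 40-sample pitch period gives 40 // 2 + 1 = 21 harmonics.
rng = np.random.default_rng(0)
pw = prototype_waveform(rng.standard_normal(40))
gain, shape = gain_shape(pw)
print(len(pw), gain)
```

Under this assumed convention and 8 kHz sampling, pitch periods of 20 to 120 samples give vector dimensions of 11 to 61, which is consistent with the range quoted above; that correspondence is our reading and is not stated in the text.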
This invention also proposes novel error concealment techniques for mitigating the effects of frame erasure or packet loss between the speech encoder and the speech decoder due to a degraded transmission medium.
W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, “Encoding Speech Using Prototype Waveforms”, IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399, 1993; and J. Haagen and W. B. Kleijn, “Waveform Interpolation”, in Modern Methods of Speech Processing, edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995, describe the prototype waveform interpolation (PWI) modeling approach. However, these references do not specify the quantization of the PWI model in detail. The present invention pertains to the quantization of the various components of the PWI model. The quantization approaches proposed in this invention are novel and are not in any way based on or derived from the quantization approaches described in these three references. Additionally, W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, “A Low Complexity Waveform Interpolation Coder”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, and Y. Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, describe certain quantization schemes for prototype waveform encoding.
In the prior art of W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995, and W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, “A Low Complexity Waveform Interpolation Coder”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, the PW gain vector is not quantized using a vector quantizer (VQ) designed by explicit population of steady-state and transient codewords. This can result in poor performance during voicing onsets and other transitory events. The variable dimensionality of the SEW and REW vectors is addressed by using fixed-order analytical function approximations for the REW magnitude shape and by deriving the SEW magnitude approximately from the REW magnitude. The coefficients of the analytical function that provides the best fit to the vector are used to represent the vector for quantization. This approach suffers from three disadvantages: (i) a modeling error is added to the quantization error, leading to a loss of performance; (ii) the analytical function approximation at reasonable orders (5-10) deteriorates with increasing frequency; and (iii) if spectrally weighted distortion metrics are used during VQ, the complexity of these methods becomes formidable. In the prior art of W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995, and Y. Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1997, only a predetermined low-frequency sub-band (e.g., the 0-800 Hz band) of the SEW magnitude is encoded. This substantially reduces the dimension of the SEW vector, thereby permitting direct VQ. At the receiver, the remaining upper band is estimated from the REW magnitude spectrum.
This method suffers from the disadvantage that if a significant amount of signal energy exists in the upper band, it is reproduced poorly, leading to poor speech quality. This condition can occur for a number of speech sounds, especially for unvoiced speech.
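The two prior-art workarounds for variable dimensionality discussed above can be illustrated with a short sketch. It assumes a cosine series as the "analytical function" (the cited coders may use a different basis), places the harmonics uniformly over 0..π, and uses hypothetical names throughout (`cosine_basis`, `fit_fixed_order`, `modeling_snr_db`, `upper_band_energy_fraction`). The fit turns any vector length into the same fixed number of coefficients, and `modeling_snr_db` measures disadvantage (i), the modeling error incurred before any quantization. The last function measures how much energy a low-band-only SEW encoder (0-800 Hz by default) would leave to be estimated at the receiver.

```python
import numpy as np

def cosine_basis(dim, order):
    """Cosine basis cos(j*w), j = 0 .. order-1, at `dim` harmonics spread
    uniformly over 0..pi (assumed choice of analytical function)."""
    w = np.linspace(0.0, np.pi, dim)
    return np.cos(np.outer(w, np.arange(order)))

def fit_fixed_order(mag, order=8):
    """Least-squares coefficients: a fixed-length code for any vector length."""
    mag = np.asarray(mag, dtype=float)
    coeffs, *_ = np.linalg.lstsq(cosine_basis(len(mag), order), mag, rcond=None)
    return coeffs

def evaluate(coeffs, dim):
    """Rebuild a magnitude vector of the requested dimension."""
    return cosine_basis(dim, len(coeffs)) @ coeffs

def modeling_snr_db(mag, order=8):
    """Signal-to-modeling-error ratio, before any quantization is applied."""
    mag = np.asarray(mag, dtype=float)
    err = np.sum((mag - evaluate(fit_fixed_order(mag, order), len(mag))) ** 2)
    return np.inf if err == 0 else 10 * np.log10(np.sum(mag ** 2) / err)

def upper_band_energy_fraction(mag, f0_hz, cutoff_hz=800.0):
    """Share of energy in harmonics above the cutoff, i.e. the part that a
    low-band-only SEW encoder leaves to be estimated at the receiver."""
    mag = np.asarray(mag, dtype=float)
    n_low = int(cutoff_hz // f0_hz) + 1  # harmonics with k*f0 <= cutoff, incl. k = 0
    return np.sum(mag[n_low:] ** 2) / np.sum(mag ** 2)

# Usage: vectors of dimension 11 and 61 both map to 8 coefficients.
rng = np.random.default_rng(0)
for dim in (11, 61):
    mag = np.abs(rng.standard_normal(dim))
    print(dim, fit_fixed_order(mag).shape, round(modeling_snr_db(mag), 1))
```

For a flat magnitude spectrum with a 200 Hz pitch (21 harmonics up to 4 kHz), only 5 harmonics fall in the 0-800 Hz band, so roughly three quarters of the energy would have to be estimated rather than coded, which illustrates the poor upper-band reproduction described above.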
A number of prior techniques for encoding phase are in use in PWI-based voice coders; see, e.g., W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995; W. B. Kleijn, “Encoding Speech Using Prototype Waveforms”, IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399, 1993; W. B. Kleijn, Y. Shoham, D. Sen and R. Hagen, “A Low Complexity Waveform Interpolation Coder”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1996; J. Haagen and W. B. Kleijn, “Waveform Interpolation”, in Modern Methods of Speech Processing, edited by R. P. Ramachandran and R. Mammone, Kluwer Academic Publishers, 1995; Y. Shoham, “Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps”, IEEE International Conference on Acoustics, Speech and Signal Processing, 1997. In this prior art, the SEW phase vector is either a random phase (for unvoiced sounds) or the phase of a fixed pitch cycle waveform (for voiced sounds). This binary characterization of the SEW phase is too simplistic. It may work for a narrow range of speakers and for clean speech signals, but it becomes unsatisfactory as the range of speakers increases and for speech corrupted by background noise. Noisy speech requires varying degrees of rand
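The binary SEW phase model criticized above can be sketched as follows, assuming the harmonic convention k = 0 .. dim-1 and a stored phase vector for the largest dimension (61) that is truncated to the current dimension. The names `binary_sew_phase`, `pitch_cycle` and the stored vector are hypothetical; the sketch only shows the two-state choice the text describes, not any graded alternative.

```python
import numpy as np

def binary_sew_phase(dim, voiced, fixed_phase, rng):
    """Two-state SEW phase: the phase of a fixed pitch-cycle waveform when
    voiced, uniformly random phase in [-pi, pi) when unvoiced.
    `fixed_phase` is a hypothetical stored vector covering the largest
    dimension; it is truncated to the current dimension."""
    if voiced:
        return np.asarray(fixed_phase[:dim], dtype=float)
    return rng.uniform(-np.pi, np.pi, size=dim)

def pitch_cycle(mag, phase, period):
    """One period of sum_k mag[k] * cos(2*pi*k*n/period + phase[k])."""
    n = np.arange(period)
    k = np.arange(len(mag))
    arg = 2 * np.pi * np.outer(k, n) / period + np.asarray(phase)[:, None]
    return (np.asarray(mag)[:, None] * np.cos(arg)).sum(axis=0)

# Usage: the same magnitudes, synthesized with each of the two phase states.
rng = np.random.default_rng(0)
stored = np.zeros(61)  # hypothetical stored phase: zero phase gives a pulse at n = 0
mag = np.ones(21)
voiced_cycle = pitch_cycle(mag, binary_sew_phase(21, True, stored, rng), 40)
unvoiced_cycle = pitch_cycle(mag, binary_sew_phase(21, False, stored, rng), 40)
```

With zero stored phase and flat magnitudes, the voiced cycle is a sharp pulse at n = 0, while the unvoiced cycle spreads the same energy over the period. A frame of noisy voiced speech must be forced into one of these two extremes, which is the limitation the text points out.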
