Reexamination Certificate
1998-09-24
2001-05-29
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S219000, C704S222000
active
06240385
ABSTRACT:
FIELD OF INVENTION
The present invention relates to quantization of gain parameters in speech coders and is particularly relevant to Generalized Linear Prediction Analysis-by-Synthesis (GLPAS) speech coders.
BACKGROUND OF INVENTION
A major objective in designing digital speech coders is to optimize tradeoffs between minimizing the bit rate of the encoded speech and maximizing the speech quality. Other practical criteria, such as complexity, delay and robustness, also impose constraints on coder design. Optimization of the tradeoffs must be tailored to the particular application to which the coder is to be applied.
Waveform approximating coders and decoders rely on relatively simple speech models and on limitations of the human hearing system to encode and reconstruct waveforms which are perceived to be very similar to the original speech signal prior to encoding. Over the past decade, the performance of Generalized Linear Prediction Analysis-by-Synthesis (GLPAS) speech coders providing coded speech at 2 kbps to 16 kbps has improved considerably. Nevertheless, further effort is devoted to increasing the speech quality of such coders and/or reducing the bit rate for equivalent speech quality.
A GLPAS coder commonly operates on successive frames of a speech signal in a closed-loop fashion, each frame comprising a plurality of successive subframes. Processing at the subframe level provides better modelling of signal changes while meeting practical constraints on processing complexity and memory usage, and the closed-loop nature of the processing further improves the efficiency of the coding.
Typical GLPAS coding techniques comprise:
Linear Predictive Coding (LPC) analysis to model the spectral envelope of the speech signal, providing partial short term prediction of speech signal parameters;
Pitch Delay prediction or Adaptive CodeBook (ACB) alignment to model pitch harmonics of the speech signal;
Pitch or ACB Gain determination to model the energy of harmonic components of the speech signal;
Fixed CodeBook (FCB) alignment to model excitation parameters of the speech signal;
FCB Gain determination to model the energy of wide spectrum components of the speech signal; and
pre- and post-processing of the speech signal.
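The way the listed components combine at the decoder can be sketched as follows. This is a minimal illustration, not taken from the specification: the function name, the argument layout and the sign convention of the predictor coefficients are assumptions made for the example. It scales the ACB and FCB vectors by their gains, sums them into an excitation, and passes the excitation through an all-pole LPC synthesis filter.

```python
def synthesize_subframe(acb_vec, fcb_vec, acb_gain, fcb_gain, lpc, memory):
    """Hypothetical helper: build one subframe of excitation and filter it.

    Assumed convention: s[n] = e[n] + sum_k lpc[k-1] * s[n-k].
    `memory` holds the last len(lpc) output samples, most recent first.
    """
    # Excitation = gain-scaled adaptive codebook vector + gain-scaled fixed codebook vector.
    excitation = [acb_gain * a + fcb_gain * c for a, c in zip(acb_vec, fcb_vec)]
    mem = list(memory)
    out = []
    for e in excitation:
        # All-pole synthesis: add the short-term prediction from past outputs.
        s = e + sum(a * m for a, m in zip(lpc, mem))
        out.append(s)
        mem = [s] + mem[:-1]  # shift the filter memory
    return excitation, out, mem


if __name__ == "__main__":
    exc, speech, mem = synthesize_subframe(
        acb_vec=[1.0, 0.0], fcb_vec=[0.0, 1.0],
        acb_gain=2.0, fcb_gain=3.0, lpc=[0.5], memory=[0.0])
    print(exc, speech, mem)
```

The two gains are the quantities whose quantization the remainder of this specification is concerned with.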
GLPAS techniques code the pitch more efficiently than LPAS techniques by modifying the input signal so that infrequent pitch updates can be used without degrading performance. This speech signal modification may then be considered part of pre-processing, with the modified signal being the input to the modelling and quantization process. In this specification, LPAS is considered to be a special case of GLPAS in which the modification of the signal to simplify pitch encoding is omitted.
One example of a GLPAS coder is the “North American Enhanced Variable Rate Codec” specified by Standard IS-127. This codec uses 20 msec frames, each frame comprising 3 successive subframes. The bit budget for each 20 msec frame when this codec is operating in “half rate mode” allows 22 bits per frame for Line Spectral Pairs (LSP) derived by LPC analysis, 7 bits per frame for Pitch Delay or ACB index, 3 bits per subframe (i.e. 9 bits per frame) for ACB Gain, 10 bits per subframe (i.e. 30 bits per frame) for FCB index, and 4 bits per subframe (i.e. 12 bits per frame) for FCB Gain, for a total of 80 bits per frame. The Pitch Gain or ACB Gain is determined for each subframe and converted into a 3 bit code for each subframe using scalar quantization. The FCB gain is also determined for each subframe and converted into a 4 bit code for each subframe using scalar quantization.
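Per-subframe scalar quantization of a gain, as described above, can be sketched as below. The uniform 8-level table and the function names are hypothetical and chosen only for illustration; they are not the IS-127 tables. Each subframe gain is mapped independently to the index of the nearest level, so 3 subframes at 3 bits each cost 9 bits per frame.

```python
# Hypothetical 3-bit (8-level) gain table: 0.0, 0.2, ..., 1.4. Not the IS-127 table.
ACB_GAIN_LEVELS = [k / 5 for k in range(8)]


def scalar_quantize(gain, levels):
    """Return (index, level) of the table entry nearest to `gain`."""
    index = min(range(len(levels)), key=lambda i: abs(gain - levels[i]))
    return index, levels[index]


def encode_frame_gains(subframe_gains, levels):
    """Quantize each subframe gain on its own; one index per subframe."""
    return [scalar_quantize(g, levels)[0] for g in subframe_gains]


if __name__ == "__main__":
    indices = encode_frame_gains([0.33, 0.95, 1.9], ACB_GAIN_LEVELS)
    bits_per_frame = 3 * len(indices)  # 3 bits per index
    print(indices, bits_per_frame)
```

Because each index is chosen without reference to the other subframes, any correlation between the gains of successive subframes goes unexploited.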
An example of a recent LPAS coder is the “Enhanced Full Rate Speech Codec for North American Cellular” defined by Standard IS-641. This codec uses 20 msec frames, each frame comprising 4 successive subframes. The bit budget for each 20 msec frame allows 26 bits per frame for Line Spectral Pairs (LSP) derived by LPC analysis, 26 bits per frame for Pitch Delay or ACB index, 17 bits per subframe (i.e. 68 bits per frame) for FCB index, and 7 bits per subframe (i.e. 28 bits per frame) for FCB and Pitch or ACB Gain, for a total of 148 bits per frame. The 26 bits per frame for Pitch Delay or ACB index are provided as 8 bits for each of the first and third subframes of each frame, and 5 bits for each of the second and fourth subframes of each frame. The Pitch Gain or ACB Gain and the FCB gain are determined for each subframe and converted into a single 7 bit code for that subframe using two dimensional vector quantization, one component of the two dimensional gain vector corresponding to the pitch gain for the subframe and the other component corresponding to the FCB gain for the subframe.
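Two dimensional vector quantization of the (pitch gain, FCB gain) pair within one subframe can be sketched as follows. The 128-entry codebook here is a hypothetical 8 by 16 grid built only so that it has 2^7 entries; it is not the IS-641 codebook, and a plain squared-error search stands in for whatever error criterion a real coder would use.

```python
# Hypothetical 7-bit codebook: 8 pitch-gain levels x 16 FCB-gain levels = 128 pairs.
PITCH_LEVELS = [k / 5 for k in range(8)]       # 0.0 .. 1.4
FCB_LEVELS = [j * 0.5 for j in range(16)]      # 0.0 .. 7.5
GAIN_CODEBOOK = [(p, c) for p in PITCH_LEVELS for c in FCB_LEVELS]


def vq_subframe_gains(pitch_gain, fcb_gain, codebook):
    """Return (index, entry) of the codebook pair nearest to the gain pair."""
    def sq_err(entry):
        return (pitch_gain - entry[0]) ** 2 + (fcb_gain - entry[1]) ** 2

    index = min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))
    return index, codebook[index]


if __name__ == "__main__":
    index, entry = vq_subframe_gains(0.62, 3.2, GAIN_CODEBOOK)
    print(index, entry)  # a single 7-bit index carries both gains
```

One index per subframe conveys both gains, but each subframe is still quantized without reference to the others in the frame.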
The coders defined by IS-127 and IS-641 represent recent standards in GLPAS and LPAS speech coding techniques.
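The bit budgets quoted above for the two codecs can be tallied with a few lines of arithmetic. The dictionary keys and function names are illustrative; the per-field bit counts and the 20 msec frame length are those stated in the text, and the bit rates are simply the totals divided by the frame duration.

```python
FRAME_MS = 20  # both codecs use 20 msec frames

# IS-127 half rate mode: 3 subframes per frame.
IS127_HALF_RATE = {
    "LSP": 22,
    "pitch delay / ACB index": 7,
    "ACB gain": 3 * 3,
    "FCB index": 3 * 10,
    "FCB gain": 3 * 4,
}

# IS-641: 4 subframes per frame.
IS641 = {
    "LSP": 26,
    "pitch delay / ACB index": 8 + 5 + 8 + 5,
    "FCB index": 4 * 17,
    "joint FCB and ACB gain": 4 * 7,
}


def bits_per_frame(budget):
    return sum(budget.values())


def bit_rate_bps(budget):
    return bits_per_frame(budget) * 1000 / FRAME_MS


if __name__ == "__main__":
    for name, budget in (("IS-127 half rate", IS127_HALF_RATE), ("IS-641", IS641)):
        print(name, bits_per_frame(budget), "bits/frame,", bit_rate_bps(budget), "bps")
```

In these budgets the gain parameters account for 21 of 80 bits per frame in IS-127 half rate mode and 28 of 148 bits per frame in IS-641.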
SUMMARY OF INVENTION
An object of this invention is to provide methods and apparatus for GLPAS speech coding which are more efficient than known GLPAS speech coding methods and apparatus as represented, for example, by the IS-127 and IS-641 specifications, at least for some applications.
Another object of this invention is to provide efficient gain quantization in GLPAS encoders.
In this specification, the term “vector quantization” includes, but is not limited to, recursive vector quantization, such as analysis-by-synthesis vector quantization.
One aspect of this invention provides a method of encoding a gain parameter in a generalized linear predictive analysis-by-synthesis coder. The method comprises determining a subframe gain parameter for each of a plurality of successive subframes of a frame, and determining a quantized frame gain parameter for each frame using a delayed decision quantizer operating on the subframe gain parameters.
The step of determining a quantized frame gain parameter may comprise treating the subframe gain parameters as components of a gain vector and vector quantizing the gain vector to determine the quantized frame gain parameter. Alternatively, the step of determining a quantized frame gain parameter may comprise applying tree quantization or trellis quantization to the subframe gain parameters.
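The vector quantization variant of the delayed decision quantizer can be sketched as below. The class name, the tiny 4-entry codebook and the squared-error search are assumptions made for illustration only. The point shown is the delay: no decision is made as each subframe gain arrives, and a single index is chosen once all the subframe gains of the frame are available.

```python
class DelayedDecisionGainQuantizer:
    """Hypothetical sketch: buffer subframe gains, quantize once per frame."""

    def __init__(self, codebook, subframes_per_frame):
        self.codebook = codebook
        self.subframes_per_frame = subframes_per_frame
        self.pending = []

    def push(self, subframe_gain):
        """Return None until the frame is complete, then the frame-level index."""
        self.pending.append(subframe_gain)
        if len(self.pending) < self.subframes_per_frame:
            return None  # decision deferred
        gains, self.pending = self.pending, []

        def sq_err(entry):
            return sum((g - q) ** 2 for g, q in zip(gains, entry))

        return min(range(len(self.codebook)), key=lambda i: sq_err(self.codebook[i]))


if __name__ == "__main__":
    # Hypothetical 2-bit codebook of 3-component gain vectors (3 subframes per frame).
    codebook = [(0.2, 0.2, 0.2), (0.8, 0.8, 0.8), (0.2, 0.5, 0.8), (0.8, 0.5, 0.2)]
    quantizer = DelayedDecisionGainQuantizer(codebook, subframes_per_frame=3)
    print([quantizer.push(g) for g in (0.25, 0.45, 0.9)])
```

Tree or trellis quantization would defer the decision in the same way but search a structured set of candidate paths rather than a flat codebook.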
The step of vector quantizing the gain vector may comprise quantizing the gain vector by analysis-by-synthesis linear predictive vector quantization. The vector quantization technique may comprise adaptive linear vector quantization, for example moving average predictive vector quantization, auto-regressive predictive vector quantization, or a combination of two or more of these techniques.
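A moving average predictive vector quantizer of the kind named above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name, the single MA coefficient, the 4-entry residual codebook and the squared-error search are all hypothetical. The prediction is formed from previously quantized residuals, only the prediction residual is quantized, and the decoder repeats the same prediction from the indices it receives.

```python
class MAPredictiveVQ:
    """Hypothetical sketch of moving average predictive vector quantization."""

    def __init__(self, codebook, ma_coeffs):
        self.codebook = codebook
        self.ma_coeffs = ma_coeffs
        self.dim = len(codebook[0])
        # Past quantized residual vectors, most recent first.
        self.history = [[0.0] * self.dim for _ in ma_coeffs]

    def _predict(self):
        return [sum(b * past[d] for b, past in zip(self.ma_coeffs, self.history))
                for d in range(self.dim)]

    def _update(self, quantized_residual):
        self.history = [list(quantized_residual)] + self.history[:-1]

    def encode(self, target):
        prediction = self._predict()
        residual = [t - p for t, p in zip(target, prediction)]
        index = min(
            range(len(self.codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, self.codebook[i])))
        self._update(self.codebook[index])  # state uses the quantized residual
        return index

    def decode(self, index):
        prediction = self._predict()
        entry = self.codebook[index]
        self._update(entry)
        return [p + c for p, c in zip(prediction, entry)]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
    encoder = MAPredictiveVQ(codebook, ma_coeffs=[0.5])
    decoder = MAPredictiveVQ(codebook, ma_coeffs=[0.5])
    indices = [encoder.encode(v) for v in ([1.0, 1.0], [1.4, 1.6])]
    print(indices, [decoder.decode(i) for i in indices])
```

An auto-regressive variant would instead predict from past reconstructed gain vectors rather than from past quantized residuals.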
The method may comprise determining multiple subframe gain parameters for each subframe, treating the subframe gain parameters as components of a gain vector and vector quantizing the gain vector to determine the quantized frame gain parameter. For example, the method may comprise determining a fixed codebook gain and an adaptive codebook gain or pitch gain for each subframe, treating the fixed codebook gains and adaptive codebook or pitch gains as components of a gain vector and vector quantizing the gain vector to determine the quantized gain parameter.
The method may further comprise updating parameters of the coder using the quantized frame gain parameter. This prevents parameters of the coder derived from the unquantized gain (for example Adaptive Codebook parameters) from becoming misaligned with corresponding parameters of a decoder, which are based on the quantized gain. Such misalignment would leave the decoder unable to accurately reconstruct the original signal from the encoded signal.
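The misalignment described above can be demonstrated with a toy adaptive codebook. Everything in this sketch is an assumption made for illustration: the helper names, the 4-sample buffer, the fixed pitch lag, the gain values and the uniform gain quantizer (a per-gain rounding used here for brevity in place of a frame-level quantizer). The decoder always updates its buffer with quantized gains; the encoder stays aligned with it only if it does the same.

```python
def quantize_gain(gain, step=0.25):
    """Hypothetical uniform gain quantizer used only for this demonstration."""
    return round(gain / step) * step


def update_acb(acb_buffer, fcb_vec, pitch_lag, acb_gain, fcb_gain):
    """Append one subframe of excitation and return the fixed-length buffer.

    Requires len(acb_buffer) >= pitch_lag.
    """
    buf = list(acb_buffer)
    for c in fcb_vec:
        past = buf[-pitch_lag]  # sample one pitch period back
        buf.append(acb_gain * past + fcb_gain * c)
    return buf[len(fcb_vec):]


def max_mismatch(encoder_uses_quantized, subframes=4):
    """Largest encoder/decoder buffer difference after several subframes."""
    enc = [1.0, 0.0, 0.0, 0.0]
    dec = list(enc)
    fcb_vec = [1.0, 0.0]
    g_p, g_c = 0.9, 0.7  # unquantized gains (illustrative values)
    q_p, q_c = quantize_gain(g_p), quantize_gain(g_c)
    for _ in range(subframes):
        dec = update_acb(dec, fcb_vec, 4, q_p, q_c)
        if encoder_uses_quantized:
            enc = update_acb(enc, fcb_vec, 4, q_p, q_c)
        else:
            enc = update_acb(enc, fcb_vec, 4, g_p, g_c)
    return max(abs(a - b) for a, b in zip(enc, dec))


if __name__ == "__main__":
    print("encoder updated with quantized gains:", max_mismatch(True))
    print("encoder updated with unquantized gains:", max_mismatch(False))
```

With a delayed decision quantizer the quantized gains are known only once the frame is complete, so in this picture the encoder would perform the update after the frame-level decision.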
Another aspect of the invention provides a generalized linear predictive analysis-by-synthesis coder for encoding a speech signal. The coder comprises means for encoding a gain parameter comprising means for determining a subframe gain parameter for each of a plurality of successive subframes of a frame, and delayed decision quantization means operable on the subframe gain parameters for determining a quantized frame gain parameter for each frame.
The delayed decisi
Nortel Networks Limited