Reexamination Certificate
1999-12-06
2002-05-21
Smits, Talivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S219000, C704S220000, C704S222000, C704S223000
active
06393390
ABSTRACT:
FIELD OF INVENTION
The present invention relates to an improved method and system for the digital encoding of speech signals, and more particularly to Linear Predictive Analysis-by-Synthesis (LPAS) based speech coding.
BACKGROUND OF THE INVENTION
LPAS coders have given a new dimension to medium-bit-rate (8-16 kbps) and low-bit-rate (2-8 kbps) speech coding research. Various forms of LPAS coders are used in applications such as secure telephones, cellular phones, answering machines, voice mail and digital memo recorders, because LPAS coders deliver good speech quality at low bit rates. LPAS coders are based on a speech production model 39 (illustrated in FIG. 1) and fall into a category between waveform coders and parametric coders (vocoders); hence they are referred to as hybrid coders.
Referring to FIG. 1, the speech production model 39 parallels basic human speech activity and starts with the excitation source 41 (i.e., the air expelled from the lungs). Next, this air stream is set into vibration by the vocal cords 43. Lastly, the resulting pulsed vibrations travel through the vocal tract 45 (from the vocal cords to the lips) and produce audible sound waves, i.e., speech 47.
Correspondingly, there are three major components in LPAS coders: (i) a short-term synthesis filter 49, (ii) a long-term synthesis filter 51, and (iii) an excitation codebook 53. The short-term synthesis filter 49 includes a short-term predictor in its feedback loop and models the short-term spectrum of the subject speech signal at the vocal tract stage 45. The short-term predictor of filter 49 is used for removing near-sample redundancies (due to the resonances produced by the vocal tract 45) from the speech signal. The long-term synthesis filter 51 employs an adaptive codebook 55 or pitch predictor in its feedback loop. The pitch predictor 55 is used for removing far-sample redundancies (due to the pitch periodicity produced by the vibrating vocal cords 43) in the speech signal. The source excitation 41 is modeled by a so-called “fixed codebook” (the excitation codebook 53).
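The three components above can be illustrated with a minimal decoder-side sketch. It assumes the common sign convention A(z) = 1 - sum a_k z^-k for the short-term synthesis filter 49 and a single-tap pitch predictor, i.e. 1/(1 - g z^-T), for the long-term synthesis filter 51; the function names, tap values and toy codevector are hypothetical and are not taken from the invention.

```python
def short_term_synthesis(e, a, history=None):
    """All-pole filter 1/A(z), A(z) = 1 - sum_k a[k-1] z^-k (assumed sign convention).

    s[n] = e[n] + a[0]*s[n-1] + ... + a[p-1]*s[n-p].
    `history` holds at least p past output samples (oldest first); zeros if omitted.
    """
    p = len(a)
    full = list(history) if history is not None else [0.0] * p
    assert len(full) >= p, "need at least p past output samples"
    start = len(full)
    for x in e:
        # full[-1 - k] is s[n-1-k] at the moment s[n] is computed
        full.append(x + sum(a[k] * full[-1 - k] for k in range(p)))
    return full[start:]


def long_term_synthesis(u, gain, lag, history=None):
    """Single-tap pitch synthesis 1/(1 - gain * z^-lag): v[n] = u[n] + gain * v[n-lag]."""
    full = list(history) if history is not None else [0.0] * lag
    assert len(full) >= lag, "need at least `lag` past output samples"
    start = len(full)
    for x in u:
        # full[-lag] is v[n-lag]: the feedback loop that adds back pitch periodicity
        full.append(x + gain * full[-lag])
    return full[start:]


# Decoder-side chain: scaled fixed-codebook vector -> long-term filter -> short-term filter.
codevector = [1, 0, 0, 0, 0, 0, 0, 0]
excitation = long_term_synthesis([0.8 * c for c in codevector], gain=0.5, lag=3)
speech = short_term_synthesis(excitation, a=[0.9, -0.2])
print(speech)
```

In an actual coder the filter memories (`history`) are carried over from one subframe to the next rather than reset to zero.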
In turn, the parameter set of a conventional LPAS-based coder consists of short-term parameters (short-term predictor), long-term parameters and fixed codebook 53 parameters. Typically, the short-term parameters are estimated using standard 10th- to 12th-order LPC (linear predictive coding) analysis.
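A textbook way to obtain the short-term parameters is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below is a generic version under the same assumed sign convention A(z) = 1 - sum a_k z^-k; the helper names are hypothetical, and the Hamming window and synthetic frame in the usage lines are illustrative choices, not details stated in the text.

```python
import math
import random


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n-k] for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the predictor s[n] ~ sum_j a[j] * s[n-1-j].

    Returns (a, err): `order` coefficients in the A(z) = 1 - sum a z^-k convention and
    the final prediction-error energy. Assumes r[0] > 0.
    """
    a = []
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        # Order update: a_j <- a_j - k * a_(i-j), then k becomes the new last coefficient.
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= 1.0 - k * k
    return a, err


# Usage: 10th-order analysis of one hypothetical 160-sample frame.
random.seed(0)
N = 160
window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
frame = [w * (math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.01 * random.gauss(0.0, 1.0))
         for n, w in enumerate(window)]
a, err = levinson_durbin(autocorrelation(frame, 10), 10)
print([round(c, 3) for c in a], err)
```

Practical coders usually add refinements such as bandwidth expansion and convert the coefficients to a quantization-friendly form (for example line spectral frequencies) before encoding; those steps are omitted here.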
The foregoing parameter sets are encoded into a bit stream for transmission or storage. Usually, short-term parameters are updated on a frame-by-frame basis (every 20-30 msec, or 160-240 samples) and long-term and fixed codebook parameters are updated on a subframe basis (every 5-7.5 msec, or 40-60 samples). Ultimately, a decoder (not shown) receives the encoded parameter sets, appropriately decodes them and digitally reproduces the subject speech signal (audible speech) 47.
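The sample counts above correspond to a sampling rate of 8 kHz (160 samples in 20 msec). Assuming that rate, the frame and subframe lengths follow directly from N = f_s T:

```latex
\begin{aligned}
N &= f_s\,T, \qquad f_s = 8000\ \text{samples/s} \\
N_{\text{frame}} &= 8000 \times 0.020 = 160 \quad\text{to}\quad 8000 \times 0.030 = 240 \\
N_{\text{sub}} &= 8000 \times 0.005 = 40 \quad\text{to}\quad 8000 \times 0.0075 = 60 \\
\frac{N_{\text{frame}}}{N_{\text{sub}}} &= \frac{160}{40} = 4 \ \text{subframes per 20 msec frame (with 5 msec subframes)}
\end{aligned}
```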
Most state-of-the-art LPAS coders differ in their implementation of the fixed codebook 53 and of the pitch predictor or adaptive codebook 55. Examples of LPAS coders are the Code Excited Linear Predictive (CELP) coder, the Multi-Pulse Excited Linear Predictive (MPLPC) coder, the Regular Pulse Linear Predictive (RPLPC) coder, the Algebraic CELP (ACELP) coder, etc. Further, the parameters of the pitch predictor or adaptive codebook 55 and of the fixed codebook 53 are typically optimized in a closed loop using an analysis-by-synthesis method with a perceptually weighted minimum (mean squared) error criterion. See Manfred R. Schroeder and B. S. Atal, “Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” IEEE Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Tampa, Fla., pp. 937-940, 1985.
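The closed-loop search can be sketched for a single codebook. The sketch assumes the target x has already been perceptually weighted and that h is the impulse response of the weighted synthesis filter; for each candidate c it forms y = Hc (zero-state filtering), takes the optimal gain g = <x,y>/<y,y>, and keeps the entry that maximizes <x,y>^2/<y,y>, which is equivalent to minimizing ||x - g y||^2. The function names and toy data are hypothetical.

```python
def filter_zero_state(c, h):
    """y = H c: convolution of c with impulse response h, truncated to len(c)."""
    return [sum(h[i] * c[m - i] for i in range(min(m + 1, len(h)))) for m in range(len(c))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def search_codebook(target, codebook, h):
    """Exhaustive analysis-by-synthesis search: minimise ||x - g*H c||^2 over entries and gain."""
    best_index, best_gain, best_score = -1, 0.0, float("-inf")
    for idx, c in enumerate(codebook):
        y = filter_zero_state(c, h)  # synthesize this candidate through the filter
        energy = dot(y, y)
        if energy <= 0.0:
            continue
        corr = dot(target, y)
        score = corr * corr / energy  # weighted error = ||x||^2 - score
        if score > best_score:
            best_index, best_gain, best_score = idx, corr / energy, score
    return best_index, best_gain


codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0], [0, 0, 1, 1]]
h = [1.0, 0.6, 0.3]
target = [0.0, 2.0, 1.2, 0.6]
print(search_codebook(target, codebook, h))
```

Every candidate has to be filtered before it can be scored, which is what makes this closed-loop search expensive.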
The major attributes of speech coders are:
1. Speech quality
2. Bit rate
3. Time and space complexity
4. Delay
Due to the closed-loop parameter optimization of the pitch predictor 55 and fixed codebook 53, the complexity of an LPAS coder is far higher than that of a waveform coder. The LPAS coder produces fairly good speech quality at around 8-16 kbps. Further improvement in the speech quality of LPAS-based coders can be obtained by using more sophisticated algorithms, one of which is the multi-tap pitch predictor (MTPP). Increasing the number of taps in the pitch predictor increases the prediction gain, and hence the coding efficiency. On the other hand, estimating and quantizing MTPP parameters increases the computational complexity and memory requirements of the coder.
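To see why more taps raise the prediction gain, the sketch below fits 1-tap and 3-tap pitch predictors to the same signal by least squares. This open-loop, unquantized fit is only an illustration; the invention estimates and quantizes the taps inside the analysis-by-synthesis loop. The tap layout (delays T-1, T, T+1), the helper names and the synthetic signal with a fractional period of 40.5 samples are assumptions.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def pitch_taps(s, lag, ntaps, start):
    """Least-squares taps b for s[n] ~ sum_j b[j] * s[n - lag + half - j], n = start..len(s)-1.

    With ntaps = 3: P(z) = b0 z^-(lag-1) + b1 z^-lag + b2 z^-(lag+1).
    Requires start >= lag + half and lag > half. Returns (taps, prediction gain in dB).
    """
    half = ntaps // 2
    cols = [[s[n - lag + half - j] for n in range(start, len(s))] for j in range(ntaps)]
    target = s[start:]
    # Normal equations R b = r built from the delayed-signal columns.
    taps = solve([[dot(ci, cj) for cj in cols] for ci in cols], [dot(ci, target) for ci in cols])
    err = sum((t - sum(b * col[i] for b, col in zip(taps, cols))) ** 2
              for i, t in enumerate(target))
    gain_db = float("inf") if err == 0 else 10 * math.log10(dot(target, target) / err)
    return taps, gain_db


random.seed(0)
period = 40.5  # hypothetical fractional pitch period in samples
s = [math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)
     + 0.05 * random.gauss(0.0, 1.0) for n in range(400)]
for ntaps in (1, 3):
    taps, g = pitch_taps(s, lag=40, ntaps=ntaps, start=80)
    print(ntaps, [round(b, 3) for b in taps], round(g, 1))
```

Because the 3-tap regressor set contains the 1-tap one, the least-squares 3-tap error can never exceed the 1-tap error on the same data. The extra taps help most when the true period falls between integer lags, at the cost of estimating and quantizing three coefficients instead of one.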
Another computationally expensive part of an LPAS-based coder is the fixed codebook search, again because of the analysis-by-synthesis parameter optimization procedure.
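A rough operation count shows why this search dominates. The figures assume a hypothetical 10-bit codebook (1024 entries), 40-sample subframes at 8 kHz (200 subframes per second), one truncated convolution with the synthesis-filter impulse response per entry, and the two correlations <x,y> and <y,y>:

```latex
\begin{aligned}
\underbrace{\tfrac{N(N+1)}{2}}_{\text{filtering}} + \underbrace{2N}_{\text{correlations}}
  &= 820 + 80 = 900 \ \text{MACs per entry} \quad (N = 40) \\
900 \times 1024 &= 921{,}600 \ \text{MACs per subframe} \\
921{,}600 \times 200 &\approx 184 \times 10^{6} \ \text{MACs per second}
\end{aligned}
```

These numbers are illustrative only and exclude the pitch search and all other coder functions.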
Today, speech coders are often implemented on digital signal processors (DSPs). The cost of such an implementation is governed by the processor resources (MIPS/RAM/ROM) that the speech coder requires.
SUMMARY OF THE INVENTION
One object of the present invention is to provide a method for reducing the computational complexity and memory requirements (MIPS/RAM/ROM) of an LPAS coder while maintaining the speech quality. This reduction in complexity allows a high-quality LPAS coder to run in real time on an inexpensive general-purpose fixed-point DSP or other similar digital processor.
Accordingly, the present invention provides (i) an LPAS speech encoder with reduced computational complexity and memory requirements, and (ii) a method for reducing the computational complexity and memory requirements of an LPAS speech encoder, in particular of the multi-tap pitch predictor and the source excitation codebook in such an encoder. The invention employs fast structured product code vector quantization (PCVQ) for quantizing the parameters of the multi-tap pitch predictor within the analysis-by-synthesis search loop. The present invention also provides a fast procedure for searching for the best code vector in the fixed codebook. To achieve this, the fixed codebook is preferably formed of ternary values (1, −1, 0).
In a preferred embodiment, the multi-tap pitch predictor has a first vector codebook and a second (or further) vector codebook, and the method of the invention searches the first and second vector codebooks sequentially.
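A sequential search over two codebooks can be sketched as a two-stage (multistage) vector quantizer, one common product-code structure: the first codebook quantizes the tap vector and the second quantizes what remains. The text does not say that the invention combines its codebooks this way, so the residual structure, the plain squared-error criterion (in place of the perceptually weighted error used inside the analysis-by-synthesis loop) and the toy codebooks are all assumptions. The saving is easy to quantify: with two hypothetical 6-bit stages the search visits 64 + 64 = 128 vectors instead of the 4096 of a single 12-bit codebook, and only 128 vectors need to be stored.

```python
def nearest(codebook, v):
    """Index of the codebook vector closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def sequential_vq(taps, cb1, cb2):
    """Hypothetical two-stage search: stage 1 on the taps, stage 2 on the stage-1 residual."""
    i = nearest(cb1, taps)
    residual = [t - c for t, c in zip(taps, cb1[i])]
    j = nearest(cb2, residual)
    quantized = [a + b for a, b in zip(cb1[i], cb2[j])]
    return i, j, quantized


cb1 = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.6, 0.2]]
cb2 = [[0.0, 0.0, 0.0], [0.25, 0.0, 0.25], [-0.1, 0.1, -0.1]]
print(sequential_vq([0.2, 0.9, 0.3], cb1, cb2))
```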
Further, the invention includes forming the source excitation codebook by using non-contiguous positions for each pulse.
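Non-contiguous pulse positions are often arranged as interleaved tracks, where track t of P holds positions t, t + P, t + 2P, and so on. The sketch assumes that layout, one ternary pulse per track, and a backward-filtered target d = H^T x; it places each pulse at the position of largest |d[n]| with the matching sign. This greedy choice ignores interactions between pulses through H^T H, so it is a simplified pre-selection rather than the optimized search of the invention, and all names are hypothetical. It also shows why ternary values are cheap: the correlation <d, c> needs only additions and subtractions.

```python
def track_positions(subframe_len, num_tracks):
    """Interleaved (non-contiguous) positions: track t holds t, t+P, t+2P, ..."""
    return [list(range(t, subframe_len, num_tracks)) for t in range(num_tracks)]


def ternary_excitation(d, num_tracks):
    """One +/-1 pulse per track at the position of largest |d[n]|; other samples stay 0."""
    c = [0] * len(d)
    for track in track_positions(len(d), num_tracks):
        n = max(track, key=lambda m: abs(d[m]))
        c[n] = 1 if d[n] >= 0 else -1
    return c


def ternary_correlation(d, c):
    """<d, c> for ternary c using only additions and subtractions (no multiplies)."""
    total = 0.0
    for dn, cn in zip(d, c):
        if cn > 0:
            total += dn
        elif cn < 0:
            total -= dn
    return total


d = [0.1, -0.9, 0.2, 0.0, 0.5, 0.3, -0.4, 0.05]
c = ternary_excitation(d, num_tracks=4)
print(c, ternary_correlation(d, c))
```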
REFERENCES:
patent: 6014618 (2000-01-01), Patel et al.
patent: 6144655 (2000-11-01), Kim
patent: 6161086 (2000-12-01), Mukherjee et al.
Chen, Juin-Hwey, “Toll-Quality 16 KB/S CELP Speech Coding with Very Low Complexity”, IEEE Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 9-12 (1995).
“ICSPAT Speech Analysis & Synthesis”, schedule of lectures, http://www.dspworld.com/ics98c/26.htm (Jul. 28, 1998).
“Enhanced Low Memory CELP Vocoder—C5x/C2xx”, DSP Software Solutions (catalog) (Sep. 1997).
Schroeder, M.R. and Atal, B.S., “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates”, IEEE Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 937-940 (1985).
Kroon, P. and Atal, B.S., “On Improving the Performance of Pitch Predictors in Speech Coding Systems”, Advances in Speech Coding, Kluwer Academic Publishers, Boston, Massachusetts, pp. 321-327 (1991).
Veeneman, D. and Mazor, B., “Efficient Multi-Tap Pitch Prediction for Stochastic Coding”, Speech and Audio Coding for Wireless and Network Applications, Kluwer Academic Publishers, Boston, Massachusetts, pp. 225-229 (1993).
Kolb Douglas E.
Patel Jayesh S.
Hamilton, Brook, Smith and Reynolds, P.C.
McFadden Susan