Recursively excited linear prediction speech coder

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S221000

active

06704703

ABSTRACT:

FIELD OF THE INVENTION
The invention relates to digital speech coding, and more particularly to coding the excitation information for code-excited linear predictive speech coders.
BACKGROUND ART
Speech processing systems may first digitally encode an input speech signal before additionally processing the signal. Speech signals actually are non-stationary, but they can be considered as quasi-stationary signals over short periods such as 5 to 30 msec, a period of time generally known as a frame. Typically, the spectral information present in a speech signal during a frame is represented when encoding speech frames. Speech signals also contain an important short-term correlation between nearby samples, which can be removed from a speech signal by the technique of linear prediction. Linear predictive coding (LPC) defines a linear predictive filter representative of this short-term spectral information, which is computed for each frame. A general discussion of this subject matter appears in Chapter 7 of Deller, Proakis & Hansen, Discrete-Time Processing of Speech Signals (Prentice Hall, 1987), which is incorporated herein by reference.
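As a rough illustration of the per-frame linear prediction described above, the following sketch computes predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, then forms the prediction residual. The function names, the choice of the autocorrelation method, and the absence of windowing are assumptions made for illustration; the text does not specify how the coefficients are obtained.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] for one frame of samples."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Predictor convention (an assumption of this sketch):
        s_hat[t] = sum(a[j] * s[t - j] for j in 1..order)
    Returns (a, prediction_error_energy); a[0] is unused and left at 0.0.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. silence): stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_residual(frame, a):
    """Pass the frame through the prediction error filter defined by a."""
    order = len(a) - 1
    return [frame[t] - sum(a[j] * frame[t - j]
                           for j in range(1, min(order, t) + 1))
            for t in range(len(frame))]
```

For a decaying exponential such as `[0.9 ** t for t in range(200)]`, a second-order analysis should return a first coefficient close to 0.9 and a residual that is essentially a single pulse at the start of the frame.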
The information not captured by the LPC coefficients is represented by a residual signal that is obtained by passing the original speech signal through the linear predictive filter defined by the LPC coefficients. This residual signal is normally very complex. In early residual excited linear predictive coders, a baseband filter processed the residual signal in order to obtain a series of equally spaced non-zero pulses that could be coded at significantly lower bit rates than the original signal, while preserving high signal quality. Even this processed residual signal can contain a significant amount of redundancy, however, especially during periods of voiced speech. This type of redundancy is due to the regularity of the vibration of the vocal cords and lasts for a significantly longer time span (typically 2.5 to 20 msec) than the correlation covered by the LPC coefficients (typically < 2 msec).
Various other methods, e.g., LPC-10, seek to encode the residual signal as efficiently as possible while still preserving satisfactory quality of the decoded speech. Code-excited linear prediction (CELP) speech encoders are based on one or more codebooks of typical residual signals (or in this context, typical excitation signal code vectors) for the linear predictive filter defined by the LPC coefficients. See for example, Manfred R. Schroeder and Bishnu S. Atal, “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates,” ICASSP 85, incorporated herein by reference. For each frame of speech, a CELP coder applies each individual excitation signal code vector to the LPC filter to generate a reconstructed speech signal, and compares the original input speech signal to the reconstructed signal to create an error signal. According to this technique, known as analysis-by-synthesis, the resulting error signal is then weighted by passing it through a weighting filter having a response based on human auditory perception. The optimum excitation signal is the code vector that produces the weighted error signal with the minimum energy for the current frame.
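The analysis-by-synthesis loop described above can be sketched as follows. This is a minimal illustration under several assumptions: the code vectors are applied with unit gain, the LPC filter is an all-pole synthesis filter with coefficients `a[1..p]` (with `a[0]` unused), and the perceptual weighting filter is passed in as a hypothetical `weight` function that defaults to the identity. None of these details are taken from the text.

```python
def synthesize(excitation, a):
    """All-pole synthesis: s[t] = e[t] + sum(a[j] * s[t - j] for j >= 1)."""
    order = len(a) - 1
    out = []
    for t, e in enumerate(excitation):
        out.append(e + sum(a[j] * out[t - j]
                           for j in range(1, min(order, t) + 1)))
    return out


def search_codebook(target_speech, codebook, a, weight=lambda err: err):
    """Return (index, energy) of the code vector with the least weighted error.

    `weight` stands in for the perceptual weighting filter; the identity
    default is an assumption made to keep the sketch short.
    """
    best_index, best_energy = None, float("inf")
    for index, code_vector in enumerate(codebook):
        reconstructed = synthesize(code_vector, a)
        error = [s - r for s, r in zip(target_speech, reconstructed)]
        energy = sum(v * v for v in weight(error))
        if energy < best_energy:
            best_index, best_energy = index, energy
    return best_index, best_energy
```

Every candidate is synthesized and compared in full, which is what makes an exhaustive search of this kind computationally expensive.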
In CELP analysis, a pre-emphasized speech signal is filtered by a spectral envelope prediction error filter to produce a prediction error signal. Then, the error signal is filtered by a pitch prediction error filter to produce a residual excitation signal. This target excitation vector x is defined as:
$x = g_p \cdot y + g_c \cdot z$
where $y$ is a filtered adaptive codebook vector, $g_p$ its associated gain, $z$ is a fixed codebook vector, and $g_c$ its related gain. As shown in FIG. 1, the codebook may be searched by minimizing the mean-squared error between the weighted input speech and the weighted reconstructed speech. That is:
$f = x - g_p \cdot y$
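To make the last relation concrete, the sketch below removes the adaptive codebook contribution from the target x, leaving the vector f that the fixed codebook term has to model. The least-squares choice of the gain g_p is an assumption of this sketch (the text does not say how g_p is selected), and the function name is hypothetical.

```python
def fixed_codebook_target(x, y):
    """Return (g_p, f) with f = x - g_p * y.

    g_p is taken as the least-squares gain <x, y> / <y, y>, which is an
    assumption of this sketch rather than something stated in the text.
    """
    energy = sum(v * v for v in y)
    g_p = sum(a * b for a, b in zip(x, y)) / energy if energy > 0 else 0.0
    f = [xi - g_p * yi for xi, yi in zip(x, y)]
    return g_p, f
```

With this choice of gain, f is orthogonal to y, so whatever remains in f cannot be explained by rescaling the adaptive codebook vector.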
During each subframe, the optimum excitation sequence may be found by searching possible codewords of the codebook, where the optimization criterion is the closeness between the synthesized signal and the original signal. Typically, a fixed codebook consists of a set of N pulses (e.g., 2, 3, 4 or 5 pulses) in which each pulse can have a value of +1 or −1. The manner in which pulse positions are determined defines the structure of the codebook vector (ACELP, CS-ACELP, VSELP, HELP, etc.).
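A pulse-based code vector of the kind described above can be represented by its pulse positions and signs alone. The helper below is a hypothetical illustration of that data structure; it does not implement the position constraints of any particular coder (ACELP, VSELP, and so on).

```python
def pulse_code_vector(length, positions, signs):
    """Build a sparse code vector holding one +1 or -1 pulse per position."""
    if len(positions) != len(signs):
        raise ValueError("need exactly one sign per pulse position")
    vector = [0] * length
    for position, sign in zip(positions, signs):
        if sign not in (1, -1):
            raise ValueError("each pulse must be +1 or -1")
        vector[position] += sign  # pulses sharing a position add up
    return vector
```

For example, `pulse_code_vector(8, [1, 5], [1, -1])` places a positive pulse at sample 1 and a negative pulse at sample 5 of an 8-sample subframe.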
One way to reduce the computational complexity of this codebook search is to do the search calculations in a transform domain. Another approach is to structure the codebook so that the code vectors are no longer independent of each other. This way, the filtered version of a code vector can be computed from the filtered version of the previous code vector. This approach uses about the same computational requirements as transform techniques, while significantly reducing the amount of ROM required.
Vector-sum excited linear prediction (VSELP) speech coders, described, for example, in U.S. Pat. No. 4,817,157, seek to provide a speech coding technique that addresses both the high computational complexity of codebook searching and the large memory requirements for storing the code vectors. The VSELP approach, which still belongs to the CELP family of encoders, achieves its goals by efficient utilization of structured codebooks. The structured codebooks reduce computational complexity and increase robustness to channel errors. While basic CELP encoders use only one excitation codebook, VSELP introduced the simultaneous use of more than one codebook. In practice, only two codebooks are used.
In HELP encoders, such as described in U.S. Pat. No. 5,963,897, different kinds of waveforms compete or cooperate to best model the excitation. The waveforms can have variable length. Within a frame, the first waveform is always defined with regard to the absolute position of the beginning of the frame. The other waveforms are defined relative to the first waveform.
SUMMARY OF THE INVENTION
The excitation in a CELP-like speech coder is recursively calculated. For a given bitrate and a given complexity, the recursive approach described lowers the complexity with minimum impact on speech quality. The excitation signal is a sum of at least three vector terms, each vector term being a product of a codebook vector $z_k$ and an associated gain term $g_k$. A first vector term $g_0 z_0$ is determined that is representative of a target excitation vector $x$. Each remaining vector term is recursively determined as a vector term $g_k z_k$ representative of the difference between the target excitation vector $x$ and the sum of previously determined vector terms,
$\sum_{i=0}^{k-1} g_i z_i$.
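The recursion summarized above can be sketched as a stage-by-stage search in which each new term is fitted to whatever the earlier terms have left unexplained. The selection rule used here (pick the code vector with the largest normalized correlation against the remaining difference, with its least-squares gain) is an assumption chosen for illustration, as are the function names and the use of one codebook per stage.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_term(residual, codebook):
    """Pick the code vector and gain that best match the residual."""
    best_index, best_gain, best_score = None, 0.0, -1.0
    for index, z in enumerate(codebook):
        energy = dot(z, z)
        if energy == 0.0:
            continue  # an all-zero code vector cannot model anything
        corr = dot(residual, z)
        score = corr * corr / energy  # error reduction from this vector
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    if best_index is None:
        raise ValueError("codebook has no usable code vector")
    return best_index, best_gain


def recursive_excitation(x, codebooks):
    """Return ([(index_k, gain_k), ...], final_residual).

    Stage k models x minus the sum of the terms found in stages 0..k-1.
    """
    residual = list(x)
    terms = []
    for codebook in codebooks:
        index, gain = best_term(residual, codebook)
        z = codebook[index]
        residual = [r - gain * zi for r, zi in zip(residual, z)]
        terms.append((index, gain))
    return terms, residual
```

Because each stage searches only one codebook against a fixed residual, the cost grows with the sum of the codebook sizes rather than with their product, which is the usual motivation for a sequential search of this kind.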
In a further embodiment, the gain term of each vector term $g_k z_k$ is determined by minimizing an error function $E$ representative of the difference between the target excitation vector $x$ and the sum of that vector term and all previously determined vector terms,
$\sum_{i=0}^{k} g_i z_i$.
The error function $E$ may be the mean squared error of the difference between the target excitation vector and the sum of that vector term and all previously determined vector terms,
$\left[ x - \sum_{i=0}^{k} g_i z_i \right]^2$.
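As a worked illustration of this squared-error criterion, suppose the terms for i < k are already fixed and only the gain of term k is free. The residual symbol r_k below is notation introduced here for convenience, and the constant factor that turns a sum of squares into a mean is dropped because it does not change the minimizer.

```latex
r_k = x - \sum_{i=0}^{k-1} g_i z_i, \qquad
E(g_k) = \lVert r_k - g_k z_k \rVert^2
       = r_k^{T} r_k - 2 g_k\, z_k^{T} r_k + g_k^{2}\, z_k^{T} z_k

\frac{\partial E}{\partial g_k} = -2\, z_k^{T} r_k + 2 g_k\, z_k^{T} z_k = 0
\quad\Longrightarrow\quad
g_k = \frac{z_k^{T} r_k}{z_k^{T} z_k}

E_{\min} = r_k^{T} r_k - \frac{\left( z_k^{T} r_k \right)^{2}}{z_k^{T} z_k}
```

Under these assumptions, the code vector that maximizes the subtracted fraction in the last line is the one that leaves the smallest error at stage k.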
For a given number of vector codebooks M such that M=k, the error $E$ may be differentiated with respect to each gain $g_i$ to produce a set of (M+1) equations of the form $Z \cdot G = X$, where $Z$ is a correlation matrix of the codebook vectors $z_i$, $G$ is a row vector of the gains $g_i$, and $X$ is a correlation vector of the target excitation vector $x$ and the codebook vectors $z_i$, such that all the gain terms in the excitation signal may be jointly quantized from the row vector $G$.
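The system of equations described above can be illustrated with a small solver. In this sketch the entries of Z are taken to be the inner products of pairs of selected code vectors and the entries of X the inner products of x with each code vector, which is one natural reading of "correlation matrix" and "correlation vector" but is an assumption. The Gaussian elimination routine and all function names are illustrative, and quantization of the resulting gains is not shown.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_linear_system(matrix, rhs):
    """Solve matrix * g = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("code vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][c] * solution[c] for c in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def joint_gains(x, vectors):
    """Return the gains g_i that minimize |x - sum(g_i * z_i)|^2."""
    Z = [[dot(zi, zj) for zj in vectors] for zi in vectors]
    X = [dot(x, zi) for zi in vectors]
    return solve_linear_system(Z, X)
```

Solving for all gains at once accounts for correlation between the selected code vectors, which gains computed one stage at a time do not.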
In another embodiment, each vector term is further the product of a weighting term $\alpha$. Thus, the first vector term is defined as $\alpha_0 g_0 z_0$, and each recursively determined vector term is defined as $\alpha_k g_0 z_k$, which is representative of the difference between the target excitation vector $x$ and the sum of the previously determined vector terms,
$\sum_{i=0}^{k-1} \alpha$ …
