Speech coding with variable model order linear prediction

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S219000

active

06202045

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to speech coding and more particularly to speech coding using linear predictive coding (LPC). The invention is applicable in particular, though not necessarily, to code excited linear prediction (CELP) speech coders.
BACKGROUND OF THE INVENTION
A fundamental issue in the wireless transmission of digitised speech signals is the minimisation of the bit-rate required to transmit an individual speech signal. By minimising the bit-rate, the number of communications which can be carried by a transmission channel, for a given channel bandwidth, is increased. All of the recognised standards for digital cellular telephony therefore specify some kind of speech codec to compress speech data to a greater or lesser extent. More particularly, these speech codecs rely upon the removal of redundant information present in the speech signal being coded.
In Europe, the accepted standard for digital cellular telephony is known under the acronym GSM (Global System for Mobile communications). GSM includes the specification of a CELP speech encoder (Technical Specification GSM 06.60). A very general illustration of the structure of a CELP encoder is shown in FIG. 1. A sampled speech signal is divided into 20 ms frames, defined by a vector x(j), of 160 sample points, j=0 to 159. The frames are encoded in turn by first applying them to a linear predictive coder (LPC) 1 which generates for each frame x(j) a set of LPC coefficients a(i), i=0 to n, which are representative of the short term redundancy in the frame. In GSM, n is predefined as ten.
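By way of illustration only, the framing and LPC analysis step might be sketched as follows. The text does not state how the coefficients a(i) are computed; the autocorrelation method with the Levinson-Durbin recursion is assumed here, and the function names split_frames, autocorrelation and lpc_coefficients are hypothetical.

```python
FRAME_LEN = 160  # 20 ms frames of 160 sample points, as stated in the text


def split_frames(signal, frame_len=FRAME_LEN):
    """Split a list of samples into consecutive full frames x(j), j = 0..frame_len-1."""
    count = len(signal) // frame_len
    return [signal[k * frame_len:(k + 1) * frame_len] for k in range(count)]


def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    return [sum(frame[j] * frame[j - lag] for j in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Return a(0..order) with a(0) = 1 (assumed method: Levinson-Durbin)."""
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at zero
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err  # reflection coefficient of stage m
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a
```

With order set to ten this corresponds to the fixed model order quoted for GSM; any windowing or bandwidth expansion applied by a real codec is omitted from the sketch.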
The output from the LPC comprises this set of LPC coefficients a(i) and a residual signal r(j) produced by removing the short term redundancy from the input speech frame using a LPC analysis filter. The residual signal is then provided to a long term predictor (LTP) 2 which generates a set of LTP parameters b which are representative of the long term redundancy in the residual signal. In practice, long term prediction is a two stage process, involving a first open loop estimate of the LTP coefficients and a second closed loop refinement of the estimated parameters.
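By way of illustration only, the first (open loop) stage might be sketched as a search for the delay at which the residual best correlates with an earlier copy of itself. The text does not define the form of the LTP parameters b; a single lag and gain pair is assumed here, the lag range is an arbitrary choice, and the function name open_loop_ltp is hypothetical. The closed loop refinement is not shown.

```python
def open_loop_ltp(residual, min_lag, max_lag):
    """Return an assumed (lag, gain) pair for the residual r(j).

    The lag maximises the normalised correlation between r(j) and r(j - lag);
    the gain is the least-squares scale factor at that lag.
    """
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        cross = sum(residual[j] * residual[j - lag]
                    for j in range(lag, len(residual)))
        energy = sum(residual[j - lag] ** 2
                     for j in range(lag, len(residual)))
        # Only positively correlated lags are treated as useful candidates.
        score = cross * cross / energy if energy > 0.0 and cross > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
            best_gain = cross / energy if energy > 0.0 else 0.0
    return best_lag, best_gain
```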
An excitation codebook 3 is provided which contains a large number of excitation codes. For each frame, each of these codes is provided in turn, via a scaling unit 4, to a LTP synthesis filter 5. This filter 5 receives the LTP parameters from the LTP 2 and introduces into the code the long term redundancy predicted by the LTP parameters. The resulting frame is then provided to a LPC synthesis filter 6 which receives the LPC coefficients and introduces the predicted short term redundancy into the code. The predicted frame x_pred(j) is compared with the actual frame x(j) at a comparator 7, to generate an error signal e(j) for the frame. The code c(j) which produces the smallest error signal, after processing by a weighting filter 8, is selected by a codebook search unit 9. A vector u(j) identifying the selected code is transmitted over the transmission channel 10 to the receiver. The LPC coefficients and the LTP parameters are also transmitted but, prior to transmission, they themselves are encoded to minimise still further the transmission bit-rate.
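By way of illustration only, the codebook search described above might be sketched as follows. This is a simplified sketch: the LTP synthesis filter 5 and the weighting filter 8 are omitted, the scaling factor is assumed to be the least-squares gain, and the function names synthesize and search_codebook are hypothetical.

```python
def synthesize(code, a):
    """Pass a code through the LPC synthesis filter 1/A(z), a(0) = 1, zero initial state."""
    out = []
    for j, c in enumerate(code):
        feedback = sum(a[i] * out[j - i] for i in range(1, len(a)) if j - i >= 0)
        out.append(c - feedback)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the code whose scaled synthesis best matches the target frame."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero code cannot be scaled to match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Only the returned index would need to be transmitted as the vector u(j), together with the separately encoded gain and filter parameters.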
The LPC analysis filter (which removes redundancy from the input signal to provide the residual signal r(j)) is shown schematically in FIG. 2. The input code ĉ(j) (as modified by the LTP synthesis filter) is combined with delayed versions of itself ĉ(j−i), the LPC coefficients a(i) providing the gain factors for respective delayed versions and with a(0)=1. The filter can be defined by the expression:
$$A(z) = 1 + a(1) z^{-1} + \dots + a(n) z^{-n}$$
where z^{-1} represents a delay of one sample.
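By way of illustration only, the filter A(z) and its inverse 1/A(z) might be sketched in direct form as follows. Zero filter state at the start of the frame is assumed, and the function names lpc_analysis and lpc_synthesis are hypothetical.

```python
def lpc_analysis(x, a):
    """Apply A(z): r(j) = sum over i of a(i) * x(j - i), with a(0) = 1 and zero history."""
    return [sum(a[i] * x[j - i] for i in range(len(a)) if j - i >= 0)
            for j in range(len(x))]


def lpc_synthesis(r, a):
    """Apply 1/A(z): y(j) = r(j) - sum over i >= 1 of a(i) * y(j - i), zero history."""
    y = []
    for j in range(len(r)):
        feedback = sum(a[i] * y[j - i] for i in range(1, len(a)) if j - i >= 0)
        y.append(r[j] - feedback)
    return y
```

Passing the output of lpc_analysis through lpc_synthesis with the same coefficients returns the original samples, which is why the decoder needs only the coefficients and the excitation.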
The LPC coefficients are converted into a corresponding number of line spectral pair (LSP) coefficients, which are the roots of the two polynomials given by:
$$P(z) = A(z) + z^{-(n+1)} A(z^{-1})$$
and
$$Q(z) = A(z) - z^{-(n+1)} A(z^{-1})$$
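By way of illustration only, the coefficients of the two polynomials might be formed from the LPC coefficients as sketched below. The sketch stops at the polynomial coefficients; finding their roots, which yield the LSP coefficients, is not shown. The function name lsp_polynomials is hypothetical.

```python
def lsp_polynomials(a):
    """Return the coefficient lists of P(z) and Q(z) in powers z^0 .. z^-(n+1).

    a holds a(0..n) with a(0) = 1. The term z^-(n+1) * A(z^-1) has the same
    coefficients as A(z), extended by one zero and read in reverse order.
    """
    extended = list(a) + [0.0]   # A(z), padded up to the power z^-(n+1)
    mirrored = extended[::-1]    # z^-(n+1) * A(z^-1)
    p = [u + v for u, v in zip(extended, mirrored)]
    q = [u - v for u, v in zip(extended, mirrored)]
    return p, q
```

P(z) comes out with symmetric coefficients and Q(z) with antisymmetric coefficients, and half of their sum gives back A(z).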
Typically, the LSP coefficients of the current frame are quantised using moving average (MA) predictive quantisation. This involves using a predetermined average set of LSP coefficients and subtracting this average set from the current frame LSP coefficients. The LSP coefficients of the preceding frame are multiplied by respective (previously determined) prediction factors to provide a set of predicted LSP coefficients. A set of residual LSP coefficients is then obtained by subtracting the mean-removed LSP coefficients from the predicted LSP coefficients. The LSP coefficients tend to vary little from frame to frame, as compared to the LPC coefficients, and the resulting set of residual coefficients lends itself well to subsequent quantisation (‘Efficient Vector Quantization of LPC Parameters at 24 Bits/Frame’, K. K. Paliwal and B. S. Atal, IEEE Trans. Speech and Audio Processing, Vol 1, No 1, January 1993).
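By way of illustration only, the residual computation described above might be sketched as follows, using the sign convention as worded in the text (predicted minus mean-removed). It is assumed here that the preceding frame is also mean-removed before the prediction factors are applied, which the text does not state; the quantiser itself is omitted, and the function names lsp_residual and lsp_reconstruct are hypothetical.

```python
def _predict(previous, mean, factors):
    # Assumption: the preceding frame is mean-removed before scaling.
    return [f * (p - m) for f, p, m in zip(factors, previous, mean)]


def lsp_residual(current, previous, mean, factors):
    """Residual LSP set = predicted set minus mean-removed current set."""
    if not (len(current) == len(previous) == len(mean) == len(factors)):
        raise ValueError("all coefficient sets must have the same length")
    mean_removed = [c - m for c, m in zip(current, mean)]
    predicted = _predict(previous, mean, factors)
    return [pr - mr for pr, mr in zip(predicted, mean_removed)]


def lsp_reconstruct(residual, previous, mean, factors):
    """Invert lsp_residual: recover the current frame LSP coefficients."""
    predicted = _predict(previous, mean, factors)
    return [pr - r + m for pr, r, m in zip(predicted, residual, mean)]
```

Every step is element-wise, so the sketch only works when the current and preceding frames carry the same number of coefficients; it raises an error otherwise.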
The number of LPC coefficients (and consequently the number of LSP coefficients) determines the accuracy of the LPC. However, for any given frame, there exists an optimal number of LPC coefficients which is a trade-off between encoding accuracy and compression ratio. As already noted, in the current GSM standard, the order of the LPC is fixed at n=10, a number which is high enough to encode all expected speech frames with sufficient accuracy. Whilst this simplifies the LPC, reducing computational requirements, it does result in the ‘over-coding’ of many frames which could be coded with fewer LPC coefficients than are specified by this fixed rate.
Variable rate LPCs have been proposed, where the number of LPC coefficients varies from frame to frame, being optimised individually for each frame. Variable rate LPCs are ideally suited to CDMA networks, the proposed GSM phase 2 standard, and the future third generation standard (UMTS). These networks use, or propose the use of, ‘packet switched’ transmission to transfer data in packets (or bursts). This compares to the existing GSM standard which uses ‘circuit switched’ transmission where a sequence of fixed length time frames is reserved on a given channel for the duration of a telephone call.
Despite the advantages, a number of technical problems must be overcome before a variable rate LPC can be satisfactorily implemented. In particular, and as has been recognised by the inventors of the invention to be described below, a variable rate LPC is incompatible with the LSP coefficient quantisation scheme described above. That is to say that it is not possible to directly generate a predictive, quantised LSP coefficient signal when the number of LSP coefficients is varying from frame to frame. Furthermore, it is not possible to interpolate LPC (or LSP) coefficients between frames in order to smooth the transition between frame boundaries.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention there is provided a method of coding a sampled speech signal, the method comprising dividing the speech signal into sequential frames and, for each current frame:
generating a first set of linear prediction coding (LPC) coefficients which correspond to the coefficients of a linear filter and which are representative of short term redundancy in the current frame;
if the number of LPC coefficients in the first set of the current frame differs from the number in the first set of the preceding frame, then generating a second expanded or contracted set of LPC coefficients from the first set of LPC coefficients generated for the preceding frame, the second set containing a number of LPC coefficients equal to the number of LPC coefficients in said first set of the current frame; and
encoding the current frame using the first set of LPC coefficients of the current frame and the second set of LPC coefficients of the preceding frame.
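By way of illustration only, one way of generating the expanded or contracted second set is sketched below. This excerpt does not state how the second set is derived; the sketch assumes a conversion to reflection coefficients by the step-down recursion, followed by appending zero-valued reflection coefficients (expansion) or dropping the highest-order ones (contraction), and a conversion back by the step-up recursion. The function names lpc_to_reflection, reflection_to_lpc and resize_lpc are hypothetical.

```python
def lpc_to_reflection(a):
    """Step-down recursion: a(0..n) with a(0) = 1 -> reflection coefficients k(1..n)."""
    cur = list(a)
    ks = []
    for m in range(len(a) - 1, 0, -1):
        k = cur[m]
        denom = 1.0 - k * k
        if denom <= 0.0:
            raise ValueError("filter is not stable; step-down recursion undefined")
        ks.append(k)
        cur = [1.0] + [(cur[i] - k * cur[m - i]) / denom for i in range(1, m)]
    ks.reverse()
    return ks


def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients k(1..n) -> a(0..n) with a(0) = 1."""
    a = [1.0]
    for m, k in enumerate(ks, start=1):
        a = [1.0] + [a[i] + k * a[m - i] for i in range(1, m)] + [k]
    return a


def resize_lpc(a, new_order):
    """Return an LPC set of order new_order derived from the set a (assumed method)."""
    ks = lpc_to_reflection(a)
    if new_order >= len(ks):
        ks = ks + [0.0] * (new_order - len(ks))  # expansion: zero-valued stages
    else:
        ks = ks[:new_order]                      # contraction: drop highest stages
    return reflection_to_lpc(ks)
```

Under this assumption the preceding frame's set can be brought to the model order of the current frame, so that the two sets have equal length before any frame-to-frame prediction or interpolation is attempted.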
The present invention is applicable in particular to variable bit-rate wireless telephone networks in which data is transmitted in bursts
