Pitch-lag estimation in speech coding

Patent number: 06199035
Type: Reexamination Certificate (active)
Filed: 1998-05-06
Issued: 2001-03-06
Examiner: Hudspeth, David (Department: 2641)
Classification: Data processing: speech signal processing, linguistics, language; Speech signal processing; For storage or transmission
US Class: C704S217000
FIELD OF THE INVENTION
The present invention relates to speech coding and is applicable in particular to methods and apparatus for speech coding which use a long term prediction (LTP) parameter.
BACKGROUND OF THE INVENTION
Speech coding is used in many communications applications where it is desirable to compress an audio speech signal to reduce the quantity of data to be transmitted, processed, or stored. In particular, speech coding is applied widely in cellular telephone networks, where mobile phones and the base controller stations with which they communicate are provided with so-called “audio codecs” which perform coding and decoding of speech signals. Data compression by speech coding in cellular telephone networks is driven by the need to maximise network call capacity.
Modern speech codecs typically operate by processing speech signals in short segments called frames. In the case of the European digital cellular telephone system known as GSM (defined by the European Telecommunications Standards Institute—ETSI—specification 06.60), the length of each such frame is 20 ms, corresponding to 160 samples of speech at an 8 kHz sampling frequency. At the transmitting station, each speech frame is analysed by a speech encoder to extract a set of coding parameters for transmission to the receiving station. At the receiving station, a decoder produces synthesised speech frames based on the received parameters. A typical set of extracted coding parameters includes spectral parameters (known as LPC parameters) used in short term prediction of the signal, parameters used for long term prediction (known as LTP parameters) of the signal, various gain parameters, excitation parameters, and codebook vectors.
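By way of illustration only (this sketch is not part of the patent disclosure), the following Python fragment shows the frame segmentation described above: a signal sampled at 8 kHz is divided into non-overlapping 20 ms frames of 160 samples each. The function and constant names are illustrative.

```python
import numpy as np

SAMPLE_RATE = 8000   # Hz, as in GSM 06.60
FRAME_LEN = 160      # samples per frame: 20 ms at 8 kHz

def split_into_frames(signal: np.ndarray) -> np.ndarray:
    """Split a sampled speech signal into consecutive non-overlapping frames."""
    n_frames = len(signal) // FRAME_LEN
    return signal[:n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)

# One second of speech yields 50 frames of 160 samples each.
speech = np.zeros(SAMPLE_RATE)            # stand-in for real sampled speech
assert split_into_frames(speech).shape == (50, 160)
```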
FIG. 1 shows schematically the encoder of a so-called CELP codec (substantially identical CELP codecs are provided at both the mobile stations and at the base controller stations). Each frame of a received sampled speech signal s(n), where n indicates the sample number, is first analysed by a short term prediction unit 1 to determine the LPC parameters for the frame. These parameters are supplied to a multiplexer 2 which combines the coding parameters for transmission over the air-interface. The residual signal r(n) from the short term prediction unit 1, i.e. the speech frame after removal of the short term redundancy, is then supplied to a long term prediction unit 3 which determines the LTP parameters. These parameters are in turn provided to the multiplexer 2.
The encoder comprises an LTP synthesis filter 4 and an LPC synthesis filter 5 which receive respectively the LTP and LPC parameters. These filters introduce the short term and long term redundancies into a signal c(n), produced using a codebook 6, to generate a synthesised speech signal ss(n). The synthesised speech signal is compared at a comparator 7 with the actual speech signal s(n), frame by frame, to produce an error signal e(n). After weighting the error signal with a weighting filter 8 (which emphasises the ‘formants’ of the signal in a known manner), the signal is applied to a codebook search unit 9. The search unit 9 conducts a search of the codebook 6 for each frame in order to identify the entry in the codebook which most closely matches (after LTP and LPC filtering and multiplication by a gain g at a multiplier 10) the actual speech frame, i.e. to determine the signal c(n) which minimises the error signal e(n). The vector identifying the best matching entry is provided to the multiplexer 2 for transmission over the air-interface as part of an encoded speech signal t(n).
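This analysis-by-synthesis loop can be summarised in code. The following is a heavily simplified Python sketch, assuming a single-tap LTP filter and omitting the perceptual weighting filter 8; the function names, the exhaustive search, and the closed-form gain are illustrative assumptions rather than the codec's actual structure. Because the filters are linear, applying the gain g to the synthesised output is equivalent to applying it to c(n) before filtering.

```python
import numpy as np

def lpc_synthesis(exc, a):
    """All-pole LPC synthesis: ss(n) = exc(n) + sum_k a[k-1] * ss(n-k)."""
    ss = np.zeros(len(exc))
    for n in range(len(exc)):
        acc = exc[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc += a[k - 1] * ss[n - k]
        ss[n] = acc
    return ss

def ltp_synthesis(c, b, d, past):
    """Single-tap LTP synthesis: out(n) = c(n) + b * out(n-d).
    `past` must hold at least d samples of previously synthesised output."""
    buf = np.concatenate([past, np.zeros(len(c))])
    off = len(past)
    for n in range(len(c)):
        buf[off + n] = c[n] + b * buf[off + n - d]
    return buf[off:]

def search_codebook(s, codebook, a, b, d, past):
    """Return (index, gain) of the codebook entry minimising the error
    energy between the actual frame s(n) and the synthesised frame ss(n)."""
    best_idx, best_g, best_err = 0, 0.0, np.inf
    for idx, c in enumerate(codebook):
        ss = lpc_synthesis(ltp_synthesis(c, b, d, past), a)
        g = np.dot(s, ss) / max(np.dot(ss, ss), 1e-12)   # least-squares gain
        err = np.sum((s - g * ss) ** 2)                  # energy of e(n)
        if err < best_err:
            best_idx, best_g, best_err = idx, g, err
    return best_idx, best_g
```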
FIG. 2 shows schematically a decoder of a CELP codec. The received encoded signal t(n) is demultiplexed by a demultiplexer 11 into the separate coding parameters. The codebook vectors are applied to a codebook 12, identical to the codebook 6 at the encoder, to extract a stream of codebook entries c(n). The signal c(n) is then multiplied by the received gain g at a multiplier 13 before applying the signal to an LTP synthesis filter 14 and an LPC synthesis filter 15 arranged in series. The LTP and LPC filters receive the associated parameters from the transmission channel and reintroduce the short and long term redundancies into the signal to produce, at the output, a synthesised speech signal ss(n).
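The corresponding decoding step can be sketched in the same simplified terms (again assuming a single-tap LTP filter; all names are illustrative, not the codec's actual interface):

```python
import numpy as np

def decode_frame(c, g, b, d, a, ltp_past, lpc_past):
    """Sketch of FIG. 2: scale the codebook entry c(n) by the gain g, then
    apply the LTP and LPC synthesis filters in series."""
    x = g * np.asarray(c, dtype=float)

    # LTP synthesis reintroduces the long term (pitch) redundancy.
    buf = np.concatenate([ltp_past, np.zeros(len(x))])  # needs >= d samples of history
    off = len(ltp_past)
    for n in range(len(x)):
        buf[off + n] = x[n] + b * buf[off + n - d]
    r = buf[off:]

    # LPC synthesis reintroduces the short term (spectral) redundancy.
    ss = np.concatenate([lpc_past, np.zeros(len(r))])   # needs >= len(a) samples
    off2 = len(lpc_past)
    for n in range(len(r)):
        acc = r[n]
        for k in range(1, len(a) + 1):
            acc += a[k - 1] * ss[off2 + n - k]
        ss[off2 + n] = acc
    return ss[off2:]                                    # synthesised speech ss(n)
```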
The LTP parameters include the so called pitch-lag parameter which describes the fundamental frequency of the speech signal. The determination of the pitch-lag for a current frame of the residual signal is carried out in two stages. Firstly, an open-loop search is conducted, involving a relatively coarse search of the residual signal, subject to a predefined maximum and minimum delay, for a portion of the signal which best matches the current frame. A closed-loop search is then conducted over the already synthesised signal. The closed-loop search is conducted over a small range of delays in the neighbourhood of the open-loop estimate of pitch-lag. It is important to note that if a mistake is made in the open-loop search, the mistake cannot be corrected in the closed-loop search.
In early known codecs, the open-loop LTP analysis determines the pitch-lag for a given frame of the residual signal by determining the autocorrelation function of the frame within the residual speech signal, i.e.:

$$\hat{R}(d) = \sum_{n=0}^{N-1} r(n-d)\, r(n), \qquad d = d_L, \ldots, d_H$$

where $d$ is the delay, $r(n)$ is the residual signal, and $d_L$ and $d_H$ are the delay search limits. $N$ is the length of the frame. The pitch-lag $d_{p1}$ can then be identified as the delay $d_{max}$ which corresponds to the maximum of the autocorrelation function $\hat{R}(d)$. This is illustrated in FIG. 3.
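A direct implementation of this open-loop search is straightforward. The sketch below is illustrative (the names and the exhaustive loop are assumptions, and the search limits $d_L$ and $d_H$ are codec-specific, e.g. on the order of 20 to 150 samples at 8 kHz):

```python
import numpy as np

def open_loop_pitch_lag(r, frame_start, N, d_lo, d_hi):
    """Return the delay d in [d_lo, d_hi] maximising
    R(d) = sum_{n=0}^{N-1} r(n-d) * r(n) for the current frame.
    `r` must contain at least d_hi samples of history before frame_start."""
    frame = r[frame_start : frame_start + N]
    best_d, best_R = d_lo, -np.inf
    for d in range(d_lo, d_hi + 1):
        R = np.dot(r[frame_start - d : frame_start - d + N], frame)
        if R > best_R:
            best_d, best_R = d, R
    return best_d
```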
In such codecs however, there is a possibility that the maximum of the autocorrelation function corresponds to a multiple or sub-multiple of the pitch-lag and that the estimated pitch-lag will therefore not be correct. EP0628947 addresses this problem by applying a weighting function $w(d)$ to the autocorrelation function $\hat{R}(d)$, i.e.

$$\hat{R}_w(d) = w(d) \sum_{n=0}^{N-1} r(n-d)\, r(n)$$

where the weighting function has the following form:

$$w(d) = d^{\log_2 K}$$

$K$ is a tuning parameter which is set at a value low enough to reduce the probability of obtaining a maximum for $\hat{R}_w(d)$ at a multiple of the pitch-lag but at the same time high enough to exclude sub-multiples of the pitch-lag.
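In code, this weighting amounts to scaling each correlation value before the maximisation: with $K < 1$ the weight $w(d) = d^{\log_2 K}$ decays with $d$, penalising multiples of the true lag, while too small a $K$ would start to favour sub-multiples. The following sketch (illustrative, directly following the formulas above) extends the open-loop search accordingly:

```python
import numpy as np

def weighted_open_loop_pitch_lag(r, frame_start, N, d_lo, d_hi, K):
    """Open-loop search maximising w(d) * R(d) with w(d) = d ** log2(K)."""
    frame = r[frame_start : frame_start + N]
    exponent = np.log2(K)
    best_d, best_Rw = d_lo, -np.inf
    for d in range(d_lo, d_hi + 1):
        R = np.dot(r[frame_start - d : frame_start - d + N], frame)
        Rw = (d ** exponent) * R          # w(d) * R(d)
        if Rw > best_Rw:
            best_d, best_Rw = d, Rw
    return best_d
```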
EP0628947 also proposes taking into account pitch lags determined for previous frames in determining the pitch lag for a current frame. More particularly, frames are classified as either ‘voiced’ or ‘unvoiced’ and, for a current frame, a search is conducted for the maximum in the neighbourhood of the pitch lag determined for the most recent voiced frame. If the overall maximum of $\hat{R}_w(d)$ lies outside of this neighbourhood, and does not exceed the maximum within the neighbourhood by a predetermined factor (3/2), then the neighbourhood maximum is identified as corresponding to the pitch lag. In this way, continuity in the pitch lag estimate is maintained, reducing the possibility of spurious changes in pitch-lag.
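This continuity rule can be expressed compactly. In the sketch below, the neighbourhood half-width is an illustrative assumption (the exact neighbourhood is not given in this excerpt), while the 3/2 factor comes from the text above; positive correlation maxima are assumed, as is typical for voiced speech.

```python
import numpy as np

def continuity_constrained_lag(Rw, d_lo, prev_voiced_lag, half_width=10):
    """Given Rw[i] = R_w(d_lo + i), prefer the maximum near the pitch lag of
    the most recent voiced frame unless the overall maximum exceeds it by 3/2."""
    Rw = np.asarray(Rw, dtype=float)
    d = np.arange(d_lo, d_lo + len(Rw))
    overall = int(np.argmax(Rw))
    near = np.abs(d - prev_voiced_lag) <= half_width
    if not near.any():
        return int(d[overall])
    local = int(np.argmax(np.where(near, Rw, -np.inf)))
    if Rw[overall] > 1.5 * Rw[local]:     # overall maximum clearly better
        return int(d[overall])
    return int(d[local])                  # otherwise keep pitch continuity
```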
SUMMARY OF THE INVENTION
According to a first aspect of the present invention there is provided a method of speech coding a sampled signal using a pitch-lag parameter for each of a series of frames of the signal, the method comprising for each frame:
determining the autocorrelation function for the frame within the signal, between predefined maximum and minimum delays;
weighting the autocorrelation function to emphasise the function for delays in the neighbourhood of the pitch-lag parameter determined for a previous frame; and
identifying the delay corresponding to the maximum of the weighted autocorrelation function as the pitch-lag parameter for the frame.
Preferably, said sampled signal is a residual signal which is obtained from an audio signal by substantially removing short term redundancy from the audio signal.
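Drawn together, the claimed method differs from the prior art in that the weighting itself emphasises delays in the neighbourhood of the previous frame's pitch-lag before the maximum is taken. A minimal sketch follows; the neighbourhood half-width and boost factor are purely illustrative assumptions, since this excerpt does not specify the form of the weighting:

```python
import numpy as np

def pitch_lag(r, frame_start, N, d_lo, d_hi, prev_lag,
              half_width=10, boost=1.2):
    """Per the first aspect: (1) autocorrelate the frame over [d_lo, d_hi],
    (2) weight to emphasise delays near prev_lag, (3) take the argmax."""
    frame = r[frame_start : frame_start + N]
    best_d, best_val = d_lo, -np.inf
    for d in range(d_lo, d_hi + 1):
        R = np.dot(r[frame_start - d : frame_start - d + N], frame)
        w = boost if abs(d - prev_lag) <= half_width else 1.0
        if w * R > best_val:
            best_d, best_val = d, w * R
    return best_d
```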
Inventors: Haavisto, Petri; Lakaniemi, Ari; Ojala, Pasi; Vainio, Janne
Examiner: Hudspeth, David
Assignee: Nokia Mobile Phones Limited
Law firm: Perman & Green LLP
Attorney: Zintel, Harold