Forward error correction in speech coding

Classification: Data processing: speech signal processing, linguistics, language – Speech signal processing – Synthesis
Type: Reexamination Certificate (active)
Filed: 2000-05-11
Issued: 2004-06-29
Examiner: Richemond Dorvil (Department: 2641)
U.S. Classes: C704S258000, C704S219000, C704S220000
Patent Number: 06757654
ABSTRACT:
BACKGROUND
The present invention relates to a system and method for performing forward error correction in the transmission of audio information, and more particularly, to a system and method for performing forward error correction in packet-based transmission of speech-coded information.
1. Speech Coding
The shortcomings of state-of-the-art forward error correction (FEC) techniques can best be appreciated by an introductory discussion of some conventional speech coding concepts.
1.1 Code-Excited Linear Predictive (CELP) Coding
FIG. 1 shows a conventional code-excited linear predictive (CELP) analysis-by-synthesis encoder 100. The encoder 100 includes functional units designated as framing module 104, linear prediction coding (LPC) analysis module 106, difference calculating module 118, error weighting module 114, error minimization module 116, and decoder module 102. The decoder module 102, in turn, includes a fixed codebook 112, a long-term predictor (LTP) filter 110, and a linear predictive coding (LPC) filter 108 connected together in cascaded relationship to produce a synthesized signal ŝ(n). The LPC filter 108 models the short-term correlation in the speech attributed to the vocal tract, corresponding to the spectral envelope of the speech signal. It can be represented by:
1/A(z) = 1/(1 − Σ_{i=1}^{p} a_i z^{−i}),  (Eq. 1)
where p denotes the filter order and a_i denotes the filter coefficients. The LTP filter 110, on the other hand, models the long-term correlation of the speech attributed to the vocal cords, corresponding to the fine periodic-like spectral structure of the speech signal. For example, it can have the form given by:
1/P(z) = 1/(1 − Σ_{i=−1}^{1} b_i z^{−(D+i)}),  (Eq. 2)
where D generally corresponds to the pitch period of the long-term correlation, and b_i pertains to the filter's long-term gain coefficients. The fixed codebook 112 stores a series of excitation input sequences. The sequences provide excitation signals to the LTP filter 110 and LPC filter 108, and are useful in modeling characteristics of the speech signal that cannot be predicted with deterministic methods using the LTP filter 110 and LPC filter 108, such as, to some degree, audio components within music.
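The cascaded decoder filters of Eq. 1 and Eq. 2 can be sketched in a few lines of Python. This is an illustrative direct-form implementation, not code from the patent; the coefficient values in the usage lines are arbitrary.

```python
import numpy as np

def lpc_synthesis(excitation, a):
    """Short-term all-pole filter 1/A(z) (Eq. 1): s[n] = e[n] + sum_i a_i * s[n-i]."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        s[n] = excitation[n] + past
    return s

def ltp_synthesis(codebook_entry, b, D):
    """Long-term filter 1/P(z) (Eq. 2): three taps b_{-1}, b_0, b_1 around lag D,
    passed here as b[0], b[1], b[2]."""
    u = np.zeros(len(codebook_entry))
    for n in range(len(codebook_entry)):
        past = sum(b[i + 1] * u[n - (D + i)]
                   for i in (-1, 0, 1) if 0 <= n - (D + i) < n)
        u[n] = codebook_entry[n] + past
    return u

# Cascade as in FIG. 1: fixed-codebook entry -> LTP filter 110 -> LPC filter 108.
excitation = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
s_hat = lpc_synthesis(ltp_synthesis(excitation, b=[0.0, 0.8, 0.0], D=2), a=[0.5])
```

With the single-tap choices above, the LTP stage repeats energy every D samples and the LPC stage adds a decaying short-term envelope, which is the qualitative behavior the two filters are described as contributing.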
In operation, the framing module 104 receives an input speech signal and divides it into successive frames (e.g., 20 ms in duration). Then, the LPC analysis module 106 receives and analyzes a frame to generate a set of LPC coefficients. These coefficients are used by the LPC filter 108 to model the short-term characteristics of the speech signal corresponding to its spectral envelope. An LPC residual can then be formed by feeding the input speech signal through an inverse filter including the calculated LPC coefficients. This residual, shown in FIG. 2, represents a component of the original speech signal that remains after removal of the short-term redundancy by linear predictive analysis. The distance between two pitch pulses is denoted “L” and is called the lag. The encoder 100 can then use the residual to predict the long-term coefficients. These long-term coefficients are used by the LTP filter 110 to model the fine spectral structure of the speech signal (such as pitch delay and pitch gain). Taken together, the LTP filter 110 and the LPC filter 108 form a cascaded filter which models the long-term and short-term characteristics of the speech signal. When driven by an excitation sequence from the fixed codebook 112, the cascaded filter generates the synthetic speech signal ŝ(n), which represents a reconstructed version of the original speech signal s(n).
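The LPC analysis and inverse-filtering steps just described can be sketched as follows. The autocorrelation method with the Levinson-Durbin recursion is one standard way to estimate the coefficients of Eq. 1; the patent text does not mandate a particular estimation method.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Estimate the a_i of Eq. 1: autocorrelation method + Levinson-Durbin."""
    r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[1..order] hold the predictor coefficients
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a.copy()
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1 - k * k)
    return a[1:]

def lpc_residual(frame, a):
    """Inverse filter A(z): e[n] = s[n] - sum_i a_i * s[n-i]."""
    e = np.empty(len(frame))
    for n in range(len(frame)):
        e[n] = frame[n] - sum(a[i] * frame[n - 1 - i]
                              for i in range(len(a)) if n - 1 - i >= 0)
    return e
```

For a purely first-order-predictable signal such as s[n] = 0.9ⁿ, the recursion recovers a₁ ≈ 0.9 and the residual is (nearly) zero after the first sample, illustrating how linear prediction removes short-term redundancy.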
The encoder 100 selects an optimum excitation sequence by successively generating a series of synthetic speech signals ŝ(n), successively comparing the synthetic speech signals ŝ(n) with the original speech signals s(n), and successively adjusting the operational parameters of the decoder module 102 to minimize the difference between ŝ(n) and s(n). More specifically, the difference calculating module 118 forms the difference (i.e., the error signal e(n)) between the original speech signal s(n) and the synthetic speech signal ŝ(n). An error weighting module 114 receives the error signal e(n) and generates a weighted error signal e_w(n) based on perceptual weighting factors. The error minimization module 116 uses a search procedure to adjust the operational parameters of the speech decoder 102 such that it produces a synthesized signal ŝ(n) that is as close to the original signal s(n) as possible.
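At its core, this analysis-by-synthesis loop is an exhaustive search over candidate excitations. A minimal sketch, with a plain squared error standing in for the perceptually weighted error e_w(n) that the error weighting module 114 would actually produce:

```python
import numpy as np

def search_excitation(target, codebook, synthesize):
    """Synthesize every codebook entry and keep the index whose synthetic
    signal minimizes the squared error against the target speech frame."""
    best_idx, best_err = -1, float("inf")
    for idx, entry in enumerate(codebook):
        err = float(np.sum((target - synthesize(entry)) ** 2))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err
```

Practical coders structure the codebook so this search can be done far more cheaply than a brute-force sweep, but the selection criterion is the same.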
Upon arriving at an optimum synthesized signal ŝ(n), relevant encoder parameters are transferred over a transmission medium (not shown) to a decoder site (not shown). A decoder at the decoder site has a construction identical to the decoder module 102 of the encoder 100. The decoder uses the transferred parameters to reproduce the optimized synthesized signal ŝ(n) calculated in the encoder 100. For instance, the encoder 100 can transfer codebook indices representing the location of the optimal excitation signal in the fixed codebook 112, together with relevant filter parameters or coefficients (e.g., the LPC and LTP parameters). The transfer of these parameters, in lieu of a more direct representation of the input speech signal, provides a notable reduction in the bandwidth required to transmit speech information.
FIG. 3 shows a modification of the analysis-by-synthesis encoder 100 shown in FIG. 1. The encoder 300 shown in FIG. 3 includes a framing module 304, LPC analysis module 306, LPC filter 308, difference calculating module 318, error weighting module 314, error minimization module 316, and fixed codebook 312. Each of these units generally corresponds to the like-named parts shown in FIG. 1. In FIG. 3, however, the LTP filter 110 is replaced by the adaptive codebook 320. Further, an adder module 322 adds the excitation signals output from the adaptive codebook 320 and the fixed codebook 312.
The encoder 300 functions basically in the same manner as the encoder 100 of FIG. 1. In the encoder 300, however, the adaptive codebook 320 models the long-term characteristics of the speech signal. Further, the excitation signal applied to the LPC filter 308 represents a summation of an adaptive codebook 320 entry and a fixed codebook 312 entry.
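The summed excitation of FIG. 3 can be illustrated as follows. The sketch assumes the pitch lag is at least one subframe long, so the adaptive-codebook entry is simply a delayed copy of past excitation; real coders handle shorter lags by repeating the partial vector.

```python
import numpy as np

def build_excitation(past_exc, lag, gain_pitch, fixed_entry, gain_code):
    """Adder module of FIG. 3: scaled adaptive-codebook entry (past excitation
    delayed by the pitch lag) plus scaled fixed-codebook entry."""
    L = len(fixed_entry)
    start = len(past_exc) - lag
    adaptive = past_exc[start:start + L]      # assumes lag >= L
    return gain_pitch * adaptive + gain_code * fixed_entry
```

The "adaptive" part of the codebook is exactly this feedback of past excitation: each new subframe's excitation is appended to the history and becomes a candidate for future pitch prediction.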
1.2 GSM Enhanced Full Rate Coding (GSM-EFR)
The prior art provides numerous specific implementations of the above-described CELP design. One such implementation is the GSM Enhanced Full Rate (GSM-EFR) speech transcoding standard described in the European Telecommunications Standards Institute's (ETSI) “Global System for Mobile Communications: Digital Cellular Telecommunications Systems: Enhanced Full Rate (EFR) Speech Transcoding (GSM 06.60),” November 1996, which is incorporated by reference herein in its entirety.
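To put numbers on the bandwidth saving noted above: GSM-EFR encodes each 20 ms frame into 244 bits, i.e. 12.2 kbit/s, versus the 104 kbit/s of the 13-bit, 8 kHz uniform PCM input it starts from. The arithmetic:

```python
# GSM-EFR frame size and duration (per the GSM 06.60 standard).
frame_bits = 244
frame_duration = 0.020                       # seconds
coded_rate = frame_bits / frame_duration     # 12200 bit/s
raw_rate = 8000 * 13                         # 104000 bit/s (13-bit PCM at 8 kHz)
compression = raw_rate / coded_rate          # roughly 8.5x
```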
The GSM-EFR standard models the short-term properties of the speech signal using:
H(z) = 1/Â(z) = 1/(1 + Σ_{i=1}^{m} â_i z^{−i}),  (Eq. 3)

where â_i represents the quantified linear prediction parameters. The standard models the long-term features of the speech signal with:
1/B(z) = 1/(1 − g_p z^{−T}),  (Eq. 4)

where T pertains to the pitch delay and g_p pertains to the pitch gain. An adaptive codebook implements the pitch synthesis. Further, the GSM-EFR standard uses a perceptual weighting filter defined by:
W(z) = A(z/γ1)/A(z/γ2),  (Eq. 5)

where A(z) defines the unquantized LPC filter, and γ1 and γ2 represent perceptual weighting factors. Finally, the GSM-EFR standard uses adaptive and fixed (innovative) codebooks to provide an excitation signal. In particular, the fixed codebook forms an algebraic codebook structured based on an interleaved single-pulse permutation (ISPP) design.
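The weighting filter of Eq. 5 can be sketched by bandwidth-expanding the LPC polynomial: replacing z with z/γ scales the i-th coefficient by γ^i. A direct-form implementation, using the A(z) = 1 − Σ a_i z^{−i} convention of Eq. 1; the default γ values here are illustrative, not taken from the text:

```python
import numpy as np

def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """W(z) = A(z/gamma1) / A(z/gamma2)  (Eq. 5).
    Substituting z -> z/gamma multiplies coefficient a_i by gamma**i."""
    num = np.concatenate(([1.0], [-ai * gamma1 ** (i + 1) for i, ai in enumerate(a)]))
    den = np.concatenate(([1.0], [-ai * gamma2 ** (i + 1) for i, ai in enumerate(a)]))
    y = np.zeros(len(x))
    for n in range(len(x)):                  # direct-form IIR: den * y = num * x
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y[n] = acc
    return y
```

With γ1 > γ2 the filter de-emphasizes the error near the formant peaks of A(z), concentrating coding noise where the ear masks it; with γ1 = γ2 it degenerates to the identity.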
Inventors: Anders Nohlgren, Jim Sundqvist, Jonas Svedberg, Anders Uvliden, Magnus Westerlund
Examiner: Richemond Dorvil
Kinari Patel
Assignee: Telefonaktiebolaget LM Ericsson