Reexamination Certificate
1998-09-01
2001-02-20
Zele, Krista (Department: 2748)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S219000
active
06192335
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to speech coding and, more particularly, to improved coding criteria for accommodating noise-like signals at lowered bit rates.
BACKGROUND OF THE INVENTION
Most modern speech coders are based on some form of model for generation of the coded speech signal. The parameters and signals of the model are quantized and information describing them is transmitted on the channel. The dominant coder model in cellular telephony applications is the Code Excited Linear Prediction (CELP) technology.
A conventional CELP decoder is depicted in FIG. 1. The coded speech is generated by an excitation signal fed through an all-pole synthesis filter with a typical order of 10. The excitation signal is formed as a sum of two signals ca and cf, which are picked from respective codebooks (one fixed and one adaptive) and subsequently multiplied by suitable gain factors ga and gf. The codebook signals are typically of length 5 ms (a subframe) whereas the synthesis filter is typically updated every 20 ms (a frame). The parameters associated with the CELP model are the synthesis filter coefficients, the codebook entries and the gain factors.
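The decoder structure described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the sign convention A(z) = 1 + a[0]·z^-1 + ... + a[p-1]·z^-p for the synthesis filter 1/A(z), and the way filter memory is passed between subframes are all assumptions made for illustration.

```python
def all_pole_synthesis(excitation, a, state=None):
    """Filter `excitation` through 1/A(z).

    Assumed convention: A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p,
    so y[n] = x[n] - sum_k a[k] * y[n-1-k].
    `state` holds the last p output samples, most recent first.
    """
    p = len(a)
    hist = list(state) if state is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x - sum(a[k] * hist[k] for k in range(p))
        out.append(y)
        hist = ([y] + hist)[:p]  # shift the new sample into the filter memory
    return out, hist


def decode_subframe(ca, cf, ga, gf, a, state=None):
    """Form the excitation ga*ca + gf*cf and run it through the synthesis filter."""
    excitation = [ga * u + gf * v for u, v in zip(ca, cf)]
    return all_pole_synthesis(excitation, a, state)


# Hypothetical toy values: a first-order filter and 3-sample code vectors.
speech, memory = decode_subframe([1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                                 ga=2.0, gf=1.0, a=[-0.5])
print(speech, memory)
```

A real decoder would use a filter order of about 10 and carry `memory` from one subframe to the next; the toy values here only keep the arithmetic easy to follow.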
In FIG. 2, a conventional CELP encoder is depicted. A replica of the CELP decoder (FIG. 1) is used to generate candidate coded signals for each subframe. The coded signal is compared to the uncoded (digitized) signal at 21 and a weighted error signal is used to control the encoding process. The synthesis filter is determined using linear prediction (LP). This conventional encoding procedure is referred to as linear prediction analysis-by-synthesis (LPAS).
As understood from the description above, LPAS coders employ waveform matching in a weighted speech domain, i.e., the error signal is filtered with a weighting filter. This can be expressed as minimizing the following squared error criterion:
$$D_W = \|S_W - CS_W\|^2 = \|W \cdot S - W \cdot H \cdot (ga \cdot ca + gf \cdot cf)\|^2 \qquad \text{(Eq. 1)}$$
where S is the vector containing one subframe of uncoded speech samples, S_W represents S multiplied by the weighting filter W, ca and cf are the code vectors from the adaptive and fixed codebooks respectively, W is a matrix performing the weighting filter operation, H is a matrix performing the synthesis filter operation, and CS_W is the coded signal multiplied by the weighting filter W. Conventionally, the encoding operation for minimizing the criterion of Equation 1 is performed according to the following steps:
Step 1. Compute the synthesis filter by linear prediction and quantize the filter coefficients. The weighting filter is computed from the linear prediction filter coefficients.
Step 2. The code vector ca is found by searching the adaptive codebook to minimize D_W of Equation 1 assuming that gf is zero and that ga is equal to the optimal value. Because each code vector ca has conventionally associated therewith an optimal value of ga, the search is done by inserting each code vector ca into Equation 1 along with its associated optimal ga value.
Step 3. The code vector cf is found by searching the fixed codebook to minimize D_W, using the code vector ca and gain ga found in step 2. The fixed gain gf is assumed equal to the optimal value.
Step 4. The gain factors ga and gf are quantized. Note that ga can be quantized after step 2 if scalar quantizers are used.
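Steps 2 and 3 above can be sketched in Python as a sequential search. This is an illustrative sketch only, under several assumptions that are not in the source: the target is the already weighted subframe W·S, each codebook entry is supplied pre-filtered as y = W·H·c, filter memory from earlier subframes is ignored, and steps 1 and 4 (LP analysis and quantization) are omitted. For a fixed entry y, the gain minimizing ‖t − g·y‖² is g = ⟨t, y⟩/⟨y, y⟩, which is the "optimal value" the sketch associates with each entry. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_entry(target, filtered_codebook):
    """Return (index, optimal gain, squared error) of the entry minimizing
    ||target - g * y||^2, where g is the optimal gain for each entry y."""
    best = None
    tt = dot(target, target)
    for i, y in enumerate(filtered_codebook):
        yy = dot(y, y)
        if yy == 0.0:
            continue  # an all-zero entry cannot reduce the error
        ty = dot(target, y)
        g = ty / yy
        err = tt - g * ty  # equals ||target - g*y||^2 at the optimal g
        if best is None or err < best[2]:
            best = (i, g, err)
    return best


def sequential_search(target, adaptive_cb, fixed_cb):
    # Step 2: adaptive codebook search with gf assumed zero.
    ia, ga, _ = best_entry(target, adaptive_cb)
    # Step 3: fixed codebook search against what step 2 left unexplained.
    remaining = [t - ga * y for t, y in zip(target, adaptive_cb[ia])]
    i_f, gf, d_w = best_entry(remaining, fixed_cb)
    return ia, ga, i_f, gf, d_w


# Hypothetical toy codebooks (already filtered) and weighted target.
target = [1.0, 2.0, 0.0, 1.0]
adaptive_cb = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
fixed_cb = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
print(sequential_search(target, adaptive_cb, fixed_cb))
```

Searching the two codebooks one after the other, rather than jointly, keeps the number of candidates evaluated equal to the sum of the codebook sizes instead of their product, at the cost of a possibly suboptimal pair.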
The waveform matching procedure described above is known to work well, at least for bit rates of about 8 kb/s or more. However, when the bit rate is lowered, the ability to do waveform matching of non-periodic, noise-like signals such as unvoiced speech and background noise suffers. For voiced speech segments, the waveform matching criterion still performs well, but the poor waveform matching ability for noise-like signals leads to a coded signal whose level is often too low and which has an annoying, varying character (known as swirling).
For noise-like signals, it is well known in the art that it is better to match the spectral character of the signal and have a good signal level (gain) matching. Since the linear prediction synthesis filter provides the spectral character of the signal, an alternative criterion to Equation 1 above can be used for noise-like signals:
$$D_E = \left(\sqrt{E_S} - \sqrt{E_{CS}}\right)^2 \qquad \text{(Eq. 2)}$$
where E_S is the energy of the uncoded speech signal and E_CS is the energy of the coded signal CS=H·(ga·ca+gf·cf). Equation 2 implies energy matching as opposed to waveform matching in Equation 1. This criterion can also be used in the weighted speech domain by including the weighting filter W. Note that the square root operations are included in Equation 2 only to have a criterion in the same domain as Equation 1; this is not necessary and is not a restriction. There are also other possible energy-matching criteria such as D_E=|E_S−E_CS|.
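The difference between the two criteria can be illustrated with a small Python sketch. It is a simplified illustration, not the patent's implementation: the weighting filter W is assumed to be the identity, so Equation 1 reduces to a plain squared error, and the function names and signal values are hypothetical.

```python
import math


def energy(x):
    return sum(v * v for v in x)


def waveform_criterion(s, cs):
    """Equation 1 with W assumed to be the identity: ||S - CS||^2."""
    return sum((a - b) ** 2 for a, b in zip(s, cs))


def energy_criterion(s, cs):
    """Equation 2: (sqrt(E_S) - sqrt(E_CS))^2."""
    return (math.sqrt(energy(s)) - math.sqrt(energy(cs))) ** 2


s = [1.0, -1.0, 1.0, -1.0]
cs_flipped = [-v for v in s]     # same energy, opposite waveform
cs_quiet = [0.5 * v for v in s]  # same shape, level too low

for name, cs in (("flipped", cs_flipped), ("quiet", cs_quiet)):
    print(name, waveform_criterion(s, cs), energy_criterion(s, cs))
```

The sign-flipped candidate is heavily penalized by waveform matching yet scores a perfect zero under energy matching, while the attenuated candidate is penalized equally by both. The latter case shows what the square roots buy: a pure level error costs the same under Equation 2 as under Equation 1.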
The criterion can also be formulated in the residual domain as follows:
$$D_E = \left(\sqrt{E_r} - \sqrt{E_x}\right)^2 \qquad \text{(Eq. 3)}$$
where E_r is the energy of the residual signal r obtained by filtering S through the inverse (H^{-1}) of the synthesis filter, and E_x is the energy of the excitation signal given by x=ga·ca+gf·cf.
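The residual-domain criterion can be sketched in Python as follows. As before, this is an assumption-laden illustration rather than the patent's method: the synthesis filter is taken to be 1/A(z) with A(z) = 1 + a[0]·z^-1 + ... + a[p-1]·z^-p, so its inverse is the FIR filter A(z), filter memory is assumed to start at zero, and the function names are hypothetical.

```python
import math


def lp_residual(s, a, history=None):
    """Filter s through A(z), the assumed inverse of the synthesis filter:
    r[n] = s[n] + sum_k a[k] * s[n-1-k].
    `history` holds the last p input samples, most recent first."""
    p = len(a)
    hist = list(history) if history is not None else [0.0] * p
    r = []
    for x in s:
        r.append(x + sum(a[k] * hist[k] for k in range(p)))
        hist = ([x] + hist)[:p]  # shift the new input sample into memory
    return r


def residual_energy_criterion(s, a, ca, cf, ga, gf):
    """Equation 3: (sqrt(E_r) - sqrt(E_x))^2 with x = ga*ca + gf*cf."""
    r = lp_residual(s, a)
    x = [ga * u + gf * v for u, v in zip(ca, cf)]
    e_r = sum(v * v for v in r)
    e_x = sum(v * v for v in x)
    return (math.sqrt(e_r) - math.sqrt(e_x)) ** 2


# Hypothetical toy values: the residual of s is a single unit pulse at n = 0,
# while the candidate excitation places its pulse at n = 2.
s = [1.0, 0.5, 0.25]
print(residual_energy_criterion(s, [-0.5], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0],
                                ga=1.0, gf=0.0))
```

Because only energies are compared, an excitation pulse in the wrong position still yields a zero criterion as long as its energy equals that of the residual. Working in the residual domain also avoids running each candidate excitation through the synthesis filter.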
The criteria above have been employed in conventional multi-mode coding, where separate coding modes using energy matching criteria such as Equations 2 and 3 have been used for unvoiced speech and background noise. A drawback of this approach is the need for a mode decision, for example choosing the waveform matching mode (Equation 1) for voiced speech and the energy matching mode (Equation 2 or 3) for noise-like signals such as unvoiced speech and background noise. The mode decision is sensitive and causes annoying artifacts when it is wrong. Also, the drastic change of coding strategy between modes can cause unwanted sounds.
It is therefore desirable to provide improved coding of noise-like signals at lowered bit rates without the aforementioned disadvantages of multi-mode coding.
The present invention advantageously combines waveform matching and energy matching criteria to improve the coding of noise-like signals at lowered bit rates without the disadvantages of multi-mode coding.
Ekudden Erik
Hagen Roar
Jenkens & Gilchrist
Opsasnick Michael N.
Telefonaktiebolaget LM Ericsson (publ)
Zele Krista