Concealment of frame erasures and method

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate


Details

C704S226000, C704S205000, C704S215000, C704S264000, C704S225000, C714S752000

Reexamination Certificate

active

06826527

ABSTRACT:

BACKGROUND OF THE INVENTION
The invention relates to electronic devices, and more particularly to speech coding, transmission, storage, and decoding/synthesis methods and circuitry.
The performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications. Both dedicated-channel and packetized-over-network (e.g., Voice over IP or Voice over Packet) transmissions benefit from compression of speech signals. The widely used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech. Linear prediction analysis determines LP coefficients a_i, i = 1, 2, ..., M, for an input frame of digital speech samples {s(n)} by setting

r(n) = s(n) + Σ_{1≤i≤M} a_i s(n−i)  (1)

and minimizing the energy Σ r(n)² of the residual r(n) in the frame. Typically, M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network sampling for digital transmission); and the number of samples {s(n)} in a frame is typically 80 or 160 (10 or 20 ms frames). A frame of samples may be generated by various windowing operations applied to the input speech samples. The name “linear prediction” arises from the interpretation of r(n) = s(n) + Σ_{1≤i≤M} a_i s(n−i) as the error in predicting s(n) by the linear combination of preceding speech samples −Σ_{1≤i≤M} a_i s(n−i). Thus minimizing Σ r(n)² yields the {a_i} which furnish the best linear prediction for the frame. The coefficients {a_i} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage and converted to line spectral pairs (LSPs) for interpolation between subframes.
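As a rough illustration of the analysis just described (and not the patent's or any standard's reference procedure), the sketch below estimates the coefficients a_i for one frame by the autocorrelation method and the Levinson-Durbin recursion; the Hamming window and the order M = 10 are assumptions made for the example, and the sign convention matches equation (1).

```python
import numpy as np

def lp_coefficients(frame, order=10):
    """Estimate LP coefficients a_1..a_M for one speech frame.

    Sign convention matches equation (1): r(n) = s(n) + sum_i a_i * s(n - i).
    """
    windowed = frame * np.hamming(len(frame))      # illustrative window choice
    # Autocorrelation at lags 0..M
    r = np.array([np.dot(windowed[:len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    # Levinson-Durbin recursion on the Yule-Walker equations
    a = np.zeros(order + 1)
    a[0] = 1.0
    energy = r[0] + 1e-12                          # guard against an all-zero frame
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / energy
        new_a = a.copy()
        new_a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        new_a[i] = k
        a = new_a
        energy *= (1.0 - k * k)
    return a[1:]                                   # a_1 .. a_M
```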
The {r(n)} is the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z) where A(z) is the transfer function of equation (1). Of course, the LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an excitation which emulates the LP residual from the encoded parameters. Physiologically, for voiced frames the excitation roughly has the form of a series of pulses at the pitch frequency, and for unvoiced frames the excitation roughly has the form of white noise.
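To make the analysis/synthesis relationship concrete, the short sketch below (an illustration, not a coder) filters a frame through A(z) to obtain the residual r(n) of equation (1), then drives the synthesis filter 1/A(z) with that residual to recover the frame. Filter memories across frame boundaries are ignored, and in an actual coder the decoder would use an encoded approximation of the residual rather than the residual itself.

```python
import numpy as np
from scipy.signal import lfilter

def analysis_then_synthesis(frame, a):
    """`a` holds a_1..a_M, so A(z) = 1 + a_1 z^-1 + ... + a_M z^-M."""
    A = np.concatenate(([1.0], np.asarray(a)))
    residual = lfilter(A, [1.0], frame)          # r(n) = s(n) + sum a_i s(n-i)
    reconstructed = lfilter([1.0], A, residual)  # all-pole synthesis 1/A(z)
    return residual, reconstructed
```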
The LP compression approach basically only transmits/stores updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and (quantized) gain(s). A receiver decodes the transmitted/stored items and regenerates the input speech with the same perceptual characteristics.
FIGS. 5-6 illustrate high-level blocks of an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second).
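For a sense of scale (purely illustrative; the parameter split below is invented for the example and not taken from any particular coder), a 2.4 kb/s coder with 20 ms frames has 48 bits per frame to spend on the quantized items:

```python
# Hypothetical bit budget for a 2.4 kb/s LP coder with 20 ms frames.
bit_rate = 2400                                   # bits per second
frame_ms = 20
bits_per_frame = bit_rate * frame_ms // 1000      # 2400 * 0.020 = 48 bits
allocation = {"quantized LSFs": 24, "pitch lag": 7, "codebook index": 10, "gains": 7}
assert sum(allocation.values()) == bits_per_frame == 48
```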
However, high error rates in wireless transmission and large packet losses/delays in network transmission demand that an LP decoder handle frames in which so many bits are corrupted that the frame is ignored (erased). To maintain speech quality and intelligibility for wireless or voice-over-packet applications in the case of erased frames, the decoder typically has methods to conceal such frame erasures, and such methods may be categorized as either interpolation-based or repetition-based. An interpolation-based concealment method exploits both future and past frame parameters to interpolate the missing parameters. In general, interpolation-based methods approximate the speech signal in missing frames better than repetition-based methods, which exploit only past frame parameters. In applications such as wireless communications, the interpolation-based method incurs an additional delay to acquire the future frame. In Voice over Packet communications, future frames are available from a playout buffer which compensates for packet arrival jitter, so interpolation-based methods mainly increase the size of the playout buffer. Repetition-based concealment, which simply repeats or modifies the past frame parameters, finds use in several CELP-based speech coders, including G.729, G.723.1, and GSM-EFR. The repetition-based concealment in these coders introduces no additional delay or playout buffer growth, but the quality of the reconstructed speech with erased frames is poorer than with the interpolation-based approach, especially at high erased-frame ratios or under bursty frame erasures.
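The difference between the two families can be sketched on a single scalar parameter (say, a gain or one LSF); this is only a schematic of the idea, since actual coders conceal whole parameter sets and apply attenuation and smoothing:

```python
def conceal_repetition(past_value, attenuation=0.9):
    """Repetition-based: reuse the past frame's parameter, usually attenuated."""
    return attenuation * past_value

def conceal_interpolation(past_value, future_value):
    """Interpolation-based: combine past and future frame parameters
    (requires the future frame, hence extra delay or a larger playout buffer)."""
    return 0.5 * (past_value + future_value)
```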
In more detail, the ITU standard G.729 uses frames of 10 ms length (80 samples) divided into two 5-ms, 40-sample subframes for better tracking of pitch and gain parameters plus reduced codebook search complexity. Each subframe has an excitation represented by an adaptive-codebook contribution and a fixed (algebraic) codebook contribution. The adaptive-codebook contribution provides periodicity in the excitation and is the product of v(n), the prior frame's excitation translated by the current frame's pitch lag in time and interpolated, multiplied by a gain, g_P. The algebraic codebook contribution approximates the difference between the actual residual and the adaptive-codebook contribution with a four-pulse vector, c(n), multiplied by a gain, g_C. Thus the excitation is u(n) = g_P v(n) + g_C c(n), where v(n) comes from the prior (decoded) frame and g_P, g_C, and c(n) come from the transmitted parameters for the current frame.
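The excitation construction for one subframe can be sketched as follows (an illustration, not the G.729 reference code); `past_excitation` is the decoded excitation history, `c` is the four-pulse algebraic-codebook vector, and the fractional pitch lag and interpolation used by G.729 are simplified to an integer lag:

```python
import numpy as np

def build_excitation(past_excitation, pitch_lag, gp, c, gc):
    """u(n) = gp * v(n) + gc * c(n) for one subframe."""
    subframe_len = len(c)            # e.g., 40 samples for a 5 ms subframe
    buf = list(past_excitation)      # decoded excitation history
    v = []
    for _ in range(subframe_len):
        v.append(buf[-pitch_lag])    # sample pitch_lag positions back
        buf.append(v[-1])            # extend so lags shorter than the subframe still work
    return gp * np.asarray(v) + gc * np.asarray(c)
```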
FIGS. 3-4 illustrate the encoding and decoding in block format; the postfilter essentially emphasizes any periodicity (e.g., vowels).
G.729 handles frame erasures by reconstruction based on previously received information; that is, repetition-based concealment. Namely, the decoder replaces the missing excitation signal with one of similar characteristics while gradually decaying its energy, using a voicing classifier based on the long-term prediction gain (which is computed as part of the long-term postfilter analysis). The long-term postfilter finds the long-term predictor for which the prediction gain is more than 3 dB by using a normalized correlation greater than 0.5 in the optimal delay determination. For the error concealment process, a 10 ms frame is declared periodic if at least one 5 ms subframe has a long-term prediction gain of more than 3 dB; otherwise the frame is declared nonperiodic. An erased frame inherits its class from the preceding (reconstructed) speech frame. Note that the voicing classification is continuously updated based on this reconstructed speech signal. The specific steps taken for an erased frame are as follows:
1) repetition of the synthesis filter parameters. The LP parameters of the last good frame are used.
2) attenuation of the adaptive- and fixed-codebook gains. The adaptive-codebook gain is based on an attenuated version of the previous adaptive-codebook gain: if the (m+1)st frame is erased, use g_P^(m+1) = 0.9 g_P^(m). Similarly, the fixed-codebook gain is based on an attenuated version of the previous fixed-codebook gain: g_C^(m+1) = 0.98 g_C^(m). (A sketch of this attenuation appears after this list.)
3) attenuation of the memory of the gain predictor. The gain predictor for the fixed-codebook gain uses the energy of the previously selected algebraic codebook vectors c(n), so to avoid transitional effects once good frames are received, the memory of the gain predictor is updated with an attenuated version of the average codebook energy over four prior frames.
4) generation of the replacement excitation. The excitation used depends upon the periodicity classification. If the last reconstructed frame was classified as periodic, the current frame is considered to be periodic as well. In that case only the adaptive-codebook contribution is used, and the fixed-codebook contribution is set to zero.
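Putting steps 1)-4) together, the sketch below mirrors the flow described above; the 0.9 and 0.98 attenuation factors come from step 2), while the decoder-state object, the helper names, and the simple shrinking of the gain-predictor memory are assumptions made for the example rather than the G.729 reference implementation.

```python
import numpy as np

def adaptive_vector(past_excitation, pitch_lag, subframe_len=40):
    """Past excitation repeated at an integer pitch lag (simplified)."""
    buf = list(past_excitation)
    v = []
    for _ in range(subframe_len):
        v.append(buf[-pitch_lag])
        buf.append(v[-1])
    return np.asarray(v)

def conceal_erased_frame(state):
    """Repetition-based concealment of one erased frame.

    `state` is a hypothetical decoder-state object holding the last good
    frame's LSFs, gains, pitch lag, periodicity class, gain-predictor
    memory, and past excitation; it is not a structure defined by G.729.
    """
    # 1) Repeat the synthesis filter parameters of the last good frame.
    lsf = state.last_lsf

    # 2) Attenuate the adaptive- and fixed-codebook gains.
    state.gp *= 0.9      # g_P^(m+1) = 0.9  * g_P^(m)
    state.gc *= 0.98     # g_C^(m+1) = 0.98 * g_C^(m)

    # 3) Attenuate the memory of the fixed-codebook gain predictor
    #    (shrunk here for simplicity; G.729 uses an attenuated average
    #    codebook energy over the four prior frames).
    state.gain_predictor_memory *= 0.98

    # 4) Replacement excitation according to the inherited class:
    #    periodic -> adaptive codebook only; nonperiodic -> randomized content.
    if state.last_frame_periodic:
        u = state.gp * adaptive_vector(state.past_excitation, state.pitch_lag)
    else:
        u = state.gc * np.random.uniform(-1.0, 1.0, size=40)
    return lsf, u
```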
