Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission
Reexamination Certificate
1999-02-12
2001-07-10
Dorvil, Richemond (Department: 2641)
C704S203000, C704S201000
active
06260009
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to code-excited linear prediction (CELP) speech processing. Specifically, the present invention relates to translating digital speech packets from one CELP format to another CELP format.
2. Related Art
Transmission of voice by digital techniques has become widespread, particularly in long distance and digital radio telephone applications. This, in turn, has created interest in determining the least amount of information that can be sent over the channel while maintaining the perceived quality of the reconstructed speech. If speech is transmitted by simply sampling and digitizing, a data rate on the order of 64 kilobits per second (kbps) is required to achieve the speech quality of a conventional analog telephone. However, through the use of speech analysis, followed by the appropriate coding, transmission, and resynthesis at the receiver, a significant reduction in the data rate can be achieved.
Devices which employ techniques to compress voiced speech by extracting parameters that relate to a model of human speech generation are typically called vocoders. Such devices are composed of an encoder, which analyzes the incoming speech to extract the relevant parameters, and a decoder, which resynthesizes the speech using the parameters which it receives over a channel, such as a transmission channel. The speech is divided into blocks of time, or analysis subframes, during which the parameters are calculated. The parameters are then updated for each new subframe.
Linear-prediction-based time domain coders are by far the most popular type of speech coder in use today. These techniques extract the correlation from the input speech samples over a number of past samples and encode only the uncorrelated part of the signal. The basic linear predictive filter used in this technique predicts the current sample as a linear combination of the past samples. An example of a coding algorithm of this particular class is described in the paper “A 4.8 kbps Code Excited Linear Predictive Coder” by Thomas E. Tremain et al., Proceedings of the Mobile Satellite Conference, 1988.
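The linear prediction described above can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to obtain predictor coefficients and is not necessarily the method of the cited paper. The function names `autocorrelation` and `levinson_durbin` are hypothetical and chosen only for illustration.

```python
def autocorrelation(x, max_lag):
    """Return the autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations by recursion.

    Returns (a, k, err). The prediction of the current sample is
        x_hat[n] = sum(a[j] * x[n - 1 - j] for j in range(order)),
    k holds the reflection coefficients of each stage, and err is the
    energy of the remaining (uncorrelated) prediction residual.
    """
    a = []
    k = []
    err = r[0]
    for m in range(order):
        # Correlation not yet explained by the order-m predictor.
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        km = acc / err
        # Update the lower-order coefficients, then append the new one.
        a = [a[j] - km * a[m - 1 - j] for j in range(m)] + [km]
        k.append(km)
        err *= (1.0 - km * km)
    return a, k, err


if __name__ == "__main__":
    # A decaying exponential is predicted almost perfectly from one past sample.
    signal = [0.9 ** n for n in range(200)]
    coeffs, refl, residual = levinson_durbin(autocorrelation(signal, 2), 2)
    print(coeffs, refl, residual)
```

Only the residual energy `err` remains after prediction, which is why encoding the uncorrelated part of the signal requires fewer bits than encoding the samples directly.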
The function of the vocoder is to compress the digitized speech signal into a low bit rate signal by removing all of the natural redundancies inherent in speech. Speech typically has short term redundancies due primarily to the filtering operation of the lips and tongue, and long term redundancies due to the vibration of the vocal cords. In a CELP coder, these operations are modeled by two filters, a short-term formant filter and a long-term pitch filter. Once these redundancies are removed, the resulting residual signal can be modeled as white Gaussian noise, which is also encoded.
The basis of this technique is to compute the parameters of two digital filters. One filter, called the formant filter (also known as the “LPC (linear prediction coefficients) filter”), performs short-term prediction of the speech waveform. The other filter, called the pitch filter, performs long-term prediction of the speech waveform. Finally, these filters must be excited, and this is done by determining which one of a number of random excitation waveforms in a codebook results in the closest approximation to the original speech when the waveform excites the two filters mentioned above. Thus the transmitted parameters relate to three items: (1) the LPC filter, (2) the pitch filter, and (3) the codebook excitation.
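The codebook search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the search of any particular CELP standard: both filters start from zero state, the pitch filter has a single tap, the error is a plain squared error without perceptual weighting, and the gain of each candidate is chosen by least squares. All function and parameter names are hypothetical.

```python
def pitch_synthesis(excitation, lag, gain):
    """Long-term (pitch) filter: y[n] = e[n] + gain * y[n - lag]."""
    y = []
    for n, e in enumerate(excitation):
        past = y[n - lag] if n >= lag else 0.0  # zero initial state (assumption)
        y.append(e + gain * past)
    return y


def formant_synthesis(excitation, lpc):
    """Short-term (formant) filter: y[n] = e[n] + sum(lpc[j] * y[n - 1 - j])."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for j, a in enumerate(lpc):
            if n - 1 - j >= 0:
                acc += a * y[n - 1 - j]
        y.append(acc)
    return y


def search_codebook(target, codebook, lpc, lag, pitch_gain):
    """Return (index, gain, error) for the codevector that, after exciting
    the pitch and formant filters, best approximates the target signal."""
    best = None
    for index, code in enumerate(codebook):
        synth = formant_synthesis(pitch_synthesis(code, lag, pitch_gain), lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Least-squares gain for this candidate.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

In this sketch the encoder would transmit the winning index and gain along with the filter parameters, corresponding to the three items listed above.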
Digital speech coding can be broken into two parts: encoding and decoding, sometimes known as analysis and synthesis.
FIG. 1 is a block diagram of a system 100 for digitally encoding, transmitting and decoding speech. The system includes a coder 102, a channel 104, and a decoder 106. Channel 104 can be a communications channel, storage medium, or the like. Coder 102 receives digitized input speech, extracts the parameters describing the features of the speech, and quantizes these parameters into a source bit stream that is sent to channel 104. Decoder 106 receives the bit stream from channel 104 and reconstructs the output speech waveform using the quantized features in the received bit stream.
Many different formats of CELP coding are in use today. In order to successfully decode a CELP-coded speech signal, the decoder 106 must employ the same CELP coding model (also referred to as “format”) as the encoder 102 that produced the signal. When communications systems employing different CELP formats must share speech data, it is often desirable to convert the speech signal from one CELP coding format to another.
One conventional approach to this conversion is known as “tandem coding.” FIG. 2 is a block diagram of a tandem coding system 200 for converting from an input CELP format to an output CELP format. The system includes an input CELP format decoder 206 and an output CELP format encoder 202. Input format CELP decoder 206 receives a speech signal (referred to hereinafter as the “input” signal) that has been encoded using one CELP format (referred to hereinafter as the “input” format). Decoder 206 decodes the input signal to produce a speech signal. Output CELP format encoder 202 receives the decoded speech signal and encodes it using the output CELP format (referred to hereinafter as the “output” format) to produce an output signal in the output format. The primary disadvantage of this approach is the perceptual degradation experienced by the speech signal in passing through multiple encoders and decoders.
SUMMARY OF THE INVENTION
The present invention is a method and apparatus for CELP-based to CELP-based vocoder packet translation. The apparatus includes a formant parameter translator that translates input formant filter coefficients for a speech packet from an input CELP format to an output CELP format to produce output formant filter coefficients and an excitation parameter translator that translates input pitch and codebook parameters corresponding to the speech packet from the input CELP format to the output CELP format to produce output pitch and codebook parameters. The formant parameter translator includes a model order converter that converts the model order of the input formant filter coefficients from the model order of the input CELP format to the model order of the output CELP format and a time base converter that converts the time base of the input formant filter coefficients from the time base of the input CELP format to the time base of the output CELP format.
The method includes the steps of translating the formant filter coefficients of the input packet from the input CELP format to the output CELP format and translating the pitch and codebook parameters of the input speech packet from the input CELP format to the output CELP format. The step of translating the formant filter coefficients includes the steps of translating the formant filter coefficients from the input CELP format to a reflection coefficient CELP format, converting the model order of the reflection coefficients from the model order of the input CELP format to the model order of the output CELP format, translating the resulting coefficients to a line spectral pair (LSP) CELP format, converting the time base of the resulting coefficients from the input CELP format time base to the output CELP format time base, and translating the resulting coefficients from LSP format to the output CELP format to produce output formant filter coefficients. The step of translating the pitch and codebook parameters includes the steps of synthesizing speech using the input pitch and codebook parameters to produce a target signal and searching for the output pitch and codebook parameters using the target signal and the output formant filter coefficients.
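The first two steps of the formant coefficient translation, conversion to reflection coefficients and model order conversion, can be sketched as follows. This is an illustration under assumptions rather than the procedure of the invention: it uses the standard step-down and step-up recursions, and it changes the model order by truncating or zero-padding the reflection coefficients, a technique this summary does not specify. The LSP and time base steps are omitted, and all function names are hypothetical.

```python
def lpc_to_reflection(lpc):
    """Step-down recursion: predictor coefficients -> reflection coefficients.

    Assumes the predictor convention
        x_hat[n] = sum(lpc[j] * x[n - 1 - j] for j in range(len(lpc))).
    """
    a = list(lpc)
    k = [0.0] * len(a)
    for m in range(len(a) - 1, -1, -1):
        km = a[m]  # the last coefficient of each stage is its reflection coefficient
        if abs(km) >= 1.0:
            raise ValueError("unstable filter: |k| >= 1")
        k[m] = km
        # Recover the coefficients of the next lower order.
        a = [(a[j] + km * a[m - 1 - j]) / (1.0 - km * km) for j in range(m)]
    return k


def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients -> predictor coefficients."""
    a = []
    for m, km in enumerate(k):
        a = [a[j] - km * a[m - 1 - j] for j in range(m)] + [km]
    return a


def convert_model_order(lpc, new_order):
    """Change the model order by truncating or zero-padding the reflection
    coefficients (an assumed technique, labeled as such)."""
    k = lpc_to_reflection(lpc)
    k = (k + [0.0] * new_order)[:new_order]
    return reflection_to_lpc(k)
```

Reflection coefficients are convenient for this step because dropping or zeroing the highest-order ones leaves a stable filter, provided every remaining coefficient has magnitude less than one.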
An advantage of the present invention is that it eliminates the degradation in perceptual speech quality normally induced by tandem coding translation.
REFERENCES:
patent: 5414796 (1995-05-01), Jacobs et al.
patent: 5497396 (1996-03-01), Delprat
patent: 5995923 (1999-11-01), Mermelstein et al.
patent: 6014622 (20
Dorvil Richemond
Nolan Daniel A.
Qualcomm Incorporated
Rouse Thomas R.
Wadsworth Philip R.