Reexamination Certificate
2000-02-15
2003-03-25
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S223000
active
06539349
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to voice compression, and in particular, to code excited linear prediction (CELP) vocoding.
A voice encoder/decoder (vocoder) compresses speech signals in order to reduce the transmission bandwidth required in a communications channel. By reducing the transmission bandwidth required per call, it is possible to increase the number of calls over the same communication channel. Early speech coding techniques, such as the linear predictive coding (LPC) technique, use a filter to remove the signal redundancy and hence compress the speech signal. The LPC filter reproduces a spectral envelope that attempts to model the human voice. Furthermore, the LPC filter is excited by quasi-periodic inputs for nasal and vowel sounds, and by noise-like inputs for unvoiced sounds.
There exists a class of vocoders known as code excited linear prediction (CELP) vocoders. CELP vocoding is primarily a speech data compression technique that at 4-8 kbps can achieve speech quality comparable to other 32 kbps speech coding techniques. The CELP vocoder has two improvements over the earlier LPC techniques. First, the CELP vocoder attempts to capture more voice detail by extracting the pitch information using a pitch predictor. Second, the CELP vocoder excites the LPC filter with a noise-like signal derived from a residual signal created from the actual speech waveform.
CELP vocoders contain three main components: 1) a short term predictive filter, 2) a long term predictive filter, also known as a pitch predictor or adaptive codebook, and 3) a fixed codebook. Compression is achieved by assigning to each component a certain number of bits, which in total is less than the number of bits used to represent the original speech signal. The first component uses linear prediction to remove short term redundancies in the speech signal. The error, or residual, signal that results from the short term predictor becomes the target signal for the long term predictor.
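As an illustration of how a short term predictor leaves a residual, the following minimal sketch computes e[n] = s[n] - sum_k a_k * s[n-k] for a given set of predictor coefficients. The function name, the toy input, and the coefficient value are assumptions made for this example only; a real vocoder would derive the LPC coefficients from each frame of speech.

```python
def short_term_residual(speech, coeffs):
    """Return e[n] = s[n] - sum_k a_k * s[n-k], treating samples before n=0 as zero."""
    residual = []
    for n, sample in enumerate(speech):
        prediction = 0.0
        for k, a_k in enumerate(coeffs, start=1):
            if n - k >= 0:
                prediction += a_k * speech[n - k]
        residual.append(sample - prediction)
    return residual


# Toy input (hypothetical): each sample is exactly half of the previous one,
# so a first-order predictor with a_1 = 0.5 predicts it perfectly.
speech = [1.0, 0.5, 0.25, 0.125]
residual = short_term_residual(speech, [0.5])
```

With this toy input only the first sample survives in the residual, since it has no history from which to be predicted.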
Voiced speech has a quasi-periodic nature and the long term predictor extracts a pitch period from the residual and removes the information that can be predicted from the previous period. After the long term and short term filters, the residual signal is a mostly noise-like signal. Using analysis-by-synthesis, the fixed codebook search finds a best match to replace the noise-like residual with an entry from its library of vectors. The code representing the best matching vector is transmitted in place of the noisy residual. In algebraic CELP (ACELP) vocoders, the fixed codebook consists of a few non-zero pulses and is represented by the locations and signs (e.g. +1 or −1) of the pulses.
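The long term prediction step described above can be sketched as a one-tap, open-loop pitch predictor: pick the lag with the largest correlation, then subtract a gain-scaled copy of the signal one pitch period earlier. This is a simplified illustration, not the method of any particular vocoder; the function names, the lag search range of 20 to 100 samples, and the toy pulse train are all assumptions.

```python
def find_pitch_lag(residual, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest correlation."""
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(residual[n] * residual[n - lag]
                   for n in range(lag, len(residual)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


def remove_pitch(residual, lag):
    """Subtract the gain-scaled copy of the signal one pitch period earlier."""
    n_range = range(lag, len(residual))
    corr = sum(residual[n] * residual[n - lag] for n in n_range)
    energy = sum(residual[n - lag] ** 2 for n in n_range)
    gain = corr / energy if energy > 0 else 0.0
    # The first `lag` samples have no earlier period inside this frame.
    out = list(residual[:lag])
    out += [residual[n] - gain * residual[n - lag] for n in n_range]
    return out


# Toy residual (hypothetical): one pitch pulse every 40 samples in a 160-sample frame.
residual = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
lag = find_pitch_lag(residual, 20, 100)
target = remove_pitch(residual, lag)
```

For this idealized pulse train the search settles on a lag of 40 samples, and every pulse except the first is cancelled by the prediction.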
In a typical implementation, a CELP vocoder will block or divide the incoming speech signal into frames, updating the short term predictor's LPC coefficients once per frame. The LPC residual is then divided into subframes for the long term predictor and the fixed codebook search. For example, the input speech may be blocked into a 160 sample frame for the short term predictor. The resulting residual may then be broken up into subframes of 53 samples, 53 samples, and 54 samples. Each subframe is then processed by the long term predictor and the fixed codebook search.
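The 53/53/54 split in the example above can be reproduced with a small helper. This is only a sketch: the function names are hypothetical, and assigning the leftover samples to the last subframe is an assumption chosen to match the numbers in the example.

```python
def subframe_lengths(frame_len, count):
    """Split frame_len samples into count subframes, giving the remainder to the last."""
    base = frame_len // count
    return [base] * (count - 1) + [frame_len - base * (count - 1)]


def split_frame(frame, count):
    """Cut a frame (a list of samples) into consecutive subframes."""
    subframes, start = [], 0
    for length in subframe_lengths(len(frame), count):
        subframes.append(frame[start:start + length])
        start += length
    return subframes


lengths = subframe_lengths(160, 3)            # [53, 53, 54]
subframes = split_frame(list(range(160)), 3)  # three consecutive slices
```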
Referring to FIG. 1, an example of a single frame of a speech signal 100 is shown. The speech signal 100 is made up of voiced and unvoiced signals of different pitches. The speech signal 100 is received by a CELP vocoder having an LPC filter. The first step of the CELP vocoder is to remove short term redundancies in the speech signal. The resulting signal with the short term redundancies removed is the residual speech signal 200, FIG. 2.
The LPC filter is unable to remove all of the redundant information, and the remaining quasi-periodic peaks and valleys in the filtered speech signal 200 are referred to as pitch pulses. The short term predictive filter is then applied to speech signal 200, resulting in the short term filtered signal 300, FIG. 3. The long term predictor filter removes the quasi-periodic pitch pulses from the residual speech signal 300, FIG. 3, resulting in a mostly noise-like signal 400, FIG. 4, which becomes the target signal for the fixed codebook search.
FIG. 4 is a plot of a 160 sample frame of a fixed codebook target signal 350 divided into three subframes 354, 356, 358. The code value is then transmitted across the communication network.
In FIG. 5, the lookup table 400 that maps the position of the pulses in a subframe is shown. The pulses within the subframe are constrained to lie in one of sixteen possible positions 402 within the lookup table. Because each track 404 has sixteen possible positions 402, only four bits are required to identify each pulse location. Each pulse mapping occurs in an individual track 404. Therefore, two tracks 406, 408 are required to represent positions of two pulses in the subframe.
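The bit count follows directly from the number of positions per track. The short sketch below works through that arithmetic; the helper name is hypothetical, and the total covers pulse positions only, leaving out any bits a codec might spend on pulse signs.

```python
def bits_for_positions(num_positions):
    """Smallest number of bits that can index num_positions distinct slots."""
    return max(1, (num_positions - 1).bit_length())


bits_per_pulse = bits_for_positions(16)   # sixteen positions per track
position_bits = 2 * bits_per_pulse        # two tracks, one pulse in each
```

Sixteen positions need four bits, so the two pulse positions of a subframe take eight bits in total.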
In the current example, the subframe 354, FIG. 4, has only 53 samples in the excitation, making positions 0-52 the only valid positions. Because of the way the tracks 406, 408, FIG. 5, are divided, the tracks 406, 408 contain positions that exceed the length of the original excitation. Positions 56 and 60 in track 1, and positions 57 and 61 in track 2, are invalid and unused. The locations of the first two pulses 310, 312, FIG. 4, correspond to sample thirteen and sample seventeen. By using the table 400, FIG. 5, it is determined that sample thirteen lies in position three 410 in the first track 406. The second pulse is in sample seventeen and lies in the second track 408 at position four 412. Therefore, the pulses can be represented and transmitted as four bits each. The other pulses 314, 316, 318, 320 and 322, FIG. 4, in the subframe 354 are ignored because the codebook has only two tracks.
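The two checks in this example, whether a table entry falls inside the 53-sample subframe and how two four-bit position indices might be carried together, can be sketched as follows. The function names and the byte layout (first track in the high four bits) are assumptions for illustration; the table of FIG. 5 itself is not reproduced here, so the sketch does not attempt to map sample numbers to track positions.

```python
SUBFRAME_LEN = 53  # samples in subframe 354, so valid sample positions are 0-52


def is_valid_position(sample_index):
    """True if the sample index lies inside the subframe excitation."""
    return 0 <= sample_index < SUBFRAME_LEN


def pack_indices(index_1, index_2):
    """Pack two 4-bit position indices into one byte (assumed layout:
    track 1 in the high four bits, track 2 in the low four bits)."""
    if not (0 <= index_1 < 16 and 0 <= index_2 < 16):
        raise ValueError("each position index must fit in four bits")
    return (index_1 << 4) | index_2


def unpack_indices(byte):
    """Recover the two 4-bit position indices from a packed byte."""
    return (byte >> 4) & 0xF, byte & 0xF


# The table entries named in the text as invalid all lie beyond sample 52.
unused = [p for p in (56, 60, 57, 61) if not is_valid_position(p)]

# Position three in the first track and position four in the second track.
code = pack_indices(3, 4)
```

Under this assumed layout the two positions travel as the single byte 0x34, and `unpack_indices(code)` returns the pair (3, 4) at the decoder.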
The only pulse position constraint is provided by the pulse positions in the tracks. Disadvantageously, the CELP vocoder tends to place pulses in adjacent positions in the tracks. When the pulses are placed in adjacent positions in the tracks, the start of the speech sound is encoded rather than a more balanced encoding of the utterance. Additionally, as the bit rate for the vocoder decreases and fewer pulses are used, the voice quality is adversely affected by inefficient placement of pulses into tracks. What is needed is a method of further constraining the placement of pulses in tracks in order to achieve a more balanced encoding of an utterance.
SUMMARY OF THE INVENTION
The inefficiency of track position placement is eliminated by the implementation of additional constraints that restrict the valid placement of pulses in the pulse position tracks. Constraining the placement of pulses in tracks in this way during encoding of a signal results in an increase in the signal quality of the decoded signal.
Dorvil Richemond
Grossman Patti & Brill
Lucent Technologies - Inc.