Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission
Reexamination Certificate
2000-08-07
2004-04-27
To, Doris H. (Department: 2655)
C704S223000
active
06728669
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to voice compression, and in particular, to code excited linear prediction (CELP) vocoding.
A voice encoder/decoder (vocoder) compresses speech signals in order to reduce the transmission bandwidth required in a communications channel. By reducing the transmission bandwidth required per call, it is possible to increase the number of calls carried over the same communication channel. Early speech coding techniques, such as the linear predictive coding (LPC) technique, use a filter to remove the signal redundancy and hence compress the speech signal. The LPC filter reproduces a spectral envelope that attempts to model the human voice. Furthermore, the LPC filter is excited by quasi-periodic inputs for nasal and vowel sounds, and by noise-like inputs for unvoiced sounds.
There exists a class of vocoders known as code excited linear prediction (CELP) vocoders. CELP vocoding is primarily a speech data compression technique that, at 4-8 kbps, can achieve speech quality comparable to that of other speech coding techniques operating at 32 kbps. The CELP vocoder has two improvements over the earlier LPC techniques. First, the CELP vocoder attempts to capture more voice detail by extracting the pitch information using a pitch predictor. Second, the CELP vocoder excites the LPC filter with a noise-like signal derived from a residual signal created from the actual speech waveform.
CELP vocoders contain three main components: 1) a short term predictive filter, 2) a long term predictive filter, also known as the pitch predictor or adaptive codebook, and 3) a fixed codebook. Compression is achieved by assigning each component a number of bits such that the total is less than the number of bits used to represent the original speech signal. The first component uses linear prediction to remove short term redundancies in the speech signal. The error, or residual, signal that results from the short term predictor becomes the target signal for the long term predictor.
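As a rough illustration of the short term predictor, the sketch below subtracts a linear prediction from each sample to obtain the residual. The function name `lpc_residual` and the first-order coefficient 0.5 are assumptions made for the example; a real vocoder would estimate a higher-order set of coefficients for each frame.

```python
def lpc_residual(signal, coeffs):
    """Return e[n] = s[n] - sum_k coeffs[k] * s[n-1-k].

    Samples before the start of the signal are treated as zero.
    """
    residual = []
    for n, sample in enumerate(signal):
        prediction = 0.0
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                prediction += a * signal[n - 1 - k]
        residual.append(sample - prediction)
    return residual


# Toy input (hypothetical): each sample is half the previous one, so a
# first-order predictor with coefficient 0.5 predicts every sample after
# the first one exactly.
toy_signal = [1.0, 0.5, 0.25, 0.125]
print(lpc_residual(toy_signal, [0.5]))
```

Only the first sample survives in the residual here, which is the sense in which the predictor removes short term redundancy.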
Voiced speech has a quasi-periodic nature and the long term predictor extracts a pitch period from the residual and removes the information that can be predicted from the previous period. After the long term and short term predictive filters, the resulting residual signal is a mostly noise-like signal. Using analysis-by-synthesis, a fixed codebook search finds a best match to replace the noise-like residual with an entry from its library of vectors. The code representing the best matching vector is transmitted in place of the noisy residual. In algebraic CELP (ACELP) vocoders, the fixed codebook consists of a few non-zero pulses and is represented by the locations and signs (e.g. +1 or −1) of the pulses.
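The algebraic representation in the last sentence can be sketched as follows. This is a minimal illustration, not a codebook from any particular standard: the helper name `pulses_to_excitation`, the subframe length of 53, and the pulse positions and signs are all assumptions chosen for the example.

```python
def pulses_to_excitation(length, pulses):
    """Build a fixed codebook vector from (position, sign) pairs.

    Every sample is zero except at the pulse positions, which hold +1 or -1.
    """
    excitation = [0] * length
    for position, sign in pulses:
        if not 0 <= position < length:
            raise ValueError("pulse position outside the subframe")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        excitation[position] += sign
    return excitation


# Hypothetical two-pulse vector for a 53-sample subframe.
vector = pulses_to_excitation(53, [(5, 1), (20, -1)])
print([n for n, v in enumerate(vector) if v != 0])
```

Because the vector is fully determined by the positions and signs, only those values need to be coded, not the 53 samples themselves.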
In a typical implementation, a CELP vocoder will block or divide the incoming speech signal into frames, updating the short term predictor's LPC coefficients once per frame. The LPC residual is then divided into subframes for the long term predictor and the fixed codebook search. For example, the input speech may be blocked into a 160-sample frame for the short term predictor. The resulting frame may then be broken up into subframes of 53 samples, 53 samples, and 54 samples. Each subframe is then processed by the long term predictor and the fixed codebook search.
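The framing arithmetic above (160 = 53 + 53 + 54) can be sketched as a simple split. The function name `split_subframes` is an assumption for illustration, and the frame contents are placeholder sample indices rather than speech data.

```python
def split_subframes(frame, lengths=(53, 53, 54)):
    """Split one frame into consecutive subframes of the given lengths."""
    if sum(lengths) != len(frame):
        raise ValueError("subframe lengths must add up to the frame length")
    subframes = []
    start = 0
    for length in lengths:
        subframes.append(frame[start:start + length])
        start += length
    return subframes


# Placeholder frame: sample indices 0..159 stand in for real samples.
frame = list(range(160))
subframes = split_subframes(frame)
print([len(s) for s in subframes])
```

Each returned subframe would then be handed to the long term predictor and the fixed codebook search in turn.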
Referring to FIG. 1, an example of a single frame of a speech signal 100 is shown. The speech signal 100 is made up of voiced and unvoiced signals of different pitches. The speech signal 100 is received by a CELP vocoder having an LPC filter. The first step of the CELP vocoder is to remove short term redundancies in the speech signal. The resulting signal with the short term redundancies removed is the residual speech signal 200, FIG. 2.
The LPC filter is unable to remove all of the redundant information, and the remaining quasi-periodic peaks and valleys in the filtered speech signal 200 are referred to as pitch pulses. The short term predictive filter is then applied to speech signal 200, resulting in the short term filtered signal 300, FIG. 3. The long term predictor filter removes the quasi-periodic pitch pulses from the residual speech signal 300, FIG. 3, resulting in a mostly noise-like signal 400, FIG. 4, which becomes the target signal for the fixed codebook search.
FIG. 4 is a plot of a 160-sample frame of a fixed codebook target signal 350 divided into three subframes 354, 356, 358. The code value is then transmitted across the communication network.
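The pitch-removal step that produces the fixed codebook target can be sketched as below, under simplifying assumptions: a single-tap predictor, an exhaustive correlation search for the lag, and a synthetic pulse train in place of a real residual. The names `best_lag` and `long_term_residual`, the 20-80 lag range, and the unit gain are all hypothetical.

```python
def best_lag(x, min_lag, max_lag):
    """Pick the lag whose delayed copy correlates most strongly with x."""
    def correlation(lag):
        return sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    return max(range(min_lag, max_lag + 1), key=correlation)


def long_term_residual(x, lag, gain):
    """Subtract gain * x[n - lag] from each sample that has a delayed partner."""
    return [x[n] - gain * x[n - lag] if n >= lag else x[n]
            for n in range(len(x))]


# Synthetic residual (assumption): unit pitch pulses every 40 samples.
residual = [1.0 if n % 40 == 0 else 0.0 for n in range(160)]
lag = best_lag(residual, 20, 80)
target = long_term_residual(residual, lag, 1.0)
print(lag, sum(abs(v) for v in target))
```

On this idealized input every pulse after the first is predicted from the one a pitch period earlier, so almost nothing periodic remains for the fixed codebook to code.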
In FIG. 5, the lookup table 470 that maps the position of the pulses in a subframe is shown. The pulses within the subframe are constrained to lie in one of sixteen possible positions 402 within the lookup table. Because each track 404 has sixteen possible positions 402, only four bits are required to identify each pulse location. Each pulse mapping occurs in an individual track 404. Therefore, two tracks 406, 408 enable the mapping of the pulse positions of two signal pulses from the subframe.
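The bit count follows from the track size: sixteen positions need four bits, so two tracks need eight bits per subframe for pulse locations. The sketch below packs two track indices into one byte; the function names and the byte layout (first track in the high nibble) are assumptions for illustration, and pulse signs are left out.

```python
POSITIONS_PER_TRACK = 16
# Number of bits needed to index 16 positions (0..15).
BITS_PER_PULSE = (POSITIONS_PER_TRACK - 1).bit_length()


def pack_two_pulses(index_track1, index_track2):
    """Pack two track indices (each 0..15) into a single 8-bit code."""
    for index in (index_track1, index_track2):
        if not 0 <= index < POSITIONS_PER_TRACK:
            raise ValueError("track index out of range")
    return (index_track1 << BITS_PER_PULSE) | index_track2


def unpack_two_pulses(code):
    """Recover the two track indices from the 8-bit code."""
    mask = POSITIONS_PER_TRACK - 1
    return (code >> BITS_PER_PULSE) & mask, code & mask


code = pack_two_pulses(3, 4)
print(BITS_PER_PULSE, code, unpack_two_pulses(code))
```

The decoder needs the same lookup table to turn each index back into a sample position within the subframe.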
In the current example, the subframe 354, FIG. 4, has only 53 samples in the excitation, making positions 0-52 the only valid positions. Because of the way the tracks 406, 408, FIG. 5, are divided, positions that exceed the length of the original excitation are present in each track. Positions 56 and 60 in track 1, and positions 57 and 61 in track 2, are invalid and unused. The locations of the first two pulses 310, 312, FIG. 4, correspond to sample thirteen and sample seventeen. By using the table 400, FIG. 5, it is determined that sample thirteen lies in position three 410 in the first track 406. The second pulse is in sample seventeen and lies in the second track 408 at position four 412. Therefore, each pulse can be represented and transmitted using four bits. The other pulses 314, 316, 318, 320 and 322, FIG. 4, in the subframe 354 are ignored because the codebook has only two tracks.
The pulse position is constrained by the absolute pulse position in the tracks. Disadvantageously, the CELP vocoder tends to place pulses in adjacent positions in the tracks. When the pulses are placed in adjacent positions in the tracks, the start of the speech sound is encoded rather than a more balanced encoding of the utterance. Additionally, as the bit rate for the vocoder decreases and fewer pulses are used, the voice quality is adversely affected by inefficient placement of pulses into tracks. What is needed is a method to reduce the occurrence of pulses being placed in adjacent track positions.
SUMMARY OF THE INVENTION
The inefficiency of absolute track position placement is eliminated by placing a signal pulse in a second track relative to the position of a signal pulse in the first track. Relative positioning of the N+1 signal pulses in the N+1 tracks during encoding results in increased quality of the decoded signal. The increased signal quality is achieved by more precise placement of pulses in the tracks and by reducing the occurrence of adjacent placement of signal pulse positions within the tracks.
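The summary does not spell out how the relative position is coded, so the following is only one hypothetical reading, intended to contrast relative with absolute addressing: the four-bit code for the second pulse is treated as an offset from the first pulse instead of an absolute slot in a track. The names `encode_relative` and `decode_relative`, the four-bit width, and the offset range of 1 to 16 samples are assumptions, not details taken from the patent.

```python
OFFSET_BITS = 4  # assumption: same width as one absolute track index


def encode_relative(first_pos, second_pos):
    """Code the second pulse as its distance (1..16 samples) after the first."""
    offset = second_pos - first_pos
    if not 1 <= offset <= 2 ** OFFSET_BITS:
        raise ValueError("offset not representable in the assumed code width")
    return offset - 1


def decode_relative(first_pos, code):
    """Recover the second pulse position from the first position and the code."""
    return first_pos + code + 1


# Hypothetical pulse pair at samples 13 and 17.
code = encode_relative(13, 17)
print(code, decode_relative(13, code))
```

Under this reading the same four bits always describe positions near the first pulse, wherever that pulse falls in the subframe.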
Lucent Technologies - Inc.
Opsasnick Michael N.
To Doris H.