Reusing invalid pulse positions in CELP vocoding

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

C704S229000, C704S230000

active

06385574

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates to voice compression, and in particular, to code excited linear prediction (CELP) vocoding.
A voice encoder/decoder (vocoder) compresses speech signals in order to reduce the transmission bandwidth required in a communications channel. By reducing the transmission bandwidth required per call, it is possible to increase the number of calls carried over the same communication channel. Early speech coding techniques, such as linear predictive coding (LPC), use a filter to remove signal redundancy and hence compress the speech signal. The LPC filter reproduces a spectral envelope that attempts to model the human voice. The LPC filter is excited by quasi-periodic inputs for nasal and vowel sounds and by noise-like inputs for unvoiced sounds.
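The classic LPC excitation model described above can be sketched as follows. This is a minimal illustration, not the patent's method. An impulse train at the pitch period stands in for voiced sounds, and seeded Gaussian noise stands in for unvoiced sounds. Either excitation then drives an all-pole synthesis filter y[n] = x[n] + Σ a_k y[n-k]. The function names, the coefficient values in the usage, and the zero initial filter state are illustrative assumptions.

```python
import random

def lpc_excitation(length, voiced, pitch_period=None, seed=0):
    """Illustrative LPC excitation: impulse train if voiced, white noise if unvoiced."""
    if voiced:
        if not pitch_period or pitch_period <= 0:
            raise ValueError("voiced excitation needs a positive pitch_period")
        # One unit impulse at the start of every pitch period.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def lpc_synthesize(excitation, coeffs):
    """All-pole filter y[n] = x[n] + sum_k a_k * y[n-k], zero state before n = 0."""
    output = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y += a * output[n - k]
        output.append(y)
    return output
```

For example, `lpc_synthesize(lpc_excitation(160, voiced=True, pitch_period=40), [0.5])` produces a decaying echo after each of four pitch impulses. Real LPC coders use higher filter orders and estimate the coefficients from the speech.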
There exists a class of vocoders known as code excited linear prediction (CELP) vocoders. CELP vocoding is a speech data compression technique that, at 4-8 kbps, can achieve speech quality comparable to that of other speech coding techniques operating at 32 kbps. The CELP vocoder has two improvements over the earlier LPC techniques. First, the CELP vocoder attempts to capture more voice detail by extracting the pitch information using a pitch predictor. Second, the CELP vocoder excites the LPC filter with a noise-like signal derived from a residual signal created from the actual speech waveform. CELP vocoders contain three main components: 1) a short term predictive filter, 2) a long term predictive filter, also known as a pitch predictor or adaptive codebook, and 3) a fixed codebook. Compression is achieved by assigning each component a number of bits such that the total is less than the number of bits used to represent the original speech signal. The first component uses linear prediction to remove short term redundancies in the speech signal. The error, or residual, signal that results from the short term predictor becomes the target signal for the long term predictor.
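The short term predictor's residual is the prediction error e[n] = s[n] - Σ a_k s[n-k], where k runs from 1 to p. The sketch below computes it under two assumptions. First, the coefficients a_1..a_p are supplied by the caller; in practice they would be estimated once per frame, for example with the Levinson-Durbin recursion, which is not shown. Second, samples before the start of the buffer are treated as zero. The name `lpc_residual` is illustrative.

```python
def lpc_residual(signal, coeffs):
    """Short term prediction error e[n] = s[n] - sum_{k=1..p} a_k * s[n-k].

    coeffs holds a_1..a_p; history before the first sample is taken as zero.
    """
    p = len(coeffs)
    residual = []
    for n, sample in enumerate(signal):
        prediction = 0.0
        for k in range(1, p + 1):
            if n - k >= 0:
                prediction += coeffs[k - 1] * signal[n - k]
        residual.append(sample - prediction)
    return residual
```

A signal that exactly follows s[n] = 0.9 s[n-1] is fully predictable after its first sample. With `coeffs=[0.9]`, its residual is therefore a single nonzero value followed by zeros. This is the redundancy removal the text describes.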
Voiced speech has a quasi-periodic nature, and the long term predictor extracts a pitch period from the residual and removes the information that can be predicted from the previous period. After the short term and long term filters, the residual signal is mostly noise-like. Using analysis-by-synthesis, the fixed codebook search finds the entry in its library of vectors that best matches this noise-like residual. The code representing the best matching vector is transmitted in place of the noisy residual. In algebraic CELP (ACELP) vocoders, each fixed codebook vector consists of a few non-zero pulses and is represented by the locations and signs (+1 or −1) of those pulses.
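To make the ACELP representation concrete, the sketch below builds a fixed codebook vector from (position, sign) pairs. Every sample is zero except at the pulse locations. The function name is illustrative. Letting pulses at the same position add together is an assumption, not something the text specifies.

```python
def algebraic_codevector(length, pulses):
    """Build an ACELP-style excitation from (position, sign) pairs.

    Each pulse adds +1 or -1 at its sample position; all other samples are 0.
    Pulses that share a position are summed (an assumption for this sketch).
    """
    vector = [0] * length
    for position, sign in pulses:
        if not 0 <= position < length:
            raise ValueError(f"pulse position {position} outside 0..{length - 1}")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        vector[position] += sign
    return vector
```

Only the positions and signs need to be transmitted, because the decoder can rebuild the whole vector from them with the same rule.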
In a typical implementation, a CELP vocoder divides the incoming speech signal into frames and updates the short term predictor's LPC coefficients once per frame. The LPC residual is then divided into subframes for the long term predictor and the fixed codebook search. For example, the input speech may be divided into 160-sample frames for the short term predictor, and the resulting residual may then be broken into three subframes of 53, 53, and 54 samples. Each subframe is then processed by the long term predictor and the fixed codebook search.
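The frame and subframe sizes above can be illustrated with a small helper that slices one 160-sample frame into 53-, 53-, and 54-sample subframes. The name `split_frame` and the choice to reject frames of the wrong length are assumptions made for this example.

```python
SUBFRAME_LENGTHS = (53, 53, 54)  # example sizes from the text; they sum to 160

def split_frame(frame, subframe_lengths=SUBFRAME_LENGTHS):
    """Slice one frame of LPC residual into consecutive subframes."""
    if len(frame) != sum(subframe_lengths):
        raise ValueError(f"expected {sum(subframe_lengths)} samples, got {len(frame)}")
    subframes, start = [], 0
    for length in subframe_lengths:
        subframes.append(frame[start:start + length])
        start += length
    return subframes
```

As the text describes, the LPC coefficients are computed once for the whole frame before this split. The long term predictor and the fixed codebook search then run once per subframe.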
Referring to FIG. 1, an example of a single frame of a speech signal 100 is shown. The speech signal 100 is made up of voiced and unvoiced signals of different pitches. The speech signal 100 is received by a CELP vocoder having an LPC filter. The first step of the CELP vocoder is to remove short term redundancies in the speech signal. The resulting signal, with the short term redundancies removed, is the residual speech signal 200, FIG. 2.
The LPC filter is unable to remove all of the redundant information, and the remaining quasi-periodic peaks and valleys in the filtered speech signal 200 are referred to as pitch pulses. The short term predictive filter is then applied to speech signal 200, resulting in the short term filtered signal 300, FIG. 3. The long term predictor filter removes the quasi-periodic pitch pulses from the residual speech signal 300, FIG. 3, resulting in a mostly noise-like signal 400, FIG. 4, which becomes the target signal for the fixed codebook search.
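The long term predictor's removal of pitch pulses can be sketched as a simplified open-loop, single-tap predictor. It searches integer lags T for the lag and gain g that best predict r[n] from r[n - T], then subtracts that prediction. The sketch rests on several assumptions: the caller supplies the lag range, and samples earlier than the chosen lag pass through unchanged. The function name is also hypothetical. Many standardized CELP coders instead use a closed-loop adaptive codebook search with fractional lags, which this sketch does not attempt.

```python
def long_term_predict(residual, min_lag, max_lag):
    """Open-loop, single-tap pitch predictor (simplified sketch).

    Returns (T, g, output) where output[n] = r[n] - g * r[n - T] for n >= T.
    Samples before T have no earlier period in the buffer and are passed through.
    """
    best_score, best_lag, best_gain = 0.0, min_lag, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(residual[n] * residual[n - lag] for n in range(lag, len(residual)))
        den = sum(residual[n - lag] ** 2 for n in range(lag, len(residual)))
        if den == 0:
            continue
        score = num * num / den  # error energy removed by the optimal gain num/den
        if score > best_score:
            best_score, best_lag, best_gain = score, lag, num / den
    output = [
        r - best_gain * residual[n - best_lag] if n >= best_lag else r
        for n, r in enumerate(residual)
    ]
    return best_lag, best_gain, output
```

For a pulse train with period 20, the search settles on lag 20 with unit gain. Every pulse after the first is then predicted exactly and removed. The first pulse has no previous period in the buffer, so it survives.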
FIG. 4 is a plot of a 160-sample frame of the fixed codebook target signal 350, divided into three subframes 354, 356, 358. The code value selected by the fixed codebook search for each subframe is then transmitted across the communication network.
FIG. 5 shows the lookup table 400 used to map the positions of the pulses in a subframe. Each pulse within the subframe is constrained to lie in one of sixteen possible positions 402 of a track 404 in the lookup table. Because each track 404 has sixteen possible positions 402, only four bits are required to identify each pulse location. Each pulse is mapped in its own track 404, so two tracks 406, 408 are required to represent the positions of two pulses in the subframe.
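The four-bit track encoding can be sketched as below. The exact contents of FIG. 5 are not reproduced here. Instead, the sketch assumes a hypothetical interleaved layout in which a track holds positions offset, offset + 4, offset + 8, and so on, for sixteen entries. A pulse position is encoded as its index on the track, and one four-bit field per track is packed into a single codeword. All function names are illustrative.

```python
POSITIONS_PER_TRACK = 16
BITS_PER_PULSE = 4  # 2**4 == 16 positions per track

def interleaved_track(offset, step=4, size=POSITIONS_PER_TRACK):
    """Hypothetical track layout: offset, offset+step, offset+2*step, ..."""
    return [offset + step * i for i in range(size)]

def encode_position(track, sample):
    """Return the 4-bit index of `sample` on `track` (ValueError if absent)."""
    return track.index(sample)

def decode_position(track, index):
    """Map a 4-bit index back to a sample position."""
    return track[index]

def pack_indices(indices):
    """Concatenate one 4-bit field per track, first track in the high bits."""
    word = 0
    for index in indices:
        if not 0 <= index < 2 ** BITS_PER_PULSE:
            raise ValueError(f"index {index} does not fit in {BITS_PER_PULSE} bits")
        word = (word << BITS_PER_PULSE) | index
    return word

def unpack_indices(word, num_tracks):
    """Inverse of pack_indices."""
    mask = 2 ** BITS_PER_PULSE - 1
    indices = []
    for _ in range(num_tracks):
        indices.append(word & mask)
        word >>= BITS_PER_PULSE
    return indices[::-1]
```

Under this assumed layout, both tracks reach positions up to 60 or 61. As a result, some of their sixteen indices point past the end of a 53- or 54-sample subframe.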
In the current example of table 400, the subframe 354, FIG. 4, has only 53 samples in the excitation, making positions 0-52 the only valid positions. Because of the way the tracks 406, 408, FIG. 5, are divided, they contain positions that exceed the length of the original excitation: positions 56 and 60 in track 1, and positions 57 and 61 in track 2, are invalid. The locations of the first two pulses 310, 312, FIG. 4, correspond to sample thirteen and sample seventeen. Using the table 400, FIG. 5, sample thirteen is found to lie in position three 410 of the first track 406, and sample seventeen lies at position four 412 of the second track 408. Therefore, each of these pulses can be represented and transmitted with four bits. The other pulses 314, 316, 318, 320 and 322, FIG. 4, in the subframe 354 are ignored because the codebook has only two tracks.
Regardless of the reason why a pulse position in a track may be invalid, invalid track positions are simply excluded from the search for the best combination of pulse positions. This represents an inefficient use of the 2^n track positions permitted by the “n” bits used to encode the pulse positions. What is needed is a way to efficiently use all 2^n track positions, thus eliminating invalid positions.
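The conventional behavior described here can be sketched in two parts: a search that skips invalid positions, and a count of the codes this wastes. One simplification is deliberate. Candidates are ranked by the magnitude of the target sample, standing in for the full analysis-by-synthesis search through the weighted synthesis filter. The track contents and function names are illustrative assumptions.

```python
def valid_positions(track, subframe_length):
    """Track positions that fall inside the subframe (0..subframe_length-1)."""
    return [p for p in track if 0 <= p < subframe_length]

def wasted_codes(track, subframe_length, bits=4):
    """How many of the 2**bits codes on this track point outside the subframe."""
    return 2 ** bits - len(valid_positions(track, subframe_length))

def best_pulse(target, track):
    """Choose one pulse on `track`, searching only valid positions.

    Simplification: ranks candidates by |target[p]| rather than running a
    full analysis-by-synthesis search. Returns (index on track, sign).
    """
    candidates = valid_positions(track, len(target))
    if not candidates:
        raise ValueError("no valid positions on this track")
    best = max(candidates, key=lambda p: abs(target[p]))
    sign = 1 if target[best] >= 0 else -1
    return track.index(best), sign
```

Consider a 53-sample subframe and an assumed track holding 0, 4, ..., 60. The indices for positions 56 and 60 can never be selected, so 2 of the 16 four-bit codes on that track carry no information.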
SUMMARY OF THE INVENTION
The inefficiency and waste of the invalid track positions are eliminated by assigning additional valid pulse positions to the invalid track positions or by placing data into the invalid track positions. Assigning additional valid positions to invalid track positions increases the accuracy and quality of the resulting voice signal at a receiving CELP vocoder. The invalid track positions may also be used selectively as flags to indicate to the receiving CELP vocoder a change in the processing of the voice signal or in how the subsequent encoded bits are to be interpreted.
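The summary does not fix a rule for which extra positions or flags go into which invalid slots. The sketch below therefore assumes a hypothetical one. Encoder and decoder share a per-track table in which the first `flag_count` invalid slots become signaling flags. Later invalid slots are filled, in order, with caller-chosen extra positions that are valid samples not already on the track. The function names and the tuple tags are illustrative.

```python
def build_reuse_table(track, subframe_length, extra_positions=(), flag_count=0):
    """Decoding table for one track in which invalid slots are reused.

    Hypothetical rule: the first `flag_count` invalid slots become flags,
    later invalid slots take `extra_positions` in order, and any remaining
    invalid slots are marked unused.
    """
    for extra in extra_positions:
        if not 0 <= extra < subframe_length or extra in track:
            raise ValueError(f"extra position {extra} must be a valid sample not already on the track")
    extras = iter(extra_positions)
    table = []
    flags_assigned = 0
    for position in track:
        if 0 <= position < subframe_length:
            table.append(("pulse", position))        # ordinary valid slot
        elif flags_assigned < flag_count:
            table.append(("flag", flags_assigned))   # slot reused as a signal
            flags_assigned += 1
        else:
            extra = next(extras, None)               # slot reused as a new position
            table.append(("pulse", extra) if extra is not None else ("unused", None))
    return table

def decode_index(table, index):
    """Receiver side: interpret a 4-bit index using the shared table."""
    return table[index]
```

Take the assumed track 0, 4, ..., 60 and a 53-sample subframe. Passing `extra_positions=[2, 6]` turns indices 14 and 15 into pulses at samples 2 and 6, positions the track could not otherwise reach. Passing `flag_count=1` instead turns index 14 into a flag the receiver can act on.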

