Reexamination Certificate
1999-03-15
2001-09-25
Knepper, David D. (Department: 2645)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S220000, C704S219000
active
06295520
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to methods and apparatus for encoding and decoding analog signals such as sound, and in particular speech signals, to and from digital codes. More particularly, this invention relates to methods and apparatus that convolve excitation signals with impulse response functions to form the sound contributions that make up a synthesized output sound signal.
2. Description of the Related Art
The structure and function of a codebook excited linear predictive (CELP) coder is well known in the art. The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) has published a recommended standard entitled “Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kbit/s,” G.723.1, 1996, Geneva, Switzerland, that specifies a coded representation that can be used for compressing speech or other audio signals for transmission at very low bit rates.
A speech coder complying with G.723.1 has an input of 16 bit linear Pulse Code Modulated sampled digital data. The sampling has a frequency rate of 8000 Hz. The samples are partitioned into frames of 240 samples that have a duration of 30 ms.
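The framing arithmetic above can be checked with a short sketch. It is illustrative only: the function names and the choice to drop a trailing partial frame are assumptions, not part of G.723.1 or of this patent.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate stated in the text
FRAME_SIZE = 240        # samples per frame stated in the text


def frame_duration_ms(frame_size=FRAME_SIZE, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Partition a sequence of PCM samples into consecutive full frames.

    Assumption: a trailing partial frame is dropped; a real coder would
    buffer it until enough samples arrive.
    """
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size]
            for i in range(0, last_start + 1, frame_size)]
```

With the stated values, 240 samples at 8000 Hz give 240 / 8000 s, which is the 30 ms frame duration quoted above.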
The faster transmission rate of 6.3 kbit/s uses a multipulse maximum likelihood algorithm to quantize each frame, and the slower transmission rate of 5.3 kbit/s uses an algebraic code-excited linear predictor algorithm to quantize each frame.
The digital channel data transferred from the encoding source to the decoder comprise the linear split predictor indices, the adaptive codebook gain and lag (the pitch information), and the fixed codebook index and gain (the residual information).
FIG. 1 shows a simplified block diagram of a decoder as shown in FIGS. 1 and 2 of G.723.1 and included herein by reference.
The channel data 100 is divided and preprocessed into the filter coefficients h(n) 115, which are retained in the buffer 110, and the pitch/excitation signals 125, which are retained in the buffer 120. The filter coefficients h(n) 115 determine the filter characteristics of the synthesis filter 130. The excitation signals e_i(n) 125 are the input stimuli to the synthesis filter 130. The excitation signals e_i(n) 125 are then filtered to provide the synthesized speech signal y(n) 135 for a frame of 240 samples. The synthesized speech signal y(n) 135 is a digital signal that is the input to a digital-to-analog converter (DAC) that will reproduce a facsimile of the original audio signal.
It is well known in the art that the filtering process is a convolution of the excitation signals e_i(n) 125 with the filter coefficients h(n) 115. The convolution of the excitation signals e_i(n) 125 with the filter coefficients h(n) is described by the following function:
\[ y(n) = e_i(n) * h(n) = \sum_{j=0}^{n} e_i(j)\, h(n-j) \qquad \text{(Eq. 1)} \]
where:
n is an index in the range 0 ≤ n ≤ N−1.
N is the number of samples within a frame of quantized speech.
j is the index counter for the summation.
e_i(n) is the n-th element of the vector e_i of the excitation signal 125.
h(n) is the vector of the filter coefficients 115.
y(n) is the synthesized speech signal 135.
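Eq. 1 can be written out directly as a double loop. The following is a minimal sketch in plain Python, assuming the excitation and the impulse response are given as equal-length sequences of N numbers; the function name `convolve_frame` is hypothetical.

```python
def convolve_frame(e, h):
    """Direct evaluation of Eq. 1: y(n) = sum_{j=0}^{n} e(j) * h(n - j).

    e -- excitation samples for one frame (length N)
    h -- impulse response coefficients (length N, an assumption here)
    Returns the N output samples of the frame.
    """
    n_samples = len(e)
    if len(h) != n_samples:
        raise ValueError("e and h must have the same length N")
    y = [0.0] * n_samples
    for n in range(n_samples):          # 0 <= n <= N-1
        acc = 0.0
        for j in range(n + 1):          # 0 <= j <= n
            acc += e[j] * h[n - j]
        y[n] = acc
    return y
```

For example, with e = [1, 2, 3] and h = [1, 0.5, 0.25], the result is y = [1, 2.5, 4.25], since y(2) = 1·0.25 + 2·0.5 + 3·1.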
FIG. 2 is a flow diagram of the operations necessary to complete the convolution of Eq. 1. A frame of the digital data describing the excitation signal e_i(n) and the impulse response with the filter coefficients h(n) is received and retained 200. A counter is initialized 205 to the number N of the pitch impulses or samples within the frame. The index counter n is initialized 210 to zero and then tested 215 to determine whether it is greater than N−1, one less than the number of samples N in the frame. If the index counter is not 218 greater than N−1, the value of the synthesized speech signal y(n) is initialized 220 to zero. The counter j for the summation is also initialized to zero. The contribution to the synthesized speech signal y(n) is then calculated 230 by the equation:
\[ y(n) = y(n) + e_i(j)\, h(n-j), \quad n = 0 \text{ to } (N-1) \qquad \text{(Eq. 2)} \]
The counter j for the summation is then incremented 235 and tested to determine whether it has exceeded the value of the index counter n. If the counter j has not 243 exceeded the value of the index counter n, an updated value of the synthesized speech signal is calculated 230 with new excitation signals e_i(j) and new impulse response coefficients h(n−j) as described in Eq. 2. This repeats until the value of the counter j of the summation is greater than 242 the value n of the index counter. When the value of the counter j is greater than 242 the index counter n, the index counter n is incremented 245 and then compared 215 to one less than the number of samples N.
The above-described steps are repeated until the index counter reaches the number of samples N. At this point all contributions to the synthesized speech signal y(n) have been determined, and a new frame of the digital data is received 200.
A calculation of one contribution to the synthesized speech signal y(n) requires (N+1)N/2 multiplications and (N−1)N/2 additions. This calculation of the algorithm has a delay of 37.5 ms.
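The operation counts can be checked by following the loop structure of FIG. 2 and counting as it runs. This is a sketch under stated assumptions: the function name is hypothetical, and the first product of each sum is stored rather than added to zero, which is the counting convention that reproduces the (N−1)N/2 additions quoted above.

```python
def convolve_with_counts(e, h):
    """Loop structure of FIG. 2 with multiplication/addition counters."""
    n_samples = len(e)
    y = [0.0] * n_samples
    mults = 0
    adds = 0
    n = 0
    while n <= n_samples - 1:           # test on the index counter n
        y[n] = 0.0                      # initialize y(n)
        j = 0                           # initialize summation counter j
        while j <= n:                   # loop until j exceeds n
            product = e[j] * h[n - j]   # one multiplication (Eq. 2)
            mults += 1
            if j == 0:
                y[n] = product          # first term: no addition counted
            else:
                y[n] += product
                adds += 1
            j += 1                      # increment j
        n += 1                          # increment n
    return y, mults, adds


N = 240
_, mults, adds = convolve_with_counts([1.0] * N, [1.0] * N)
print(mults, adds)
```

For N = 240 the formulas give (N+1)N/2 = 28,920 multiplications and (N−1)N/2 = 28,680 additions per frame.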
U.S. Pat. No. 5,754,976 (Adoul et al. 976) describes a method and device for drastically reducing the complexity of a codebook search while encoding a sound signal. The method and device are capable of selecting a priori a subset of the codebook pulse combinations and restricting the combinations to be searched to that subset. Further, the size of the codebook is increased by allowing the individual code vectors to assume at least one of multiple possible amplitudes, while not increasing search complexity.
U.S. Pat. No. 5,701,392 (Adoul et al. 392) provides methods for an algebraic codebook search to encode speech signals. The codebook of Adoul et al. 392 consists of a set of code vectors of 40 positions, each comprising multiple non-zero amplitudes assignable to predetermined positions. To reduce the search complexity, a depth-first search is used which involves a tree structure with ordered levels. A path building operation takes place at each level. A path originated at the first level and extended by the path building operations of subsequent levels determines the respective positions of the non-zero amplitudes of a candidate code vector. A signal-based pulse-position likelihood estimate is used during the first few levels to enable initial pulse screening to start the search on favorable conditions.
U.S. Pat. No. 4,944,013 (Gouvianakis et al.) teaches a method of coding speech such that it can be generated by a pulse excitation sequence in a linear predictive coding filter. The sequence contains, in each of successive frame periods, pulses whose positions and amplitudes may be varied. These variables are selected at the coding end to reduce the error between the input and regenerated speech signals. The selection process involves derivation of an initial estimate followed by an iterative adjustment process in which pulses having low energy contributions are tested in alternative positions and transferred to them if a reduced error results.
SUMMARY OF THE INVENTION
An object of this invention is to provide a method and device to encode frame data containing an excitation signal and impulse response filter coefficients, convolve the excitation signal and impulse response filter coefficients, and to produce a synthesized speech from the excitation signal and impulse response filter coefficients.
Another object of this invention is to provide a method to convolve the excitation signal and impulse response filter coefficients more efficiently and with fewer multiplications and additions.
To accomplish these and other objects a method to convolve begins by determining a number of non-zero pulses within the excitation signal. The pulse locations are sorted for the zero and nonzero pulses. The non-zero p
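The general idea of exploiting non-zero pulses can be sketched as follows. This is not the claimed method, whose description is cut off above; it is a generic sparse convolution written under the assumption that only a few excitation samples in a frame are non-zero, and the function name is hypothetical.

```python
def sparse_convolve(e, h):
    """Evaluate Eq. 1 by visiting only the non-zero excitation pulses.

    Each non-zero pulse at position p with amplitude a contributes
    a * h(n - p) to every output sample y(n) with n >= p.
    Returns (y, number_of_multiplications).
    """
    n_samples = len(e)
    # Locate the non-zero pulses; enumerate yields them in position order.
    pulses = [(pos, amp) for pos, amp in enumerate(e) if amp != 0]
    y = [0.0] * n_samples
    mults = 0
    for pos, amp in pulses:
        for n in range(pos, n_samples):
            y[n] += amp * h[n - pos]
            mults += 1
    return y, mults
```

For a frame of 4 samples with pulses at positions 0 and 2, this uses 4 + 2 = 6 multiplications instead of the (N+1)N/2 = 10 required by the direct double loop; the saving grows as the excitation becomes sparser relative to N.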
Ackerman Stephen B.
Knepper David D.
Knowles Billy
Saile George O.
Tritech Microelectronics Ltd.