Voice encoding and voice decoding using an adaptive codebook...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate


U.S. classifications: C704S264000, C704S265000, C704S223000, C704S207000, C704S262000

Status: active

Patent number: 06594626


BACKGROUND OF THE INVENTION
This invention relates to a voice encoding and voice decoding apparatus for encoding/decoding voice at a low bit rate of below 4 kbps. More particularly, the invention relates to a voice encoding and voice decoding apparatus for encoding/decoding voice at low bit rates using an A-b-S (Analysis-by-Synthesis)-type vector quantization. It is expected that A-b-S voice encoding typified by CELP (Code Excited Linear Predictive Coding) will be an effective scheme for implementing highly efficient compression of information while maintaining speech quality in digital mobile communications and intercorporate communications systems.
In the field of digital mobile communications and intercorporate communications systems at the present time, it is desired that voice in the telephone band (0.3 to 3.4 kHz) be encoded at a transmission rate on the order of 4 kbps. The scheme referred to as CELP (Code Excited Linear Prediction) is seen as having promise in filling this need. For details on CELP, see M. R. Schroeder and B. S. Atal, “Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates,” Proc. ICASSP'85, 25.1.1, pp. 937-940, 1985. CELP is characterized by the efficient transmission of linear prediction coefficients (LPC coefficients), which represent the speech characteristics of the human vocal tract, and parameters representing a sound-source signal comprising the pitch component and noise component of speech.
FIG. 15 is a diagram illustrating the principles of CELP. In accordance with CELP, the human vocal tract is approximated by an LPC synthesis filter H(z) expressed by the following equation:
\[
H(z) = \frac{1}{1 + \sum_{i=1}^{p} \alpha_i z^{-i}} \tag{1}
\]
and it is assumed that the input (sound-source signal) to H(z) can be separated into (1) a pitch-period component representing the periodicity of speech and (2) a noise component representing randomness. CELP, rather than transmitting the input voice signal to the decoder side directly, extracts the filter coefficients of the LPC synthesis filter and the pitch-period component and noise component of the excitation signal, quantizes these to obtain quantization indices and transmits the quantization indices, thereby implementing a high degree of information compression.
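As a minimal sketch of the synthesis side of this model, the all-pole filter of Equation (1) reduces to the difference equation y(n) = x(n) − Σ αᵢ y(n−i); the function below (the name and interface are illustrative, not from the patent) runs an excitation signal through it:

```python
def lpc_synthesize(excitation, a):
    """Apply the all-pole LPC synthesis filter of Equation (1),
    H(z) = 1 / (1 + sum_i a[i-1] * z^-i), as the difference
    equation y[n] = x[n] - sum_i a[i-1] * y[n-i]."""
    p = len(a)
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for i in range(1, p + 1):
            if n - i >= 0:
                acc -= a[i - 1] * y[n - i]
        y.append(acc)
    return y
```

Feeding a unit impulse through a first-order filter with coefficient −0.5, for example, yields the geometric impulse response 1, 0.5, 0.25, . . .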
When the voice signal is sampled at a predetermined speed in FIG. 15, input signals (voice signals) X of a predetermined number (=N) of samples per frame are input to an LPC analyzer 1 frame by frame. If the sampling speed is 8 kHz and the period of a single frame is 10 ms, then one frame is composed of 80 samples.
The LPC analyzer 1, which regards the vocal tract as the all-pole filter represented by Equation (1), obtains the filter coefficients α_i (i=1, . . . , p), where p represents the order of the filter. Generally, in the case of voice in the telephone band, a value of 10 to 12 is used as p. The LPC coefficients α_i (i=1, . . . , p) are quantized by scalar quantization or vector quantization in an LPC-coefficient quantizer 2, after which the quantization indices are transmitted to the decoder side.
FIG. 16 is a diagram useful in describing the quantization method. Here a large number of sets of quantized LPC coefficients are stored in a quantization table 2a in correspondence with index numbers 1 to n. A distance calculation unit 2b calculates distance in accordance with the following equation:
\[
d = W \sum_{i=1}^{p} \{ \alpha_q(i) - \alpha_i \}^2
\]
When q is varied from 1 to n, a minimum-distance index detector 2c finds the q for which the distance d is minimum and sends this index q to the decoder side. In this case, the LPC synthesis filter constituting an auditory weighting synthesis filter 3 is expressed by the following equation:
\[
H_q(z) = \frac{1}{1 + \sum_{i=1}^{p} \alpha_q(i) z^{-i}} \tag{2}
\]
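The table search described above (sweep q from 1 to n and keep the index minimizing d) can be sketched as follows; the function name and table layout are illustrative assumptions, not the patent's implementation:

```python
def quantize_lpc(alpha, table, w=1.0):
    """Return the 1-based index q (and distance) minimizing the weighted
    distance d = w * sum_i (alpha_q(i) - alpha_i)^2 over a codebook
    `table` of candidate LPC coefficient sets."""
    best_q, best_d = 0, float("inf")
    for q, cand in enumerate(table, start=1):
        d = w * sum((cq - ai) ** 2 for cq, ai in zip(cand, alpha))
        if d < best_d:
            best_q, best_d = q, d
    return best_q, best_d
```

Only the winning index q need be transmitted; the decoder recovers the coefficients by looking up the same table entry.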
Next, quantization of the sound-source signal is carried out. In accordance with CELP, the sound-source signal is divided into two components, namely a pitch-period component and a noise component; an adaptive codebook 4 storing a sequence of past sound-source signals is used to quantize the pitch-period component, and an algebraic codebook or noise codebook is used to quantize the noise component. Described below is typical CELP-type voice encoding using the adaptive codebook 4 and algebraic codebook 5 as sound-source codebooks.
The adaptive codebook 4 is adapted to successively output N-sample sound-source signals (referred to as "periodicity signals"), each successively delayed by one sample, in association with indices 1 to L.
FIG. 17 is a diagram showing the structure of the adaptive codebook 4 in a case where L=147 and one frame is 80 samples (N=80). The adaptive codebook is constituted by a buffer BF storing the pitch-period component of the latest 227 samples. A periodicity signal comprising samples 1 to 80 is specified by index 1, a periodicity signal comprising samples 2 to 81 by index 2, . . . , and a periodicity signal comprising samples 147 to 227 by index 147.
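Under this layout, extracting the adaptive code vector for a given index is just a sliding window over the buffer BF; a sketch, assuming the buffer is a plain list of samples:

```python
def adaptive_code_vector(bf, index, n=80):
    """Extract the N-sample periodicity signal for a 1-based index:
    index 1 yields samples 1..N, index 2 yields samples 2..N+1, and
    so on, sliding one sample per index over the buffer BF."""
    start = index - 1  # convert 1-based index to 0-based offset
    return bf[start:start + n]
```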
An adaptive-codebook search is performed in accordance with the following procedure. First, a pitch lag L representing the lag from the present frame is set to an initial value L_0 (e.g., 20). Next, a past periodicity signal (adaptive code vector) P_L, which corresponds to the lag L, is extracted from the adaptive codebook 4. That is, the adaptive code vector P_L indicated by index L is extracted, and P_L is input to the auditory weighting synthesis filter 3 to obtain an output AP_L, where A represents the impulse response of the auditory weighting synthesis filter 3, constructed by cascade-connecting an auditory weighting filter W(z) and the LPC synthesis filter Hq(z).
Any filter can be used as the auditory weighting filter. For example, it is possible to use a filter having the characteristic indicated by the following equation:
\[
W(z) = \frac{1 + \sum_{i=1}^{m} g_1^{\,i} \alpha_i z^{-i}}{1 + \sum_{i=1}^{m} g_2^{\,i} \alpha_i z^{-i}} \tag{3}
\]
where g_1 and g_2 are parameters for adjusting the characteristic of the weighting filter.
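Equation (3) amounts to two bandwidth-expanded copies of the LPC coefficients, with α_i scaled by g_1^i in the numerator and g_2^i in the denominator. A sketch of computing those coefficient sets (the function name is illustrative):

```python
def weighting_coeffs(a, g1, g2):
    """Coefficient sets for W(z) of Equation (3): the numerator uses
    g1**i * a_i and the denominator g2**i * a_i (i = 1..m)."""
    num = [g1 ** (i + 1) * ai for i, ai in enumerate(a)]
    den = [g2 ** (i + 1) * ai for i, ai in enumerate(a)]
    return num, den
```

With g_1 close to 1 and g_2 smaller, W(z) de-emphasizes the error energy near the spectral peaks, where quantization noise is perceptually masked by the speech itself.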
An arithmetic unit 6 finds the error power E_L between the input voice X and βAP_L in accordance with the following equation:

\[
E_L = |X - \beta AP_L|^2 \tag{4}
\]
If we let AP_L represent the weighted synthesized output from the adaptive codebook, R_pp the autocorrelation of AP_L, and R_xp the cross-correlation between AP_L and the input signal X, then the adaptive code vector P_L at the pitch lag Lopt for which the error power of Equation (4) is minimum is expressed by the following equation:
\[
P_L = \arg\max_L \left( \frac{R_{xp}^2}{R_{pp}} \right) = \arg\max_L \left[ \frac{(X^T AP_L)^2}{(AP_L)^T (AP_L)} \right] \tag{5}
\]
where T signifies transposition. Accordingly, an error-power evaluation unit 7 finds the pitch lag Lopt that satisfies Equation (5). The optimum pitch gain βopt is given by the following equation:

\[
\beta_{opt} = R_{xp} / R_{pp} \tag{6}
\]
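Equations (4) to (6) combine into a small closed-loop search. In the sketch below, `synth_outputs` is a hypothetical mapping from each candidate lag to its precomputed weighted-synthesis output AP_L; how those outputs are produced is covered by the filtering described above:

```python
def adaptive_codebook_search(x, synth_outputs, lags):
    """For each candidate lag, score (X^T AP_L)^2 / ((AP_L)^T AP_L) as in
    Equation (5); return the maximizing lag and beta_opt = Rxp/Rpp (6)."""
    best_lag, best_score, best_gain = None, -1.0, 0.0
    for lag in lags:
        ap = synth_outputs[lag]
        rxp = sum(xi * yi for xi, yi in zip(x, ap))  # cross-correlation Rxp
        rpp = sum(yi * yi for yi in ap)              # autocorrelation Rpp
        if rpp > 0 and rxp * rxp / rpp > best_score:
            best_score = rxp * rxp / rpp
            best_lag, best_gain = lag, rxp / rpp
    return best_lag, best_gain
```

Maximizing R_xp²/R_pp avoids evaluating the error power of Equation (4) directly, since the |X|² term is the same for every candidate lag.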
Though the search range of lag L is optional, the lag range can be made 20 to 147 in a case where the sampling frequency of the input signal is 8 kHz.
Next, the noise component contained in the sound-source signal is quantized using the algebraic codebook 5. The algebraic codebook 5 is constituted by a plurality of pulses of amplitude +1 or −1. By way of example, FIG. 18 illustrates pulse positions for a case where the frame length is 40 samples. The algebraic codebook 5 divides the N (=40) sampling points constituting one frame into a plurality of pulse-system groups 1 to 4 and, for all combinations obtained by extracting one sampling point from each of the pulse-system groups, successively outputs, as noise components, pulsed signals having a +1 or a −1 pulse at each extracted sampling point. In this example, basically four pulses are deployed per frame.
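The combinatorial structure of such a codebook can be sketched by enumerating one signed pulse per group; the track layout passed in is a made-up example, not the FIG. 18 assignment:

```python
from itertools import product

def algebraic_codevectors(tracks, n):
    """Yield every excitation vector formed by placing one pulse of
    amplitude +1 or -1 at one position chosen from each pulse-system
    group (track) of an n-sample frame."""
    for positions in product(*tracks):
        for signs in product((1, -1), repeat=len(tracks)):
            v = [0] * n
            for pos, s in zip(positions, signs):
                v[pos] = s
            yield v
```

Because every vector is fully determined by its pulse positions and signs, only those need be encoded, rather than the sample values themselves.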
FIG. 19 is a diagram useful in describing the sampling points assigned to each of the pulse-system groups.
