Reexamination Certificate
2000-03-23
2001-06-12
Korzuch, William R. (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S222000
active
06246979
ABSTRACT:
BACKGROUND OF THE INVENTION
The invention concerns a method for encoding and/or decoding voice signals, in particular for digital dictating devices.
For encoding, a voice signal is usually first low-pass filtered at a limiting frequency of less than 4 kHz, and the resulting signal is sampled at a sampling rate of 8 kHz. The sampled signal is converted into a digital voice signal consisting of a sequence of voice signal sampling values. Prediction parameters are determined from this sequence for use in the voice signal encoder and decoder. Moreover, a predicted value is calculated for each voice signal sampling value using these prediction parameters. The difference between each voice signal sampling value and its predicted value is quantized, digitally encoded and passed, modulated together with the prediction parameters, to a storage medium which may be e.g. a magnetic tape or a RAM. The signal stored by the storage medium is divided into its individual partial signals and used in a voice decoder to reproduce the original voice signal as precisely as possible.
Conventional methods operating according to the above-mentioned basic principles are disclosed in the patent documents U.S. Pat. No. 4,133,976, U.S. Pat. No. 3,631,520 and
EP-A0657874 describes a voice signal encoder which calculates prediction parameters from a digitized voice signal. An adaptive code book is used to determine an excitation signal component. In addition, multiple pulse components of the excitation signal are determined using the voice signal. For processing, the voice signals are divided into various time regions and subjected to individual further processing.
U.S. Pat. No. 5,327,520 discloses a voice encoder with which, using a backward adaptive AGC, stored code vectors are evaluated for comparison to input voice signals. For simplification, they are administered in tables.
The publication “Low Complexity Speech Coder for Personal Multimedia Communication”, J. Ikedo et al., 1995 Fourth IEEE International Conference on Universal Personal Communications Record, Gateway to the 21st Century, Tokyo, Nov. 6 to 10, 1995, describes an adaptive code book having entries from a delayed overall excitation signal. In this code book, each initial subblock is completely examined. Only a defined partial region is searched in each of the additional subblocks.
The publication “Efficient Computation and Encoding of the Multipulse Excitation for LPC”, M. Berouti et al., ICASSP 84, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, San Diego, USA, Mar. 23-25, 1984, pages 10.1/1-4, describes a coding procedure with which multipulse excitation vectors are encoded using their pulse positions and associated amplitudes.
Departing from this prior art, the underlying purpose of the invention is to improve the quality of reproduction of a voice signal recorded with a digital dictating device.
SUMMARY OF THE INVENTION
This object is achieved on the basis of the features of the independent claims. The dependent claims describe advantageous embodiments and further developments.
Advantageously and in accordance with the invention, the method operates without interblock encoding, which enables editing functions such as inserting recordings or deleting parts of the recorded signals.
Although the claimed method is optimized with respect to recording and reproduction of voice signals, other signals can also be recorded and reproduced with satisfactory quality such as music or any sounds, mixtures of voices, vehicle noises etc.
The features of the invention are explained below using an exemplary embodiment. The embodiment does not represent an exhaustive enumeration of all possible embodiments in accordance with the invention but has exemplary character only. The features in the claims may be utilized either individually or collectively in any arbitrary combination.
The method is carried out as follows:
After pre-processing, a digital voice signal is further processed in blocks. The pre-processed digital voice signal s is initially subjected to an LPC analysis (LPC = linear predictive coding), wherein LPC parameters a are determined from the digital voice signal. These are used in an inverse filtration for generating an LPC residual signal r from the digital voice signal s. On the basis of the LPC parameters a and the LPC residual signal r, an LTP analysis (a so-called long-term prediction analysis) is effected and pulse parameter generation is carried out. In alternative embodiments, the voice signal can be supplied to the LTP analysis and/or the pulse parameter generation either unfiltered or filtered using a filtration other than the above-mentioned inverse filtration.
In addition to the residual signal r and the LPC parameters a, the total excitation signal $e_v$, delayed by a subblock, is supplied to the LTP analysis and the pulse parameter generation. The LTP analysis generates parameters which define an excitation vector $e_{ltp}$, and the pulse parameter generation produces parameters determining the excitation vector $e_{mpe}$.
The excitation vectors $e_{mpe}$ and $e_{ltp}$ are generated and added together to obtain the total excitation signal e. This total excitation signal e is subsequently delayed by a subblock to obtain the total excitation signal $e_v$ delayed by a subblock.
The input signal is a digital voice signal with a sampling rate of 12 kHz. This signal is initially high-pass filtered, the high-pass filter having a (lower) limiting frequency of 50 Hz. This eliminates D.C. and low-frequency components from the digital voice signal which might otherwise have disturbed the subsequent analysis. The transfer function of the high-pass filter is
$$H(z) = \frac{z - 1}{z - 0.99}.$$
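As a minimal sketch, the transfer function above can be rewritten as $H(z) = (1 - z^{-1}) / (1 - 0.99\,z^{-1})$, which corresponds to the difference equation $y(n) = x(n) - x(n-1) + 0.99\,y(n-1)$. The function name `highpass` and the zero initial state are assumptions made for illustration; the source does not specify an implementation.

```python
def highpass(x, pole=0.99):
    """Filter x with H(z) = (z - 1) / (z - pole).

    Realized as y[n] = x[n] - x[n-1] + pole * y[n-1], with the
    filter state assumed to start at zero (an illustrative choice).
    """
    y = []
    prev_x = 0.0  # x[n-1]
    prev_y = 0.0  # y[n-1]
    for sample in x:
        out = sample - prev_x + pole * prev_y
        y.append(out)
        prev_x = sample
        prev_y = out
    return y


# A constant (D.C.) input decays towards zero at the output.
dc_response = highpass([1.0] * 2000)
```

The zero at z = 1 is what removes the D.C. component; the pole at 0.99 keeps the rest of the spectrum close to unity gain.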
Furthermore, the digital signal is subjected to a pre-emphasis using an FIR filter of first order having a transfer function

$$E(z) = z - 0.1$$
This pre-emphasis causes a slight level increase of approximately 1 to 1.5 dB.
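The pre-emphasis step might be sketched as follows. As written, $E(z) = z - 0.1$ looks one sample ahead; the sketch assumes the causal form $y(n) = x(n) - 0.1\,x(n-1)$, which differs only by a one-sample delay. The function name `pre_emphasis` and the zero initial state are illustrative assumptions, not taken from the source.

```python
def pre_emphasis(x, coeff=0.1):
    """First-order FIR pre-emphasis, y[n] = x[n] - coeff * x[n-1].

    This is the causal (one-sample delayed) form of E(z) = z - coeff;
    the sample before the start of x is assumed to be zero.
    """
    y = []
    prev = 0.0  # x[n-1]
    for sample in x:
        y.append(sample - coeff * prev)
        prev = sample
    return y


emphasized = pre_emphasis([1.0, 1.0, 1.0])
```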
The next step involves the formation of blocks. During block formation, the signal is segmented into overlapping analysis blocks of 324 sampling values, i.e. each having a duration of 27 ms at the sampling rate of 12 kHz. Neighboring blocks overlap by 3 ms. Each of the 24 ms long synthesis blocks, centered in the analysis blocks, consists of four 6 ms subblocks, wherein the LTP analysis described below and the pulse parameter generation are carried out for each subblock, i.e. four times per block.
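The block arithmetic can be sketched in a few lines. At 12 kHz, 27 ms is 324 samples, 24 ms is 288 samples and 6 ms is 72 samples, so consecutive analysis blocks start 288 samples apart and share 36 samples (3 ms). The sketch assumes "centered" means 18 samples of margin on each side, and it simply drops a trailing partial block; the source does not say how the signal edges are handled, and all names are hypothetical.

```python
ANALYSIS_LEN = 324    # 27 ms at 12 kHz
SYNTHESIS_LEN = 288   # 24 ms at 12 kHz
SUBBLOCK_LEN = 72     # 6 ms at 12 kHz
MARGIN = (ANALYSIS_LEN - SYNTHESIS_LEN) // 2  # 18 samples per side (assumed)


def analysis_blocks(signal):
    """Yield overlapping 324-sample analysis blocks, one every 288 samples."""
    start = 0
    while start + ANALYSIS_LEN <= len(signal):
        yield signal[start:start + ANALYSIS_LEN]
        start += SYNTHESIS_LEN


def subblocks(block):
    """Split the centered 288-sample synthesis block into four subblocks."""
    core = block[MARGIN:MARGIN + SYNTHESIS_LEN]
    return [core[i:i + SUBBLOCK_LEN]
            for i in range(0, SYNTHESIS_LEN, SUBBLOCK_LEN)]


blocks = list(analysis_blocks(list(range(ANALYSIS_LEN + SYNTHESIS_LEN))))
```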
The LPC analysis may be carried out e.g. as described below. Each analysis block is initially subjected to trapezoidal window formation. The window is defined as follows:
$$w(n) = \begin{cases} \dfrac{n+1}{14}, & 0 \le n < 14 \\ 1, & 14 \le n < 310 \\ \dfrac{324-n}{14}, & 310 \le n < 324. \end{cases}$$
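The trapezoidal window defined above can be generated directly from its three cases. This is a minimal sketch; the function name `trapezoid_window` and its parameters are illustrative, with the defaults matching the 324-sample block and 14-sample ramps given in the text.

```python
def trapezoid_window(length=324, ramp=14):
    """Return the trapezoidal analysis window w(n) for 0 <= n < length."""
    w = []
    for n in range(length):
        if n < ramp:
            w.append((n + 1) / ramp)        # rising edge
        elif n < length - ramp:
            w.append(1.0)                   # flat top
        else:
            w.append((length - n) / ramp)   # falling edge
    return w


window = trapezoid_window()
```

With these definitions the window is symmetric, w(n) = w(323 - n), and the ramps start and end at 1/14 rather than at zero.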
The subsequent step calculates an auto-correlation sequence according to the following equation:
$$\varphi_{xx}(n) = \sum_{i=0}^{323-n} s_w(i) \cdot s_w(i+n) \qquad (0 \le n \le 14),$$
wherein $s_w(n)$ represents a windowed input segment. The first value $\varphi_{xx}(0)$ of the auto-correlation sequence is subsequently increased through multiplication by a factor of 1.0004, to make the subsequent calculations numerically more favorable.
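The auto-correlation step, including the scaling of the first value, might look like the following sketch. The function and parameter names (`autocorrelation`, `max_lag`, `phi0_scale`) are hypothetical; the lag range 0 to 14 and the factor 1.0004 come from the text.

```python
def autocorrelation(sw, max_lag=14, phi0_scale=1.0004):
    """Auto-correlation of a windowed segment sw for lags 0..max_lag.

    phi[n] = sum over i of sw[i] * sw[i + n]; for a 324-sample
    segment the sum runs over i = 0 .. 323 - n, as in the text.
    The lag-0 value is then multiplied by phi0_scale.
    """
    count = len(sw)
    phi = []
    for lag in range(max_lag + 1):
        phi.append(sum(sw[i] * sw[i + lag] for i in range(count - lag)))
    phi[0] *= phi0_scale
    return phi


phi = autocorrelation([1.0] * 324)
```

Scaling only the lag-0 value raises the diagonal of the system matrix slightly, which is a common way to keep the subsequent solve well conditioned.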
From the resulting auto-correlation sequence, the LPC prediction parameters are calculated by solving the linear system of equations
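$$\begin{bmatrix} \varphi_{xx}(0) & \varphi_{xx}(1) & \cdots & \varphi_{xx}(13) \\ \varphi_{xx}(1) & \varphi_{xx}(0) & \cdots & \varphi_{xx}(12) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_{xx}(13) & \varphi_{xx}(12) & \cdots & \varphi_{xx}(0) \end{bmatrix} \cdot \begin{bmatrix} a_1'' \\ a_2'' \\ \vdots \\ a_{14}'' \end{bmatrix} = - \begin{bmatrix} \varphi_{xx}(1) \\ \varphi_{xx}(2) \\ \vdots \\ \varphi_{xx}(14) \end{bmatrix}$$

e.g. by means of the Durbin-Levinson recursion algorithm and by setting $a_0'' = 1$.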
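One way to solve this Toeplitz system is the textbook Levinson-Durbin recursion, sketched below. It follows the sign convention of the system above (right-hand side negated, leading coefficient fixed to 1) and assumes a positive first auto-correlation value. The function name and the returned prediction error are illustrative additions, not details given in the source.

```python
def levinson_durbin(phi, order=14):
    """Solve the Toeplitz normal equations for the LPC coefficients.

    phi holds auto-correlation values for lags 0..order.
    Returns (a, err) where a[0] == 1 and a[1..order] satisfy
    sum_k a[k] * phi[|i - k|] == 0 for i = 1..order, and err is the
    final prediction error. Assumes phi[0] > 0.
    """
    a = [1.0] + [0.0] * order
    err = phi[0]
    for m in range(1, order + 1):
        # Correlation of the current predictor with lag m.
        acc = phi[m] + sum(a[k] * phi[m - k] for k in range(1, m))
        refl = -acc / err  # reflection coefficient of stage m
        updated = a[:]
        for k in range(1, m):
            updated[k] = a[k] + refl * a[m - k]
        updated[m] = refl
        a = updated
        err *= 1.0 - refl * refl
    return a, err


# Auto-correlation of a first-order process: only a[1] is non-zero.
coeffs, residual_energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

The recursion needs on the order of p² operations for order p, compared with p³ for a general linear solver, which is why it is the usual choice for this kind of system.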
These LPC prediction parameters are subjected to bandwidth broadening by 20 Hz, wherein the relation $a_i' = a_i'' \cdot \gamma_{bwe}$
Azad Abul K.
Grundig AG
Korzuch William R.
Vincent Paul