Speech coding apparatus and pitch prediction method of input...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate


Details

C704S219000

active

06243673

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech coding apparatus and to a pitch prediction method for an input speech signal. In particular, it relates to a speech coding apparatus using a pitch prediction method in which the pitch information concerning an input excitation waveform for speech coding is obtained with as few computations as possible.
2. Description of the Related Art
A speech coding method typified by the CELP (Code Excited Linear Prediction) system models the speech information using a speech waveform and an excitation waveform, and separately codes the spectral envelope information corresponding to the speech waveform and the pitch information corresponding to the excitation waveform, both of which are extracted from input speech information divided into frames.
ITU-T Recommendation G.723.1 was recently issued as a method for performing such speech coding at a low bit rate. Coding according to G.723.1 is based on the principles of linear prediction analysis-by-synthesis and attempts to minimize a perceptually weighted error signal. The search for pitch information in this case exploits the fact that a speech waveform changes periodically in a vowel range, corresponding to the vibration of the vocal cords; this search is called pitch prediction.
A pitch prediction method applied in a conventional speech coding apparatus is explained with reference to FIG. 1.
FIG. 1 is a block diagram of a pitch prediction section in a conventional speech coding apparatus.
An input speech signal is divided into frames and sub-frames. An excitation pulse sequence X[n] generated in the immediately preceding sub-frame is input to pitch reproduction processing section 1, which applies pitch emphasis processing for the current target sub-frame.
Linear predictive synthesis filter 2 applies, at multiplier 3, system filter processing such as formant processing and harmonic shaping processing to the output speech data Y[n] from pitch reproduction processing section 1.
The coefficients of linear predictive synthesis filter 2 are set using a linear predictive coefficient A′(z) normalized by LSP (line spectrum pair) quantization of a linear predictive coefficient A(z) obtained by linear predictive analysis of the speech input signal y[n], a perceptual weighting coefficient W(z) used in perceptual weighting of the input speech signal y[n], and a coefficient P(z) of a harmonic noise filter that shapes the waveform of the perceptually weighted signal.
Pitch predictive filter 4 is a five-tap filter that applies, at multiplier 5, filter processing with predetermined coefficients to the output data t′[n] from multiplier 3. The coefficients are set by sequentially reading out codewords from adaptive codebook 6, which stores a codeword of an adaptive vector corresponding to each pitch period. When coded speech data are decoded, pitch predictive filter 4 also serves to generate a pitch period that sounds more natural and closer to human speech when a current excitation pulse sequence is generated from a previous excitation pulse sequence.
Adder 7 outputs an error signal r[n], which is the error between the output data p[n] from multiplier 5 (the signal processed by pitch predictive filtering) and the pitch residual signal t[n] of the current sub-frame (the residual signal of the formant processing and the harmonic shaping processing). An index in adaptive codebook 6 and a pitch length are obtained as the optimal pitch information such that the error signal r[n] is minimized by the least squares method.
The calculation in the pitch prediction method described above is performed as follows.
First, the pitch reproduction calculation performed in pitch reproduction processing section 1 is explained briefly using FIG. 1.
The excitation pulse sequence X[n] of a certain pitch is sequentially input to a buffer that holds 145 samples, and the pitch reproduced excitation sequence Y[n] of 64 samples is then obtained according to equations (1) and (2) below, where Lag indicates the pitch period.

$$Y(n) = X(145 - \mathrm{Lag} - 2 + n), \qquad n = 0, 1 \tag{1}$$
$$Y(n) = X\bigl(145 - \mathrm{Lag} + ((n - 2) \bmod \mathrm{Lag})\bigr), \qquad n = 2, \ldots, 63 \tag{2}$$
That is, equations (1) and (2) indicate that the current pitch information (vocal cord vibration) is imitated using the previous excitation pulse sequence.
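As an illustration only, equations (1) and (2) can be sketched in Python as below. The function name `reproduce_pitch`, its parameter names, and the range check on the lag are assumptions made for this sketch and do not come from the specification; only the index arithmetic follows the two equations.

```python
def reproduce_pitch(x, lag, out_len=64, buf_len=145):
    """Build the pitch reproduced excitation Y[n] from the previous excitation X[n].

    x   : the 145-sample excitation buffer X[n]
    lag : the pitch period (Lag) in samples
    """
    if len(x) != buf_len:
        raise ValueError("expected a buffer of %d samples" % buf_len)
    # Assumed guard: keeps every index in equations (1) and (2) inside the buffer.
    if not 1 <= lag <= buf_len - 2:
        raise ValueError("lag out of range for this buffer")

    # Equation (1): the two samples just before the start of the last pitch cycle.
    y = [x[buf_len - lag - 2 + n] for n in range(2)]
    # Equation (2): repeat the last pitch cycle periodically for n = 2 .. 63.
    y += [x[buf_len - lag + (n - 2) % lag] for n in range(2, out_len)]
    return y


if __name__ == "__main__":
    previous = list(range(145))          # stand-in excitation: X[n] = n
    current = reproduce_pitch(previous, lag=40)
    print(current[:4], current[40:44])
```

With a lag shorter than 62 samples, the modulo in equation (2) wraps around, so the same pitch cycle is copied more than once into the 64 output samples.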
Further, the convolution data (filtered data) t′[n] is obtained by convolving this pitch reproduced excitation sequence Y[n] with the output from linear predictive synthesis filter 2 according to equation (3) below.
$$t'(n) = \sum_{j=0}^{n} I(j) \cdot Y(n - j), \qquad 0 \le n \le 59 \tag{3}$$
And, since the pitch prediction processing is performed using a fifth-order FIR (finite impulse response) pitch predictive filter, five sets of convolution data t′[n] are necessary, from Lag−2 up to Lag+2, as shown in equation (4) below, where Lag is the current pitch period.
Because of this processing, as shown in FIG. 2, the pitch reproduced excitation data Y[n] requires 64 samples, which is 4 samples more than the 60 samples forming a sub-frame (the range from Lag−2 up to Lag+2 adds a total of 4 samples).
$$t'^{(l)}(n) = \sum_{j=0}^{n} I(j) \cdot Y(l + n - j), \qquad 0 \le l \le 4, \quad 0 \le n \le 59 \tag{4}$$
where l is the first index of a two-dimensional array, which indicates that the processing is repeated five times.
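A direct evaluation of equation (4) might look like the following sketch. The function name `filtered_excitations` and the argument name `impulse` (holding the samples I(j)) are hypothetical, chosen only for this illustration.

```python
def filtered_excitations(impulse, y, sub_len=60, taps=5):
    """Evaluate equation (4) directly.

    impulse : the sequence I(j), at least sub_len samples
    y       : the pitch reproduced excitation Y[n], at least sub_len + taps - 1 samples
    Returns t[l][n] for 0 <= l < taps and 0 <= n < sub_len.
    """
    if len(impulse) < sub_len or len(y) < sub_len + taps - 1:
        raise ValueError("input sequences are too short")
    return [
        [sum(impulse[j] * y[l + n - j] for j in range(n + 1)) for n in range(sub_len)]
        for l in range(taps)
    ]


if __name__ == "__main__":
    unit_impulse = [1] + [0] * 59        # I(j) = 1 only at j = 0
    ramp = list(range(64))               # Y[n] = n
    t = filtered_excitations(unit_impulse, ramp)
    print(t[0][:3], t[4][-3:])
```

The largest index read from Y is l + n − j = 4 + 59 − 0 = 63, which is why 64 samples of Y[n] are needed for a 60-sample sub-frame.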
However, as a method of reducing calculations in a DSP or the like, the convolution data t′(4)(n) is obtained using equation (3) when l=4, and the remaining convolution data are obtained using equation (5) below when l=0 to 3.
$$t'^{(l)}(n) = I(n) \cdot Y(l) + t'^{(l+1)}(n - 1), \qquad 0 \le l \le 3, \quad 0 \le n \le 59 \tag{5}$$
With equation (5), 60 convolution operations are enough, whereas 1,830 convolution operations are required without it.
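The saving can be checked numerically with a small sketch. It assumes the recursive update takes the form t′(l)(n) = I(n)·Y(l) + t′(l+1)(n−1), with t′(l+1)(−1) taken as zero, which is what results from equation (4) when the sum for t′(l+1)(n−1) is subtracted from the sum for t′(l)(n). The function names and the multiply-accumulate counters are hypothetical and serve only this illustration.

```python
def direct(impulse, y, l, n):
    """Equation (4) evaluated directly for one (l, n) pair."""
    return sum(impulse[j] * y[l + n - j] for j in range(n + 1))


def filtered_excitations_recursive(impulse, y, sub_len=60, taps=5):
    """Full convolution for the last vector, recursive update for the others.

    Returns (t, full_macs, update_macs_per_vector), where the two counters
    record multiply-accumulate operations.
    """
    last = taps - 1
    t = [[0] * sub_len for _ in range(taps)]

    # Full convolution for l = last: n + 1 products for each output sample.
    full_macs = 0
    for n in range(sub_len):
        for j in range(n + 1):
            t[last][n] += impulse[j] * y[last + n - j]
            full_macs += 1

    # Recursive update for l = last - 1 down to 0: one product per output sample.
    update_macs = 0
    for l in range(last - 1, -1, -1):
        for n in range(sub_len):
            carried = t[l + 1][n - 1] if n > 0 else 0
            t[l][n] = impulse[n] * y[l] + carried
            update_macs += 1

    return t, full_macs, update_macs // last


if __name__ == "__main__":
    impulse = [(3 * i + 1) % 7 for i in range(60)]   # arbitrary integer test data
    y = [(5 * i + 2) % 11 for i in range(64)]
    t, full_macs, update_macs = filtered_excitations_recursive(impulse, y)
    matches = all(t[l][n] == direct(impulse, y, l, n)
                  for l in range(5) for n in range(60))
    print(full_macs, update_macs, matches)
```

Under this reading, the full convolution of one 60-sample vector costs 1 + 2 + ... + 60 = 1,830 products, while each recursively updated vector costs 60.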
Further, the optimal value of the convolution data p(n) in pitch predictive filter 4 is obtained using the pitch residual signal t(n) so that the error signal r(n) is minimized. In other words, the error signal r(n) shown in equation (6) below is minimized by searching codebook 6 for the adaptive codebook data of pitches corresponding to the five filter coefficients of the fifth-order FIR pitch predictive filter 4.
$$r(n) = t(n) - p(n) \tag{6}$$
The error is evaluated using the least squares method according to equation (7) below.

$$\sum_{n=0}^{59} \lvert r(n) \rvert^{2} \tag{7}$$
Accordingly, equation (8) below is given.

$$\sum_{n=0}^{59} \lvert r(n) \rvert^{2} = \sum_{n=0}^{59} \lvert t(n) - p(n) \rvert^{2} = \sum_{n=0}^{59} \bigl( t(n)^{2} - 2\, t(n) \cdot p(n) + p(n)^{2} \bigr) \tag{8}$$
Further, equation (9) below is given.
$$p(n) = \sum_{l=0}^{4} t'^{(l)}(n), \qquad 0 \le n \le 59 \tag{9}$$
By substituting equation (9) into equation (8), the adaptive codebook data of a pitch, in other words, the index of the adaptive codebook data of a pitch that minimizes the error, is obtained.
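The search for the error-minimizing index can be sketched as follows. This is a sketch under stated assumptions, not the procedure of the specification: it assumes each codebook entry is a tuple of five tap coefficients that weight the five filtered vectors before they are summed into p(n), whereas the equation for p(n) as printed shows the plain sum. The function name `search_codebook` and the tiny example data are hypothetical.

```python
def search_codebook(target, t_filtered, codebook):
    """Return (index, error) of the codebook entry with the smallest squared error.

    target     : pitch residual signal t(n)
    t_filtered : the five filtered vectors t'(l)(n), l = 0 .. 4
    codebook   : list of 5-tuples of tap coefficients (assumed layout)
    """
    best_index, best_error = None, float("inf")
    for index, gains in enumerate(codebook):
        error = 0.0
        for n in range(len(target)):
            # Assumed weighting: p(n) = sum over l of gains[l] * t'(l)(n).
            p_n = sum(g * t_l[n] for g, t_l in zip(gains, t_filtered))
            # Accumulate |r(n)|^2 with r(n) = t(n) - p(n).
            error += (target[n] - p_n) ** 2
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


if __name__ == "__main__":
    # Three-sample toy vectors instead of 60-sample sub-frames, for readability.
    t_filtered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 4.0, 6.0],
                  [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    target = [1.0, 2.0, 3.0]
    codebook = [(0, 0, 1, 0, 0), (0, 0, 0.5, 0, 0), (1, 1, 1, 1, 1)]
    print(search_codebook(target, t_filtered, codebook))
```

Setting every coefficient to 1 reduces the assumed weighting to the plain sum, so the sketch covers that case as one codebook entry.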
Further, the closed-loop pitch information and the index of the adaptive codebook data of a pitch are obtained by repeating the above operation for Lag−1 up to Lag+1 in a re-search, so that the pitch period information at this time is obtained correctly. The number of re-searc
