Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission
Reexamination Certificate
1999-03-01
2002-06-04
Dorvil, Richemond (Department: 2654)
C704S219000, C704S220000
active
06401062
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to an apparatus for encoding and an apparatus for decoding speech and musical signals. More particularly, the invention relates to a coding apparatus and a decoding apparatus for transmitting speech and musical signals at a low bit rate.
BACKGROUND OF THE INVENTION
A method of encoding a speech signal by separating the speech signal into a linear prediction filter and its driving sound source signal is used widely as a method of encoding a speech signal efficiently at medium to low bit rates.
One typical method of this kind is CELP (Code-Excited Linear Prediction). In CELP, input speech is subjected to linear prediction analysis, and the resulting linear prediction coefficients are set in a linear prediction filter. This filter is driven by a sound source signal represented by the sum of a signal that represents the speech pitch period and a noise signal, whereby a synthesized speech signal (i.e., a reconstructed signal) is obtained. For a discussion of CELP, see the paper (referred to as “Reference 1”) “Code excited linear prediction: High quality speech at very low bit rates” by M. Schroeder et al. (Proc. ICASSP, pp. 937-940, 1985).
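By way of illustration only, the synthesis step described above can be sketched in Python as follows. The function names, the unit gains and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are assumptions made for this sketch and are not taken from Reference 1 or from the apparatus described here.

```python
def celp_excitation(pitch_part, noise_part, pitch_gain=1.0, noise_gain=1.0):
    """Sum a pitch-period component and a noise component sample by sample.

    The gains are hypothetical parameters; the text only states that the
    sound source is the sum of the two signals.
    """
    return [pitch_gain * p + noise_gain * c
            for p, c in zip(pitch_part, noise_part)]


def synthesize(excitation, lpc):
    """Drive an all-pole linear prediction filter 1/A(z) with the excitation.

    Assumed convention: s[n] = e[n] - sum_{k=1..p} lpc[k-1] * s[n-k],
    with samples before the start of the frame taken to be zero.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out
```

A single unit pulse passed through a first-order filter, for example `synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])`, yields a decaying impulse response. This shows how the filter shapes the spectrum of the sound source signal.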
A method using a higher-order linear prediction filter representing the complicated spectrum of music is known as a method of improving music encoding performance by CELP. According to this method, the coefficients of a higher-order linear prediction filter are found by applying linear prediction analysis at a high order of from 50 to 100 to a signal obtained by inverse filtering a past reconstructed signal using a linear prediction filter. A signal obtained by inputting a musical signal to the higher-order linear prediction filter is applied to a linear prediction filter to obtain the reconstructed signal.
As an example of an apparatus for encoding speech and musical signals using a higher-order linear prediction filter, see the paper (referred to as “Reference 2”) “Improving the Quality of Musical Signals in CELP Coding”, by Sasaki et al. (Acoustical Society of Japan, Spring, 1996 Meeting for Reading Research Papers, Collected Papers, pp. 263-264, 1996) and the paper (referred to as “Reference 3”) “A 16 Kbit/s Wideband CELP Coder with a High-Order Backward Predictor and its Fast Coefficient Calculation” by M. Serizawa et al. (IEEE Workshop on Speech Coding for Telecommunications, pp. 107-108, 1997).
A known method of encoding a sound source signal in CELP involves expressing the sound source signal efficiently by a multipulse signal, which comprises a plurality of pulses and is defined by the positions and amplitudes of the pulses.
For a discussion of encoding of a sound source signal using a multipulse signal, see the paper (referred to as “Reference 4”) “MP-CELP Speech Coding based on Multi-Pulse Vector Quantization and Fast Search” by Ozawa et al. (Transaction A, Institute of Electronics, Information and Communication Engineers of Japan (Trans. IEICEJ), pp. 1655-1663, 1996). Further, by adopting a band splitting arrangement using a sound source signal found for each band and a higher-order backward linear prediction filter in an apparatus for encoding speech and musical signals based upon CELP, the ability to encode music is improved.
With regard to CELP using band splitting, see the paper (referred to as “Reference 5”) “Multi-band CELP Coding of Speech and Music” by A. Ubale et al. (IEEE Workshop on Speech Coding for Telecommunications, pp. 101-102, 1997).
FIG. 10 is a block diagram showing an example of the construction of an apparatus for encoding speech and music according to the prior art. For the sake of simplicity, it is assumed here that the number of bands is two.
As shown in FIG. 10, an input signal (input vector) enters from an input terminal 10. The input signal is generated by sampling a speech or musical signal and gathering a plurality of the samples into a single vector as one frame.
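By way of illustration only, gathering samples into frames can be sketched in Python as follows. The function name, the `frame_length` parameter and the decision to drop a trailing partial frame are assumptions made for this sketch; the text above does not state a frame length.

```python
def split_into_frames(samples, frame_length):
    """Group consecutive samples into input vectors of frame_length samples.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    n_frames = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length]
            for i in range(n_frames)]
```

Each returned list corresponds to one input vector of the kind that enters from the input terminal.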
A first linear prediction coefficient calculation circuit 140 receives the input vector as an input from the input terminal 10. This circuit subjects the input vector to linear prediction analysis, obtains a linear prediction coefficient and quantizes the coefficient. The first linear prediction coefficient calculation circuit 140 outputs the linear prediction coefficient to a weighting filter 160 and outputs an index, which corresponds to a quantized value of the linear prediction coefficient, to a linear prediction filter 150 and to a code output circuit 690.
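By way of illustration only, one common way to carry out linear prediction analysis on a frame is the autocorrelation method with the Levinson-Durbin recursion, sketched below in Python. The text above does not state which analysis method circuit 140 uses, so the choice of method, the function names and the sign convention A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p are all assumptions. Quantization of the coefficients is omitted.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[lag] = sum over n of frame[n] * frame[n - lag]."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the prediction polynomial A(z).

    Returns (coefficients a[1..order], final prediction error energy).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: stop early
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        reflection = -acc / err
        new_a = a[:]
        for k in range(1, i):
            new_a[k] = a[k] + reflection * a[i - k]
        new_a[i] = reflection
        a = new_a
        err *= (1.0 - reflection * reflection)
    return a[1:], err
```

Under this convention the predicted sample is the negated weighted sum of past samples, and the returned coefficients can be used directly in an all-pole synthesis filter 1/A(z).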
A known method of quantizing a linear prediction coefficient involves converting the coefficient to a line spectrum pair (referred to as an “LSP”) to effect quantization. For a discussion of the conversion of a linear prediction coefficient to an LSP, see the paper (referred to as “Reference 6”) “Speech Data Compression by LSP Speech Analysis-Synthesis Technique” by Sugamura et al. (Transaction A, Institute of Electronics, Information and Communication Engineers of Japan (Trans. IEICEJ), Vol. J64-A, No. 8, pp. 599-606, 1981). In regard to quantization of an LSP, see the paper (referred to as “Reference 7”) “Vector Quantization of LSP Parameters Using Moving Average Interframe Prediction” by Omuro et al. (Transaction A, Institute of Electronics, Information and Communication Engineers of Japan (Trans. IEICEJ), Vol. J77-A, No. 3, pp. 303-312, 1994).
A first pulse position generating circuit 610 receives as an input an index that is output by a minimizing circuit 670, generates a first pulse position vector using the position of each pulse specified by the index and outputs this vector to a first sound source generating circuit 20.
Let M represent the number of pulses and let $P_1, P_2, \ldots, P_M$ represent the positions of the pulses. The vector $\bar{P}$, therefore, is written as follows:
$$\bar{P} = (P_1, P_2, \ldots, P_M)$$
(It should be noted that the bar over P indicates that P is a vector.)
A first pulse amplitude generating circuit 120 has a table in which M-dimensional vectors $\bar{A}_j$, $j = 1, \ldots, N_A$ have been stored, where $N_A$ represents the size of the table. The index output by the minimizing circuit 670 enters the first pulse amplitude generating circuit 120, which proceeds to read an M-dimensional vector $\bar{A}_i$ corresponding to this index out of the above-mentioned table and outputs this vector to the first sound source generating circuit 20 as a first pulse amplitude vector.
Letting $A_{i1}, A_{i2}, \ldots, A_{iM}$ represent the amplitude values of the pulses, we have
$$\bar{A}_i = (A_{i1}, A_{i2}, \ldots, A_{iM})$$
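By way of illustration only, the table lookup performed by the first pulse amplitude generating circuit can be sketched in Python as follows. The table contents, its size of four entries, the dimension M = 3 and the function name are hypothetical. The index is taken to run from 1 to the table size, following the numbering j = 1, ..., N_A used above.

```python
# Hypothetical table of 4 amplitude vectors, each of dimension M = 3.
AMPLITUDE_TABLE = [
    (1.0, 1.0, 1.0),
    (1.0, -1.0, 1.0),
    (-1.0, 1.0, -1.0),
    (-1.0, -1.0, -1.0),
]


def read_amplitude_vector(table, index):
    """Return the M-dimensional amplitude vector stored under a 1-based index."""
    if not 1 <= index <= len(table):
        raise IndexError("index must lie in 1..len(table)")
    return table[index - 1]
```

For example, `read_amplitude_vector(AMPLITUDE_TABLE, 2)` reads the second stored vector.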
A second pulse position generating circuit 611 receives as an input the index that is output by the minimizing circuit 670, generates a second pulse position vector using the position of each pulse specified by the index and outputs this vector to a second sound source generating circuit 21.
A second pulse amplitude generating circuit 121 has a table in which M-dimensional vectors $\bar{B}_j$, $j = 1, \ldots, N_B$ have been stored, where $N_B$ represents the size of the table.
The index output by the minimizing circuit 670 enters the second pulse amplitude generating circuit 121, which proceeds to read an M-dimensional vector $\bar{B}_j$ corresponding to this index out of the above-mentioned table and outputs this vector to the second sound source generating circuit 21 as a second pulse amplitude vector.
The first pulse position vector $\bar{P} = (P_1, P_2, \ldots, P_M)$ output by the first pulse position generating circuit 610 and the first pulse amplitude vector $\bar{A}_i = (A_{i1}, A_{i2}, \ldots, A_{iM})$ output by the first pulse amplitude generating circuit 120 enter the first sound source generating circuit 20. The first sound source generating circuit 20 generates an N-dimensional vector in which the values of the $P_1$-th, $P_2$-th, . . . , $P_M$-th elements are $A_{i1}, A_{i2}, \ldots, A_{iM}$, respectively, and the values of the other elements are zero, and outputs this vector to a first gain circuit 30 as a first sound source signal (sound source vector).
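By way of illustration only, the construction of the sound source vector from the pulse position vector and the pulse amplitude vector can be sketched in Python as follows. The function name is hypothetical. The sketch assumes that positions are counted from 1 and are distinct, neither of which is stated explicitly above.

```python
def build_sound_source(positions, amplitudes, frame_length):
    """Place each pulse amplitude at its pulse position; all other elements are zero.

    positions    -- M pulse positions, counted from 1 (assumption)
    amplitudes   -- M pulse amplitudes, in the same order as positions
    frame_length -- N, the dimension of the sound source vector
    """
    if len(positions) != len(amplitudes):
        raise ValueError("positions and amplitudes must have the same length")
    vector = [0.0] * frame_length
    for position, amplitude in zip(positions, amplitudes):
        if not 1 <= position <= frame_length:
            raise ValueError("pulse position outside the frame")
        vector[position - 1] = amplitude
    return vector
```

With M = 2 pulses in a frame of N = 6 samples, `build_sound_source([2, 5], [0.5, -1.0], 6)` gives a vector that is zero except at the second and fifth elements.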
A second pulse
Abebe Daniel
Dorvil Richemond
Foley & Lardner
Nec Corporation