Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission
Reexamination Certificate
1999-03-15
2001-07-24
Dorvil, Richemond (Department: 2641)
C704S220000, C704S225000, C704S226000
active
06266632
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech decoding apparatus and a speech decoding method for decoding digital speech data coded, based on excitation parameter information, in accordance with ITU-T Recommendation G.723.1 and CELP (Code Excited Linear Prediction) coding.
2. Related Art
One of the Recommendations concerning speech coding techniques is ITU-T Recommendation G.723.1, which specifies the speech codec for ITU-T Recommendation H.324, a Recommendation concerning videophones that primarily use analogue lines. In this speech coding technique, speech signals are coded at dual rates of 6.3 kbps and 5.3 kbps by modeling the human vocal mechanism.
A conventional coding apparatus is explained below with reference to the function block diagram in FIG. 1.
In a coding section, a speech signal is input to LPC analysis section 1101 and perceptual weighting filter 1102. LPC analysis section 1101 executes linear prediction of the speech signal to represent the human vocal tract (throat form). LSP quantizer 1104 quantizes the linear prediction result to obtain LSP information, which is one of the speech parameters.
On the other hand, perceptual weighting filter 1102 modifies the frequency characteristic of the speech signal to improve perception. Pitch estimator 1103 computes the pitch of the speech signal passed through filter 1102. Harmonic noise shaping filter 1105 adjusts the distortion of the speech signal so that noise or the like contained in the perceptually weighted speech signal processed in filter 1102 stays under the threshold. In other words, filter 1105 adjusts the speech quality. Pitch predictor 1106 receives the fed-back speech data that it processed previously, and uses that data to compute the pitch of the current speech signal and to generate pitch information (a pitch length and an index for determining whether the sound is voiced or voiceless). Based on the generated pitch information, excitation parameter generator 1107 generates an excited signal and outputs it to pseudo decoder 1108. Excitation parameter generator 1107 computes the energy of the excited signal as an excitation parameter (Mamp), and determines the index with which the excited signal is coded according to the excitation parameter (Mamp). Excitation parameter generator 1107 has an index table in which index numbers and excitation parameters (Mamp) are registered in correspondence with each other. Pseudo decoder 1108 decodes the index once to obtain the excited signal and returns the excited signal to pitch predictor 1106 for pitch prediction of the following speech data.
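The index table lookup described above can be pictured as nearest-entry quantization. The following is a minimal sketch, not the table defined in G.723.1: the table values, the mean-square energy measure, and the names `MAMP_TABLE`, `frame_energy`, `encode_mamp` and `decode_mamp` are all hypothetical and chosen only for illustration.

```python
# Hypothetical index table: index number -> excitation parameter (Mamp).
# The values are illustrative and are not taken from G.723.1.
MAMP_TABLE = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]


def frame_energy(excited_signal):
    """Mean squared amplitude of the excited signal (one possible energy measure)."""
    if not excited_signal:
        return 0.0
    return sum(s * s for s in excited_signal) / len(excited_signal)


def encode_mamp(mamp):
    """Return the index whose table entry is closest to the given Mamp."""
    return min(range(len(MAMP_TABLE)), key=lambda i: abs(MAMP_TABLE[i] - mamp))


def decode_mamp(index):
    """Pseudo decoder side: recover the quantized Mamp from the index."""
    return MAMP_TABLE[index]
```

Only the index is transmitted, so the pseudo decoder and the receiving side both see the same quantized Mamp rather than the exact energy.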
As described above, in the coding in accordance with ITU-T Recommendation G.723.1, LSP information, pitch information and excitation parameter information (index) are generated and transmitted from a transmitting side to a receiving side via a line. The receiving side decodes the information received from the transmitting side to reproduce the speech signal.
In the decoder, the LSP information is input to LSP decoder 1121, the pitch information is input to pitch decoder 1122, and the excitation parameter information is input to excitation parameter decoder 1123. Synthesis filter 1124 is constructed with coefficients corresponding to the decoded LSP information. A signal synthesized from the pitch data decoded in pitch decoder 1122 and the excited signal decoded by excitation parameter decoder 1123 is input to synthesis filter 1124. The speech signal synthesized in synthesis filter 1124 is subjected to a correction in perceptual weighting filter 1125 to improve perception.
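The role of the synthesis filter can be sketched as a direct-form all-pole filter driven by the excitation. This is a minimal sketch under stated assumptions: the conversion from LSP information to filter coefficients is omitted, the function name `synthesize` is hypothetical, and the sign convention y[n] = x[n] + sum of a_k * y[n-k] is one common choice rather than a quotation from the Recommendation.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis: y[n] = x[n] + sum_k lpc[k-1] * y[n-k].

    `lpc` holds the prediction coefficients a_1..a_p (assumed to be already
    derived from the decoded LSP information).
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out
```

For example, a unit impulse through a single-coefficient filter with a_1 = 0.5 yields a geometrically decaying response.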
As described above, in ITU-T Recommendation G.723.1, the speech signal is divided into a plurality of parameters for coding, and the speech signal is decoded based on this plurality of parameters.
This coding method is a kind of CELP (Code Excited Linear Prediction) coding. CELP coding has characteristics of both coding that models the speech generation process and waveform coding, and it generates the excitation parameter in the same way as the coding in accordance with ITU-T Recommendation G.723.1.
In speech coding in accordance with ITU-T Recommendation G.723.1, a difference in speech volume arises between the receiving side and the transmitting side, owing to line deterioration or other causes, when speech is communicated through a telephone line or the like. In other words, since the speech of one side is recorded at a higher level while the speech of the other side is recorded at a lower level, the speech that is coded and then decoded becomes hard to listen to.
The above problem is caused by a volume difference between the original speech signals. Controlling the gain of the low volume speech is expected to prevent the problem. The following method is considered for this gain control.
A speech signal in which high volume and low volume speech exist together is reproduced as a waveform. The waveform of the speech signal is sampled and the energy of each sample is computed. The energy of each sample is subjected to gain control. Specifically, the gain control increases the energy of the low volume speech to the same level as the high volume speech while keeping the energy of the high volume speech at the same level.
As described above, when a high volume speech and a low volume speech are present, the volume of the decoded speech signal is made constant by controlling the gain of the low volume speech signal. Applying this method to speech decoding in accordance with ITU-T Recommendation G.723.1 is considered.
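The waveform-domain gain control considered here can be sketched as follows. This is a minimal sketch, assuming block-wise RMS as the energy measure and a caller-supplied target level; the names `rms`, `equalize_block` and `target_rms` are hypothetical and do not come from the source.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of waveform samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equalize_block(samples, target_rms):
    """Raise a low volume block to target_rms; leave louder blocks unchanged."""
    level = rms(samples)
    if level == 0.0 or level >= target_rms:
        return list(samples)
    gain = target_rms / level
    return [s * gain for s in samples]
```

Note that every reproduced sample is read once to measure the level and once more to apply the gain, so the cost grows with the number of samples.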
However, in this case, the following problems remain.
That is, it is necessary to sample the waveform of the reproduced speech signal. It is further necessary to perform this sampling at a high sampling frequency, resulting in a large number of samples. Therefore, a large memory capacity must be reserved to save the sampled data, and a large amount of computation is required to process it for the gain control, resulting in a heavy CPU load and a low decoding rate.
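The scale of this load can be estimated with simple arithmetic. The sketch below assumes 16-bit mono PCM at 8 kHz, the input format on which G.723.1 operates, and counts one multiply-accumulate per sample for the energy computation; the function names are hypothetical and the figures are illustrative rather than measurements.

```python
def pcm_buffer_bytes(duration_s, sample_rate_hz=8000, bytes_per_sample=2):
    """Bytes needed to hold `duration_s` seconds of mono PCM samples."""
    return int(duration_s * sample_rate_hz) * bytes_per_sample


def energy_ops(duration_s, sample_rate_hz=8000):
    """Multiply-accumulate count when each sample's energy is computed once."""
    return int(duration_s * sample_rate_hz)
```

Under these assumptions, one minute of reproduced speech occupies 960,000 bytes and needs 480,000 multiply-accumulates for the energy computation alone, before any gain is applied.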
SUMMARY OF THE INVENTION
It is an object of the present invention to achieve a speech decoding apparatus capable of reducing the amount of computation in decoding speech whose volume differs between talkers, so as to reproduce speech that is easy to listen to when speech data coded in accordance with ITU-T Recommendation G.723.1 is decoded, especially when speech recording is performed.
The speech decoding apparatus of the present invention comprises a decoding function for decoding a speech signal that is coded into a plurality of speech parameters, and a correction function for correcting the speech based on an energy value computed from an excitation parameter, which is one of the plurality of parameters, and on a predetermined gain parameter.
According to the speech decoding apparatus of the present invention, it is possible to obtain speech that is pleasant to listen to by correcting the coded speech based on the energy value computed from the excitation parameter and on the predetermined gain parameter.
In addition, the speech decoding apparatus of the present invention corrects the speech using the gain parameter when the energy computed from the excitation parameter is within a predetermined range.
According to this speech decoding apparatus, it is possible to obtain speech that is pleasant to listen to, without correcting noise and without causing an overflow due to large volume speech, because the apparatus corrects the speech when the excitation energy is within the predetermined range.
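The range check can be sketched as a simple gate around the gain. This is a minimal sketch, assuming the correction is a plain multiplication of the decoded samples by the gain parameter; the function name `correct_subframe` and the bounds `low` and `high` are hypothetical, and the source does not state concrete values for the range.

```python
def correct_subframe(samples, excitation_energy, gain, low, high):
    """Apply the gain only when the excitation energy lies within [low, high].

    Energy below `low` is treated as noise and left alone; energy above
    `high` is already loud, so amplifying it could overflow.
    """
    if low <= excitation_energy <= high:
        return [s * gain for s in samples]
    return list(samples)
```

The decision uses a single energy value taken from the excitation parameter, so no energy has to be measured from the reproduced waveform.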
In addition, the speech decoding apparatus of the present invention corrects speech data for every subframe, and increases or decreases the gain parameter, every time the correction is performed, so that the gain parameter approaches a target value that is arbitrarily set within the predetermined range. It is thereby possible to correct the decoded speech on a subframe-by-subframe basis and, by correcting gradually, to obtain speech that is easy to listen to and has no sense of incongruity.
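The gradual, subframe-by-subframe adjustment can be sketched as a small step applied to the gain after each correction. This is only one possible reading of the paragraph above: it assumes the target is an energy level inside the range and models the corrected energy as the excitation energy multiplied by the gain. The step size, the gain limits and the names `update_gain` and `track_gain` are hypothetical.

```python
def update_gain(gain, excitation_energy, target,
                step=0.25, min_gain=1.0, max_gain=8.0):
    """Move the gain one step so that the corrected energy approaches target."""
    corrected = excitation_energy * gain
    if corrected < target:
        gain += step
    elif corrected > target:
        gain -= step
    # Keep the gain inside assumed limits.
    return max(min_gain, min(max_gain, gain))


def track_gain(energies, target, low, high, gain=1.0):
    """Return the gain used for each subframe.

    The gain is updated only after subframes whose excitation energy lies
    within [low, high], i.e. only when a correction was performed.
    """
    used = []
    for energy in energies:
        used.append(gain)
        if low <= energy <= high:
            gain = update_gain(gain, energy, target)
    return used
```

With a constant subframe energy of 2.0 and a target of 3.0, the gain rises by one step per subframe and then holds once the corrected energy reaches the target, which avoids an abrupt jump in volume.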
In addition, the speech decod
Kato Kiminori
Ohno Motoyasu
Dorvil Richemond
Greenblum & Bernstein P.L.C.
Matsushita Graphic Communication Systems Inc.
McFadden Susan