Reexamination Certificate
2003-05-02
2004-07-27
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S223000
active
06768978
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a low rate speech coding/decoding method used for digital telephones, voice memories, and the like.
Recently, the CELP (Code Excited Linear Prediction) scheme (M. R. Schroeder and B. S. Atal, “Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” Proc. ICASSP, pp. 937-940, 1985 (reference 1)) has often been used as a coding technology for portable telephones, the Internet, and the like, to compress speech and audio information into small amounts of data for transmission or storage.
The CELP scheme is a coding scheme based on linear predictive analysis: an input speech signal is separated by linear predictive analysis into linear predictive coefficients, which represent phoneme information, and a prediction residual signal, which represents characteristics such as the pitch period of the speech. A digital filter called a synthesis filter is formed from the linear predictive coefficients, and the original input speech signal can be reconstructed by feeding the prediction residual signal to this synthesis filter as its excitation signal. For low bit rate speech coding, both the linear predictive coefficients and the prediction residual signal must be coded with a small number of bits.
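The analysis/synthesis split described above can be sketched in a few lines. This is a minimal illustration under assumptions of our own, not the patent's implementation: it uses the autocorrelation method with a Levinson-Durbin recursion (no analysis window, lag window, or quantization), an arbitrarily chosen 10th-order predictor, a synthetic test signal, and zero filter memory. It checks that feeding the prediction residual to the all-pole synthesis filter reproduces the input.

```python
import math
import random


def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of x (no analysis window, for brevity)."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor taps a[0..order-1] such that x[n] is approximated by sum_k a[k] * x[n-1-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused; a[j] is the j-th predictor tap
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction error energy shrinks at each order
    return a[1:]


def inverse_filter(x, a):
    """Prediction residual e[n] = x[n] - sum_k a[k] * x[n-1-k], zero initial state."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k], zero initial state."""
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y


rng = random.Random(1)
speech = [math.sin(0.3 * n) + 0.1 * rng.uniform(-1.0, 1.0) for n in range(160)]
lpc = levinson_durbin(autocorrelation(speech, 10), 10)
residual = inverse_filter(speech, lpc)
reconstructed = synthesis_filter(residual, lpc)
print(max(abs(s - r) for s, r in zip(speech, reconstructed)))  # only rounding error remains
```

The residual carries far less energy than the speech itself, which is why coding it (rather than the waveform) saves bits.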
In the CELP scheme, the excitation signal, i.e., the coded version of the prediction residual signal, is generated by multiplying each of two types of vectors, a pitch vector and a stochastic vector, by its gain and adding the products.
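In symbols (the notation is ours, not the patent's), with v(n) the pitch vector, c(n) the stochastic vector, g_p and g_c their gains, N the number of samples in a subframe, and a_1, ..., a_p the linear predictive coefficients, the excitation and the synthesized speech are

$$e(n) = g_p\,v(n) + g_c\,c(n), \qquad n = 0, 1, \ldots, N-1,$$

$$\hat{s}(n) = e(n) + \sum_{k=1}^{p} a_k\,\hat{s}(n-k),$$

where the second line is the all-pole synthesis filter 1/A(z) with A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k}.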
A stochastic vector is generally obtained by searching a codebook that stores many candidates for the optimal one. In this search, each stochastic vector, together with the pitch vector, is filtered through the synthesis filter to generate a synthesized speech signal, and the stochastic vector whose synthesized speech signal has the minimum error with respect to the input speech signal is selected. Storing stochastic vectors efficiently in the codebook is therefore an important point of the CELP scheme.
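A minimal sketch of this analysis-by-synthesis search follows. The simplifications are ours: the pitch vector and its gain are assumed to be fixed beforehand, every candidate is filtered with zero filter memory, and each stochastic vector gets its own optimal gain g = <t, y> / <y, y>, which leaves a squared error of ||t||^2 - g<t, y>. The codebook, pitch vector, and filter coefficients in the example are arbitrary test values.

```python
import random


def synthesis_filter(e, a):
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k], zero initial state."""
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def search_stochastic_codebook(speech, codebook, a, pitch_vector, pitch_gain):
    """Return (index, gain, error) of the candidate minimizing ||speech - synthesized||^2."""
    # Remove the (fixed) pitch contribution so only the stochastic part is searched.
    pitch_part = synthesis_filter([pitch_gain * v for v in pitch_vector], a)
    target = [s - p for s, p in zip(speech, pitch_part)]
    best = (None, 0.0, float("inf"))
    for i, candidate in enumerate(codebook):
        y = synthesis_filter(candidate, a)  # filter every candidate
        energy = dot(y, y)
        if energy == 0.0:
            continue
        gain = dot(target, y) / energy  # optimal gain for this candidate
        error = dot(target, target) - gain * dot(target, y)
        if error < best[2]:
            best = (i, gain, error)
    return best


rng = random.Random(0)
length = 40
lpc = [0.9, -0.2]  # stable test filter: poles at z = 0.5 and z = 0.4
codebook = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(16)]
pitch = [rng.gauss(0.0, 1.0) for _ in range(length)]
# "Input speech" built from a known excitation: 0.5 * pitch + 0.8 * codebook[3].
speech = synthesis_filter([0.5 * p + 0.8 * c for p, c in zip(pitch, codebook[3])], lpc)
index, gain, error = search_stochastic_codebook(speech, codebook, lpc, pitch, 0.5)
print(index, round(gain, 6))  # 3 0.8
```

Filtering every candidate is what makes an exhaustive search expensive, and it is the cost that structured codebooks try to reduce.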
Pulse excitation, which expresses a stochastic vector as a train of several pulses, is a known scheme for meeting this requirement. One example is the multi-pulse scheme disclosed in reference 2 (K. Ozawa and T. Araseki, “Low Bit Rate Multi-pulse Speech Coder with Natural Speech Quality,” IEEE Proc. ICASSP'86, pp. 457-460, 1986).
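To make the idea of a pulse-train stochastic vector concrete, here is a generic greedy multi-pulse search: pulses are placed one at a time where they most reduce the remaining error, each with its own optimal amplitude. This is a textbook-style procedure written for illustration, not a reproduction of reference 2; it assumes zero filter memory and does not re-optimize earlier amplitudes after later pulses are added.

```python
def synthesis_filter(e, a):
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k], zero initial state."""
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y


def multipulse_search(target, a, num_pulses):
    """Greedily choose (position, amplitude) pairs whose filtered pulses approximate target."""
    n = len(target)
    h = synthesis_filter([1.0] + [0.0] * (n - 1), a)  # impulse response of the filter
    remaining = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for m in range(n):
            y = [0.0] * m + h[: n - m]  # filtered unit pulse placed at sample m
            energy = sum(v * v for v in y)
            corr = sum(r * v for r, v in zip(remaining, y))
            score = corr * corr / energy  # error reduction with the optimal amplitude
            if best is None or score > best[0]:
                best = (score, m, corr / energy)
        _, m, amp = best
        pulses.append((m, amp))
        y = [0.0] * m + h[: n - m]
        remaining = [r - amp * v for r, v in zip(remaining, y)]  # subtract its contribution
    return pulses


lpc = [0.5]
excitation = [0.0] * 40
excitation[7] = 3.0
target = synthesis_filter(excitation, lpc)
print(multipulse_search(target, lpc, 1))  # one pulse at sample 7 with amplitude close to 3.0
```

Unlike the algebraic codebook discussed next, each pulse here carries a freely chosen amplitude.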
An algebraic codebook (J-P. Adoul et al., “Fast CELP coding based on algebraic codes,” Proc. ICASSP'87, pp. 1957-1960 (reference 3)) is another example; it has a simple structure in which a stochastic vector is expressed only by the presence or absence of a pulse and its polarity (+, −). Although, unlike a multi-pulse excitation, every pulse is limited to an amplitude of 1, this technique is widely used for low rate coding because speech quality does not deteriorate much and a fast search method has been proposed for it. An improved algebraic-codebook scheme that allows each pulse to have an amplitude has also been proposed in reference 4 (Chang Deyuan, “An 8 kb/s low complexity CELP speech codec,” 1996 3rd International Conference on Signal Processing, pp. 671-4, 1996).
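An algebraic stochastic vector is fully described by where its pulses sit and their polarities. The sketch below builds such a vector and counts its bits. The 40-sample subframe and the five interleaved position tracks (track t holds positions t, t+5, t+10, ...) are assumptions chosen for illustration, in the spirit of common algebraic-codebook layouts; the fast search of reference 3 is not shown.

```python
import math

SUBFRAME = 40   # samples per subframe (assumed for illustration)
NUM_TRACKS = 5  # interleaved tracks: track t holds positions t, t+5, t+10, ...


def track_positions(track):
    """Candidate pulse positions belonging to one interleaved track."""
    return list(range(track, SUBFRAME, NUM_TRACKS))


def algebraic_vector(pulses):
    """Build a stochastic vector from (track, slot, sign) triples.

    Every pulse has amplitude 1; only its position (slot within its track)
    and its polarity (+1 or -1) are coded."""
    vector = [0] * SUBFRAME
    for track, slot, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("an algebraic pulse carries only a polarity")
        vector[track_positions(track)[slot]] += sign
    return vector


def bits_for_tracks(tracks):
    """Position bits plus one polarity bit for one pulse on each listed track."""
    return sum(math.ceil(math.log2(len(track_positions(t)))) + 1 for t in tracks)


v = algebraic_vector([(0, 2, 1), (1, 0, -1), (2, 5, 1), (3, 7, -1)])
print([n for n, x in enumerate(v) if x != 0])  # [1, 10, 27, 38]
print(bits_for_tracks([0, 1, 2, 3]))  # 16
```

With 8 positions per track, each pulse costs 3 position bits plus 1 polarity bit, so four pulses fit in 16 bits.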
In each type of pulse excitation described above, the candidate positions at which pulses can be set are limited to integer sampling positions, i.e., the sampling points of the stochastic vector. Consequently, even if one tries to improve the performance of the stochastic vector by assigning more bits to the pulse position candidates, no more bits can be assigned than are needed to express the number of samples contained in a frame.
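The ceiling follows from simple counting. With only integer positions, a frame of N samples offers exactly N candidates for a pulse, so a position index needs at most ceil(log2 N) bits; any further bit only adds codewords that cannot select a new position. The 40-sample frame is an assumed example, and `grid_points_per_sample` is a hypothetical parameter for a candidate grid finer than one sample.

```python
import math


def position_bit_ceiling(num_samples, grid_points_per_sample=1):
    """Bits needed so every candidate grid point gets its own index."""
    return math.ceil(math.log2(num_samples * grid_points_per_sample))


def distinct_positions(bits, num_samples, grid_points_per_sample=1):
    """How many different pulse positions a position index of `bits` bits can select."""
    return min(2 ** bits, num_samples * grid_points_per_sample)


for bits in (5, 6, 7, 8):
    print(bits, distinct_positions(bits, 40))  # 32, then 40 from 6 bits onward
print(position_bit_ceiling(40), position_bit_ceiling(40, grid_points_per_sample=2))  # 6 7
```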
Even when the pulse position candidates are adapted by the method provided in U.S. patent application Ser. No. 09/220,062, if many bits are used to express the position information, pulse position candidates are set at most samples, even in a section where the candidates should be sparsely dispersed. Such a section then becomes difficult to distinguish from a section on which the candidates are concentrated, which weakens the adaptation effect.
BRIEF SUMMARY OF THE INVENTION
It is an object of the present invention to provide a speech coding/decoding method that can assign an arbitrary number of bits to pulse position information regardless of the number of samples in a frame (the length of the excitation signal generated from the pulse positions), and can thereby improve sound quality.
It is another object of the present invention to provide a speech coding/decoding method that resolves the saturation phenomenon occurring when pulse positions are fixed at integer positions under the method of adapting pulse position candidates provided by U.S. patent application Ser. No. 09/220,062, the content of which is incorporated herein by reference, and that improves speech quality by making the adaptation of pulse position candidates function effectively.
According to the invention, there is provided a speech coding method which comprises: analyzing an input speech signal to divide the input speech signal into a parameter representing a frequency characteristic of a speech and an excitation signal which is an input signal of a synthesis filter generated based on the parameter, to output a first index specifying the parameter representing the frequency characteristic as a coded result, the excitation signal being formed of a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set at first positions located on sampling points of the excitation signal and the second pulses being set at second positions located between sampling points of the excitation signal; generating a synthesized speech signal based on the coded result and the excitation signal; generating a second index indicating a parameter with which an error between the input speech signal and the synthesized speech signal is minimized; selecting a pulse position candidate from a pulse position codebook in accordance with the second index; and outputting the first and second indexes.
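The text here does not say how a pulse "between sampling points" is realized in the excitation signal. The sketch below assumes one common way of rendering a fractional pulse position, a Hann-windowed sinc interpolation kernel, and a hypothetical pulse position codebook whose index k maps to position k/2, so even indices give the first (on-sample) positions and odd indices the second (between-sample) positions. It then selects the index whose filtered pulse best matches a target, with zero filter memory and an optimal gain. None of these choices are taken from the patent.

```python
import math


def fractional_pulse(position, length, half_width=4):
    """Unit pulse at `position`, which may lie on a sampling point or between two.

    On-sample pulses are a plain impulse; between-sample pulses are approximated
    (assumption) by a Hann-windowed sinc centered on the fractional position."""
    out = [0.0] * length
    if float(position).is_integer():
        out[int(position)] = 1.0
        return out
    for n in range(length):
        d = n - position  # never zero for a non-integer position
        if abs(d) < half_width:
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            out[n] = window * math.sin(math.pi * d) / (math.pi * d)
    return out


def position_codebook(length):
    """Hypothetical pulse position codebook: index k -> position k / 2."""
    return [k / 2 for k in range(2 * length)]


def synthesis_filter(e, a):
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k], zero initial state."""
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y


def search_pulse_position(target, a, length):
    """Index whose filtered pulse, with its optimal gain, best matches `target`."""
    best_index, best_score = None, float("-inf")
    for index, position in enumerate(position_codebook(length)):
        y = synthesis_filter(fractional_pulse(position, length), a)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        score = corr * corr / energy  # error reduction achieved by the optimal gain
        if score > best_score:
            best_index, best_score = index, score
    return best_index


lpc = [0.9, -0.2]
target = synthesis_filter(fractional_pulse(12.5, 40), lpc)
index = search_pulse_position(target, lpc, 40)
print(index, position_codebook(40)[index])  # 25 12.5
```

Doubling the candidate grid this way gives 80 positions in a 40-sample frame, so a position index can use one more bit than integer positions alone allow.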
According to the invention, there is provided a speech decoding method which comprises: extracting, from a coded stream, a first index indicating a frequency characteristic of a speech, a second index indicating a pitch vector, and a third index indicating a pulse train of an excitation signal; reconstructing a synthesis filter by decoding the first index; reconstructing the pitch vector on the basis of the second index; reconstructing, on the basis of the third index, the excitation signal formed by using a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set on sampling points of the excitation signal and the second pulses being set at positions located between sampling points of the excitation signal; and generating a decoded speech signal by exciting the synthesis filter with the reconstructed excitation signal and pitch vector.
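A decoder-side sketch of these steps, under assumptions of our own: the small `LPC_TABLE` stands in for the real coefficient dequantizer addressed by the first index, the second index is taken to be a pitch lag that repeats the most recent past excitation (an adaptive-codebook style reconstruction), and the third index is taken to be a list of (position code, sign) pairs in which code k means position k/2. Between-sample pulses use an assumed Hann-windowed sinc kernel. All table values and names are hypothetical.

```python
import math

LPC_TABLE = [[0.9, -0.2], [1.2, -0.5]]  # hypothetical: first index -> predictor taps


def fractional_pulse(position, length, half_width=4):
    """Unit pulse on a sampling point, or a windowed-sinc pulse between two (assumption)."""
    out = [0.0] * length
    if float(position).is_integer():
        out[int(position)] = 1.0
        return out
    for n in range(length):
        d = n - position
        if abs(d) < half_width:
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            out[n] = window * math.sin(math.pi * d) / (math.pi * d)
    return out


def pitch_vector(past_excitation, lag, length):
    """Repeat the last `lag` samples of past excitation (requires len(past) >= lag)."""
    segment = past_excitation[-lag:]
    return [segment[n % lag] for n in range(length)]


def synthesis_filter(e, a):
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k], zero initial state."""
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k] * y[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0))
    return y


def decode_frame(first_index, lag, pulse_codes, gains, past_excitation, length=40):
    """Return (decoded speech, excitation) for one frame."""
    a = LPC_TABLE[first_index]                          # reconstruct the synthesis filter
    pitch = pitch_vector(past_excitation, lag, length)  # reconstruct the pitch vector
    pulses = [0.0] * length                             # reconstruct the pulse train
    for code, sign in pulse_codes:
        shape = fractional_pulse(code / 2, length)      # even code: on a sample; odd: between
        pulses = [p + sign * s for p, s in zip(pulses, shape)]
    pitch_gain, pulse_gain = gains
    excitation = [pitch_gain * v + pulse_gain * c for v, c in zip(pitch, pulses)]
    return synthesis_filter(excitation, a), excitation


past = [float(n % 7) for n in range(80)]
speech, excitation = decode_frame(0, 30, [(25, 1), (8, -1)], (0.5, 1.0), past)
print(len(speech))  # 40
```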
In other words, the present invention provides a speech coding/decoding method in which an excitation signal is formed by using a pulse train, and the pulse train contains a pulse selected from first pulses set on sampling points of the excitation signal and second pulses set at positions located between sampling points of the excitation signal.
According to the invention, there is provided a speech coding method which comprises: analyzing an input speech signal to divide the input speech signal into a parameter representing a frequency characteristic of a speech and an excitation signal formed based on the parameter and input to a digital filter, to output a first index specifying the parameter representing the frequency characteristic as a coded result, the excitation signal being generated by using a pitch vector and a stochastic vector for exciting a synthesis filter; generating the stochastic vector by using a pulse train including a pulse selected from first pulses and second pulses, the first pulses being s
Amada Tadashi
Tsuchiya Katsumi