Speech encoding/decoding method using reduced subframe pulse...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C704S219000

Reexamination Certificate

active

06385576

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to an encoding/decoding method of a low bit rate used for digital telephone, voice memo, etc.
In recent years, the encoding techniques have found wide applications in the portable telephone or the internet in which the speech and music sound are transmitted and stored by being compressed at a low bit rate. Such techniques include the CELP method (Code Excited Linear Prediction (M. R. Schroeder and B. S. at al), “Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates”, Proc. ICASSP, pp.937-940, 1985 (reference 1) and W. S. Kleijin, D. J. Krasinski et al. “Improved Speech Quality and Efficient Vector Quantization in SELP”, Proc. ICASSP, pp.155-158, 1988 (reference 2)).
The CELP is an encoding scheme based on the linear predictive analysis. An input speech signal is divided into a linear prediction coefficient representing the phoneme information and a prediction residual signal representing the sound level, etc. according to the linear predictive analysis. Based on the linear predictive coefficients, a recursive digital filter called a synthesis filter is configured, and supplied with a prediction residual signal as an excitation signal thereby to restore the original input speech signal.
For encoding at low bit rate, it is necessary to encode, with as low bit rates as possible, the linear predictive coefficients constituting the synthesis filter information representing the characteristics of the synthesis filter and the prediction residual signal constituting the characteristic of the synthetic filter. In the CELP scheme, two types of signal including the pitch vector and the noise vector are each multiplied by an appropriate gain and added to each other thereby to generate an excitation signal in the form encoded from the prediction residual signal. A method of generating the pitch vector is described in detail in reference 2 for example. There is proposed a method of using a fixed coded vector on a rising portion (onset portion) of a speech other than the method of the reference 2. However, in the present invention, such vectors are used as pitch vectors.
The noise vector is normally generated by storing a multiplicity of candidates in a stochastic codebook and selecting an optimum one. In a method of searching for a noise vector, all the noise vectors are added to the pitch vector and then a synthesis speech signal is generated through a synthetic filter. The error of this synthesis speech signal with respect to the input signal is evaluated thereby to select a noise vector generating a synthesis speech signal with the smallest error. What is most important for the CELP scheme, therefore, is how efficiently to store the noise vectors in the stochastic codebook.
The algebraic codebook (J-P. Adoul et al, “Fast CELP Coding based on algebraic codes”, Proc. ICASSP '87, pp.1957-1960 (reference 3)) has a simple structure in which the noise vector is indicated only by the presence or absence of a pulse and the sign (+, −) thereof. The algebraic codebook, as compared with the stochastic codebook with a plurality of noise vectors stored therein, need not store any code vector and has the feature of a very small calculation amount. Also, the sound quality of the system using the algebraic codebook is not inferior to that of the prior art, and therefore has recently been used for various standard schemes.
In the algebraic codebook, however, the deterioration of the sound quality becomes more conspicuous with the decrease in the encoding bit rate. One reason is the shortage of the pulse position information. Specifically, in view of the fact that the algebraic codebook algebraically simplifies the positional information of the pulse, in spite of the advantage described above, position candidates sometimes exist at points where a pulse rise is not required for low bit rate encoding but not at required points. This not only deteriorates the efficiency but also deteriorates the sound quality.
Another reason for the deterioration of the sound quality when using the algebraic codebook is the shortage of the number of pulses. The shortage of pulses gives rise to a pulse-like noise in the decoded speech. This is because an excitation signal is generated from a pulse train and the presence or absence of a pulse can be easily acknowledged perceptually with the decrease in the number of pulses. For improving the sound quality, it is necessary to alleviate the pulse-like noise.
As described above, the conventional algebraic codebook has the advantage of a simple structure and a small amount of calculation, but poses the problem that the quality of the decoded speech is deteriorated due to the shortage of the pulses-and the positional information of the pulse train making up the excitation signal for the synthesis filter at a low bit rate.
BRIEF SUMMARY OF THE INVENTION
The object of the present invention is to provide a speech encoding/decoding method which can secure a superior sound quality even at a low bit rate encoding.
According to a first aspect of the invention, there is provided a speech encoding method comprising the steps of generating at least information representing the characteristics of a synthesis filter for a speech signal, and generating an excitation signal for exciting the synthesis filter, including a pulse train generated by setting pulses at a predetermined number of pulse positions selected from the pulse position candidates adaptively changed in accordance with the characteristics of the speech signal.
According to another aspect of the invention, there is provided a speech decoding method for inputting an excitation signal to a synthesis filter and decoding a speech signal, the excitation signal containing a pulse train generated by setting pulses at a predetermined number of pulse positions selected from the pulse position candidates adaptively changed in accordance with the characteristics of the speech signal.
In a speech encoding/decoding method according to this invention, the excitation signal for exciting the synthesis filter contains a pulse train generated by setting pulses at a predetermined number of pulse positions selected from the pulse position candidates adaptively changed in accordance with the characteristics of the speech signal. More specifically, the pulse position candidates are assigned in such a manner that more candidates exist at a domain of larger power of the speech signal.
Also, the excitation signal can be configured to include a pulse train generated by setting pulses at all the pulse position candidates adaptively changing in accordance with the characteristics of the voice signal and optimizing the amplitude of each pulse with predetermined means. In such a case, more specifically, the pulse position candidates are assigned so that more candidates exist at a domain of larger power of the voice signal.
Alternatively, the excitation signal can be generated by use of a pulse train generated by setting pulses at a predetermined number of pulse positions selected from first pulse position candidates changing adaptively in accordance with the characteristics of the voice signal or a pulse train generated by setting pulses at a predetermined number of pulse positions selected from second pulse position candidates including a part or the whole of the positions not used as the first pulse position candidates. In this case, the first pulse position candidates are arranged, more specifically, so that more candidates exist at a domain that the power of the speech signal is larger.
Also, in the case where the excitation signal includes a pitch vector and a noise vector, the noise vector is generated by setting pulses at a predetermined number of pulse positions selected from the pulse position candidates changed in accordance with the shape of the pitch vector. More specifically, more pulse position candidates are located at a domain of larger power of the pitch vector.
Also, the noise vector can be configured by use of a pulse train generated by setti

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Speech encoding/decoding method using reduced subframe pulse... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Speech encoding/decoding method using reduced subframe pulse..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Speech encoding/decoding method using reduced subframe pulse... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2821456

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.