Process for the vector quantization of low bit rate vocoders

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Details

704230, G10L 702

Patent

active

060164697

DESCRIPTION:

BRIEF SUMMARY
BACKGROUND OF THE INVENTION

1. Field of the Invention
The present invention relates to a process for the vector quantization of low bit rate vocoders.
It applies in particular to linear-prediction vocoders similar to those described, for example, in the THOMSON-CSF Technical Journal, Volume 14, No. 3, Sep. 1982, pages 715 to 731, in which the speech signal is identified with the output of a digital filter whose input receives either a periodic waveform, corresponding to voiced sounds such as vowels, or a random waveform, corresponding to unvoiced sounds such as the majority of consonants.
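The excitation-plus-filter model described above can be sketched as follows. This is only an illustration and not the vocoder of the cited journal: the function name `synthesize`, the single filter coefficient 0.5, the pitch period of 80 samples and the 180-sample frame length are all hypothetical choices, and the sign convention of the recursion is an assumption.

```python
import random

def synthesize(coeffs, n_samples, voiced, pitch_period=80, seed=0):
    """Drive an all-pole filter with a pulse train (voiced) or noise (unvoiced).

    Assumed convention: s[n] = e[n] + sum_k coeffs[k] * s[n - 1 - k].
    """
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        if voiced:
            # Periodic excitation: one unit pulse per pitch period.
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # Random excitation: uniform noise in [-1, 1].
            excitation = rng.uniform(-1.0, 1.0)
        # Linear prediction from the samples already produced.
        prediction = sum(
            a * out[n - 1 - k] for k, a in enumerate(coeffs) if n - 1 - k >= 0
        )
        out.append(excitation + prediction)
    return out

voiced_frame = synthesize([0.5], 180, voiced=True)     # vowel-like frame
unvoiced_frame = synthesize([0.5], 180, voiced=False)  # consonant-like frame
```

Only the filter coefficients, the voiced/unvoiced decision and the pitch need to be conveyed to rebuild such a frame, which is why the quantization of the filter matters so much at low bit rates.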
2. Discussion of the Background
It is known that the auditory quality of linear-prediction vocoders depends in large part on the accuracy with which their predictive filter is quantized, and that this quality generally decreases as the digital bit rate between vocoders decreases, since the accuracy of quantization of the filter then becomes insufficient. Numerous quantization processes, such as those described for example in Patent Application EP 0504485 A2 or in U.S. Pat. No. 4,907,276, have been developed in order to solve this problem. In general, the speech signal is segmented into independent frames of constant duration and the filter is renewed at each frame. Thus, to arrive at a bit rate of around 1820 bits per second, the filter must, according to a standard implementation, be represented by a packet of 41 bits transmitted every 22.5 milliseconds. For non-standard links with a lower bit rate, of the order of 800 bits per second, fewer than 800 bits per second have to be transmitted in order to represent the filter, which constitutes a bit rate ratio of approximately 3 as compared with standard implementations. On average, 30 bits are used to quantize one filter out of two; these 30 bits are composed of 3 bits defining a quantization scheme and 27 bits for quantizing 10 quantities obtained from the LAR (Log Area Ratio) coefficients by displacement and rotation in the 10-dimensional space thus defined. As a result, the quantization begins to be only approximately transparent, and auditory compensation of this artefact is necessary, by coarse quantization of the filters located in the transitions of the speech signal and fine quantization of those corresponding to stable zones.
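The bit rate figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the low bit rate scheme keeps the same 22.5 ms frame duration and spends its 30 bits once per pair of frames; the text does not state this explicitly, and the variable names are illustrative.

```python
FRAME_S = 0.0225  # 22.5 ms frame duration, from the text

# Standard implementation: 41 bits per filter, one filter per frame.
standard_rate = 41 / FRAME_S

# Low bit rate case: 3 scheme bits + 27 coefficient bits, spent on average
# once every two frames (ASSUMPTION: same 22.5 ms frame duration).
scheme_bits, coeff_bits = 3, 27
low_rate = (scheme_bits + coeff_bits) / (2 * FRAME_S)

ratio = standard_rate / low_rate
print(f"standard: {standard_rate:.0f} bit/s, "
      f"low rate: {low_rate:.0f} bit/s, ratio: {ratio:.2f}")
# prints "standard: 1822 bit/s, low rate: 667 bit/s, ratio: 2.73"
```

Under these assumptions the filter costs about 1822 bits per second in the standard case and about 667 bits per second in the low bit rate case, which is consistent with the "around 1820" and "ratio of approximately 3" figures in the text.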
To obtain sufficient accuracy of quantization of the predictive filter despite everything, the conventional approach consists in employing a vector quantization scheme, which is intrinsically more efficient than that used in standard systems, where the 41 bits employed serve for the scalar quantization of the P=10 coefficients of their prediction filter. The method relies on a dictionary containing a specified number of standard filters obtained by learning. It consists in transmitting only the page or the index at which the standard filter closest to the filter to be quantized is located. The advantage lies in the bit rate reduction which is obtained, only 10 to 15 bits per filter being transmitted instead of the 41 bits required in scalar quantization mode. However, this bit rate reduction is obtained at the cost of a very large increase in the memory size required to store the elements of the dictionary and of a considerable computational burden attributable to the complexity of the filter search algorithm.
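A minimal sketch of dictionary-based vector quantization may help make the trade-off concrete. It assumes a plain squared Euclidean distance and a randomly filled dictionary; a real vocoder would use a dictionary obtained by learning and typically a perceptually motivated distance. The names `nearest_index` and `index_bits` are hypothetical.

```python
import random

def nearest_index(codebook, vector):
    """Full search: return the index of the entry closest to `vector`."""
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        # ASSUMPTION: squared Euclidean distance as the distortion measure.
        dist = sum((x - y) ** 2 for x, y in zip(vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def index_bits(codebook_size):
    """Number of bits needed to address every entry of the dictionary."""
    return (codebook_size - 1).bit_length()

P = 10  # coefficients per filter, as in the text
rng = random.Random(0)
# Toy dictionary of 1024 "standard filters" (random here, learned in practice).
codebook = [[rng.uniform(-1, 1) for _ in range(P)] for _ in range(1024)]

# A filter close to entry 37 is encoded as a single index.
noisy = [c + 0.001 for c in codebook[37]]
index = nearest_index(codebook, noisy)   # only this index is transmitted
decoded = codebook[index]                # the receiver looks it up
```

With this layout a 10-bit index needs 1024 stored filters and a 15-bit index needs 32768, each of P = 10 values, and the full search visits every entry. This is the memory and computation cost the text refers to.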
By applying this approach also to low bit rate vocoders of 800 bits/s and less, it is commonly supposed that 24 bits are sufficient for a composite dictionary produced from two dictionaries of 4,096 elements each, accounting for the first four and the last six LSP (Line Spectrum Pair) coefficients respectively. The major drawback of this type of quantization again resides in the need to compile this dictionary, to store it and to perform the quantization proper.
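The composite dictionary described above can be illustrated with a split search, in which the first four and the last six coefficients are quantized independently and the two 12-bit indices are packed into one 24-bit code. This is a sketch under assumptions: the dictionaries are filled with random values rather than learned, the distance is squared Euclidean, and the names `encode` and `decode` are hypothetical.

```python
import random

LOW_DIM, HIGH_DIM = 4, 6          # first four and last six LSPs
BOOK_SIZE, INDEX_BITS = 4096, 12  # 4,096 entries -> 12 bits per index

def nearest_index(codebook, vector):
    """Full search with squared Euclidean distance (an assumption)."""
    best_index, best_dist = 0, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((x - y) ** 2 for x, y in zip(vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def encode(lsp, book_low, book_high):
    """Quantize both halves separately and pack the indices into 24 bits."""
    i = nearest_index(book_low, lsp[:LOW_DIM])
    j = nearest_index(book_high, lsp[LOW_DIM:])
    return (i << INDEX_BITS) | j

def decode(code, book_low, book_high):
    """Unpack the two indices and concatenate the two dictionary entries."""
    i, j = code >> INDEX_BITS, code & (BOOK_SIZE - 1)
    return book_low[i] + book_high[j]

rng = random.Random(1)
# Toy dictionaries (random here; obtained by learning in practice).
book_low = [[rng.random() for _ in range(LOW_DIM)] for _ in range(BOOK_SIZE)]
book_high = [[rng.random() for _ in range(HIGH_DIM)] for _ in range(BOOK_SIZE)]

lsp = book_low[5] + book_high[4000]      # a vector that is exactly representable
code = encode(lsp, book_low, book_high)  # fits in 24 bits
restored = decode(code, book_low, book_high)
```

Splitting keeps storage at 4096 × 4 + 4096 × 6 = 40,960 values, whereas a single dictionary addressed by 24 bits would need 2^24 entries of 10 values each. The dictionaries still have to be compiled, stored and searched, which is the drawback noted in the text.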
Alternatives to the vector quantization scheme have also been proposed in order to reduce the number of elements stored in the dictionary. Thus, a technique of pyramidal vector quantization is known in particular, a description of which may be found in the journal IEEE Trans. on Information Theory, Vol. IT-32, No. 4, July 1986, pages 568 to 582, in the article by Thomas R. Fischer entitled "A pyr

REFERENCES:
patent: 4027261 (1977-05-01), Laurent et al.
patent: 4382232 (1983-05-01), Laurent
patent: 4603393 (1986-07-01), Laurent et al.
patent: 4799241 (1989-01-01), Laurent
patent: 4852098 (1989-07-01), Brechard et al.
patent: 4888778 (1989-12-01), Brechard et al.
patent: 4905256 (1990-02-01), Laurent
patent: 4907276 (1990-03-01), Aldersberg
patent: 4945312 (1990-07-01), Auger et al.
patent: 4982341 (1991-01-01), Laurent
patent: 5016278 (1991-05-01), Rochette et al.
patent: 5243685 (1993-09-01), Laurent
patent: 5313553 (1994-05-01), Laurent
patent: 5455892 (1995-10-01), Minot et al.
patent: 5522009 (1996-05-01), Laurent
patent: 5555320 (1996-09-01), Irie et al.
patent: 5568591 (1996-10-01), Minot et al.
patent: 5715367 (1998-02-01), Gillick et al.
patent: 5826224 (1998-10-01), Gerson et al.
Fischer et al., "Transform Coding of Speech with Pyramid Vector Quantization," 1985 IEEE Military Communications Conference, Oct. 20 to 23, 1985, pp. 620-623.
