Method and apparatus for pre-processing speech signals prior...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S203000, C704S219000

Reexamination Certificate

active

06223151

ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to the coding of speech signals in communication systems and, more particularly, but not by way of limitation, to the coding of speech with speech coders using block transforms.
BACKGROUND OF THE INVENTION
High quality coding of speech signals at low bit rates is of great importance to modern communications. Applications for such coding include mobile telephony, voice storage and secure telephony, among others. These applications would benefit from high quality coders operating at one to five kilobits per second. As a result, there is a strong research effort aimed at the development of coders operating at these rates. Most of this research effort is directed at coders based on a sinusoidal coding paradigm (e.g., R. J. McAulay and T. F. Quatieri, "Sinusoidal Coding", in Speech Coding and Synthesis, W. B. Kleijn and K. K. Paliwal, editors, Elsevier Science, 1995, pages 121-173) and a waveform interpolation paradigm (e.g., W. B. Kleijn, "Encoding Speech Using Prototype Waveforms", IEEE Trans. Speech and Audio Process., vol. 4, pages 386-399, 1993). Furthermore, several standards based on sinusoidal coders already exist, for example, the INMARSAT Mini-M 4 kb/s coder and the APCO Project 25 North American land mobile radio communication system.
Coders operating at bit rates greater than five kilobits per second commonly use coding paradigms for which the reconstructed signal is identical to the original signal when the quantization errors are zero (i.e., when quantization is turned off). In other words, signal reconstruction becomes exact as the operational bit rate approaches infinity. Such coders are referred to as Asymptotically Exact (AE) coders. Examples of standards which conform with such coders are the ITU G.729 and G.728 standards. These standards are based on the commonly known Code-Excited Linear Prediction (CELP) speech-coding paradigm. AE coders have the advantage that their quality can be improved by increasing the operational bit rate. Thus, any shortcomings in the models of the speech signal used by an AE coder which result in perceptible distortion can be compensated for by increasing the operational bit rate. As a result, any de-tuning of parameter settings in a good AE coder merely increases the bit rate necessary to obtain a certain quality of the reconstructed speech. In practice, a majority of AE coders employ bit rates at which the reconstructed speech is of good to excellent quality. Hereinafter, "good" and "excellent" are defined by the descriptions contained in the commonly known Mean Opinion Score (MOS), which is based on subjective evaluation.
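The AE property can be made concrete with a toy transform coder. The sketch below is illustrative only and is not the CELP paradigm of G.728/G.729: it pairs an orthonormal DCT with a uniform scalar quantizer, and the frame length and step sizes are arbitrary assumptions. With quantization turned off, the synthesis transform inverts the analysis transform exactly; with quantization on, the reconstruction error shrinks with the quantizer step.

```python
# Illustrative only: an orthonormal DCT transform coder with a uniform
# quantizer. With quantization turned off, the synthesis transform
# inverts the analysis transform exactly, so reconstruction is exact;
# with quantization on, the error shrinks with the quantizer step.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
frame = rng.standard_normal(160)                 # one 20 ms frame at 8 kHz (assumed)

def code_frame(x, step):
    """Analysis transform, uniform quantization, synthesis transform."""
    coeffs = dct(x, norm="ortho")                # analysis (orthonormal DCT-II)
    if step > 0.0:
        coeffs = step * np.round(coeffs / step)  # uniform scalar quantizer
    return idct(coeffs, norm="ortho")            # synthesis (exact inverse)

for step in (0.5, 0.1, 0.01, 0.0):               # step 0.0 means quantization off
    err = np.max(np.abs(code_frame(frame, step) - frame))
    print(f"step={step:<5}  max reconstruction error={err:.2e}")
# With step == 0.0 the error is at numerical precision: the AE property.
```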
For most speech-coding paradigms implemented at bit rates below five kilobits per second, the reconstructed signal does not converge to the original signal when the quantization errors are set to zero. Hereinafter, such coders are referred to as parametric coders. Parametric coders are typically based on a model of the speech signal which is more sophisticated than those used in waveform coders. However, since these coders lack the AE property of improved reconstruction quality with increased bit rate, slight shortcomings in the model may greatly affect the quality of the reconstructed speech signal. In relative terms, this effect on quality is most significant when high-bit-rate quantizers are used. Thus, the quality of the reconstructed speech signal cannot exceed a certain fixed maximum level which is primarily dependent on the particular model. Generally this maximum quality level is below a "good" rating on the MOS scale.
It would be advantageous, therefore, to modify promising parametric coders to operate as AE coders. First, the sophisticated speech-signal models associated with parametric coders result in efficient coding. Second, conversion to an AE coder removes the limitations on the quality of the reconstructed speech associated with parametric coders. To convert a parametric coder to an AE coder, however, the parametric coder must be amenable to such a modification. As will be described below, the waveform interpolation coder is indeed amenable to such a change. Furthermore, use of the present invention allows certain sinusoidal coders to be converted from parametric coders to AE coders as well.
Until recently, all versions of commonly known waveform interpolation coders (e.g., I. S. Burnett and D. H. Pham, "Multi-Prototype Waveform Coding Using Frame-by-Frame Analysis-by-Synthesis", Proc. International Conf. Acoust. Speech Sign. Process., 1997, pages 1567-1570, and Y. Shoham, "Very Low Complexity Interpolative Speech Coding at 1.2 to 2.4 kbps", Proc. International Conf. Acoust. Speech Sign. Process., 1997, pages 1599-1602) were parametric coders. Since the quality of the reconstructed speech signal is limited by the particular model, implementations of waveform interpolation coders have been designed at bit rates of approximately two thousand four hundred bits per second, where the shortcomings of the model are least apparent.
Recently, two AE versions of the waveform interpolation coder were proposed (W. B. Kleijn, H. Yang, and E. F. Deprettere, "Waveform Interpolation With Pitch-Spaced Subbands", Proc. International Conf. Speech and Language Process., 1998, pages 1795-1798). The basic coder operation is the same in both versions. In either version of the proposed waveform interpolation coders, a pitch period track of the speech signal is estimated by a pitch tracking unit using standard, commonly known techniques, with the pitch period track continuing even in regions of no discernible periodicity. Hereinafter, a speech signal is defined to be either the original speech signal or any signal derived from a speech signal, for example, a linear-prediction residual signal.
A digitized speech signal and the pitch-period track form the input to a time warping unit which outputs a speech signal having a fixed number of samples per pitch period. This constant-pitch-period speech signal forms the input to a nonadaptive filter bank. The coefficients coming out of the filter bank are quantized and the corresponding indices are encoded, with the quantization procedure potentially involving multiple steps. At the receiver, the quantized coefficients are reconstructed from the transmitted quantization indices. These coefficients form the input to a synthesis filter bank which produces the reconstructed signal as an output. The filter banks are perfect-reconstruction filter banks (e.g., P. P. Vaidyanathan, "Multirate Systems and Filterbanks", Prentice Hall, 1993), which yield perfect reconstruction when the analysis and synthesis banks are concatenated, that is to say, when the quantization is turned off. Thus, the coder possesses the AE property if an appropriate unwarping procedure is used.
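A minimal structural sketch of this chain is shown below. It is illustrative rather than a reproduction of the cited coders: the pitch track is assumed to be given, linear-interpolation resampling stands in for the warping and unwarping units, a per-cycle orthonormal DFT stands in for the perfect-reconstruction filter bank, and all function names and parameters are hypothetical.

```python
# Structural sketch of the described chain: warp each pitch cycle to a
# fixed length, transform it, quantize the coefficients, then invert the
# transform and unwarp. All choices below are illustrative assumptions.
import numpy as np

P = 64  # fixed number of samples per pitch cycle after warping (assumed)

def warp_cycle(cycle, length=P):
    """Resample one pitch cycle to a fixed length (simplified warping)."""
    src = np.linspace(0.0, 1.0, num=len(cycle), endpoint=False)
    dst = np.linspace(0.0, 1.0, num=length, endpoint=False)
    return np.interp(dst, src, cycle)

def unwarp_cycle(cycle, length):
    """Resample a fixed-length cycle back to its original length."""
    src = np.linspace(0.0, 1.0, num=len(cycle), endpoint=False)
    dst = np.linspace(0.0, 1.0, num=length, endpoint=False)
    return np.interp(dst, src, cycle)

def analysis(cycle):
    """Stand-in 'filter bank': orthonormal DFT of one constant-length cycle."""
    return np.fft.fft(cycle, norm="ortho")

def synthesis(coeffs):
    """Inverse of the analysis stage (a perfect-reconstruction pair)."""
    return np.real(np.fft.ifft(coeffs, norm="ortho"))

def quantize(coeffs, step):
    """Uniform quantization of the complex coefficients (step=0: off)."""
    if step == 0.0:
        return coeffs
    q = lambda v: step * np.round(v / step)
    return q(coeffs.real) + 1j * q(coeffs.imag)

def code(signal, pitch_track, step=0.0):
    """Encode/decode a signal split into pitch cycles of the given lengths."""
    out, pos = [], 0
    for period in pitch_track:
        cycle = signal[pos:pos + period]
        coeffs = quantize(analysis(warp_cycle(cycle)), step)
        out.append(unwarp_cycle(synthesis(coeffs), period))
        pos += period
    return np.concatenate(out)

# Usage: a toy "voiced" segment with a slowly drifting pitch period.
rng = np.random.default_rng(1)
pitch_track = [60, 61, 62, 63, 64]                 # samples per cycle (assumed)
x = np.concatenate([np.sin(2 * np.pi * np.arange(p) / p) for p in pitch_track])
x += 0.05 * rng.standard_normal(len(x))
y = code(x, pitch_track, step=0.0)                 # quantization turned off
print("max error with quantization off:", np.max(np.abs(y - x)))
# The error is small but nonzero because the simplified resampling above
# is not exactly invertible; the AE property requires an exactly
# invertible warping/unwarping pair around the perfect-reconstruction bank.
```

With the quantizer turned off, the residual error in this sketch comes entirely from the simplified warping, which illustrates why the passage stresses that an appropriate unwarping procedure is needed for the coder to possess the AE property.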
In the two AE versions of the waveform interpolation coder described above, a Gabor transform and a Modulated Lapped Transform (MLT) were used as filter banks, respectively. Both procedures suffer from disadvantages which are difficult to overcome in practice. A primary disadvantage exhibited by both procedures is increased delay. In general, the Gabor-transform based waveform interpolation coder requires an oversampled filter bank for good performance. This means that the number of coefficients to be quantized is larger than the number of samples in the original speech signal, which is a practical disadvantage for coding. When the MLT is used, the coder parameters are not easily converted into either a description of the speech waveforms or a description of the harmonics associated with voiced speech. This makes it more difficult to evaluate the effects of time-domain and frequency-domain masking.
In the Gabor-transform approach, the reconstructed signal is a summation of smoothly windowed complex exponential (sinusoid) functions (vectors). The scaling and summing of the functions is equivalent to the implementation of the synthesis filter bank. The coefficients for each of these windowed exponential functions form the representation to be quantized. In s
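A minimal sketch of this kind of synthesis is given below, assuming a Hann window, a fixed hop size, and an arbitrary layout of coefficients over time shifts and frequencies; none of these choices are specified in the passage, and taking the real part of the sum is a further simplification.

```python
# Sketch of synthesis as a sum of coefficient-scaled, smoothly windowed
# complex exponentials. Window, hop, and coefficient layout are assumed.
import numpy as np

def gabor_synthesis(coeffs, win, hop):
    """Sum scaled, windowed complex exponentials; coeffs[m, k] scales the
    atom at time shift m*hop and normalized frequency k/K."""
    n_frames, K = coeffs.shape
    N = len(win)
    out = np.zeros(hop * (n_frames - 1) + N, dtype=complex)
    n = np.arange(N)
    for m in range(n_frames):
        for k in range(K):
            atom = win * np.exp(2j * np.pi * k * n / K)   # windowed exponential
            out[m * hop:m * hop + N] += coeffs[m, k] * atom
    return out.real    # simplification: keep only the real part of the sum

# Usage: a few arbitrary coefficients, a 64-sample Hann window, hop 32.
win = np.hanning(64)
coeffs = np.zeros((4, 64), dtype=complex)
coeffs[0, 3] = 1.0           # one atom in the first frame, frequency bin 3
coeffs[2, 10] = 0.5 - 0.2j   # another atom later in time
signal = gabor_synthesis(coeffs, win, hop=32)
print(signal.shape)
```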
