Data processing: speech signal processing – linguistics – language – Speech signal processing – Psychoacoustic
Reexamination Certificate
1999-04-06
2003-08-05
Šmits, Tālivaldis Ivars (Department: 2654)
C704S204000, C704S229000
active
06604069
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a signal having quantized values and variable length codes for encoding input digital data by so-called high-efficiency encoding.
2. Description of the Related Art
A variety of high-efficiency encoding techniques exist for encoding audio or speech signals. Examples include so-called transform coding, which is a blocking frequency spectrum splitting system (orthogonal transform), and so-called sub-band coding (SBC), which is a non-blocking frequency spectrum splitting system. In transform coding, audio signals on the time axis are blocked at a pre-set time interval; the blocked time-domain signals are transformed into signals on the frequency axis, and the resulting frequency-domain signals are split into plural frequency bands and encoded band by band. In sub-band coding, the audio signals on the time axis are split into plural frequency sub-bands and encoded without blocking. In a combination of the two systems, the audio signals on the time axis are split into plural frequency sub-bands by sub-band coding, and the resulting band-based signals are transformed into frequency-domain signals by orthogonal transform for encoding.
One band-splitting filter used in the sub-band coding system is the so-called quadrature mirror filter (QMF), discussed in R. E. Crochiere, “Digital Coding of Speech in Sub-bands”, Bell Syst. Tech. J., Vol. 55, No. 8, 1976. The QMF divides the frequency spectrum into two bands of equal bandwidth. With the QMF, so-called aliasing is not produced on subsequent synthesis of the band-split signals.
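The two-band split and alias-free synthesis described above can be illustrated with a minimal sketch. It assumes the shortest possible QMF pair (the two-tap Haar filters), chosen here only because it is easy to verify by hand; it is not the filter of the cited paper, and the function names are hypothetical. Longer practical QMFs also cancel aliasing but generally reconstruct the signal only approximately.

```python
import math

def qmf_analysis(x):
    """Split an even-length signal into a low band and a high band.

    Each band is decimated by 2, so the two bands together hold
    as many samples as the input.
    """
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2
    low = [s * (x[2 * n] + x[2 * n + 1]) for n in range(half)]   # h0 = [s,  s]
    high = [s * (x[2 * n] - x[2 * n + 1]) for n in range(half)]  # h1 = [s, -s]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; the aliasing introduced by decimation cancels."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.append(s * (l + h))
        x.append(s * (l - h))
    return x

signal = [0.5, -1.0, 2.0, 3.0, -0.25, 0.0, 1.5, 1.5]
low_band, high_band = qmf_analysis(signal)
restored = qmf_synthesis(low_band, high_band)
```

The high-band filter is the low-band filter with the sign of every second tap flipped, which is the mirror relationship that gives the QMF its name.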
A technique for splitting the frequency spectrum into sub-bands of equal bandwidth is discussed in Joseph H. Rothweiler, “Polyphase Quadrature Filters - A New Subband Coding Technique”, ICASSP 83 BOSTON. With the polyphase quadrature filter, the signal can be split into plural frequency sub-bands of equal bandwidths.
Among the techniques for orthogonal transform, there is a technique in which the input audio signal is split into frames of a predetermined time duration and the resulting frames are processed by discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. The MDCT is discussed in J. P. Princen and A. B. Bradley, “Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation”, ICASSP 1987.
If DFT or DCT is used as the orthogonal transform of the waveform signal, and the transform is performed on time blocks each consisting of, for example, M samples, M independent real-number data are obtained per block. Since M1 samples are overlapped between neighboring time blocks to reduce connection distortion between blocks, DFT or DCT yields M real-number data on average for every (M-M1) samples, and these M real-number data are subsequently quantized and encoded.
If the above-described MDCT is used as the orthogonal transform method, M independent real-number data are obtained from 2M samples, each block overlapping M samples with each of the two neighboring time blocks. That is, if MDCT is used, M real-number data are obtained from M samples on average. These M real-number data are subsequently quantized and encoded. In the decoding apparatus, the waveform elements obtained by inverse transform of the MDCT codes in each block are summed together, with the overlapping portions of neighboring blocks interfering with each other, to reconstruct the waveform signals.
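The sample counting and the overlapped summation in the decoder can be made concrete with a small sketch. It assumes a sine window (one common window satisfying the Princen-Bradley condition) and a direct, unoptimized evaluation of the transform; the function names and the zero padding at the signal edges are illustrative choices, not taken from the patent.

```python
import math

def sine_window(M):
    """Window of length 2M with w[n]^2 + w[n+M]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block, w):
    """2M windowed samples in, M real coefficients out."""
    M = len(block) // 2
    return [sum(w[n] * block[n]
                * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs, w):
    """M coefficients in, 2M windowed (still aliased) samples out."""
    M = len(coeffs)
    return [w[n] * (2.0 / M)
            * sum(coeffs[k]
                  * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                  for k in range(M))
            for n in range(2 * M)]

def analyze_synthesize(x, M):
    """Transform blocks that advance by M samples, then overlap-add.

    len(x) is assumed to be a multiple of M. Returns the reconstructed
    signal and the total number of coefficients produced.
    """
    w = sine_window(M)
    padded = [0.0] * M + list(x) + [0.0] * M  # so every sample lies in two blocks
    out = [0.0] * len(padded)
    n_coeffs = 0
    for start in range(0, len(padded) - M, M):
        coeffs = mdct(padded[start:start + 2 * M], w)
        n_coeffs += len(coeffs)
        for n, y in enumerate(imdct(coeffs, w)):
            out[start + n] += y  # aliasing of neighboring blocks cancels here
    return out[M:M + len(x)], n_coeffs

M = 4
x = [0.1 * n * (-1) ** (n // 3) for n in range(16)]
y, n_coeffs = analyze_synthesize(x, M)
```

Each block of 2M samples yields only M coefficients, and the hop between blocks is M samples, so apart from one extra block at the padded edges the number of coefficients equals the number of input samples.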
In general, lengthening the time block for orthogonal transform increases the frequency resolution, so that the signal energy is concentrated in specific spectral signal components. MDCT uses a long time block, obtained by overlapping one half of the samples between neighboring time blocks, without increasing the number of resulting spectral signal components beyond the number of original time-domain samples; it can therefore realize a higher encoding efficiency than DFT or DCT. If a sufficiently long overlap between neighboring time blocks is used, the connection distortion between time blocks of the waveform signals can also be reduced.
By quantizing signal components that have been split into sub-bands by a filter or orthogonal transform, it becomes possible to control which sub-band is subjected to quantization noise, thus enabling perceptually more efficient encoding by exploiting masking effects. By normalizing the sample data in each band with the maximum absolute value of the signal components in that band prior to quantization, a still higher encoding efficiency may be achieved.
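A minimal sketch of the per-band normalization step follows. It assumes a uniform signed quantizer and a scale factor equal to the band's maximum absolute value; the function names and the choice of quantizer are hypothetical and serve only to show why normalization helps.

```python
def quantize_band(samples, bits):
    """Normalize one band by its peak magnitude, then quantize uniformly.

    Returns the scale factor and the integer codes. With `bits` bits the
    codes lie in the range -(2**(bits-1) - 1) .. +(2**(bits-1) - 1).
    """
    scale = max(abs(s) for s in samples)
    levels = (1 << (bits - 1)) - 1 if bits >= 1 else 0
    if scale == 0.0 or levels == 0:
        return scale, [0] * len(samples)
    return scale, [round(s / scale * levels) for s in samples]

def dequantize_band(scale, codes, bits):
    """Invert quantize_band up to the quantization error."""
    levels = (1 << (bits - 1)) - 1 if bits >= 1 else 0
    if levels == 0:
        return [0.0] * len(codes)
    return [c * scale / levels for c in codes]

band = [0.8, -0.3, 0.1, 0.0]
scale, codes = quantize_band(band, 4)
restored = dequantize_band(scale, codes, 4)
```

Because the step size is proportional to the scale factor, a quiet band and a loud band use the full code range equally well, and the quantization error of each band stays confined to that band.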
It is preferable that the psychoacoustic characteristics of human beings are taken into account in determining the band splitting width for quantizing the signal components resulting from splitting the frequency spectrum of the audio signals. That is, the frequency spectrum of the audio signals is divided into a plurality of, for example, 25, critical subbands. The width of the critical subbands increases with increasing frequency. In encoding the subband-based data in such a case, bits are fixedly or adaptively allocated among the various critical subbands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from an MDCT, the spectral coefficient data generated by the MDCT within each of the critical subbands are quantized using an adaptively allocated number of bits. The following two bit allocation techniques are known.
In R. Zelinski and P. Noll, “Adaptive Transform Coding of Speech Signals”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical subband. This technique produces a flat quantization noise spectrum and minimizes noise energy, but the noise level perceived by the listener is not optimum because the technique does not exploit the psychoacoustic masking effect.
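Amplitude-driven allocation of this kind can be sketched as follows. The rule used here is the common textbook form, in which each band receives the average bit count plus half the base-2 logarithm of the ratio of its energy to the geometric mean of all band energies; it is an assumption for illustration and is not claimed to be the exact rule of the cited paper. The function name is hypothetical.

```python
import math

def allocate_bits_by_energy(band_energies, average_bits):
    """Real-valued bit allocation from band energies (all must be > 0).

    Bands with above-average log energy get more bits, bands below get
    fewer, and the mean allocation equals `average_bits`.
    """
    log_e = [math.log2(e) for e in band_energies]
    mean_log = sum(log_e) / len(log_e)
    return [average_bits + 0.5 * (le - mean_log) for le in log_e]

energies = [16.0, 4.0, 1.0, 0.25]
bits = allocate_bits_by_energy(energies, 4)

# Quantization noise energy scales as energy * 2**(-2 * bits);
# with this rule it comes out the same in every band.
noise = [e * 2.0 ** (-2.0 * b) for e, b in zip(energies, bits)]
```

The result is a real number per band and can be negative for very weak bands, so a practical coder would still have to round and clamp it. The equal noise values show the flat noise spectrum mentioned above, which is exactly what ignores masking.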
M. A. Krasner, “The Critical Band Coder - Digital Encoding of the Perceptual Requirements of the Auditory System”, describes a technique in which the psychoacoustic masking effect is used to determine a fixed bit allocation for each critical subband. However, with this technique, since the bit allocation is fixed, non-optimum results are obtained even for a strongly tonal signal such as a sine wave.
To overcome this problem, it has been proposed to divide the bits available for bit allocation into a fixed allocation pattern portion, predetermined for each small block, and a portion that depends on the amplitude of the signal in each block. The division ratio is set depending on a signal related to the input signal, such that the share given to the fixed allocation pattern portion becomes higher the smoother the signal spectrum.
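The division of the bit budget can be sketched as below. The text does not say how spectral smoothness is measured, so this sketch assumes the spectral flatness measure (geometric mean over arithmetic mean of the band energies, between 0 and 1) as the share of the fixed portion; the weighting of the signal-dependent portion and all function names are likewise hypothetical.

```python
import math

def spectral_flatness(energies):
    """Geometric mean / arithmetic mean; 1.0 for a perfectly flat spectrum."""
    gm = math.exp(sum(math.log(e) for e in energies) / len(energies))
    am = sum(energies) / len(energies)
    return gm / am

def split_bit_budget(energies, fixed_pattern, total_bits):
    """Blend a fixed allocation pattern with a signal-dependent allocation.

    energies      : positive band energies of the current block
    fixed_pattern : positive relative weights of the fixed allocation
    total_bits    : bits available for the block
    """
    fixed_share = spectral_flatness(energies)      # smoother -> more fixed
    fixed_bits = total_bits * fixed_share
    adaptive_bits = total_bits - fixed_bits
    pattern_sum = sum(fixed_pattern)
    weights = [math.log2(1.0 + e) for e in energies]  # assumed weighting
    weight_sum = sum(weights)
    return [fixed_bits * p / pattern_sum + adaptive_bits * w / weight_sum
            for p, w in zip(fixed_pattern, weights)]

pattern = [4, 3, 2, 1]
flat_alloc = split_bit_budget([1.0, 1.0, 1.0, 1.0], pattern, 40)
tonal_alloc = split_bit_budget([1000.0, 1.0, 1.0, 1.0], pattern, 40)
```

For the flat spectrum the allocation follows the fixed pattern alone, while for the tonal spectrum nearly all bits move to the band that holds the dominant component.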
With this method, if the audio signal has high energy concentration in a specified spectral signal component, as in the case of a sine wave, abundant bits are allocated to the block containing that spectral component, significantly improving the signal-to-noise ratio as a whole. In general, human hearing is highly sensitive to a signal having sharp spectral signal components, so that, if the signal-to-noise ratio is improved by using this method, not only the measured values but also the perceived quality of the audio signal is improved.
Various other bit allocation methods have been proposed and the perceptual models have become more refined, such that, if the encoder has sufficient capability, a perceptually higher encoding efficiency may be realized.
In the methods, it has been customary to find a real-number reference value of bit allocation whereby the signal to noise ratio as found by calculations will be real
Sonnenschein Nath & Rosenthal
Sony Corporation
Šmits Tālivaldis Ivars