Reexamination Certificate
1997-01-15
2001-03-06
Hudspeth, David R. (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S230000
active
06199038
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a signal encoding method for encoding input digital data by high-efficiency encoding.
2. Description of the Related Art
A variety of high-efficiency encoding techniques exist for encoding audio or speech signals. Examples include transform coding, which is a blocking frequency spectrum splitting system (orthogonal transform), and sub-band coding (SBC), which is a non-blocking frequency spectrum splitting system. In transform coding, audio signals on the time axis are blocked at a pre-set time interval, the blocked time-domain signals are transformed into signals on the frequency axis, and the resulting frequency-domain signals are split into plural frequency bands and encoded from band to band. In sub-band coding, the audio signals on the time axis are split into plural frequency bands and encoded without blocking. In a combination of sub-band coding and transform coding, the audio signals on the time axis are first split into plural frequency bands by sub-band coding, and the resulting band-based signals are then transformed into frequency-domain signals by orthogonal transform for encoding.
One band-splitting filter used in the sub-band coding system is the quadrature mirror filter (QMF), discussed in R. E. Crochiere, “Digital Coding of Speech in Subbands”, Bell Syst. Tech. J., Vol. 55, No. 8, 1976. The QMF divides the frequency spectrum into two bands of equal bandwidth. With the QMF, no aliasing is produced when the band-split signals are subsequently synthesized.
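The two-band split and alias-free synthesis described above can be sketched as follows. This is a minimal illustration that assumes the simplest possible QMF pair, the two-tap (Haar) filters; the function names `qmf_analysis` and `qmf_synthesis` are hypothetical, and practical coders use much longer filters.

```python
import math

# Assumption: two-tap (Haar) filter pair, the simplest QMF pair.
# h0 = [s, s] (low-pass), h1 = [s, -s] (high-pass), with s = 1/sqrt(2).
S = 1.0 / math.sqrt(2.0)

def qmf_analysis(x):
    """Split an even-length sequence into two half-rate bands of equal bandwidth."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands; the aliasing from decimation cancels."""
    out = []
    for l, h in zip(low, high):
        out.append(S * (l + h))  # recovers the even-indexed sample
        out.append(S * (l - h))  # recovers the odd-indexed sample
    return out
```

Each band holds half as many samples as the input, so the total number of values is unchanged by the split.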
The technique of splitting the frequency spectrum is discussed in Joseph H. Rothweiler, “Polyphase Quadrature Filters-A New Subband Coding Technique”, ICASSP 83 BOSTON. With a polyphase quadrature filter, the signal can be split into plural frequency bands of equal bandwidth.
Among the known techniques for orthogonal transform is one in which an input audio signal is split into frames of a predetermined time duration and the resulting frames are processed by discrete Fourier transform (DFT), discrete cosine transform (DCT) or modified DCT (MDCT) to convert the signals from the time axis to the frequency axis. A discussion of the MDCT may be found in J. P. Princen and A. B. Bradley, “Subband/Transform Coding Using Filter Bank Based on Time Domain Aliasing Cancellation”, ICASSP 1987.
If the DFT or DCT is used as the method for orthogonal transform of the waveform signal, and the transform is performed on time blocks each consisting of, for example, M sample data, M independent real-number data are obtained. Since M1 sample data are overlapped between neighboring time blocks to reduce connection distortion between time blocks, the DFT or DCT yields M real-number data on average for every (M−M1) sample data. These M real-number data are subsequently quantized and encoded.
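The bookkeeping in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name and the example block sizes are assumptions, not values taken from the text.

```python
def coefficients_per_new_sample(block_len, overlap):
    """Real-number data produced per newly consumed input sample.

    block_len is M and overlap is M1 in the text: each block yields M
    coefficients but advances the input by only (M - M1) samples.
    """
    hop = block_len - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than the block length")
    return block_len / hop

# Hypothetical example: M = 512 samples per block, M1 = 32 overlapped samples.
overhead = coefficients_per_new_sample(512, 32)
```

Any non-zero overlap makes the ratio exceed 1, which means more values must be quantized and encoded than there were input samples.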
If the above-described MDCT is used as the orthogonal transform method, M independent real-number data are obtained from 2M samples resulting from overlapping M sample data with both neighboring time blocks. That is, if the MDCT is used, M real-number data are obtained from M sample data on average. These M real-number data are subsequently quantized and encoded. In the decoding apparatus, the waveform elements obtained by inverse transform of each block from the codes obtained using the MDCT are summed together with mutual interference to reconstruct the waveform signals.
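The following sketch shows one way the MDCT scheme described above can be realized: each block of 2M windowed samples yields M coefficients, and the decoder overlap-adds the inverse-transformed blocks so that the time-domain aliasing of neighboring blocks cancels. It assumes a sine window (which satisfies the Princen-Bradley condition) applied at both analysis and synthesis, uses direct O(M²) sums rather than a fast algorithm, and omits quantization entirely. All function names are hypothetical.

```python
import math

def sine_window(M):
    """Window of length 2M with w[n]^2 + w[n+M]^2 = 1 (assumed window choice)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]

def mdct(block):
    """Forward MDCT: 2M input samples -> M real coefficients."""
    M = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                for n in range(2 * M))
            for k in range(M)]

def imdct(coeffs):
    """Inverse MDCT: M coefficients -> 2M time-aliased samples."""
    M = len(coeffs)
    return [(2.0 / M) * sum(coeffs[k] * math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))
                            for k in range(M))
            for n in range(2 * M)]

def mdct_roundtrip(signal, M):
    """Encode and decode without quantization; returns (output, coefficient count)."""
    if len(signal) % M:
        raise ValueError("signal length must be a multiple of M")
    w = sine_window(M)
    padded = [0.0] * M + list(signal) + [0.0] * M  # so every sample lies in two blocks
    out = [0.0] * len(padded)
    n_coeffs = 0
    for start in range(0, len(padded) - 2 * M + 1, M):  # hop of M: 50% overlap
        block = [padded[start + n] * w[n] for n in range(2 * M)]
        coeffs = mdct(block)
        n_coeffs += len(coeffs)
        y = imdct(coeffs)
        for n in range(2 * M):
            out[start + n] += y[n] * w[n]  # overlap-add cancels the aliasing
    return out[M:M + len(signal)], n_coeffs
```

Apart from one extra block caused by the padding at the signal edges, the coefficient count equals the sample count, which is the "M real-number data from M sample data on average" property.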
In general, if the time block for orthogonal transform is lengthened, frequency resolution is increased, such that the signal energy is concentrated in specified spectral signal components. Therefore, by employing MDCT in which a long time block length obtained by overlapping one-half sample data between neighboring time blocks is used for orthogonal transform and in which the number of resulting spectral signal components is not increased as compared to the number of original time-domain sample data, a higher encoding efficiency may be realized than if the DFT or DCT is used. If a sufficiently long overlap between neighboring time blocks is used, connection distortion between time blocks of waveform signals can be reduced.
By quantizing the signal components split from band to band by a filter or orthogonal transform, it becomes possible to control which bands are subjected to quantization noise, thus enabling encoding with perceptually higher efficiency by exploiting masking effects. By normalizing the sample data in each band by the maximum of the absolute values of the signal components in that band prior to quantization, the encoding efficiency may be improved further.
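The per-band normalization followed by quantization can be sketched as below. The uniform symmetric quantizer, the function names, and the convention that `bits` includes the sign are assumptions made for illustration; the text does not specify a quantizer.

```python
def quantize_band(band, bits):
    """Normalize a band by its peak absolute value, then quantize uniformly.

    Returns (scale, codes). With `bits` bits the codes span
    -(2**(bits-1) - 1) .. +(2**(bits-1) - 1)  (assumed symmetric quantizer).
    """
    scale = max(abs(v) for v in band)
    levels = (1 << (bits - 1)) - 1 if bits >= 1 else 0
    if scale == 0.0 or levels == 0:
        return scale, [0] * len(band)
    codes = [round(v / scale * levels) for v in band]
    return scale, codes

def dequantize_band(scale, codes, bits):
    """Invert quantize_band up to the quantization error."""
    levels = (1 << (bits - 1)) - 1 if bits >= 1 else 0
    if levels == 0:
        return [0.0] * len(codes)
    return [c * scale / levels for c in codes]
```

Because every band is scaled to use the full quantizer range, the quantization step, and hence the noise level, follows the band's own peak rather than the peak of the whole spectrum.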
As the band splitting width used for quantizing the signal components resulting from splitting the frequency spectrum of the audio signals, a band width that takes human psychoacoustic characteristics into account is preferably used. That is, the frequency spectrum of the audio signals is preferably split into a plurality of, for example, 25, critical bands. The width of the critical bands increases with increasing frequency. In encoding the band-based data in such a case, bits are fixedly or adaptively allocated among the various critical bands. For example, when applying adaptive bit allocation to the spectral coefficient data resulting from an MDCT, the spectral coefficient data generated by the MDCT within each of the critical bands is quantized using an adaptively allocated number of bits. The following two techniques are known bit allocation techniques.
In R. Zelinski and P. Noll, “Adaptive Transform Coding of Speech Signals”, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, August 1977, bit allocation is carried out on the basis of the amplitude of the signal in each critical band. This technique produces a flat quantization noise spectrum and minimizes noise energy, but the noise level perceived by the listener is not optimum because the technique does not exploit the psychoacoustic masking effect.
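Amplitude-based allocation of this kind is often written in a textbook form where each band receives the average bit count plus half the base-2 logarithm of its energy relative to the geometric mean of all band energies. The sketch below uses that form as an assumption; it is not quoted from the cited paper, and the function name is hypothetical.

```python
import math

def energy_based_allocation(band_energies, avg_bits):
    """Real-valued bits per band: avg_bits + 0.5*log2(energy / geometric mean).

    Assumption: textbook log-energy rule. Energies must be positive.
    The allocations sum to avg_bits * number_of_bands.
    """
    half_logs = [0.5 * math.log2(e) for e in band_energies]
    mean_half_log = sum(half_logs) / len(half_logs)
    return [avg_bits + h - mean_half_log for h in half_logs]
```

Under this rule a band with four times the energy (about 6 dB more) receives one additional bit, which is what makes the resulting quantization noise level the same in every band.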
In M. A. Krasner, “The Critical Band Coder-Digital Encoding of the Perceptual Requirements of the Auditory System”, there is described a technique in which the psychoacoustic masking effect is used to determine a fixed bit allocation that produces the necessary signal-to-noise ratio for each critical band. However, with this technique, since the bit allocation is fixed, non-optimum results are obtained even for a strongly tonal signal such as a sine wave.
To overcome this problem, it has been proposed to divide the bits available for bit allocation into a fixed allocation pattern portion, pre-set for each small block, and a portion that depends on the amplitude of the signal in each block. The division ratio is set depending on a signal related to the input signal, such that the share given to the fixed allocation pattern portion becomes higher the smoother the signal spectrum.
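A minimal sketch of such a divided allocation follows. The text does not say how spectral smoothness is measured or how the signal-dependent portion is distributed, so both choices here are assumptions: smoothness is taken to be the spectral flatness (geometric mean over arithmetic mean of the band energies), and the signal-dependent bits are shared in proportion to each band's log energy above the weakest band. All names are hypothetical.

```python
import math

def spectral_flatness(energies):
    """Geometric mean / arithmetic mean, in (0, 1]; 1 means a flat spectrum."""
    geo = math.exp(sum(math.log(e) for e in energies) / len(energies))
    return geo / (sum(energies) / len(energies))

def divided_allocation(energies, fixed_pattern, total_bits):
    """Return (fixed_ratio, real-valued bits per band).

    Assumption: fixed_ratio equals the spectral flatness, so smoother spectra
    give the fixed pattern a larger share of total_bits.
    """
    n = len(energies)
    fixed_ratio = spectral_flatness(energies)
    fixed_budget = fixed_ratio * total_bits
    adaptive_budget = total_bits - fixed_budget

    pattern_sum = sum(fixed_pattern)
    fixed = [fixed_budget * p / pattern_sum for p in fixed_pattern]

    floor_energy = min(energies)
    weights = [math.log2(e / floor_energy) for e in energies]
    weight_sum = sum(weights)
    if weight_sum == 0.0:  # perfectly flat spectrum: share evenly
        adaptive = [adaptive_budget / n] * n
    else:
        adaptive = [adaptive_budget * wt / weight_sum for wt in weights]

    return fixed_ratio, [f + a for f, a in zip(fixed, adaptive)]
```

For a flat spectrum nearly all bits follow the fixed pattern, whereas for a spectrum dominated by one band most of the budget moves to that band.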
With this method, if the audio signal has a high energy concentration in a specified spectral signal component, as in the case of a sine wave, a large number of bits is allocated to the block containing that spectral component, significantly improving the signal-to-noise ratio as a whole. In general, human hearing is highly sensitive to a signal having sharp spectral signal components, so that, if the signal-to-noise ratio is improved by this method, not only the measured numerical values but also the perceived quality of the audio signal may be improved.
Various other bit allocation methods have been proposed and the perceptual models have become refined, such that, if the encoder is of high ability, a perceptually higher encoding efficiency may be realized.
In these methods, it has been customary to find a real-number reference value of bit allocation with which the signal-to-noise ratio found by calculation is realized as faithfully as possible, and to use an integer approximating this reference value as the allocated number of bits.
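The step from a real-number reference value to an integer number of bits can be sketched as below. Rounding to the nearest integer and clamping to a permitted range are assumptions made for illustration, as are the function name and the default upper limit of 16 bits.

```python
import math

def to_integer_bits(reference_values, max_bits=16):
    """Approximate each real-valued reference by an integer bit count.

    Assumption: round half up, then clamp to the range 0..max_bits.
    """
    result = []
    for r in reference_values:
        nearest = math.floor(r + 0.5)
        result.append(min(max_bits, max(0, nearest)))
    return result
```

Note that independent rounding does not preserve the total: reference values of 4.25, 3.25, 2.25 and 2.25 sum to 12 bits but round to 4, 3, 2 and 2, which sum to 11.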
In
Hudspeth David R.
Smith Andrew V.
Sony Corporation
Zintel Harold