Reexamination Certificate
2000-04-20
2002-12-31
Knepper, David D. (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S220000, C704S221000
active
06502069
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to scalable audio coders and audio decoders and in particular to scalable coders and decoders for which at least one stage operates in the frequency domain.
BACKGROUND OF THE INVENTION AND DESCRIPTION OF PRIOR ART
Scalable audio coders are coders of modular design. An effort is therefore made to use already existing speech coders, which process signals sampled at e.g. 8 kHz and produce data rates of e.g. 4.8 to 8 kilobits per second. These known coders, e.g. G.729, G.723, FS1016, CELP or the parametric models for MPEG-4 Audio, which are known to persons skilled in the art, serve primarily for coding speech signals. They are not generally suitable for coding higher quality music signals, since they are normally designed for signals sampled at 8 kHz and can therefore code an audio bandwidth of 4 kHz at the most. In general, however, they operate at a low sampling rate and deliver good quality for speech signals.
For the audio coding of music signals, e.g. to achieve hi-fi quality or CD quality, a scalable coder therefore combines a speech coder with an audio coder which can code signals with higher sampling rates, e.g. 48 kHz. It is of course also possible to replace the speech coder cited above by another coder, e.g. by a music/audio coder according to the standards MPEG-1, MPEG-2 or MPEG-4.
A chain circuit of this kind comprises a speech coder and a higher quality audio coder. An input signal having a sampling rate of e.g. 48 kHz is converted by means of a downsampling filter to the sampling frequency appropriate for the speech coder. The sampling rate could, however, also be the same in both coders. The converted signal is then coded. The coded signal can be supplied directly to a bit stream formatting device for transmission. However, it only contains signals with a bandwidth of e.g. 4 kHz at the most. The coded signal is also decoded again and converted by means of an upsampling filter. Because of the downsampling filter, however, the signal now obtained only contains useful information with a bandwidth of e.g. 4 kHz. It should also be noted that the spectral content of the converted coded/decoded signal in the lower band up to 4 kHz does not correspond exactly to the first 4 kHz band of the input signal sampled at 48 kHz, since coders in general introduce coding errors.
As has already been mentioned, a scalable coder comprises a generally known speech coder and an audio coder which can process signals with higher sampling rates. To be able to transmit signal components of the input signal which have frequencies above 4 kHz, the difference between the input signal at 48 kHz and the coded/decoded, upward converted output signal of the speech coder is formed for each individual discrete-time sampled value. This difference can then be quantized and coded using a known audio coder, as is known to persons skilled in the art. It should be pointed out here that, apart from coding errors, the difference signal which is fed to the audio coder is essentially zero in the lower frequency range. In the spectral range lying above the bandwidth of the upward converted coded/decoded output signal of the speech coder, the difference signal substantially corresponds to the true input signal at 48 kHz.
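The two-stage structure described above can be illustrated with a minimal sketch. It is not the coder of the invention: the block-averaging downsampler, the sample-and-hold upsampler and the uniform quantizer standing in for the speech coder (toy_core_codec) are all hypothetical simplifications, and the residual is left unquantized, whereas a real second stage would code it with an audio coder. The sketch only shows that the difference is formed sample by sample at the input rate and that the decoder adds it back to the upward converted first-stage output.

```python
import math

FS_IN = 48_000               # input sampling rate used as an example in the text
FS_CORE = 8_000              # sampling rate of the speech coder stage
FACTOR = FS_IN // FS_CORE    # rate conversion factor (6)

def downsample(x, factor):
    """Crude stand-in for a downsampling filter: average each block of samples."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def upsample(y, factor):
    """Crude stand-in for an upsampling filter: hold each sample `factor` times."""
    return [v for v in y for _ in range(factor)]

def toy_core_codec(y, step=0.05):
    """Hypothetical coder/decoder pair: uniform quantization introduces coding error."""
    return [round(v / step) * step for v in y]

def encode(x):
    """First stage on the downsampled signal, residual formed at the input rate."""
    core = toy_core_codec(downsample(x, FACTOR))
    upconverted = upsample(core, FACTOR)
    residual = [a - b for a, b in zip(x, upconverted)]
    return core, residual

def decode(core, residual):
    """Add the residual back to the upward converted first-stage output."""
    return [b + r for b, r in zip(upsample(core, FACTOR), residual)]

# Test input: a 500 Hz component (inside the 4 kHz core band) plus a 10 kHz
# component (outside it), 480 samples at 48 kHz.
x = [math.sin(2 * math.pi * 500 * n / FS_IN)
     + 0.3 * math.sin(2 * math.pi * 10_000 * n / FS_IN) for n in range(480)]

core, residual = encode(x)
y = decode(core, residual)
max_err = max(abs(a - b) for a, b in zip(x, y))
print(len(core), len(residual), max_err)
```

Because the residual is not quantized here, the decoder output matches the input up to floating-point rounding. In an actual scalable coder the quantization of the residual in the second stage determines the final quality.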
In the first stage, i.e. the speech coder stage, a coder with low sampling frequency is therefore generally used, since in general a very low bit rate of the coded signal is aimed at. At the present time a number of coders, including the cited coders, work with bit rates of a few kilobits per second (2 to 8 kilobits per second, or more). Furthermore, these permit a maximum sampling frequency of 8 kHz, since more audio bandwidth is not possible anyway at this low bit rate and coding at a low sampling frequency is more advantageous as regards the computational effort. The maximum possible audio bandwidth is then 4 kHz, and in practice it is restricted to about 3.5 kHz. If a bandwidth improvement is to be achieved in the further stage, i.e. in the stage with the audio coder, this further stage must work with a higher sampling frequency.
The use of the so-called TNS technique in high quality audio coding to further reduce the amount of data has been known for some time (J. Herre, J. D. Johnston, “Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)”, 101st AES Convention, Los Angeles 1996, Preprint 4384). The TNS technique (TNS=Temporal Noise Shaping), generally speaking, permits temporal shaping of the fine structure of the quantization noise by means of a predictive coding of the spectral values. The TNS technique is based on a consistent application of the dualism between the time domain and the frequency domain. In the technical field it is known that when the autocorrelation function of a time signal is transformed into the frequency domain it gives the spectral power density of this very time signal. The dual case hereto results when the autocorrelation function of the spectrum of a signal is formed and transformed into the time domain. The autocorrelation function transformed into or back into the time domain is also called the square of the Hilbert envelope curve of the time signal. The Hilbert envelope curve of a signal is thus connected directly with the autocorrelation function of its spectrum. The squared Hilbert envelope curve of a signal and the spectral power density of the same thus represent dual aspects in the time domain and in the frequency domain. If the Hilbert envelope curve of a signal remains constant for each partial bandpass signal over a range of frequencies, then the autocorrelation between neighbouring spectral values will also be constant. This means in fact that the series of spectral coefficients is stationary versus frequency, so that predictive coding techniques can be used efficiently to represent this signal and this, furthermore, by using a common set of prediction coefficients.
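The predictability of spectral values for a transient signal can be checked numerically with a small sketch. The assumptions are illustrative only: an unnormalized DCT-II stands in for the coder's transform, the frame length of 64, the pulse position and the predictor order of 2 are arbitrary, and the prediction gain is taken from the autocorrelation method (Levinson-Durbin) applied along the frequency axis. A single pulse in time yields a cosine-shaped sequence of spectral coefficients, which a short predictor tracks well. A tone that falls exactly on one transform basis function yields a single non-zero coefficient and offers no gain.

```python
import math

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_len = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_len)
                for n in range(n_len)) for k in range(n_len)]

def autocorr(v, max_lag):
    """Autocorrelation of the sequence v for lags 0..max_lag."""
    return [sum(v[i] * v[i + lag] for i in range(len(v) - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion: returns (error filter coefficients, residual energy)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err

def spectral_prediction_gain(x, order=2):
    """Energy of the spectral coefficients divided by the prediction residual energy."""
    r = autocorr(dct2(x), order)
    _, err = levinson(r, order)
    return r[0] / err

N = 64
# Transient: one pulse at sample 20 (hypothetical position).
transient = [1.0 if n == 20 else 0.0 for n in range(N)]
# Stationary tone aligned with DCT basis function 20 (hypothetical choice).
tone = [math.cos(math.pi * (n + 0.5) * 20 / N) for n in range(N)]

gain_transient = spectral_prediction_gain(transient)
gain_tone = spectral_prediction_gain(tone)
print(gain_transient, gain_tone)
```

A gain well above 1 for the transient and a gain of essentially 1 for the tone is the behaviour the passage describes: prediction over frequency pays off when the temporal envelope is strongly non-flat.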
To clarify the situation, reference is made to FIG. 6A and FIG. 6B.
FIG. 6A shows a short section of a temporally strongly transient “castanet” signal with a duration of about 40 ms. This signal was decomposed into a multiplicity of partial bandpass signals, each partial bandpass signal having a bandwidth of 500 Hz.
FIG. 6B now shows the Hilbert envelope curves for these bandpass signals with centre frequencies ranging from 1500 Hz to 4000 Hz. To make things clearer, all the envelope curves have been normalized to their maximum amplitude. Clearly the shapes of all the single envelope curves are very similar to one another, which is why a common predictor can be used within this frequency range to code the signal efficiently. Similar observations can be made for speech signals in which the effect of the glottal excitation pulses is present over the whole frequency range because of the nature of the human speech generation mechanism.
FIG. 6B thus shows that the correlation of neighbouring values e.g. at a frequency of 2000 Hz is similar to that at a frequency of e.g. 3000 Hz or 1000 Hz.
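How such normalized Hilbert envelope curves can be obtained is sketched below. This is an illustration under stated assumptions, not the procedure used to produce the figures: the envelope is taken as the magnitude of the analytic signal, which is built by suppressing the negative-frequency bins of a DFT. The two test signals are synthetic bandpass signals with hypothetical carrier bins 12 and 30 that share one raised-cosine temporal envelope but differ in level.

```python
import cmath
import math

def dft(x):
    """Direct DFT, adequate for short illustrative frames."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len)
                for n in range(n_len)) for k in range(n_len)]

def idft(spec):
    """Direct inverse DFT."""
    n_len = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_len)
                for k in range(n_len)) / n_len for n in range(n_len)]

def hilbert_envelope(x):
    """Magnitude of the analytic signal; the frame length is assumed to be even."""
    n_len = len(x)
    half = n_len // 2
    # Keep DC and Nyquist, double positive frequencies, drop negative frequencies.
    weights = [1.0] + [2.0] * (half - 1) + [1.0] + [0.0] * (half - 1)
    analytic = idft([w * s for w, s in zip(weights, dft(x))])
    return [abs(z) for z in analytic]

def normalize(env):
    """Scale an envelope to a maximum amplitude of 1."""
    peak = max(env)
    return [v / peak for v in env]

N = 128
# Shared temporal envelope: one raised-cosine pulse over the frame.
shape = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
# Two bandpass signals at different (hypothetical) carrier bins and levels.
band_a = [1.0 * shape[n] * math.cos(2 * math.pi * 12 * n / N) for n in range(N)]
band_b = [0.3 * shape[n] * math.cos(2 * math.pi * 30 * n / N) for n in range(N)]

env_a = normalize(hilbert_envelope(band_a))
env_b = normalize(hilbert_envelope(band_b))
max_diff = max(abs(a - b) for a, b in zip(env_a, env_b))
print(max_diff)
```

After normalization the two envelopes coincide up to rounding, which is the idealized version of the similarity between bands that motivates a common predictor.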
Alternatively, the property of spectral predictability of transient signals can be understood by considering the table shown in FIG. 5. At the top left of the table a continuous time signal u(t) is shown in the form of a sine wave. Next to this is the spectrum U(f) of this signal, consisting of a single Dirac pulse. The optimal coding of this signal consists in the coding of spectral data or spectral values since, for the complete time signal, only the magnitude and the phase of the Fourier coefficient have to be transmitted here in order to be able to reconstruct the time signal completely. A coding of spectral data corresponds at the same time to a prediction in the time domain. A predictive coding would thus have to take place here in the time domain. The sinusoidal time signal thus has a flat temporal envelope curve, which corresponds to a maximally non-flat envelope curve in the frequency domain.
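The sine-wave case can be reproduced in discrete form with a short sketch. The frame length, the number of cycles, the amplitude and the phase are hypothetical values chosen so that the sinusoid fits a whole number of periods into the frame; under that assumption exactly one coefficient of the one-sided DFT is non-zero, and its magnitude and phase suffice to rebuild the frame.

```python
import cmath
import math

N = 64                   # hypothetical frame length
K0 = 5                   # hypothetical number of whole cycles per frame
AMP, PHASE = 0.8, 0.6    # hypothetical amplitude and phase (radians)

# Sampled sinusoid u[n], standing in for the continuous signal u(t).
u = [AMP * math.cos(2 * math.pi * K0 * n / N + PHASE) for n in range(N)]

def dft_bin(x, k):
    """Single DFT coefficient of x at bin k."""
    n_len = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len)
               for n in range(n_len))

# One-sided spectrum (bins 0..N/2) and the bins that carry any energy.
half_spectrum = [dft_bin(u, k) for k in range(N // 2 + 1)]
significant = [k for k, c in enumerate(half_spectrum) if abs(c) > 1e-9]

# "Transmit" only the bin index, the magnitude and the phase of that coefficient.
k = significant[0]
magnitude = 2.0 * abs(half_spectrum[k]) / N
phase = cmath.phase(half_spectrum[k])

# Rebuild the complete frame from those values.
rebuilt = [magnitude * math.cos(2 * math.pi * k * n / N + phase) for n in range(N)]
max_err = max(abs(a - b) for a, b in zip(u, rebuilt))
print(significant, magnitude, phase, max_err)
```

A sinusoid that does not fit a whole number of periods into the frame would spread over several bins, so the single-coefficient result holds only under the alignment assumed here.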
The opposite case will now be considered in
Brandenburg Karlheinz
Gerhauser Heinz
Grill Bernhard
Herre Jürgen
Teichmann Bodo
Abebe Daniel
Beyer Weaver & Thomas
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung
Knepper David D.