Method and device for detecting a transient in a...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Psychoacoustic

Reexamination Certificate

Details

C704S201000, C704S211000, C704S213000

active

06453282

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to the coding of audio signals and in particular to the coding of audio signals which exhibit transients (or “attacks”).
BACKGROUND OF THE INVENTION AND DESCRIPTION OF PRIOR ART
In hearing-adjusted coding for the data reduction of audio signals the coding of the audio signals usually takes place in the frequency domain. This means that output values of a time-frequency transform are quantized and are then written into a bit stream, which can be stored or transmitted. A psychoacoustic model, which is implemented in the coder, calculates an instantaneous masked hearing or masking threshold and controls the quantization of the output values of the time-frequency transform in such a way that the coding error, i.e. the quantization error, is spectrally shaped and lies below this threshold so that the error is inaudible. As a result of this measure, however, the coding error is constant in time over the number of sampled values corresponding to the length of the transform window. The masked hearing or masking threshold is described in M. Zollner, E. Zwicker, Elektroakustik, Springer-Verlag, Berlin, Heidelberg, N. Y., 3rd edn, 1993.
To enable the calculation of the masked hearing threshold in the frequency domain to be performed as exactly as possible, a high frequency resolution of the time-frequency transform is necessary. In practical applications, transform lengths typically lie in the range from 20 to 40 ms.
If transient audio signals, i.e. audio signals with transients, are processed, the quantization noise may distribute itself “before” the maximum of the signal envelope curve, depending on the temporal position of the transient in the transform window. The nature of human perception is such that these so-called “pre-echos” can become audible if they occur more than 2 ms before the actual transient of the audio signal to be coded. This is the reason why, in many transform coders, the transform length of the time-frequency transform can be switched over to shorter windows, i.e. shorter block lengths, having a time length of typically 5 to 8 ms and consequently a higher time resolution. This enables a finer temporal shaping of the quantization noise and thus a suppression of these pre-echos, whereby these are no longer, or only very slightly, audible when the coded signal is decoded again in a decoder.
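The pre-echo effect described above can be reproduced in a small experiment. The following is a minimal sketch, not the coder's actual transform or quantizer: it assumes a plain DFT instead of the coder's time-frequency transform, a uniform scalar quantizer with a hypothetical step size, and a synthetic decaying sinusoid as the transient. All names (`make_transient_block`, `pre_attack_error_energy`, `N`, `ATTACK`, `STEP`) are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each time sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def quantize(spec, step):
    """Uniform quantization of real and imaginary parts (assumed quantizer)."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step)
            for c in spec]


def make_transient_block(n, attack):
    """Silence up to `attack`, then a decaying sinusoid (synthetic transient)."""
    return [0.0 if t < attack
            else 0.9 ** (t - attack) * math.sin(1.3 * (t - attack) + 0.5)
            for t in range(n)]


def pre_attack_error_energy(block, attack, step):
    """Energy of the coding error in the samples before the attack."""
    decoded = idft(quantize(dft(block), step))
    return sum((decoded[t] - block[t]) ** 2 for t in range(attack))


# Hypothetical parameters: block length, attack position, quantizer step.
N, ATTACK, STEP = 32, 16, 0.25
block = make_transient_block(N, ATTACK)
pre_echo = pre_attack_error_energy(block, ATTACK, STEP)
print("error energy before the attack:", pre_echo)
```

Although the input is exactly zero before the attack, the decoded block carries error energy there, because quantization in the frequency domain spreads the error over the whole block. A shorter block confines this spread to a shorter time span.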
Devices for detecting a transient in an audio signal are thus used to match the transform length of the time-frequency transform to the properties, and in particular to the transient properties, of the audio signal as required by the human ear.
FIG. 3 shows a known transform coder 100, which is in general implemented according to the Standard MPEG 1-2 Layer 3 (ISO/IEC IS 11172-3, Coding of Moving Pictures and Associated Audio, Part 3: Audio). A time signal arrives via an input 102 at a block Time/frequency transform 104. The time signal at input 102, which is typically a discrete-time audio signal obtained from a continuous-time time signal by means of a sampling device (not shown), is transformed by the block Time/frequency transform 104 into consecutive blocks of spectral values, which are passed to a block Quantization/coding 106, the output signal of the block Quantization/coding consisting of quantized and redundancy-coded digital signals which, in a block Bit stream formatting 108, are, together with necessary side information, formed into a bit stream, which appears at the output of the bit stream formatter 108 and which can be stored or transmitted.
The discrete-time audio signals at the input 102 are windowed in the block Time/frequency transform 104 so as to generate consecutive blocks with discrete-time windowed audio signals. The blocks of windowed discrete-time audio signals are subsequently, as already mentioned, transformed into the frequency domain. As is known from the field of telecommunications, the frequency resolution of the time-frequency transform is determined by the length of a block. To achieve sufficient time resolution for discrete-time audio signals with transient parts, the window length and thus the time length of a block of discrete-time sampled values must be shortened when coding these signals in order to avoid the pre-echos.
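The trade-off between block length and frequency resolution can be illustrated with a rough calculation. The sketch below assumes that the frequency resolution is approximately the reciprocal of the block duration (a common rule of thumb, not a figure from the text) and a sampling rate of 48 kHz (also an assumption). The block durations are the 20 to 40 ms long windows and 5 to 8 ms short windows mentioned above; the function names are hypothetical.

```python
def frequency_resolution_hz(block_ms):
    """Approximate frequency resolution as the reciprocal of block duration."""
    return 1000.0 / block_ms


def block_samples(block_ms, sample_rate_hz):
    """Number of sampled values in a block of the given duration."""
    return round(block_ms * sample_rate_hz / 1000.0)


SAMPLE_RATE_HZ = 48000  # assumed sampling rate, not given in the text

for block_ms in (40, 20, 8, 5):
    print(block_ms, "ms:",
          block_samples(block_ms, SAMPLE_RATE_HZ), "samples,",
          frequency_resolution_hz(block_ms), "Hz resolution")
```

Under these assumptions a 40 ms block resolves about 25 Hz, while a 5 ms block resolves only about 200 Hz, which is why short blocks are used only where transients demand the finer time resolution.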
The known coder shown in FIG. 3 performs the following method for detecting transients in an audio signal. From the block Time/frequency transform 104 the spectral components are fed into a block Psychoacoustic model 110, the block 110 establishing on the one hand, as already mentioned at the outset, the masking or masked hearing threshold for the block Quantization/coding 106 and, on the other, from the signal energy characteristic of the discrete-time audio signal in the frequency domain and the calculated energy characteristic of the masked hearing threshold, an estimated value for the bit demand for coding the spectrum. The estimated bit demand, which experts also refer to as “perceptual entropy” (“pe” for short), is calculated from the following relationship:
$$\mathrm{pe} = \sum_{k=1}^{N} \frac{1}{2}\,\log_2\!\left(\frac{e(k)}{n(k)} + 1\right) \qquad (1)$$
In equation (1) N is the number of spectral lines of a block, e(k) is the signal energy of the spectral components or spectral lines k and n(k) is the permitted interference energy of the line k. A rise in this perceptual entropy from one transform window to the next which exceeds a certain threshold value, designated as “switch_pe”, serves here to indicate a transient. If the threshold value switch_pe is exceeded, a switch over from a long window to a short window is effected in the block 104 so as to generate temporally shorter blocks of discrete-time audio signals in order to increase the time resolution of the transform coder 100. The calculation rule depicted in equation (1) and the specification of the threshold value switch_pe are stipulated in a block Bit demand estimation 112. The result of the bit demand estimation 112 is communicated to the time/frequency transform 104 and to the psychoacoustic model 110, as is indicated in FIG. 3.
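The perceptual entropy of equation (1) and the threshold test on its rise can be sketched as follows. This is a minimal illustration, assuming that e(k) and n(k) are already available as lists of positive energies per spectral line. The function names and the numeric value used for switch_pe are hypothetical; the text does not specify the threshold.

```python
import math


def perceptual_entropy(e, n):
    """Equation (1): pe = sum over k of 0.5 * log2(e(k) / n(k) + 1)."""
    if len(e) != len(n):
        raise ValueError("e and n must have one entry per spectral line")
    return sum(0.5 * math.log2(ek / nk + 1.0) for ek, nk in zip(e, n))


def indicates_transient(pe_previous, pe_current, switch_pe):
    """True if the rise in pe from one window to the next exceeds switch_pe."""
    return (pe_current - pe_previous) > switch_pe


# Toy blocks with N = 4 spectral lines (illustrative values only).
permitted = [1.0, 1.0, 1.0, 1.0]      # n(k), permitted interference energy
quiet_block = [3.0, 3.0, 3.0, 3.0]    # e(k) before the attack
attack_block = [15.0, 15.0, 15.0, 15.0]  # e(k) in the block with the attack

pe_quiet = perceptual_entropy(quiet_block, permitted)
pe_attack = perceptual_entropy(attack_block, permitted)

SWITCH_PE = 3.0  # hypothetical threshold value
use_short_window = indicates_transient(pe_quiet, pe_attack, SWITCH_PE)
```

In this toy case each line of the quiet block contributes 0.5 · log2(4) = 1 and each line of the attack block contributes 0.5 · log2(16) = 2, so pe rises from 4 to 8 and the assumed threshold of 3 is exceeded. Because n(k) comes from the psychoacoustic model, this decision is only available after that model has run.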
A disadvantage of this known method is that the information on a possible transient or “attack” is not available until after the psychoacoustic model has been calculated. This has a particularly adverse effect on the temporal sequence structure of the coder, since the window information has to be fed back to the psychoacoustic model. Furthermore, changes in the parameters for calculating the masked hearing threshold always affect the value of the perceptual entropy. Changes in these parameters thus always entail changes in the window sequence, i.e. the sequence of long and short windows, of the transform.
FIG. 4 shows another known transform coder 150, which is essentially similar in design to the transform coder 100. In particular the same also has the input 102 for discrete-time audio signals, which are windowed and transformed into the frequency domain in the block 104. Taking account of the psychoacoustic model 110, the spectral output values of the block 104 are quantized and then coded in the block 106 and are written, together with side information, into an output bit stream by the bit stream formatter 108.
The transform coder 150 shown in FIG. 4 differs from the transform coder 100 shown in FIG. 3 in the detection of transients in the audio signal. The detection of transients in the audio signal at input 102 which is shown in FIG. 4 is described in the standard MPEG 2 AAC (see ISO/IEC IS 13818-7, Annex B, 2.1, MPEG-2 Advanced Audio Coding (AAC)). The block FFT transform and detection from the spectrum 152 performs detection of transients by means of a spectral energy rise. In particular, the discrete-time audio signal at input 102 is first transformed into the frequency domain by means of an FFT transform, the length of the FFT transform corresponding here to the transform length of the short windows. Then the FFT energies in the so-called “critical bands” are calculated. The “critical bands”
