Block size determination and adaptation method for audio...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Psychoacoustic

Reexamination Certificate

Details

C704S230000, C704S504000, C704S229000

active

06424936

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the efficient information coding of digital audio signals for transmission or digital storage media.
2. Description of the Related Art
Audio compression algorithms using various frequency transforms, such as subband coding, adaptive transform coding or their hybrids, have been developed and used in a variety of commercial applications. Examples of adaptive transform coders include those reported by K. Brandenburg et al. in “Aspec: Adaptive spectral entropy coding of high quality music signals”, 90th AES Convention, February 1991, and by M. Iwadare et al. in “A 128 kb/s Hi-Fi Audio Codec based on Adaptive Transform Coding with Adaptive Block Size MDCT”, IEEE Journal on Selected Areas in Communications, Vol. 10, No. 1, January 1992. Examples of algorithms using hybrid subband and adaptive transform coding include the ISO/IEC 11172-3 Layer 3 algorithm and the ATRAC compression algorithm used in the Mini-Disc system. Details of these algorithms can be found in the “Information Technology - Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s Part 3: Audio (ISO/IEC 11172-3:1993)” document and in chapter 10 of the MD system description document by Sony in September 1992, respectively. The transform filter bank used by these algorithms is typically based on the Modified Discrete Cosine Transform (MDCT), first proposed by Princen and Bradley in “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation”, Proceedings of the ICASSP 1987, pp 2161-2164.
In a typical transform encoder, as shown in FIG. 5, the input audio samples are first buffered in frames by buffer 51 and, at the same time, passed to a block-size selector 52, which determines the suitable block size or window before the audio samples are windowed and transformed by window and transform unit 53. In a hybrid subband and transform coder such as the ATRAC algorithm, the input audio samples, sampled at 44.1 kHz (i.e., 44100 samples generated per second), are subjected to hybrid subband and transform coding. The hybrid subband-transform front end of the ATRAC encoding process is shown in FIG. 6. The input audio samples are first subband filtered into two bands of equal bandwidth using quadrature mirror filter 61, and the resultant lower frequency band is further subdivided into two bands of equal bandwidth by another quadrature mirror filter 62. Here, L, M and H denote the low band, middle band and high band, respectively. A time delay 63 is used to time-align the signal in the high-frequency band with those of the lower frequency bands. The subband samples are then separately passed to the block size selector 64 to determine suitable block sizes for the windowing and modified discrete cosine transform processes in blocks 65, 66 and 67. One of the two block sizes, or modes, is selected for each of the frequency bands. The transformed samples are then grouped into units and, within each unit, a scale factor equal to or just exceeding the maximum amplitude of the unit samples is selected. The transformed samples are then quantized using the determined scale factors and the bit allocation information derived from the dynamic bit allocation unit 68.
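The scale factor selection and quantization step described above can be illustrated with a minimal Python sketch. The scale factor table, the uniform quantizer and the names `SCALE_FACTORS`, `select_scale_factor`, `quantize_unit` and `dequantize_unit` are assumptions made for illustration only; they are not the tables or the quantizer of the ATRAC algorithm or of any other codec named in this description.

```python
import bisect

# Hypothetical ascending table of 64 scale factors, about 2 dB apart.
# This is an illustrative assumption, not the table of any real codec.
SCALE_FACTORS = [2.0 ** (i / 3.0 - 20.0) for i in range(64)]


def select_scale_factor(unit):
    """Index of the smallest table entry equal to or just exceeding the unit peak."""
    peak = max((abs(x) for x in unit), default=0.0)
    idx = bisect.bisect_left(SCALE_FACTORS, peak)
    # If the peak is above the largest entry, fall back to the largest entry.
    return min(idx, len(SCALE_FACTORS) - 1)


def quantize_unit(unit, bits):
    """Normalize a unit by its scale factor and quantize to signed integer codes."""
    idx = select_scale_factor(unit)
    sf = SCALE_FACTORS[idx]
    max_code = (1 << (bits - 1)) - 1
    codes = []
    for x in unit:
        code = round(x / sf * max_code)
        # Clamp in case the peak exceeded the largest table entry.
        codes.append(max(-max_code, min(max_code, code)))
    return idx, codes


def dequantize_unit(idx, codes, bits):
    """Reconstruct approximate sample values from the codes and scale factor index."""
    max_code = (1 << (bits - 1)) - 1
    sf = SCALE_FACTORS[idx]
    return [c / max_code * sf for c in codes]


if __name__ == "__main__":
    idx, codes = quantize_unit([0.4, -1.0, 0.2], bits=4)
    print(idx, codes, dequantize_unit(idx, codes, bits=4))
```

In this sketch the number of bits per unit is passed in directly; in the encoder described above it would come from the dynamic bit allocation unit.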
It is known that, in transform coding, a pre-echo, that is, a noise or ringing effect in the silent period before a sudden increase of signal magnitude (an attack), can occur, particularly if the transform coding block size for the audio frame containing the attack is long. The modified discrete cosine transform with adaptive block sizes is typically used to reduce the pre-echo as well as the noise at block boundaries. The block sizes available for the transform coding must in the first place be selected such that, if a signal attack is detected, a short block transform can be used to process the attack signal without giving rise to ringing or noise in the adjacent blocks. When the size of the short block is made small enough, the pre-echo will not be audible. An important issue is the accurate detection of the attack signal itself.
The block size decision method outlined in the MD system description of September 1992 is shown in FIG. 7. The peak detection step 71 identifies the peak value within each 32-sample block. The adjacent peak values are then compared in step 72. In the decision step 73, where the difference exceeds 18 dB, mode 1, the short block mode, is selected in step 74. Otherwise, mode 3 or mode 4, the long block mode for the different frequency bands, is selected in step 75.
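The decision flow of FIG. 7 can be sketched in Python as follows. This is only an illustration of the steps as summarized above, not the reference implementation from the MD system description. The names `block_peaks` and `prior_art_block_mode`, the amplitude floor used to avoid taking the logarithm of zero, and the choice to test only a rise from an earlier block to a later one are assumptions; the text above states only that the difference between adjacent peaks is compared with 18 dB.

```python
import math


def block_peaks(samples, block_len=32):
    """Step 71: peak magnitude within each complete block of block_len samples."""
    return [
        max(abs(s) for s in samples[i:i + block_len])
        for i in range(0, len(samples) - block_len + 1, block_len)
    ]


def prior_art_block_mode(samples, threshold_db=18.0, floor=1e-5):
    """Steps 72 to 75: compare adjacent peaks and pick the block mode.

    floor is a hypothetical amplitude floor that keeps the ratio finite
    when a block is digitally silent.
    """
    peaks = block_peaks(samples)
    for prev, cur in zip(peaks, peaks[1:]):
        # Assumption: only a rise in level (an attack) triggers short blocks.
        rise_db = 20.0 * math.log10(max(cur, floor) / max(prev, floor))
        if rise_db > threshold_db:
            return "short"
    return "long"


if __name__ == "__main__":
    attack = [0.01] * 64 + [1.0] * 64   # 40 dB jump between adjacent blocks
    steady = [0.5] * 128                # no level change
    print(prior_art_block_mode(attack), prior_art_block_mode(steady))
```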
A highly effective audio signal classification and block size determination method is needed for very good reduction of pre-echo during adaptive transform or hybrid subband-transform coding, so as to render the pre-echo totally inaudible. While it is recognised that the actual block sizes used for the transform are in themselves an important factor, the accurate detection of signal attacks, and particularly the critical ones, is very significant. Generally, it is desired to use long blocks for transform coding of the audio signals, as the better frequency resolution obtained gives rise to more accurate redundancy and irrelevancy removal of the audio signal components. This is especially true for segments of the audio signal whose characteristics vary slowly. Short blocks are to be used only when identified to be absolutely necessary, for the critical attack signals. The block size decision method provided in the prior art does not give good results in transient or attack signal detection accuracy. It can fail to detect an attack signal which occurs within a time interval of premasking duration. Premasking is the condition whereby a fast buildup of loud sound, or attack, has a masking effect on the sound preceding the attack. Failure of such detection can sometimes give rise to undesirable audible effects. While single-tone masker experiments have demonstrated premasking durations lasting between 5 ms and 20 ms, empirically, pre-echo of shorter duration has been audible. The effective premasking duration is expected to be in the region of less than 5 ms. The postmasking effect, the lingering masking effect after the occurrence of a masker, typically spans 20 ms or more. Where the long block frame size is typically less than 20 ms, the release of a peak signal is normally regarded as having insignificant effect. For very high accuracy block size determination, the postmasking effect could be taken into account.
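To relate the masking durations quoted above to sample counts, the following Python sketch converts durations at the 44.1 kHz sampling rate mentioned earlier and checks whether the stretch of a block that precedes an attack is longer than an assumed premasking window. The function names are hypothetical, and the check is a deliberate simplification: it assumes that coding noise spreads over the whole pre-attack portion of the block and ignores window overlap and signal level.

```python
def ms_to_samples(duration_ms, sample_rate_hz=44100):
    """Number of samples (rounded) spanned by a duration in milliseconds."""
    return round(duration_ms * sample_rate_hz / 1000)


def pre_echo_may_be_audible(attack_offset, premask_ms, sample_rate_hz=44100):
    """True if the pre-attack part of a block outlasts the premasking window.

    attack_offset is the number of samples between the start of the
    transform block and the onset of the attack.
    """
    pre_attack_ms = 1000.0 * attack_offset / sample_rate_hz
    return pre_attack_ms > premask_ms


if __name__ == "__main__":
    print(ms_to_samples(3), ms_to_samples(20))
    # About 9 ms of signal before the attack, against a 5 ms premasking window.
    print(pre_echo_may_be_audible(400, premask_ms=5.0))
    # Under 1 ms before the attack, well inside the window.
    print(pre_echo_may_be_audible(40, premask_ms=5.0))
```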
SUMMARY OF THE INVENTION
This invention is based on the need for a high-accuracy block size decision scheme and takes into account temporal masking considerations, both the premasking and postmasking effects. The invention can operate on full-bandwidth audio signals or on limited-bandwidth signals, for example, after subband filtering into frequency bands. This invention has the means of grouping audio samples in a current considered frame into subframes of equal time interval of approximately 3 ms, in consideration of the empirical premasking duration, except for the final subframe, which is of half the time interval; the said current considered frame, together with the whole or half of the final subframe of the previous considered frame and, optionally, a half subframe from the future frame, constitutes the extended frame, which is used for peak value estimation; means to identify the said peak values within the said subframes; means to compute the differences between said peak values of adjacent time intervals; means to, optionally, compute the differences between said peak values separated by a subframe time interval; and means to decide whether the long block size or the short block size should be used after comparing the said differences with a predetermined threshold. An alternative method comprises the means of grouping samples in current frame together with the whole or half of the final subframe of the previous considered
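The subframe peak comparison outlined in this summary can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the claimed method itself. The threshold values are hypothetical, since this summary refers only to a predetermined threshold. The names `subframe_peaks_db` and `choose_block_size` and the amplitude floor are likewise assumptions, only a rise in level is treated as an attack, and the optional half subframe from the future frame is left out.

```python
import math


def subframe_peaks_db(prev_tail, frame, sub_len, floor=1e-5):
    """Peak level in dB of the previous-frame tail and of each subframe.

    prev_tail holds the whole or half of the final subframe of the previous
    frame. The current frame is cut into subframes of sub_len samples; the
    final subframe is whatever remains (half a subframe in the summary above).
    """
    chunks = [list(prev_tail)]
    chunks += [frame[i:i + sub_len] for i in range(0, len(frame), sub_len)]
    peaks = [max(abs(x) for x in c) for c in chunks if c]
    return [20.0 * math.log10(max(p, floor)) for p in peaks]


def choose_block_size(prev_tail, frame, sample_rate_hz=44100, sub_ms=3.0,
                      adj_threshold_db=18.0, skip_threshold_db=None):
    """Return "short" if a peak rise exceeds a threshold, else "long".

    adj_threshold_db and skip_threshold_db are hypothetical values.
    """
    sub_len = round(sub_ms * sample_rate_hz / 1000)
    levels = subframe_peaks_db(prev_tail, frame, sub_len)
    # Differences between peaks of adjacent subframes.
    for a, b in zip(levels, levels[1:]):
        if b - a > adj_threshold_db:
            return "short"
    # Optional: differences between peaks separated by one subframe.
    if skip_threshold_db is not None:
        for a, b in zip(levels, levels[2:]):
            if b - a > skip_threshold_db:
                return "short"
    return "long"


if __name__ == "__main__":
    tail = [0.01] * 66
    ramp = [0.01] * 132 + [0.05] * 132 + [0.25] * 198
    print(choose_block_size(tail, ramp))
    print(choose_block_size(tail, ramp, skip_threshold_db=24.0))
```

The usage example shows why the optional comparison across one subframe can matter: a level that climbs in two moderate steps passes the adjacent test but is caught by the wider comparison.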
