Reexamination Certificate
1999-05-28
2001-10-23
Korzuch, William (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Psychoacoustic
C704S229000, C704S501000
active
06308150
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a dynamic bit allocation apparatus and method for audio coding, and in particular, to a dynamic bit allocation apparatus and method for audio coding for encoding digital audio signals so as to generate efficient information data in order to transmit digital audio signals via a digital transmission line or to store digital audio signals in a digital storage medium or recording medium.
DESCRIPTION OF THE PRIOR ART
Following the recent advent of digital audio compression algorithms, some of those algorithms have been applied in consumer applications. A typical example is the ATRAC algorithm used in Mini-Disc products. This algorithm is described in Chapter 10 of the Mini-Disc system description Rainbow Book, issued by Sony in September 1992. The ATRAC algorithm belongs to a class of hybrid coding schemes that use both subband and transform coding.
FIG. 21 is a block diagram showing a configuration of an ATRAC encoder 100a equipped with a dynamic bit allocation module 109a for performing a dynamic bit allocation process according to the prior art.
Referring to FIG. 21, an incoming analog audio signal is, first of all, converted from analog to digital form by an A/D converter 112 with a specified sampling frequency so as to be segmented into frames each having 512 audio samples (audio sample data). Each frame of the audio samples is then inputted to a QMF analysis filter module 111 which performs two-level QMF analysis filtering. The QMF analysis filter module 111 comprises a QMF filter 101, a delayer 102 and a QMF filter 103. The QMF filter 101 splits an audio signal having 512 audio samples into two subband (high band and middle/low band) signals each having an equal number (256) of audio samples, and the middle/low subband signal is further split by the QMF filter 103 into two subband (middle band and low band) signals having another equal number (128) of audio samples. The high subband signal is delayed by a delayer 102 by a time required for the process of the QMF filter 103, so that the high subband signal is synchronized with the middle subband signal and the low subband signal in the subband signals of individual frequency bands outputted from the QMF analysis filter module 111.
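The two-level band split described above can be illustrated with a minimal Python sketch. It is not the ATRAC filter bank: the actual QMF coefficients are not given in this text, so a 2-tap Haar pair is substituted purely to show how a 512-sample frame becomes subbands of 256, 128 and 128 samples. The function names are hypothetical, and the delay compensation performed by the delayer 102 is not modelled.

```python
import math

def qmf_split(x):
    """Split x into (low, high) half-rate subbands.

    Assumption: a 2-tap Haar filter pair stands in for the real QMF
    prototype filter, whose coefficients are not given in the text.
    """
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    high = [s * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return low, high

def two_level_analysis(frame):
    """First split: high vs. middle/low; second split: middle vs. low."""
    mid_low, high = qmf_split(frame)   # 512 -> 256 + 256
    low, mid = qmf_split(mid_low)      # 256 -> 128 + 128
    return low, mid, high

# One frame of 512 samples of a low-frequency test tone.
frame = [math.sin(2.0 * math.pi * 5.0 * n / 512.0) for n in range(512)]
low, mid, high = two_level_analysis(frame)
band_sizes = (len(low), len(mid), len(high))
print(band_sizes)
```

Because the high band is split only once, it keeps twice as many samples as the middle and low bands, which matches the 256/128/128 sample counts stated in the description.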
Subsequently, a block size determination module 104 determines individual block size modes of MDCT (Modified Discrete Cosine Transform) modules 105, 106 and 107 to be used for the three subband signals, respectively. The block size mode is fixed at either a long block having a specified longer time interval or a short block having a specified shorter time interval. When an attack signal having an abruptly high level of spectral amplitude value is detected, the short block mode is selected. All the MDCT spectral lines are grouped into 52 frequency division bands. Hereinafter, frequency division bands will be referred to as units. The grouping is done so that each lower-frequency unit has a smaller number of spectral lines than each higher-frequency unit.
This grouping of units is performed based on a critical band. The term “critical band” or “critical bandwidth” refers to a band, nonuniform on the frequency axis, that is used by the human auditory sense in the processing of noise. The critical bandwidth broadens with increasing frequency; for example, it is 100 Hz at 150 Hz, 160 Hz at 1 kHz, 700 Hz at 4 kHz, and 2.5 kHz at 10.5 kHz.
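The growth of the critical bandwidth with frequency can be checked numerically. The sketch below uses the Zwicker and Terhardt analytical approximation, which is not part of this text and is introduced here only as an assumption for illustration; the function name is hypothetical.

```python
def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth in Hz at a centre frequency in Hz.

    Assumption: Zwicker-Terhardt approximation,
    CB = 25 + 75 * (1 + 1.4 * f^2) ** 0.69, with f in kHz.
    """
    f_khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

# Centre frequencies quoted in the text.
for f in (150, 1000, 4000, 10500):
    print(f, "Hz ->", round(critical_bandwidth_hz(f)), "Hz")
```

Under this approximation the computed bandwidths fall close to the 100 Hz, 160 Hz, 700 Hz and 2.5 kHz figures quoted above, which is why lower-frequency units can be given fewer spectral lines than higher-frequency units.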
A scale factor SF[n] showing a level of each unit is computed in a scale factor module 108 by selecting in a specified table the smallest value from among values that are larger than the maximum amplitude spectral line in the unit. In a dynamic bit allocation module 109a, a word length WL[n], which is the number of bits allocated to quantize each spectral sample of a unit, is determined. Finally, the spectral samples of the units are quantized in a quantization module 110 with the use of side information comprising scale factor SF[n] and word length WL[n] of bit allocation data, and then audio spectral data ASD[n] is outputted.
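The scale factor selection and the quantization step can be sketched as follows. The table contents, the function names and the symmetric uniform quantizer are all assumptions made for illustration; the text only states that the smallest table value larger than the unit's peak amplitude is chosen, and that WL[n] bits are used per spectral sample.

```python
import bisect

# Hypothetical ascending scale-factor table (the real table is not given).
SF_TABLE = [2.0 ** (k / 3.0 - 5.0) for k in range(64)]

def select_scale_factor(unit):
    """Return (index, value) of the smallest table entry above the unit's peak."""
    peak = max(abs(s) for s in unit)
    # bisect_right yields the first entry strictly greater than peak.
    idx = bisect.bisect_right(SF_TABLE, peak)
    idx = min(idx, len(SF_TABLE) - 1)  # clamp if the peak exceeds the table
    return idx, SF_TABLE[idx]

def quantize_unit(unit, scale, word_length):
    """Quantize each sample to word_length bits (assumed symmetric, uniform)."""
    if word_length < 2:
        return [0] * len(unit)
    max_code = (1 << (word_length - 1)) - 1
    return [round(s / scale * max_code) for s in unit]

unit = [0.10, -0.42, 0.05, 0.31]   # spectral samples of one unit
sf_index, sf_value = select_scale_factor(unit)
codes = quantize_unit(unit, sf_value, word_length=4)
print(sf_index, sf_value, codes)
```

Normalizing by a scale factor just above the peak keeps every quantized code within the range representable by the word length, so only the index of SF[n] and WL[n] need to be sent as side information.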
The dynamic bit allocation module 109a plays an important role in determining the sound quality of the coded audio signal as well as the implementation complexity. Some of the existing methods make use of the variance of spectral level of the unit to perform the bit allocation. In the bit allocation process, the unit with the highest variance is first searched for, and one bit is then allocated to that unit. The variance of spectral level of this unit is then reduced by a certain factor. This process is repeated until all the bits available for bit allocation are exhausted. This method is highly iterative and consumes a lot of computational power. Moreover, because it does not exploit the psychoacoustic masking phenomenon, it is difficult for this method to achieve good sound quality. Other methods such as the ones used in the ISO/IEC 11172-3 MPEG Audio Standard use a very complicated psychoacoustic model and also an iterative bit allocation process.
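The iterative variance-based allocation described above can be sketched as a greedy loop. This is only an illustration of the prior-art idea: the reduction factor, the cap on word length, the function name, and the simplification that one allocation step costs one bit regardless of unit size are all assumptions not stated in the text.

```python
import heapq

def greedy_bit_allocation(variances, total_bits, reduction=0.25, max_bits=16):
    """Give one bit at a time to the unit with the highest remaining variance.

    Assumptions: `reduction` (variance scaling per allocated bit) and
    `max_bits` are hypothetical parameters chosen for illustration.
    """
    word_length = [0] * len(variances)
    # Max-heap emulated with negated variances; ties resolve by unit index.
    heap = [(-v, n) for n, v in enumerate(variances)]
    heapq.heapify(heap)
    remaining = total_bits
    while remaining > 0 and heap:
        neg_v, n = heapq.heappop(heap)      # unit with the highest variance
        word_length[n] += 1                 # allocate one bit to it
        remaining -= 1
        if word_length[n] < max_bits:
            heapq.heappush(heap, (neg_v * reduction, n))  # reduce its variance
    return word_length

wl = greedy_bit_allocation([16.0, 4.0, 1.0, 0.25], total_bits=10)
print(wl)
```

Each allocated bit requires a search for the current maximum, so the number of iterations grows with the bit budget, which illustrates the computational cost the text attributes to this class of methods.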
It is well known to those skilled in the art that established digital audio compression systems such as MPEG1 Audio Standards make use of a psychoacoustics model of the human auditory system to estimate an absolute threshold of masking effect, by which quantization noise is made inaudible when the quantization noise is kept below the absolute threshold. Although two psychoacoustics models proposed by MPEG1 Audio Standards do achieve a good sound quality, those models are far too complicated to implement in low-cost LSIs for consumer applications. This gives rise to a need for a simplified masking threshold computation.
SUMMARY OF THE INVENTION
An essential object of the present invention is therefore to provide a dynamic bit allocation apparatus for audio coding which can be used widely for almost all digital audio compression systems and which can, moreover, be implemented simply and at low cost.
Another object of the present invention is to provide a dynamic bit allocation method for audio coding which can be used widely for almost all digital audio compression systems and which can, moreover, be implemented simply and at low cost.
In order to achieve the aforementioned objects, according to the present invention, there is provided a dynamic bit allocation apparatus or method for audio coding for determining a number of bits used to quantize a plurality of decomposed samples of a digital audio signal, the plurality of samples being grouped into a plurality of units each having at least either one of different frequency intervals or time intervals, the different frequency intervals being determined based on a critical band of human audio characteristics and the different time intervals including a first time interval and a second time interval longer than the first time interval. The apparatus and method of the present invention include the following steps:
(a) an absolute threshold setting step for setting an absolute threshold for every unit based on a specified threshold characteristic in quiet representing whether or not a sound is audible to a person in quiet;
(b) an absolute threshold adjusting step for adjusting the absolute threshold of a unit having the first time interval by replacing the absolute threshold of the unit having the first time interval by a minimum absolute threshold among a plurality of units having the same frequency interval;
(c) a peak energy computing step for computing peak energies of the units based on the plurality of samples grouped into the plurality of units;
(d) a masking effect computing step for computing a masking effect, that is, a minimum audible limit, based on a specified simplified simultaneous masking effect model and a peak energy of a masked unit when all the units have the second time interval, and updating and setting the absolute threshold of each unit with the computed masking effect;
(e) a signal-to-mask ratio (SMR) computation step for computing SMRs o
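Steps (b) and (c) above lend themselves to a short sketch. The data layout, the field and function names, and the definition of peak energy as the largest squared sample are assumptions made for illustration, as is the reading that the minimum in step (b) is taken over all units sharing a frequency interval. Steps (d) and (e) are not sketched because the masking model and the remainder of step (e) are not given in this text.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Unit:
    band: int          # index of the frequency interval (hypothetical field)
    short_block: bool  # True if the unit has the first (shorter) time interval
    threshold: float   # absolute threshold set in step (a)
    samples: list      # decomposed samples grouped into this unit

def adjust_thresholds(units):
    """Step (b): short-block units take the minimum threshold in their band."""
    band_min = defaultdict(lambda: float("inf"))
    for u in units:
        band_min[u.band] = min(band_min[u.band], u.threshold)
    for u in units:
        if u.short_block:
            u.threshold = band_min[u.band]

def peak_energy(unit):
    """Step (c): peak energy, assumed here to be the largest squared sample."""
    return max(s * s for s in unit.samples)

units = [
    Unit(band=0, short_block=True, threshold=5.0, samples=[1.0, -3.0]),
    Unit(band=0, short_block=True, threshold=2.0, samples=[0.5]),
    Unit(band=1, short_block=False, threshold=7.0, samples=[2.0]),
]
adjust_thresholds(units)
print([u.threshold for u in units], [peak_energy(u) for u in units])
```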
Neo Sua Hong
Shen Sheng Mei
Tan Ah Peng
Greenblum & Bernstein P.L.C.
Korzuch William
Lerner Martin
Matsushita Electric Industrial Co., Ltd.
Dynamic bit allocation apparatus and method for audio coding