Reexamination Certificate
2000-06-07
2004-06-22
Dorvil, Richemond (Department: 2697)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Psychoacoustic
C704S500000
active
06754618
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to the field of encoding and decoding audio information and particularly to the encoders and decoders employing the MPEG standard for audio information.
2. Description of the Prior Art
In modern communication systems there is an increasing demand for the transfer and dissemination of greater quantities of information at faster speeds. In order to transfer greater quantities of information at ever increasing speeds without sacrificing accuracy, data compression is performed at the point of origination and data decompression is performed at the destination point of the system. Compression and decompression result in a simpler format for the information to be transmitted, thereby increasing the speed and efficiency of the transmission process.
Data compression is effected by employing a variety of encoding techniques presently available. Each of the encoding techniques results in a specific format for the compressed data. When the encoded information is transferred to the destination point, data decompression is performed by decoding the transmitted data in order to retrieve the original information. The process of encoding and decoding must be fast enough to allow for real-time presentation of data in such cases as in the transmission of audio and video information.
Digital audio is a basic component of any video or multimedia application. Due to the large bandwidth occupied by digital audio in any such application, compression of the audio data is an important part of the encoding process. Audio compression is generally performed by taking into consideration the characteristics of the audio signal and the human perception system as embodied in a psychoacoustic model. There are two main high-fidelity audio compression techniques: the Moving Picture Experts Group (MPEG) audio standard and the Dolby Digital audio compression algorithms developed by Dolby Laboratories.
FIG. 1(a) shows a block diagram of an MPEG encoder for a single audio channel. In multichannel systems the same process is repeated for each channel. The audio input 12, consisting of pulse code modulated (PCM) samples, each having a precision of 16 to 24 bits, is shown to constitute the input to the encoder 10. The PCM samples are sampled at a frequency of 32, 44.1 or 48 kHz. The first stage of the encoder 10 is the analysis filterbank 14, which maps the input signal from the time domain into the frequency domain. The analysis filterbank 14 consists of 32 band-pass filters, each of which is a 512-tap band-pass filter.
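The figures quoted above can be put in perspective with a short calculation. The sketch below is illustrative arithmetic only: the function names are hypothetical, and it assumes that the 32 subbands divide the band from 0 to half the sampling frequency into equal parts.

```python
# Illustrative arithmetic only; function names are hypothetical.

def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Raw bit rate of one uncompressed PCM channel, in bits per second."""
    return sample_rate_hz * bits_per_sample


def subband_width_hz(sample_rate_hz, num_subbands=32):
    """Width of one subband, assuming 0..fs/2 is split into equal parts."""
    return (sample_rate_hz / 2) / num_subbands


if __name__ == "__main__":
    for fs in (32000, 44100, 48000):
        print(fs, "Hz:",
              pcm_bit_rate(fs, 16), "bit/s per channel at 16 bits,",
              subband_width_hz(fs), "Hz per subband")
```

At 48 kHz and 16 bits, a single uncompressed channel already occupies 768 kbit/s, which is the bandwidth burden that motivates compression.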
In addition, based on the frequency characteristics of the input signal and the desired bit rate of the compressed signal, the perceptual model 20 estimates the masking thresholds. Masking threshold is a sound pressure level below which the human ear is less sensitive so that any noise or distortion introduced by the encoder becomes almost imperceptible. For example, in the frequency domain a faint signal may be completely masked if it is in the vicinity of louder signals with similar frequency content. The masking thresholds are used in the quantization and coding step 16 as described hereinbelow.
The output of each subband filter is normalized by the scaling factors that will be transmitted as part of the compressed bitstream. Scaling factors correspond to the maximum absolute value of every twelve consecutive output values in each subband. The output of the analysis filterbank 14 is quantized in the quantization and coding step 16 in such a way that all quantization noise is below the masking thresholds, thereby being almost imperceptible to the human ear. Finally, the quantized subband samples, the scaling factors and the bit-allocation information are multiplexed in the bitstream encoding step 18 and transmitted as the compressed stream output 22.
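The normalization described above may be sketched as follows. This is a minimal illustration, not the encoder of the MPEG Standard: the function names are hypothetical, and the sketch uses the raw maximum absolute value of each block of twelve values as the scaling factor, whereas a real encoder would map that maximum to a coded scaling factor.

```python
# Minimal sketch; function names are hypothetical, and the raw block
# maximum is used directly as the scaling factor (an assumption).

def scale_factors(subband_samples, block_size=12):
    """Maximum absolute value of each run of `block_size` subband outputs."""
    factors = []
    for start in range(0, len(subband_samples), block_size):
        block = subband_samples[start:start + block_size]
        factors.append(max(abs(x) for x in block))
    return factors


def normalize(subband_samples, block_size=12):
    """Divide every sample by the scaling factor of its block."""
    factors = scale_factors(subband_samples, block_size)
    normalized = []
    for i, x in enumerate(subband_samples):
        f = factors[i // block_size]
        # An all-zero block has factor 0; its samples stay at 0.
        normalized.append(x / f if f != 0 else 0.0)
    return normalized, factors


if __name__ == "__main__":
    samples = [1, -4, 2, 0, 3, -1, 0, 2, -2, 1, 0, 1] + [8] * 11 + [-16]
    normalized, factors = normalize(samples)
    print(factors)
    print(normalized[:4])
```

After normalization every value lies between -1 and 1, so the quantizer can spend its bits on the shape of the signal rather than on its level.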
FIG. 1(b) shows a block diagram of an MPEG decoder 30 used in recovering the PCM audio samples from the encoded data. The encoded bitstream 24 is shown in FIG. 1(b) as input to the decoder 30. At the frame unpacking step 26 of decoding, the encoded bitstream 24 is parsed and various pieces of coding information such as scaling factors and bit allocation information are demultiplexed. Subsequently, at the reconstruction step 28, the bit allocation information is decoded and the scaling factors are extracted; both are then used to requantize the coded samples. Finally, at the inverse mapping step 34, the mapped samples are transformed back into the PCM output 32 corresponding to the input signal of the encoder 10.
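The role of the bit allocation and the scaling factor in reconstruction can be illustrated with a generic uniform quantizer. The sketch below is an assumption made for illustration and is not the requantization formula of the MPEG Standard; the function names are hypothetical.

```python
# Generic uniform quantizer used for illustration only; this is NOT the
# requantization formula of the MPEG Standard. Names are hypothetical.

def quantize(normalized, bits):
    """Map a value in [-1, 1] to a signed integer code of `bits` bits."""
    levels = (1 << (bits - 1)) - 1      # e.g. 7 for a 4-bit code
    return round(normalized * levels)


def requantize(code, bits, scale_factor):
    """Recover an approximate sample from its code, bit count and scale."""
    levels = (1 << (bits - 1)) - 1
    return (code / levels) * scale_factor


if __name__ == "__main__":
    original = 4.8          # subband sample
    scale_factor = 16.0     # block maximum, sent in the bitstream
    bits = 4                # bit allocation, sent in the bitstream
    code = quantize(original / scale_factor, bits)
    print(code, requantize(code, bits, scale_factor))
```

The decoder needs only the code, the number of bits and the scaling factor to rebuild each sample, and the reconstruction error shrinks as more bits are allocated.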
Some of the steps used in the encoding process are computationally intensive. For example, the analysis filterbank step 14 and the perceptual model step 20 in the encoder flowchart 10 require intensive computations commonly performed by a fixed-point digital signal processor (DSP). Performing intensive computations requires a considerable amount of time, severely limiting the performance of the encoder during real-time transmission of audio signals.
One of the quantities to be computed in the perceptual model step 20 is the masking threshold as discussed hereinabove. According to the MPEG audio coding standard ISO/IEC 11172-3, “coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s - part 3: Audio,” ISO/IEC JTC 1/SC29, May 20, 1993, hereinafter referred to as the MPEG Standard, calculating the masking threshold entails evaluating such trigonometric functions as sine, cosine and inverse tangent, which represents a computationally intensive task for a DSP. Evaluating such trigonometric functions is needed in computing the unpredictability measure, which is in turn used in determining the masking threshold as described in detail in the MPEG Standard.
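To show where the trigonometric functions enter, the sketch below computes an unpredictability measure for one spectral line in the form commonly cited for the second psychoacoustic model: the magnitude and phase of the current line are predicted by linear extrapolation from the two preceding frames, and the distance between the actual and predicted values is normalized. This form is stated here as an assumption; the exact definition should be taken from the MPEG Standard. The function name is hypothetical.

```python
import cmath
import math

# Sketch under stated assumptions; the authoritative definition is in the
# MPEG Standard. The function name is hypothetical.

def unpredictability(x_t, x_t1, x_t2):
    """Unpredictability of complex spectral line x_t given the same line
    in the two preceding frames (x_t1 is the most recent)."""
    r, f = abs(x_t), cmath.phase(x_t)        # phase is an inverse tangent
    r1, f1 = abs(x_t1), cmath.phase(x_t1)
    r2, f2 = abs(x_t2), cmath.phase(x_t2)
    r_pred = 2.0 * r1 - r2                   # extrapolated magnitude
    f_pred = 2.0 * f1 - f2                   # extrapolated phase
    denom = r + abs(r_pred)
    if denom == 0.0:
        return 0.0
    dx = r * math.cos(f) - r_pred * math.cos(f_pred)
    dy = r * math.sin(f) - r_pred * math.sin(f_pred)
    return math.hypot(dx, dy) / denom


if __name__ == "__main__":
    # A steady tone: constant magnitude, phase advancing by 0.2 per frame.
    steady = [cmath.rect(1.0, p) for p in (0.1, 0.3, 0.5)]
    print(unpredictability(steady[2], steady[1], steady[0]))
```

Each spectral line of each frame requires one inverse tangent, one sine and one cosine of the predicted phase, and a square root, which is why the computation is costly on a fixed-point DSP.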
Another difficulty currently encountered in the perceptual model step 20 lies in the huge dynamic range of the input data. The MPEG Standard calls for a coverage of about 101 dB (−5 dB to 96 dB) in dynamic range. Every bit covers 3 dB so that the MPEG Standard requires 34 or more bits of digital representation. However, most fixed-point DSP chips for audio are 16 or 24 bits in data width. Although floating-point DSP chips can accommodate higher data widths, fixed-point DSP chips are by far more prevalent due to their smaller size and lower cost. Accordingly, the input data has to be scaled in order to fall within the dynamic range of the DSP architecture.
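The bit count quoted above follows from simple arithmetic, sketched below. The function name is hypothetical; the figure of 3 dB per bit is taken from the text and is consistent with one bit doubling a power quantity, since 10·log10(2) is approximately 3.01 dB.

```python
import math

# Illustrative arithmetic; the function name is hypothetical.

def bits_for_dynamic_range(range_db, db_per_bit=3.0):
    """Smallest whole number of bits that spans `range_db`."""
    return math.ceil(range_db / db_per_bit)


if __name__ == "__main__":
    dynamic_range_db = 96 - (-5)                  # 101 dB
    needed = bits_for_dynamic_range(dynamic_range_db)
    print("bits needed:", needed)
    for width in (16, 24):
        print(width, "bit DSP covers", width * 3.0, "dB,",
              "short by", needed - width, "bits")
```

A 16-bit word covers only 48 dB and a 24-bit word only 72 dB on this scale, so neither can hold the full range without scaling.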
Scaling factors are used to scale down the large input signals in order to avoid clipping, i.e., cutting off an input signal whose sound energy level extends beyond the dynamic range of the DSP. Once the input data has been scaled down, a particular table in the MPEG Standard is used to determine the absolute threshold value used in computing the masking threshold. However, as the input data is consistently scaled down, too few bits may be assigned to represent the weak signals, resulting in the problem of underflow, i.e., losing some of the information carried in the weaker signals.
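The trade-off between clipping and underflow can be demonstrated with a 16-bit word. The sketch below is illustrative only: the helper names, the sample values and the choice of a 4-bit right shift are assumptions, and only non-negative values are used.

```python
# Illustration with a 16-bit signed word; names and values are assumptions.

INT16_MAX = 32767
INT16_MIN = -32768


def clip16(x):
    """Saturate x to the 16-bit signed range, as a DSP would on overflow."""
    return max(INT16_MIN, min(INT16_MAX, x))


def scale_down(x, shift):
    """Divide a non-negative value by 2**shift, discarding low-order bits."""
    return x >> shift


if __name__ == "__main__":
    loud, weak = 400000, 9
    print("unscaled:", clip16(loud), clip16(weak))             # loud is clipped
    print("scaled:  ", scale_down(loud, 4), scale_down(weak, 4))  # weak is lost
```

Without scaling the loud value saturates at the top of the range; with a shift large enough to protect the loud value, the weak value is reduced to zero.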
Moreover, there are limitations currently associated with the decoder 30 in FIG. 1(b). One such limitation is in the reconstruction step 28 of the decoding process, wherein the coded samples have to be requantized so that a specific number of bits are allocated to each coded sample. Requantization is performed by determining the requantization step from a set of four 16 by 32 tables provided in the MPEG Standard. The four different tables correspond to four different bit rates and sampling frequencies. To each entry in the tables corresponds a set of four numbers. One of the numbers indicates the number of bits per sample and the rest of the numbers are used in the subsequent inverse mapping step 34. Thus the total number of entries stored in the memory of the decoder corresponds to four 16 by 32 by 4 tables. Considerable memory space therefore has to be devoted to the reconstruction step of the decoding process, rendering the decoder less efficient and more expensive.
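The storage implied by these tables can be tallied directly. The sketch below is illustrative arithmetic; the function name is hypothetical, and the assumption of one 16-bit word per stored value is made only to give a byte figure.

```python
# Illustrative arithmetic; the function name is hypothetical and the
# 2-bytes-per-value figure is an assumption, not taken from the text.

def table_values(num_tables=4, rows=16, cols=32, values_per_entry=4):
    """Total count of stored numbers across all requantization tables."""
    return num_tables * rows * cols * values_per_entry


if __name__ == "__main__":
    total = table_values()
    print("stored values:", total)
    print("bytes at 2 bytes per value:", total * 2)
```

Four tables of 16 by 32 entries, each holding four numbers, amount to 8192 stored values.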
In light of the above, it is
Chen Shaomei
Konstantinides Konstantinos
Zhou Linjun
Cirrus Logic Inc.
Dorvil Richemond
Imam, Esq. Maryam
Lin, Esq. Steven
Patel Kinari