Reexamination Certificate
2000-06-02
2004-01-13
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Psychoacoustic
C704S503000, C704S203000, C704S243000, C704S206000, C704S219000, C704S229000
active
06678647
FIELD OF THE INVENTION
The present invention relates generally to audio coding techniques, and more particularly, to perceptually-based coding of audio signals, such as speech and music signals.
BACKGROUND OF THE INVENTION
Perceptual audio coders (PAC) attempt to minimize the bit rate requirements for the storage or transmission (or both) of digital audio data by the application of sophisticated hearing models and signal processing techniques. Perceptual audio coders are described, for example, in D. Sinha et al., “The Perceptual Audio Coder,” Digital Audio, Section 42, 42-1 to 42-18, (CRC Press, 1998), incorporated by reference herein. In the absence of channel errors, a PAC is able to achieve near stereo compact disk (CD) audio quality at a rate of approximately 128 kbps. At a lower rate of 96 kbps, the resulting quality is still fairly close to that of CD audio for many important types of audio material.
Perceptual audio coders reduce the amount of information needed to represent an audio signal by exploiting human perception and minimizing the perceived distortion for a given bit rate. Perceptual audio coders first apply a time-frequency transform, which provides a compact representation, followed by quantization of the spectral coefficients.
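The two stages can be illustrated with a toy example. The discrete cosine transform below stands in for the time-frequency transform (practical coders use an MDCT with overlapping windows; the transform choice and the 8-sample block are illustrative only):

```python
import math

def dct_ii(block):
    """Orthonormal DCT-II: a toy time-frequency transform standing in for
    the analysis stage of a perceptual coder."""
    N = len(block)
    out = []
    for k in range(N):
        s = sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, x in enumerate(block))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

# A slowly varying block: its energy concentrates in a few low-order
# spectral coefficients, which is what makes the representation compact.
block = [math.cos(2 * math.pi * n / 16) for n in range(8)]
spec = dct_ii(block)
```

Because the transform is orthonormal, the total energy is preserved while nearly all of it lands in a single coefficient, leaving most coefficients near zero for the quantization stage.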
FIG. 1 is a schematic block diagram of a conventional perceptual audio coder 100. As shown in FIG. 1, a typical perceptual audio coder 100 includes an analysis filterbank 110, a perceptual model 120, a quantization and coding block 130 and a bitstream encoder/multiplexer 140.
The analysis filterbank 110 converts the input samples into a sub-sampled spectral representation. The perceptual model 120 estimates the masked threshold of the signal. For each spectral coefficient, the masked threshold gives the maximum coding error that can be introduced into the audio signal while still maintaining perceptually transparent signal quality. The quantization and coding block 130 quantizes and codes the spectral values according to the precision corresponding to the masked threshold estimate. Thus, the quantization noise is hidden by the respective transmitted signal. Finally, the coded spectral values and additional side information are packed into a bitstream and transmitted to the decoder by the bitstream encoder/multiplexer 140.
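The coupling between the masked threshold and the quantizer precision can be sketched as follows; the threshold values here are made up, whereas a real perceptual model derives them from the signal:

```python
def quantize_spectrum(coeffs, masked_threshold):
    """Quantize each spectral coefficient with a step size tied to its
    masked threshold, so the quantization error stays below the level
    the ear can detect (a deliberate simplification of the real stage)."""
    indices = []
    for c, thr in zip(coeffs, masked_threshold):
        step = 2.0 * thr              # error stays within +/- thr
        indices.append(round(c / step))
    return indices

def dequantize_spectrum(indices, masked_threshold):
    """Reverse mapping performed at the decoder side."""
    return [i * 2.0 * thr for i, thr in zip(indices, masked_threshold)]

coeffs = [0.93, -0.42, 0.05, 0.001]       # toy spectral values
thr    = [0.10,  0.10, 0.02, 0.02]        # hypothetical masked thresholds
idx = quantize_spectrum(coeffs, thr)
rec = dequantize_spectrum(idx, thr)
# the reconstruction error stays at or below the masked threshold
assert all(abs(c - r) <= t for c, r, t in zip(coeffs, rec, thr))
```

Only the integer indices (plus the side information describing the step sizes) need to be transmitted, which is where the bit-rate saving comes from.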
FIG. 2 is a schematic block diagram of a conventional perceptual audio decoder 200. As shown in FIG. 2, the perceptual audio decoder 200 includes a bitstream decoder/demultiplexer 210, a decoding and inverse quantization block 220 and a synthesis filterbank 230. The bitstream decoder/demultiplexer 210 parses and decodes the bitstream, yielding the coded spectral values and the side information. The decoding and inverse quantization block 220 performs the decoding and inverse quantization of the quantized spectral values. The synthesis filterbank 230 transforms the spectral values back into the time domain.
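A minimal analysis/synthesis pair, assuming an orthonormal DCT-II/DCT-III in place of the actual filterbanks, shows the reconstruction property the decoder relies on:

```python
import math

def dct_ii(block):
    # forward transform (toy analysis filterbank), orthonormal DCT-II
    N = len(block)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) *
            sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, x in enumerate(block))
            for k in range(N)]

def dct_iii(spec):
    # inverse transform (toy synthesis filterbank), orthonormal DCT-III
    N = len(spec)
    return [math.sqrt(1 / N) * spec[0] +
            sum(math.sqrt(2 / N) * c *
                math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for k, c in enumerate(spec[1:], 1))
            for n in range(N)]

samples = [0.3, -0.1, 0.7, 0.2, -0.5, 0.0, 0.4, -0.2]
rebuilt = dct_iii(dct_ii(samples))
# absent quantization, the synthesis stage reconstructs the input exactly
assert all(abs(a - b) < 1e-9 for a, b in zip(samples, rebuilt))
```

In the complete chain the only distortion is the quantization error introduced at the encoder, which the masked-threshold control keeps inaudible.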
Generally, the amount of information needed to represent an audio signal is reduced using two well-known techniques, namely, irrelevancy reduction and redundancy removal. Irrelevancy reduction techniques attempt to remove those portions of the audio signal that would be, when decoded, perceptually irrelevant to a listener. This general concept is described, for example, in U.S. Pat. No. 5,341,457, entitled “Perceptual Coding of Audio Signals,” by J. L. Hall and J. D. Johnston, issued on Aug. 23, 1994, incorporated by reference herein.
Currently, most audio transform coding schemes implemented by the analysis filterbank 110 to convert the input samples into a sub-sampled spectral representation employ a single spectral decomposition for both irrelevancy reduction and redundancy reduction. The redundancy reduction is obtained by dynamically controlling the quantizers in the quantization and coding block 130 for the individual spectral components according to perceptual criteria contained in the psychoacoustic model 120. This results in a temporally and spectrally shaped quantization error after the inverse transform at the receiver 200. As shown in FIGS. 1 and 2, the psychoacoustic model 120 controls the quantizers 130 for the spectral components and the corresponding dequantizer 220 in the decoder 200. Thus, the dynamic quantizer control information needs to be transmitted by the perceptual audio coder 100 as part of the side information, in addition to the quantized spectral components.
The redundancy reduction is based on the decorrelating property of the transform. For audio signals with high temporal correlations, this property leads to a concentration of the signal energy in a relatively low number of spectral components, thereby reducing the amount of information to be transmitted. By applying appropriate coding techniques, such as adaptive Huffman coding, this leads to a very efficient signal representation.
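A sketch of the entropy-coding step: a plain, non-adaptive Huffman code built with Python's heapq (real coders use adaptive variants and predefined codebooks; the quantized indices below are illustrative):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate single-symbol case
        return {next(iter(freq)): "0"}
    # heap entries: (total frequency, unique tiebreak id, partial codebook)
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# After the transform, most quantized coefficients of a highly correlated
# signal are zero, so the code spends very few bits on the common symbol.
indices = [0] * 20 + [1] * 5 + [-1] * 4 + [3]
code = huffman_code(indices)
bits = sum(len(code[s]) for s in indices)
assert bits < 2 * len(indices)    # beats a 2-bit fixed-length code
```

The skewed symbol distribution produced by the decorrelating transform is exactly what makes the variable-length code pay off.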
One problem encountered in audio transform coding schemes is the selection of the optimum transform length. The optimum transform length is directly related to the frequency resolution. For relatively stationary signals, a long transform with a high frequency resolution is desirable, thereby allowing for accurate shaping of the quantization error spectrum and providing a high redundancy reduction. For transients in the audio signal, however, a shorter transform has advantages due to its higher temporal resolution. This is mainly necessary to avoid temporal spreading of quantization errors that may lead to echoes in the decoded signal.
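A common way to decide between the two transform lengths is a simple transient detector; the sketch below uses an energy-ratio test between block halves, with illustrative lengths and threshold rather than the parameters of any particular coder:

```python
def choose_transform_length(block, long_len=1024, short_len=128, ratio=8.0):
    """Pick a transform length per block: a jump in short-term energy
    signals a transient, and a shorter transform limits the temporal
    spreading of quantization error (pre-echo). `ratio` is illustrative."""
    half = len(block) // 2
    e1 = sum(x * x for x in block[:half]) + 1e-12   # guard against silence
    e2 = sum(x * x for x in block[half:]) + 1e-12
    return short_len if max(e1, e2) / min(e1, e2) > ratio else long_len

steady = [0.5] * 64                    # stationary: keep the long transform
attack = [0.0] * 32 + [0.9] * 32      # transient: switch to the short one
assert choose_transform_length(steady) == 1024
assert choose_transform_length(attack) == 128
```

This kind of switching trades frequency resolution for temporal resolution only where the signal demands it.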
As shown in FIG. 1, however, conventional perceptual audio coders 100 typically use a single spectral decomposition for both irrelevancy reduction and redundancy reduction. Thus, the spectral/temporal resolution for the redundancy reduction and irrelevancy reduction must be the same. While high spectral resolution yields a high degree of redundancy reduction, the resulting long transform window size causes reverberation artifacts, impairing the irrelevancy reduction. A need therefore exists for methods and apparatus for encoding audio signals that permit independent selection of spectral and temporal resolutions for the redundancy reduction and irrelevancy reduction. A further need exists for methods and apparatus for encoding speech as well as music signals using a psychoacoustic model (a noise-shaping filter) and a transform.
SUMMARY OF THE INVENTION
Generally, a perceptual audio coder is disclosed for encoding audio signals, such as speech or music, with different spectral and temporal resolutions for the redundancy reduction and irrelevancy reduction using cascaded filterbanks. The disclosed perceptual audio coder includes a first analysis filterbank for performing irrelevancy reduction in accordance with a psychoacoustic model and a second analysis filterbank for performing redundancy reduction. In this manner, the spectral/temporal resolution of the first filterbank can be optimized for irrelevancy reduction and the spectral/temporal resolution of the second filterbank can be optimized for maximum redundancy reduction.
The disclosed perceptual audio coder also includes a scaling block between the cascaded filterbanks that scales the spectral coefficients, based on the employed perceptual model. The first analysis filterbank converts the input samples into a sub-sampled spectral representation to perform irrelevancy reduction. The second analysis filterbank performs redundancy reduction using a subband technique. A quantization and coding block quantizes and codes the spectral values according to the precision specified by the masked threshold estimate received from the perceptual model. The second analysis filterbank is optionally adaptive to the statistics of the signal at the input to the second filterbank to determine the best spectral and temporal resolution for performing the redundancy reduction.
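The cascaded structure can be sketched as follows; a toy DCT stands in for both filterbanks (the patent leaves their exact form open), and the masked-threshold values that drive the scaling block are made up here:

```python
import math

def dct_ii(block):
    # toy orthonormal transform used for both cascaded stages
    N = len(block)
    return [(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) *
            sum(x * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n, x in enumerate(block))
            for k in range(N)]

def cascaded_encode(samples, masked_threshold):
    """Sketch of the disclosed cascade: stage 1 resolves the signal at the
    resolution the perceptual model needs; dividing by the masked threshold
    (the scaling block) performs the irrelevancy reduction; stage 2
    decorrelates the scaled coefficients for redundancy reduction, and a
    unit-step uniform quantizer follows."""
    stage1 = dct_ii(samples)                                  # filterbank 1
    scaled = [c / t for c, t in zip(stage1, masked_threshold)]  # scaling block
    stage2 = dct_ii(scaled)                                   # filterbank 2
    return [round(c) for c in stage2]                         # quantizer

q = cascaded_encode([0.3, -0.1, 0.7, 0.2, -0.5, 0.0, 0.4, -0.2], [0.1] * 8)
```

Because the scaling happens between the stages, each filterbank's resolution can be chosen independently, which is the point of the cascade.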
REFERENCES:
patent: 5285498 (1994-02-01), Johnston
patent: 5481614 (1996-01-01), Johnston
patent: 5627938 (1997-05-01), Johnston
patent: 5727119 (1998-03-01), Davidson et al.
patent: 5852806 (1998-12-01)
Edler Bernd Andreas
Faller Christof
Agere Systems Inc.
Chawan Vijay
Ryan & Mason & Lewis, LLP