Reexamination Certificate
2000-06-02
2004-08-17
Chawan, Vijay (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Psychoacoustic
C704S219000, C704S220000, C704S230000, C704S503000, C704S205000
active
06778953
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to audio coding techniques, and more particularly, to perceptually-based coding of audio signals, such as speech and music signals.
BACKGROUND OF THE INVENTION
Perceptual audio coders (PAC) attempt to minimize the bit rate requirements for the storage or transmission (or both) of digital audio data by the application of sophisticated hearing models and signal processing techniques. Perceptual audio coders (PAC) are described, for example, in D. Sinha et al., “The Perceptual Audio Coder,” Digital Audio, Section 42, 42-1 to 42-18, (CRC Press, 1998), incorporated by reference herein. In the absence of channel errors, a PAC is able to achieve near stereo compact disk (CD) audio quality at a rate of approximately 128 kbps. At a lower rate of 96 kbps, the resulting quality is still fairly close to that of CD audio for many important types of audio material.
Perceptual audio coders reduce the amount of information needed to represent an audio signal by exploiting human perception and minimizing the perceived distortion for a given bit rate. Perceptual audio coders first apply a time-frequency transform, which provides a compact representation, followed by quantization of the spectral coefficients.
FIG. 1 is a schematic block diagram of a conventional perceptual audio coder 100. As shown in FIG. 1, a typical perceptual audio coder 100 includes an analysis filterbank 110, a perceptual model 120, a quantization and coding block 130 and a bitstream encoder/multiplexer 140.
The analysis filterbank 110 converts the input samples into a sub-sampled spectral representation. The perceptual model 120 estimates a masked threshold of the signal. For each spectral coefficient, the masked threshold gives the maximum coding error that can be introduced into the audio signal while still maintaining perceptually transparent signal quality. The quantization and coding block 130 quantizes and codes the spectral values according to the precision corresponding to the masked threshold estimate. Thus, the quantization noise is hidden by the respective transmitted signal. Finally, the coded spectral values and additional side information are packed into a bitstream and transmitted to the decoder by the bitstream encoder/multiplexer 140.
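The role of the masked threshold in the quantization and coding block 130 can be illustrated with a small sketch. It assumes a uniform scalar quantizer whose noise power is roughly step²/12 (a standard high-resolution approximation, not something the text above specifies), and the coefficient and threshold values are made up for illustration. The function names are hypothetical.

```python
import math

def step_size_for_threshold(masked_power):
    """Step size whose uniform-quantizer noise power (step**2 / 12) equals masked_power.

    Assumption: high-resolution uniform quantization noise model.
    """
    return math.sqrt(12.0 * masked_power)

def quantize(coeffs, thresholds):
    """Quantize each spectral coefficient with a step derived from its masked threshold."""
    steps = [step_size_for_threshold(t) for t in thresholds]
    indices = [int(round(c / s)) for c, s in zip(coeffs, steps)]
    return indices, steps

def dequantize(indices, steps):
    """Inverse quantization as a decoder would perform it."""
    return [i * s for i, s in zip(indices, steps)]

# Hypothetical spectral coefficients and masked-threshold powers.
coeffs = [10.0, -3.2, 0.7, 0.05]
thresholds = [0.5, 0.1, 0.02, 0.01]

indices, steps = quantize(coeffs, thresholds)
decoded = dequantize(indices, steps)
errors = [abs(c - d) for c, d in zip(coeffs, decoded)]
```

A larger masked threshold yields a larger step and therefore fewer bits for that coefficient, while the error stays within half a step.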
FIG. 2 is a schematic block diagram of a conventional perceptual audio decoder 200. As shown in FIG. 2, the perceptual audio decoder 200 includes a bitstream decoder/demultiplexer 210, a decoding and inverse quantization block 220 and a synthesis filterbank 230. The bitstream decoder/demultiplexer 210 parses and decodes the bitstream yielding the coded spectral values and the side information. The decoding and inverse quantization block 220 performs the decoding and inverse quantization of the quantized spectral values. The synthesis filterbank 230 transforms the spectral values back into the time-domain.
In perceptual audio coders, such as the perceptual audio coder 100 shown in FIG. 1, the masked threshold is used to control the quantization and encoding of subband signals by the quantization and coding block 130. FIG. 3 illustrates a masked threshold 310 computed according to a psychoacoustic model and the corresponding approximation 320 used by a conventional perceptual audio coder. As shown in FIG. 3, the masked threshold is usually approximated with a step function that is encoded and transmitted to the perceptual audio decoder as side information. Due to limited bandwidth in the side information, however, only a coarse approximation of the masked threshold is transmitted. Inadequate accuracy of the masked threshold representation impacts the perceptual quality.
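The conventional step-function approximation can be sketched as follows. This is only an illustration: the band edges and threshold values are invented, and taking the minimum of each band (so the approximation never exceeds the true threshold) is an assumption, not a rule stated in the text.

```python
def step_approximation(threshold, band_edges):
    """Piecewise-constant approximation of a masked threshold.

    threshold  -- per-coefficient masked threshold values
    band_edges -- hypothetical band boundaries, e.g. [0, 2, 5, 8]
    Returns (one level per band, the expanded per-coefficient step function).
    """
    levels = []
    approx = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        level = min(threshold[lo:hi])  # assumption: conservative choice per band
        levels.append(level)
        approx.extend([level] * (hi - lo))
    return levels, approx

# Hypothetical masked threshold over 8 spectral coefficients.
threshold = [4.0, 3.0, 2.5, 2.0, 3.5, 6.0, 7.0, 9.0]
band_edges = [0, 2, 5, 8]

levels, approx = step_approximation(threshold, band_edges)
```

Only `levels` would need to be sent as side information. The gap between `approx` and `threshold` inside each band is the coarseness the paragraph above refers to.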
A need therefore exists for methods and apparatus for representing the masked threshold more accurately. A further need exists for methods and apparatus for representing the masked threshold more accurately with as few bits as possible.
SUMMARY OF THE INVENTION
Generally, a method and apparatus are disclosed for representing the masked threshold in a perceptual audio coder, using line spectral frequencies (LSF) or another representation for linear prediction (LP) coefficients. The present invention calculates LP coefficients for the masked threshold using known LPC analysis techniques. In one embodiment, the masked thresholds are optionally transformed to a non-linear frequency scale suitable for auditory properties. The LP coefficients are converted to line spectral frequencies (LSF) or a similar representation in which they can be quantized for transmission.
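A minimal sketch of this chain, under stated assumptions, is given here. It treats the masked threshold as a power spectrum sampled uniformly on (0, π), derives autocorrelation lags from it, runs the Levinson-Durbin recursion to obtain LP coefficients, and locates the LSFs as unit-circle roots of the sum and difference polynomials by a grid search with bisection. The sampling grid, the example threshold curve, the model order and all function names are illustrative choices. The optional non-linear frequency warping and the quantization of the LSFs are omitted.

```python
import math

def autocorrelation_from_power(power, order):
    """Lags 0..order of a power spectrum sampled at w_k = pi*(k+0.5)/N."""
    n = len(power)
    omegas = [math.pi * (k + 0.5) / n for k in range(n)]
    return [sum(p * math.cos(m * w) for p, w in zip(power, omegas)) / n
            for m in range(order + 1)]

def levinson_durbin(r, order):
    """LP coefficients a (a[0] == 1) and prediction error for A(z) = sum a[i] z^-i."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def lp_to_lsf(a, grid=4096):
    """LSFs in radians (ascending), found by sign changes on a frequency grid."""
    m = len(a)                              # degree of P and Q is order + 1
    ext = a + [0.0]
    p = [ext[i] + ext[m - i] for i in range(m + 1)]   # symmetric polynomial
    q = [ext[i] - ext[m - i] for i in range(m + 1)]   # antisymmetric polynomial

    def f_p(w):
        return sum(c * math.cos((m / 2 - i) * w) for i, c in enumerate(p))

    def f_q(w):
        return sum(c * math.sin((m / 2 - i) * w) for i, c in enumerate(q))

    lsf = []
    for f in (f_p, f_q):
        prev_w = math.pi / grid             # skip the trivial roots at 0 and pi
        prev_v = f(prev_w)
        for k in range(2, grid):
            w = math.pi * k / grid
            v = f(w)
            if prev_v * v < 0.0:            # a root lies between prev_w and w
                lo, hi, f_lo = prev_w, w, prev_v
                for _ in range(50):         # bisection refinement
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if f_lo * f_mid <= 0.0:
                        hi = mid
                    else:
                        lo, f_lo = mid, f_mid
                lsf.append(0.5 * (lo + hi))
            prev_w, prev_v = w, v
    return sorted(lsf)

# Hypothetical smooth masked threshold (power) on 64 frequency samples.
n, order = 64, 4
grid_w = [math.pi * (k + 0.5) / n for k in range(n)]
threshold = [1.0 + 0.8 * math.cos(w) + 0.3 * math.cos(2 * w) for w in grid_w]

r = autocorrelation_from_power(threshold, order)
a, err = levinson_durbin(r, order)
lsf = lp_to_lsf(a)
```

The grid search can miss roots that lie closer together than one grid cell, so a production implementation would more likely use a dedicated root finder. The sketch is meant only to show how a threshold curve reduces to a handful of ordered frequencies.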
According to one aspect of the invention, the masked threshold is represented more accurately in a perceptual audio coder using an LSF notation previously applied in speech coding techniques. According to another aspect of the invention, the masked threshold is transmitted only if the masked threshold is significantly different from the previous masked threshold. In between each transmitted masked threshold, the masked threshold is approximated using interpolation schemes. The present invention decides which masked thresholds to transmit based on the change of consecutive masked thresholds, as opposed to the variation of short-term spectra.
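The transmit-or-skip decision and the interpolation between transmitted thresholds might look like the sketch below. It assumes each frame's masked threshold is already a short parameter vector (for example LSFs), measures change as the largest absolute difference from the last transmitted vector, and interpolates linearly. The distance measure, the tolerance and the function names are hypothetical, since the text above does not fix them.

```python
def select_frames_to_transmit(frames, tolerance):
    """Indices of frames whose vector differs from the last transmitted one by more than tolerance."""
    sent = [0]                                  # the first frame is always transmitted
    for i in range(1, len(frames)):
        last = frames[sent[-1]]
        distance = max(abs(x - y) for x, y in zip(frames[i], last))
        if distance > tolerance:
            sent.append(i)
    return sent

def interpolate(frames, sent, index):
    """Decoder-side estimate for frame `index` using only transmitted frames."""
    before = [s for s in sent if s <= index][-1]
    after = [s for s in sent if s > index]
    if not after:                               # nothing newer received: hold the last vector
        return list(frames[before])
    nxt = after[0]
    t = (index - before) / (nxt - before)
    return [(1 - t) * a + t * b for a, b in zip(frames[before], frames[nxt])]

# Hypothetical, gradually drifting parameter vectors for five frames.
frames = [[0.50, 1.00], [0.55, 1.05], [0.60, 1.10], [0.65, 1.15], [0.70, 1.20]]

sent = select_frames_to_transmit(frames, tolerance=0.12)
estimate_1 = interpolate(frames, sent, 1)
estimate_4 = interpolate(frames, sent, 4)
```

With these values only frames 0 and 3 are sent. Frames 1 and 2 are recovered by interpolation, and frame 4 reuses the vector of frame 3.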
The present invention provides a number of options for modeling variations in the masked threshold over time. For signal parts that gradually change, the masked threshold changes gradually as well and can be approximated by interpolation. For a generally stationary signal part, followed by a sudden change, the masked threshold can be approximated by a constant masked threshold that changes at once. A relatively constant masked threshold that later changes gradually can be modeled by a combination of a constant masked threshold followed by interpolation. A stationary signal part with a short transient in the middle has a masked threshold that temporarily changes to another value but returns to the initial value. This case can be modeled efficiently by setting the masked threshold after the transient to the masked threshold before the transient, and thus not transmitting the masked threshold after the transient.
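The four cases above can be mimicked with a small decoder-side reconstruction routine. The signaling used here is invented for illustration: each keyframe carries a frame index, either a vector or an integer back-reference ("reuse the vector of the keyframe that many entries earlier"), and a mode saying whether the frames leading up to it are interpolated ("interp") or held constant ("step"). None of these names or conventions comes from the text.

```python
def reconstruct(num_frames, keyframes):
    """Rebuild one threshold vector per frame from sparse keyframes.

    keyframes -- list of (frame_index, vector_or_backref, mode), sorted by
                 frame_index, starting at index 0 (hypothetical format).
    """
    resolved = []
    for index, value, mode in keyframes:
        if isinstance(value, int):          # back-reference: no new vector is transmitted
            value = resolved[-value][1]
        resolved.append((index, value, mode))

    out = []
    for frame in range(num_frames):
        prev = [k for k in resolved if k[0] <= frame][-1]
        later = [k for k in resolved if k[0] > frame]
        if not later or later[0][2] == "step":
            out.append(list(prev[1]))       # constant until the next keyframe
        else:
            nxt = later[0]
            t = (frame - prev[0]) / (nxt[0] - prev[0])
            out.append([(1 - t) * a + t * b for a, b in zip(prev[1], nxt[1])])
    return out

A, B, T = [1.0], [3.0], [5.0]               # hypothetical one-element thresholds

# Gradual change: interpolate from A to B.
gradual = reconstruct(3, [(0, A, "step"), (2, B, "interp")])

# Stationary, then sudden change: hold A, then jump to B.
sudden = reconstruct(3, [(0, A, "step"), (2, B, "step")])

# Constant, then gradual: hold A up to frame 2, then interpolate to B.
hold_then_ramp = reconstruct(5, [(0, A, "step"), (2, 1, "step"), (4, B, "interp")])

# Short transient: jump to T at frame 3, then return to A without resending it.
transient = reconstruct(6, [(0, A, "step"), (3, T, "step"), (4, 2, "step")])
```

In the transient case the third keyframe carries only the back-reference `2`, so the threshold after the transient costs no additional vector.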
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.
Edler Bernd Andreas
Faller Christof
Schuller Gerald Dietrich
Agere Systems Inc.
Chawan Vijay