Multi-channel audio encoder

Data processing: speech signal processing – linguistics – language – Audio signal bandwidth compression or expansion

Reexamination Certificate

: 1998-11-04
: 2002-11-26
: Tsang, Fan (Department: 2654)
: Data processing: speech signal processing, linguistics, language
: Audio signal bandwidth compression or expansion

: C704S201000
: Reexamination Certificate
: active
: 06487535
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to high quality encoding and decoding of multi-channel audio signals and more specifically to a subband encoder that employs perfect/non-perfect reconstruction filters, predictive/non-predictive subband encoding, transient analysis, and psycho-acoustic/minimum mean-square-error (mmse) bit allocation over time, frequency and the multiple audio channels to generate a data stream with a constrained decoding computational load.
2. Description of the Related Art
Pulse code modulation (PCM) based speech coders were first developed in the 1960's. In the early 1970's, low bit-rate speech coders were developed for use with the digital telephone networks, which had a restricted bandwidth of approximately 3.5 kHz. In 1979 Johnston outlined a 7.5 kHz sub-band differential PCM (DPCM) that was suitable for speech and music signals. In the early 1980's this work was developed using more sophisticated adaptive DPCM techniques (ADPCM), but it was not until 1988 that a true wideband high quality ADPCM coder was discussed.
In the mid-to-late 1980's new methods for coding very high quality audio signals were developed based on high resolution filter-banks and/or transform coders, in which the quantizer bit-allocations were determined by a psychoacoustic masking model. In general, the psychoacoustic masking model tries to establish a quantization noise audibility threshold at all frequencies. The threshold is used to allocate quantization bits so as to reduce the likelihood that the quantization noise will become audible. The quantization noise threshold is calculated in the frequency domain from the absolute energy of the frequency-transformed audio signal. The dominant frequency components of the audio signal tend to mask the audibility of other components which are close to the dominant signal on the Bark scale (the human auditory frequency scale).
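As a rough sketch of how a masking threshold can drive bit allocation, the Python example below assigns bits to each band from its signal-to-mask ratio, using the rule of thumb that each bit of a uniform quantizer lowers quantization noise by about 6.02 dB. The band levels, the thresholds, and the name `allocate_bits` are illustrative assumptions and are not taken from any particular coder.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer


def allocate_bits(signal_db, mask_db, max_bits=16):
    """Return bits per band so that quantization noise sits at or below the mask.

    signal_db and mask_db are hypothetical per-band levels in dB.
    """
    bits = []
    for sig, mask in zip(signal_db, mask_db):
        smr = sig - mask  # signal-to-mask ratio in dB
        if smr <= 0:
            bits.append(0)  # band is already masked, so it needs no bits
        else:
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits


# Illustrative numbers only: a dominant band at 80 dB raises the mask nearby.
signal_db = [80.0, 62.0, 45.0, 30.0]
mask_db = [50.0, 56.0, 47.0, 20.0]
allocation = allocate_bits(signal_db, mask_db)
```

In this sketch the third band falls below its threshold and receives no bits at all, which is the sense in which a dominant component masks its neighbours.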
Thus, the known high quality audio and music coders can be divided into two broad classes of schemes.
1) Medium to high frequency resolution subband/transform coders which adaptively quantize the subband or coefficient samples within the analysis window according to a psychoacoustic mask calculation.
These coders exploit the large short-term spectral variances of general music signals by allowing the bit-allocations to adapt according to the spectral energy of the signal. The high resolution of these coders allows the frequency transformed signal to be applied directly to the psychoacoustic model, which is based on a critical band theory of hearing. Dolby's AC-3 audio coder, Todd et al., “AC-3: Flexible Perceptual Coding for Audio Transmission and Storage” Convention of the Audio Engineering Society, February, 1994, typically computes 1024-point FFTs on the respective PCM signals and applies a psychoacoustic model to the 1024 frequency coefficients in each channel to determine the bit rate for each coefficient. The Dolby system uses a transient analysis that reduces the window size to 256 samples to isolate the transients. The AC-3 coder uses a proprietary backward adaptation algorithm to decode the bit allocation. This reduces the amount of bit allocation information that is sent alongside the encoded audio data. As a result, the bandwidth available to audio is increased over forward adaptive schemes, which leads to an improvement in sound quality.
2) Low resolution subband coders which make up for their poor frequency resolution by processing the subband samples using ADPCM. The quantization of the differential subband signals is either fixed or adapts to minimize the quantization noise power across all or some of the subbands, without any explicit reference to psychoacoustic masking theory. It is commonly accepted that a direct psychoacoustic distortion threshold cannot be applied to predictive/differential subband signals because of the difficulty in estimating the predictor performance ahead of the bit allocation process. The problem is further compounded by the interaction of quantization noise with the prediction process.
These coders work because perceptually critical audio signals are generally periodic over long periods of time. This periodicity is exploited by predictive differential quantization. Splitting the signal into a small number of sub-bands reduces the audible effects of noise modulation and allows the exploitation of long-term spectral variances in audio signals. If the number of subbands is increased, the prediction gain within each sub-band is reduced and at some point the prediction gain will tend to zero.
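The idea behind predictive differential quantization can be sketched with a deliberately simplified coder. The Python example below uses a fixed first-order predictor (plain DPCM, not the adaptive ADPCM the coders above use) on a hypothetical slowly varying sine wave, and compares the range of quantizer codes needed for the prediction residual with the range plain PCM would need at the same step size. The function names, test signal, and step size are all assumptions made for illustration.

```python
import math


def dpcm_encode(samples, step):
    """First-order DPCM: quantize the difference from the previous reconstruction."""
    codes = []
    prediction = 0.0
    for x in samples:
        residual = x - prediction
        code = round(residual / step)  # uniform quantizer index
        codes.append(code)
        prediction += code * step      # track what the decoder will reconstruct
    return codes


def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


# Hypothetical test signal: a slow sine, so adjacent samples are strongly correlated.
samples = [math.sin(2 * math.pi * n / 64) for n in range(256)]
step = 0.01
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)

max_error = max(abs(a - b) for a, b in zip(samples, decoded))
largest_code = max(abs(c) for c in codes)  # range the residual quantizer must cover
largest_pcm_code = round(max(abs(s) for s in samples) / step)  # range plain PCM would need
```

Because the residual is much smaller than the signal itself, `largest_code` comes out well below `largest_pcm_code`, so fewer bits are needed for the same step size. That saving is the prediction gain, and it shrinks as the signal within a band becomes less predictable.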
Digital Theater Systems, L. P. (DTS) makes use of an audio coder in which each PCM audio channel is filtered into four subbands and each subband is encoded using a backward ADPCM encoder that adapts the predictor coefficients to the sub-band data. The bit allocation is fixed and the same for each channel, with the lower frequency subbands being assigned more bits than the higher frequency subbands. The bit allocation provides a fixed compression ratio, for example, 4:1. The DTS coder is described by Mike Smyth and Stephen Smyth, “APT-X100: A LOW-DELAY, LOW BIT-RATE, SUB-BAND ADPCM AUDIO CODER FOR BROADCASTING,” Proceedings of the 10th International AES Conference 1991, pp. 41-56.
Both types of audio coders have other common limitations. First, known audio coders encode/decode with a fixed frame size, i.e. the number of samples or period of time represented by a frame is fixed. As a result, as the encoded transmission rate increases relative to the sampling rate, the amount of data (bytes) in the frame also increases. Thus, the decoder buffer size must be designed to accommodate the worst case scenario to avoid data overflow. This increases the amount of RAM, which is a primary cost component of the decoder. Secondly, the known audio coders are not easily expandable to sampling frequencies greater than 48 kHz. To do so would make the existing decoders incompatible with the format required for the new encoders. This lack of future compatibility is a serious limitation. Furthermore, the known formats used to encode the PCM data require that the entire frame be read in by the decoder before playback can be initiated. This requires that the buffer size be limited to approximately 100 ms blocks of data such that the delay or latency does not annoy the listener.
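The buffer-size point above comes down to simple arithmetic: with a fixed number of samples per frame, the bytes per frame grow in proportion to the bit rate. The Python sketch below works this through for hypothetical settings (1024-sample frames at 48 kHz and three example bit rates, none of which come from the text).

```python
def frame_bytes(bit_rate_bps, samples_per_frame, sample_rate_hz):
    """Bytes of encoded data in one frame covering a fixed number of samples."""
    bits_per_frame = bit_rate_bps * samples_per_frame / sample_rate_hz
    return bits_per_frame / 8


# Hypothetical settings: 1024-sample frames at 48 kHz, at three bit rates.
rates = (384_000, 768_000, 1_536_000)
sizes = {rate: frame_bytes(rate, 1024, 48_000) for rate in rates}

# A decoder with a fixed frame size has to provision for the largest of these.
worst_case = max(sizes.values())
```

Quadrupling the bit rate quadruples the frame payload, so the decoder's input buffer has to be sized for the highest rate it may ever receive.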
In addition, although these coders have encoding capability up to 24 kHz, the higher subbands are often dropped. This reduces the high frequency fidelity or ambiance of the reconstructed signal. Known encoders typically employ one of two types of error detection schemes. The most common is Reed-Solomon coding, in which the encoder adds error detection bits to the side information in the data stream. This facilitates the detection and correction of any errors in the side information. However, errors in the audio data go undetected. Another approach is to check the frame and audio headers for invalid code states. For example, a particular 3-bit parameter may have only 3 valid states. If one of the other 5 states is identified then an error must have occurred. This only provides detection capability and does not detect errors in the audio data.
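The invalid-state check described above can be sketched in a few lines of Python. Which three of the eight 3-bit codes are legal is an assumption made here for illustration, as is the function name `header_field_is_valid`.

```python
VALID_STATES = {0b000, 0b001, 0b010}  # hypothetical: 3 of the 8 possible 3-bit codes are legal


def header_field_is_valid(field):
    """Return True if the 3-bit field holds one of the legal states."""
    if not 0 <= field <= 0b111:
        raise ValueError("field must fit in 3 bits")
    return field in VALID_STATES


# Classify every possible 3-bit value as valid or invalid.
report = {value: header_field_is_valid(value) for value in range(8)}
invalid_count = sum(1 for ok in report.values() if not ok)
```

A check of this kind can only flag a corrupted field. It cannot correct the error, and a bit error that turns one legal state into another legal state passes unnoticed.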
SUMMARY OF THE INVENTION
In view of the above problems, the present invention provides a multi-channel audio coder with the flexibility to accommodate a wide range of compression levels with better than CD quality at high bit rates and improved perceptual quality at low bit rates, with reduced playback latency, simplified error detection, improved pre-echo distortion, and future expandability to higher sampling rates.
This is accomplished with a subband coder that windows each audio channel into a sequence of audio frames, filters the frames into baseband and high frequency ranges, and decomposes each baseband signal into a plurality of subbands. The subband coder normally selects a non-perfect filter to decompose the baseband signal when the bit rate is low, but selects a perfect filter when the bit rate is sufficiently high. A high frequency c

Affiliated with

Smith William Paul

Inventor

Smyth Michael Henry

Inventor

Smyth Stephen Malcolm

Inventor

Also associated with

Digital Theater Systems, Inc.

Corporate Assignee

Opsasnick Michael N.

Examiner

Tsang Fan

Examiner
