Digital audio coding apparatus, method and computer readable...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Psychoacoustic

Reexamination Certificate

: 2001-05-29
: 2004-08-03
: McFadden, Susan (Department: 2658)
: Data processing: speech signal processing, linguistics, language
: Speech signal processing
: Psychoacoustic

: C704S229000, C704S500000
: Reexamination Certificate
: active
: 06772111
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a digital audio coding method, a digital audio coding apparatus and a recording medium. More particularly, the present invention relates to a compression and coding technique of a digital audio signal used for DVD, digital broadcast and the like.
2. Description of the Related Art
As is well known, human psychoacoustic characteristics are utilized in techniques for high quality compression and coding of a digital audio signal. One of these characteristics is that a small sound is masked by a large sound so that the small sound cannot be heard. That is, when a large sound occurs at a given frequency, a small sound near that frequency is masked so that it cannot be heard. The intensity below which a sound is masked and cannot be heard is called a masking threshold.
Irrespective of masking, the sensitivity of the human ear is highest for sound around 4 kHz. As the frequency moves farther from 4 kHz, the sensitivity becomes worse. This characteristic can be represented as the lower limit intensity which the human ear can perceive in a silent situation. This lower limit intensity is called an absolute hearing threshold.
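The shape of the absolute hearing threshold can be illustrated with a short sketch. It uses Terhardt's widely quoted approximation of the threshold in quiet, which is an assumption brought in for illustration and is not taken from this specification; the function name `absolute_threshold_db` is likewise hypothetical.

```python
import math

def absolute_threshold_db(freq_hz: float) -> float:
    """Approximate threshold in quiet, in dB SPL (Terhardt's formula, assumed here)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The threshold is lowest (the ear is most sensitive) in the region of a few kHz
# and rises toward both low and high frequencies.
for hz in (100, 1000, 4000, 16000):
    print(f"{hz:>6} Hz: {absolute_threshold_db(hz):6.1f} dB SPL")
```

Under this approximation the minimum falls between roughly 3 kHz and 4 kHz, which agrees with the behavior described above.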
The characteristics will be described more particularly with reference to FIG. 1. The intensity of the audio signal is represented by the thick solid line. The masking threshold for the audio signal is represented by the dotted line. The thin solid line represents the absolute hearing threshold. That is, the human ear can perceive a sound only when the intensity is larger than the values represented by both the dotted line and the thin solid line. Therefore, if only the information which is larger than the dotted line and the thin solid line is extracted from the information represented by the thick solid line, the human ear perceives the extracted information to be the same as the original audio signal.
When performing coding, this is equivalent to assigning coding bits only to parts indicated by shaded regions in FIG. 1. When assigning coding bits in this example, the whole frequency band of the audio signal is divided into a plurality of small bands so that coding bits are assigned to each divided band. The width of each shaded area corresponds to the divided bandwidth.
In each divided band, the human ear cannot perceive a sound of intensity equal to or smaller than the lower limit of the shaded area. Thus, if the intensity difference between the original sound and the coded/decoded sound does not exceed this lower limit, the difference cannot be heard. In this sense, the intensity of the lower limit is called an allowed distortion level. When an audio signal is compressed by performing quantization, the audio signal can be compressed without perceptible loss of quality by performing quantization such that the quantization distortion level of the coded/decoded sound with respect to the original sound becomes equal to or smaller than the allowed distortion level.
Accordingly, assigning coding bits only to the shaded regions shown in FIG. 1 corresponds to performing quantization such that the quantization distortion level in each divided band becomes just the allowed distortion level.
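The relation between the thresholds, the allowed distortion level and the bit assignment can be sketched as follows. This is a rough illustration only: it assumes one level in dB per divided band, takes the allowed distortion level as the larger of the masking threshold and the absolute hearing threshold, and uses the textbook rule of about 6.02 dB of signal to noise ratio per bit for a uniform quantizer. The function names and the sample values are hypothetical and do not come from this specification or from the AAC quantizer.

```python
import math

DB_PER_BIT = 6.02  # approximate SNR gain per bit of a uniform quantizer (assumption)

def allowed_distortion_db(masking_db, quiet_db):
    """Per band, the level below which distortion is assumed to be inaudible."""
    return [max(m, q) for m, q in zip(masking_db, quiet_db)]

def bits_per_band(signal_db, allowed_db):
    """Bits needed so that quantization noise stays at or below the allowed level."""
    return [max(0, math.ceil((s - a) / DB_PER_BIT))
            for s, a in zip(signal_db, allowed_db)]

# Made-up levels for three divided bands, in dB.
signal = [60.0, 40.0, 20.0]
masking = [45.0, 42.0, 10.0]
quiet = [5.0, 5.0, 25.0]

allowed = allowed_distortion_db(masking, quiet)
print(allowed)
print(bits_per_band(signal, allowed))
```

Bands whose signal level lies below the allowed distortion level receive no bits, which corresponds to the bands without a shaded region.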
Coding methods for an audio signal include MPEG Audio, Dolby Digital and the like. Each of the methods uses the property described above. Among these methods, MPEG-2 Audio AAC (Advanced Audio Coding) standardized in ISO/IEC13818-7 is regarded as being most efficient for coding.
FIG. 2 shows a basic block diagram of a coding apparatus for AAC. The psychoacoustic model part 1 calculates the allowed distortion level for each divided band of an input audio signal which is divided into frames along time base.
For the input audio signal which is divided into frames, a gain control part 2 performs gain control, a filter bank 3 converts the input audio signal to the frequency domain by MDCT (Modified Discrete Cosine Transform), a TNS 4 performs a temporal noise shaping process, an intensity/coupling stereo part 5 performs intensity/coupling, a prediction part 6 performs a predictive coding process, and an M/S stereo part 7 performs a middle side stereo process. After that, a part 8 determines normalized coefficients, and a quantization part 9 quantizes the audio signal based on the normalized coefficients. The normalized coefficients correspond to the allowed distortion level shown in FIG. 1, which is determined for each divided band.
After quantization, a noiseless coding part 10 performs a noiseless coding process by providing each of the normalized coefficients and the quantized values with a Huffman code based on a predetermined Huffman code table. Finally, a code bit stream is formed by a multiplexor 11.
According to the MDCT in the filter bank 3, as shown in FIG. 3, DCT is performed in which each transform region overlaps with another transform region by 50% with respect to time axis. Accordingly, occurrence of distortion in boundary parts can be suppressed for each transform region. The number of MDCT coefficients is half of the number of samples of the transform region. According to AAC, a long transform region (long block) including 2048 samples or eight short transform regions including 256 samples in each transform region (short block) is applied for an input audio signal frame. Thus, the number of MDCT coefficients is 1024 for the long block and 128 for the short block. As for the short block, eight blocks are always used successively so that the number of the MDCT coefficients becomes the same as that of the long block.
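The 50% overlap and the halving of the coefficient count can be sketched with a direct (non-fast) MDCT. The sketch assumes a sine window and NumPy, uses a small block size for the demonstration, and all function names are hypothetical; it is not the filter bank of the specification or of the standard.

```python
import numpy as np

def sine_window(n_coef):
    """Sine window of length 2*n_coef; w[n]^2 + w[n + n_coef]^2 = 1."""
    n = np.arange(2 * n_coef)
    return np.sin(np.pi * (n + 0.5) / (2 * n_coef))

def mdct_basis(n_coef):
    """Cosine basis of shape (n_coef, 2*n_coef)."""
    n = np.arange(2 * n_coef)
    k = np.arange(n_coef)
    return np.cos(np.pi / n_coef * np.outer(k + 0.5, n + 0.5 + n_coef / 2))

def mdct(block):
    """Transform 2*n_coef windowed samples into n_coef coefficients."""
    n_coef = len(block) // 2
    return mdct_basis(n_coef) @ (sine_window(n_coef) * block)

def imdct(coefs):
    """Inverse transform; the output still contains time-domain aliasing."""
    n_coef = len(coefs)
    return (2.0 / n_coef) * sine_window(n_coef) * (mdct_basis(n_coef).T @ coefs)

def analyse_and_resynthesise(signal, n_coef):
    """Transform blocks that overlap by 50% and overlap-add the inverses."""
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - 2 * n_coef + 1, n_coef):
        block = signal[start:start + 2 * n_coef]
        out[start:start + 2 * n_coef] += imdct(mdct(block))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
y = analyse_and_resynthesise(x, n_coef=16)
print(len(mdct(x[:32])))                 # half the number of input samples
print(np.allclose(x[16:48], y[16:48]))   # samples covered by two blocks are recovered
```

With a block of 2048 samples the same `mdct` returns 1024 coefficients, and with 256 samples it returns 128. The aliasing of one block is cancelled by its overlapping neighbors, which is why only samples covered by two blocks are compared.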
Generally, as shown in FIG. 4, the long block is used for a steady-state part where variation of a signal waveform is small. As shown in FIG. 5, the short block is used for an attack part where variation of a signal waveform is large.
It is important to use the long block or the short block appropriately. When the long block is used for a signal like that shown in FIG. 5, noise which is called pre-echo occurs before the attack. In addition, when the short block is used for a part like that shown in FIG. 4, bit assignment is not properly performed due to lack of resolution in the frequency domain, so that coding efficiency decreases and noise also occurs.
As mentioned above, it is important to calculate the allowed distortion level for each divided band and to select the long block or the short block properly. The psychoacoustic model part 1 shown in FIG. 2 performs these processes. ISO/IEC13818-7 shows examples of a method of calculating the allowed distortion level for each divided band and a method of selecting the long block or the short block for each current frame. In the following, an outline of these methods will be described. For details of these processes, refer to B.2.1.4 (p.93) of ISO/IEC13818-7.
Step 1) Reconstruction of Audio Signal
For the long block, 1024 samples (128 samples for the short block) are newly read, and a signal series of 2048 samples (256 samples) is reconstructed by concatenating the newly read samples and the samples already read from a previous frame.
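The reconstruction in step 1 can be sketched as a small buffer that remembers the previously read samples. The class name `FrameBuffer` is hypothetical, and starting from an all-zero history before the first frame is an assumption made only for this sketch.

```python
import numpy as np

class FrameBuffer:
    """Joins the previously read samples with the newly read samples."""

    def __init__(self, hop):
        self.hop = hop                   # 1024 for the long block, 128 for the short block
        self.previous = np.zeros(hop)    # assumed silent history before the first frame

    def push(self, new_samples):
        new_samples = np.asarray(new_samples, dtype=float)
        if len(new_samples) != self.hop:
            raise ValueError("expected exactly one hop of new samples")
        frame = np.concatenate([self.previous, new_samples])
        self.previous = new_samples      # kept for the next call
        return frame

buf = FrameBuffer(hop=1024)
first = buf.push(np.ones(1024))
second = buf.push(np.full(1024, 2.0))
print(len(second))   # length of the reconstructed signal series
```

Each returned series is twice the hop size, with the older samples in its first half.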
Step 2) Windowing by a Hann Window and FFT
The audio signal of 2048 samples (256 samples) reconstructed in step 1 is windowed by a Hann window and FFT (Fast Fourier Transform) is calculated so that 1024 (128) FFT coefficients are calculated.
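Step 2 can be sketched with NumPy as follows. `np.hanning` and `np.fft.rfft` are existing NumPy functions; the function name `windowed_spectrum` is hypothetical, and discarding the Nyquist bin so that exactly half as many coefficients as input samples remain is an assumption made to match the counts stated above.

```python
import numpy as np

def windowed_spectrum(frame):
    """Apply a Hann window and return the first len(frame)//2 FFT coefficients."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))         # symmetric Hann window
    spectrum = np.fft.rfft(window * frame)  # len(frame)//2 + 1 complex values
    return spectrum[: len(frame) // 2]      # drop the Nyquist bin (assumption)

# A pure tone placed exactly on bin 100 of a 2048-sample frame.
n = np.arange(2048)
tone = np.cos(2 * np.pi * 100 * n / 2048)
coefs = windowed_spectrum(tone)
print(len(coefs))
print(int(np.argmax(np.abs(coefs))))
```

The result is complex, so the real part and the imaginary part of each coefficient are available for the prediction in the next step.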
Step 3) Calculation of Predicted Values of FFT Coefficients
Real parts and imaginary parts of FFT coefficients of a current frame are predicted from real parts and imaginary parts of FFT coefficients of previous two frames so that 1024 (128) predicted values are calculated for each of the real part and imaginary part.
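One way to form such predicted values is sketched below. The text above states only that the prediction uses the previous two frames; the linear extrapolation of magnitude and phase used here is an assumption, chosen because it is the approach commonly described for this kind of psychoacoustic model. The function name `predict_coefficients` is hypothetical.

```python
import numpy as np

def predict_coefficients(prev1, prev2):
    """Predict the current FFT coefficients from the previous two frames.

    prev1 is the most recent frame and prev2 is the frame before it.
    Magnitude and phase are extrapolated linearly (assumed method).
    Returns the predicted real parts and imaginary parts.
    """
    prev1 = np.asarray(prev1, dtype=complex)
    prev2 = np.asarray(prev2, dtype=complex)
    magnitude = 2.0 * np.abs(prev1) - np.abs(prev2)
    phase = 2.0 * np.angle(prev1) - np.angle(prev2)
    return magnitude * np.cos(phase), magnitude * np.sin(phase)

# A coefficient of constant magnitude whose phase advances by 90 degrees per
# frame is predicted to advance by another 90 degrees.
real, imag = predict_coefficients(prev1=[1j], prev2=[1 + 0j])
print(np.round(real, 6), np.round(imag, 6))
```

For a stationary tone the prediction is close to the actual coefficient, while for a sudden change in the signal it is not.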
Step 4) Calculation of an Unpredictability Measure
The unpredictability measure is calculated from the real part and the imaginary part of each FFT coefficient calculated in step 2 and predicted values of the real part and the imaginary part of each FFT coefficient calculated in step 3. The unpredictability measure takes from 0 to 1. The nearer to 0 the unpredictability measure is, the ne

Affiliated with

Araki Tadashi

Inventor

Also associated with

McFadden Susan

Examiner

Oblon & Spivak, McClelland, Maier & Neustadt P.C.

Law Firm

Ricoh & Company, Ltd.

Corporate Assignee
