Reexamination Certificate
2000-03-23
2003-12-02
Dorvil, Richemond (Department: 2697)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S205000, C704S211000, C704S267000, C704S268000, C704S269000, C704S500000
active
06658382
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to methods and apparatus for encoding an audio signal into a digital code with high efficiency and for decoding the digital code into the audio signal, which can be employed for recording and reproduction of audio signals and their transmission and broadcasting over a communication channel.
A conventional high-efficiency audio-coding scheme is a transform coding method such as that depicted in FIG. 1. With this method, an audio signal input as a sequence of signal samples is transformed into frequency-domain coefficients in a time-frequency transformation part 11 upon each input of a fixed number of samples, and the frequency-domain coefficients are then encoded: they are preprocessed in a preprocessing part 2 and quantized in a quantization part 3. A typical example of this scheme is TWINVQ (Transform-domain Weighted Interleave Vector Quantization).
The TWINVQ scheme uses weighted interleave vector quantization at the final stage of the quantization part 3. The vector quantization features two-stage flattening of coefficients in the preprocessing part 2, since the quantization efficiency increases as the distribution of input coefficient values becomes more even. In the first stage, the frequency-domain coefficients are normalized by the LPC spectrum to thereby roughly flatten their total variations. In the second stage, frequency-domain coefficients are further normalized for each of subbands having the same bandwidth on the Bark scale, by which they are flattened more finely than in the first stage. The Bark scale is a kind of frequency scale.
The Bark scale has a feature that frequencies at equally spaced points provide pitches of sound nearly equally spaced apart in terms of the human auditory sense. The subbands of the same bandwidth on the Bark scale are approximately equal in width perceptually, but on a linear scale their bandwidth increases with an increase in frequency as shown in FIG. 2. Accordingly, when the frequency-domain coefficients are split into subbands having similar bandwidth on the Bark scale, the higher the frequency of the subband, the more coefficients it contains.
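The growth in coefficient count per subband can be illustrated with a short sketch. It is not taken from the specification: the Zwicker-Terhardt approximation of the Bark scale, the function names, and the parameter values (1024 coefficients, 32 kHz sampling, 1-Bark subbands) are all assumptions chosen for illustration.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed; the text names no formula)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def coefficients_per_bark_band(num_coeffs, sample_rate_hz, band_width_bark=1.0):
    """Count how many uniformly spaced frequency-domain coefficients fall in each Bark subband."""
    nyquist = sample_rate_hz / 2.0
    counts = {}
    for k in range(num_coeffs):
        # Centre frequency of the k-th coefficient on a linear frequency grid.
        centre_hz = (k + 0.5) * nyquist / num_coeffs
        band = int(hz_to_bark(centre_hz) / band_width_bark)
        counts[band] = counts.get(band, 0) + 1
    # Return a list indexed by subband number, lowest frequency first.
    return [counts.get(b, 0) for b in range(max(counts) + 1)]

# Hypothetical frame: 1024 coefficients covering 0 to 16 kHz.
counts = coefficients_per_bark_band(num_coeffs=1024, sample_rate_hz=32000)
for band, n in enumerate(counts):
    print(f"subband {band:2d}: {n:3d} coefficients")
```

Under these assumptions the lowest subbands hold only a handful of coefficients each, while subbands of equal Bark width near the top of the range hold many times more.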
The second-stage flattening on the Bark scale is intended to effectively allocate a limited amount of information, taking the human auditory sense into account. The flattening operation by normalization for each subband on the Bark scale is based on the expectation that the coefficients in the subbands are steady, but since the subbands at higher frequencies contain more coefficients, the situation occasionally arises where the coefficients are not steady in the subbands as depicted in FIG. 2. This incurs impairment of the efficiency of vector quantization, leading to the degradation of sound quality of decoded audio signals. Such a problem is likely to occur especially when the input audio signal contains a lot of tone components in the high-frequency range.
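A toy example may make the problem concrete. The sketch below normalizes each subband by its RMS value, which is only an assumed stand-in for the second-stage flattening (the specification does not give the normalization formula here), and the coefficient values, band edges, and function names are hypothetical.

```python
import math

def normalize_subbands(coeffs, band_edges):
    """Divide each subband by its RMS value; return (flattened coefficients, per-band gains)."""
    flattened, gains = [], []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = coeffs[lo:hi]
        # Fall back to a gain of 1.0 for an all-zero band to avoid dividing by zero.
        rms = math.sqrt(sum(c * c for c in band) / len(band)) or 1.0
        gains.append(rms)
        flattened.extend(c / rms for c in band)
    return flattened, gains

def peak_to_rms(values):
    """Crest factor: 1.0 means perfectly even magnitudes, larger means less steady."""
    rms = math.sqrt(sum(v * v for v in values) / len(values))
    return max(abs(v) for v in values) / rms

# Hypothetical frame: a narrow low band with steady coefficients,
# and a wide high band containing a single tone component.
coeffs = [1.0, -1.0, 1.0, -1.0] + [0.1] * 15 + [4.0]
band_edges = [0, 4, 20]

flattened, gains = normalize_subbands(coeffs, band_edges)
narrow_crest = peak_to_rms(flattened[0:4])
wide_crest = peak_to_rms(flattened[4:20])
print(narrow_crest, wide_crest)
```

In this illustration the narrow band is perfectly flat after normalization, whereas the wide band keeps a pronounced peak, because one gain per subband cannot flatten a tone surrounded by weak coefficients.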
The TWINVQ scheme is described in detail in N. Iwakami, et al., “Transformed Domain Interleave Vector Quantization (TwinVQ),” preprint of the 101st Audio Engineering Society Convention, 4377, (1996).
In the audio-coding of FIG. 1, the quantization may also be scalar quantization using adaptive bit allocation. Such a coding method splits the frequency-domain coefficients into subbands and conducts optimum bit allocation for each subband. The subbands may sometimes be divided so that they have the same bandwidth on the Bark scale with a view to achieving a better match to the human auditory sense. In this instance, however, the coefficients in the subbands at the higher frequencies are often unsteady as is the case with the TWINVQ scheme, leading to impairment of the quantization efficiency.
As a solution to such a problem, there is proposed in Japanese Patent Application Laid-Open Gazette No. 7-336232 a coding method that transforms the input signal to a frequency-domain signal and adaptively changes, according to the shape of the spectral envelope, the bandwidth of each subband in which the frequency-domain coefficients are flattened (normalized). This method narrows the bandwidths of subbands containing tone components and widens the bandwidths of the other subbands, thereby reducing the number of subbands and hence increasing the coding efficiency accordingly. With this method, however, when tone components are sparse, narrow bandwidths are applied to flat portions near the tone components, sometimes impairing the coding efficiency. Further, normalization information needs to be encoded and sent for each component; therefore, if many tone components are scattered, the amount of normalization information to be encoded increases accordingly.
With a view to increasing the coding efficiency, there is proposed in Japanese Patent Application Laid-Open Gazette No. 7-168593 a scheme of encoding the tone components and the other components separately from each other. With this scheme, since the spectrum of each maximal value and adjoining spectra are normalized and encoded as a tone component signal of one group, information about the position of the spectrum of the maximal value and the group size needs to be encoded and sent. On this account, when many tone components are present, it is necessary to encode many pieces of information about the positions of the spectra of maximal values and the group sizes; this is likely to constitute an obstacle to increasing the coding efficiency.
Japanese Patent Application Laid-Open Gazette No. 7-248145 describes a scheme which separates pitch components formed by equally spaced tone components and encodes them individually. The position information of the pitch components is given by the fundamental frequency of the pitch, and hence the amount of information involved is small; however, in the case of a metallic sound or the like of a non-integral harmonic structure, the tone components cannot accurately be separated.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide a coding method which permits highly efficient transform coding of the input audio signal having many tone components in the high-frequency range, a decoding method for such a coded signal, apparatus using the coding and decoding methods, and recording media having recorded thereon the methods as computer-executable programs.
According to an aspect of the present invention, there is provided an audio signal coding method for coding input audio signal samples, the method comprising the steps of:
(a) time-frequency transforming every fixed number of input audio signal samples into frequency-domain coefficients;
(b) dividing said frequency-domain coefficients into coefficient segments each consisting of one or more coefficients to generate a sequence of coefficient segments;
(c) calculating the intensity of each coefficient segment in said sequence of coefficient segments;
(d) classifying the sequence of coefficient segments into either one of at least two groups according to the intensities of said coefficient segments to generate at least two sequences of coefficient segments, and encoding and outputting classification information as a classification information code; and
(e) encoding said at least two sequences of coefficient segments and outputting them as coefficient codes.
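Steps (b) through (d) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the intensity measure (mean absolute value), the threshold rule (a multiple of the mean intensity), the one-bit-per-segment classification flags, and all names are hypothetical. Step (a) is assumed to have produced `coeffs` already, and step (e), the actual encoding of each sequence, is left out.

```python
def classify_segments(coeffs, segment_size=1, threshold_ratio=1.0):
    """Split coefficients into segments and sort them into two groups by intensity.

    Returns (flags, strong, weak): one flag per segment (1 = high intensity),
    plus the two resulting sequences of coefficient segments in original order.
    Expects a non-empty list of coefficients.
    """
    # (b) divide the coefficients into segments of one or more coefficients
    segments = [coeffs[i:i + segment_size] for i in range(0, len(coeffs), segment_size)]
    # (c) intensity of each segment; mean absolute value is an assumed measure
    intensities = [sum(abs(c) for c in seg) / len(seg) for seg in segments]
    # (d) classify against an assumed threshold: a multiple of the mean intensity
    threshold = threshold_ratio * sum(intensities) / len(intensities)
    flags = [1 if v > threshold else 0 for v in intensities]
    strong = [seg for seg, f in zip(segments, flags) if f]
    weak = [seg for seg, f in zip(segments, flags) if not f]
    return flags, strong, weak

# Hypothetical frame of eight coefficients, segments of two coefficients each.
coeffs = [0.1, -0.2, 3.0, 2.5, 0.1, 0.0, -2.8, 0.2]
flags, strong, weak = classify_segments(coeffs, segment_size=2)
print(flags)   # [0, 1, 0, 1]
print(strong)  # [[3.0, 2.5], [-2.8, 0.2]]
print(weak)    # [[0.1, -0.2], [0.1, 0.0]]
```

In this sketch the `flags` list plays the role of the classification information, and each of the two returned sequences would be passed to its own encoder in step (e).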
According to another aspect of the present invention, there is provided a decoding method for decoding input digital codes into audio signal samples and outputting them, the method comprising the steps of:
(a) decoding said input digital codes into plural sequences of coefficient segments;
(b) decoding said input digital codes to obtain classification information of coefficient segments, combining said plural sequences of coefficient segments based on said classification information to reconstruct original frequency-domain coefficients formed by a single contiguous sequence of coefficient segments; and
(c) transforming said frequency-domain coefficients into the time domain and outputting the resulting audio signal samples as an audio signal.
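The recombination in step (b) of the decoding method can be sketched in the same spirit. The representation of the classification information as one flag per segment, and the function and variable names, are assumptions made for illustration; the decoding of the codes themselves and the inverse transform of step (c) are not shown.

```python
def merge_segments(flags, strong, weak):
    """Interleave two decoded sequences of coefficient segments back into one
    contiguous sequence of frequency-domain coefficients.

    flags[i] == 1 means the i-th original segment came from `strong`,
    flags[i] == 0 means it came from `weak` (an assumed flag convention).
    """
    strong_iter, weak_iter = iter(strong), iter(weak)
    coeffs = []
    for f in flags:
        # Take the next segment from whichever sequence the flag points to.
        coeffs.extend(next(strong_iter) if f else next(weak_iter))
    return coeffs

# Hypothetical decoded data: four segments of two coefficients each.
flags = [0, 1, 0, 1]
strong = [[3.0, 2.5], [-2.8, 0.2]]
weak = [[0.1, -0.2], [0.1, 0.0]]
restored = merge_segments(flags, strong, weak)
print(restored)  # [0.1, -0.2, 3.0, 2.5, 0.1, 0.0, -2.8, 0.2]
```

Because each sequence preserves the original order of its segments, the flags alone are enough to restore the positions; no per-segment position information is needed in this sketch.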
According to another aspect of the present inv
Chikira Kazuaki
Iwakami Naoki
Jin Akio
Mori Takeshi
Moriya Takehiro
Connolly Bove & Lodge & Hutz LLP
Dorvil Richemond
Nippon Telegraph and Telephone Corporation
Patel Kinari