Reexamination Certificate
1999-09-16
2003-03-11
Knepper, David D. (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S205000, C704S218000, C707S793000, C707S793000
active
06532445
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an information processing apparatus and method, an information recording apparatus and method, a recording medium, and a distribution medium. More particularly, the present invention relates to an information processing apparatus and method for retrieving compressed and coded audio data on the basis of signal characteristics, an information recording apparatus and method, a recording medium, and a distribution medium.
2. Description of the Related Art
In recent years, with the advancement of low-bit-rate coding technology, it has become common to store audio data and image data in such a way that they are compressed and coded, and a method is required for efficiently retrieving desired data from a large amount of coded data.
FIG. 23 shows the functional construction of a conventional audio data retrieval apparatus. In a database 156 of this audio data retrieval apparatus, a text database for retrieval is recorded in advance in which compressed and coded audio data (hereinafter referred to as “coded audio data”), and attribute information (for example, title, author name, creation date, classification of the contents, etc.) of the audio data, which is made to correspond to the coded audio data, are written.
A retrieval condition input section 151 accepts an input of retrieval conditions (attribute information, and signal characteristics of a sample waveform) from a user, supplies the attribute information to an attribute retrieval section 152, and supplies the signal characteristics to a comparison and determination section 155.
The attribute retrieval section 152 retrieves data that matches the attribute information (for example, the author name) input from the retrieval condition input section 151 from the text database for retrieval which is stored in the database 156, extracts coded audio data corresponding thereto, and outputs it to a candidate selection section 153.
The candidate selection section 153 outputs the coded audio data input from the attribute retrieval section 152 in sequence to a decoding section 154. The decoding section 154 decodes the coded audio data input from the candidate selection section 153 and outputs it to the comparison and determination section 155.
The comparison and determination section 155 determines the degree of similarity between the audio data input from the decoding section 154 and the signal characteristics (for example, the waveform amplitude, etc.) of the sample waveform supplied from the retrieval condition input section 151. If the degree of similarity is equal to or higher than a predetermined threshold value, the audio data is output as a retrieval result. In order to determine the degree of similarity, for example, a method is available for computing a correlation coefficient between a characteristic of the sample waveform (a waveform amplitude, an amplitude average value, a power distribution, a frequency spectrum, etc.) and the same characteristic of the retrieved audio data.
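The retrieval flow described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `decode` and `extract_features` callables, and the default threshold of 0.9 are hypothetical and do not come from the conventional apparatus itself. The sketch shows that every candidate has to be fully decoded before its characteristics can be compared with those of the sample.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no shape information
    return cov / math.sqrt(var_x * var_y)


def retrieve(sample_features, candidates, decode, extract_features, threshold=0.9):
    """Return the names of candidates whose features match the sample.

    candidates       -- iterable of (name, coded_audio) pairs (hypothetical layout)
    decode           -- callable standing in for the decoding section
    extract_features -- callable returning e.g. an amplitude or power sequence
    threshold        -- assumed similarity threshold
    """
    hits = []
    for name, coded in candidates:
        decoded = decode(coded)  # each candidate is decoded in full
        features = extract_features(decoded)
        if correlation_coefficient(sample_features, features) >= threshold:
            hits.append(name)
    return hits
```

With identity functions for `decode` and `extract_features`, a candidate whose feature sequence has the same shape as the sample is returned, while a reversed sequence (correlation of -1) is rejected.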
Next, a description is given of a coding apparatus for creating coded audio data which is prerecorded in the database 156 of FIG. 23. Before that, a method for efficiently compressing and coding audio data is described. A method for efficiently compressing and coding audio data can be broadly classified into a band division coding method and a transform coding method. There is also a method in which both are combined.
The band division coding method is a method in which a discrete time waveform signal (for example, audio data) is divided into a plurality of frequency bands by a band division filter, such as a quadrature mirror filter (QMF), and the most appropriate coding is performed for each band. This is also called “subband coding”. The details of the quadrature mirror filter are described in, for example, P. L. Chu, “Quadrature mirror filter design for an arbitrary number of equal bandwidth channels”, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 203-218, February 1985.
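As a minimal sketch of band division, the following splits a signal into a low band and a high band and then recombines them. It assumes the two-tap Haar filter pair, which is the simplest quadrature mirror pair and is used here only for illustration; it is not the multi-channel design of the cited reference, and the function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # normalization that keeps the total energy unchanged


def qmf_analysis(x):
    """Split x into decimated low-band and high-band signals (Haar pair)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two bands into the full-rate signal."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * _S)
        out.append((l - h) * _S)
    return out
```

Each band holds half as many samples as the input, so the total number of samples is unchanged, and each band can then be quantized with a precision suited to it.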
The transform coding method is also called a “block coding method”, which is a method in which a discrete time waveform signal is divided into blocks in predetermined sampling units, the signal of each block (referred to also as a “frame”) is converted into a frequency spectrum, and this is then coded. Examples of types of methods for conversion into a frequency spectrum include the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the modified discrete cosine transform (MDCT). The modified discrete cosine transform is able to perform efficient conversion with small block distortion by causing the conversion sections of adjacent blocks on the time axis to be superposed on each other. The details thereof are described in, for example, “Analysis/Synthesis Filter Bank Design Based on Time Domain Aliasing Cancellation”: J. P. Princen, A. B. Bradley, IEEE Transactions, ASSP-34, No. 5, Oct. 1986, pp. 1153-1161, and “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation”: J. P. Princen, A. W. Johnson and A. B. Bradley (ICASSP 1987).
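The superposition of adjacent blocks can be illustrated with a small MDCT round trip. This is a sketch under assumptions, not the coder of any particular apparatus: it uses a sine window, a 50% overlap, and a direct (slow) evaluation of the transform, and all function names are hypothetical. Each frame of 2N samples yields only N coefficients, and the aliasing introduced in each frame cancels when neighbouring frames are overlapped and added.

```python
import math


def sine_window(n_half):
    """Sine window of length 2*n_half; satisfies w[n]^2 + w[n+N]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def _basis(n, k, n_half):
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2.0) * (k + 0.5))


def mdct(frame, window):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    n_half = len(frame) // 2
    return [
        sum(window[n] * frame[n] * _basis(n, k, n_half) for n in range(2 * n_half))
        for k in range(n_half)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N aliased samples."""
    n_half = len(coeffs)
    scale = 2.0 / n_half  # normalization used with a power-complementary window
    return [
        window[n] * scale * sum(coeffs[k] * _basis(n, k, n_half) for k in range(n_half))
        for n in range(2 * n_half)
    ]


def analyze_synthesize(signal, n_half):
    """Transform overlapping frames and rebuild the signal by overlap-add."""
    if n_half % 2 or len(signal) % n_half:
        raise ValueError("n_half must be even and divide the signal length")
    window = sine_window(n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = padded[start:start + 2 * n_half]
        rebuilt = imdct(mdct(frame, window), window)
        for i, value in enumerate(rebuilt):
            out[start + i] += value  # aliasing terms of adjacent frames cancel
    return out[n_half:n_half + len(signal)]
```

In a real coder the coefficients would be quantized between `mdct` and `imdct`; the round trip here omits quantization so that the cancellation of the aliasing can be checked directly.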
In the band division coding method, a signal which is divided for each frequency band is coded after being quantized, whereas in the transform coding method, a signal which is converted into a frequency spectrum is coded after being quantized, thereby making it possible to limit a band in which quantization noise occurs by using auditory properties, such as what is commonly called the “masking effect”. Also, before this quantization, by normalizing each signal, efficient coding can be performed.
For example, when quantization is to be performed in the band division coding method, it is preferable that, by considering the auditory characteristics of a human being, the signal be divided into bands whose widths follow what is called the “critical band”, such that the higher the frequency region, the wider the band width.
The signal which is divided into frequency bands is coded after bits are allocated to each band (bit allocation). For example, if bit allocation is performed dynamically on the basis of the absolute value of the amplitude of the signal for each band, the quantization noise spectrum becomes flat, and the noise energy becomes minimal. This method is described in, for example, “Adaptive Transform Coding of Speech Signals”: R. Zelinski and P. Noll, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No. 4, Aug. 1977. However, in this method, a masking effect is not used, resulting in a problem in that this method is not the most appropriate from an auditory point of view.
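One way to see why amplitude-based dynamic allocation flattens the noise spectrum is the greedy sketch below. It is an illustration under assumptions rather than the procedure of the cited reference: it assumes that each additional bit reduces the quantization noise power of a band by a factor of 4 (about 6 dB), and that a band with no bits has a noise power equal to its signal power. The function name is hypothetical.

```python
def allocate_bits(band_amplitudes, total_bits):
    """Greedily hand out bits, one at a time, to the currently noisiest band."""
    # With zero bits the whole band is lost, so the error power is taken
    # to be the signal power (an assumption of this sketch).
    noise = [a * a for a in band_amplitudes]
    bits = [0] * len(band_amplitudes)
    for _ in range(total_bits):
        k = max(range(len(noise)), key=noise.__getitem__)
        bits[k] += 1
        noise[k] /= 4.0  # one more bit: about 6 dB less quantization noise
    return bits
```

For band amplitudes of 8, 2, 1 and 0.5 and a budget of 7 bits, the bands receive 4, 2, 1 and 0 bits, which leaves the same noise power (0.25) in every band. Because the allocation ignores masking, the flat noise floor is not necessarily the least audible one.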
Also, for example, if fixed bit allocation is performed so that a satisfactory S/N is obtained for each band, a masking effect is obtained from an auditory point of view. However, for example, when characteristics of a sine wave are to be measured, there is a problem in that since bit allocation is fixed, a satisfactory characteristic value cannot be obtained. This method is described in “The critical band coder - digital encoding of the perceptual requirements of the auditory system”: M. A. Krasner, MIT, (ICASSP 1980).
In order to solve these problems, there is also a method in which all bits which can be used for bit allocation are divided into a dynamic allocation portion and a fixed allocation portion, and the division ratio is made to depend on the input signal so that the smoother the spectrum distribution of the input signal, the larger the ratio of the fixed allocation portion.
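The signal-dependent division ratio can be sketched as follows. The text does not say how smoothness is measured, so this sketch assumes the spectral flatness measure (the ratio of the geometric mean to the arithmetic mean of the band powers, which is 1 for a perfectly flat spectrum and approaches 0 for a peaky one) and assumes a simple proportional mapping from that measure to the fixed share. Both choices and the function names are hypothetical.

```python
import math


def spectral_flatness(band_powers):
    """Geometric mean divided by arithmetic mean of strictly positive powers."""
    if not band_powers or any(p <= 0 for p in band_powers):
        raise ValueError("band powers must be non-empty and strictly positive")
    n = len(band_powers)
    geometric = math.exp(sum(math.log(p) for p in band_powers) / n)
    arithmetic = sum(band_powers) / n
    return geometric / arithmetic


def split_bit_budget(band_powers, total_bits):
    """Return (fixed_bits, dynamic_bits); smoother spectra get more fixed bits."""
    ratio = spectral_flatness(band_powers)  # assumed smoothness measure in (0, 1]
    fixed_bits = int(round(total_bits * ratio))
    return fixed_bits, total_bits - fixed_bits
```

A flat spectrum sends the whole budget to the fixed portion, while a spectrum dominated by a single band (for example, a sine wave) sends almost all of it to the dynamic portion, which is the behaviour described in the paragraph above.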
In the quantization and coding of an audio signal, when the waveform contains a point of sudden change in amplitude (hereinafter referred to as an “attack”), at which the amplitude increases or decreases suddenly in a part of the audio waveform, the quantization error increases at the attack. Also, in a signal coded by a transform coding method, the quantization error of the spectrum coefficients at an attack is spread over the entire block in the time domain during inverse spectrum transform (during decoding). As a result of this influence, noise which is commonly called a “pre-ec
Akagiri Kenzo
Toguri Yasuhiro
Azad Abul K.
Knepper David D.
Sonnenschein Nath & Rosenthal