Coded data generation or conversion – Analog to or from digital conversion – Analog to digital conversion
Reexamination Certificate
1999-03-09
2001-07-24
Young, Brian (Department: 2819)
C381S083000
active
06266003
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to encoding and manipulation of digital signals. More particularly, although not exclusively, the present invention relates to time-scale and/or pitch modification of audio signals. However, the signal analysis and re-synthesis method described herein is not limited to audio signals. It is envisaged that the present invention may find application in the coding of other signals with the (wavelet-like) method described herein. One such application is image compression. Essentially, the present invention may be applied where one wishes to simultaneously analyze different regions of the frequency domain with differing temporal/spatial resolutions.
BACKGROUND TO THE INVENTION
There are a number of existing techniques for time-scale/pitch modification of audio signals which are known in the art. These can be broadly classified as follows.
(a) Time domain methods:
These techniques attempt to estimate the fundamental period of a musical signal by detecting periodic activity in the audio signal. By this process, an input signal is delayed and multiplied by the undelayed signal, and the product is then smoothed in a low pass filter to provide an approximate measure of the autocorrelation function. The autocorrelation function is then used to detect a nonperiodic signal or a weak periodic signal which might be hidden in the noise. Once the fundamental period of the musical signal is found, the process is repeated and the analyzed sections of the signal are overlapped. A significant disadvantage of these techniques is that most audio signals do not have a fundamental period. For example, polyphonic instruments, recordings with reverberation, and percussion sounds do not have an identifiable fundamental period. Further, when applying such methods, transients in the music are repeated. This leads to notes having multiple starts and ends. Another problem with this technique is that overlapping of the delayed sections of the music can produce an audio effect which is metallic, mechanical or echo-like in nature.
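The delay-multiply-smooth procedure described above can be sketched as follows. This is an illustration only, not the procedure of any particular prior-art system: the function names, the 100 Hz test tone, the lag search range, and the use of a plain average as the smoothing step are all assumptions made for the example.

```python
import math

def autocorrelation(signal, max_lag):
    """r[lag] = average of signal[i] * signal[i + lag]; the average acts as a crude low-pass smoother."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag + 1)
    ]

def estimate_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] at which the autocorrelation is largest."""
    r = autocorrelation(signal, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])

# Hypothetical test tone: a 100 Hz sine sampled at 8 kHz, i.e. a period of 80 samples.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(800)]
period = estimate_period(tone, min_lag=20, max_lag=120)
print(period, "samples per period")
```

For a clean periodic tone the peak is unambiguous. For polyphonic or percussive material the autocorrelation has no single dominant peak, which is the disadvantage noted above.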
(b) Sinusoidal analysis methods:
These techniques assume that the input signal is made up of pure sinusoids. The inherent disadvantage of such a method is therefore self-evident: real audio signals, which contain noise and transients, do not satisfy this assumption.
Sinusoidal analysis techniques use Short Time Fast Fourier Transforms (FFT) to estimate the frequency of the component sinusoids. The derived signal is then synthesized with a bank of tone generators to produce the desired output. Short Time Fourier Analysis captures information about the frequency content of a signal within a time interval, governed by the Window Function chosen. A significant disadvantage of such techniques is that a single time-domain window is applied to all the frequency content of the signal, so the signal analysis cannot correspond accurately to human perception of the signal content. Also, conventional sinusoidal analysis methods use a local maxima search of the magnitude spectrum to determine the frequency of the constituent sinusoids including consideration of relative phase changes between analysis frames. This technique ignores any side-band information located around each of the local maxima. The effect of this is to exclude any signal modulation occurring within a single analysis frame, resulting in a smearing of the sound and almost a complete loss of transients. An example of such a transient, in the audio context, is a guitar pluck.
(c) Phase vocoder methods:
This type of technique uses a Fast Fourier Transform as a large bank of filters and treats the output of each of the filters separately. The relative phase change between two consecutive analyses of the input is used to estimate the frequency of the signal content within each bin. A resulting frequency-domain signal is synthesized from this information, treating each bin as a separate signal. In contrast to sinusoidal analysis techniques, this method retains the spectral energy distribution of the original signal. However, it destroys the relative phase of any transient information. Therefore, the resulting sound is smeared and echo-like.
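The per-bin frequency estimate used by phase vocoders can be illustrated with a small sketch. It assumes a rectangular (unwindowed) frame, a direct single-bin DFT in place of a full FFT, and hypothetical values for the frame length, hop size, and test tone; none of these choices come from the text above.

```python
import cmath
import math

def bin_value(frame, k):
    """Single DFT bin k of a frame (direct sum; an FFT would return the same value)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))

def wrap_phase(phi):
    """Wrap an angle into the interval [-pi, pi)."""
    return phi - 2 * math.pi * math.floor((phi + math.pi) / (2 * math.pi))

def estimate_frequency(signal, start, frame_len, hop, k, sample_rate):
    """Estimate the frequency in bin k from the phase change between two frames `hop` samples apart."""
    a = bin_value(signal[start:start + frame_len], k)
    b = bin_value(signal[start + hop:start + hop + frame_len], k)
    expected = 2 * math.pi * k * hop / frame_len  # phase advance of the bin centre frequency
    deviation = wrap_phase(cmath.phase(b) - cmath.phase(a) - expected)
    return (k / frame_len + deviation / (2 * math.pi * hop)) * sample_rate

# Hypothetical test: a 1010 Hz tone at 8 kHz falls between bin centres (spacing 31.25 Hz).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1010 * i / sample_rate) for i in range(512)]
freq = estimate_frequency(tone, start=0, frame_len=256, hop=64, k=32, sample_rate=sample_rate)
print(round(freq, 1), "Hz")
```

The estimate works well for a steady tone because the phase advances uniformly from frame to frame. A transient has no such steady phase advance, which is consistent with the smearing described above.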
In view of the prior art techniques, it would therefore be desirable to analyze and process audio signals so that the resultant output retains the tonal characteristics of the original signal and is capable of accurately capturing transient sounds without smearing or introducing an echo-like character to the output signal.
Accordingly, it is an object of the present invention to provide a technique for processing audio signals which achieves the abovementioned aims and ameliorates at least some of the disadvantages inherent in the prior art or at least provides the public with a useful choice. Further, it is an object of the invention to provide a signal analysis and synthesis method which can also be applied to the coding of signals in general.
SUMMARY OF THE INVENTION
In one aspect the invention provides for a method of encoding and re-synthesizing a waveform, the method including:
sampling the waveform to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of samples;
multiplying each frame with a windowing, preferably raised cosine, function wherein the peak of the windowing function is centered substantially at a zero point of each frame;
applying a Fast Fourier Transform to each frame thereby producing a frequency-domain waveform;
convolving the resultant frequency domain data with a variable kernel function, whose specification varies with frequency;
locating local maxima and surrounding minima in the magnitude spectrum of each convolved frame, wherein each local maximum and its associated minima define one of a plurality of regions, each region corresponding to a frequency component of the signal; and
analyzing each of the regions in the frequency domain representation separately by summing the complex frequency components of bins falling within the defined region into a signal vector, wherein the variable kernel function can be usefully varied to achieve a differing tradeoff between frequency and temporal resolution across the frequency range of the signal.
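The last two steps of the method (locating regions and summing their bins into signal vectors) can be sketched as below. This is an illustrative reading, not the patented implementation: the function names and the toy spectrum are hypothetical, the convolution step is omitted, and a boundary bin shared by two adjacent regions is simply counted in both, since this section does not say how such a bin is assigned.

```python
def find_regions(magnitudes):
    """Return (lo, peak, hi) index triples: each local maximum with the
    nearest local minimum on either side (bounds inclusive)."""
    regions = []
    n = len(magnitudes)
    for k in range(1, n - 1):
        if magnitudes[k - 1] < magnitudes[k] >= magnitudes[k + 1]:
            lo = k
            while lo > 0 and magnitudes[lo - 1] < magnitudes[lo]:
                lo -= 1  # walk downhill to the left-hand minimum
            hi = k
            while hi < n - 1 and magnitudes[hi + 1] < magnitudes[hi]:
                hi += 1  # walk downhill to the right-hand minimum
            regions.append((lo, k, hi))
    return regions

def signal_vectors(spectrum):
    """Sum the complex bins of each region into one vector, keyed by the peak bin."""
    magnitudes = [abs(c) for c in spectrum]
    return [(peak, sum(spectrum[lo:hi + 1])) for lo, peak, hi in find_regions(magnitudes)]

# Hypothetical spectrum (already convolved) containing two components.
spectrum = [0.1, 0.5, 2 + 1j, 0.6, 0.2, 0.4j, 3j, 0.5j, 0.1]
for peak, vector in signal_vectors(spectrum):
    print("peak bin", peak, "vector", vector)
```

Because each vector is a sum of complex bins, the side-band content around a peak contributes to the result rather than being discarded.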
In a preferred embodiment, the waveform corresponds to a digitized audio frequency waveform wherein the kernel function may be varied to approximate the perceptual characteristics of the human ear.
In the case where the waveform corresponds to an audio signal, the location of each maximum corresponds to the perceived pitch of the associated frequency component.
The method may further include the step of manipulating the signal while represented as signal vectors.
Such manipulation may take the form of modifying pitch or time scale (in an audio signal) or further data reduction adapted for efficient signal storage and/or transmission.
In the case of modifying an audio signal, the frequency location and phase of analyzed signal vectors can be shifted as necessary to achieve a scaling of time and/or pitch.
Converting back to the sampled time domain representation of the signal may be achieved by accumulating into the frequency domain an equivalent signal whose components correspond to those signal vectors determined in the analysis of the original signal.
Preferably an Inverse Fast Fourier Transform may be applied so as to give a time domain signal that may be suitably windowed and accumulated to produce the decoded signal.
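The "windowed and accumulated" step is commonly realized as overlap-add. The sketch below assumes a raised-cosine window and a hop of half the frame length; neither value is stated in this section, and the function names are hypothetical. With those assumptions the overlapping windows sum to a constant, so a signal that was not modified is reconstructed without amplitude ripple.

```python
import math

def raised_cosine(n):
    """Periodic raised-cosine (Hann) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def overlap_add(frames, hop):
    """Accumulate equal-length frames into one output, each offset by `hop` samples."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out

# Four windowed frames of a constant signal, overlapped by half a frame (assumed hop).
frame_len, hop = 8, 4
window = raised_cosine(frame_len)
constant_frames = [[1.0 * w for w in window] for _ in range(4)]
output = overlap_add(constant_frames, hop)
print([round(v, 3) for v in output])
```

Only the first and last half-frames, which are covered by a single window, deviate from the constant level.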
Preferably the form of the convolution function is determined empirically by subjectively assessing the quality of the synthesized output.
Preferably the application of the kernel function to the frequency domain data is implemented as a single-pole low-pass filter operation on said data, the pole's location being varied with frequency.
Preferably, in the case of the analysis of audio signals, the pole may be specified by a control function s(f) of the form:
$$s(f) = 0.4 + 0.26 \arctan\bigl(4 \ln(0.1 f) - 18\bigr)$$
where f is the frequency in hertz (cycles per second).
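As a numerical illustration, the control function can be evaluated directly. The sketch reads the formula as s(f) = 0.4 + 0.26 arctan(4 ln(0.1 f) - 18); the function name and the sample frequencies are chosen here for the example only.

```python
import math

def pole_control(f_hz):
    """s(f) = 0.4 + 0.26 * arctan(4 * ln(0.1 * f) - 18), with f in hertz (f > 0)."""
    return 0.4 + 0.26 * math.atan(4.0 * math.log(0.1 * f_hz) - 18.0)

# The arctan argument is zero where ln(0.1 f) = 4.5, i.e. near 900 Hz, so s = 0.4 there.
midpoint_hz = 10.0 * math.exp(4.5)

for f in (50.0, 200.0, midpoint_hz, 4000.0, 16000.0):
    print(f"{f:9.1f} Hz  s = {pole_control(f):.3f}")
```

Under this reading s(f) rises monotonically with frequency and stays between 0.4 - 0.13π and 0.4 + 0.13π, so the pole moves smoothly from light smoothing at low frequencies to heavier smoothing at high frequencies.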
The frequency domain filter may be specified by the relation:
$$y_{\text{out}}(f) = [1 - s(f)]\, y_{\text{in}}(\ldots$$
Nguyen John
Sigma Audio Research Limited
Skadden, Arps Slate Meagher & Flom LLP
Young Brian