Reexamination Certificate
1999-03-03
2001-07-24
Šmits, Tālivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S258000, C704S504000
active
06266643
ABSTRACT:
FIELD OF INVENTION
This invention relates to audio and speech processing and, more particularly, to speeding up an audio signal or speech without changing its pitch, while maintaining acceptable quality and minimizing processing time.
This invention demonstrates how a computer program that uses a fast Fourier transform (FFT) can accomplish the goal of pitch stabilization, i.e., speed up wave audio files (extension: wav) without changing the pitch.
BACKGROUND OF THE INVENTION
Speeding up audio or speech generally results in change of pitch and decreased quality. Previous inventions were complex in their methods to protect the integrity of the original information.
When the playback speed of audio increases, the pitch increases correspondingly. According to the Similarity Theorem, decreased time (increased playback rate) results in higher frequencies, which translate to higher pitch (Zonst 1995). This phenomenon is illustrated when a 33⅓ RPM record is played at 78 RPM. Not only is the resulting sound difficult to understand, but the speaker is also unidentifiable, sounding like a chipmunk.
An alternative method to achieve this goal is to remove data at a fixed sampling rate, whether the data is redundant (duplicate) or original. Other methods are more complex and consume more processing time, performing an inverse mathematical manipulation such as an inverse Fourier transform to recreate the shortened information. A variety of encoding methods are used for transmitting audio signals that are not easily manipulated for speeding up the original signal. Simpler approaches that just eliminate periods of silence do not produce a quality result.
In general, while these other inventions examine various aspects of the objective of this invention, they have not provided a satisfactory combination of simplicity and quality.
OBJECTS OF THE INVENTION
It is the principal object of this invention to create a fast and low-cost method to speed up an audio signal without changing pitch, while maintaining the integrity needed to understand the information.
Another objective for this invention is to operate with minimal processing requirements for the computer or other device that will be performing the required data manipulations.
Another objective for this invention is to provide sufficient final audio quality without the complications and extreme processing requirements of other methods.
SUMMARY OF THE INVENTION
The trigonometric Fourier series, f(t) in Eq.1, can express any physically realizable function to a desired degree of accuracy by the summation of sinusoids (sine and cosine waves) of various frequencies and a constant term. In Eq. 1, “n” counts the frequencies. The fundamental, one cycle in the waveform domain, is represented by n=1. Successive values of n represent the respective harmonics. For example, n=3 represents the third harmonic, which corresponds to three cycles of the sinusoid in the waveform domain. (Hsu 1984; Zonst 1995)
Fourier Analysis
Fourier Series
$$f(t) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos(n \omega_0 t) + b_n \sin(n \omega_0 t) \right) \qquad \text{Eq. 1}$$
where $\omega_0 = 2\pi/T = 2\pi f$
In Eq. 1 the limit of summation of the frequencies is infinity, an impossibility in a “real life” system.
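As a rough illustration of truncating Eq. 1 to a finite number of harmonics, the sketch below evaluates a partial sum in Python. The square-wave coefficients, the period `T`, and the function names are assumptions chosen for the example; they do not come from the patent.

```python
import math

def fourier_partial_sum(t, a0, a, b, omega0):
    """Evaluate Eq. 1 at time t, truncated to len(a) harmonics."""
    total = a0
    for n, (a_n, b_n) in enumerate(zip(a, b), start=1):
        total += a_n * math.cos(n * omega0 * t) + b_n * math.sin(n * omega0 * t)
    return total

def square_wave_coefficients(n_terms):
    """Hypothetical test signal: unit square wave, b_n = 4/(n*pi) for odd n."""
    a = [0.0] * n_terms
    b = [4.0 / (n * math.pi) if n % 2 == 1 else 0.0
         for n in range(1, n_terms + 1)]
    return a, b

T = 1.0                      # assumed period, in seconds
omega0 = 2.0 * math.pi / T   # fundamental angular frequency

# The exact square wave equals 1 at t = T/4; more harmonics get closer to it.
for n_terms in (1, 9, 99):
    a, b = square_wave_coefficients(n_terms)
    value = fourier_partial_sum(T / 4, 0.0, a, b, omega0)
    print(n_terms, round(value, 4))
```

With a single harmonic the sum overshoots the true value of 1, and the error shrinks as harmonics are added, which is the sense in which a finite sum reaches "a desired degree of accuracy."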
The traditional representation of a function is the time domain. Time is the independent variable, and amplitude is the dependent variable. The frequency domain is another way to represent the same function. Because of the Fourier series, any physically realizable function can be represented as a series of sinusoids. In the frequency domain, frequency (represented by “n” in Eq. 1) is the independent variable, and the corresponding amplitude (represented by “a_n” or “b_n” in Eq. 1) is the dependent variable. These amplitudes are also known as Fourier coefficients. (Zonst 1995) Most sound analysis, including this invention, is performed in the frequency domain. A Fourier transform is a mathematical device to convert between the time and frequency domains. The discrete Fourier transform, also known as the digital Fourier transform, or the DFT, is used to determine the Fourier coefficients for the data points of “digitized” data. Digitized data is a series of discrete data points, instead of a continuous curve of an infinite number of points. In “real-life” applications, discrete data and a finite number of frequencies must be used, because real-life situations must deal with finite quantities. (Bergland 1969)
Eq. 2 is an example of a DFT used to determine the Fourier coefficients for cosine. To find the coefficient of the cosine of frequency f, first multiply each discrete value of the function by the corresponding value of a unit cosine wave of that frequency, and sum the products. Then find the average value, the desired information, by dividing the summed value by the number of data points, N.
$$\mathrm{DFTCos}(f) = \frac{1}{N} \sum_{t=0}^{N-1} f(t) \cos(f t) \qquad \text{Eq. 2}$$
where
f=discrete frequency
N=number of discrete data points
t=discrete times
DFTCos(f)=amplitude of the cosine wave of frequency f
To find the sine values, replace cosine with sine in the above equation.
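The multiply, sum, and average procedure of Eq. 2 can be sketched in a few lines of Python. One assumption is made explicit here: the cosine argument is taken as 2πft/N, with f counting whole cycles across the N-point array, whereas Eq. 2 writes the argument simply as (f t). The function names and the test signal are illustrative only.

```python
import math

def dft_cos(samples, f):
    """Average of samples times a unit cosine of frequency f (Eq. 2).

    Assumption: f counts cycles over the whole array, so the angle
    at sample t is 2*pi*f*t/N.
    """
    n_points = len(samples)
    total = sum(x * math.cos(2.0 * math.pi * f * t / n_points)
                for t, x in enumerate(samples))
    return total / n_points

def dft_sin(samples, f):
    """Same procedure with sine in place of cosine."""
    n_points = len(samples)
    total = sum(x * math.sin(2.0 * math.pi * f * t / n_points)
                for t, x in enumerate(samples))
    return total / n_points

# Hypothetical test signal: a unit cosine with 3 cycles over 16 samples.
N = 16
signal = [math.cos(2.0 * math.pi * 3 * t / N) for t in range(N)]

print(dft_cos(signal, 3))  # cosine coefficient at the signal's own frequency
print(dft_sin(signal, 3))  # sine coefficient at the same frequency
print(dft_cos(signal, 2))  # cosine coefficient at a different frequency
```

For this signal the first value is 0.5 (the average of cos² over whole cycles), and the other two are zero to within rounding error, so only the frequency actually present in the data yields a nonzero coefficient.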
The problem with the DFT is its slow execution. An array of N points, where N = 2^n, requires N^2 complex operations to perform a DFT. A “complex operation” includes evaluating sine and cosine functions, multiplying by the data point, and adding these products to the sums of the other operations. However, an FFT requires only N×n operations. For example, for an array size of 1,024 points (n=10) representing under one tenth of a second of audio, a DFT would require 1,048,576 complex operations, while an FFT would require only 10,240 complex operations. The difference in execution time between an FFT and a DFT is further magnified when full-length audio is used. (Zonst 1995)
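The operation counts quoted above follow directly from the two formulas, N^2 for the DFT and N×n for the FFT. A minimal sketch, with hypothetical function names:

```python
def dft_operations(n_points):
    """Complex operations for a direct DFT of n_points: N^2."""
    return n_points ** 2

def fft_operations(n_points):
    """Complex operations for an FFT of n_points = 2^n: N * n."""
    n = n_points.bit_length() - 1
    if 2 ** n != n_points:
        raise ValueError("n_points must be a power of two")
    return n_points * n

for n_points in (8, 1024, 65536):
    print(n_points, dft_operations(n_points), fft_operations(n_points))
```

For 1,024 points this reproduces the 1,048,576 and 10,240 figures in the text, and the larger array size shows how quickly the gap widens.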
In addition, the FFT reduces round-off errors, meaning it is more accurate than the DFT. (Cochran et al. 1967)
According to the addition theorem, the Fourier transform of the sum of two functions is equal to the sum of the Fourier transforms of the two functions. (Zonst 1995)
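The addition theorem can be checked numerically. The sketch below uses a direct complex DFT with the same 1/N averaging as Eq. 2; the `dft` helper and the two sample arrays are assumptions made for the illustration.

```python
import cmath

def dft(samples):
    """Direct DFT with 1/N normalization (complex form, for illustration)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) / n
            for k in range(n)]

# Two hypothetical 4-point functions.
g = [1.0, 2.0, 3.0, 4.0]
h = [0.5, -1.0, 0.0, 2.0]

transform_of_sum = dft([p + q for p, q in zip(g, h)])
sum_of_transforms = [p + q for p, q in zip(dft(g), dft(h))]

# Largest difference between the two results; expected to be rounding error.
max_difference = max(abs(p - q)
                     for p, q in zip(transform_of_sum, sum_of_transforms))
print(max_difference)
```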
According to the shifting theorem, “ . . . if a time domain function is shifted in time, the amplitude of the frequency components will remain constant, but the phases of the components will shift linearly—proportional to both the frequency of the component and the amount of the time shift.” The shift at the Nyquist frequency will always be 180° (π radians) multiplied by the number of data points the time domain function was shifted. (Zonst 1995)
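A small numerical check of the shifting theorem is sketched below. It assumes a circular shift (samples that leave one end of the array re-enter at the other), which is the form of shift the DFT relation holds for exactly; the helper names and sample values are hypothetical.

```python
import cmath

def dft(samples):
    """Direct DFT with 1/N normalization (complex form, for illustration)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) / n
            for k in range(n)]

def circular_shift(samples, shift):
    """Delay the array by `shift` samples, wrapping around the end."""
    n = len(samples)
    return [samples[(t - shift) % n] for t in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.0]  # hypothetical 8-point data
shift = 3

original = dft(x)
shifted = dft(circular_shift(x, shift))

# Amplitudes of every component should be unchanged by the shift.
amplitude_change = max(abs(abs(p) - abs(q)) for p, q in zip(original, shifted))

# At the Nyquist frequency (index N/2) the phase moves by pi per sample,
# so the component is multiplied by (-1)^shift.
nyquist = len(x) // 2
nyquist_error = abs(shifted[nyquist] - (-1) ** shift * original[nyquist])

print(amplitude_change, nyquist_error)
```

Both printed values should be at rounding-error level: the amplitudes are preserved, and an odd shift flips the sign of the Nyquist component, which is a phase change of 180° per data point.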
“Stretching,” a method of expanding digitized data, is accomplished by placing zeros in between the data points in the time domain, thereby repeating the spectrum of the original function in the frequency domain, with the amplitudes (coefficients) of the frequency components halved. (Zonst 1995)
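The stretching property can also be verified on a small array. The sketch assumes the 1/N normalization used in Eq. 2 (the halving of the amplitudes depends on that normalization); the helper names and data values are illustrative.

```python
import cmath

def dft(samples):
    """Direct DFT with 1/N normalization (complex form, for illustration)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) / n
            for k in range(n)]

def stretch(samples):
    """Place a zero after every data point, doubling the array length."""
    stretched = []
    for x in samples:
        stretched.extend([x, 0.0])
    return stretched

x = [1.0, -2.0, 0.5, 3.0]  # hypothetical 4-point data
spectrum = dft(x)
stretched_spectrum = dft(stretch(x))

# The 8-point spectrum should be the 4-point spectrum repeated twice,
# with every amplitude halved.
mismatch = max(abs(stretched_spectrum[k] - spectrum[k % len(x)] / 2.0)
               for k in range(2 * len(x)))
print(mismatch)
```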
A DFT on an 8 point array would need 64 complex operations. However, if the 8 point array were split into two 4 point arrays, each 4 point DFT would need only 16 complex operations. Thus, the total number of operations for the two 4 point DFTs would be 32—half the number of the full 8 point DFT. This “divide and conquer” process is the key to the FFT. (Zonst 1995; Transnational College of LEX 1997)
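The saving from splitting can be tabulated for repeated halvings. This sketch only counts the N^2 cost of the smaller DFTs, as the paragraph above does; it deliberately ignores whatever work is needed to recombine the pieces, and the function name is hypothetical.

```python
def split_dft_operations(n_points, n_splits):
    """N^2-style operation count after halving the array n_splits times.

    Counts only the direct DFTs on the pieces; recombination cost is
    not included in this simplified tally.
    """
    pieces = 2 ** n_splits
    piece_size = n_points // pieces
    return pieces * piece_size ** 2

for n_splits in range(4):
    print(n_splits, split_dft_operations(8, n_splits))
```

For the 8 point array this gives 64 operations with no split and 32 after one split, matching the figures in the text, and each further halving cuts the count in half again.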
The theory behind the FFT algorithm rests on the addition, shifting, and stretching theorems. The proof begins with the following 8 point array:
|DATA ARRAY 0| = |D_0, D_1, D_2, D_3, D_4, D_5, D_6, D_7|   Eq. 3
The addition theorem allows Eq. 3 to be divided into two arrays without changing the transform:
|DATA ARRAY 1′| = |D_0, 0, D_2, 0, D_4, 0, D_6, 0|   Eq. 4
|DATA ARRAY 2′| = |0, D_1, 0, D_3, 0, D_5, 0, D_7|   Eq. 5
where
|DATA ARRAY 0|=|D
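The split of Eq. 3 into the zero-interleaved arrays of Eq. 4 and Eq. 5 can be checked numerically. In this sketch the eight data values and the `dft` helper are assumptions for illustration; the check is that the two arrays add back to the original element by element and that, per the addition theorem, their transforms add up to the transform of the original.

```python
import cmath

def dft(samples):
    """Direct DFT with 1/N normalization (complex form, for illustration)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)) / n
            for k in range(n)]

# Hypothetical values standing in for D_0 ... D_7 of Eq. 3.
data = [3.0, 1.0, -2.0, 0.5, 4.0, -1.0, 0.0, 2.0]

# Eq. 4: keep even-indexed points, zero the rest.
array1 = [d if i % 2 == 0 else 0.0 for i, d in enumerate(data)]
# Eq. 5: keep odd-indexed points, zero the rest.
array2 = [d if i % 2 == 1 else 0.0 for i, d in enumerate(data)]

recombined = [p + q for p, q in zip(array1, array2)]
summed_transforms = [p + q for p, q in zip(dft(array1), dft(array2))]
transform_error = max(abs(p - q)
                      for p, q in zip(dft(data), summed_transforms))

print(recombined == data, transform_error)
```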
Canfield Kenneth
deGraaf Bruce
deGraaf Kathyrn
Tom Hamill, National Patent Services
Šmits Tālivaldis Ivars