Sound modification employing spectral warping techniques

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C704S203000, C704S205000, C704S268000, C704S500000

Reexamination Certificate

active

06182042

ABSTRACT:

BACKGROUND OF THE INVENTION
In one embodiment, the present invention relates to a method and apparatus for modifying an audio signal employing table lookup to perform non-linear transformations of the Short Time Fourier Transform of the audio signal.
Reproduction and modification of audio signals has posed a significant challenge for many years. Early attempts to accurately reproduce audio signals had various drawbacks. For example, an early attempt at reproducing speech signals employed linear predictive (LP) modeling, described by J. Makhoul, “Linear Prediction: A Tutorial Review,”
Proc. IEEE,
vol. 63, pp. 561-580, April 1975. In this approach, the speech production process is modeled as a linear time-varying, all-pole vocal tract filter driven by an excitation signal representing characteristics of the glottal waveform. However, LPC is inherently constrained by the assumption that the vocal tract may be modeled as an all-pole filter. Deviations of an actual vocal tract from this ideal results in an excitation signal without the purely pulse-like or noisy structure assumed in the excitation model. This results in reproduced speech having noticeable and objectionable distortions.
Frequency-domain representations of audio signals, such as speech, overcome many of the drawbacks associated with linear predictive modeling. Frequency domain representation of audio signals is based upon the observations that much of the speech information is frequency related and that speech production is an inherently non-stationary process. As discussed in the article by J. L. Flanagen and R. M. Golden, “Phase Vocoder,”
Bell Sys. Tech. J.,
vol. 45, pp. 1493-1509, 1966, a short-time Fourier transform (STFT) formulation of an audio signal may be employed to parameterize speech production information in a manner very similar to LP modeling. This is commonly referred to as the digital phase vocoder (DPV) and is capable of performing speech modifications without the constraints of LPC. However, the DPV is computationally intensive, limiting its usefulness in real-time applications.
To reduce the computational intensity of the DPV, another approach employs the discrete short-time Fourier transform (DSTFT), implemented using a Fast Fourier Transform (FFT) algorithm. This enables modeling of an audio signal as a discrete signal x(n) that can be reconstructed from a sequence X (k,m) of its windowed Discrete Fourier Transforms (DFTs) by applying an inverse Discrete Fourier Transform to each DFT and then properly weighting and overlap-adding the sequence of inverse DFTs
x

(
n
)
=

m
=
-



W

(
mL
-
n
)


k
=
0
n
-
1

X

(
k
,
m
)


j

2

π
N

kn
(
1
)
where

X

(
k
,
m
)
=

n
=
-



x

(
n
)

W

(
mL
-
n
)


-
j

2

π
N

kn
(
2
)
and L is the spacing between successive DFTs. It is also well known that modified versions of x(n) can be obtained by applying the above reconstruction formula to a sequence of modified DFTs. Due to the success of the DSTFT in reducing the computational complexity, many prior art methods have been employed to modify the differing audio information contained therein. For example, M. R. Portnoff, in “Time-Scale Modification of Speech Based on Short-Time Fourier Analysis,” IEEE Trans. Acoustics, Speech, and Signal Proc., pp. 374-390, vol. ASSP-29, No. 3 (1981) describes a technique for reducing phase distortions which arise when employing the modified DSTFT.
U.S. Pat. No. 4,856,068 to Quatieri, Jr. et al. describes an audio pre-processing method and apparatus to achieve a flattened time-domain envelope to satisfy peak power constraints. Specifically, an audio signal, representing a speech waveform, is processed before transmission to reduce the peak-to-RMS ratio of the waveform. The system estimates and removes natural phase dispersion in the frequency component of the speech signal. Artificial dispersion based on pulse compression techniques is then introduced with little change in speech quality. The new phase dispersion allocation serves to pre-process the waveform prior to dynamic range compression and clipping. In this fashion, deeper thresholding may be accomplished than would otherwise be the case on the original speech waveform.
U.S. Pat. No. 4,885,790 to McAulay et al. describes an analysis/synthesis technique for processing an audio signal, such as a speech waveform which characterizes the speech waveform by the amplitudes, frequencies and phases of component sine waves. These parameters are estimated from a short-time Fourier transform, with rapid changes in highly-resolved spectral components being tracked using the concept of “birth” and “death” of the underlying sine waves. The component values are interpolated from one frame to the next to yield a representation that is applied to a sine wave generator. The resulting synthetic waveform preserves the general waveform shape.
There exists a need, however, for computationally efficient approaches for selectively modifying a subportion of information contained in a DSTFT representation of audio signals without substantially effecting the remaining audio information contained therein.
SUMMARY OF THE INVENTION
The present invention provides a system and method which increases the computational efficiency of modifying an audio signal while allowing selectively modifying a subportion of information of the same, such as magnitude information, without substantially effecting the remaining audio information contained therein, such as phase information. An incoming audio signal is segmented into a sequence of overlapping frames as discussed by Mark Dolson et al. in U.S. patent application Ser. No. 08/745,930, assigned to the assignee of the present application, and incorporated by reference herein. Specifically, the audio signal is converted from a time-domain signal to a frequency-domain signal by forming a sequence of overlapping windowed DFT representations, during an analysis step. Each of the DFT representations consists of a plurality of frequency components obtained during a period of time. The frequency components typically have a complex value that includes magnitude information and phase information of the audio signal. Each of the plurality of frequency components is associated with a unique frequency among a sequence of frequencies. The audio signal is converted back into a time-domain signal during a synthesis step that follows the analysis step. Subsequent to the analysis step, but before the synthesis step, the frequency components of the DFT representations are re-mapped so that magnitudes are applied to a different frequency.
In accordance with a first embodiment of the present invention, a method for modifying an audio signal includes the step of capturing a frequency domain representation of successive time segments of the audio signal, defining a plurality of frequency domain representations, each of which includes a plurality of frequency components stored in input bins. Each of the plurality of frequency components has a complex value associated therewith comprising a first magnitude and a first phase. Thereafter, at a modifying step, the frequency components are modified by using a bin number of the input bin associated with the frequency component to be modified as an index to a look-up table that provides a bin number of an alternate warping bin holding a second magnitude to be used to replace the first magnitude. The modification is achieved by normalizing the magnitude of the frequency component to be modified, defining a normalized value, and obtaining a magnitude of the complex value associated with the warping bin and multiplying this magnitude value by the normalized value. In this fashion, the magnitude information of the audio signal may be modified without affecting the phase information, employing a minimal number of steps, thereby increasing the computational efficiency of the process.
In other embodiments, an additional step may be included, before the

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Sound modification employing spectral warping techniques does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Sound modification employing spectral warping techniques, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Sound modification employing spectral warping techniques will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2553598

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.