Reexamination Certificate
1999-05-06
2001-10-02
Šmits, Tālivaldis Ivars (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S200100, C704S209000, C704S220000
active
06298322
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to encoding and synthesizing tonal audio signals, especially voiced speech and music signals.
BACKGROUND OF THE INVENTION
Tonal sounds can be effectively modeled as a sum of sinusoids with time-varying parameters consisting of frequency, amplitude, and phase. The key word here is “effectively” because, in fact, all sounds can be modeled as sums of sinusoids, but the number of sinusoids may be extremely large, and the time-varying sinusoidal parameters may not have intuitive significance. Colored noise signals like breath noise, ocean waves, and snare drums are examples of sounds that are not effectively modeled by sums of sinusoids. Pitched musical instruments such as clarinet, trumpet, gongs, and certain cymbals, as well as ensembles of these instruments are examples of tonal sounds that are effectively modeled as sums of sinusoids.
Many sounds are modeled as a combination of tonal and non-tonal, or colored noise, sounds. Flute and violin both have tonal and colored noise components. Human speech is often modeled as a mixture of tonal or “voiced” speech, and colored noise or “unvoiced” speech. The present invention is concerned with encoding and synthesizing tonal audio signals. This invention can be used in conjunction with systems for encoding and synthesizing non-tonal or colored noise signals.
Pitched signals are a special class of tonal audio signals in which the sinusoidal frequencies are harmonically related. The present invention can be used for encoding and synthesizing both pitched and unpitched tonal audio signals. Specifically optimized embodiments are proposed for encoding and synthesizing pitched tonal audio signals.
In this specification we use the term “tonal audio signal” to refer to all audio signals that can be effectively modeled as a sum of sinusoids with time-varying parameters consisting of frequency, amplitude, and phase. These are all signals that are not noise-like in character. We use the term “pitched tonal audio signal” or simply “pitched signal” to refer to tonal audio signals whose sinusoidal frequencies are harmonically related. The term “voiced signal” is a common term of art that refers to the pitched tonal audio signal component of a speech signal. The term “unvoiced signal” is a term of art that refers to the noise-like component of a speech signal. This is the non-tonal part of the signal that cannot be effectively modeled as a sum of sinusoids with time-varying parameters consisting of frequency, amplitude, and phase.
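The definitions above can be written compactly. The following is a sketch of the usual sum-of-sinusoids notation; the symbols K, A_k, f_k, \phi_k, and f_0 are assumptions introduced here for illustration and do not appear in the specification text.

```latex
% Tonal audio signal: K sinusoids with time-varying amplitude and frequency,
% each with its own initial phase \phi_k.
s(t) = \sum_{k=1}^{K} A_k(t)\,\cos\!\bigl(\theta_k(t)\bigr),
\qquad
\theta_k(t) = \phi_k + 2\pi \int_{0}^{t} f_k(\tau)\,d\tau

% Pitched tonal audio signal: the frequencies are harmonically related,
% i.e. integer multiples of a common fundamental f_0(t).
f_k(t) = k\,f_0(t), \qquad k = 1, \dots, K
```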
One method of encoding and synthesizing tonal audio signals is additive sinusoidal encoding and synthesis. This method provides excellent results since the encoding and synthesis model is the same model as the signal: a sum of sinusoids with time-varying parameters. U.S. Pat. Nos. 4,885,790 and 4,937,873, both to McAulay et al., and U.S. Pat. No. 4,856,068, to Quatieri, Jr. et al., teach systems for encoding and synthesizing sound waveforms as sums of sinusoids with time-varying amplitude, frequency, and phase. While sinusoidal encoding and synthesis provides excellent results for tonal audio signals, the synthesis requires large computational resources because many tonal audio signals may involve one hundred or more individual sinusoids.
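To make the computational cost concrete, the following is a minimal sketch of time-domain additive synthesis. It is not taken from any of the cited patents: the function name `additive_synth`, the 110 Hz test tone, and the 1/k amplitudes are hypothetical, and the parameters are held constant over the block for simplicity, whereas a real system would update them over time.

```python
import math

def additive_synth(partials, num_samples, sample_rate):
    """Sum sinusoids whose parameters are held fixed over the block.

    partials: list of (frequency_hz, amplitude, initial_phase_rad) tuples.
    """
    out = [0.0] * num_samples
    for freq, amp, phase in partials:
        step = 2.0 * math.pi * freq / sample_rate  # phase increment per sample
        for n in range(num_samples):
            out[n] += amp * math.cos(phase + step * n)
    return out

# Hypothetical pitched tone: 100 harmonics of 110 Hz with 1/k amplitudes.
SAMPLE_RATE = 44100
partials = [(110.0 * k, 1.0 / k, 0.0) for k in range(1, 101)]
signal = additive_synth(partials, 441, SAMPLE_RATE)

# One cosine evaluation per partial per output sample.
oscillator_updates = len(partials) * len(signal)
```

With one hundred partials, every output sample costs one hundred oscillator evaluations, which is the expense the paragraph above describes.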
To reduce the computational requirement of sinusoidal synthesis, U.S. Pat. Nos. 5,401,897 to Depalle et al., 5,686,683 to Freed, and 5,327,518 teach systems for sinusoidal synthesis using Inverse Fast Fourier Transform (IFFT) techniques. While this approach somewhat reduces the computation requirements for synthesis of a large number of parameters, the computation is still expensive and new problems are introduced. Many synthesis environments, for example musical synthesizers, require multi-channel output. Using IFFT approaches, a separate IFFT system must be used for every channel. In addition, IFFT systems limit sinusoidal parameter updates to once per frame, where the frame length must be at least as long as the period of the lowest frequency. This parameter update rate may be insufficient at higher frequencies.
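The frame-length constraint can be illustrated with a little arithmetic. This is a sketch under assumed values: the 50 Hz lowest frequency, the 5000 Hz partial, and the 44100 Hz sample rate are hypothetical, as are the function names.

```python
import math

def min_frame_samples(lowest_freq_hz, sample_rate):
    """Smallest frame (in samples) that spans one period of the lowest frequency."""
    return math.ceil(sample_rate / lowest_freq_hz)

def max_update_rate_hz(lowest_freq_hz, sample_rate):
    """Parameter updates per second when updating once per minimum-length frame."""
    return sample_rate / min_frame_samples(lowest_freq_hz, sample_rate)

def periods_between_updates(partial_freq_hz, lowest_freq_hz, sample_rate):
    """How many periods of a partial elapse between two parameter updates."""
    frame_seconds = min_frame_samples(lowest_freq_hz, sample_rate) / sample_rate
    return partial_freq_hz * frame_seconds

# Hypothetical case: lowest component at 50 Hz, sampled at 44100 Hz.
frame = min_frame_samples(50.0, 44100)
rate = max_update_rate_hz(50.0, 44100)
cycles = periods_between_updates(5000.0, 50.0, 44100)
```

Under these assumptions a 5000 Hz partial runs through roughly one hundred periods between parameter updates, which suggests why a once-per-frame update may be too coarse at higher frequencies.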
U.S. Pat. Nos. 5,581,656, 5,195,166, and 5,226,108, all to Hardwick et al., teach a system where a certain number of sinusoids, the dominant or low-frequency sinusoids, are synthesized using traditional time-domain sinusoidal additive synthesis, while the remaining sinusoids are synthesized using an IFFT approach. This permits a higher update rate for the dominant sinusoid components while taking advantage of the lower IFFT computation rate for the bulk of the sinusoids. This approach retains the disadvantages of IFFT computation cost, especially with multi-channel synthesis. In addition, the dominant sinusoid components are usually at lower frequencies, and it is the higher frequencies that often require an increased parameter update rate.
A number of less compute-intensive systems have been proposed for encoding and synthesizing tonal audio signals. Linear Predictive Coding (LPC) is well known in the art of speech coding and synthesis. Methods for using LPC for synthesizing tonal or voiced speech concentrate on methods for generating the tonal excitation signal. The numerous approaches include generating a pulse train at the desired pitch, generating a multi-pulse excitation signal at the desired pitch, vector quantizing (VQ) the excitation signal, and simply transmitting the excitation signal with fewer bits. U.S. Pat. No. 5,744,742, to Lindemann et al., teaches a system for encoding excitation signals as single pitch period loops. To synthesize excitation signals at different pitches or amplitudes, weighted sums of pitch period excitation signal loops are created. The excitation signal pitch periods are stored in single pitch period waveform memory tables. The phase response of all excitation signal waveforms is forced to be the same so that weighted sums of the waveforms do not cause phase cancellation. All of these techniques, with the exception of simply transmitting the excitation signal, give poorer results than full additive sinusoidal encoding and synthesis. The pulse-based techniques in particular sound “buzzy” and unnatural.
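The simplest of the excitation approaches listed above, a pulse train at the desired pitch driving an LPC synthesis filter, can be sketched as follows. This is a generic textbook illustration, not the method of any cited patent; the two-pole resonator coefficients, the 8000 Hz sample rate, and the function names are assumptions chosen only to give a stable filter.

```python
import math

def pulse_train(period_samples, num_samples):
    """Unit impulses spaced one pitch period apart."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(num_samples)]

def all_pole_filter(excitation, coeffs):
    """LPC synthesis filter: y[n] = x[n] + sum_k coeffs[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out

# Hypothetical setup: 100 Hz pitch at 8000 Hz, one resonance near 500 Hz.
SAMPLE_RATE = 8000
excitation = pulse_train(period_samples=80, num_samples=800)
r, theta = 0.9, 2.0 * math.pi * 500.0 / SAMPLE_RATE
voiced = all_pole_filter(excitation, [2.0 * r * math.cos(theta), -r * r])
```

Because every pitch period is excited by an identical impulse, the output has a rigidly periodic character, which is consistent with the "buzzy" quality attributed to pulse-based techniques.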
U.S. Pat. Nos. 5,369,730 to Yajima, 5,479,564 to Vogten et al., European Patent 813,184 A1 to Dutoit et al., and European Patents 0,363,233A1 and 0,363,233B1, both to Hamon, teach methods of pitch synchronous concatenated waveform encoding and synthesis. With this method a number of single pitch period waveforms are stored in memory. To synthesize a time-varying signal, a sequence of single pitch period waveforms is selected from waveform memory and concatenated over time. The waveforms are usually overlap-added for continuity. To shift the pitch of the synthesized signal, the overlap rate is modulated. While relatively inexpensive in terms of compute resources, this approach suffers from distortions, especially those associated with the pitch shifting mechanism. It is audibly inferior to full additive synthesis for most tonal audio signals.
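The overlap-add mechanism described above can be sketched generically. The code below is an illustration only and does not reproduce any cited patent: the two-period Hann-windowed grains, the 100-sample pitch period, and the function names are assumptions.

```python
import math

def hann(length):
    """Periodic Hann window; copies offset by length/2 sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def overlap_add(grains, hop):
    """Place each grain `hop` samples after the previous one and sum overlaps."""
    if not grains:
        return []
    total = max(i * hop + len(g) for i, g in enumerate(grains))
    out = [0.0] * total
    for i, g in enumerate(grains):
        start = i * hop
        for n, v in enumerate(g):
            out[start + n] += v
    return out

# Hypothetical stored waveform: one 100-sample pitch period of a sine.
PERIOD = 100
period_wave = [math.sin(2.0 * math.pi * n / PERIOD) for n in range(PERIOD)]
window = hann(2 * PERIOD)
grain = [window[n] * period_wave[n % PERIOD] for n in range(2 * PERIOD)]

same_pitch = overlap_add([grain] * 5, hop=PERIOD)    # grains spaced one period apart
higher_pitch = overlap_add([grain] * 5, hop=80)      # closer spacing, shorter period
```

With a hop equal to the stored period, the overlapping windows sum to one and the original waveform is reproduced in the steady-state region. Changing the hop changes the spacing of the grains and therefore the perceived pitch, while each grain keeps its original shape, which is one source of the distortions mentioned above.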
In the music synthesizer field, an approach similar to concatenated waveform synthesis is referred to as waveform sequencing. With waveform sequencing each single pitch period waveform is pitch shifted using sample rate conversion techniques and looped for a specified time to generate a stable magnitude spectrum. To generate time-varying magnitude spectra the waveforms are generally cross-faded over time. U.S. Pat. Nos. 3,816,664, to Koch, 4,348,929, to Gallitzendorfer, 4,461,199 and Reissue 34,913, to Hiyoshi et al., and U.S. Pat. No. 4,611,522 to Hideo teach systems of waveform sequencing relative to music synthesis. Waveform sequencing can be economical in computation resources, but much of the complex time-varying character of the magnitude spectra is lost due to reduction to a limited number of waveforms.
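Looping and cross-fading single pitch period waveforms can be sketched as follows. This is a generic wavetable illustration under stated assumptions, not the scheme of any cited patent: linear interpolation stands in for sample rate conversion, the cross-fade is a simple linear ramp, and all names and table contents are hypothetical.

```python
import math

def read_table(table, phase):
    """Linearly interpolated lookup; phase in [0, 1) spans one pitch period."""
    pos = phase * len(table)
    i = int(pos) % len(table)
    frac = pos - int(pos)
    return (1.0 - frac) * table[i] + frac * table[(i + 1) % len(table)]

def crossfade_loops(table_a, table_b, freq_hz, num_samples, sample_rate):
    """Loop two single-period tables at freq_hz while fading linearly from A to B."""
    out = []
    phase = 0.0
    for n in range(num_samples):
        mix = n / (num_samples - 1) if num_samples > 1 else 0.0
        a = read_table(table_a, phase)
        b = read_table(table_b, phase)
        out.append((1.0 - mix) * a + mix * b)
        phase = (phase + freq_hz / sample_rate) % 1.0  # wrap to loop the period
    return out

# Hypothetical tables: a pure sine, and a sine with an added third harmonic.
SIZE = 64
dull = [math.sin(2.0 * math.pi * n / SIZE) for n in range(SIZE)]
bright = [math.sin(2.0 * math.pi * n / SIZE)
          + 0.5 * math.sin(2.0 * math.pi * 3 * n / SIZE) for n in range(SIZE)]
tone = crossfade_loops(dull, bright, freq_hz=220.0, num_samples=4410, sample_rate=44100)
```

The output spectrum can only move along a straight line between the two stored spectra, which illustrates how reducing a sound to a limited number of waveforms loses detail in the time-varying magnitude spectrum.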
A number of hybrid systems have been proposed that use additive sinusoidal encoding and synthesis for one part of a signal (usually the tonal part) and some other technique for another part of the signal (usually the colored noise part). U.S. Pat. No. 5,029,509 to Serra et al. teaches a system for full
Lindemann Eric
Šmits Tālivaldis Ivars