Extracting formant-based source-filter data for coding and...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission


Details

Patent number: 06195632
Type: Reexamination Certificate
Status: active
Classification: C704S220000, C704S261000


BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to speech and waveform synthesis. The invention further relates to the extraction of formant-based source-filter data from complex waveforms. The technology of the invention may be used to construct text-to-speech and music synthesizers and speech coding systems. In addition, the technology can be used to realize high quality pitch tracking and pitch epoch marking. The cost functions employed by the present invention can be used as discriminatory functions or feature detectors in speech labeling and speech recognition.
One way of analyzing and synthesizing complex waveforms, such as waveforms representing synthesized speech or musical instruments, is to employ a source-filter model. Using the source-filter model, a source signal is generated and then run through a filter that adds resonances and coloration to the source signal. The combination of source and filter, if properly chosen, can produce a complex waveform that simulates human speech or the sound of a musical instrument.
In source-filter modeling, the source waveform can be comparatively simple: white noise or a simple pulse train, for example. In such case the filter is typically complex. The complex filter is needed because it is the cumulative effect of source and filter that produces the complex waveform. Alternatively, the source waveform can be comparatively complex, in which case, the filter can be more simple. Generally speaking, the source-filter configuration offers numerous design choices.
We favor a model that most closely represents the naturally occurring degree of separation between the human glottal source and the vocal tract filter. When analyzing the complex waveform of human speech, it is quite challenging to ascertain which aspects of the waveform are attributable to the glottal source and which to the vocal tract filter. It is theorized, and even expected, that there is an acoustic interaction between the vocal tract and the nature of the glottal waveform generated at the glottis. In many cases this interaction may be negligible; hence in synthesis it is common to ignore it, as if source and filter were independent.
We believe that many synthesis systems fall short because their source-filter model strikes a poor balance between source complexity and filter complexity. The source model is often dictated by ease of generation rather than by sound quality. For instance, linear predictive coding (LPC) can be understood in terms of a source-filter model in which the source tends to be white (i.e., flat spectrum). This model is considerably removed from the natural separation between the human vocal tract and glottal source, and it yields poor estimates of the first formant and many discontinuities in the filter parameters.
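The white-source view of LPC can be made concrete with a small sketch (a standard textbook construction, not the patent's method): the autocorrelation method with the Levinson-Durbin recursion fits an all-pole prediction filter, and the prediction residual is what the model implicitly treats as the flat-spectrum source.

```python
import numpy as np
from scipy.signal import lfilter

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns the prediction polynomial A(z) = 1 + a[1] z^-1 + ...;
    filtering x through A(z) yields the prediction residual, which
    this model implicitly treats as a spectrally flat ("white") source.
    """
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a[1:m + 1] += k * a[m - 1::-1]
        err *= 1.0 - k * k
    return a

# Recover a known all-pole (AR) model from its noise-driven output.
rng = np.random.default_rng(0)
x = lfilter([1.0], [1.0, -1.3, 0.4], rng.standard_normal(5000))
a = lpc(x, 2)  # a should approximate [1, -1.3, 0.4]
```

Because the fit is driven entirely by the short-time spectrum, nothing in this procedure keeps the estimated poles close to the true formants or continuous from frame to frame, which is the shortcoming the passage above describes.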
An approach heretofore taken to overcome the shortcomings of LPC is a procedure called “analysis by synthesis.” Analysis by synthesis is a parametric approach: a set of source parameters and a set of filter parameters is selected and used to generate a source waveform. The source waveform is then passed through the corresponding filter, and the output waveform is compared with the original waveform by a distance measure. Different parameter sets are tried until the distance is reduced to a minimum, and the parameter set that achieves the minimum is used as a coded form of the input signal.
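The loop just described can be sketched as a toy grid search (the one-formant filter, the fixed pulse-train source, and the parameter grid are hypothetical simplifications): each candidate parameter set is synthesized and compared to the target by squared error, and the closest candidate is kept.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
n = np.arange(2000)
source = np.where(n % 80 == 0, 1.0, 0.0)  # 100 Hz pulse train at 8 kHz

def one_formant(f0, bw):
    # Candidate filter: a single all-pole resonance at (f0, bw).
    r = np.exp(-np.pi * bw / fs)
    a = [1.0, -2.0 * r * np.cos(2 * np.pi * f0 / fs), r * r]
    return lfilter([1.0], a, source)

target = one_formant(650, 90)  # stand-in for the observed waveform

# Try every (f0, bw) pair on a grid; keep the one with minimum distance.
best = min(
    ((f0, bw) for f0 in range(200, 1001, 50) for bw in (60, 90, 120)),
    key=lambda p: np.sum((one_formant(*p) - target) ** 2),
)
```

Real analysis-by-synthesis coders search jointly over source and filter parameters with far richer source models, which is precisely the parametric source assumption the next paragraph calls difficult to work with.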
Although analysis by synthesis does a good job of optimizing a parametric voice source with a vocal tract modeling filter, it imposes a parametric source model assumption that is difficult to work with.
The present invention takes a different approach. The present invention employs a filter and an inverse filter. The filter has an associated set of filter parameters, for example, the center frequency and bandwidth of each resonator. The inverse filter is designed as the inverse of the filter (e.g. poles of one become zeros of the other and vice versa). Thus the inverse filter has parameters that bear a relationship to the parameters of the filter. A speech signal is then supplied to the inverse filter to generate a residual signal. The residual signal is processed to extract a set of data points that define a line or curve (e.g. waveform) that may be represented as plural segments.
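A minimal sketch of this pole/zero inversion, assuming a one-formant all-pole filter (the specific values are illustrative): the resonator's denominator coefficients become the numerator of the inverse filter, so passing the filtered signal through the inverse filter returns the original source as the residual.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
r, theta = 0.97, 2 * np.pi * 500 / fs
a = np.array([1.0, -2.0 * r * np.cos(theta), r * r])  # poles of the filter

n = np.arange(1000)
source = np.where(n % 80 == 0, 1.0, 0.0)

speech = lfilter([1.0], a, source)    # all-pole filter adds the resonance
residual = lfilter(a, [1.0], speech)  # all-zero inverse filter removes it
```

When the inverse filter's parameters match the filter that actually shaped the signal, the residual is exactly the source; with mismatched parameters the residual retains ringing at the uncancelled resonance, which is what the cost calculation below detects.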
Different processing steps may be employed to extract and analyze the data points, depending on the application. These processing steps include extracting time domain data from the residual signal and extracting frequency domain data from the residual signal, either performed separately or in combination with other signal processing steps.
The processing steps involve a cost calculation based on a length measure of the line or waveform which we term “arc-length.” The arc-length or its square is calculated and used as a cost parameter associated with the residual signal. The filter parameters are then selectively adjusted through iteration until the cost parameter is minimized. Once the cost parameter is minimized, the residual signal is used to represent an extracted source signal. The filter parameters associated with the minimized cost parameter may also then be used to construct the filter for a source-filter model synthesizer.
Use of this method results in a smoothness or continuity in the output parameters. When these parameters are used to construct a source-filter model synthesizer, the synthesized waveform sounds remarkably natural, without distortions due to discontinuities. A class of cost functions, based on the arc-length measure, can be used to implement the invention. Several members of this class are described in the following specification. Others will be apparent to those skilled in the art.


REFERENCES:
patent: Re. 32124 (1986-04-01), Atal
patent: 4944013 (1990-07-01), Gouvianakis et al.
patent: 5029211 (1991-07-01), Ozawa
“Automatic Formant Tracking by a Newton-Raphson Technique”, J. P. Olive, The Journal of the Acoustical Society of America, vol. 50, No. 2, revised May 18, 1971, pp. 661-670.
“An Algorithm for Automatic Formant Extraction Using Linear Prediction Spectra”, Stephanie S. McCandless, IEEE Transactions On Acoustics, Speech, and Signal Processing, vol. ASSP-22, No. 2, Apr. 1974, pp. 135-141.
“Interactive Digital Inverse Filtering and Its Relation To Linear Prediction Methods”, Melvyn J. Hunt, John S. Bridle and John N. Holmes, Joint Speech Research Unit, IEEE, 1978, pp. 15-18.
“High Quality Glottal LPC-Vocoding”, Per Hedelin, Chalmers University of Technology, Department of Information Theory, S-412 96 Goteborg, Sweden, IEEE, 1986, pp. 465-468.
“Globally Optimising Formant Tracker Using Generalised Centroids”, A. Crowe and M. A. Jack, Centre for Speech Technology Research, University of Edinburgh, United Kingdom, Aug. 7, 1987, pp. 1-2.
“Robust ARMA Analysis As An Aid In Developing Parameter Control Rules For A Pole-Zero Cascade Speech Synthesizer”, J. De Veth, W. van Golstein Brouwers, H. Loman, and L. Boves, Nijmegen University, PTT Research Neher Laboratories, The Netherlands, S6a.3, IEEE 1990, pp. 305-307.
“Design and Performance of an Analysis-by-Synthesis Class of Predictive Speech Coders”, Richard C. Rose, Member IEEE, and Thomas P. Barnwell, III, Fellow IEEE, IEEE Transactions On Acoustics, Speech, and Signal Processing, vol. 38, No. 9, Sep. 1990, pp. 1489-1503.
“Glottal Wave Analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering”, Paavo Alku, Helsinki University of Technology, Acoustics Laboratory, Finland, Speech Communication 11, revised Jan. 23, 1992, pp. 109-118.
“Formant Location From LPC Analysis Data”, Roy C. Snell, Member IEEE and Fausto Milinazzo, IEEE Transactions On Speech and Audio Processing, vol. 1, No. 2, Apr. 1993, pp. 129-134.
“Automatic Estimation Of Formant and Voice Source Parameters Using A Subspace Based Algorithm”, Chang-Sheng Yang and Hideki Kasuya, Faculty of Engineering, Utsunomiya University, Japan, 1998, pp. 1-4.
“Estimation Of The Glottal Pulseform Based On Discre


Profile ID: LFUS-PAI-O-2559531
