Signal dependent speech modifications

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S201000

active

06324501

ABSTRACT:

BACKGROUND OF THE INVENTION
This invention relates to electronic processing of speech and similar one-dimensional signals.
Speech signal processing is a very large field. It includes encoding, decoding, filtering, interpolating, and synthesizing speech signals, among other operations. Within that field, this invention relates primarily to processing that calls for time scaling, interpolating, and smoothing of speech signals.
It is well known that speech can be synthesized by concatenating speech units that are selected from a large store of speech units. The selection is made in accordance with various techniques and associated algorithms. Since the number of stored speech units that are available for selection is limited, synthesized speech that is derived from a concatenation of speech units typically requires some modifications, such as smoothing, in order to sound continuous and natural. In various applications, time scaling of the entire synthesized speech segment or of some of the speech units is required. Time scaling and smoothing are also sometimes required when a speech signal is interpolated.
Simple and flexible time domain techniques have been proposed for time scaling of speech signals. See, for example, E. Moulines and W. Verhelst, “Time Domain and Frequency Domain Techniques for Prosodic Modification of Speech”, in Speech Coding and Synthesis, pp. 519-555, Elsevier, 1995, and W. Verhelst and M. Roelands, “An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech”, Proc. IEEE ICASSP-93, pp. 554-557, 1993.
What has been found is that the quality of the time-scaled signal is good for time-scaling factors close to one, but a degradation of the signal is perceived when larger modification factors are required. The degradation is mostly perceived as tonalities and artifacts in the stretched signal. These tonalities do not occur everywhere in the signal. We found that the degradations are mostly localized in areas of speech transitions, often at the junctions of concatenated speech units.
SUMMARY
We discovered that the aforementioned artifact problem is related to the level of stationarity of the speech signal within a small interval, or window. In particular, we discovered that speech signal portions that are highly non-stationary cause artifacts when they are scaled and/or smoothed. We concluded, therefore, that the level of non-stationarity of the speech signal is a useful parameter to employ when performing time scaling of synthesized speech and that, in general, it is not desirable to modify or smooth highly non-stationary areas of speech, because doing so introduces artifacts in the resulting signal.
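One way to picture this conclusion is to let a per-frame non-stationarity value pull the requested time-scaling factor back toward one (no modification). The sketch below is illustrative only: the function name `local_scale_factors`, the per-frame value `c` in [0, 1], and the linear blending rule are assumptions made for this example, not a formula taken from the patent text.

```python
def local_scale_factors(beta, nonstationarity):
    """Return one time-scaling factor per frame.

    beta            -- requested global time-scaling factor (1.0 = no change)
    nonstationarity -- per-frame values, assumed to lie in [0, 1], where
                       1 means highly non-stationary (hypothetical input)

    Assumed rule: blend linearly between beta (stationary frame) and
    1.0 (highly non-stationary frame, left unmodified).
    """
    factors = []
    for c in nonstationarity:
        c = min(1.0, max(0.0, c))  # clamp defensively to [0, 1]
        factors.append(1.0 + (beta - 1.0) * (1.0 - c))
    return factors


if __name__ == "__main__":
    # Stationary, intermediate, and highly non-stationary frames.
    print(local_scale_factors(2.0, [0.0, 0.5, 1.0]))
```

Under this assumed rule, a stationary frame receives the full requested stretch, while a frame at a sharp transition is passed through essentially unchanged.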
A simple yet useful indicator of non-stationarity is provided by the transition rate of the root mean squared (RMS) value of the speech signal. Another measure of non-stationarity that is useful for controlling modifications of the speech signal is the transition rate of spectral parameters (line spectrum frequencies, LSFs), normalized to lie between 0 and 1. An improved measure of non-stationarity that is useful for controlling modifications of the speech signal is provided by a combination of the transition rates of the RMS value of the speech signal and the LSFs, normalized to lie between 0 and 1.
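As a rough illustration of the first indicator, the following sketch computes a frame-wise RMS value and a normalized frame-to-frame transition rate. The text above does not give the exact formula, so the ratio |E_n - E_(n-1)| / (E_n + E_(n-1)) used here is an assumption chosen because it naturally lies between 0 and 1; the function names, frame length, and hop size are likewise hypothetical.

```python
import math


def frame_rms(signal, frame_len, hop):
    """RMS value of each full frame of `signal` (a sequence of floats)."""
    values = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values


def rms_transition_rate(rms, eps=1e-12):
    """Normalized change in RMS between consecutive frames.

    Assumed formula: |E_n - E_(n-1)| / (E_n + E_(n-1) + eps).
    Because RMS values are non-negative, each result lies in [0, 1).
    The first frame has no predecessor and is given a rate of 0.
    """
    if not rms:
        return []
    rates = [0.0]
    for prev, cur in zip(rms, rms[1:]):
        rates.append(abs(cur - prev) / (cur + prev + eps))
    return rates


if __name__ == "__main__":
    # Silence followed by an abrupt onset, then a steady segment.
    signal = [0.0] * 4 + [1.0] * 8
    energies = frame_rms(signal, frame_len=4, hop=4)
    print(rms_transition_rate(energies))
```

In this toy input the rate is close to 1 at the onset and 0 across the steady segment, which matches the intuition that transitions are the non-stationary regions. A transition rate for LSFs could be normalized in a similar way and combined with this one, but the combination rule is not specified in the text above.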


REFERENCES:
patent: 3982070 (1976-09-01), Flanagan
patent: 4907484 (1990-03-01), Suzuki et al.
patent: 4922535 (1990-05-01), Dolby
patent: 5299281 (1994-03-01), Coolegem
patent: 6016468 (2000-01-01), Freeman et al.
patent: 05-323997-A (1993-12-01), None
Bangham et al (“Smoothing 1-Dimensional Signals using Sieves & Weightless Neural Nets,” IEE Colloquium on Non-Linear Filters, May 1994).*
Nandasena, “Spectral Stability Based Event Localizing Temporal Decomposition”, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 2, pp. 957-960, 1998.
Verhelst et al, “An Overlap-add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech”, Proc. IEEE ICASSP-93, pp. 554-557, 1993.
