Reexamination Certificate
1999-08-18
2003-03-18
To, Doris H. (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S203000, C704S236000
active
06535843
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to electronic processing of speech and similar one-dimensional signals.
Processing of speech signals is a very large field. It includes encoding, decoding, filtering, interpolating, and synthesizing speech signals, among other operations. Within this field, this invention relates primarily to processing that calls for time scaling, interpolating, and smoothing of speech signals.
It is well known that speech can be synthesized by concatenating speech units that are selected from a large store of speech units. The selection is made in accordance with various techniques and associated algorithms. Since the number of stored speech units that are available for selection is limited, synthesized speech that is derived from a concatenation of speech units typically requires some modifications, such as smoothing, in order to sound continuous and natural. In various applications, time scaling of the entire synthesized speech segment, or of some of the speech units, is required. Time scaling and smoothing are also sometimes required when a speech signal is interpolated.
Simple and flexible time domain techniques have been proposed for time scaling of speech signals. See, for example, E. Moulines and W. Verhelst, “Time Domain and Frequency Domain Techniques for Prosodic Modification of Speech”, in Speech Coding and Synthesis, pp. 519-555, Elsevier, 1995, and W. Verhelst and M. Roelands, “An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech”, Proc. IEEE ICASSP-93, pp. 554-557, 1993.
What has been found is that the quality of the time-scaled signal is good for time-scaling factors close to one, but a degradation of the signal is perceived when larger modification factors are required. The degradation is mostly perceived as tonalities and artifacts in the stretched signal. These tonalities do not occur everywhere in the signal. We found that the degradations are mostly localized in areas of speech transitions, often at the junctions of concatenated speech units.
SUMMARY
We discovered that the aforementioned artifacts problem is related to the level of stationarity of the speech signal within a small interval, or window. In particular, we discovered that speech signal portions that are highly non-stationary cause artifacts when they are scaled and/or smoothed. We concluded, therefore, that the level of non-stationarity of the speech signal is a useful parameter to employ when performing time scaling of synthesized speech and that, in general, it is not desirable to modify or smooth highly non-stationary areas of speech, because doing so introduces artifacts in the resulting signal. To that end, a measure of the speech signal's non-stationarity must be developed.
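The idea of leaving highly non-stationary regions unmodified can be illustrated with a minimal sketch. It assumes that a per-frame non-stationarity value between 0 and 1 is already available, and it uses a simple linear blend that is an illustrative assumption, not a rule stated in this document. The function name `local_scale_factors` and its parameters are hypothetical.

```python
def local_scale_factors(nonstationarity, target_factor):
    """Map per-frame non-stationarity values to per-frame time-scale factors.

    Assumption (not from the source text): a linear blend, so that a fully
    stationary frame (value 0) receives `target_factor` and a fully
    non-stationary frame (value 1) receives 1.0, i.e. is left unscaled.
    """
    factors = []
    for c in nonstationarity:
        c = min(1.0, max(0.0, c))  # clamp to the [0, 1] range
        factors.append(1.0 + (1.0 - c) * (target_factor - 1.0))
    return factors


# Stretch by 2x where the signal is stationary, leave transitions untouched.
print(local_scale_factors([0.0, 1.0, 0.5], 2.0))
```

With this kind of blend the overall duration falls short of the requested stretch whenever some frames are protected, so a practical system would presumably compensate by scaling the stationary frames somewhat more.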
A simple yet useful indicator of non-stationarity is provided by the transition rate of the RMS value of the speech signal. Another measure of non-stationarity that is useful for controlling time scaling of the speech signal is the transition rate of spectral parameters, such as line spectral frequencies (LSFs), normalized to lie between 0 and 1. An improved measure of non-stationarity that is useful for controlling time scaling of the speech signal is provided by a combination of the transition rates of the RMS value of the speech signal and of the LSFs, normalized to lie between 0 and 1.
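As a rough illustration of the first indicator, the sketch below computes a frame-wise RMS value and a transition rate between neighbouring frames. The exact definition of the transition rate is not given in this text, so the relative-difference formula, the frame length, the hop size, and the function names are all assumptions made for the example.

```python
import math


def frame_rms(signal, frame_len, hop):
    """Return the RMS value of each full frame of `signal` (a list of floats)."""
    values = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values


def rms_transition_rate(rms, eps=1e-12):
    """Relative change between consecutive RMS values.

    Assumed formula: |cur - prev| / (cur + prev). Because RMS values are
    non-negative, the result lies between 0 and 1. The first frame has no
    predecessor and is assigned 0.0 by convention.
    """
    rates = [0.0]
    for prev, cur in zip(rms, rms[1:]):
        rates.append(abs(cur - prev) / (cur + prev + eps))
    return rates


# Silence followed by a constant-amplitude segment: one abrupt transition.
signal = [0.0] * 100 + [1.0] * 100
rms = frame_rms(signal, frame_len=50, hop=50)
print(rms_transition_rate(rms))
```

In this toy input only the frame at the onset yields a value near 1, while the steady frames yield 0. A transition rate of spectral parameters could be normalized in a similar way, but how the two rates are combined is not specified here.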
REFERENCES:
patent: 4720862 (1988-01-01), Nakata et al.
patent: 4802224 (1989-01-01), Shiraki et al.
patent: 5596676 (1997-01-01), Swaminathan et al.
patent: 5734789 (1998-03-01), Swaminathan et al.
patent: 5799276 (1998-08-01), Komissarchik et al.
patent: 5926788 (1999-07-01), Nishiguchi
patent: 6101463 (2000-08-01), Lee et al.
patent: 6240381 (2001-05-01), Newson
Nandasena, “Spectral Stability Based Event Localizing Temporal Decomposition”, Proceedings of IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 2, pp. 957-960, 1998.
Verhelst et al., “An Overlap-add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-Scale Modification of Speech”, Proc. IEEE ICASSP-93, pp. 554-557, 1993.
Kapilow David A.
Schroeter Juergen
Stylianou Ioannis G.
AT&T Corp.
Brendzel Henry T.
Nolan Daniel A.
To Doris H.
Automatic detection of non-stationarity in speech signals