Reexamination Certificate
2000-06-02
2004-08-10
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S208000, C704S217000
active
06775650
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention concerns digital speech signal processing techniques.
Many representations of speech signals take into account the harmonic content that speech acquires from the way it is produced. In most cases, this is reflected in the determination of a pitch frequency of the speech signal.
Digital processing of speech signals has recently expanded greatly in various domains: speech coding for transmission and storage, speech recognition, noise reduction, echo cancellation, etc. Such processing very frequently relies on an estimate of the pitch frequency and on particular operations related to the estimated frequency.
Many methods have been developed for estimating the pitch frequency. One routinely used method is based on linear prediction: it evaluates a prediction delay that is inversely proportional to the pitch frequency. The delay can be expressed as an integer or fractional number of sample periods. Other methods directly detect breaks in the signal that can be attributed to glottal closures of the speaker; the time intervals between such breaks are inversely proportional to the pitch frequency.
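Both families of estimators end in the same relation: a delay of T samples (integer or fractional) corresponds to a pitch frequency $F_e/T$. The sketch below illustrates that relation. It is an assumption-laden stand-in, not the linear-prediction method itself: a plain normalised-autocorrelation lag search replaces the prediction-delay search. The search range (`min_f0`, `max_f0`), the 0.9 peak threshold used to limit octave errors, the parabolic refinement that yields a fractional delay and the name `estimate_pitch` are all hypothetical choices made for illustration.

```python
import math

def estimate_pitch(frame, fe, min_f0=60.0, max_f0=400.0):
    """Return an estimated pitch (Hz) as fe / T, T being a delay in samples.

    Hypothetical stand-in for a prediction-delay search: T is the first strong
    peak of the normalised autocorrelation, refined to a fractional value by
    parabolic interpolation.
    """
    def score(lag):
        # Normalised correlation between the frame and its copy delayed by `lag`.
        a, b = frame[lag:], frame[:len(frame) - lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return num / den if den > 0 else 0.0

    lo = max(1, int(fe / max_f0))
    hi = min(len(frame) - 2, int(fe / min_f0))
    s = {lag: score(lag) for lag in range(lo - 1, hi + 2)}
    peak = max(s[lag] for lag in range(lo, hi + 1))
    # Prefer the first local maximum close to the global one: delays 2T, 3T, ...
    # correlate almost as well as T and would otherwise cause octave errors.
    best = next(
        (lag for lag in range(lo, hi + 1)
         if s[lag] >= 0.9 * peak and s[lag] >= s[lag - 1] and s[lag] >= s[lag + 1]),
        max(range(lo, hi + 1), key=s.get),
    )
    # Parabola through the neighbouring lags gives a fractional delay.
    y0, y1, y2 = s[best - 1], s[best], s[best + 1]
    curvature = y0 - 2 * y1 + y2
    frac = 0.5 * (y0 - y2) / curvature if curvature != 0 else 0.0
    return fe / (best + frac)
```

For a 200 Hz sinusoid sampled at 8 kHz, the strongest correlation occurs at a delay of 40 samples, so the estimate comes out very close to 8000/40 = 200 Hz.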
If the digital speech signal is transformed into the frequency domain, as by a discrete Fourier transform, it is necessary to consider a discrete spectrum of the speech signal. The discrete frequencies considered are of the form $\frac{a}{N} F_e$, where $F_e$ is the sampling frequency, $N$ is the number of samples of the blocks used in the discrete Fourier transform and $a$ is an integer from 0 to $N/2-1$. These frequencies do not necessarily include the estimated pitch frequency and/or its harmonics. This causes inaccuracy in operations relating to the estimated pitch, which can cause distortion of the processed signal, affecting its harmonic character.
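To make the mismatch concrete, the sketch below computes, for each harmonic of an estimated pitch, the nearest DFT frequency of the form $(a/N) F_e$ and the distance to it. The values used ($F_e$ = 8000 Hz, N = 256, a 110 Hz pitch) and the name `bin_mismatch` are illustrative assumptions.

```python
def bin_mismatch(f0, fe, n, harmonics=5):
    """For each harmonic h*f0, return (h, nearest bin index a, offset in Hz)."""
    spacing = fe / n                      # DFT frequency resolution Fe/N
    rows = []
    for h in range(1, harmonics + 1):
        f = h * f0
        a = round(f / spacing)            # nearest discrete frequency (a/N)*Fe
        rows.append((h, a, f - a * spacing))
    return rows

for h, a, off in bin_mismatch(110.0, 8000.0, 256):
    print(f"harmonic {h}: nearest bin {a}, offset {off:+.2f} Hz")
```

With a bin spacing of 31.25 Hz, the 110 Hz fundamental is closest to bin 4 (125 Hz), 15 Hz away. A pitch that happened to be a multiple of the spacing, such as 125 Hz, would have every harmonic exactly on a bin.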
A principal object of the present invention is to propose a method of conditioning the speech signal which makes it less sensitive to the above drawbacks.
SUMMARY OF THE INVENTION
The invention therefore proposes a method of conditioning a digital speech signal processed in successive frames, wherein harmonic analysis of the speech signal is performed to estimate a pitch frequency of the speech signal over each frame in which it exhibits voice activity. Once the pitch frequency has been estimated over a frame, the speech signal of that frame is conditioned by oversampling it at an oversampling frequency that is a multiple of the estimated pitch frequency.
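A minimal sketch of the conditioning step follows. It assumes plain linear interpolation as the resampler; the text does not specify the interpolation filter, and a band-limited interpolator would normally be preferred. The name `oversample_to_pitch_multiple` and the integer multiple `k` are hypothetical.

```python
def oversample_to_pitch_multiple(frame, fe, f0, k):
    """Resample `frame` from rate fe to rate k * f0 by linear interpolation.

    Returns (samples, new_rate). Requires k * f0 >= fe so that the operation
    really is an oversampling. Linear interpolation is an illustrative choice.
    """
    new_rate = k * f0
    if new_rate < fe:
        raise ValueError("k * f0 must be at least the sampling frequency fe")
    step = fe / new_rate                  # spacing of output samples, in input samples
    n_out = int((len(frame) - 1) / step) + 1
    out = []
    for j in range(n_out):
        t = j * step                      # output instant, in input-sample units
        i = int(t)
        frac = t - i
        nxt = frame[i + 1] if i + 1 < len(frame) else frame[i]
        out.append((1 - frac) * frame[i] + frac * nxt)
    return out, new_rate
```

For example, with $F_e$ = 8 kHz, a 100 Hz pitch and k = 128, a 160-sample (20 ms) frame becomes 255 samples at 12.8 kHz.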
When the speech signal is processed, this favours the frequencies closest to the estimated pitch over other frequencies, so the harmonic character of the speech signal is preserved as far as possible. To compute the spectral components of the speech signal, the conditioned signal is divided into blocks of N samples that are transformed into the frequency domain, and the ratio between the oversampling frequency and the estimated pitch frequency is chosen to be a factor of N.
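The reason for making the ratio K a factor of N can be checked directly. With an oversampling frequency of $K f_0$, the DFT bin spacing becomes $K f_0 / N$. The pitch therefore falls on bin $N/K$ and its h-th harmonic on bin $hN/K$, and both are integers when K divides N.

The sketch below picks the smallest divisor K of N for which $K f_0$ is at least $F_e$. That "smallest such divisor" rule and the function names are assumptions; the text only requires K to be a factor of N.

```python
def choose_ratio(n, fe, f0):
    """Smallest divisor K of n with K * f0 >= fe (hypothetical selection rule)."""
    for k in range(1, n + 1):
        if n % k == 0 and k * f0 >= fe:
            return k
    raise ValueError("no divisor of n gives an oversampling frequency >= fe")

def harmonic_bins(n, k, harmonics=4):
    """DFT bin indices h * N / K of the first harmonics at rate K * f0."""
    return [h * n // k for h in range(1, harmonics + 1)]
```

With N = 256, $F_e$ = 8000 Hz and $f_0$ = 110 Hz, this gives K = 128 and an oversampling frequency of 14080 Hz. The bin spacing is then 55 Hz, and the harmonics land exactly on bins 2, 4, 6, 8, ...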
The foregoing technique can be refined by estimating the pitch frequency of the speech signal over a frame in the following manner:
estimating the time intervals between consecutive breaks in the signal that can be attributed to glottal closures of the speaker occurring during the frame, the estimated pitch frequency being inversely proportional to said time intervals;
interpolating the speech signal in said time intervals, so that the conditioned signal resulting from such interpolation has a constant time interval between two consecutive breaks.
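The two steps above can be sketched as a piecewise resampling. Each interval between consecutive breaks is stretched or compressed to the same number of samples `m`, so that breaks in the output are exactly `m` samples apart. The break positions are assumed to be supplied by a glottal-closure detector, which is not shown. Linear interpolation, the common length `m` and the name `regularise_periods` are illustrative assumptions.

```python
def regularise_periods(x, breaks, m):
    """Resample each interval [breaks[i], breaks[i+1]) of x to exactly m samples.

    `breaks` holds sample indices of consecutive glottal closures (assumed to be
    known). The local pitch over an interval is fs / (end - start), so giving
    every interval the same length m equalises the pitch across the frame.
    """
    out = []
    for start, end in zip(breaks, breaks[1:]):
        length = end - start              # local pitch period, in samples
        for j in range(m):
            t = start + j * length / m    # position in the original signal
            i = int(t)
            frac = t - i
            out.append((1 - frac) * x[i] + frac * x[min(i + 1, len(x) - 1)])
    return out
```

For breaks at 0, 20, 50 and 90 with m = 10, the periods of 20, 30 and 40 samples are each mapped to 10 samples. The breaks at 0, 20 and 50 therefore land at output indices 0, 10 and 20.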
This approach artificially constructs a signal frame over which the speech signal exhibits breaks at constant intervals. Any variation of the pitch over the duration of a frame is therefore taken into account.
In a further improvement, after each conditioned signal frame has been processed, the number of samples retained from the processing output is equal to an integer multiple of the ratio between the sampling frequency and the estimated pitch frequency, i.e. of the pitch period expressed in samples. This avoids the distortion caused by phase discontinuities between frames, which conventional overlap-add techniques generally do not fully correct.
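One way to compute the retained sample count is sketched below. It takes the largest whole number of pitch periods $F_s/f_0$ that fits in the samples available. Rounding to the nearest sample when the period is fractional is an assumption, as is the name `samples_to_keep`.

```python
import math

def samples_to_keep(available, fs, f0):
    """Largest integer multiple of the pitch period fs / f0 that does not
    exceed `available` samples, rounded to a whole number of samples."""
    period = fs / f0                      # pitch period in samples
    periods = math.floor(available / period)
    if periods < 1:
        raise ValueError("frame shorter than one pitch period")
    return min(available, round(periods * period))
```

At 8 kHz with a 100 Hz pitch (a period of 80 samples), 240 of 256 available samples would be kept. When the frame is processed at an oversampling rate of $K f_0$, the pitch period is exactly K samples, so no rounding is involved.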
Using the oversampling technique to condition the signal also yields a good measure of the degree of voicing of the speech signal over the frame. This measure is based on the entropy of the autocorrelation of the spectral components computed from the conditioned signal. The more irregular the spectrum, i.e. the more voiced the signal, the lower the entropy. Conditioning the speech signal accentuates the irregularity of the spectrum, and therefore the variations of the entropy, so the entropy is a sensitive measure of voicing.
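The entropy is not defined precisely at this point, so the sketch below is only one plausible reading. It makes the following assumptions:

- it uses the DFT magnitude spectrum;
- it takes the autocorrelation of that spectrum along the frequency axis over non-negative lags;
- it normalises that autocorrelation to unit sum;
- it computes the Shannon entropy in nats.

These choices and the function name are hypothetical. A naive DFT keeps the example dependency-free.

```python
import cmath
import math
import random

def spectral_autocorr_entropy(frame):
    """Shannon entropy (nats) of the normalised autocorrelation, taken along
    the frequency axis, of the DFT magnitude spectrum of `frame`."""
    n = len(frame)
    # Magnitudes of DFT bins 0..n/2 (naive O(n^2) DFT, adequate for a sketch).
    mag = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n)))
           for k in range(n // 2 + 1)]
    m = len(mag)
    # Autocorrelation of the spectrum over non-negative frequency lags.
    r = [sum(mag[i] * mag[i + lag] for i in range(m - lag)) for lag in range(m)]
    total = sum(r)
    # Treat the normalised autocorrelation as a probability distribution.
    probs = [v / total for v in r if v > 0]
    return -sum(p * math.log(p) for p in probs)

# Usage: harmonics placed exactly on DFT bins (as conditioning aims to do)
# versus white noise of the same length.
n = 64
voiced = [sum(math.cos(2 * math.pi * 4 * h * t / n) for h in range(1, 8))
          for t in range(n)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
h_voiced = spectral_autocorr_entropy(voiced)
h_noise = spectral_autocorr_entropy(noise)
```

The harmonic frame concentrates its spectral autocorrelation at multiples of the harmonic spacing. It gives an entropy of about 1.81 nats, markedly lower than the noise frame, whose autocorrelation is spread over almost all lags.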
In the remainder of this description, the conditioning method according to the invention is illustrated in a system for suppressing noise in a speech signal. The method can, however, find applications in many other types of digital speech processing: coding, recognition, echo cancellation, etc.
REFERENCES:
patent: 5073938 (1991-12-01), Galand
patent: 5226084 (1993-07-01), Hardwick et al.
patent: 5228088 (1993-07-01), Kane et al.
patent: 5384891 (1995-01-01), Asakawa et al.
patent: 5400434 (1995-03-01), Pearson
patent: 5401897 (1995-03-01), Depalle et al.
patent: 5469087 (1995-11-01), Eatwell
patent: 5555190 (1996-09-01), Derby et al.
patent: 5641927 (1997-06-01), Pawate et al.
patent: 5787398 (1998-07-01), Lowry
patent: 5832437 (1998-11-01), Nishiguchi et al.
patent: 5987413 (1999-11-01), Dutoit et al.
patent: 6064955 (2000-05-01), Huang et al.
patent: 6115684 (2000-09-01), Kawahara et al.
patent: 6475245 (2002-11-01), Gersho et al.
patent: 0 438 174 (1991-07-01), None
McClellan et al., “Variable-rate CELP based on subband flatness,” IEEE Transactions on Speech and Audio Processing, vol. 5, No. 2, Mar. 1997, pp. 120 to 130.*
McClellan et al., “Spectral entropy: an alternative indicator for rate allocation?” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Apr. 1994, pp. 1-201 to 1-204.*
C. Murgia et al., “An Algorithm for the Estimation of Glottal Closure Instants Using the Sequential Detection of Abrupt Changes in Speech Signals,” Proceedings of Eusipco-94, 7th European Signal Processing Conference, Edinburgh, vol. 3, Sep. 1994, pp. 1685-1688.
R. Le Bouquin et al., “Enhancement of Noisy Speech Signals: Application to Mobile Radio Communications,” Speech Communication, vol. 18, No. 1, Jan. 1996, pp. 3-19.
S. Nandkumar et al., “Speech Enhancement Based on a New Set of Auditory Constrained Parameters,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1994, vol. 1, Apr. 1994, pp. 1-4.
P. Lockwood et al., “Experiments With a Nonlinear Spectral Subtractor (NSS), Hidden Markov Models and the Projection for Robust Speech Recognition in Cars,” Speech Communication, vol. 11, No. 2/3, Jun. 1992, pp. 215-228.
Lockwood Philip
Lubiarz Stéphane
Dorvil Richemond
Lerner Martin
Matra Nortel Communications
Trop Pruner & Hu P.C.