Speech processing technique for use in speech recognition...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Details

C704S206000, C704S207000

active

06263306

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention pertains to a method of processing speech signals for use in speech recognition applications. More particularly, the present invention relates to a technique for calculating from a speech signal an intermediate set of features for use in speech recognition applications and for use in speech pitch estimation.
2. Description of the Related Art
Various signal processing techniques have been developed for analyzing and digitizing speech signals, which can then be used for various control functions, e.g., computer operation. Some such known techniques employ short-time Fourier spectra, or “sonograms,” of a speech signal, which are computed using windowed Fourier transforms, as explained more fully in Rabiner et al., Fundamentals of Speech Recognition (1993). The resulting sonograms are then further processed to determine, for example, cepstra and fundamental frequencies. A drawback of such known techniques is that they yield non-robust results.
Another problem in speech analysis is that of automated pitch determination. Knowledge of the pitch contour of a speech signal is essential for various speech applications such as coding, speech recognition and speech synthesis. Most known pitch determination techniques are classified as either time domain based or frequency domain based. Time domain techniques rely on the detection of the fundamental period of oscillation in the speech signal, also known as the peak-to-peak measurement in the amplitude of the speech signal. A drawback of such time domain techniques is that, in the presence of noise, the peaks may be missing or disguised.
Frequency domain techniques detect a stack of equally spaced lines in the spectrum of a speech signal. The spacing between the lines is a measurement of pitch. Noise presents a problem for these techniques as well.
SUMMARY OF THE INVENTION
The present invention is directed to a novel speech processing technique for use in speech recognition and pitch estimation applications. The inventive speech processing technique is implemented by calculating Slepian sequences over a selected time length and frequency width and forming a product of the calculated Slepian sequences with a portion of a subject speech signal or segment. The length of the segment is selected to be equivalent to the time and frequency parameters of the calculated Slepian sequences. Fourier transforms of the product are then calculated to obtain multiple tapered Fourier transforms of the speech segment. A frequency dependent quantity is calculated from the multiple tapered Fourier transforms, which is then used to obtain angular derivatives of the speech spectrogram corresponding to the speech signal, thus defining features of the speech signal for use in speech recognition and coding.
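The first steps of this summary (Slepian sequences, tapered products, Fourier transforms) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `slepian_sequences` and `tapered_transforms`, the time-bandwidth product `nw=4`, the taper count `k=7`, and the synthetic tone are all hypothetical choices. The averaged power at the end is the standard multitaper spectrum estimate, shown only as one example of a frequency-dependent quantity; it is not the angular-derivative feature described above.

```python
import numpy as np

def slepian_sequences(n, nw, k):
    """Return k Slepian (DPSS) sequences of length n as rows of a (k, n) array.

    nw is the time-bandwidth product. The sequences are eigenvectors of a
    symmetric tridiagonal matrix (a standard way to compute them), and each
    row has unit energy.
    """
    w = nw / n                                   # half-bandwidth, cycles/sample
    i = np.arange(n)
    diag = ((n - 1 - 2 * i) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = i[1:] * (n - i[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)                # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T                # keep the k largest

def tapered_transforms(segment, nw=4.0, k=7):
    """Multiply the segment by each taper and Fourier transform the products."""
    segment = np.asarray(segment, dtype=float)
    tapers = slepian_sequences(len(segment), nw, k)
    return np.fft.rfft(tapers * segment, axis=-1)   # shape (k, n // 2 + 1)

# Synthetic stand-in for a speech segment (assumption): 1 kHz tone at 8 kHz.
fs, n = 8000.0, 256
t = np.arange(n) / fs
segment = np.cos(2 * np.pi * 1000.0 * t)

transforms = tapered_transforms(segment)
multitaper_power = np.mean(np.abs(transforms) ** 2, axis=0)
peak_hz = np.argmax(multitaper_power) * fs / n
```

With these parameters the tapers concentrate energy within roughly `nw * fs / n` Hz of each spectral line, so `peak_hz` should land near the tone frequency rather than exactly on it.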
In a preferred embodiment, a robust pitch estimate of the subject speech signal is obtained by calculating Fourier transforms of an estimate of the derivative of the log of the speech segment spectrum to produce a peak when the resulting Fourier transforms are plotted. The position of the peak in the plotted Fourier transform provides an estimate of pitch.
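The pitch estimate in this embodiment can be sketched as follows. This is an illustrative reading of the paragraph, not the patent's implementation: the function name `pitch_from_log_spectrum`, the moving-average smoothing, the search range, and the synthetic harmonic signal are assumptions, and a Hann-windowed periodogram stands in for the multitaper spectrum to keep the example short.

```python
import numpy as np

def pitch_from_log_spectrum(spectrum, df, f_min, f_max, smooth_bins=5):
    """Estimate pitch in Hz from a one-sided power spectrum sampled every df Hz.

    Steps: log spectrum -> moving-average smoothing (an assumption) ->
    derivative along frequency -> Fourier transform -> strongest peak whose
    implied pitch lies in [f_min, f_max].
    """
    log_s = np.log(np.maximum(spectrum, 1e-300))  # guard against log(0)
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed = np.convolve(log_s, kernel, mode="valid")
    deriv = np.gradient(smoothed, df)             # d(log S)/df
    mag = np.abs(np.fft.rfft(deriv))
    span = len(deriv) * df                        # frequency span analysed, Hz
    q = np.arange(1, len(mag))                    # skip the zero bin
    candidates = span / q                         # harmonic spacing per bin
    in_range = (candidates >= f_min) & (candidates <= f_max)
    best = q[in_range][np.argmax(mag[1:][in_range])]
    return span / best

# Synthetic voiced segment (assumption): 15 harmonics of 250 Hz plus noise.
fs, n, f0 = 8000.0, 512, 250.0
t = np.arange(n) / fs
rng = np.random.default_rng(0)
voiced = sum(np.cos(2 * np.pi * f0 * h * t) / h for h in range(1, 16))
voiced = voiced + 0.01 * rng.standard_normal(n)

spectrum = np.abs(np.fft.rfft(np.hanning(n) * voiced)) ** 2
pitch_hz = pitch_from_log_spectrum(spectrum, fs / n, f_min=150.0, f_max=400.0)
```

The search range matters in this sketch. The transformed derivative also peaks at integer multiples of the fundamental's bin, which correspond to sub-multiples of the pitch, so a range that admits them can return half or a third of the true pitch. The estimate is also quantized to the bin spacing of the final transform.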
In another preferred embodiment an estimate of the pitch of a speech signal is obtained by calculating an F-spectrum from the Fourier transform of the product of the Slepian functions and speech segment. A smoothed derivative of the logarithm of the F-spectrum is then calculated. Once so calculated, the Fourier transform of the resultant quantity (“F-cepstrum”) is obtained, the peak of which represents the pitch estimate.
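The text above does not define the F-spectrum. One plausible reading, given the multitaper setting, is Thomson's harmonic F-statistic, and the sketch below computes that quantity. The identification is an assumption, as are the function names, `nw=4`, `k=7`, and the synthetic tone. Only the F-spectrum step is shown; the smoothed log-derivative, the final Fourier transform, and the peak picking are left out.

```python
import numpy as np

def slepian_sequences(n, nw, k):
    """k unit-energy Slepian (DPSS) sequences of length n, shape (k, n)."""
    i = np.arange(n)
    diag = ((n - 1 - 2 * i) / 2.0) ** 2 * np.cos(2 * np.pi * nw / n)
    off = i[1:] * (n - i[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigh(mat)[1][:, ::-1][:, :k].T   # k largest eigenvalues

def f_spectrum(segment, nw=4.0, k=7):
    """Thomson-style harmonic F-statistic at each FFT frequency (assumed form)."""
    x = np.asarray(segment, dtype=float)
    tapers = slepian_sequences(len(x), nw, k)
    y = np.fft.rfft(tapers * x, axis=-1)          # tapered transforms, (k, n_freq)
    u0 = tapers.sum(axis=1)                       # taper transforms at zero frequency
    # Least-squares complex amplitude of a sinusoid at each frequency.
    mu = (u0[:, None] * y).sum(axis=0) / np.sum(u0 ** 2)
    # Energy left over after removing the fitted sinusoid from each transform.
    resid = np.sum(np.abs(y - mu[None, :] * u0[:, None]) ** 2, axis=0)
    return (k - 1) * np.abs(mu) ** 2 * np.sum(u0 ** 2) / resid

# Synthetic input (assumption): 1 kHz tone in white noise, sampled at 8 kHz.
fs, n = 8000.0, 256
t = np.arange(n) / fs
rng = np.random.default_rng(0)
x = np.cos(2 * np.pi * 1000.0 * t) + 0.1 * rng.standard_normal(n)

f_stat = f_spectrum(x)
line_hz = np.argmax(f_stat) * fs / n
```

Unlike the averaged multitaper power, this statistic compares the energy explained by a single sinusoid with the residual energy, so it is sharply peaked at line frequencies and small elsewhere.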


REFERENCES:
patent: 4217808 (1980-08-01), Slepian et al.
patent: 4556869 (1985-12-01), Thomson
patent: 5325427 (1994-06-01), Dighe
patent: 6124544 (2000-09-01), Alexander et al.
D. J. Thomson, “Multiple-window spectrum estimates for non-stationary data,” Ninth IEEE SP Workshop in Statistical Signal and Array Processing, Sep. 1998, pp. 344 to 347.*
D. J. Thomson, “Quadratic-inverse estimates of transfer functions,” IEEE Sixth SP Workshop on Statistical Signal and Array Processing, Oct. 1992, pp. 432 to 435.*
D. J. Thomson, “An overview of multiple-window and quadratic-inverse spectrum estimation methods,” 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, Apr. 1994, pp. VI/185 to VI/194.*
D. J. Thomson, “Signal extraction via multitaper spectra of nonstationary data,” Conference Record of the Thirty-Second Asilomar Conference on Signals, Systems & Computers, vol. 1, Nov. 1998, pp. 271 to 275.*
Nadeu et al., “Frequency averaging: an useful multiwindow spectral analysis approach,” 1997 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, Apr. 1997, pp. 3953 to 3956.
