Reexamination Certificate
1998-11-24
2001-05-01
Dorvil, Richemond (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
For storage or transmission
C704S207000, C704S208000, C704S214000, C704S219000
active
06226606
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to computer speech systems. In particular, the present invention relates to pitch tracking in computer speech systems.
Computers are currently being used to perform a number of speech-related functions, including transmitting human speech over computer networks, recognizing human speech, and synthesizing speech from input text. To perform these functions, computers must be able to recognize the various components of human speech. One of these components is the pitch or melody of speech, which is created by the vocal cords of the speaker during voiced portions of speech. Examples of pitch can be heard in vowel sounds such as the “ih” sound in “six”.
The pitch in human speech appears in the speech signal as a nearly repeating waveform that is a combination of multiple sine waves at different frequencies. The period between these nearly repeating waveforms determines the pitch.
To identify pitch in a speech signal, the prior art uses pitch trackers. A comprehensive study of pitch tracking is presented in D. Talkin, “A Robust Algorithm for Pitch Tracking (RAPT)”, Speech Coding and Synthesis, pp. 495-518, Elsevier, 1995. One such pitch tracker identifies two portions of the speech signal that are separated by a candidate pitch period and compares the two portions to each other. If the candidate pitch period is equal to the actual pitch period of the speech signal, the two portions will be nearly identical to each other. This comparison is generally performed using a cross-correlation technique that compares multiple samples of each portion to each other.
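The comparison described above can be sketched in Python as follows. This is an illustrative sketch only, not the method of any cited pitch tracker: the function names, the window length, the candidate range, and the synthetic test signal are all assumptions made for the example.

```python
import math

def normalized_cross_correlation(signal, start, period, window):
    """Compare signal[start:start+window] with the window one candidate period later."""
    first = signal[start:start + window]
    second = signal[start + period:start + period + window]
    dot = sum(a * b for a, b in zip(first, second))
    norm = math.sqrt(sum(a * a for a in first) * sum(b * b for b in second))
    return dot / norm if norm > 0 else 0.0

def best_candidate_period(signal, start, window, min_period, max_period):
    """Return the candidate period (in samples) whose two windows match best."""
    scores = {
        period: normalized_cross_correlation(signal, start, period, window)
        for period in range(min_period, max_period + 1)
    }
    return max(scores, key=scores.get)

# Hypothetical test signal: a nearly repeating waveform built from two sine
# waves, repeating every 80 samples (100 Hz at an assumed 8000 Hz sample rate).
true_period = 80
signal = [
    math.sin(2 * math.pi * n / true_period)
    + 0.5 * math.sin(4 * math.pi * n / true_period)
    for n in range(800)
]

estimate = best_candidate_period(signal, start=0, window=160,
                                 min_period=40, max_period=120)
print(estimate)
```

On this clean synthetic signal the two windows are nearly identical only when the candidate period equals 80 samples, so that candidate receives the highest correlation.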
Unfortunately, such pitch trackers are not always accurate. This results in pitch-tracking errors that can impair the performance of computer speech systems. In particular, pitch-tracking errors can cause computer systems to misidentify voiced portions of speech as unvoiced portions and vice versa, and can cause speech systems to segment the speech signal poorly.
SUMMARY OF THE INVENTION
In a method for tracking pitch in a speech signal, first and second window vectors are created from samples taken across first and second windows of the speech signal. The first window is separated from the second window by a test pitch period. The energy of the speech signal in the first window is combined with the correlation between the first window vector and the second window vector to produce a predictable energy factor. The predictable energy factor is then used to determine a pitch score for the test pitch period. Based in part on the pitch score, a portion of the pitch track is identified.
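One way to picture the predictable energy factor is the sketch below. The text above says only that the window energy is "combined with" the correlation; the simple product used here, the clamping of negative correlations to zero, and all function and variable names are assumptions for illustration, not the formula of the invention.

```python
import math

def predictable_energy(signal, start, period, window):
    """Assumed combination: (correlation between window vectors) * (first-window energy)."""
    first = signal[start:start + window]                      # first window vector
    second = signal[start + period:start + period + window]   # second window vector
    energy = sum(a * a for a in first)                        # energy in the first window
    norm = math.sqrt(energy * sum(b * b for b in second))
    dot = sum(a * b for a, b in zip(first, second))
    correlation = dot / norm if norm > 0 else 0.0
    # Assumption: negative correlation contributes no predictable energy.
    return max(correlation, 0.0) * energy

# Hypothetical signals: the same 80-sample-period sine at two loudness levels.
loud = [math.sin(2 * math.pi * n / 80) for n in range(400)]
quiet = [0.1 * s for s in loud]

loud_score = predictable_energy(loud, 0, 80, 160)
quiet_score = predictable_energy(quiet, 0, 80, 160)
wrong_period_score = predictable_energy(loud, 0, 60, 160)
print(loud_score, quiet_score, wrong_period_score)
```

Under these assumptions the factor is large only when the signal is both energetic and well predicted at the test pitch period, which is why the quiet signal and the mismatched test period both score lower than the loud signal at its true period.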
In other embodiments of the invention, a method of pitch tracking takes samples of first and second waveforms in the speech signal. The centers of the first and second waveforms are separated by a test pitch period. A correlation value is determined that describes the similarity between the first and second waveforms, and a pitch-contouring factor is determined that describes the similarity between the test pitch period and a previous pitch period. The correlation value and the pitch-contouring factor are then combined to produce a pitch score for transitioning from the previous pitch period to the test pitch period. This pitch score is used to identify a portion of the pitch track.
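The sketch below illustrates how a transition score of this kind could discourage sudden jumps in the pitch track. The linear penalty on the change in pitch period, the additive combination, the weight value, and the dynamic-programming search are assumptions chosen for illustration; the text above does not give the actual formulas.

```python
def pitch_contour_factor(test_period, previous_period, weight=0.01):
    """Assumed form: a penalty that grows as the test period departs from the previous one."""
    return -weight * abs(test_period - previous_period)

def transition_score(correlation, test_period, previous_period, weight=0.01):
    """Assumed combination: correlation value plus the pitch-contouring factor."""
    return correlation + pitch_contour_factor(test_period, previous_period, weight)

def best_pitch_track(frame_correlations, weight=0.01):
    """Pick one period per frame so that the summed scores are highest.

    frame_correlations is a list with one dict per frame, mapping each
    candidate period (in samples) to its correlation value.
    """
    # best[period] = (cumulative score, path of periods ending at this period)
    best = {p: (c, [p]) for p, c in frame_correlations[0].items()}
    for frame in frame_correlations[1:]:
        new_best = {}
        for p, c in frame.items():
            prev_p, (prev_score, prev_path) = max(
                best.items(),
                key=lambda item: item[1][0] + pitch_contour_factor(p, item[0], weight),
            )
            score = prev_score + transition_score(c, p, prev_p, weight)
            new_best[p] = (score, prev_path + [p])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]

# Hypothetical correlations: in the middle frame the doubled period (160)
# correlates slightly better than the true period (80).
frames = [{80: 0.9, 160: 0.85}, {80: 0.8, 160: 0.85}, {80: 0.9, 160: 0.8}]

smooth_track = best_pitch_track(frames, weight=0.01)
unconstrained_track = best_pitch_track(frames, weight=0.0)
print(smooth_track, unconstrained_track)
```

With the penalty switched off (weight 0.0), the search follows the per-frame correlation maximum and jumps to the doubled period in the middle frame; with the penalty on, the track stays at 80 samples throughout.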
Other embodiments of the invention provide a method of determining whether a region of a speech signal is a voiced region. The method involves sampling a first and second waveform and determining the correlation between the two waveforms. The energy of the first waveform is then determined. If the correlation and the energy are both high, the method identifies the region as a voiced region.
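The voiced-region decision can be sketched as a pair of threshold tests. The threshold values, the function names, and the three synthetic signals below are hypothetical; the text above states only that both the correlation and the energy must be high.

```python
import math
import random

def correlation_and_energy(signal, start, period, window):
    """Return (correlation between the two waveforms, energy of the first waveform)."""
    first = signal[start:start + window]
    second = signal[start + period:start + period + window]
    energy = sum(a * a for a in first)
    norm = math.sqrt(energy * sum(b * b for b in second))
    dot = sum(a * b for a, b in zip(first, second))
    correlation = dot / norm if norm > 0 else 0.0
    return correlation, energy

def is_voiced(signal, start, period, window, min_correlation=0.5, min_energy=1.0):
    """Voiced only if BOTH the correlation and the energy exceed assumed thresholds."""
    correlation, energy = correlation_and_energy(signal, start, period, window)
    return correlation >= min_correlation and energy >= min_energy

# Hypothetical regions: a strong periodic signal, a very faint periodic
# signal, and aperiodic noise at a comparable level to the strong signal.
vowel_like = [math.sin(2 * math.pi * n / 80) for n in range(400)]
faint = [0.01 * s for s in vowel_like]
rng = random.Random(0)
noise_like = [rng.uniform(-1.0, 1.0) for _ in range(400)]

print(is_voiced(vowel_like, 0, 80, 160))
print(is_voiced(faint, 0, 80, 160))
print(is_voiced(noise_like, 0, 80, 160))
```

Requiring both conditions means that a faint but periodic region fails on energy, while a loud but aperiodic region fails on correlation.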
REFERENCES:
patent: 4731846 (1988-03-01), Secrest et al.
patent: 5680508 (1997-10-01), Liu
patent: 0 625 774 A2 (1994-11-01), None
patent: 0 712 116 A2 (1996-05-01), None
“Super Resolution Pitch Determination of Speech Signals,” IEEE Transactions on Signal Processing, vol. 39, no. 1, pp. 40-48 (Jan. 1, 1991).
“A Pitch Determination and Voiced/Unvoiced Decision Algorithm for Noisy Speech,” Speech Communication, NL, Elsevier Science Publishers, Amsterdam, vol. 21, no. 3, pp. 191-207 (Apr. 1, 1997).
A. Acero, “Source Filter Models for Time-Scale Pitch-Scale Modification of Speech,” IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, Seattle, pp. 881-884, May 1998.
W. Hess, “Pitch Determination of Speech Signals,” Springer-Verlag, New York, 1983.
X. Qian and R. Kumaresan, “A Variable Frame Pitch Estimator and Test Results,” IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, Atlanta, GA, pp. 228-231, May 1996.
L. R. Rabiner, “On the Use of Autocorrelation Analysis for Pitch Detection,” IEEE Transactions on ASSP, vol. 25, pp. 24-33, 1977.
D. Talkin, “A Robust Algorithm for Pitch Tracking (RAPT),” in Speech Coding and Synthesis, pp. 495-518, Elsevier, 1995.
Acero Alejandro
Droppo, III James G.
Dorvil Richemond
Magee Theodore M.
Microsoft Corporation
Westman Champlin & Kelly P.A.
Wieland Susan