Reexamination Certificate
1997-02-25
2001-03-27
Hudspeth, David R. (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S255000, C704S258000, C704S242000
active
06208967
ABSTRACT:
A method and apparatus for automatic speech segmentation into phoneme-like units for use in speech processing applications, and based on segmentation into Broad Phonetic Classes, Sequence-Constrained Vector Quantization, and Hidden-Markov-Models.
BACKGROUND TO THE INVENTION
The invention relates to a method for automatically segmenting speech for use in speech processing applications. Of the various possible applications, a particular one is speech synthesis, more particularly speech synthesis based on the concatenation of diphones. Diphones are short speech segments that each contain mainly the transition between two adjacent phonemes, plus the last part of the preceding phoneme and the first part of the succeeding phoneme. Diphones may be extracted, according to certain rules that are known per se, from a database that has already been segmented into single phonemes. Typically, such a database consists of isolated words recorded from a single speaker in a controlled environment, and also comprises the verified correspondence between phonetic transcription and acoustic realization.
A straightforward and automatic realization of the segmentation method according to the preamble, based on phoneme Hidden Markov Models (HMM), has been disclosed in O. Boëffard et al, Automatic Generation of Optimized Unit Dictionaries for Text to Speech Synthesis, International Conference on Speech and Language Processing, Banff, Alberta CANADA (1992), p. 1211-1215. However, the quality of the known method has been found insufficient, in that the boundaries found by the method generally deviate too much from the positions where corresponding boundaries would be placed by a manual procedure. Of course, the segmentation accuracy could be improved if the phoneme HMMs were first trained with a separate, manually segmented database. Setting up such a manually segmented database is, however, often too costly, since this has to be repeated each time a new speaker is to be used in a speech synthesis system.
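The diphone definition given in this section can be illustrated with a minimal sketch. The source only says that diphones are extracted "according to certain rules that are known per se"; cutting at the midpoint of each phoneme is an assumption made here for illustration, and the names `Segment` and `diphones_from_phonemes` are hypothetical.

```python
from typing import List, Tuple

# (phoneme label, start time in ms, end time in ms); hypothetical layout
Segment = Tuple[str, int, int]


def diphones_from_phonemes(segments: List[Segment]) -> List[Tuple[str, float, float]]:
    """Return one diphone per adjacent phoneme pair.

    Assumption (not from the source): each diphone runs from the midpoint
    of the preceding phoneme to the midpoint of the succeeding phoneme, so
    it holds the transition plus the last part of the first phoneme and
    the first part of the second.
    """
    diphones = []
    for (left, l_start, l_end), (right, r_start, r_end) in zip(segments, segments[1:]):
        cut_in = (l_start + l_end) / 2
        cut_out = (r_start + r_end) / 2
        diphones.append((f"{left}-{right}", cut_in, cut_out))
    return diphones


# A made-up phoneme segmentation of one isolated word, times in ms.
word = [("sil", 0, 100), ("k", 100, 200), ("a", 200, 400), ("t", 400, 500)]
for name, start, end in diphones_from_phonemes(word):
    print(name, start, end)
```

The sketch makes the dependency explicit: the diphone cut points are only as good as the phoneme boundaries they are computed from, which is why segmentation accuracy matters for synthesis.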
In consequence, amongst other things, it is an object of the present invention to propose a method for speech segmentation that is fully automatic, does not need manually segmented speech material, and gives a better result than the reference.
SUMMARY TO THE INVENTION
Now, according to one of its aspects, the invention provides a method for automatically segmenting speech for use in speech processing applications, said method comprising the steps of:
classifying and segmenting utterances from a speech data base into three broad phonetic classes (BPC) voiced, unvoiced, and silence, for attaining preliminary segmentation positions;
using preliminary segmentation positions as anchor points for further segmentation into phoneme-like units by sequence-constrained vector quantization in an SCVQ-step;
initializing phoneme Hidden-Markov-Models with the segments provided by the SCVQ-step, and further tuning of the HMM parameters by Baum-Welch estimation;
finally, using the fully trained HMMs to perform Viterbi alignment of the utterances with respect to their phonetic transcription and in this way obtaining the final segmentation points.
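The first step above, classification into voiced, unvoiced, and silence, can be sketched as follows. The text here does not say which acoustic features are used; short-time energy and zero-crossing rate are assumed purely for illustration, and the function names, frame length, and thresholds are hypothetical.

```python
import math
from typing import List


def classify_frames(samples: List[float], frame_len: int = 160,
                    energy_floor: float = 1e-4, zcr_split: float = 0.25) -> List[str]:
    """Label each non-overlapping frame 'silence', 'voiced' or 'unvoiced'.

    Assumed rule (not from the source): very low energy means silence;
    otherwise a low zero-crossing rate means voiced, a high one unvoiced.
    """
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcr = crossings / (frame_len - 1)
        if energy < energy_floor:
            labels.append("silence")
        elif zcr < zcr_split:
            labels.append("voiced")
        else:
            labels.append("unvoiced")
    return labels


def anchor_points(labels: List[str]) -> List[int]:
    """Frame indices where the broad class changes (preliminary boundaries)."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Synthetic test signal at an assumed 8 kHz sampling rate.
silence = [0.0] * 320
voiced = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(320)]
unvoiced = [0.1 if n % 2 == 0 else -0.1 for n in range(320)]

labels = classify_frames(silence + voiced + unvoiced)
anchors = anchor_points(labels)
print(labels)
print(anchors)
```

In the method as described, the class-change positions serve as anchor points that constrain the subsequent SCVQ step; the sketch stops at producing those positions.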
An additional advantage of the method recited is that only minimal initial information is required, such as a phonetic transcription of the utterances. In particular, no separate manually segmented database is needed for estimating the HMM parameters.
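How a phonetic transcription alone can yield segmentation points is easiest to see in the final alignment step. The following is a minimal sketch of Viterbi forced alignment under strong simplifying assumptions that are not taken from the source: each phoneme of the transcription is a single state, transition probabilities are ignored, and the per-frame log-likelihoods are given as input rather than computed from trained HMMs. The name `forced_align` is hypothetical.

```python
from typing import List


def forced_align(loglik: List[List[float]]) -> List[int]:
    """Align T frames to K phonemes in transcription order.

    loglik[t][k] is the log-likelihood of frame t under the k-th phoneme
    of the transcription. Returns the start frame of each phoneme on the
    best monotonic path that begins in phoneme 0 and ends in phoneme K-1.
    """
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per phoneme")
    neg_inf = float("-inf")
    score = [[neg_inf] * K for _ in range(T)]
    advanced = [[False] * K for _ in range(T)]  # True: entered phoneme k at frame t
    score[0][0] = loglik[0][0]
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                score[t][k] = move + loglik[t][k]
                advanced[t][k] = True
            else:
                score[t][k] = stay + loglik[t][k]
    # Backtrack from the last phoneme at the last frame.
    starts = [0] * K
    k = K - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][k]:
            starts[k] = t
            k -= 1
    return starts


# Made-up scores: 6 frames, 3 phonemes in the transcription.
scores = [
    [0, -5, -5],
    [0, -5, -5],
    [-5, 0, -5],
    [-5, 0, -5],
    [-5, 0, -5],
    [-5, -5, 0],
]
boundaries = forced_align(scores)
print(boundaries)
```

Because the phoneme order is fixed by the transcription, the search only decides where each phoneme begins, which is the information a segmentation needs.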
Advantageously, after said training a diphone set is constructed for further usage, such as in speech synthesis. The invention has provided a straightforward and inexpensive multi-speaker system.
The invention also relates to an apparatus for segmenting speech for use in speech processing applications, said apparatus comprising:
BPC segmenting means fed by a speech data base for classifying and segmenting utterances received into three broad phonetic classes (BPC) voiced, unvoiced, and silence,
SCVQ segmenting means fed by said BPC segmenting means for executing, with the preliminary segmentation positions as anchor points, further segmentation into phoneme-like units by sequence-constrained vector quantization (SCVQ),
phoneme Hidden-Markov-Model (HMM) means fed by said SCVQ segmenting means for initialization of phoneme HMMs and further tuning of HMM parameters,
final segmentation means controlled by said HMM.
Such an apparatus would allow untrained personnel to train it in a short time for an arbitrary new speaker. Further advantageous aspects of the invention are recited in dependent claims.
REFERENCES:
patent: 5579436 (1996-11-01), Chou et al.
patent: 5715367 (1998-02-01), Gillick et al.
Speech Communication, pp. 207-220, vol. 19, No. 3, Sep. 1996, S. Pauws et al, “A Hierarchical Method of Automatic Segmentation for Synthesis Applications”.
O. Boeffard et al, Automatic Generation of Optimized Unit Dictionaries for Text to Speech Synthesis, International Conf. on Speech and Language Processing, Banff, Alberta Canada (1992), pp. 1211-1215.
L.R. Rabiner, “A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition”, Proc. IEEE, vol. 77, No.2, Feb. 1989, pp. 257-286.
C.S. Myers and L.R. Rabiner, A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition, IEEE Trans. ASSP, vol. 29, No. 2, Apr. 1981, pp. 284-297.
J.G. Wilpon and L.R. Rabiner, A Modified K-Means Clustering Algorithm for Use in Isolated Word Recognition, IEEE Trans. ASSP, vol. 33, No. 3, Jun. 1985, pp. 587-594.
P.A. Taylor and S.D. Isard, Automatic Diphone Segmentation, Eurospeech 91, pp.709-711.
Kamp Yves G. C.
Pauws Stefan C.
Willems Leonardus F. W.
Hudspeth David R.
U.S. Philips Corporation
Wieland Susan