Method and apparatus for speech synthesis without prosody...
Reexamination Certificate
2005-12-20
2005-12-20
Abebe, Daniel (Department: 2655)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
C704S260000
Reexamination Certificate
active
06978239
ABSTRACT:
A speech synthesizer is provided that concatenates stored samples of speech units without modifying the prosody of the samples. The present invention achieves a high level of naturalness in synthesized speech by using a carefully designed training speech corpus and storing samples according to the prosodic and phonetic context in which they occur. In particular, some embodiments of the present invention limit the training text to those sentences that produce the most frequent sets of prosodic contexts for each speech unit. Further embodiments provide a multi-tier selection mechanism for choosing the set of samples that will produce the most natural-sounding speech.
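The abstract states that some embodiments limit the training text to sentences that produce the most frequent prosodic contexts for each speech unit, but it does not say how those sentences are picked. The Python sketch below assumes a greedy set-cover heuristic, which is an illustrative choice and not necessarily the patent's procedure. The context encoding (a tuple of string features per unit occurrence), the parameter `top_n`, and the functions `frequent_contexts` and `select_sentences` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical encoding (an assumption, not taken from the patent):
# a prosodic/phonetic context is a tuple of string features, and each
# training sentence is annotated with the (unit, context) pairs it yields.
Unit = str
Context = Tuple[str, ...]
Key = Tuple[Unit, Context]


def frequent_contexts(corpus: Dict[str, List[Key]], top_n: int) -> Set[Key]:
    """Return, for every unit, its top_n most frequent contexts in the corpus."""
    counts = Counter(key for keys in corpus.values() for key in keys)
    per_unit: Dict[Unit, List[Tuple[int, Key]]] = {}
    for key, n in counts.items():
        per_unit.setdefault(key[0], []).append((n, key))
    targets: Set[Key] = set()
    for items in per_unit.values():
        # Most frequent first; ties broken by the key itself for determinism.
        items.sort(key=lambda item: (-item[0], item[1]))
        targets.update(key for _, key in items[:top_n])
    return targets


def select_sentences(corpus: Dict[str, List[Key]], top_n: int) -> List[str]:
    """Greedily pick sentences until every target (unit, context) pair is covered."""
    uncovered = frequent_contexts(corpus, top_n)
    remaining = sorted(corpus)  # sorted so that ties resolve deterministically
    chosen: List[str] = []
    while uncovered and remaining:
        # max() returns the first maximal element, i.e. the alphabetically first tie.
        best = max(remaining, key=lambda s: len(uncovered & set(corpus[s])))
        gain = uncovered & set(corpus[best])
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    corpus = {
        "s1": [("a", ("initial",)), ("b", ("final",))],
        "s2": [("a", ("initial",)), ("a", ("medial",))],
        "s3": [("b", ("final",)), ("a", ("medial",))],
    }
    print(select_sentences(corpus, top_n=1))  # ['s1']
    print(select_sentences(corpus, top_n=2))  # ['s1', 's2']
```

Greedy set cover is a common approximation for this kind of coverage problem because exact minimum cover is NP-hard. A real corpus designer might also weight contexts by frequency or cap the total recording length, which this sketch does not attempt.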
REFERENCES:
patent: 5146405 (1992-09-01), Church
patent: 5384893 (1995-01-01), Hutchins
patent: 5732395 (1998-03-01), Alexander Silverman
patent: 5839105 (1998-11-01), Ostendorf et al.
patent: 5890117 (1999-03-01), Silverman
patent: 5905972 (1999-05-01), Huang et al.
patent: 6064960 (2000-05-01), Bellegarda et al.
patent: 6076060 (2000-06-01), Lin et al.
patent: 6185533 (2001-02-01), Holm et al.
patent: 6230131 (2001-05-01), Kuhn et al.
patent: 6401060 (2002-06-01), Critchlow et al.
patent: 6665641 (2003-12-01), Coorman et al.
patent: 6708152 (2004-03-01), Kivimaki
patent: 6751592 (2004-06-01), Shiga
patent: 6829578 (2004-12-01), Huang et al.
patent: 2002/0072908 (2002-06-01), Case et al.
patent: 2002/0103648 (2002-08-01), Case et al.
patent: 2002/0152073 (2002-10-01), DeMoortel et al.
patent: 0 984 426 (2000-03-01), None
Huang X et al., “Recent Improvements on Microsoft's Trainable Text-To-Speech System-Whistler,” Acoustics, Speech and Signal Processing, 1997, pp. 959-962.
Hunt A et al., “Unit Selection in a Concatenative Speech Synthesis System Using a Large Speech Database,” IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, pp. 373-376.
Tien Ying Fung et al., “Concatenating Syllables for Response Generation in Spoken Language Applications,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2000, pp. 933-936.
Fu-Chiang Chou et al., “A Chinese Text-To-Speech System Based on Part-of-Speech Analysis, Prosodic Modeling and Non-Uniform Units,” Acoustics, Speech, and Signal Processing, 1997, pp. 923-926.
Bigorgne D. et al., “Multilingual PSOLA Text-To-Speech System,” Statistical Signal and Array Processing, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1993, pp. 187-190.
Nakajima S et al., “Automatic Generation of Synthesis Units Based on Context Oriented Clustering,” International Conference on Acoustics, Speech and Signal Processing, 1988, pp. 659-662.
Black A W et al., “Optimizing Selection of Units from Speech Databases for Concatenative Synthesis,” 4th European Conference on Speech Communication and Technology Eurospeech, 1995, pp. 581-584.
Copy of European Search Report Application No.: EP 01 12 8765.
P.B. Mareuil and B. Soulage, “Input/output normalization and linguistic analysis for a multilingual text-to-speech Synthesis System,” Proc. of 4th ISCA workshop on speech synthesis, Scotland, 2001.
http://www.research.att.com/projects/tts/.
D.H. Klatt, “The Klattalk text-to-speech conversion system,” Proc. of ICASSP '82, pp. 1589-1592, 1982.
H. Fujisaki, K. Hirose, N. Takahashi and H. Morikawa, “Acoustic characteristics and the underlying rules of intonation of the common Japanese used by radio and TV announcers,” Proc. of ICASSP '86, pp. 2039-2042, 1986.
K.N. Ross and M. Ostendorf, “A dynamical system model for generating fundamental frequency for speech synthesis,” IEEE Transactions on Speech and Audio Processing, vol. 7, No. 3, pp. 295-309, 1999.
J.R. Bellegarda, K. Silverman, K. Lenzo, and V. Anderson, “Statistical prosodic modeling: from corpus design to parameter estimation,” IEEE Transactions on Speech and Audio Processing, vol. 9, No. 1, pp. 52-66, 2001.
S. Chen, S. Hwang and Y. Wang, “An RNN-based prosodic information synthesizer for Mandarin text-to-speech,” IEEE Transactions on Speech and Audio Processing, vol. 6, No. 3, pp. 226-239, 1998.
M. Chu, H. Peng, H. Yang and E. Chang, “Selecting non-uniform units from a very large corpus for concatenative speech synthesizer,” Proc. of ICASSP '2001, Salt Lake City, 2001.
M. Chu, H. Peng, H. Yang and E. Chang, “Selecting non-uniform units from a very large corpus for concatenative speech synthesizer,”Proc. of ICASSP '2001, Salt Lake City, 2001.
E. Moulines and F. Charpentier, “Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones,” Speech Communication vol. 9, pp. 453-467, 1990.
Y. Stylianou, T. Dutoit, and J. Schroeter, “Diphone concatenation using a harmonic plus noise model of speech,” Proc. of Eurospeech '97, pp. 613-616, Rhodes, 1997.
X.D. Huang, A. Acero, J. Adcock, et al., “Whistler: a trainable text-to-speech system,” Proc. of ICSLP '96, Philadelphia, 1996.
R.E. Donovan and E.M. Eide, “The IBM trainable speech synthesis system,” Proc. of ICSLP '98, Sydney, 1998.
H. Peng, Y. Zhao and M. Chu, “Perpetually optimizing the cost function for unit selection in a TTS system with one single run of MOS evaluation,” Proc. of ICSLP '2002, Denver, 2002.
M. Chu and H. Peng, “An objective measure for estimating MOS of synthesized speech,” Proc. of Eurospeech '2001, Aalborg, 2001.
http://www.microsoft.com/speech/techinfo/compliance/.
Huang, X., Luo, Z. and Tang, J., “A Quick Method for Chinese Word Segmentation,” Intelligent Processing Systems, vol. 2, pp. 1773-1776 (1997).
Wong, P. and Chan, C., “Chinese Word Segmentation Based on Maximum Matching and Word Binding Force,” COLING'96, Copenhagen (1996).
Wang, W.J., Campbell, W.N., Iwahashi, N. and Sagisaka, Y., “Tree-Based Unit Selection for English Speech Synthesis,” ICASSP'93, vol. 2, pp. 191-194 (1993).
Hon, H., Acero, A., Huang, S., Liu, J. and Plumpe, M., “Automated Generation of Synthesis Units for Trainable Text-to-Speech Systems,” ICASSP'98, vol. 1, pp. 293-296 (1998).
Black, A. and Campbell, N., “Unit Selection in a Concatenative Speech Synthesis System Using a Large Speech Database,” ICASSP'96, pp. 373-376 (1996).
Chu, M., Tang, D., Si, H., Tian, Z. and Lu, S., “Research on Perception of Juncture Between Syllables in Chinese,” Chinese Journal of Acoustics, vol. 17, No. 2, pp. 143-152.
Wang, et al. “Tree-Based Unit Selection for English Speech Synthesis,” ICASSP'93, vol. 2, pp. 191-194 (1993).
Chu Min
Peng Hu
Abebe Daniel
Magee Theodore M.
Microsoft Corporation
Westman Champlin & Kelly P.A.