Method and apparatus for speech synthesis without prosody...

Reexamination Certificate

Details

C704S260000

active

07127396

ABSTRACT:
A speech synthesizer is provided that concatenates stored samples of speech units without modifying the prosody of the samples. The present invention is able to achieve a high level of naturalness in synthesized speech with a carefully designed training speech corpus by storing samples based on the prosodic and phonetic context in which they occur. In particular, some embodiments of the present invention limit the training text to those sentences that will produce the most frequent sets of prosodic contexts for each speech unit. Further embodiments of the present invention also provide a multi-tier selection mechanism for selecting a set of samples that will produce the most natural sounding speech.
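The abstract does not say how sentences are chosen for the training text. The following is a minimal sketch, not the patented procedure. It assumes each candidate sentence has already been annotated with (speech unit, prosodic context) pairs, and it uses a greedy set-cover heuristic to pick sentences until the most frequent contexts of every unit are covered. The names `frequent_contexts`, `select_training_text`, the `top_k` cutoff and the context labels are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical annotation: a (speech unit, prosodic context) pair,
# e.g. ("ma3", "phrase_final"). The labels are placeholders, not the patent's.
UnitContext = Tuple[str, str]


def frequent_contexts(
    sentences: Dict[str, List[UnitContext]], top_k: int
) -> Dict[str, Set[str]]:
    """For each unit, return its top_k most frequent prosodic contexts in the pool."""
    counts: Dict[str, Counter] = {}
    for pairs in sentences.values():
        for unit, ctx in pairs:
            counts.setdefault(unit, Counter())[ctx] += 1
    return {
        unit: {ctx for ctx, _ in c.most_common(top_k)}
        for unit, c in counts.items()
    }


def select_training_text(
    sentences: Dict[str, List[UnitContext]], top_k: int
) -> List[str]:
    """Greedily pick sentences until every frequent (unit, context) pair is covered."""
    wanted = {
        (unit, ctx)
        for unit, ctxs in frequent_contexts(sentences, top_k).items()
        for ctx in ctxs
    }
    chosen: List[str] = []
    remaining = dict(sentences)
    while wanted and remaining:
        # Pick the sentence covering the most still-uncovered targets
        # (ties go to the earliest sentence in the pool).
        best = max(remaining, key=lambda s: len(wanted & set(remaining[s])))
        gain = wanted & set(remaining.pop(best))
        if not gain:
            break  # no remaining sentence adds coverage
        chosen.append(best)
        wanted -= gain
    return chosen


if __name__ == "__main__":
    pool = {
        "s1": [("a", "final"), ("b", "initial")],
        "s2": [("a", "final")],
        "s3": [("a", "medial"), ("b", "initial"), ("c", "final")],
    }
    print(select_training_text(pool, top_k=1))  # -> ['s1', 's3']
```

Greedy set cover does not guarantee the smallest possible script, but it is a simple, widely used heuristic for coverage-driven corpus design; raising `top_k` trades a longer recording script for broader prosodic coverage per unit.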
