Method and apparatus for identifying prosodic word boundaries

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S252000

active

09850526

ABSTRACT:
A method and computer-readable medium are provided that identify prosodic word boundaries for a text. If the text is unsegmented, it is first segmented into lexical words. The lexical words are then converted into prosodic words using an annotated lexicon to divide large lexical words into smaller words and a model to combine the lexical words and/or the smaller words into larger prosodic words. The boundaries of the resulting prosodic words are used to set the prosody for the synthesized speech.
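The pipeline described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the patented method: the lexicon contents, the greedy maximum-matching segmenter, the whitespace check for "unsegmented" text, and the `short_word_rule` merge rule are all assumptions standing in for the annotated lexicon and the combining model, which the abstract does not specify.

```python
from typing import Callable, Dict, List

# Hypothetical annotated lexicon: each lexical word maps to the smaller
# words it may be divided into (a one-item list means "leave as is").
LEXICON: Dict[str, List[str]] = {
    "a": ["a"],
    "text": ["text"],
    "to": ["to"],
    "speech": ["speech"],
    "system": ["system"],
    "texttospeech": ["text", "to", "speech"],
}


def segment(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Greedy forward maximum matching (an assumed segmenter).

    Characters not covered by the lexicon become one-character words.
    """
    max_len = max(len(w) for w in lexicon)
    words: List[str] = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])
            i += 1
    return words


def split_large_words(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Divide large lexical words using the lexicon annotations."""
    return [part for w in words for part in lexicon.get(w, [w])]


def merge_units(
    units: List[str], should_merge: Callable[[List[str], str], bool]
) -> List[List[str]]:
    """Combine adjacent units into prosodic words using a pluggable model."""
    groups: List[List[str]] = []
    for unit in units:
        if groups and should_merge(groups[-1], unit):
            groups[-1].append(unit)
        else:
            groups.append([unit])
    return groups


def short_word_rule(group: List[str], next_unit: str) -> bool:
    """Toy stand-in for the model: attach the next unit to a short word."""
    return len(group[-1]) <= 2


def prosodic_words(text: str, lexicon: Dict[str, List[str]] = LEXICON) -> List[List[str]]:
    # Assumption: text without spaces is treated as unsegmented.
    words = text.split() if " " in text else segment(text, lexicon)
    return merge_units(split_large_words(words, lexicon), short_word_rule)


print(prosodic_words("atexttospeechsystem"))
```

Each inner list in the result is one prosodic word, and the gaps between the lists are the boundaries a synthesizer would use when setting prosody. A real system would replace `short_word_rule` with a trained model.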

REFERENCES:
patent: 5146405 (1992-09-01), Church
patent: 5384893 (1995-01-01), Hutchins
patent: 5592585 (1997-01-01), Van Coile et al.
patent: 5727120 (1998-03-01), Van Coile et al.
patent: 5732395 (1998-03-01), Silverman et al.
patent: 5839105 (1998-11-01), Ostendorf et al.
patent: 5890117 (1999-03-01), Silverman
patent: 5905972 (1999-05-01), Huang et al.
patent: 6064960 (2000-05-01), Bellegarda et al.
patent: 6076060 (2000-06-01), Lin et al.
patent: 6101470 (2000-08-01), Eide et al.
patent: 6185533 (2001-02-01), Holm et al.
patent: 6230131 (2001-05-01), Kuhn et al.
patent: 6401060 (2002-06-01), Critchlow et al.
patent: 6499014 (2002-12-01), Chihara
patent: 6665641 (2003-12-01), Coorman et al.
patent: 6708152 (2004-03-01), Kivimaki
patent: 6751592 (2004-06-01), Shiga
patent: 6829578 (2004-12-01), Huang et al.
patent: 7010489 (2006-03-01), Lewis et al.
patent: 2002/0072908 (2002-06-01), Case et al.
patent: 2002/0103648 (2002-08-01), Case et al.
patent: 2002/0152073 (2002-10-01), DeMoortel et al.
patent: 0 984 426 (2000-03-01), None
Huang, X., Luo, Z. and Tang, J., “A Quick Method for Chinese Word Segmentation,” Intelligent Processing Systems, vol. 2, pp. 1773-1776 (1997).
Wong, P. and Chan, C., “Chinese Word Segmentation Based on Maximum Matching and Word Binding Force,” COLING'96, Copenhagen (1996).
Wang, W.J., Campbell, W.N., Iwahashi, N. and Sagisaka, Y., “Tree-Based Unit Selection for English Speech Synthesis,” ICASSP'93, vol. 2, pp. 191-194 (1993).
Hon, H., Acero, A., Huang, S., Liu, J. and Plumpe, M., “Automated Generation of Synthesis Units for Trainable Text-to-Speech Systems,” ICASSP'98, vol. 1, pp. 293-296 (1998).
Black, A. and Campbell, N., “Unit Selection in a Concatenative Speech Synthesis System Using a Large Speech Database,” ICASSP'96, pp. 373-376 (1996).
Chu, M., Tang, D., Si, H., Tian, Z. and Lu, S., “Research on Perception of Juncture Between Syllables in Chinese,” Chinese Journal of Acoustics, vol. 17, No. 2, pp. 143-152.
Huang X et al., “Recent Improvements on Microsoft's Trainable Text-To-Speech System-Whistler,” Acoustics, Speech and Signal Processing, 1997, pp. 959-962.
Hunt A et al., “Unit Selection in a Concatenative Speech Synthesis System Using a Large Speech Database,” IEEE International Conference on Acoustics, Speech and Signal Processing, 1996, pp. 373-376.
Tien Ying Fung et al., “Concatenating Syllables for Response Generation in Spoken Language Applications,” IEEE International Conference on Acoustics, Speech and Signal Processing, 2000, pp. 933-936.
Fu-Chiang Chou et al., “A Chinese Text-To-Speech System Based on Part-of-Speech Analysis, Prosodic Modeling and Non-Uniform Units,” Acoustics, Speech, and Signal Processing, 1997, pp. 923-926.
Bigorgne D. et al., “Multilingual PSOLA Text-To-Speech System,” Statistical Signal and Array Processing, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1993, pp. 187-190.
Nakajima S et al., “Automatic Generation of Synthesis Units Based on Context Oriented Clustering,” International Conference on Acoustics, Speech and Signal Processing, 1988, pp. 659-662.
Black A W et al., “Optimising Selection of Units from Speech Databases for Concatenative Synthesis,” 4th European Conference on Speech Communication and Technology Eurospeech, 1995, pp. 581-584.
European Search Report Application No. EP 01 12 8765.
P.B. Mareuil and B. Soulage, “Input/output normalization and linguistic analysis for a multilingual text-to-speech Synthesis System,” Proc. of 4th ISCA workshop on speech synthesis, Scotland, 2001.
http://www.research.att.com/projects/tts/.
D.H. Klatt, “The Klattalk text-to-speech conversion system,” Proc. of ICASSP '82, pp. 1589-1592, 1982.
H. Fujisaki, K. Hirose, N. Takahashi and H. Morikawa, “Acoustic characteristics and the underlying rules of intonation of the common Japanese used by radio and TV announcers,” Proc. of ICASSP '86, pp. 2039-2042, 1986.
K.N. Ross and M. Ostendorf, “A dynamical system model for generating fundamental frequency for speech synthesis,” IEEE transactions on speech and audio processing, vol. 7, No. 3, pp. 295-309, 1999.
J.R. Bellegarda, K. Silverman, K. Lenzo, and V. Anderson, “Statistical prosodic modeling: from corpus design to parameter estimation,” IEEE transactions on speech and audio processing, vol. 9, No. 1, pp. 52-66, 2001.
S. Chen, S. Hwang and Y. Wang, “An RNN-based prosodic information synthesizer for Mandarin text-to-speech,” IEEE transactions on speech and audio processing, vol. 6, No. 3, pp. 226-239, 1998.
M. Chu, H. Peng, H. Yang and E. Chang, “Selecting non-uniform units from a very large corpus for concatenative speech synthesizer,” Proc. of ICASSP '2001, Salt Lake City, 2001.
E. Moulines and F. Charpentier, “Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones,” Speech Communication vol. 9, pp. 453-467, 1990.
Y. Stylianou, T. Dutoit, and J. Schroeter, “Diphone concatenation using a harmonic plus noise model of speech,” Proc. of Eurospeech '97, pp. 613-616, Rhodes, 1997.
X.D. Huang, A. Acero, J. Adcock, et al., “Whistler: a trainable text-to-speech system,” Proc. of ICSLP '96, Philadelphia, 1996.
R.E. Donovan and E.M. Eide, “The IBM Trainable speech synthesis system,” Proc. of ICSLP '98, Sydney, 1998.
H. Peng, Y. Zhao and M. Chu, “Perceptually optimizing the cost function for unit selection in a TTS system with one single run of MOS evaluation,” Proc. of ICSLP '2002, Denver, 2002.
M. Chu and H. Peng, “An objective measure for estimating MOS of synthesized speech,” Proc. of Eurospeech '2001, Aalborg, 2001.
http://www.microsoft.com/speech/techinfo/compliance/.
