Method and apparatus for rapid acoustic unit selection from...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate


Details

C704S266000

active

06697780

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of Invention
The invention relates to methods and apparatus for synthesizing speech.
2. Description of Related Art
Rule-based speech synthesis is used for various types of speech synthesis applications including Text-To-Speech (TTS) and voice response systems. Typical rule-based speech synthesis techniques involve concatenating pre-recorded phonemes to form new words and sentences.
Previous concatenative speech synthesis systems create synthesized speech by using single stored samples for each phoneme in order to synthesize a phonetic sequence. A phoneme, or phone, is a small unit of speech sound that serves to distinguish one utterance from another. For example, in the English language, the phoneme /r/ corresponds to the letter “R” while the phoneme /t/ corresponds to the letter “T”. Synthesized speech created by this technique sounds unnatural and is usually characterized as “robotic” or “mechanical.”
More recently, speech synthesis systems started using large inventories of acoustic units, with many acoustic units representing variations of each phoneme. An acoustic unit is a particular instance, or realization, of a phoneme. Large numbers of acoustic units can all correspond to a single phoneme, each differing from the others in pitch, duration, and stress as well as various other qualities. While such systems produce a more natural sounding voice quality, they require a great deal of computational resources during operation to do so. Accordingly, there is a need for new methods and apparatus to provide natural voice quality in synthetic speech while reducing the computational requirements.
SUMMARY OF THE INVENTION
The invention provides methods and apparatus for speech synthesis by selecting recorded speech fragments, or acoustic units, from an acoustic unit database. To aid acoustic unit selection, a measure of the mismatch between pairs of acoustic units, or concatenation cost, is pre-computed and stored in a database. By using a concatenation cost database, great reductions in computational load are obtained compared to computing concatenation costs at run-time.
The concatenation cost database can contain the concatenation costs for a subset of all possible acoustic unit sequential pairs. Given that only a fraction of all possible concatenation costs are provided in the database, the situation can arise where the concatenation cost for a particular sequential pair of acoustic units is not found in the concatenation cost database. In such instances, either a default value is assigned to the sequential pair of acoustic units or the actual concatenation cost is derived.
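The lookup behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `make_cost_lookup`, the use of integer unit identifiers, and the in-memory dictionary standing in for the concatenation cost database are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical representation: each acoustic unit is identified by an integer,
# and a sequential pair is the tuple (left_unit, right_unit).
UnitPair = Tuple[int, int]


def make_cost_lookup(
    precomputed: Dict[UnitPair, float],
    default_cost: Optional[float] = None,
    compute_cost: Optional[Callable[[int, int], float]] = None,
) -> Callable[[int, int], float]:
    """Return a lookup function over a partial concatenation cost table.

    On a miss, the lookup either derives the actual cost (if compute_cost
    is given) or falls back to default_cost.
    """
    if default_cost is None and compute_cost is None:
        raise ValueError("need a default cost or a way to compute the cost")

    def lookup(left: int, right: int) -> float:
        cost = precomputed.get((left, right))
        if cost is not None:
            return cost  # hit: no run-time signal processing needed
        if compute_cost is not None:
            return compute_cost(left, right)  # miss: derive the actual cost
        return default_cost  # miss: assign the default value

    return lookup


# Usage: a table holding one pair, with a default for everything else.
lookup = make_cost_lookup({(1, 2): 0.25}, default_cost=10.0)
hit = lookup(1, 2)
miss = lookup(2, 1)
```

Because only the most likely pairs are stored, the table stays small, and the choice between the default value and on-demand computation trades accuracy on rare pairs against run-time cost.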
The concatenation cost database can be derived using statistical techniques which predict the acoustic unit sequential pairs most likely to occur in common speech. The invention provides a method for constructing a medium with an efficient concatenation cost database by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing the concatenation cost values on the medium.
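The construction method can be sketched in the same spirit. The example assumes the synthesizer has already been run over a large body of text and has produced, for each utterance, the sequence of acoustic unit identifiers it selected; `build_cost_database` and the `compute_cost` callback are hypothetical names, and writing the result to a storage medium is left out.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

UnitPair = Tuple[int, int]  # hypothetical (left_unit, right_unit) identifiers


def build_cost_database(
    unit_sequences: Iterable[Sequence[int]],
    compute_cost: Callable[[int, int], float],
) -> Dict[UnitPair, float]:
    """Collect concatenation costs for the unit pairs seen in synthesized speech.

    unit_sequences holds one sequence of selected unit ids per utterance.
    Each distinct adjacent pair is costed once and stored.
    """
    database: Dict[UnitPair, float] = {}
    for sequence in unit_sequences:
        # Adjacent units in the output are the sequential pairs that
        # actually occurred during synthesis.
        for left, right in zip(sequence, sequence[1:]):
            if (left, right) not in database:
                database[(left, right)] = compute_cost(left, right)
    return database


# Usage with a stand-in cost function (an assumption, not a real acoustic measure).
database = build_cost_database([[1, 2, 3], [2, 3, 4]], lambda a, b: float(b - a))
```

Pairs that never occur in the synthesized corpus are simply absent from the result, which is what keeps the database to a fraction of all possible sequential pairs.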


REFERENCES:
patent: 5870706 (1999-02-01), Alshawi
patent: 5913193 (1999-06-01), Huang et al.
patent: 5970460 (1999-10-01), Bunce et al.
patent: 6006181 (1999-12-01), Buhrke et al.
patent: 6173263 (2001-01-01), Conkie
patent: 6233544 (2001-05-01), Alshawi
patent: 6366883 (2002-04-01), Campbell et al.
patent: 6370522 (2002-04-01), Agarwal et al.
Hunt et al., “Unit Selection in a Concatenative Speech Synthesis System using a Large Speech Database,” 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, May 1996, pp. 373 to 376.*
Webopedia, definition of “hashing”, 1 page.*
TechTarget, definition of “hashing”, 2 pages.*
Beutnagel, Mohri, and Riley, “Rapid Unit Selection from a Large Speech Corpus for Concatenative Speech Synthesis” AT&T Labs Research, Florham Park, New Jersey, no publication date.
Robert Endre Tarjan and Andrew Chi-Chih Yao, “Storing a Sparse Table”, Communications of the ACM, vol. 22:11, pp. 606-611, Nov. 1979.
Y. Stylianou (1998) “Concatenative Speech Synthesis using a Harmonic plus Noise Model”, Workshop on Speech Synthesis, Jenolan Caves, NSW, Australia, Nov. 1998.
