Synthesis-based pre-selection of suitable units for...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C704S268000

Reexamination Certificate

active

06505158

ABSTRACT:

TECHNICAL FIELD
The present invention relates to synthesis-based pre-selection of suitable units for concatenative speech and, more particularly, to the utilization of a table containing many thousands of synthesized sentences for selecting units from a unit selection database.
BACKGROUND OF THE INVENTION
A current approach to concatenative speech synthesis is to use a very large database for recorded speech that has been segmented and labeled with prosodic and spectral characteristics, such as the fundamental frequency (F
0
) for voiced speech, the energy or gain of the signal, and the spectral distribution of the signal (i.e., how much of the signal is present at any given frequency). The database contains multiple instances of speech sounds. This multiplicity permits the possibility of having units in the database that are much less stylized than would occur in a diphone database (a “diphone” being defined as the second half of one phoneme followed by the initial half of the following phoneme, a diphone database generally containing only one instance of any given diphone). Therefore, the possibility of achieving natural speech is enhanced with the “large database” approach.
For good quality synthesis, this database technique relies on being able to select the “best” units from the database—that is, the units that are closest in character to the prosodic specification provided by the speech synthesis system, and that have a low spectral mismatch at the concatenation points between phonemes. The “best” sequence of units may be determined by associating a numerical cost in two different ways. First, a “target cost” is associated with the individual units in isolation, where a lower cost is associated with a unit that has characteristics (e.g., F
0
, gain, spectral distribution) relatively close to the unit being synthesized, and a higher cost is associated with units having a higher discrepancy with the unit being synthesized. A second cost, referred to as the “concatenation cost”, is associated with how smoothly two contiguous units are joined together. For example, if the spectral mismatch between units is poor, there will be a higher concatenation cost.
Thus, a set of candidate units for each position in the desired sequence can be formulated, with associated target costs and concatenative costs. Estimating the best (lowest-cost) path through the network is then performed using, for example, a Viterbi search. The chosen units may then concatenated to form one continuous signal, using a variety of different techniques.
While such database-driven systems may produce a more natural sounding voice quality, to do so they require a great deal of computational resources during the synthesis process. Accordingly, there remains a need for new methods and systems that provide natural voice quality in speech synthesis while reducing the computational requirements.
SUMMARY OF THE INVENTION
The need remaining in the prior art is addressed by the present invention, which relates to synthesis-based pre-selection of suitable units for concatenative speech and, more particularly, to the utilization of a table containing many thousands of synthesized sentences as a guide to selecting units from a unit selection database.
In accordance with the present invention, an extensive database of synthesized speech is created by synthesizing a large number of sentences (large enough to create millions of separate phonemes, for example). From this data, a set of all triphone sequences is then compiled, where a “triphone” is defined as a sequence of three phonemes—or a phoneme “triplet”. A list of units (phonemes) from the speech synthesis database that have been chosen for each context is then tabulated.
During the actual text-to-speech synthesis process, the tabulated list is then reviewed for the proper context and these units (phonemes) become the candidate units for synthesis. A conventional cost algorithm, such as a Viterbi search, can then be used to ascertain the best choices from the candidate list for the speech output. If a particular unit to be synthesized does not appear in the created table, a conventional speech synthesis process can be used, but this should be a rare occurrence,
Other and further aspects of the present invention will become apparent during the course of the following discussion and by reference to the accompanying drawings.


REFERENCES:
patent: 5384893 (1995-01-01), Hutchins
patent: 5905972 (1999-05-01), Huang et al.
patent: 5913193 (1999-06-01), Huang et al.
patent: 5913194 (1999-06-01), Karaali et al.
patent: 5937384 (1999-08-01), Huang et al.
patent: 6163769 (2000-12-01), Acero et al.
patent: 6173263 (2001-01-01), Conkie
patent: 6253182 (2001-06-01), Acero
patent: 6304846 (2001-10-01), George et al.
patent: 6366883 (2002-04-01), Campbell et al.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Synthesis-based pre-selection of suitable units for... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Synthesis-based pre-selection of suitable units for..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Synthesis-based pre-selection of suitable units for... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3040138

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.