Determining an adequate representative sound using two...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Synthesis

Reexamination Certificate


C704S266000, C704S256000

active

06430532

ABSTRACT:

BACKGROUND OF THE INVENTION
Field of the Invention
The invention relates to a method and a configuration for producing a sound.
Such a configuration and such a method are known from R. E. Donovan et al.: “Automatic Speech Synthesiser Parameter Estimation using HMMs”, IEEE 1995, pages 640-643; hereinafter “Donovan et al.” This publication discloses the construction of a decision tree, which in turn permits cluster formation for modeling triphones. For this purpose, a series of yes/no questions relating directly to the phonetic context is used. Each answer opens a further subtree. In this way, training data from naturally spoken speech are distributed over the branches and finally the leaves of the decision tree.
The decision tree is then used to determine which leaves to use in order to obtain hidden Markov models for all possible triphones not covered by the training data.
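The tree lookup can be sketched in Python. This sketch assumes that a triphone is written as a (left context, center phone, right context) tuple and that each leaf stores an identifier for a clustered HMM. The phone classes VOWELS and PLOSIVES, the questions and the leaf names are all hypothetical illustrations and are not taken from Donovan et al.

```python
from dataclasses import dataclass
from typing import Callable, Union

# A triphone is written as (left context, center phone, right context).
Triphone = tuple[str, str, str]

# Hypothetical phone classes used by the example questions.
VOWELS = {"a", "e", "i", "o", "u"}
PLOSIVES = {"p", "b", "t", "d", "k", "g"}


@dataclass
class Leaf:
    model_id: str  # identifier of the clustered HMM stored at this leaf


@dataclass
class Question:
    text: str                         # the yes/no question, for readability
    test: Callable[[Triphone], bool]  # answers the question for a triphone
    yes: "Union[Leaf, Question]"      # subtree for the answer "yes"
    no: "Union[Leaf, Question]"       # subtree for the answer "no"


def find_leaf(node, triphone: Triphone) -> Leaf:
    """Answer the questions from the root downward until a leaf is reached."""
    while isinstance(node, Question):
        node = node.yes if node.test(triphone) else node.no
    return node


# Hypothetical tree for triphones whose center phone is /a/.
tree = Question(
    "Is the left context a plosive?",
    lambda t: t[0] in PLOSIVES,
    yes=Leaf("a_after_plosive"),
    no=Question(
        "Is the right context a vowel?",
        lambda t: t[2] in VOWELS,
        yes=Leaf("a_before_vowel"),
        no=Leaf("a_other"),
    ),
)

# A triphone never seen in training still reaches a leaf, i.e. a trained model.
print(find_leaf(tree, ("h", "a", "b")).model_id)  # a_other
```

Every triphone lands on exactly one leaf, so only that single leaf's model is used. This is the limitation of Donovan et al. that is pointed out further below.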
Hidden Markov models (HMMs) for modeling sounds are known from E. G. Schukat-Talamazzini: Automatische Spracherkennung: Grundlagen, statistische Modelle und effiziente Algorithmen [Automatic Speech Recognition: Principles, Statistical Models and Efficient Algorithms], Vieweg & Sohn Verlagsgesellschaft mbH, Brunswick/Wiesbaden 1995, pages 125-139. When a word is spoken, its constituent sounds are realized with variable duration and varying spectral composition. Depending on the rate and rhythm of speech, each phonetic segment of the utterance is allotted an unpredictable number of feature vectors. Besides its phonetic content, each vector also carries information relating to the speaker, the ambient conditions and slurred articulation, which makes the sounds considerably harder to identify.
These conditions can be modeled in simplified form by a two-stage process, as FIG. 1 shows for the example of the word “haben” [have]. For the phonemes of the word, a corresponding number of states 102 to 106 are reserved in the model. To produce speech, these states are traversed in the direction of the arrow 101. At every time step, the process can either remain in the current state or transfer to the succeeding state. The system behaves randomly and is governed by the depicted transfer probabilities 107 to 111. For example, the state 103 belonging to the phoneme /a/ is occupied over a number of successive brief analysis intervals (more than ten on average), whereas realizations of the plosive /b/ take less time.
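The first stage, the time distortion, can be illustrated with a small numerical sketch. The sketch assumes a left-to-right model with one state per phoneme, in which each state either repeats or passes to its successor. Under that assumption, the time spent in a state with self-loop probability a_ii is geometrically distributed, with mean 1 / (1 - a_ii). The self-loop values below and the letter labels for the states are hypothetical; FIG. 1 does not give numbers for them.

```python
import numpy as np

# Hypothetical self-loop probabilities a_ii for the five states 102..106 of
# "haben", labelled here by the letters of the word. Not taken from FIG. 1.
LABELS = ["h", "a", "b", "e", "n"]
SELF_LOOPS = [0.6, 0.92, 0.5, 0.7, 0.7]


def transition_matrix(loops):
    """Left-to-right HMM: each state either repeats or moves to its successor."""
    n = len(loops)
    A = np.zeros((n + 1, n + 1))  # index n is an absorbing end state
    for i, p in enumerate(loops):
        A[i, i] = p          # remain in the current state
        A[i, i + 1] = 1 - p  # transfer to the succeeding state
    A[n, n] = 1.0
    return A


def expected_durations(A):
    """Dwell time in state i is geometric, so E[d_i] = 1 / (1 - a_ii)."""
    return 1.0 / (1.0 - np.diag(A)[:-1])


def sample_durations(A, rng):
    """Run the random process once and count the time steps spent per state."""
    n = A.shape[0] - 1
    counts = np.zeros(n, dtype=int)
    state = 0
    while state < n:
        counts[state] += 1
        state = rng.choice(n + 1, p=A[state])
    return counts


A = transition_matrix(SELF_LOOPS)
# Expected dwell times: h ~2.5, a ~12.5, b = 2.0, e ~3.3, n ~3.3 time steps
print(dict(zip(LABELS, expected_durations(A).round(1).tolist())))

rng = np.random.default_rng(0)
runs = np.array([sample_durations(A, rng) for _ in range(1000)])
print(runs.mean(axis=0))  # sample means lie close to the expected dwell times
```

With a_ii = 0.92, the /a/ state lasts 12.5 steps on average, and with 0.5 the plosive /b/ lasts only 2. This is the kind of difference the text describes.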
While the first stage of the random process models the time distortion of different pronunciation variants, a second stage captures spectral variations. Each state of the word model is linked to a statistical output function that weights the alternative phonetic realizations. In the example of FIG. 1, the phoneme /a/ may be produced not only by the phone class 113 that actually matches it but also by the phone class 114, which is permitted with a positive probability (here: 0.1). The phone class 118 is likewise permitted for producing the phoneme / with a probability of 0.3. The formalism also allows an optional sound elision to be described. This is expressed by the “bridging” 119 of the state 105 through a direct transfer between the states 104 and 106. By way of example, the bridging is given a probability of 0.2.
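Both stages combine into a single probability for a given state path and a given sequence of phone classes. That probability is the product of the transfer probabilities along the path and the output probabilities at each state. The Python sketch below takes two values from the example: 0.1 for phone class 114 in state 103 and 0.2 for the bridging 104 to 106. Everything else is an assumption. This includes the value 0.9 for class 113 (taken as the remainder), the start in state 102, all other transfer probabilities, and the class names "H", "B", "E" and "N".

```python
import math

# Transfer probabilities between the states of FIG. 1 (keys are reference
# numerals). Only the bridging value 104 -> 106 = 0.2 comes from the example;
# all other numbers are hypothetical.
TRANS = {
    "102": {"102": 0.6, "103": 0.4},
    "103": {"103": 0.9, "104": 0.1},
    "104": {"104": 0.3, "105": 0.5, "106": 0.2},  # 106 reached via bridging 119
    "105": {"105": 0.7, "106": 0.3},
    "106": {"106": 1.0},
}

# Output functions: probability of each phone class per state. Class 114 in
# state 103 uses the example value 0.1; the rest is hypothetical.
EMIT = {
    "102": {"H": 1.0},
    "103": {"113": 0.9, "114": 0.1},
    "104": {"B": 1.0},
    "105": {"E": 1.0},
    "106": {"N": 1.0},
}


def joint_probability(path, classes, start="102"):
    """P(state path, phone classes): product of transfer and output terms."""
    if not path or path[0] != start or len(path) != len(classes):
        return 0.0
    prob = EMIT[path[0]].get(classes[0], 0.0)
    for prev, cur, cls in zip(path, path[1:], classes[1:]):
        prob *= TRANS[prev].get(cur, 0.0) * EMIT[cur].get(cls, 0.0)
    return prob


full = joint_probability(["102", "103", "104", "105", "106"],
                         ["H", "113", "B", "E", "N"])
bridged = joint_probability(["102", "103", "104", "106"],
                            ["H", "113", "B", "N"])
print(full, bridged)  # approximately 0.0054 and 0.0072
```

The elided variant, which skips state 105, is just another path through the same model. No separate pronunciation model is needed for it.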
The transfer probabilities of the hidden Markov model can be determined from training data. Once fully trained, the HMM represents a blueprint for producing sequences of sounds (cf. Schukat-Talamazzini, pages 127-139). One method of training the HMM is the Baum-Welch algorithm.
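The text only names the Baum-Welch algorithm, so the following is a generic sketch rather than the training procedure used here. It is a minimal Python/NumPy version of one Baum-Welch (EM) re-estimation step for an HMM with discrete output symbols. Discrete outputs are an assumption made for brevity, since HMMs for speech usually operate on feature vectors. The forward and backward passes are scaled to avoid numerical underflow, and the model and observation sequence in the usage part are hypothetical.

```python
import numpy as np


def forward_backward(pi, A, B, obs):
    """Scaled forward and backward passes for a discrete-output HMM."""
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    c = np.zeros(T)  # scaling factors; their product is P(obs | model)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / c[t + 1]
    return alpha, beta, c


def baum_welch_step(pi, A, B, obs):
    """One EM re-estimation step; returns new parameters and old log-likelihood."""
    obs = np.asarray(obs)
    alpha, beta, c = forward_backward(pi, A, B, obs)
    gamma = alpha * beta  # P(state at t | obs); each row sums to 1
    # xi[t, i, j] = P(state i at t, state j at t + 1 | obs)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, np.log(c).sum()


# Hypothetical 3-state left-to-right model with 3 output symbols.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.7, 0.3, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
obs = [0, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 2]  # hypothetical symbol sequence

history = []
for _ in range(10):
    pi, A, B, ll = baum_welch_step(pi, A, B, obs)
    history.append(ll)
print(np.round(history, 3))  # the log-likelihood never decreases
```

Transitions that start at zero stay at zero, so the left-to-right topology of the word model is preserved during training.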
A disadvantage of the method described in Donovan et al., however, is that only the leaf found in the decision tree in each case is used for sound modeling.
SUMMARY OF THE INVENTION
It is accordingly an object of the invention to provide a method and configuration for determining a representative sound, a method for synthesizing speech, and a method for speech processing. These are intended to overcome the above-mentioned disadvantages of the heretofore-known devices of this general type. When a representative sound is determined from a large number of sounds, they take into account not only a structure devised according to predetermined criteria but also a characteristic state criterion of this structure.
With the foregoing and other objects in view, there is provided, in accordance with the invention, a method for determining a representative sound based on a structure. The first step of the method is forming, from a sound, a structure having a characteristic state criterion. The next step is providing a set of sound models, each sound model having a representative with a plurality of quality criteria. The next step is determining, in the structure, a first sound model from the set of sound models that matches a first quality criterion. The next step is determining a second sound model from the set of sound models depending on the characteristic state criterion of the structure. The next step is forming an overall quality criterion for each representative by assessing the representatives of the first and the second sound model with regard to a second quality criterion in addition to the first quality criterion. The final step is determining, as the representative sound, a representative whose overall quality criterion is adequate with regard to the first and second quality criteria.
In the method for determining a representative sound based on a structure that includes a set of sound models, each sound model has at least one representative for the modeled sound. In the structure, a first sound model that matches with regard to a first quality criterion is determined from the set of sound models. Depending on a characteristic state criterion of the structure, at least one second sound model is determined from the set of sound models. The representatives of the first sound model and of the at least one second sound model are assessed with regard to a second quality criterion in addition to the first quality criterion. From these representatives, the at least one representative that has an adequate overall quality criterion with regard to the first and second quality criteria is determined as the representative sound.
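The final selection step can be sketched by pooling the representatives of the first and second sound models and keeping the one with the best overall quality criterion. In this Python sketch, the meaning of the two criteria (q1 and q2), the weighted-sum combination rule, the weight parameter and all names and scores are hypothetical. The text does not say how the two criteria are combined or how "adequate" is measured; taking the maximum is only one possible reading.

```python
from dataclasses import dataclass, field


@dataclass
class Representative:
    name: str
    q1: float  # first quality criterion, e.g. fit to the target context
    q2: float  # second quality criterion (hypothetical, e.g. fit at the joins)


@dataclass
class SoundModel:
    name: str
    representatives: list[Representative] = field(default_factory=list)


def overall_quality(rep: Representative, weight: float) -> float:
    """Hypothetical combination rule: weighted sum of both criteria."""
    return (1.0 - weight) * rep.q1 + weight * rep.q2


def representative_sound(first: SoundModel, seconds: list[SoundModel],
                         weight: float = 0.5) -> Representative:
    """Pool the representatives of the first and second sound models and
    return the one with the best overall quality criterion."""
    pool = [r for model in [first, *seconds] for r in model.representatives]
    return max(pool, key=lambda r: overall_quality(r, weight))


first = SoundModel("a_h_b", [Representative("r1", q1=0.9, q2=0.2),
                             Representative("r2", q1=0.8, q2=0.5)])
second = SoundModel("a_h_p", [Representative("r3", q1=0.6, q2=0.9)])

print(representative_sound(first, [second]).name)              # r3
print(representative_sound(first, [second], weight=0.0).name)  # r1
```

With weight=0.0, only the first criterion counts, and r1 from the first sound model wins. Giving the second criterion weight lets r3, a representative of a second sound model, be chosen instead. Being able to choose beyond the first model is the point of including second sound models.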
Following the order of the structure, a search is conducted within it for a sound model that matches the sound to be produced. Here, “matching” refers to the first quality criterion, which in particular is predetermined by the structure.
The structure may be configured as a tree structure, preferably as a binary tree. Such a tree structure has nodes (which embody the sound models), branches (which subdivide the sound models hierarchically according to the criteria on which the structure is based) and leaves (nodes from which no further branch extends).
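As a minimal Python sketch of such a binary tree, every node can be given a sound model, with inner nodes holding the more general models. Each inner node then carries a yes/no criterion that selects one of its two branches. The search follows the branches for the sound's context and returns the path from the root to the leaf reached. The context dictionary, the questions and the model names are hypothetical, and the tree is assumed to be strictly binary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TreeNode:
    model: str  # sound model embodied by this node
    question: Optional[Callable[[dict], bool]] = None  # criterion for the split
    yes: Optional[TreeNode] = None  # branch taken if the criterion holds
    no: Optional[TreeNode] = None   # branch taken otherwise

    @property
    def is_leaf(self) -> bool:
        # A leaf is a node from which no further branch extends.
        return self.yes is None and self.no is None


def search(root: TreeNode, context: dict) -> list[TreeNode]:
    """Follow the branches for the given context; return the root-to-leaf path.
    Assumes every inner node has a question and both branches."""
    path = [root]
    node = root
    while not node.is_leaf:
        node = node.yes if node.question(context) else node.no
        path.append(node)
    return path


# Hypothetical tree for the phoneme /a/; inner nodes hold more general models.
tree = TreeNode(
    "a",
    question=lambda ctx: ctx["right"] in {"b", "p"},
    yes=TreeNode(
        "a_before_bilabial",
        question=lambda ctx: ctx["left"] == "h",
        yes=TreeNode("a_h_bilabial"),
        no=TreeNode("a_other_bilabial"),
    ),
    no=TreeNode("a_other"),
)

path = search(tree, {"left": "h", "right": "b"})
print([n.model for n in path])  # ['a', 'a_before_bilabial', 'a_h_bilabial']
```

Returning the whole path, and not only the leaf, keeps the more general models on the way down available as candidates.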
The structure constructed on the basis of predetermined criteria is then used to determine at least one second sound model from the set of sound models, depending on the characteristic state of the structure (in particular the tree structure). The characteristic state criterion in the structure may be a measure of distance from the first sound model. If the structure is a binary tree, all sound models within a predetermined distance from the first sound model may be regarded as second sound models. Here, the term “distance” is not necessarily to be interpreted in a positional sense; it may also be a measure of distance with regard to one or more predetermined criteria.
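One simple reading of "distance" in a binary tree is the number of branches on the path between two nodes. Under that assumption, the following Python sketch collects every node within max_dist branches of the first sound model using breadth-first search. The tree, its node names and the parameter max_dist are hypothetical. A distance defined with respect to other criteria, which the text also allows, would need a different metric.

```python
from collections import deque

# Hypothetical binary tree of sound models, given as child -> parent links.
PARENT = {
    "root": None,
    "A": "root", "B": "root",
    "A1": "A", "A2": "A",
    "B1": "B", "B2": "B",
    "A1a": "A1", "A1b": "A1",
}


def adjacency(parent: dict) -> dict:
    """Treat every branch as an undirected edge between two nodes."""
    adj = {node: [] for node in parent}
    for child, par in parent.items():
        if par is not None:
            adj[child].append(par)
            adj[par].append(child)
    return adj


def second_models(first: str, max_dist: int, parent: dict = PARENT) -> dict:
    """Map every node at most max_dist branches away from the first sound
    model (excluding the first model itself) to its distance."""
    adj = adjacency(parent)
    dist = {first: 0}
    queue = deque([first])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the permitted distance
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    dist.pop(first)
    return dist


print(sorted(second_models("A1a", 2).items()))  # [('A', 2), ('A1', 1), ('A1b', 2)]
```

Breadth-first search visits nodes in order of increasing distance. Each node therefore gets its shortest distance, and the search stops expanding once the limit is reached.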
For the first sound model and the set of second sound models that satisfy the characteristic state criterion, i.e. that lie within the predetermined distance from the first sound model, a second quality criterion is determined for the representatives of the sound models. The overall quality criterion for each representative is made up of the first and the at least one second quali
