Speech synthesis



Details

Classification: 704224, 704208, G10B 906
Type: Patent
Status: active
Patent ID: 059787643

DESCRIPTION:

BACKGROUND OF THE INVENTION

1. Field of the Invention
This invention relates generally to the synthesis of speech waveforms having a smoothed delivery.
2. Related Art
One method of synthesising speech involves the concatenation of small units of speech in the time domain. Thus representations of speech waveform may be stored, and small units such as phonemes, diphones or triphones--i.e. units of less than a word--selected according to the speech that is to be synthesised, and concatenated. Following concatenation, known techniques may be employed to adjust the composite waveform to ensure continuity of pitch and signal phase. However, another factor affecting the perceived quality of the resulting synthesised speech is the amplitude of the units; preprocessing of the waveforms--i.e. adjustment of amplitude prior to storage--is not found to solve this problem, inter alia because the length of the units extracted from the stored data may vary.
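The time-domain concatenation described above can be sketched minimally as follows. All names and data here are invented for illustration (the unit labels, the tone generator standing in for recorded speech, and the store contents); a real synthesiser would hold recorded waveform sections and apply pitch and phase adjustment at the joins.

```python
import math

def make_tone(freq_hz, n_samples, rate=8000):
    """Generate a short sine burst standing in for a recorded speech unit."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]

# A tiny "waveform store" mapping sub-word unit labels (here, diphones)
# to lists of samples. Labels and contents are hypothetical.
store = {
    "h-e": make_tone(120.0, 400),
    "e-l": make_tone(140.0, 400),
    "l-oU": make_tone(110.0, 400),
}

def synthesise(unit_labels, store):
    """Concatenate the stored waveform sections for the requested units."""
    out = []
    for label in unit_labels:
        out.extend(store[label])
    return out

waveform = synthesise(["h-e", "e-l", "l-oU"], store)
# Three 400-sample units concatenated give 1200 samples.
```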


SUMMARY OF THE INVENTION

According to the present invention there is provided a speech synthesiser comprising a store containing units of speech waveform; selection means responsive to phonetic representations of desired sounds input thereto, to select from the store units of speech waveform representing portions of words corresponding to the desired sounds; and means for adjusting the amplitude of at least the voiced portions of the selected units relative to a predetermined reference level.


BRIEF DESCRIPTION OF THE DRAWINGS

One example of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of one example of speech synthesis according to the invention;
FIG. 2 is a flow chart illustrating operation of the synthesis; and
FIG. 3 is a timing diagram.


DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

In the speech synthesiser of FIG. 1, a store 1 contains speech waveform sections generated from a digitised passage of speech, originally recorded by a human speaker reading a passage (of perhaps 200 sentences) selected to contain all possible (or at least, a wide selection of) different sounds. Accompanying each section is stored data defining "pitchmarks" indicative of points of glottal closure in the signal, generated in conventional manner during the original recording.
An input signal representing speech to be synthesised, in the form of a phonetic representation, is supplied to an input 2. This input may, if desired, be generated from a text input by conventional means (not shown). It is processed in known manner by a selection unit 3, which determines, for each unit of the input, the addresses in the store 1 of a stored waveform section corresponding to the sound represented by that unit. The unit may, as mentioned above, be a phoneme, diphone, triphone or other sub-word unit, and in general the length of a unit may vary according to the availability of a corresponding waveform section in the waveform store.
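Because unit length depends on what the store happens to contain, selection of this kind is often a longest-match search: prefer a stored triphone over a diphone over a single phoneme. A minimal sketch, with hypothetical labels and store contents (the patent does not specify the selection algorithm):

```python
def select_units(phonemes, available):
    """Greedy longest-match selection over a phoneme sequence.

    `available` is the set of unit labels present in the waveform store;
    labels join phoneme symbols with '-'. Prefers triphones, then
    diphones, then single phonemes.
    """
    units, i = [], 0
    while i < len(phonemes):
        for span in (3, 2, 1):  # triphone, diphone, single phoneme
            label = "-".join(phonemes[i:i + span])
            if i + span <= len(phonemes) and label in available:
                units.append(label)
                i += span
                break
        else:
            raise KeyError(f"no stored unit for {phonemes[i]}")
    return units

# Usage: with a store holding one triphone and some single phonemes,
# the longer unit is preferred where available.
available = {"h-e-l", "l", "oU", "h", "e"}
print(select_units(["h", "e", "l", "oU"], available))  # → ['h-e-l', 'oU']
```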
The units, once read out, are concatenated at 4 and the concatenated waveform subjected to any desired pitch adjustments at 5.
Prior to this concatenation, each unit is individually subjected to an amplitude normalisation process in an amplitude adjustment unit 6, whose operation will now be described in more detail. The basic objective is to normalise each voiced portion of the unit to a fixed RMS level before any further processing is applied. A label representing the selected unit allows the reference level store 8 to determine the appropriate RMS level to be used in the normalisation process. Unvoiced portions are not adjusted, but the transitions between voiced and unvoiced portions may be smoothed to avoid sharp discontinuities.

The motivation for this approach lies in the operation of the unit selection and concatenation procedures. The units selected vary in length and in the context from which they are taken. This makes preprocessing difficult, as the length, context and voicing characteristics of adjoining units affect the merging algorithm, and hence the variation of amplitude across the join. This information is known only at run-time, as each unit is selected, and postprocessing after the units have been joined is likewise impractical; normalisation is therefore performed on each unit individually, at run-time, before concatenation.
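A rough sketch of such run-time normalisation follows. The per-sample boolean voicing flags and the linear gain ramp are simplifying assumptions made for this illustration: the description above derives voicing from stored pitchmarks and does not specify the exact smoothing at voiced/unvoiced transitions.

```python
import math

def rms(samples):
    """Root-mean-square level of a non-empty sample list."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def normalise_voiced(samples, voiced, ref_rms, ramp=32):
    """Scale the voiced portion of a unit to a reference RMS level.

    `voiced` holds one boolean flag per sample. Unvoiced samples keep
    unit gain; the gain is interpolated over `ramp` samples after each
    voiced/unvoiced transition to avoid a sharp amplitude step.
    """
    voiced_samples = [s for s, v in zip(samples, voiced) if v]
    if not voiced_samples:
        return list(samples)  # nothing to adjust
    gain = ref_rms / rms(voiced_samples)
    # Per-sample target gain: `gain` inside voiced runs, 1.0 elsewhere.
    target = [gain if v else 1.0 for v in voiced]
    g = list(target)
    for i in range(1, len(g)):
        if target[i] != target[i - 1]:          # voicing transition
            for k in range(ramp):               # ramp into the new region
                j = i + k
                if j >= len(g) or target[j] != target[i]:
                    break
                frac = (k + 1) / (ramp + 1)
                g[j] = target[i - 1] + (target[i] - target[i - 1]) * frac
    return [s * gi for s, gi in zip(samples, g)]

# Usage: a constant fully-voiced unit at RMS 0.5, normalised to RMS 1.0.
out = normalise_voiced([0.5] * 100, [True] * 100, ref_rms=1.0)
```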

REFERENCES:
patent: 5091948 (1992-02-01), Kametani
patent: 5384893 (1995-01-01), Hutchins
patent: 5469257 (1995-11-01), Blake
Shadle et al., "Speech Synthesis by Linear Interpolation of Spectral Parameters Between Dyad Boundaries", Nov. 1979.
