Patent
1996-08-26
1999-11-02
Hudspeth, David R.
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
704224, 704208, G10B 906
active
059787643
DESCRIPTION:
BRIEF SUMMARY
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to the synthesis of speech waveforms having a smoothed delivery.
2. Related Art
One method of synthesising speech involves the concatenation of small units of speech in the time domain. Thus representations of speech waveform may be stored, and small units such as phonemes, diphones or triphones--i.e. units of less than a word--selected according to the speech that is to be synthesised, and concatenated. Following concatenation, known techniques may be employed to adjust the composite waveform to ensure continuity of pitch and signal phase. However, another factor affecting the perceived quality of the resulting synthesised speech is the amplitude of the units; preprocessing of the waveforms--i.e. adjustment of amplitude prior to storage--is not found to solve this problem, inter alia because the length of the units extracted from the stored data may vary.
SUMMARY OF THE INVENTION
According to the present invention there is provided a speech synthesiser comprising a store of units of speech waveform; selection means responsive to the supply thereto of signals representing desired sounds to select from the store units of speech waveform representing portions of words corresponding to the desired sounds; and means for adjusting the amplitude of at least the voiced portion of each selected unit relative to a predetermined reference level.
BRIEF DESCRIPTION OF THE DRAWINGS
An embodiment of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of one example of speech synthesis according to the invention;
FIG. 2 is a flow chart illustrating operation of the synthesis; and
FIG. 3 is a timing diagram.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
In the speech synthesiser of FIG. 1, a store 1 contains speech waveform sections generated from a digitised passage of speech, originally recorded by a human speaker reading a passage (of perhaps 200 sentences) selected to contain all possible (or at least, a wide selection of) different sounds. Accompanying each section is stored data defining "pitchmarks" indicative of points of glottal closure in the signal, generated in conventional manner during the original recording.
An input signal representing speech to be synthesised, in the form of a phonetic representation, is supplied to an input 2. This input may, if desired, be generated from a text input by conventional means (not shown). The input is processed in known manner by a selection unit 3 which determines, for each unit of the input, the address in the store 1 of a stored waveform section corresponding to the sound represented by that unit. The unit may, as mentioned above, be a phoneme, diphone, triphone or other sub-word unit, and in general the length of a unit may vary according to the availability in the waveform store of a corresponding waveform section.
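The lookup performed by the selection unit can be illustrated with a minimal sketch. This is not code from the patent: the index contents, and the names UNIT_INDEX and select_units, are hypothetical, standing in for the mapping from sub-word unit labels to addresses in store 1.

```python
# Hypothetical index: sub-word unit label -> (start address, length in
# samples) of a waveform section in the store. Entries are illustrative.
UNIT_INDEX = {
    "k-ae": (0, 1200),
    "ae-t": (1200, 950),
    "t-#": (2150, 600),
}

def select_units(phonetic_input):
    """Return the store address of a waveform section for each unit
    of the phonetic input, in order."""
    addresses = []
    for unit in phonetic_input:
        if unit not in UNIT_INDEX:
            raise KeyError(f"no stored waveform section for unit {unit!r}")
        addresses.append(UNIT_INDEX[unit])
    return addresses
```

In a real system the index would offer several candidate sections per unit, recorded in different contexts, with the selection unit choosing among them; the single-entry mapping above shows only the addressing step.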
The units, once read out, are concatenated at 4 and the concatenated waveform subjected to any desired pitch adjustments at 5.
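As a rough sketch of the concatenation step at 4, the following joins units in the time domain with a short linear crossfade at each seam, one common way of avoiding an abrupt discontinuity at the join. The function name and the overlap length are illustrative assumptions, not taken from the patent, and pitch adjustment (performed at 5) is deliberately omitted.

```python
def concatenate(units, overlap=4):
    """Join waveform units end to end, linearly crossfading
    `overlap` samples at each seam to smooth the join."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)  # ramp weight: old unit -> new unit
            out[-n + i] = out[-n + i] * (1 - w) + unit[i] * w
        out.extend(unit[n:])
    return out
```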
Prior to this concatenation, each unit is individually subjected to an amplitude normalisation process in an amplitude adjustment unit 6, whose operation will now be described in more detail. The basic objective is to normalise each voiced portion of the unit to a fixed RMS level before any further processing is applied. A label representing the unit selected allows the reference level store 8 to determine the appropriate RMS level to be used in the normalisation process. Unvoiced portions are not adjusted, but the transitions between voiced and unvoiced portions may be smoothed to avoid sharp discontinuities.
The motivation for this approach lies in the operation of the unit selection and concatenation procedures. The units selected vary in length and in the context from which they are taken. This makes preprocessing difficult, as the length, context and voicing characteristics of adjoining units affect the merging algorithm, and hence the variation of amplitude across the join; this information is only known at run-time, as each unit is selected. Postprocessing after the units have been concatenated is similarly impractical, which is why each unit is normalised individually at run-time, before concatenation.
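The per-unit normalisation described above can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name, the sample-list representation and the boolean voicing mask are assumptions, and the smoothing of voiced/unvoiced transitions is omitted for brevity. Voiced samples are scaled so that their RMS matches the reference level from store 8; unvoiced samples pass through unchanged.

```python
import math

def normalise_voiced(samples, voiced_mask, reference_rms):
    """Scale the voiced samples of a unit so their RMS equals
    `reference_rms`; leave unvoiced samples untouched."""
    voiced = [s for s, v in zip(samples, voiced_mask) if v]
    if not voiced:
        return list(samples)  # wholly unvoiced unit: no adjustment
    rms = math.sqrt(sum(s * s for s in voiced) / len(voiced))
    gain = reference_rms / rms if rms > 0 else 1.0
    return [s * gain if v else s for s, v in zip(samples, voiced_mask)]
```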
REFERENCES:
patent: 5091948 (1992-02-01), Kametani
patent: 5384893 (1995-01-01), Hutchins
patent: 5469257 (1995-11-01), Blake
Shadle et al., 'Speech Synthesis by Linear Interpolation of Spectral Parameters Between Dyad Boundaries', Nov. 1979.
Breen Andrew Paul
Jackson Peter
Lowry Andrew
Abebe Daniel
British Telecommunications public limited company
Hudspeth David R.