Electrical audio signal processing systems and devices – Monitoring/measuring of audio devices – Loudspeaker operation
Patent
1990-11-15
1994-07-05
Kemeny, Emanuel S.
Electrical audio signal processing systems and devices
Monitoring/measuring of audio devices
Loudspeaker operation
G10L 500
active
053274989
DESCRIPTION:
BRIEF SUMMARY
BACKGROUND OF THE INVENTION
The invention relates to methods and devices for speech synthesis. It relates more particularly to synthesis from a dictionary of sound elements (also known as component sounds): the text to be synthesized is split into microframes, each identified by the order number of a corresponding sound element and by prosodic parameters (the pitch, or sound height, at the beginning and at the end of the sound element, and the duration of the sound element); the sound elements are then adapted and concatenated by an overlap-add procedure.
The sound elements stored in the dictionary will frequently be diphones, i.e. transitions between phonemes, which makes it possible, for the French language, to make do with a dictionary of about 1300 sound elements; different sound elements may however be used, for example syllables or even words. The prosodic parameters are determined as a function of criteria relating to the context: the pitch, which corresponds to the intonation, depends on the position of the sound element in a word and in the sentence, and the duration given to the sound element depends on the rhythm of the sentence.
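The microframe described above can be pictured as a small record. The following is only a minimal Python sketch: the names Microframe, element_index, pitch_start_hz, pitch_end_hz, duration_ms and pitch_at are hypothetical, and the linear interpolation of pitch between the start and end values is an assumption, not something stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microframe:
    """One unit of text to synthesize (all field names are illustrative)."""
    element_index: int     # order number of the sound element in the dictionary
    pitch_start_hz: float  # pitch at the beginning of the sound element
    pitch_end_hz: float    # pitch at the end of the sound element
    duration_ms: float     # duration given to the sound element


def pitch_at(frame: Microframe, fraction: float) -> float:
    """Pitch at a relative position 0.0..1.0 inside the frame.

    Assumption: pitch moves linearly from the start value to the end value.
    """
    return frame.pitch_start_hz + (frame.pitch_end_hz - frame.pitch_start_hz) * fraction


def total_duration_ms(frames: list[Microframe]) -> float:
    """Total duration of a sequence of microframes."""
    return sum(f.duration_ms for f in frames)
```

A sentence would then be a list of such records, one per sound element, with the prosodic values chosen from the context rules.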
It should be recalled that speech synthesis methods are divided into two groups. Those which use a mathematical model of the vocal tract (linear prediction synthesis, formant synthesis and fast Fourier transform synthesis) rely on a deconvolution of the source and of the transfer function of the vocal tract, and generally require about 50 arithmetic operations per digital sample of the speech before digital-to-analog conversion and reproduction.
This source/vocal-tract deconvolution makes it possible both to modify the fundamental frequency of voiced sounds, namely sounds which have a harmonic structure and are caused by vibration of the vocal cords, and to compress the data representing the speech signal.
Methods in the second group use time-domain synthesis by concatenation of waveforms. This approach has the advantage of flexibility in use and can considerably reduce the number of arithmetic operations per sample. On the other hand, it cannot reduce the bit rate required for transmission as much as methods based on a mathematical model can. This drawback does not matter, however, when good reproduction quality is essential and there is no requirement to transmit data over a narrow channel.
Speech synthesis according to the present invention belongs to the second group. It finds a particularly important application in converting an orthographic chain (formed, for example, by the text delivered by a printer) into a speech signal which is, for example, reproduced directly or transmitted over a normal telephone line.
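As an illustration of time-domain concatenation, the following minimal Python sketch sums windowed waveform segments at chosen sample positions. It is a generic overlap-add, not the procedure of the invention: the Hann window, the function names hann and overlap_add, and the choice of positions are all assumptions made for the example.

```python
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n (zero at the first sample)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add(segments: list[list[float]], positions: list[int], length: int) -> list[float]:
    """Add each segment into the output buffer starting at its position.

    Samples that would fall outside the buffer are dropped.
    """
    out = [0.0] * length
    for seg, start in zip(segments, positions):
        for i, value in enumerate(seg):
            if 0 <= start + i < length:
                out[start + i] += value
    return out


# Three windowed copies of a constant signal, each shifted by half a window.
window = hann(8)
result = overlap_add([window, window, window], [0, 4, 8], 16)
```

With this window and a shift of half its length, the overlapping halves add up to a constant in the fully covered region, so only one addition per output sample and per overlapping segment is needed. Placing the segments closer together or further apart is what changes the pitch or duration of the concatenated waveform.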
A speech synthesis process from sound elements using a short-term signal overlap-add technique is already known ("Diphone synthesis using an overlap-add technique for speech waveforms concatenation", Charpentier et al., ICASSP 1986, IEEE-IECEJ-ASJ International Conference on Acoustics, Speech and Signal Processing, pp. 2015-2018). But it relates to short-term synthesis signals with normalization of the overlap of the synthesis windows, obtained by a very complex procedure: source; signal;
SUMMARY OF THE INVENTION
It is a main object of the present invention to provide a relatively simple process that makes acceptable reproduction of speech possible. It starts from the assumption that voiced sounds may be considered as the sum of the impulse responses of a filter (corresponding to the vocal tract) that is stationary for several milliseconds and is excited by a succession of Dirac pulses, i.e. by a "pulse comb", synchronously with the fundamental frequency of the source, namely the vocal cords. In the spectral domain this yields a harmonic spectrum, the harmonics being spaced apart by the fundamental frequency and weighted by an envelope whose maxima, called formants, depend on the transfer function of the vocal tract.
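The assumption stated above can be sketched numerically. In the minimal Python example below, a fixed impulse response is restarted at every pulse of a comb and the copies are summed. The damped sinusoid standing in for the vocal-tract response, the 8000 Hz sample rate, the 700 Hz resonance, the decay constant and the function names are all assumptions chosen for illustration.

```python
import math


def impulse_response(length: int, resonance_hz: float, decay: float, sample_rate: float) -> list[float]:
    """Damped sinusoid used as a stand-in for the vocal-tract impulse response."""
    return [
        math.exp(-decay * n / sample_rate)
        * math.sin(2.0 * math.pi * resonance_hz * n / sample_rate)
        for n in range(length)
    ]


def voiced_sound(h: list[float], period: int, total: int) -> list[float]:
    """Sum copies of h started at every multiple of `period` samples (the pulse comb)."""
    y = [0.0] * total
    for start in range(0, total, period):
        for i, value in enumerate(h):
            if start + i < total:
                y[start + i] += value
    return y


SAMPLE_RATE = 8000.0
PERIOD = 80  # samples between pulses, i.e. a fundamental of 8000 / 80 = 100 Hz
h = impulse_response(200, 700.0, 300.0, SAMPLE_RATE)
y = voiced_sound(h, PERIOD, 800)
```

Once the first impulse response has fully decayed out of the buffer, the summed signal repeats every PERIOD samples. That periodicity is what produces harmonics spaced by the fundamental frequency, while the shape of h sets the envelope that weights them.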
It has already been proposed (Micro-phonemic method of spee
REFERENCES:
patent: 4398059 (1983-08-01), Lin et al.
patent: 4833718 (1989-05-01), Sprague
patent: 4852168 (1989-07-01), Sprague
Charpentier et al, "Diphone Synthesis etc." IEEE-ICASSP 86, Tokyo, pp. 2015-2018.
Makhoul et al, "Time-Scale Modification etc." IEEE-ICASSP 86, Tokyo, pp. 1705-1708.
Kemeny Emanuel S.
Ministry of Posts, Tele-French State Communications & Space