Patent
1996-12-02
2000-07-25
Knepper, David D.
Data processing: speech signal processing, linguistics, language
Speech signal processing
Synthesis
704/266, G10L 13/08
active
060946334
DESCRIPTION:
BRIEF SUMMARY
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a method and apparatus for converting text to a waveform. More specifically, it relates to the production of an output in the form of an acoustic wave, namely synthetic speech, from an input in the form of signals representing a conventional text.
2. Related Art
This overall conversion is very complicated and it is sometimes carried out in several modules wherein the output of one module constitutes the input for the next. The first module receives signals representing a conventional text and the final module produces synthetic speech as its output. This synthetic speech may be a digital representation of the waveform followed by conventional digital-to-analogue conversion in order to produce the audible output. In many cases it is desired to provide the audible output over a telephone system. In this case it may be convenient to carry out the digital-to-analogue conversion after transmission so that transmission takes place in digital form.
There are advantages in the modular structure, e.g. each module is separately designed, and any one of the modules can be replaced or altered in order to provide flexibility, to make improvements or to cope with changing circumstances.
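Some procedures utilise a sequence of three modules, namely (A) a module which modifies selected features of the input text, (B) a module which converts graphemes to phonemes, and (C) a module which converts phonemes into a digital waveform.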
A brief description of these modules will now be given.
Module (A) receives signals representing a conventional text, e.g. the text of this specification, and it modifies selected features. Thus module (A) may specify how numbers are processed. For example, it will decide if [...] of module (A), each of which is compatible with the subsequent modules so that different forms of output result.
Module (B) converts graphemes to phonemes. "Grapheme" denotes data representations corresponding to the symbols of the conventional alphabet used in the conventional manner. The text of this specification is a good example of "graphemes". It is a problem of synthetic speech that the graphemes may have little relationship to the way in which the words are pronounced, especially in languages such as English. Therefore, in order to produce waveforms, it is appropriate to convert the graphemes into a different alphabet, called "phonemes" in this specification, which has a very close correlation with the sound of the words. In other words, it is the purpose of module (B) to deal with the problem that the conventional alphabet is not phonetic.
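The weak relationship between English spelling and pronunciation can be illustrated with a small sketch. The word list and the IPA transcriptions (British pronunciation) are illustrative data chosen for this example and are not taken from the specification; the function name is likewise hypothetical.

```python
# Illustrative data only (not from the specification): the grapheme sequence
# "ough" in three English words, with the sound it takes in each word,
# written in IPA for British pronunciation.
OUGH_EXAMPLES = [
    ("though", "əʊ"),
    ("through", "uː"),
    ("cough", "ɒf"),
]


def distinct_renderings(examples):
    """Return the set of different phoneme strings for one grapheme sequence."""
    return {phonemes for _word, phonemes in examples}


renderings = distinct_renderings(OUGH_EXAMPLES)
# Number of different sounds taken by the same four letters.
print(len(renderings))
```

Because the same four graphemes map to several different phoneme strings, a fixed letter-by-letter substitution cannot give the right result for all three words.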
Module (C) converts the phonemes into a digital waveform which, as mentioned above, can be converted into an analogue format and thence into audible waveform.
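The chaining of the three modules, in which the output of one module constitutes the input for the next, might be sketched as follows. All function names are hypothetical and the bodies are trivial placeholders standing in for the real processing; they are not the methods of this specification.

```python
from typing import Callable, List


def module_a(text: str) -> str:
    """Placeholder for module (A): here it only collapses runs of whitespace."""
    return " ".join(text.split())


def module_b(graphemes: str) -> List[str]:
    """Placeholder for module (B): one pseudo-phoneme per alphabetic letter."""
    return [ch for ch in graphemes.lower() if ch.isalpha()]


def module_c(phonemes: List[str]) -> List[int]:
    """Placeholder for module (C): one integer 'sample' per pseudo-phoneme."""
    return [ord(p) for p in phonemes]


def run_pipeline(
    text: str,
    a: Callable[[str], str] = module_a,
    b: Callable[[str], List[str]] = module_b,
    c: Callable[[List[str]], List[int]] = module_c,
) -> List[int]:
    # The output of each module is passed directly to the next one.
    return c(b(a(text)))


print(run_pipeline("a  b"))
```

Passing the modules as parameters reflects the advantage noted earlier: any one module can be replaced without touching the others, provided its input and output remain compatible.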
This invention relates to a method and apparatus for use in module (B) and this module will now be described in more detail.
Module (B) utilises linked databases which are formed of a large number of independent entries. Each entry includes access data, which is in the form of representations, e.g. bytes, of a sequence of graphemes, and an output string, which contains representations, e.g. bytes, of the phoneme equivalent to the graphemes contained in the access section. A major problem of grapheme/phoneme conversion resides in the size of database necessary to cope with a language. One simple, and theoretically ideal, solution would be to provide a database so large that it has an individual entry for every possible word in the language, including all possible inflections of every possible word in the language. Clearly, given a complete database, every word in the input text would be individually recognised and an excellent phoneme equivalent would be output. It should be apparent that it is not possible to provide such a complete database. In the first place, it is not possible to list every word in a language, and even if such a list were available it would be too large for computational purposes.
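The entry structure described above, with access data and an output string both held as bytes, might be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, a single in-memory dictionary stands in for the linked databases, and the sample entries use UTF-8 encoded IPA purely as an illustrative phoneme notation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Entry:
    access: bytes  # representation of a sequence of graphemes
    output: bytes  # representation of the equivalent phonemes


def build_database(entries: Iterable[Entry]) -> Dict[bytes, Entry]:
    """Index the independent entries by their access data."""
    return {entry.access: entry for entry in entries}


def lookup(db: Dict[bytes, Entry], graphemes: bytes) -> Optional[bytes]:
    """Return the output string for a grapheme sequence, or None if absent."""
    entry = db.get(graphemes)
    return entry.output if entry is not None else None


# Illustrative entries only; the phoneme notation is an assumption.
db = build_database([
    Entry(b"cough", "kɒf".encode("utf-8")),
    Entry(b"though", "ðəʊ".encode("utf-8")),
])

found = lookup(db, b"cough")
missing = lookup(db, b"coughing")  # no entry for this inflection
print(found.decode("utf-8") if found else None, missing)
```

The failed lookup for an inflected form shows why a whole-word database would need an entry for every inflection of every word, which is the size problem the text describes.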
Although the complete database is not possible, it is possible to provide a database of useable dimension which contains, for example, common words and words whose pronunciation is not simply related to the spelling. Such a database will give excellent g
REFERENCES:
Jonathan Allen, "Machine-to-Man Communication by Speech Part II: Synthesis of Prosodic Features of Speech by Rule", Proc. of the Spring Joint Computer Conference, Apr. 30-May 2, 1968, pp. 339-344.
Francis Lee, "Machine-to-Man Communication by Speech Part I: Generation of Segmental Phonemes from Text", Proc. of the Spring Joint Computer Conference, Apr. 30-May 2, 1968.
Klatt, "Review of Text-to-Speech Conversion for English", J. Acoust. Soc. Am., vol. 82, No. 3, Sep. 1987, pp. 737-793.
Furui, Digital Speech Processing, Synthesis and Recognition, 1989, Marcel Dekker, Inc., pp. 220-224.
Rowden, Speech Processing, 1992, McGraw-Hill Book Company, pp. 184-221 (Chapter 6).
Gaved Margaret
Hawkey James
British Telecommunications public limited company