Linguistic converter

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S009000, C704S010000, C704S251000

active

06829580

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a converter for generating a sequence of linguistic elements from a signal representing text. In particular, this invention relates to a converter for generating a sequence of phonemes from a textual signal. Such a converter is commonly referred to as a grapheme-to-phoneme converter, a grapheme being a subsequence of one or more letters, and a phoneme being a particular type of linguistic element which represents the pronunciation of part of a word. A grapheme-to-phoneme converter may be used in speech synthesis during text analysis, prior to synthesis of speech from the sequence of phonemes. It may also be used in speech recognition in order to generate a sequence of linguistic elements required to create a speech recognition template. Another use for such a converter could be in a process to linguistically analyse text (for example, sentences) to determine the linguistic properties of the text, for example in terms of the number of phonemes, biphones or triphones.
2. Related Art
One technique for converting a sequence of graphemes to a sequence of phonemes is to use a set of letter-to-sound rules. However, unless a language is phonetic, such rules will often produce incorrect phoneme sequences (or pronunciations) for some words. An alternative is to use a large lexicon which provides a phonemic transcription for as many words in a language as possible.
For languages such as Celtic languages (for example, Welsh) and other languages which exhibit the phenomenon known as mutation, the initial letter of a word changes depending on the context of the word. If every possible mutation of a word is included in a lexicon the result is an enormous dictionary which requires a large amount of memory, and long search times.
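The growth in lexicon size can be illustrated with a small sketch. It assumes a simplified table of Welsh initial mutations for the radicals c, p and t only; the table name, the function name and the three sample words are chosen for illustration and do not come from the patent text.

```python
# Simplified table (an assumption for illustration): radical initial grapheme
# -> its soft, nasal and aspirate mutated forms in Welsh.
MUTATIONS = {
    "c": ["g", "ngh", "ch"],
    "p": ["b", "mh", "ph"],
    "t": ["d", "nh", "th"],
}


def surface_forms(word):
    """Return the radical word followed by its mutated spellings.

    Assumes the word begins with a single-letter radical; words beginning
    with digraphs such as "ch" or "th" are not handled by this sketch.
    """
    for radical, mutated in MUTATIONS.items():
        if word.startswith(radical):
            rest = word[len(radical):]
            return [word] + [m + rest for m in mutated]
    return [word]


words = ["cath", "pen", "tad"]
full_lexicon = [form for w in words for form in surface_forms(w)]
print(surface_forms("cath"))            # ['cath', 'gath', 'nghath', 'chath']
print(len(words), len(full_lexicon))    # 3 12
```

In this toy case, storing every mutated form quadruples the number of entries for words with a mutable initial grapheme.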
BRIEF SUMMARY OF THE INVENTION
In this invention a linguistic analyser is provided which uses a phonemic look-up table or dictionary with a smaller number of dictionary entries than would be required if a phonemic transcription was provided for each possible word in a language, thus reducing memory and search time required by the analyser.
According to the present invention there is provided an apparatus for receiving an input signal representing a word, each word comprising a sequence of one or more graphemes, and for providing a sequence of one or more symbols, each symbol representing a phonetic element of said word, said apparatus comprising
a first store containing a plurality of representations of words and corresponding symbol sequences;
a second store containing a plurality of duples comprising a substitutable grapheme and a corresponding substitute grapheme;
a third store containing a plurality of duples comprising a substitutable grapheme and a corresponding symbol; and
a processor arranged to
receive said input signal;
provide a first signal corresponding to a grapheme in the word and a second signal corresponding to any graphemes other than said grapheme;
access the second store using the first signal to retrieve a corresponding substitute grapheme;
access the third store using the first signal to retrieve a corresponding symbol;
provide a modified signal comprising a signal corresponding to said substitute grapheme and said second signal;
access the first store using the modified signal to retrieve a corresponding sequence of symbols;
provide a modified sequence of symbols comprising the symbol retrieved from the third store and symbols of the retrieved sequence, which symbols correspond to the second signal.
In a preferred embodiment the first signal corresponds to the first grapheme in the word.
This invention also provides a method for analysing a word, each word comprising a sequence of one or more graphemes, and for providing a sequence of one or more symbols, each symbol representing a phonetic element of said word, the method comprising steps of
a) providing a first signal corresponding to a grapheme in the word and a second signal corresponding to any graphemes other than said grapheme;
b) using the first signal to determine a corresponding substitute grapheme;
c) using the first signal to determine a corresponding symbol;
d) providing a modified signal comprising a signal corresponding to said substitute grapheme and said second signal;
e) using the modified signal to determine a corresponding sequence of symbols;
f) providing a modified sequence of symbols comprising the symbol determined at step c) and symbols of the retrieved sequence, which symbols correspond to the second signal.
In a preferred embodiment the first signal corresponds to the first grapheme in the word.
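Steps a) to f) can be sketched as follows, with the three stores modelled as dictionaries. This is a minimal sketch under stated assumptions, not the patented implementation: the store contents, the phoneme labels, the function name, the longest-match-first split, and the rule that the substitute grapheme accounts for exactly the first symbol of the retrieved sequence are all assumptions made for illustration.

```python
# First store (assumed contents): word -> symbol sequence.
LEXICON = {
    "cath": ["k", "a", "th"],
    "pen": ["p", "e", "n"],
}
# Second store (assumed contents): substitutable grapheme -> substitute grapheme.
SUBSTITUTE_GRAPHEME = {
    "g": "c", "ngh": "c", "ch": "c",
    "b": "p", "mh": "p", "ph": "p",
}
# Third store (assumed contents and labels): substitutable grapheme -> symbol.
SUBSTITUTE_SYMBOL = {
    "g": "g", "ngh": "ngh", "ch": "x",
    "b": "b", "mh": "mh", "ph": "f",
}


def convert_mutated(word):
    """Return a symbol sequence for a word with a mutated first grapheme, or None."""
    # a) split into a first grapheme and the rest; trying the longest
    #    candidate grapheme first is an assumption of this sketch.
    for length in range(min(3, len(word)), 0, -1):
        first, rest = word[:length], word[length:]
        if first not in SUBSTITUTE_GRAPHEME:
            continue
        substitute = SUBSTITUTE_GRAPHEME[first]   # b) substitute grapheme
        symbol = SUBSTITUTE_SYMBOL[first]         # c) symbol for the first grapheme
        modified = substitute + rest              # d) modified word
        retrieved = LEXICON.get(modified)         # e) look up the modified word
        if retrieved is None:
            continue
        # f) assumption: the substitute grapheme produced exactly the first
        #    retrieved symbol, so the remainder corresponds to the rest.
        return [symbol] + retrieved[1:]
    return None


print(convert_mutated("gath"))    # ['g', 'a', 'th']
print(convert_mutated("nghath"))  # ['ngh', 'a', 'th']
```

Only the radical forms "cath" and "pen" are stored; the mutated spellings are resolved through the two small substitution tables.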
In an improved version, in the event that no sequence of symbols corresponding to the modified signal is determined at step e), the method further comprises steps of
g) providing a suffix signal corresponding to a subsequence of graphemes at the end of the word and a whole stem signal corresponding to the subsequence of graphemes other than those corresponding to the suffix signal;
h) using the whole stem signal to determine a corresponding sequence of symbols;
i) in the event that a sequence of symbols corresponding to the stem signal is not determined at step h), providing an ending signal corresponding to a sequence of graphemes with which a word may end and using a signal comprising the whole stem signal and the ending signal to determine a corresponding sequence of symbols;
i) using the suffix signal to determine a corresponding sequence of symbols; and
j) providing a sequence of symbols comprising the symbol sequence corresponding to the stem signal and the symbol sequence corresponding to the suffix signal.
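The suffix-handling steps g) to j) might look like the sketch below. The English sample words, the phoneme labels, the store names and the choice to try longer suffixes first are assumptions for illustration. The sketch also assumes that a restored ending (here a silent "e") adds no symbols that would need to be removed before the suffix symbols are appended; the text does not specify how that case is handled.

```python
# Assumed lexicon of stems, suffix table and list of possible word endings.
LEXICON = {
    "walk": ["w", "o", "k"],
    "make": ["m", "ei", "k"],
}
SUFFIXES = {
    "ing": ["i", "ng"],
    "s": ["z"],
}
ENDINGS = ["e"]


def convert_with_suffix(word):
    """Return stem symbols followed by suffix symbols, or None if no split works."""
    # g) split the word into a stem and a known suffix (longest suffix first).
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if len(word) <= len(suffix) or not word.endswith(suffix):
            continue
        stem = word[:-len(suffix)]
        # h) look up the whole stem.
        stem_symbols = LEXICON.get(stem)
        if stem_symbols is None:
            # i) retry with the stem extended by a possible word ending.
            for ending in ENDINGS:
                stem_symbols = LEXICON.get(stem + ending)
                if stem_symbols is not None:
                    break
        if stem_symbols is None:
            continue
        # i) look up the suffix, then j) concatenate the two sequences.
        return stem_symbols + SUFFIXES[suffix]
    return None


print(convert_with_suffix("walking"))  # ['w', 'o', 'k', 'i', 'ng']
print(convert_with_suffix("making"))   # ['m', 'ei', 'k', 'i', 'ng']
```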
A further improvement gives a method in which, in the event that no sequence of symbols corresponding to the stem signal is determined at step h), the method further comprises steps of
k) providing a first stem signal corresponding to a grapheme in the sequence of graphemes corresponding to the stem signal and a second stem signal corresponding to any graphemes other than said grapheme;
l) using the first stem signal to determine a corresponding substitute grapheme;
m) using the first stem signal to determine a corresponding symbol;
n) providing a modified signal comprising a signal corresponding to said substitute grapheme and said second stem signal;
o) using the modified signal to determine a corresponding sequence of symbols;
p) providing a modified sequence of symbols comprising the symbol determined at step m), symbols of the retrieved sequence, which symbols correspond to the second stem signal, and symbols corresponding to the suffix signal.
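Steps k) to p) combine the grapheme substitution with the suffix split. A minimal sketch follows, using the Welsh plural "cathod" (cats) and its mutated forms as the example. The store contents, phoneme labels and function names are assumptions for illustration, the ending fallback of step i) is omitted for brevity, and the substitute grapheme is again assumed to account for exactly the first retrieved symbol.

```python
# Assumed store contents; only the radical singular "cath" is in the lexicon.
LEXICON = {"cath": ["k", "a", "th"]}
SUBSTITUTE_GRAPHEME = {"g": "c", "ngh": "c", "ch": "c"}
SUBSTITUTE_SYMBOL = {"g": "g", "ngh": "ngh", "ch": "x"}
SUFFIXES = {"od": ["o", "d"]}


def lookup_mutated(graphemes):
    """Steps k) to o): resolve a stem whose first grapheme is mutated."""
    for length in range(min(3, len(graphemes)), 0, -1):
        first, rest = graphemes[:length], graphemes[length:]   # k) split the stem
        if first not in SUBSTITUTE_GRAPHEME:
            continue
        # l) substitute grapheme, n) modified stem, o) lexicon lookup.
        retrieved = LEXICON.get(SUBSTITUTE_GRAPHEME[first] + rest)
        if retrieved is not None:
            # m) symbol for the mutated grapheme replaces the first symbol.
            return [SUBSTITUTE_SYMBOL[first]] + retrieved[1:]
    return None


def convert(word):
    """Split off a suffix, resolve the stem (mutated if necessary), then join."""
    for suffix, suffix_symbols in SUFFIXES.items():
        if len(word) <= len(suffix) or not word.endswith(suffix):
            continue
        stem = word[:-len(suffix)]
        stem_symbols = LEXICON.get(stem)            # step h)
        if stem_symbols is None:
            stem_symbols = lookup_mutated(stem)     # steps k) to o)
        if stem_symbols is not None:
            return stem_symbols + suffix_symbols    # step p)
    return None


print(convert("cathod"))  # ['k', 'a', 'th', 'o', 'd']
print(convert("gathod"))  # ['g', 'a', 'th', 'o', 'd']
```

A single lexicon entry plus the substitution and suffix tables covers the radical and mutated plural forms in this example.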
This invention also provides a speech synthesiser incorporating a linguistic analyser as described above and a speech recogniser incorporating a linguistic analyser as described above.

