Computer method and apparatus for grapheme-to-phoneme...

Data processing: speech signal processing – linguistics – language – Speech signal processing – For storage or transmission

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C704S265000

Reexamination Certificate

active

06347295

ABSTRACT:

BACKGROUND OF THE INVENTION
Generally speaking, a “speech synthesizer” is a computer device or system for generating audible speech from written text. That is, a written form of a string or sequence of characters (e.g., a sentence) is provided as input, and the speech synthesizer generates the spoken equivalent or audible characterization of the input. The generated speech output is not merely a literal reading of each input character, but a language dependent, in-context verbalization of the input. If the input was the phone number (508) 691-1234 given in response to a prior question of “What is your phone number?”, the speech synthesizer does not produce the reading “parenthesis, five hundred eight, close parenthesis, six hundred ninety-one . . . ” Instead, the speech synthesizer recognizes the context and supporting punctuation and produces the spoken equivalent “five (pause) zero (pause) eight (pause) six . . . ” just as an English-speaking person normally pronounces a phone number.
Historically the first speech synthesizers were formed of a dictionary, engine and digital vocalizer. The dictionary served as a look-up table. That is, the dictionary cross referenced the text or visual form of a character string (e.g., word or other unit) and the phonetic pronunciation of the character string/word. In linguistic terms the visual form of a character string unit (e.g., word) is called a “grapheme” and the corresponding phonetic pronunciation is termed a “phoneme”. The phonetic pronunciation or phoneme of character string units is indicated by symbols from a predetermined set of phonetic symbols. To date, there is little standardization of phoneme symbol sets and usage of the same in speech synthesizers.
The engine is the working or processing member that searches the dictionary for a character string unit (or combinations thereof) matching the input text. In basic terms, the engine performs pattern matching between the sequence of characters in the input text and the sequence of characters in “words” (character string units) listed in the dictionary. Upon finding a match, the engine obtains from the dictionary entry (or combination of entries) of the matching word (or combination of words), the corresponding phoneme or combination of phonemes. To that end, the purpose of the engine is thought of as translating a grapheme (input text) to a corresponding phoneme (the corresponding symbols indicating pronunciation of the input text).
Typically the engine employs a binary search through the dictionary for the input text. The dictionary is loaded into the computer processor physical memory space (RAM) along with the speech synthesizer program. The memory footprint, i.e., the physical memory space in RAM needed while running the speech synthesizer program, thus must be large enough to hold the dictionary. Where the dictionary portion of today's speech synthesizers continue to grow in size, the memory footprint is problematic due to the limited available memory (RAM and ROM) in some applications.
The digital vocalizer receives the phoneme data generated by the engine. Based on the phoneme data together with timing and stress data, the digital vocalizer generates sound signals for “reading” or “speaking” the input text. Typically, the digital vocalizer employs a sound and speaker system for producing the audible characterization of the input text.
To improve on memory requirements of speech synthesizers, another design was developed. In that design, the dictionary is replaced by a rule set. Alternatively, the rule set is used in combination with the dictionary instead of completely substituting therefor. At any rate, the rule set is a group of statements in the form
IF (condition)-then-(phonemic result) Each such statement determines the phoneme for a grapheme that matches the IF condition. Examples of rule-based speech synthesizers are DECtalk by Digital Equipment Corporation of Maynard, Massachusetts and TrueVoice by Centigram Communications of San Jose, Calif.
Each rule (If-then statement) is the result of years of linguistic studies and are largely “manually” generated. Thus the formation of a rule set is a very time consuming and language dependent process. Further, there are little standards in this area.
These and other problems exist in speech synthesizer technology. New solutions have been attempted but with little success. As a result, highly accurate speech synthesizers are yet to come.
In particular, typically a speech-synthesizer developer starts with a very large dictionary. A human linguist specializing in language and speech analysis examines words and their corresponding pronunciation in view of respective part of speech, spelling and other linguistic factors. As such, the linguist manually extracts rules from the dictionary or uses his knowledge of the language to create some rules. Next the speech-synthesizer developer removes from the dictionary the words that can be synthesized from the newly created rules. The more words able to be removed from the dictionary due to a created rule, the better.
The task of manually analyzing words of a language for grapheme-to-phoneme patterns is a laborious and painstaking one. It may take several months or years of manual human effort for a linguist to analyze a language and produce a grapheme-to-phoneme rule set for that language. Not only is this process lengthy and complicated, it is also prone to error where the created rules are manually typed into a rule file. Some of the typographical errors may be caught in compiling the rule file. Those that are not caught typically result in rules which will never be matched and hence never utilized during text-to-speech processing.
SUMMARY OF THE INVENTION
The foregoing problems of manual grapheme-to-phoneme rule generation are overcome by the present invention. The present invention provides a computer method and system for automated rule and rule set generation. Instead of having a human linguist manually analyze dictionary entries to determine grapheme-to-phoneme patterns and manually type the rules into a rule set, the present invention provides automatic digital processing means and a statistical approach to create the best possible rule set. In turn, the present invention enables creation of the smallest possible dictionary associated with a reasonable sized set of rules to substitute for a single huge dictionary of the prior art which cannot be used for many embedded applications. Even in the future, if large memory becomes inexpensive and available, the small memory footprint of the present invention rule set and resulting dictionary will still be an advantage since one may use the extra available memory to store other information such as a domain dictionary, abbreviation, phrase dictionary, etc.
In a preferred embodiment, each dictionary entry in an input dictionary is formed of (i) a sequence of one or more characters indicative of a subject character string, and (ii) a corresponding phonemic (phoneme string) representation formed of phonemic data parts. There is a different phonemic data part for each different subsequence of characters in the character string.
For each of the different subsequences of characters in the character string of a dictionary entry, the present invention (a) determines respective corresponding phonemic data parts found throughout the input dictionary for the subsequence of characters, and (b) from the determined respective corresponding phonemic data parts for the certain subsequence of characters, forms a grapheme-to-phoneme rule for indicating transformation from the certain subsequence of characters to at least one of the respective corresponding phonemic data parts. To that end the present invention generates grapheme-to-phoneme rules from the input dictionary. As such, the present invention effectively provides a rule generator.
By using the rule generator of the present invention, errors in rule creation are minimized, and a more accurate (less redundant and with fewer exceptions) set of rules is created. Also, rules may be genera

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Computer method and apparatus for grapheme-to-phoneme... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Computer method and apparatus for grapheme-to-phoneme..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Computer method and apparatus for grapheme-to-phoneme... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2966312

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.