Reexamination Certificate
1998-02-11
2002-12-10
Edouard, Patrick N. (Department: 2654)
Data processing: speech signal processing, linguistics, language
Linguistics
Natural language
C704S001000
active
06493662
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to the translation of numbers from one form of representation to another and, in particular, to the translation of a numerical representation of a number into an alphabetical representation, or from an alphabetical representation into a numerical representation.
BACKGROUND OF THE INVENTION
A variety of automated business applications would benefit from the translation of numerical representations of numbers into alphabetical representations having the same number value. Printing a check is one example of a computer system business application that employs both numerical representations of numbers, comprising numerical character strings, and alphabetical representations of numbers, comprising alphabetical character strings. Typically, a check's payable amount is printed as a numerical character string, e.g., “$4,562.92”, in the check's upper right-hand corner. The payable amount is typically also “spelled out”, e.g., “four thousand five hundred and sixty-two dollars and ninety-two cents”, at another location on the face of the check. Since numbers are typically stored within a computer in a binary numerical representation, it is relatively easy to produce a number in numerical form for printing on a check. To print a check's “spelled out” value, however, the stored numerical representation must be translated into a human-readable natural language representation, which typically takes the form of an alphabetical representation. This translation is substantially more involved than the translation from an internal binary numerical representation to an external decimal numerical representation. A number of other business applications also require translation from a number's numerical character string representation to an alphabetical character string representation.
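As a rough illustration of the English-only case described above, the following minimal sketch spells a check amount such as “$4,562.92” in words. It assumes American English, the short scale (thousand, million, billion), and the “hundred and” style used in the check example; the function names are hypothetical and this is not the rule-based engine of the invention.

```python
# Hypothetical sketch: spell a US check amount in English words.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion"]  # American short scale


def below_100(n: int) -> str:
    """Spell 0..99; tens and ones are joined with a hyphen."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def below_1000(n: int) -> str:
    """Spell 1..999, using 'and' after 'hundred' as in the check example."""
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return below_100(rest)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} and {below_100(rest)}"


def spell_integer(n: int) -> str:
    """Spell 0 <= n < 10**12 by processing three-digit groups."""
    if n == 0:
        return "zero"
    if not 0 < n < 1000 ** len(SCALES):
        raise ValueError("amount outside the supported range")
    parts = []
    scale = 0
    while n > 0:
        n, group = divmod(n, 1000)
        if group:  # skip empty groups, e.g. the thousands in 1,000,005
            parts.append(f"{below_1000(group)} {SCALES[scale]}".rstrip())
        scale += 1
    return " ".join(reversed(parts))


def spell_check_amount(amount: str) -> str:
    """Convert a string such as '$4,562.92' into words."""
    digits = amount.replace("$", "").replace(",", "").strip()
    dollars_text, _, cents_text = digits.partition(".")
    dollars = int(dollars_text or "0")
    cents = int(cents_text.ljust(2, "0")[:2]) if cents_text else 0
    text = f"{spell_integer(dollars)} {'dollar' if dollars == 1 else 'dollars'}"
    if cents:
        text += f" and {spell_integer(cents)} {'cent' if cents == 1 else 'cents'}"
    return text


print(spell_check_amount("$4,562.92"))
# four thousand five hundred and sixty-two dollars and ninety-two cents
```

Even this small case needs special handling for 0 to 19, hyphenated tens, and empty groups, which hints at why the same logic does not carry over to other languages by merely swapping word lists.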
A number translation engine might also find more general application in speech synthesis and speech recognition applications. A number translation engine which translates numerical character strings into alphabetical character strings could be used within a speech synthesis system to provide an appropriate character string to a speech synthesis system's output sound system. For example, a speech synthesis system contained within a slot machine might announce to a gambler, and not insignificantly, to nearby gamblers, that the player has won “four thousand five hundred and sixty-two dollars”. Without proper translation from numerical to alphabetical character strings, the announcement may sound something like “four five six two dollars”, or even worse. As a result, a good deal of the drama, and advertising value, associated with the announcement would be lost.
A number translation engine may also be employed to translate alphabetical representations of numbers into numerical representations within a speech recognition system. For example, rather than requiring a user to enunciate numbers in an unnatural, awkward fashion, e.g., “four five six two point nine two dollars” in a speech-input banking application, a number translation engine may allow a person to speak in a natural manner, indicating that they would like to deposit “four thousand five hundred and sixty-two dollars and ninety-two cents”.
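The reverse direction can be sketched with a simple accumulator: small number words add to the current group, “hundred” multiplies it, and a scale word such as “thousand” closes the group. This is a minimal sketch assuming English input that has already been recognized as text; the function names are hypothetical, and it performs no grammar validation (for example, “two three” would be accepted as 5).

```python
# Hypothetical sketch: parse spoken English amounts into numbers.
from __future__ import annotations

UNITS = {w: i for i, w in enumerate(
    ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
     "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
     "sixteen", "seventeen", "eighteen", "nineteen"])}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def parse_cardinal(text: str) -> int:
    """Convert e.g. 'four thousand five hundred and sixty-two' to 4562."""
    total = 0    # value of completed scale groups (thousands, millions, ...)
    current = 0  # value of the group currently being built (0..999)
    for token in text.lower().replace("-", " ").split():
        if token == "and":
            continue  # 'and' carries no numeric value
        if token in UNITS:
            current += UNITS[token]
        elif token in TENS:
            current += TENS[token]
        elif token == "hundred":
            current = (current or 1) * 100
        elif token in SCALES:
            total += (current or 1) * SCALES[token]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {token!r}")
    return total + current


def parse_amount(text: str) -> tuple[int, int]:
    """Split '... dollars and ... cents' into a (dollars, cents) pair."""
    dollars = cents = 0
    pending: list[str] = []
    for word in text.lower().split():
        if word in ("dollar", "dollars"):
            dollars, pending = parse_cardinal(" ".join(pending)), []
        elif word in ("cent", "cents"):
            cents, pending = parse_cardinal(" ".join(pending)), []
        else:
            pending.append(word)
    # Words after the last unit word are ignored in this sketch.
    return dollars, cents


print(parse_amount("four thousand five hundred and sixty-two dollars and ninety two cents"))
```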
Although number translators which transform a numerical representation of a number into an English language alphabetical representation exist, such translators do not accommodate a variety of languages, or even various representations, such as ordinal and cardinal representations, within a single language. The development of a number translator that can accommodate various languages faces significant obstacles. For example, it is not enough to modify an algorithm written for English so that it reads its literal string values from a resource file. English separates the component parts of a number with spaces; Italian and German do not. Furthermore, although the other digit positions are separated by spaces, English and French separate the tens and ones digits with a hyphen, Spanish uses “y,” and many other languages use either a space or nothing. Some languages, such as Greek and Swedish, run the tens and ones digits together into one word but put spaces between the other components.
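One way to picture the point above is to move the separator, and even the word order, out of the algorithm and into per-language data. The sketch below is a hypothetical illustration (the `TensOnesRule` type and `RULES` table are assumptions, not taken from the patent) that only joins two already-chosen words; the exceptions discussed next, such as Spanish 21 to 29, French “et,” German “ein,” and Italian vowel elision, would need further rules.

```python
# Hypothetical sketch: tens/ones joining kept as per-language data.
from dataclasses import dataclass


@dataclass(frozen=True)
class TensOnesRule:
    separator: str            # text placed between the two words
    ones_first: bool = False  # True where the ones digit is spoken first


RULES = {
    "en": TensOnesRule("-"),                     # thirty-two
    "fr": TensOnesRule("-"),                     # trente-deux
    "es": TensOnesRule(" y "),                   # treinta y dos
    "it": TensOnesRule(""),                      # trentadue
    "de": TensOnesRule("und", ones_first=True),  # zweiunddreißig
}


def join_tens_ones(lang: str, tens_word: str, ones_word: str) -> str:
    """Combine already-looked-up words for the tens and ones digits."""
    rule = RULES[lang]
    first, second = (ones_word, tens_word) if rule.ones_first else (tens_word, ones_word)
    return f"{first}{rule.separator}{second}"


for lang, tens, ones in [("en", "thirty", "two"), ("es", "treinta", "dos"),
                         ("it", "trenta", "due"), ("de", "dreißig", "zwei")]:
    print(lang, join_tens_ones(lang, tens, ones))
```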
Some languages, such as Spanish and Italian, drop the word for “one” from the phrases “one hundred” or “one thousand.” In Spanish, for example, “one thousand” is “mil,” not “uno mil.” In some languages, such as German, the word for “one” in “one hundred” or “one thousand” is different from the word for “one” on its own: in German, “one” is “eins,” but “one thousand” is “eintausend,” not “einstausend.” In some languages, the word for “hundred” or “thousand” becomes plural when it is multiplied by a number other than 1. In French, for example, 100 is “cent,” but 200 is “deux cents.” In some languages, the word for “hundred” or “thousand” also changes form depending on whether it is followed by more digits: 100 in Spanish is “cien,” for example, but 101 is “ciento uno.”
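Three of the multiplier rules above can be expressed as small, language-specific functions. This is a hypothetical sketch for multipliers 1 to 9 only. The French function also applies the standard spelling rule that “cents” loses its “s” when more digits follow (“deux cent un”); that detail is not stated in the text above, and the function is a simplification of the full French agreement rules.

```python
# Hypothetical sketch: multiplier rules for "hundred" and "thousand".
FR_ONES = ["", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf"]
ES_ONES = ["", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve"]
DE_ONES = ["", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun"]


def fr_hundreds(multiplier: int, more_digits_follow: bool) -> str:
    """French hundreds word for a multiplier of 1..9."""
    if multiplier == 1:
        return "cent"                               # not 'un cent'
    suffix = "" if more_digits_follow else "s"      # 'deux cents', but 'deux cent' + ' un'
    return f"{FR_ONES[multiplier]} cent{suffix}"


def es_thousands(multiplier: int) -> str:
    """Spanish thousands word for a multiplier of 1..9."""
    return "mil" if multiplier == 1 else f"{ES_ONES[multiplier]} mil"  # not 'uno mil'


def de_thousands(multiplier: int) -> str:
    """German thousands word for a multiplier of 1..9."""
    word = "ein" if multiplier == 1 else DE_ONES[multiplier]  # not 'einstausend'
    return f"{word}tausend"


print(fr_hundreds(2, False), es_thousands(1), de_thousands(1))
```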
In most languages, the words for the values from 11 to 19 are based on the words for the values from 1 to 9, but are not simple concatenations. In English, for example, 15 is “fifteen” and not “fiveteen.” This also happens for the words for the tens digits in most languages (“twenty” instead of “twoty” in English). In some languages, this also applies to other groups of words. In Spanish, for example, the tens and ones digits are usually joined by “y”; “thirty-one” is “treinta y uno.” But the values from 21 to 29 contract the phrase into a single word; instead of “veinte y uno,” you say “veintiuno.” So these values have to be special-cased. Worse, it still isn't a simple concatenation: sometimes the ones digit acquires an accent mark it doesn't have when standing alone. 22, for example, is “veintidós” instead of “veintidos.” In Spanish and Greek, canned strings are also required for the hundreds place. In Spanish, for example, you combine the words for 2 through 9 with “cientos,” but the word for the multiplier sometimes changes form in the contraction: 500, for example, is “quinientos,” not “cincocientos.” One might therefore employ canned strings for the twenties and hundreds in every language, even though most languages wouldn't need them. There are additional peculiarities in various languages. In German, for example, the ones digit goes before the tens digit: 23 is “dreiundzwanzig.”
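The Spanish cases above combine a general rule (tens + “ y ” + ones) with canned strings for 10 to 19, 21 to 29, and the hundreds. The following is a minimal, hypothetical sketch covering 1 to 999; it uses the standalone form “uno” throughout and ignores the shortened forms used before nouns.

```python
# Hypothetical sketch: Spanish cardinals 1..999 with canned strings.
ES_UNITS = ["", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve"]
ES_TEENS = ["diez", "once", "doce", "trece", "catorce", "quince",
            "dieciséis", "diecisiete", "dieciocho", "diecinueve"]          # 10..19
ES_TWENTIES = ["veinte", "veintiuno", "veintidós", "veintitrés", "veinticuatro",
               "veinticinco", "veintiséis", "veintisiete", "veintiocho",
               "veintinueve"]                                             # 20..29
ES_TENS = {3: "treinta", 4: "cuarenta", 5: "cincuenta", 6: "sesenta",
           7: "setenta", 8: "ochenta", 9: "noventa"}
ES_HUNDREDS = {2: "doscientos", 3: "trescientos", 4: "cuatrocientos",
               5: "quinientos", 6: "seiscientos", 7: "setecientos",
               8: "ochocientos", 9: "novecientos"}                        # canned


def es_below_100(n: int) -> str:
    if n < 10:
        return ES_UNITS[n]
    if n < 20:
        return ES_TEENS[n - 10]
    if n < 30:
        return ES_TWENTIES[n - 20]  # contracted single words, some with accents
    tens, ones = divmod(n, 10)
    return ES_TENS[tens] if ones == 0 else f"{ES_TENS[tens]} y {ES_UNITS[ones]}"


def es_cardinal(n: int) -> str:
    """Spell 1 <= n <= 999 in Spanish."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch covers 1..999 only")
    hundreds, rest = divmod(n, 100)
    if hundreds == 0:
        return es_below_100(rest)
    if hundreds == 1:
        head = "ciento" if rest else "cien"  # 100 'cien', 101 'ciento uno'
    else:
        head = ES_HUNDREDS[hundreds]
    return head if rest == 0 else f"{head} {es_below_100(rest)}"


for value in (22, 31, 100, 101, 500, 999):
    print(value, es_cardinal(value))
```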
In French and German, the combination of the tens and ones digits differs depending on whether the ones digit is 1: in German, 21 is “einundzwanzig” instead of “einsundzwanzig.” In French, “et” goes before the ones digit only if it is 1; 21 is “vingt-et-un,” but 22 is “vingt-deux.” In Greek, the word for each tens digit has an accent mark that is eliminated when it is combined with a ones digit; 30 is “triánta,” but 31 is “triantaéna.” In Italian, when the word for the tens digit ends with a vowel and the word for the ones digit begins with a vowel, the tens word loses its final vowel: 50 is “cinquanta” and 52 is “cinquantadue,” but 51 is “cinquantuno.”
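The German, French, and Italian special cases above can each be written as a small rule on top of a word table. This hypothetical sketch covers German and Italian 20 to 99 and French 20 to 69 only (French 70 and up has its own peculiarities). It follows the hyphenated “vingt-et-un” form given in the text; the traditional spelling is “vingt et un.” It also adds the accented compound “-tré” (e.g. “ventitré”), a standard Italian rule not mentioned in the text. Greek is omitted. Inputs outside the stated ranges raise `KeyError`.

```python
# Hypothetical sketch: tens+ones rules for German, French (20-69), Italian.
DE_ONES = ["", "eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun"]
DE_TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
           6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}
FR_ONES = ["", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf"]
FR_TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante", 6: "soixante"}
IT_ONES = ["", "uno", "due", "tre", "quattro", "cinque", "sei", "sette", "otto", "nove"]
IT_TENS = {2: "venti", 3: "trenta", 4: "quaranta", 5: "cinquanta",
           6: "sessanta", 7: "settanta", 8: "ottanta", 9: "novanta"}


def de_20_99(n: int) -> str:
    tens, ones = divmod(n, 10)
    if ones == 0:
        return DE_TENS[tens]
    ones_word = "ein" if ones == 1 else DE_ONES[ones]  # einundzwanzig, not einsundzwanzig
    return f"{ones_word}und{DE_TENS[tens]}"           # ones digit is spoken first


def fr_20_69(n: int) -> str:
    tens, ones = divmod(n, 10)
    if ones == 0:
        return FR_TENS[tens]
    if ones == 1:
        return f"{FR_TENS[tens]}-et-un"  # 'et' appears only before 1
    return f"{FR_TENS[tens]}-{FR_ONES[ones]}"


def it_20_99(n: int) -> str:
    tens, ones = divmod(n, 10)
    tens_word = IT_TENS[tens]
    if ones == 0:
        return tens_word
    ones_word = IT_ONES[ones]
    if ones_word[0] in "aeiou":  # uno, otto: the tens word drops its final vowel
        tens_word = tens_word[:-1]
    if ones == 3:
        ones_word = "tré"        # 'tre' takes a written accent in compounds
    return tens_word + ones_word


print(de_20_99(21), fr_20_69(21), fr_20_69(22), it_20_99(51), it_20_99(52))
```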
Another area where permutations arise is in major groupings. For example, in American English and most European languages, large numbers are grouped by thousands (i.e., after a thousand, a new word is introduced every factor of 1,000). In British English, however, large numbers are grouped by million (a “billion” in British English is a “trillion” in American English; what we call a “billion” is called a “thousand million” in Britain). More importantly, in Japanese, large numbers are grouped by ten thousand, rather than by thousand.
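The grouping difference above amounts to choosing a different base when splitting a number before any words are attached. The sketch below is a hypothetical illustration: `major_groups` splits by a base such as 1,000, 1,000,000, or 10,000, and `label_groups` pairs each non-zero group with a caller-supplied scale word. The Japanese scale words are given in romanized form (man = 10**4, oku = 10**8).

```python
# Hypothetical sketch: split a number into locale-specific major groups.
from __future__ import annotations


def major_groups(n: int, base: int) -> list[int]:
    """Split n into base-sized groups, most significant first."""
    if n < 0 or base < 2:
        raise ValueError("expected n >= 0 and base >= 2")
    groups = []
    while True:
        n, group = divmod(n, base)
        groups.append(group)
        if n == 0:
            break
    return groups[::-1]


def label_groups(n: int, base: int, scale_words: list[str]) -> list[tuple[int, str]]:
    """Pair each non-zero group with the scale word for its position."""
    groups = major_groups(n, base)
    if len(groups) > len(scale_words):
        raise ValueError("number too large for the given scale words")
    labeled = []
    for index, group in enumerate(groups):
        power = len(groups) - 1 - index  # 0 for the least significant group
        if group:
            labeled.append((group, scale_words[power]))
    return labeled


# American English groups by thousand, British English (long scale) by
# million, and Japanese by ten thousand.
print(label_groups(1_234_567_890, 1_000, ["", "thousand", "million", "billion"]))
print(label_groups(1_234_567_890, 1_000_000, ["", "million", "billion"]))
print(label_groups(123_456_789, 10_000, ["", "man", "oku"]))
```

In the British case each group can itself exceed a thousand, so spelling the group 1234 produces the “thousand million” phrasing described above.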
French has a couple of peculiarities of its own: In European French, there are no words for 70, 80 or 90. The numbers from 70 up are rendered as “soixante-dix,” “soixante et onze,” “soixante-douze,” “soixante-treize,” and so on (literally, “sixty-ten,” “sixty and eleven,” “sixty-twelve,” “sixty-thirteen,” etc.). 80 is rendered as “quatre-vingts” (literally, “four twenties”), and the numbers proceed by score from there (i.e., 81 is “quatre-vingt-un” (“four-twen
Kudirka & Jobse LLP