Reexamination Certificate
1998-09-24
2001-09-04
Thomas, Joseph (Department: 2644)
Data processing: speech signal processing, linguistics, language
Linguistics
Translation machine
C704S002000, C704S006000
active
06285978
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to the field of automatic natural language translation. More specifically, the invention relates to a method and system for automatically estimating the accuracy of translations by an automatic translation system.
BACKGROUND OF THE INVENTION
Perfect and automatic translation between two natural languages, i.e. a source natural language and a target natural language, by a computer is highly desirable in today's global community and is the goal of many computational systems. Here natural language can be any language that is written (textual) or spoken by humans.
One of the main methods for producing automatic translation is the transfer-based method of Machine Translation. A transfer-based MT system typically takes a source text (the text in the original natural language, e.g. English), segments it into natural language segments (e.g. sentences or phrases) which we abbreviate as “segments”, and performs source analysis, transfer, and target generation to arrive at the target text (the translated text).
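The flow described above (segmentation, then source analysis, transfer, and target generation per segment) can be sketched as a small pipeline. This is only an illustrative skeleton: the function names `segment` and `translate`, the regular-expression sentence splitter, and the placeholder stages are assumptions made for the example and are not taken from any particular MT system.

```python
import re
from typing import Any, Callable, List


def segment(text: str) -> List[str]:
    """Split text into sentence-like segments (naive rule: ., ! or ? then whitespace)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate(
    text: str,
    analyze: Callable[[str], Any],    # source analysis: segment -> parse structure
    transfer: Callable[[Any], Any],   # transfer: parse structure -> target structure
    generate: Callable[[Any], str],   # target generation: target structure -> string
) -> List[str]:
    """Run the three stages on every segment, in order."""
    return [generate(transfer(analyze(seg))) for seg in segment(text)]


# Placeholder stages (hypothetical): they only show how the stages compose.
segments = segment("The woman gives a book to the man. He reads it.")
echoed = translate(
    "The woman gives a book to the man. He reads it.",
    analyze=str.split,          # stand-in "parse": a list of tokens
    transfer=lambda parse: parse,  # stand-in transfer: no change
    generate=" ".join,          # stand-in generation: join tokens again
)
```

With real stages substituted for the placeholders, each segment would be translated independently of the others, which is the unit of processing the description uses.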
Source analysis can be performed in any one or more well-known ways. Typically, source analysis is dependent on a syntactic theory of the structure of natural language. For example, in rule-based grammars there are rules for the natural language structure, and they are used by the source analysis to parse the given natural language text or input into one or more parse structures. For example, in the rule-based grammar system Slot Grammar, there are rules for filling and ordering so-called slots; slots are grammatical relations, e.g. subject, direct object, and indirect object. A further explanation of source analysis is given in McCord, M. C. “Slot Grammars,” Computational Linguistics, vol. 6, pp. 31-43, 1980 and McCord, M. C. “Slot Grammar: A System for Simpler Construction of Practical Natural Language Grammars,” in R. Studer (Ed.), Natural Language and Logic: International Scientific Symposium, Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 118-145, 1990, which are herein incorporated by reference in their entirety.
The source analysis produces a parse structure that is a formal representation of one of the source segments. The parse structure includes elements like word senses (e.g. choice between homonyms), morphological features (such as parts of speech), surface syntactic structure, and deep syntactic structure, and relates these elements to one another according to the rules of the grammar (e.g. syntactic and semantic relationships) used to parse the given natural language input. Parse structures such as those of Slot Grammar may also include information on such things as punctuation (e.g. occurrences of commas and periods), and formatting tags (e.g. SGML tags).
The transfer step typically transfers the source elements from the source natural language to target elements in the target natural language, producing an initial transfer structure. The transfer step then iteratively performs structural transformations, starting with the initial transfer structure, until the desired syntactic structure for the target language is obtained, thus producing the target structure. A further explanation of transfer is given in M. C. McCord, “Design of LMT: A Prolog-based Machine Translation System”, Computational Linguistics, vol. 15, pp. 33-52, which is herein incorporated by reference in its entirety.
The target generation step typically inflects each word sense in the target structure, taking into account the inflectional features marked on each word, and then outputs the resulting structure as a natural language sentence in the target language. A further explanation of target generation is given in M. C. McCord and S. Wolff, “The Lexicon and Morphology for LMT, a Prolog-based MT system,” IBM Research Report RC 13403, 1988, and G. Arrarte, I. Zapata, and M. C. McCord, “Spanish Generation Morphology for an English-Spanish Machine Translation System,” IBM Research Report RC 17058, 1991, which are herein incorporated by reference in their entirety.
LMT is an example of a transfer-based MT (machine translation) system, and it uses steps like those outlined above to translate a natural language text. The McCord reference (“Prolog-based Machine Translation”) gives an overview of these steps for translating a sentence from English to German.
In the preceding reference, the example sentence is: The woman gives a book to the man. The source parse structure shows how the various parts of the sentence fit together. The head of the sentence is the verb gives, which has the morphological features third person, singular, present, and indicative. The verb gives has three slots: subject, filled by the word sense woman; object, filled by the word sense book; and prepositional object, filled by the word sense man.
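One way to picture this parse structure is as a small tree of nodes, each holding a word sense, its morphological features, and its filled slots. The sketch below is a hypothetical representation chosen for illustration; the class name `ParseNode`, its fields, and the use of the lemma "give" as the head word sense are assumptions, not the data format used by Slot Grammar or LMT.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParseNode:
    """Hypothetical parse-structure node: a word sense, its features, its slots."""
    sense: str
    features: List[str] = field(default_factory=list)
    slots: Dict[str, "ParseNode"] = field(default_factory=dict)


# Parse structure for "The woman gives a book to the man."
source_parse = ParseNode(
    sense="give",  # assumption: the head verb is stored in lemma form
    features=["third person", "singular", "present", "indicative"],
    slots={
        "subject": ParseNode("woman"),
        "object": ParseNode("book"),
        "prepositional object": ParseNode("man"),
    },
)


def describe(node: ParseNode) -> List[str]:
    """List each slot of a node together with the word sense that fills it."""
    return [f"{slot}: {filler.sense}" for slot, filler in node.slots.items()]


slot_summary = describe(source_parse)
```

Keeping the slots in a mapping keyed by grammatical relation mirrors the description: the head verb owns the slots, and each slot points at the word sense that fills it.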
Next, the initial transfer structure shows the structure right after lexical transfer. Each word sense in the source parse structure has been transferred to the corresponding German word sense, e.g. the English woman has been transferred to German frau. In addition, the correct transfer features have been marked on each word, e.g. the subject is marked nominative, and the object is marked accusative. The order of the words in the initial transfer structure is the same as in the source parse structure.
Then a transformation applies to the initial transfer structure to produce the target language structure that represents the correct word order for German. The transformation moves the indirect object noun phrase the man from its position after the object, the book, to a position before the object, thus producing a target language structure with word order like that in The woman gives the man a book.
Finally, each word sense in the tree is inflected as required by its features, and the result of the translation is output as a string with appropriate capitalization and punctuation: Die Frau gibt dem Mann ein Buch.
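The steps of this example (lexical transfer, the reordering transformation, and inflection) can be imitated with a toy program. The sketch below is not LMT, which is a Prolog-based system; every table and function name here (`LEXICON`, `SLOT_CASE`, `ARTICLES`, `lexical_transfer`, `move_indirect_object`, `generate`) is a hypothetical stand-in that covers only this one sentence.

```python
# Toy lexicon (assumption): English word sense -> (German word sense, gender).
LEXICON = {"woman": ("Frau", "f"), "book": ("Buch", "n"), "man": ("Mann", "m")}
VERB_LEXICON = {"give": "geben"}

# Case marked on each slot filler during transfer.
SLOT_CASE = {"subject": "nom", "object": "acc", "prepositional object": "dat"}

# Article forms by (definiteness, gender, case); only the entries needed here.
ARTICLES = {
    ("def", "f", "nom"): "die",
    ("def", "m", "dat"): "dem",
    ("indef", "n", "acc"): "ein",
}

# Finite verb forms by (lemma, feature bundle); only the entry needed here.
VERB_FORMS = {("geben", "3sg.pres.ind"): "gibt"}

source = {
    "verb": "give",
    "features": "3sg.pres.ind",
    "slots": [
        ("subject", "woman", "def"),
        ("object", "book", "indef"),
        ("prepositional object", "man", "def"),
    ],
}


def lexical_transfer(src):
    """Replace each word sense by its German sense and mark case; keep source order."""
    slots = []
    for slot, sense, det in src["slots"]:
        german, gender = LEXICON[sense]
        slots.append({"slot": slot, "sense": german, "gender": gender,
                      "case": SLOT_CASE[slot], "det": det})
    return {"verb": VERB_LEXICON[src["verb"]], "features": src["features"], "slots": slots}


def move_indirect_object(structure):
    """Transformation: move the indirect object to a position before the object."""
    slots = list(structure["slots"])
    io = next(i for i, s in enumerate(slots) if s["slot"] == "prepositional object")
    do = next(i for i, s in enumerate(slots) if s["slot"] == "object")
    if io > do:
        slots.insert(do, slots.pop(io))
    return {**structure, "slots": slots}


def generate(target):
    """Inflect articles and verb, then output a capitalized, punctuated string."""
    phrases = [f'{ARTICLES[(s["det"], s["gender"], s["case"])]} {s["sense"]}'
               for s in target["slots"]]
    verb = VERB_FORMS[(target["verb"], target["features"])]
    sentence = " ".join([phrases[0], verb] + phrases[1:])
    return sentence[0].upper() + sentence[1:] + "."


initial = lexical_transfer(source)
initial_order = [s["sense"] for s in initial["slots"]]
translation = generate(move_indirect_object(initial))
print(translation)
```

The initial transfer structure keeps the source order (Frau, Buch, Mann); only after the transformation does the dative noun phrase precede the accusative one, which is what the generated string reflects.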
A further explanation of LMT is given in M. C. McCord, “LMT”, Proceedings of MT Summit II, pp. 94-99, Deutsche Gesellschaft für Dokumentation, Frankfurt, and in H. Lehmann (1995), “Machine Translation for Home and Business Users”, Proceedings of MT Summit V, Luxembourg, July 10-13, which are herein incorporated by reference in their entirety.
STATEMENT OF PROBLEMS WITH THE PRIOR ART
Natural languages are very complex, and this poses great challenges for any MT system. No MT system today is able to produce perfect translation of arbitrary text. For any given system, translations range from almost perfect to unintelligible, and the user is not given any indication of how good the translation may be.
Bad translations cause a high degree of frustration for the user, because the prior art fails to effectively measure the accuracy of the given translation. If the user could know that the translation was likely to be bad, the user would have the choice not to look at it.
The Logos Translatability Index (TI) assigns a measure of the translatability of a complete document by the LOGOS system. The Logos Translatability Index was not expected to “provide sentence-specific information with any degree of reliability. The TI applies to the corpus or document as a whole but is not useful in pinpointing problem sentences.” See C. Gdaniec: “The Logos Translatability Index”, Proc. First Conference of the Association for Machine Translation in the Americas, pp. 97-105, AMTA, 1994, which is herein incorporated by reference in its entirety.
Any step in the translation process may introduce wrong data that will result in bad translation quality, and it is a weakness of existing translation systems that processing continues past the point where such wrong data is introduced.
In order to guarantee high quality of the translation, some systems, e.g. The Integrated Authoring and Translation System (U.S. Pat. No. 5,677,835) which is herein incorporated by reference in its entirety, require that the source text be constrained severely. Not only does this place a considerable burden on the author, but it also mean
Bernth Arendse
Gdaniec Claudia Maria
McCord Michael Campbell
Medeiros Sue Ann
International Business Machines - Corporation
Percello Louis J.
Thomas Joseph