Multilingual electronic transfer dictionary containing...

Data processing: speech signal processing, linguistics, language – Linguistics – Dictionary building, modification, or prioritization

Reexamination Certificate


C707S793000


active

06490548

ABSTRACT:

COPYRIGHT
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the PTO patent file or records, but otherwise reserves all copyright rights whatsoever. Copyright © 1999 ISTA.
TECHNICAL FIELD
The present invention relates to multilingual electronic dictionaries that may be used for machine translation.
DEFINITIONS
“Multilingual” means pertaining to two or more languages.
Unless the context otherwise requires, the terms “subject”, “topic” and “field” are virtually synonymous in this disclosure, as are the terms “dictionary”, “glossary” and “lexicon.”
BACKGROUND ART
The existence of field-dependent translations of terms has long been a problem for both ordinary human translation and machine translation. A term in a source language, for example, Japanese, may have more than one translation in a target language, for example, English, depending on the subject, topic or field of the document being translated. For example, the Japanese word “soshiki” would be translated as “tissue” in a medical document, as “weave” in the case of textiles, or as “microstructure” in the case of metallurgy.
Conventional machine translation programs, for example, Systran®, contain topical dictionaries or glossaries. The user must manually select topical dictionaries appropriate for the document being translated. In this case, there is one dictionary per topic, for example, chemistry or medicine, rather than one topic per dictionary entry or record as in the current invention.
The machine translation program METAL contains three individual lexicons: a German monolingual lexicon, an English monolingual lexicon and a German-English bilingual lexicon (Katherine Koch, “Machine Translation and Terminology Database—Uneasy Bedfellows?” Lecture Notes in Artificial Intelligence 898, Machine Translation and the Lexicon, Petra Steffens, ed., Springer, Berlin, 1995, pp. 131-140). Semantic information is disclosed only for the monolingual lexicons, not for the bilingual lexicon. Even for the monolingual lexicons, only 15 semantic types are disclosed, such as “abstract”, “concrete”, “human”, “animal” and “process.” These are quite different from the topical classifications that are the subject of the current invention.
Brigitte Blaser, in “TransLexis: An Integrated Environment for Lexicon and Terminology Management,” Lecture Notes in Artificial Intelligence 898, Machine Translation and the Lexicon, Petra Steffens, ed., Springer, Berlin, 1995, pp. 158-173, discloses the incorporation of concepts, including broader concepts, narrower concepts and related concepts, in a lexicon database management system for machine translation. However, this disclosure does not extend to the incorporation of subject codes, notably hierarchical subject codes, in a multilingual electronic dictionary, nor does it disclose the use of concepts or other subject area information for automatic topic discrimination in machine translation. Notably, these concepts are not subject areas; rather, they constitute the interlingua for interlingua-based machine translation.
Masterson disclosed a means of automatic sense disambiguation for the machine translation of Latin to English in the article “The thesaurus in syntax and semantics,” Mechanical Translation, Vol. 4, pp. 1-2, 1957. As described by Wilks, Slator, and Guthrie in Electric Words: Dictionaries, Computers and Meanings, MIT Press, Cambridge, Mass., 1996, pp. 88-89, Masterson disclosed a nonstatistical method using the headings in Roget's Thesaurus.
In this predecessor to interlingua-based machine translation, Masterson disclosed a concept thesaurus for the words in a Latin passage from Virgil's Georgics. Each word stem from the Latin passage was associated with a set of head numbers from Roget's International Thesaurus by translating the word stems into English and selecting the head numbers for the corresponding English words. For example, the three Latin noun stems, “agricola”, “terram” and “aratro” have the following heads (where the head words are shown instead of the head numbers):
AGRICOLA: Region, Agriculture
TERRAM: Region, Land, Furrow
ARATRO: Agriculture, Furrow, Convolution
In the case of the text “Agricola incurvo in terram dimovit aratro”, only the heads that occur more than once are kept, forming a concept set for each word. In the above example, this yields the following sets:
AGRICOLA: Region, Agriculture
TERRAM: Region, Furrow
ARATRO: Agriculture, Furrow
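The concept-set step can be sketched in Python. This is a minimal sketch, assuming each Latin stem maps to a list of head words (as in the tables above) and that a head "occurs more than once" when it appears under at least two different stems of the passage. The function name `select_concept_sets` is hypothetical.

```python
from collections import Counter


def select_concept_sets(heads_by_stem):
    """Keep, for each stem, only the heads that also appear under another stem."""
    # Count each head at most once per stem, so a repeat within one stem is not over-counted.
    counts = Counter(head for heads in heads_by_stem.values() for head in set(heads))
    return {
        stem: [head for head in heads if counts[head] > 1]
        for stem, heads in heads_by_stem.items()
    }


# Heads for the three Latin noun stems, taken from the example above.
heads = {
    "AGRICOLA": ["Region", "Agriculture"],
    "TERRAM": ["Region", "Land", "Furrow"],
    "ARATRO": ["Agriculture", "Furrow", "Convolution"],
}

for stem, concepts in select_concept_sets(heads).items():
    print(stem, concepts)
```

With the data above, "Land" and "Convolution" are dropped because each appears under only one stem, which matches the reduced sets listed in the text.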
Finally, the English words listed under each head in Roget's Thesaurus are intersected to leave the appropriate translation candidates. In the current example, this yields the following sets:
AGRICOLA: farmer, ploughman
TERRAM: soil, ground
ARATRO: plough, ploughman, rustic
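The intersection step can be sketched under one reading of the description: for each Latin stem, take the English words listed under every head in its concept set and keep only the words common to all of them. This is a minimal sketch under that assumption. The thesaurus entries below are toy data invented for illustration, not actual Roget's content, so the output is not meant to reproduce the sets listed above. The names `TOY_THESAURUS` and `translation_candidates` are hypothetical.

```python
# Toy thesaurus: head -> English words listed under it.
# These entries are invented for illustration; they are NOT taken from Roget's.
TOY_THESAURUS = {
    "Region": {"soil", "ground", "farmer", "district"},
    "Agriculture": {"farmer", "plough", "harvest"},
    "Furrow": {"soil", "ground", "plough", "trench"},
}

# Concept sets for each stem, as given in the text.
CONCEPT_SETS = {
    "AGRICOLA": ["Region", "Agriculture"],
    "TERRAM": ["Region", "Furrow"],
    "ARATRO": ["Agriculture", "Furrow"],
}


def translation_candidates(concept_sets, thesaurus):
    """For each stem, intersect the word lists of all heads in its concept set."""
    result = {}
    for stem, heads in concept_sets.items():
        word_sets = [thesaurus.get(head, set()) for head in heads]
        # set.intersection(*sets) returns a new set; an empty concept set yields no candidates.
        result[stem] = set.intersection(*word_sets) if word_sets else set()
    return result


for stem, words in translation_candidates(CONCEPT_SETS, TOY_THESAURUS).items():
    print(stem, sorted(words))
```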
Masterson does not disclose a multilingual dictionary, nor does she disclose use of topical codes in a multilingual dictionary for disambiguation.
Kenneth W. Church, William A. Gale and David E. Yarowsky, in U.S. Pat. No. 5,541,836, also disclosed the use of the categories from Roget's Thesaurus in automatically disambiguating word/sense pairs and the use of bilingual bodies of text to train word/sense probability tables. Church et al. do not disclose a multilingual dictionary, nor do they disclose the use of topical codes in a multilingual dictionary for sense disambiguation.
JuneJei Kuo, in U.S. Pat. No. 5,285,386, “Machine Translation Apparatus Having Means for Translating Polysemous Words Using Dominated Codes”, discloses interlingua-based machine translation using semantic codes in the role of the interlingua. While Kuo discloses transfer dictionaries, these are not multilingual transfer dictionaries. Rather, they are transfer dictionaries between the semantic codes (the interlingua in this case) and words in the target language.
The requirement of manually selecting a topical dictionary is a barrier to the automated translation of documents such as patent documents that cover many topical areas. Also, the semantic methods of the interlingua-based approaches do not provide for automatically determining the topic of the document being translated. There is a need for a means for automatically determining the most appropriate target definition depending on the topic of the document. Such a means is referred to as “automatic topic disambiguation” in the text below.
Elizabeth Liddy, Woojin Paik and Edmund Szi-li Wu, in U.S. Pat. No. 5,873,056, “Natural Language Processing System for Semantic Vector Representation Which Accounts for Lexical Ambiguity”, disclose a monolingual lexical database that contains nonhierarchical subject codes assigned to each word in the database. To avoid unnecessary reiteration of prior teachings, the disclosure of each reference cited herein is hereby incorporated by reference.
SUMMARY OF THE INVENTION
This invention provides a multilingual electronic dictionary comprising a memory that contains a data structure composed of a plurality of records, each record comprising representations of the following: a first term (in a first language), a second term (in a second language), and a topical code. The topical code indicates a topical area in which the second term is a translation of the first term.
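A minimal sketch of such a record and a topic-keyed lookup, in Python. The class name `Record`, its field names, the `translate` helper and the topic labels are hypothetical; the entries reuse the “soshiki” example from the background section, and topical codes are plain strings here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One dictionary record: a term, its translation, and the topic in which it applies."""
    first_term: str    # term in the first language
    second_term: str   # term in the second language
    topical_code: str  # topical area in which second_term translates first_term


# Hypothetical topic labels; entries follow the "soshiki" example in the text.
DICTIONARY = [
    Record("soshiki", "tissue", "medicine"),
    Record("soshiki", "weave", "textiles"),
    Record("soshiki", "microstructure", "metallurgy"),
]


def translate(term, topic, dictionary=DICTIONARY):
    """Return the translations of `term` recorded for `topic` (possibly none)."""
    return [r.second_term for r in dictionary
            if r.first_term == term and r.topical_code == topic]


print(translate("soshiki", "textiles"))  # ['weave']
```

Because the topical code is stored on each record rather than on a whole glossary, a single dictionary can hold every field-dependent translation of a term, in contrast with the one-dictionary-per-topic arrangement of conventional systems.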
Such an electronic dictionary allows for selecting topic-appropriate translations of terms in a textual object in a first language into a second language. This is accomplished by:
(a) providing an electronic dictionary containing records comprising representations of terms in the first language and the second language;
(b) scanning a textual object in the first language to identify each occurrence in the textual object of a term contained in a record of the electronic dictionary;
(c) inserting each topical code associated with each of the records identified in step (b) into a data structure that provides for counting of the frequency of occurrence of each topical code; and
(d) whenever there occur a plurality of terms in t
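Steps (b) and (c) can be sketched in Python, assuming the textual object has already been split into first-language terms (segmenting Japanese text is outside this sketch) and that records are (first term, second term, topical code) tuples. The function name `count_topical_codes`, the extra record for “saibou” (cell) and the topic labels are hypothetical. The sketch stops at step (c) and does not attempt step (d).

```python
from collections import Counter

# Hypothetical records: (first-language term, second-language term, topical code).
RECORDS = [
    ("soshiki", "tissue", "medicine"),
    ("soshiki", "weave", "textiles"),
    ("soshiki", "microstructure", "metallurgy"),
    ("saibou", "cell", "medicine"),
]


def count_topical_codes(terms, records):
    """Steps (b)-(c): find the records for each term and tally their topical codes."""
    # Index records by first-language term so each lookup during the scan is direct.
    index = {}
    for first, second, code in records:
        index.setdefault(first, []).append((second, code))

    counts = Counter()
    for term in terms:                         # step (b): scan the textual object
        for _second, code in index.get(term, []):
            counts[code] += 1                  # step (c): count each topical code
    return counts


text_terms = ["saibou", "no", "soshiki"]       # hypothetical pre-segmented input
counts = count_topical_codes(text_terms, RECORDS)
print(counts.most_common(1))                   # [('medicine', 2)]
```

A Counter serves as the frequency-counting data structure of step (c); terms with no dictionary record, such as the particle “no”, simply contribute nothing.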
