Using ranked translation choices to obtain sequences...

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Reexamination Certificate

C704S004000

active

06393389

ABSTRACT:

FIELD OF THE INVENTION
The invention relates to techniques that provide information about expressions in one natural language, with the information being understandable in another natural language.
BACKGROUND
Various techniques have been proposed for providing information about expressions in a first natural language, with the information being understandable in a second natural language. For people who understand the second language but not the first, such information can produce an understanding of expressions in the first language. Some such techniques attempt to perform automatic translation, while other techniques provide machine aids for translation. Yet other techniques provide information without attempting translation.
Bauer, D., Segond, F., and Zaenen, A., “LOCOLEX: the translation rolls off your tongue”, in ACH-ALLC '95 Conference Abstracts, Santa Barbara, Calif., Jul. 11-15, 1995, pp. 6-9, describe LOCOLEX, an intelligent reading aid that provides bilingual dictionary lookup through the interaction between a complete on-line dictionary and an on-line text. The user can click on a word in a sentence, and LOCOLEX uses the word's context to look for multiword expressions (MWEs) that include the word, to choose between parts of speech for the word based on parts of speech of neighboring words, and to exclude irrelevant information from the dictionary in order to focus the user's attention on the best translation for better comprehension. The user can ask for more information about one meaning by clicking on it to get a usage example, one of the types of dictionary information that is not initially displayed. The lookup process after the user selects a word includes tokenization, normalization of each word to a standard form, morphological analysis, part of speech disambiguation, dictionary lookup, identification of MWEs, and elimination of irrelevant parts of the dictionary for display. To improve recognition of MWEs, they are encoded as regular expressions in a two-level rule formalism, and the rules are inserted into the relevant dictionary entries in place of the existing static texts that are the normal forms of the MWEs. LOCOLEX could also make better use of the usage labels and indicators in the dictionary to filter out semantically inappropriate meanings of a word in a given context, such as by interactively asking the user to choose topics and then displaying translations associated with the chosen topics.
A prototype LOCOLEX system provides a user interface in which a user can click on a word about which the user desires information. In response, the system operates as described above to obtain and display a part of a dictionary entry relating to the word's part of speech or to an MWE that includes the word. If the user feels the presented information is inappropriate, the user can click on the word again, and the system responds by displaying the complete dictionary entry.
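The lookup steps attributed to LOCOLEX above (tokenization, normalization, morphological analysis, MWE identification, dictionary lookup) can be illustrated with a minimal Python sketch. It is not the LOCOLEX implementation: the lemma table, the MWE pattern, the dictionary entries, and the function names are all hypothetical, plain regular expressions stand in for the two-level rule formalism, and part of speech disambiguation is omitted.

```python
import re

# Hypothetical toy data; none of it comes from LOCOLEX or a real dictionary.
LEMMAS = {"kicked": "kick", "kicks": "kick", "buckets": "bucket"}
# Each MWE is stored as a regular expression over the lemmatized sentence,
# together with its citation form and an illustrative translation.
MWES = [(re.compile(r"\bkick (the|a) bucket\b"), "kick the bucket", "mourir")]
DICTIONARY = {
    "kick": ["donner un coup de pied", "botter"],
    "bucket": ["seau"],
}


def tokenize(sentence):
    """Split a sentence into word tokens."""
    return re.findall(r"[A-Za-z']+", sentence)


def lemmatize(token):
    """Normalize case, then map an inflected form to its citation form."""
    word = token.lower()
    return LEMMAS.get(word, word)  # toy stand-in for morphological analysis


def look_up(sentence, selected_index):
    """Return the entry to display for the token the user selected."""
    lemmas = [lemmatize(t) for t in tokenize(sentence)]
    text = " ".join(lemmas)
    # Character span of the selected lemma inside the joined text.
    start = sum(len(lemma) + 1 for lemma in lemmas[:selected_index])
    end = start + len(lemmas[selected_index])
    # Prefer an MWE whose match covers the selected word.
    for pattern, citation, translation in MWES:
        for match in pattern.finditer(text):
            if match.start() <= start and end <= match.end():
                return {"entry": citation, "translations": [translation]}
    # Otherwise fall back to the single-word entry.
    lemma = lemmas[selected_index]
    return {"entry": lemma, "translations": DICTIONARY.get(lemma, [])}
```

In this sketch, selecting "kicked" in "He kicked the bucket" yields the MWE entry, while selecting "kicks" in "She kicks the ball" falls back to the single-word entry for "kick".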
U.S. Pat. No. 5,642,522 describes a context-sensitive technique for finding information about a word in an electronic dictionary. The technique maps the selected word from its inflected form to its citation form, analyzes the selected word in the context of neighboring and surrounding words to resolve ambiguities, and displays the information that is determined to be the most likely to be relevant. The user can request additional information, in which case either the next most relevant information or all information about the selected word is provided. The dictionary can include information about multi-word combinations that include the selected word, and content determination can include checking whether the selected word is part of a predefined multi-word combination. The technique could be used with a thesaurus or a dictionary, including a dictionary used for translation.
SUMMARY OF THE INVENTION
The invention addresses problems that arise with conventional techniques for providing information about expressions in one language, where the information is understandable in another language. The problems relate to complexity of sentences and other expressions whose meanings can depend on relationships, not only between individual tokens such as words, but also between groups of tokens. For example, the meaning of a sentence may depend on the relationship between a noun phrase and a verb or predicate phrase. Sentences and other expressions whose meaning depends on relationships between two or more groups of tokens are referred to herein as “multi-token expressions”.
Because of the various meanings an expression may have, a sentence that includes more than a few component expressions will have a very large number of possible translations, making it computationally difficult to consider all possible translations of the sentence. On the other hand, most sentences are neither identical nor nearly identical to a previously translated sentence, making it impractical in most situations to reuse previous translations of sentences. Conventional techniques thus tend to fall into three groups:
The first group includes automatic translation techniques, which attempt to translate sentences despite the computational difficulties of considering all possible translations. The techniques in the first group frequently make serious errors in translation, especially when applied to sentences that include expressions with multiple possible translations. The techniques in the first group typically discard information about ambiguities in order to obtain a translated result, making it impossible to recover from an error.
The second group includes machine aided translation tools and other techniques that find sentences that match or nearly match previously translated sentences, then retrieve the previous translation. Techniques in the second group are especially useful in the cases in which identical or nearly identical sentences occur frequently, but do not provide assistance the first time a new sentence occurs. Therefore, techniques in the second group are only useful in restricted contexts in which the chances of finding repetitions of similar sentences are reasonably high.
The third group of techniques does not attempt to automatically translate complete sentences. An example of this is LOCOLEX, described above, which displays a dictionary entry with one or more definitions of a selected expression, a technique analogous to an ordinary translating dictionary. Techniques in the third group can be quite helpful in aiding comprehension, but they have severe limitations. In LOCOLEX, for example, the user must have some knowledge of the first language in order to make appropriate selections. In some cases, a dictionary entry does not provide sufficient information to allow the user to determine the meaning of a word or multi-word expression (MWE) in a specific context, much less the meaning of a sentence that includes the word or MWE.
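The retrieval step used by the second group of techniques, in which a new sentence is matched against previously translated sentences, can be sketched in Python with the standard `difflib` module. The stored sentence pairs, the `retrieve` function, and the 0.8 similarity cutoff are assumptions chosen for illustration and do not describe any particular translation tool.

```python
import difflib

# Hypothetical memory of previously translated sentences (English to French).
MEMORY = {
    "Press the red button to stop the machine.":
        "Appuyez sur le bouton rouge pour arrêter la machine.",
    "Close the cover before starting.":
        "Fermez le capot avant de démarrer.",
}


def retrieve(sentence, cutoff=0.8):
    """Return the stored translation of the closest sentence, or None."""
    matches = difflib.get_close_matches(sentence, list(MEMORY), n=1, cutoff=cutoff)
    return MEMORY[matches[0]] if matches else None
```

A near repetition such as "Press the green button to stop the machine." retrieves the stored translation of the closest sentence, which the translator would still need to correct, while an unrelated sentence returns `None`. The second case is the one noted above: such techniques give no help the first time a new sentence occurs.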
The invention alleviates problems resulting from complexity of multi-token expressions, referred to herein as “complexity problems”. The invention does so based on the surprising discovery that the meaning of a multi-token expression in a first language is often indicated by appropriately chosen translations of subexpressions into a second language, without attempting a complete, accurate translation of the multi-token expression.
The invention alleviates complexity problems by providing techniques that rank a subexpression's translation choices. The techniques use the ranked translation choices to produce a sequence of translation choices that indicates, in the second language, the meaning of the multi-token expression.
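A minimal Python sketch of the idea follows: each subexpression's translation choices are ranked, and the top-ranked choice for each subexpression is taken to form a sequence. The text above does not say how ranks are computed, so the numeric scores, the example subexpressions, and the function names are hypothetical. The sketch also counts how many complete sequences are possible, which is the product of the number of choices per subexpression and grows quickly with sentence length.

```python
from math import prod

# Hypothetical input: a French sentence split into subexpressions, each with
# unranked (translation, score) choices. The scores are invented.
SUBEXPRESSIONS = [
    ("le chat", [("the tomcat", 0.1), ("the cat", 0.9)]),
    ("a mangé", [("has eaten", 0.3), ("ate", 0.7)]),
    ("la souris", [("the computer mouse", 0.05), ("the mouse", 0.95)]),
]


def rank(choices):
    """Order translation choices from most to least likely."""
    return sorted(choices, key=lambda choice: choice[1], reverse=True)


def best_sequence(subexpressions):
    """Take the top-ranked choice of each subexpression, in sentence order."""
    return [rank(choices)[0][0] for _, choices in subexpressions]


def count_sequences(subexpressions):
    """Number of complete sequences if every combination were considered."""
    return prod(len(choices) for _, choices in subexpressions)
```

With this toy data, three subexpressions with two choices each already allow eight complete sequences, whereas ranking each subexpression independently needs only three small sorts.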
The techniques also make it possible to consider a subset of possible translations of a multi-token expression, the subset that includes the translation choices that are likely for each subexpression. In other words, information can be automatically or interactively presented indicating more than one sequence of translation choices for the multi-token expression, allowing the user to find one which seems most likely to indicate the meaning of t
