Automated translation of annotated text based on the...

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Reexamination Certificate

Details

C704S007000, C704S008000, C707S793000

active

06470306

ABSTRACT:

TECHNICAL FIELD
The invention relates to automated natural language translation in which a source document having annotations is translated automatically into another language while preserving the annotations in the translation. For example, an HTML document in English can be automatically translated into an equivalent Japanese language HTML document to allow a World Wide Web page to be viewed in Japanese while preserving the formatting and hyperlinks present in the original English language version of the page.
BACKGROUND INFORMATION
Various schemes for the machine-based translation of natural language have been proposed. Typically, the system used for translation includes a computer which receives input in one language and performs operations on the received input to supply output in another language. This type of translation has been an inexact one, and the resulting output can require significant editing by a skilled operator. The translation operation performed by known systems generally includes a structural conversion operation. The objective of structural conversion is to transform a given parse tree (i.e., a syntactic structure tree) of the source language sentence to the corresponding tree in the target language. Two types of structural conversion have been tried, grammar-rule-based and template-to-template.
In grammar-rule-based structural conversion, the domain of structural conversion is limited to the domain of grammar rules that have been used to obtain the source-language parse tree (i.e., to a set of subnodes that are immediate daughters of a given node). For example, given
 VP=VT01+NP (a VerbPhrase consists of a SingleObject Transitive Verb and a NounPhrase, in that order)
and
 Japanese: 1+2=>2+1 (Reverse the order of VT01 and NP),
each source-language parse tree that involves application of the rule is structurally converted in such a way that the order of the verb and the object is reversed because the verb appears to the right of its object in Japanese. This method is very efficient in that it is easy to determine where the specified conversion applies; it applies exactly at the location where the rule has been used to obtain the source-language parse tree. On the other hand, it can be a weak conversion mechanism in that its domain, as specified above, may be extremely limited, and in that natural language may require conversion rules that straddle over nodes that are not siblings.
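As an illustration only, the grammar-rule-based conversion described above can be sketched in Python. The Node class, the JAPANESE_ORDER table, and the convert function are hypothetical names chosen for this sketch and are not taken from the patent; the sketch assumes that each node can report the grammar rule that produced it and that a conversion is simply a new ordering of that node's immediate daughters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    label: str                                   # e.g. "VP", "VT01", "NP", or a word
    children: List["Node"] = field(default_factory=list)

    def rule(self) -> str:
        """Grammar rule that produced this node, e.g. 'VP=VT01+NP'."""
        return f"{self.label}=" + "+".join(c.label for c in self.children)

    def leaves(self) -> List[str]:
        """Words of the subtree, read left to right."""
        if not self.children:
            return [self.label]
        return [w for c in self.children for w in c.leaves()]


# Hypothetical conversion table: rule -> new order of the daughters (0-based).
# "1+2=>2+1" in the text corresponds to the permutation (1, 0).
JAPANESE_ORDER: Dict[str, Tuple[int, ...]] = {"VP=VT01+NP": (1, 0)}


def convert(node: Node, table: Dict[str, Tuple[int, ...]]) -> Node:
    """Return a converted copy; only immediate daughters of a node are reordered."""
    children = [convert(c, table) for c in node.children]
    order = table.get(node.rule())               # lookup is by the rule used at this node
    if order is not None:
        children = [children[i] for i in order]
    return Node(node.label, children)


tree = Node("VP", [Node("VT01", [Node("eats")]), Node("NP", [Node("sushi")])])
converted = convert(tree, JAPANESE_ORDER)
print(converted.leaves())
```

The lookup by rule name is what makes this approach cheap: the conversion site is known as soon as the rule is known. The same lookup is also its limit, since a table entry can only permute siblings under one node.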
In template-to-template structural conversion, structural conversion is specified in terms of input/output (I/O) templates or subtrees. If a given input template matches a given structure tree, that portion of the structure tree that is matched by the template is changed as specified by the corresponding output template. This is a very powerful conversion mechanism, but it can be costly in that it can take a long period of time to find out if a given input template matches any portion of a given structure tree.
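A minimal Python sketch of template-to-template conversion follows, under stated assumptions: trees are nested tuples of the form (label, child, ...), template variables are strings beginning with "?", and the names match, substitute, and rewrite are hypothetical. It is meant only to show why this mechanism is more expressive (a template can span several levels of the tree) and why it is more expensive (matching must be attempted at every node).

```python
from typing import Dict, Optional, Union

Tree = Union[str, tuple]  # a leaf word, or (label, child, child, ...)


def match(template: Tree, tree: Tree, env: Dict[str, Tree]) -> Optional[Dict[str, Tree]]:
    """Return variable bindings if the template matches the tree, else None."""
    if isinstance(template, str):
        if template.startswith("?"):             # template variable
            if template in env:
                return env if env[template] == tree else None
            return {**env, template: tree}
        return env if template == tree else None
    if isinstance(tree, str) or len(template) != len(tree):
        return None
    for t_part, tree_part in zip(template, tree):
        env = match(t_part, tree_part, env)
        if env is None:
            return None
    return env


def substitute(template: Tree, env: Dict[str, Tree]) -> Tree:
    """Build the output subtree by filling variables from the bindings."""
    if isinstance(template, str):
        return env.get(template, template)
    return tuple(substitute(p, env) for p in template)


def rewrite(tree: Tree, inp: Tree, out: Tree) -> Tree:
    """Try the input template at every node, top-down; this is the costly part."""
    env = match(inp, tree, {})
    if env is not None:
        tree = substitute(out, env)
    if isinstance(tree, str):
        return tree
    return (tree[0],) + tuple(rewrite(c, inp, out) for c in tree[1:])


# This template pair spans two levels (S and VP), which a single grammar rule cannot do.
INPUT_TEMPLATE = ("S", "?subj", ("VP", ("VT01", "?v"), "?obj"))
OUTPUT_TEMPLATE = ("S", "?subj", ("VP", "?obj", ("VT01", "?v")))

sentence = ("S", ("NP", "John"), ("VP", ("VT01", "eats"), ("NP", "sushi")))
result = rewrite(sentence, INPUT_TEMPLATE, OUTPUT_TEMPLATE)
print(result)
```

In this sketch every node of the tree is a candidate match site for every template, so the work grows with the number of nodes times the number of templates, which reflects the cost concern raised above.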
Conventional systems translate annotations in text, such as part-of-speech settings, i.e., <VERB>, <NOUN>, Hypertext Markup Language (HTML) and Standard Generalized Markup Language (SGML). Such systems, however, often do a poor job of preserving the original intent, meaning, and look of the source document's annotations in the translated version of the text. In one such system, HTML and SGML markup is placed in the translated version of the text adjacent to the translated word that corresponds to the word in the original text to which the markup was adjacent. This manner of insertion often results in inaccuracies in the translated version of the text, either because markup does not properly apply to the translated words to which it is adjacent, or because markup was carried through to the translated version when it should not have been.
It is therefore an object of the present invention to provide a system and method for translating a source document in a first language to a target document in a second language while preserving the annotations that exist in the source document, and inserting the annotations in appropriate locations in the target document.
SUMMARY OF THE INVENTION
The automated natural language translation system according to the invention has many advantages over known machine-based translators. After the system of the invention automatically selects the best possible translation of the input textual information and provides the user with an output (preferably a Japanese language or Spanish language translation of English-language input text), the user can then interface with the system to edit the displayed translation or to obtain alternative translations in an automated fashion. An operator of the automated natural language translation system of the invention can be more productive because the system allows the operator to retain just the portion of the translation that he or she deems acceptable while causing the remaining portion to be retranslated automatically. Since this selective retranslation operation is precisely directed at portions that require retranslation, operators are saved the time and tedium of considering potentially large numbers of incorrect, but highly ranked translations. Furthermore, because the system allows for arbitrary granularity in translation adjustments, more of the final structure of the translation will usually have been generated by the system. The system thus reduces the potential for human (operator) error and saves time in edits that may involve structural, accord, and tense changes. The system efficiently gives operators the full benefit of its extensive and reliable knowledge of grammar and spelling.
The automated natural language translation system's versatile handling of ambiguous sentence boundaries in the source language, and its powerful semantic propagation, provide further accuracy and reduced operator editing of translations. Stored statistical information also improves the accuracy of translations by tailoring the preferred translation to the specific user site. The system's idiom handling method is advantageous in that it allows sentences that happen to include the sequence of words making up the idiom, without intending the meaning of the idiom, to be correctly translated. The system is efficient but still has versatile functions such as long distance feature matching. The system's structural balance expert and coordinate structure expert effectively distinguish between intended parses and unintended parses. A capitalization expert effectively obtains correct interpretations of capitalized words in sentences, and a capitalized sequence procedure effectively deals with multiple-word proper names, without completely ignoring common noun interpretations.
The present invention is directed to an improvement of the automated natural language translation system, wherein the improvement relates to translating input textual information having annotations and being in a source or first natural language, such as English, into output textual information with the annotations preserved and being in a target or second natural language, such as Japanese or Spanish. The annotations in the source document can represent part-of-speech settings, Hypertext Markup Language (“HTML”) markup, Standard Generalized Markup Language (“SGML”) markup, Rich Text Format (“RTF”) markup and Nontypesetting Runoff (“NROFF”) markup. In the present invention, annotations can be removed prior to translation, stored in an annotations database and inserted by the system at appropriate locations in the translated version of the source text. The system of the present invention employs a novel process involving creating a token string which includes word tokens representing the text, annotation tokens representing the annotations and ending tokens representing sentence breaks and sentence endings in the source document. As the word tokens are transformed and the annotation tokens are processed or otherwise removed during translation, the ending tokens are the only tokens that
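The token string described in this paragraph can be illustrated with a short Python sketch. It is a simplified assumption, not the patented process: the Token class, the naive regular expression, and the dictionary standing in for the annotations database are all hypothetical, and the sketch covers only HTML-style tags. It shows the three token kinds and the removal of annotations before translation; how annotations are reinserted is not shown, because the excerpt is cut off before that step is described.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    kind: str   # "word", "annotation", or "ending"
    text: str


# Naive pattern (assumption): a tag, a sentence-ending mark, or a run of other characters.
TOKEN_RE = re.compile(r"<[^>]+>|[.!?]|[^\s<>.!?]+")


def tokenize(source: str) -> List[Token]:
    """Split annotated text into word, annotation, and ending tokens."""
    tokens = []
    for piece in TOKEN_RE.findall(source):
        if piece.startswith("<"):
            kind = "annotation"
        elif piece in {".", "!", "?"}:
            kind = "ending"
        else:
            kind = "word"
        tokens.append(Token(kind, piece))
    return tokens


def strip_annotations(tokens: List[Token]) -> Tuple[List[Token], Dict[int, Tuple[str, int]]]:
    """Remove annotation tokens, recording each one and where it sat in the plain stream."""
    plain: List[Token] = []
    database: Dict[int, Tuple[str, int]] = {}    # stand-in for the annotations database
    for tok in tokens:
        if tok.kind == "annotation":
            database[len(database)] = (tok.text, len(plain))
        else:
            plain.append(tok)
    return plain, database


tokens = tokenize("Click <b>here</b> now. Bye.")
plain, database = strip_annotations(tokens)
print([t.kind for t in tokens])
print(database)
```

The recorded positions refer to the source-language token order, so they cannot be reused directly once translation reorders the words.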
