Method and system for machine translation using epistemic...

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Reexamination Certificate

C704S009000

active

06233546

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to machine translation and, more particularly, to a universal translation system applicable to transforming signals that embody knowledge according to a knowledge representation, such as a natural language.
BACKGROUND OF THE INVENTION
There is a long-felt need for a reliable, high-quality language translation system. The increasing internationalization and globalization of the world's economies continues to bring together, for business, many people who speak different languages. A significant cost and obstacle, however, remains the need to translate documents and spoken words from one language to another. In particular, it is difficult to find competent, affordable translators who are fluent in the desired languages and also understand the subject matter. Researchers have long investigated whether and how the translation of natural and artificial languages can be automated.
Perhaps the single most difficult impediment to a high-quality automated language translation system is the sheer complexity of the world's human languages. Human languages are notoriously complex, especially in their vocabularies and grammars. Conventional attempts at machine translation, however, have not managed this complexity well.
According to one approach, such as that described in U.S. Pat. No. 4,706,212, software routines are hard-coded to translate sentences in a source language into sentences in a target language. In particular, the complexity of the source and target grammars is handled by various ad hoc, hard-coded logic. For example, U.S. Pat. No. 4,706,212 discloses logic for recognizing some grammatical constructions in English as a source language and outputting a corresponding Russian construction. The logic devised for recognizing and translating these source constructions, however, is tightly coupled to a particular source language. As a practical matter, most of the subroutines coded to handle English source constructions are utterly inapplicable to another language such as Chinese. Therefore, extending such conventional translation systems to handle a new source or target language requires virtually re-implementing the entire system. Furthermore, since the hard-coded logic is often quite complicated, it is difficult and expensive to debug and maintain, and especially difficult to refine for better translation quality.
Since special-purpose subroutines for grammatical rules are difficult to debug, maintain, and extend, other conventional approaches attempt to circumvent these difficulties by using complicated internal data structures to represent the text under translation. For example, U.S. Pat. No. 5,528,491 describes a system in which a graph of possible interpretations is produced according to the grammar rules of a specific source language, such as English. In general, these data structures are quite complex, with a variety of node types for different grammatical constructions, especially if such a system attempts to implement the principles of Noam Chomsky's transformational grammar. Since each language employs different grammatical constructions, the data structure for one language is often unusable for another.
Another example of a complicated internal data structure is an interlingua, an artificial language devised to represent a superset of the source and target languages. Such an approach is described, for example, in U.S. Pat. No. 5,426,583. To be useful, the interlingua must include all the features of the source and target languages. Thus, if support for a new language is to be added to an interlingual system, the interlingua typically needs to be upgraded, which in turn requires modifying the routines that translate to and from the interlingua. Other conventional approaches, such as U.S. Pat. No. 5,477,451, employ complex statistical or mathematical models to translate human text.
In general, conventional approaches at best manage the complexity of language in an ad hoc rather than a systematic manner. As a result, it is difficult to extend such systems to support a new language. Such techniques are even harder to apply in mixed-language situations, for example, computer programming languages embedded in a natural-language context. Another drawback is that such systems are difficult to debug and therefore difficult to tune for high-quality translations.
SUMMARY OF THE INVENTION
There has long been a need for reliable, high-quality, automated language translation. There is an evident need for a language translation system that is readily extensible to new human languages. There is also a need for a language translation system and methodology capable of handling mixed-language texts, especially texts that also include an artificial language, such as a programming language.
These and other needs are addressed by the present invention, in which a source signal embodying knowledge is decomposed into a simple and regular internal representation. This internal representation is then transformed into another internal representation, from which a target signal is constructed. Despite the simplicity of the internal representation, the complexity of language is appropriately localized within the rules for decomposing the source signal into the internal representation and for transforming that representation into another internal representation. In one embodiment, source signal decomposition and target signal construction are facilitated by lookups into a universal dictionary. Advantageously, extending and improving such a language translation system involves updating the universal dictionary, the decomposition rules, and the mapping rules, thereby avoiding modification of hard-coded logic and internal data structures.
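A minimal sketch of the universal-dictionary idea follows, assuming each entry pairs a source word form and source grammatical form with a target word form and target grammatical form (the four fields one embodiment of the invention names). The classes `DictionaryEntry` and `UniversalDictionary` and the English-to-Spanish sample entries are hypothetical illustrations, not the patented implementation. The sketch shows two things: new vocabulary is added as data rather than code, and the grammatical form disambiguates identical word forms.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One related dictionary entry (hypothetical field names)."""
    source_word: str   # source word form
    source_form: str   # source grammatical form
    target_word: str   # corresponding target word form
    target_form: str   # target grammatical form


class UniversalDictionary:
    """Hypothetical dictionary keyed on (source word form, source grammatical form)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], DictionaryEntry] = {}

    def add(self, entry: DictionaryEntry) -> None:
        # Supporting new vocabulary means adding data here, not editing code.
        self._entries[(entry.source_word, entry.source_form)] = entry

    def lookup(self, word: str, form: str) -> DictionaryEntry | None:
        # The grammatical form disambiguates identical word forms;
        # returns None when no entry matches.
        return self._entries.get((word, form))


demo = UniversalDictionary()
demo.add(DictionaryEntry("light", "noun", "luz", "noun"))
demo.add(DictionaryEntry("light", "adjective", "ligero", "adjective"))

print(demo.lookup("light", "noun").target_word)       # luz
print(demo.lookup("light", "adjective").target_word)  # ligero
```

Keying on the pair rather than on the bare word form lets "light" the noun and "light" the adjective map to different target words without any special-case logic.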
Accordingly, one aspect of the invention relates to a method and a computer-readable medium bearing instructions for translating a source signal embodying information according to a source language into a target signal embodying information according to a target language. The methodology involves analyzing the source signal to produce a first internal representation of epistemic instances corresponding to the information embodied in the source signal. The epistemic instances are fundamental semantic structures that express a transformation of two objects or objective grammatical forms. The first internal representation is transformed into a second internal representation of epistemic instances according to the target language, and the target signal is constructed from the second internal representation.
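The analyze, transform, and construct steps can be pictured as a toy pipeline. This sketch assumes, purely for illustration, that an epistemic instance is a triple in which a transformer relates a left object to a right object, that analysis handles only a bare three-word "subject verb object" clause, and that word-level mapping rules suffice. `EpistemicInstance`, `analyze`, `transform`, `construct`, and the English-to-Latin lexicon are all hypothetical names, not the method claimed in the patent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EpistemicInstance:
    """Toy model: a transformer relating a left object to a right object.

    The triple layout is an assumption made for illustration only.
    """
    left: str
    transformer: str
    right: str


def analyze(sentence: str) -> list[EpistemicInstance]:
    """Hypothetical analysis step for a bare 'subject verb object' clause."""
    subject, verb, obj = sentence.lower().split()
    return [EpistemicInstance(left=subject, transformer=verb, right=obj)]


# Hypothetical mapping rules: English word forms to Latin word forms.
EN_TO_LA = {"cats": "feles", "eat": "edunt", "fish": "pisces"}


def transform(instances: list[EpistemicInstance],
              lexicon: dict[str, str]) -> list[EpistemicInstance]:
    """Map each instance slot by slot into the target language."""
    return [
        EpistemicInstance(lexicon[i.left], lexicon[i.transformer], lexicon[i.right])
        for i in instances
    ]


def construct(instances: list[EpistemicInstance],
              slot_order: tuple[str, ...]) -> str:
    """Linearize instances using the target language's (assumed) slot order."""
    words = [getattr(i, slot) for i in instances for slot in slot_order]
    return " ".join(words)


source_ir = analyze("Cats eat fish")
target_ir = transform(source_ir, EN_TO_LA)
# Verb-final ordering chosen for the Latin target in this toy.
print(construct(target_ir, ("left", "right", "transformer")))  # feles pisces edunt
```

Word order lives in the `slot_order` argument passed to `construct`, so a verb-final target differs from a verb-medial one only in its rule data, echoing the claim that language-specific complexity is localized in rules rather than in the internal representation.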
In various embodiments, the source language and the target language may each be a natural language, a computer language, a set of formatting conventions, or mathematical expressions. The source and target signals can be realized as digital signals representing text, acoustic signals representing speech, optical signals representing characters, or any other analog or digital signal. The described methodology is also applicable to transforming a source signal embodying information according to a knowledge representation relating to a knowledge discipline, for example, physics or engineering.
According to another aspect of the invention, a method and a computer-readable medium bearing instructions for translating a source signal embodying information according to a source language into a target signal embodying information according to a target language involve storing related dictionary entries in a computer-readable medium. Each related dictionary entry includes a source word form, a source grammatical form, a corresponding target word form, and a target grammatical form for the target word form. In some embodiments, the grammatical form relates to a sub-grammatical form that specifies the morphology of a word form, including grammatical inflection and auxiliaries. The source signal is