Terminology translation for unaligned comparable corpora...

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Reexamination Certificate

Details

C704S277000, C704S001000

Reexamination Certificate

active

06885985

ABSTRACT:
The invention relates to a method and apparatus for generating translations of natural language terms from a first language to a second language. A plurality of terms are extracted from unaligned comparable corpora of the first and second languages. Comparable corpora are sets of documents in different languages that come from the same domain and have similar genre and content. Unaligned documents are not translations of one another and are not linked in any other way. By accessing monolingual thesauri of the first and second languages, a category is assigned to each extracted term. Then, category-to-category translation probabilities are estimated, and using said category-to-category translation probabilities, term-to-term translation probabilities are estimated. The invention preferably exploits class-based normalization of probability estimates, bi-directionality, and relative frequency normalization. The most important applications are cross-language text retrieval, semi-automatic bilingual thesaurus enhancement, and machine-aided human translation.
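
The abstract describes a two-stage estimate: category-to-category probabilities first, then term-to-term probabilities derived from them. The following is only a minimal sketch of the second stage, not the patented method. It assumes a simple model that the abstract does not state, P(t | s) = P(cat(t) | cat(s)) * count(t) / (total count of cat(t)), and it takes the category probabilities as given. The function name, the category labels, the counts, and the probabilities are all hypothetical toy values chosen for illustration.

```python
from collections import Counter


def term_translation_probs(source_term, src_category, tgt_category,
                           tgt_counts, cat_probs):
    """Score target-language terms as translations of source_term.

    Assumed model (an illustration, not taken from the patent text):
        P(t | s) = P(cat(t) | cat(s)) * count(t) / total count of cat(t)
    """
    s_cat = src_category[source_term]

    # Total corpus frequency per target category, used to turn raw
    # term counts into relative frequencies within a category.
    cat_totals = Counter()
    for term, count in tgt_counts.items():
        cat_totals[tgt_category[term]] += count

    scores = {}
    for term, count in tgt_counts.items():
        t_cat = tgt_category[term]
        # Category pairs with no estimate get probability 0.
        p_cat = cat_probs.get((s_cat, t_cat), 0.0)
        scores[term] = p_cat * count / cat_totals[t_cat]
    return scores


# Hypothetical toy data: thesaurus categories, target-corpus term counts,
# and already-estimated category-to-category probabilities.
src_category = {"bank": "FINANCE"}
tgt_category = {"Bank": "FINANZ", "Kredit": "FINANZ", "Ufer": "GEOGRAFIE"}
tgt_counts = {"Bank": 30, "Kredit": 10, "Ufer": 5}
cat_probs = {("FINANCE", "FINANZ"): 0.8, ("FINANCE", "GEOGRAFIE"): 0.2}

scores = term_translation_probs("bank", src_category, tgt_category,
                                tgt_counts, cat_probs)
best = max(scores, key=scores.get)
```

Because each category's probability mass is spread over its terms by relative frequency, the scores for one source term sum to the total category probability assigned to it. A bidirectional variant, which the abstract mentions as preferred, could combine this score with the same computation run from the second language back to the first.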

REFERENCES:
patent: 5323310 (1994-06-01), Robinson
patent: 5418717 (1995-05-01), Su et al.
patent: 5477451 (1995-12-01), Brown et al.
patent: 5523946 (1996-06-01), Kaplan et al.
patent: 5680511 (1997-10-01), Baker et al.
patent: 5907821 (1999-05-01), Kaji et al.
patent: 5956711 (1999-09-01), Sullivan et al.
patent: 6041293 (2000-03-01), Shibata et al.
patent: 6047299 (2000-04-01), Kaijima
patent: 6061675 (2000-05-01), Wical
patent: 6064951 (2000-05-01), Park et al.
patent: 6092080 (2000-07-01), Gustman
patent: 6236958 (2001-05-01), Lange et al.
patent: 6330530 (2001-12-01), Horiguchi et al.
patent: 6349276 (2002-02-01), McCarley
Salton G., “Automatic translation of Foreign Language Documents”, Computational Linguistics, pp. 1-28, Sep. 1969.*
Brown, Peter F. et al., “A statistical approach to Machine Translation”, IBM, pp. 79-85, Jun. 1990.*
Sintichakis, Marios et al., “A Method for Monolingual Thesauri Merging”, pp. 129-138, SIGIR, Jul. 1997.*
Ker, Sue J. et al., “A class-based approach to alignment”, pp. 314-343, Computational Linguistics, vol. 23, Issue 2, Jun. 1997.*
Kaji, Hiroyuki et al., “Extracting word correspondences from bilingual corpora based on word co-occurrences information”, pp. 23-28 Proceedings of the 16th conference on Computational linguistics—vol. 1, Aug. 1996.*
Tanaka, Kumiko et al., “Extraction of lexical translations from non-aligned corpora”, pp. 580-586, Proceedings of the 16th COLING, Aug. 1996.*
M. Franz, J.S. McCarley, and S. Roukos. Ad hoc and Multilingual Information Retrieval at IBM. Proc. of the Seventh Text Retrieval Conference (TREC7), pp. 157-168, 1999.
Pascal Fung and Lo Yuen Yee. An IR Approach For Translating New Words From Nonparallel, Comparable Texts. In Proc. of Coling/ACL, pp. 414-420, 1998.
Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery. Chapter 5 Applications: pp. 101-135. Kluwer Academic Press, 1994.
David A. Hull. Automating The Construction Of Bilingual Terminology Lexicons. Terminology, 4(2):225-244, 1997.
Dekang Lin. Automatic Retrieval And Clustering Of Similar Words. In Proc. of Coling/ACL, pp. 768-774, 1998.
P. McCullagh and J.A. Nelder. Generalized Linear Models, chapter 4, pp. 98-148. Chapman and Hall, 1989.
Carol Peters and Eugenio Picchi. Capturing The Comparable: A System For Querying Comparable Text Corpora. In Proc. of Analisi Statistica dei Dati Testuali (JADT), pp. 247-254, 1995.
Reinhard Rapp. Identifying Word Translation In Nonparallel Texts. In Proc. of the 35th ACL, student session, pp. 321-322, 1995.
B. Rehder, M.L. Littman, S. Dumais, and T.K. Landauer. Automatic 3-Language Cross Language Information Retrieval With Latent Semantic Indexing. In Proc. of the Sixth Text Retrieval Conference (TREC6), pp. 233-239, 1998.
Hinrich Schutze. Dimensions Of Meaning. In Proc. of Supercomputing, pp. 787-796, 1992.
Paraic Sheridan and Jean Paul Ballerini. Experiments In Multilingual Information Retrieval Using The SPIDER System. In Proc. of the 19th ACM/SIGIR Conference, pp. 58-65, 1996.

