Reexamination Certificate
2005-04-26
Šmits, Tālivaldis Ivars (Department: 2654)
Data processing: speech signal processing, linguistics, language
Linguistics
Translation machine
C704S277000, C704S001000
active
06885985
ABSTRACT:
The invention relates to a method and apparatus for generating translations of natural language terms from a first language to a second language. A plurality of terms are extracted from unaligned comparable corpora of the first and second languages. Comparable corpora are sets of documents in different languages that come from the same domain and have similar genre and content. Unaligned documents are not translations of one another and are not linked in any other way. By accessing monolingual thesauri of the first and second languages, a category is assigned to each extracted term. Then, category-to-category translation probabilities are estimated, and using said category-to-category translation probabilities, term-to-term translation probabilities are estimated. The invention preferably exploits class-based normalization of probability estimates, bi-directionality, and relative frequency normalization. The most important applications are cross-language text retrieval, semi-automatic bilingual thesaurus enhancement, and machine-aided human translation.
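The abstract does not give the estimation formulas, so the following is only a minimal sketch of how term-to-term probabilities could be derived from category-to-category probabilities, not the patented method itself. It assumes that each term belongs to exactly one thesaurus category, that the category-to-category probabilities are already available (here the hypothetical table `cat_probs`), and that a target term's share within its category is its relative corpus frequency. All names, categories, and numbers below are illustrative assumptions.

```python
from collections import defaultdict


def term_translation_probs(src_term, src_cat, tgt_cat, tgt_freq, cat_probs):
    """Toy class-based estimate of P(target term | source term).

    Assumed model (not taken from the patent text):
        P(t2 | t1) = P(c2 | c1) * freq(t2) / total freq of terms in c2
    where c1 and c2 are the single categories of t1 and t2.
    """
    c1 = src_cat[src_term]

    # Total corpus frequency of the target terms in each target category.
    cat_total = defaultdict(int)
    for term, cat in tgt_cat.items():
        cat_total[cat] += tgt_freq[term]

    probs = {}
    for term, c2 in tgt_cat.items():
        p_cat = cat_probs.get((c1, c2), 0.0)
        if p_cat == 0.0 or cat_total[c2] == 0:
            continue  # no probability mass reaches this category
        # Relative frequency normalization within the target category.
        probs[term] = p_cat * tgt_freq[term] / cat_total[c2]
    return probs


# Hypothetical data: categories would come from monolingual thesauri,
# frequencies from the target-language corpus.
src_cat = {"membrane": "BIO"}
tgt_cat = {"Membran": "BIO_DE", "Zelle": "BIO_DE", "Vertrag": "LAW_DE"}
tgt_freq = {"Membran": 30, "Zelle": 10, "Vertrag": 20}
cat_probs = {("BIO", "BIO_DE"): 0.9, ("BIO", "LAW_DE"): 0.1}

probs = term_translation_probs("membrane", src_cat, tgt_cat, tgt_freq, cat_probs)
for term, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{term}: {p:.3f}")
```

Because each category's probability mass is divided among its own terms, the candidate scores for one source term sum to the total category probability assigned to it. A bidirectional variant, as the abstract mentions, would presumably combine this score with the same computation run from the second language back to the first.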
REFERENCES:
patent: 5323310 (1994-06-01), Robinson
patent: 5418717 (1995-05-01), Su et al.
patent: 5477451 (1995-12-01), Brown et al.
patent: 5523946 (1996-06-01), Kaplan et al.
patent: 5680511 (1997-10-01), Baker et al.
patent: 5907821 (1999-05-01), Kaji et al.
patent: 5956711 (1999-09-01), Sullivan et al.
patent: 6041293 (2000-03-01), Shibata et al.
patent: 6047299 (2000-04-01), Kaijima
patent: 6061675 (2000-05-01), Wical
patent: 6064951 (2000-05-01), Park et al.
patent: 6092080 (2000-07-01), Gustman
patent: 6236958 (2001-05-01), Lange et al.
patent: 6330530 (2001-12-01), Horiguchi et al.
patent: 6349276 (2002-02-01), McCarley
Salton G., “Automatic translation of Foreign Language Documents”, Computational Linguistics, pp. 1-28, Sep. 1969.*
Brown, Peter F. et al., “A statistical approach to Machine Translation”, IBM, pp. 79-85, Jun. 1990.*
Sintichakis, Marios et al., “A Method for Monolingual Thesauri Merging”, pp. 129-138, SIGIR, Jul. 1997.*
Ker, Sue J. et al., “A class-based approach to alignment”, pp. 314-343, Computational Linguistics, vol. 23 Issue 2, Jun. 1997.*
Kaji, Hiroyuki et al., “Extracting word correspondences from bilingual corpora based on word co-occurrences information”, pp. 23-28 Proceedings of the 16th conference on Computational linguistics—vol. 1, Aug. 1996.*
Tanaka, Kumiko et al., “Extraction of lexical translations from non-aligned corpora”, pp. 580-586 Proceedings of the 16th COLING, Aug. 1996.*
M. Franz, J.S. McCarley, and S. Roukos. Ad hoc and Multilingual Information Retrieval at IBM. Proc. of the Seventh Text Retrieval Conference (TREC7), pp. 157-168, 1999.
Pascal Fung and Lo Yuen Yee. An IR Approach For Translating New Words From Nonparallel, Comparable Texts. In Proc. of Coling/ACL, pp. 414-420, 1998.
Gregory Grefenstette. Explorations in Automatic Thesaurus Discovery. Chapter 5 Applications: pp. 101-135. Kluwer Academic Press, 1994.
David A. Hull. Automating The Construction Of Bilingual Terminology Lexicons. Terminology, 4(2):225-244, 1997.
Dekang Lin. Automatic Retrieval And Clustering Of Similar Words. In Proc. of Coling/ACL, pp. 768-774, 1998.
P. McCullagh and J.A. Nelder. Generalized Linear Models, chapter 4, pp. 98-148. Chapman and Hall, 1989.
Carol Peters and Eugenio Picchi. Capturing The Comparable: A System For Querying Comparable Text Corpora. In Proc. of Analisi Statistica dei Dati Testuali (JADT), pp. 247-254, 1995.
Reinhard Rapp. Identifying Word Translation In Nonparallel Texts. In Proc. of the 35th ACL, student session, pp. 321-322, 1995.
B. Rehder, M.L. Littman, S. Dumais, and T.K. Landauer. Automatic 3-Language Cross Language Information Retrieval With Latent Semantic Indexing. In Proc. of the Sixth Text Retrieval Conference (TREC6), pp. 233-239, 1998.
Hinrich Schutze. Dimensions Of Meaning. In Proc. of Supercomputing, pp. 787-796, 1992.
Paraic Sheridan and Jean Paul Ballerini. Experiments In Multilingual Information Retrieval Using The SPIDER System. In Proc. of the 19th ACM/SIGIR Conference, pp. 58-65, 1996.
Xerox Corporation
Šmits Tālivaldis Ivars
Terminology translation for unaligned comparable corpora...