Method and apparatus for aligning bilingual corpora

Data processing: speech signal processing – linguistics – language – Linguistics – Translation machine

Reexamination Certificate

Details

C704S009000, C704S254000

active

07349839

ABSTRACT:
A method is provided for aligning sentences in a first corpus to sentences in a second corpus. The method includes applying a length-based alignment model to align sentence boundaries of a sentence in the first corpus with sentence boundaries of a sentence in the second corpus to form an aligned sentence pair. The aligned sentence pair is then used to train a translation model. Once trained, the translation model is used to align sentences in the first corpus to sentences in the second corpus. Under aspects of the invention, pruning is used to reduce the number of sentence boundary alignments considered by the length-based alignment model and by the translation model. In further aspects of the invention, the length-based model utilizes a Poisson distribution.
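The abstract states only that the length-based model utilizes a Poisson distribution; it does not give the parameterization. As a minimal sketch, assume the length of the second-corpus sentence is Poisson-distributed with a mean equal to the first-corpus sentence length times a corpus-wide length ratio. The function names `poisson_log_prob` and `best_match` and the `ratio` parameter are hypothetical and chosen for illustration only; they are not taken from the patent.

```python
import math


def poisson_log_prob(target_len: int, source_len: int, ratio: float) -> float:
    """Log-probability of target_len under a Poisson with mean ratio * source_len.

    Assumption (not stated in the abstract): the Poisson mean is the source
    sentence length scaled by an average target/source length ratio.
    """
    mean = ratio * source_len
    if mean <= 0:
        # Degenerate case: a zero mean puts all probability mass on length 0.
        return 0.0 if target_len == 0 else float("-inf")
    # log( e^{-mean} * mean^k / k! ), with lgamma(k + 1) = log(k!)
    return target_len * math.log(mean) - mean - math.lgamma(target_len + 1)


def best_match(source_len: int, candidate_lens: list[int], ratio: float) -> int:
    """Index of the candidate length that scores highest for source_len."""
    return max(
        range(len(candidate_lens)),
        key=lambda i: poisson_log_prob(candidate_lens[i], source_len, ratio),
    )


if __name__ == "__main__":
    # A 10-word sentence compared against candidates of 3, 11, and 25 words.
    print(best_match(10, [3, 11, 25], ratio=1.0))
```

Working in log space keeps the score numerically stable for long sentences, and a score of this kind could also serve as the quantity that a pruning threshold is applied to, although the abstract does not say how pruning is carried out.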

REFERENCES:
patent: 5109509 (1992-04-01), Katayama et al.
patent: 5768603 (1998-06-01), Brown et al.
patent: 6092034 (2000-07-01), McCarley et al.
patent: 6182026 (2001-01-01), Tillmann et al.
patent: 6304841 (2001-10-01), Berger et al.
patent: 6665642 (2003-12-01), Kanevsky et al.
patent: 2002/0065658 (2002-05-01), Kanevsky et al.
L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, No. 2, pp. 257-286 (Feb. 1989).
W. Gale et al., “A Program for Aligning Sentences in Bilingual Corpora,” Computational Linguistics, vol. 19, No. 1, pp. 75-102 (1993).
I. Dagan et al., “Termight: Coordinating Humans and Machines in Bilingual Terminology Acquisition,” Machine Translation, vol. 12, pp. 89-107 (1997).
S. Chen, “Aligning Sentences in Bilingual Corpora Using Lexical Information,” Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pp. 9-16 (1993).
P. Brown et al., “Aligning Sentences in Parallel Corpora,” Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, pp. 169-176 (1991).
J. Goodman, “Global Thresholding and Multi-Pass Parsing,” Harvard University, 15 pages, undated.
P. Brown et al., “The Mathematics of Statistical Machine Translation: Parameter Estimation,” Computational Linguistics, vol. 19, No. 2, pp. 263-311 (1993).
M. Simard et al., “Bilingual Sentence Alignment: Balancing Robustness and Accuracy,” Machine Translations, vol. 13, pp. 59-80 (1998).
R. Moore, “Fast and Accurate Sentence Alignment of Bilingual Corpora,” 5th Conference on the Association for Machine Translation in the Americas, Oct. 8, 2002, Tiburon, CA, pp. 135-144.
D. Wu, “Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria,” 32nd Annual Meeting of the Association for Computational Linguistics, Jun. 1994, Las Cruces, NM, pp. 80-87.
L. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, IEEE, New York, US, vol. 77, No. 2, Feb. 1, 1989, pp. 257-285.
European Search Report for Application No. 03015479.3, filed Jul. 9, 2003.
