System and method for disambiguating non diacritized Arabic...

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

C704S010000

Reexamination Certificate

active

08041559

ABSTRACT:
The present invention proposes a solution to the problem of lexical disambiguation of words in Arabic texts. The solution relies on domain-specific knowledge of the text, which facilitates automatic vowel restoration in Modern Standard Arabic script. Texts that are similar in content, restricted to a specific field, or share common knowledge can be grouped into a specific category or domain (for example, sports, art, economics, or science). The invention discloses a method, system and computer program for lexically disambiguating non-diacritized Arabic words in a text, based on a learning approach that exploits Arabic lexical look-up and Arabic morphological analysis to train the system on a corpus of diacritized Arabic text pertaining to a specific domain. The contextual relationships of the words related to that domain are thereby identified, on the assumption that words and their morphological variants show less lexical variability within a domain than in unrestricted text.
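The core idea of the abstract, training on diacritized text from one domain and then choosing the vocalization most typical of that domain, can be illustrated with a minimal Python sketch. This is not the patented method: it replaces the lexical look-up and morphological analysis with a plain per-domain frequency table, and the names `strip_diacritics`, `train` and `restore`, the domain labels, and the tiny corpora are all hypothetical.

```python
import re
from collections import Counter

# Arabic combining marks U+064B..U+0652 (tanwin, short vowels, shadda, sukun)
# plus U+0670 (superscript alef). Assumption: this set is enough for the sketch.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

def strip_diacritics(word):
    """Return the word with its diacritic marks removed."""
    return DIACRITICS.sub("", word)

def train(domain_corpora):
    """Build {domain: {bare_word: Counter(diacritized_form)}} from
    a mapping of domain name to a list of diacritized tokens."""
    model = {}
    for domain, tokens in domain_corpora.items():
        table = model.setdefault(domain, {})
        for token in tokens:
            table.setdefault(strip_diacritics(token), Counter())[token] += 1
    return model

def restore(model, domain, bare_word):
    """Pick the diacritized form seen most often in the given domain.
    Unknown words or domains are returned unchanged."""
    candidates = model.get(domain, {}).get(bare_word)
    if not candidates:
        return bare_word
    return candidates.most_common(1)[0][0]

# Hypothetical toy data: the bare form k-t-b is ambiguous between
# "kataba" (he wrote) and "kutub" (books).
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"
BARE = "\u0643\u062A\u0628"

model = train({
    "education": [KUTUB, KUTUB, KATABA],
    "news": [KATABA, KATABA, KUTUB],
})
education_choice = restore(model, "education", BARE)
news_choice = restore(model, "news", BARE)
```

Under these assumptions the same bare word resolves differently depending on the domain, which is the effect the abstract attributes to reduced lexical variability within a domain. A fuller system would add the dictionary and morphological analysis steps that the abstract names.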

REFERENCES:
patent: 5758322 (1998-05-01), Rongley
patent: 2002/0178394 (2002-11-01), Bamberger et al.
patent: 2004/0006456 (2004-01-01), Loofbourrow et al.
patent: 2005/0015237 (2005-01-01), Debili
Smith, Jennifer, Noun Faithfulness: On the privileged behavior of nouns in phonology May 21, 1997, University of Massachusetts, Amherst, pp. 1-27.
Yarowsky, D., A comparison of corpus-based techniques for restoring accents in Spanish and French text 1994, Proceedings of the Second Annual Workshop on Very Large Corpora, pp. 19-32.
Tufis, D., Automatic diacritics insertion in Romanian texts. 1999, Proceedings of the International Conference on Computational Lexicography, pp. 185-194.
Simard, M., Automatic insertion of accents in French text. 1998, Proceedings of the Third Conference on Empirical Methods in Natural Language Processing, pp. 27-35.
Mihalcea R., Diacritics restoration: Learning from letters versus learning from words. 2002, CICLing, pp. 339-348.
Mihalcea R., Letter level learning for language independent diacritics restoration. 2002, CICLing, pp. 105-111.
Yoon, Aesun, Building a Domain-Specific French-Korean Lexicon 2000, Pusan National University, pp. 465-474.
European Search Report for application No. EP 05 11 0694 dated May 12, 2006.
Debili, Fathi et al.; “Voyellation Automatique de l'arabe”; Computational Approaches to Semitic Languages—Proceedings of the Workshop; Association for Computational Linguistics Stroudsburg, PA, USA; Aug. 16, 1998; pp. 42-49.
Kirchhoff, K.; “Novel Speech Recognition Models for Arabic”; Sep. 30, 2003; Johns-Hopkins University Summer Research Workshop 2002, Final Report; pp. 1-109.
Todd, S.; “Abbreviated Typing for Word Processing”; Feb. 1979; IBM Technical Disclosure Bulletin, vol. 21, No. 9; pp. 3796-3797.

Profile ID: LFUS-PAI-O-4277978