Reexamination Certificate
2003-08-21
2009-06-30
Shah, Paras (Department: 2626)
Data processing: speech signal processing, linguistics, language
Linguistics
Dictionary building, modification, or prioritization
C704S007000, C704S009000
active
07555428
ABSTRACT:
A system and method for identifying compounds through iterative analysis of a measure of association are disclosed. A limit on a number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. A number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, which are each provided in a vocabulary for the text corpus. At least one n-gram including a number of tokens equal to the limit is identified based on the number of occurrences. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token and the limit is adjusted.
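The iterative procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not name the measure of association, the occurrence criterion, or how the limit is adjusted, so the sketch assumes a pointwise-mutual-information-style score, a minimum occurrence count, and a limit that decreases by one per pass. The names `find_compounds`, `min_count`, and `min_assoc`, and the underscore used to join compound tokens, are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def merge(tokens, compounds, n):
    """Rewrite a token list, replacing accepted n-grams with one compound token."""
    out, i = [], 0
    while i < len(tokens):
        gram = tuple(tokens[i:i + n])
        if len(gram) == n and gram in compounds:
            out.append("_".join(gram))  # assumed compound-token spelling
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def find_compounds(sentences, limit, min_count=2, min_assoc=1.0):
    """Iteratively promote strongly associated n-grams to compound tokens.

    Assumptions (not stated in the abstract): association is a PMI-style
    log ratio, candidates need at least min_count occurrences, and the
    limit is adjusted by decrementing it after each pass.
    """
    sentences = [list(s) for s in sentences]
    found = []
    while limit >= 2:
        unigrams = Counter(t for s in sentences for t in s)
        grams = Counter(g for s in sentences for g in ngrams(s, limit))
        total_tokens = sum(unigrams.values())
        total_grams = sum(grams.values())
        accepted = set()
        for gram, count in grams.items():
            if count < min_count:
                continue  # too rare to be a candidate
            log_joint = math.log(count / total_grams)
            log_indep = sum(math.log(unigrams[t] / total_tokens) for t in gram)
            if log_joint - log_indep >= min_assoc:
                accepted.add(gram)
        found.extend(sorted(accepted))
        # Accepted compounds join the vocabulary as single tokens.
        sentences = [merge(s, accepted, limit) for s in sentences]
        limit -= 1  # assumed adjustment of the limit
    return found, sentences
```

Calling `find_compounds(corpus, limit=3)` on a list of tokenized sentences returns the accepted n-grams together with the corpus rewritten to use compound tokens. Because accepted compounds are merged before the next pass, a later pass at a smaller limit can combine a compound token with its neighbors.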
REFERENCES:
patent: 5842217 (1998-11-01), Light
patent: 5867812 (1999-02-01), Sassano
patent: 6173298 (2001-01-01), Smadja
patent: 6285999 (2001-09-01), Page
patent: 6349282 (2002-02-01), Van Aelten et al.
patent: 6754617 (2004-06-01), Ejerhed
patent: 2007/0067157 (2007-03-01), Kaku et al.
patent: 08-161340 (1996-06-01), None
Su, K., Wu, M., and Chang, J. 1994. A Corpus-based approach to automatic compound extraction. In Proceedings of the 32nd Annual Meeting on Association For Computational Linguistics (Las Cruces, New Mexico, Jun. 27-30, 1994). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 242-247.
Venkataraman, A. 2001. A statistical model for word discovery in transcribed speech. Comput. Linguist. 27, 3 (Sep. 2001), 352-372.
Gao, J., Goodman, J., Li, M., and Lee, K. 2002. Toward a unified approach to statistical language modeling for Chinese. ACM Transactions on Asian Language Information Processing (TALIP) 1, 1 (Mar. 2002), 3-33. DOI=http://doi.acm.org/10.1145/595576.595578.
Jurafsky, D., et al. (2000). Backoff. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, New Jersey, pp. 216-217.
Smadja, F. 1993. Retrieving collocations from text: Xtract. Comput. Linguist. 19, 1 (Mar. 1993), 143-177.
Frantzi, K. T. and Ananiadou, S. 1996. Extracting nested collocations. In Proceedings of the 16th Conference on Computational Linguistics, vol. 1 (Copenhagen, Denmark, Aug. 5-9, 1996). International Conference On Computational Linguistics. Association for Computational Linguistics, Morristown, NJ, 41-46. DOI=http://dx.doi.org/10.3115/9926.
Seretan, V., Nerima, L., and Wehrli, E. 2003. Extraction of Multi-Word Collocations Using Syntactic Bigram Composition. In Proceedings of the International Conference on Recent Advances in NLP (RANLP-2003), Borovets, Bulgaria, pp. 131-138.
C.D. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing,” Ch. 5, MIT Press (1999).
T. Dunning, “Accurate Methods For The Statistics Of Surprise And Coincidence,” Comp. Ling., vol. 19, No. 1, pp. 61-74 (1993).
Franz Alexander
Milch Brian
Fish & Richardson P.C.
Google Inc.
Shah Paras
Profile ID: LFUS-PAI-O-4126306