System and method for identifying compounds through...

Data processing: speech signal processing – linguistics – language – Linguistics – Dictionary building – modification – or prioritization

Reexamination Certificate

Details

C704S007000, C704S009000

Reexamination Certificate

active

07555428

ABSTRACT:
A system and method for identifying compounds through iterative analysis of measure of association is disclosed. A limit on a number of tokens per compound is specified. Compounds within a text corpus are iteratively evaluated. A number of occurrences of one or more n-grams within the text corpus is determined. Each n-gram includes up to a maximum number of tokens, which are each provided in a vocabulary for the text corpus. At least one n-gram including a number of tokens equal to the limit based on the number of occurrences is identified. A measure of association between the tokens in the identified n-gram is determined. Each identified n-gram with a sufficient measure of association is added to the vocabulary as a compound token and the limit is adjusted.
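The iterative procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name a particular measure of association, so a pointwise-mutual-information-style score is swapped in here, and the thresholds `min_count` and `min_score`, the `_` joiner, the greedy merge order, and the choice to adjust the limit by decrementing it are all hypothetical.

```python
import math
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous n-gram of exactly n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def association(ngram, ngram_count, unigram_counts, total):
    """PMI-style score: log of observed n-gram probability over the
    probability expected if its tokens were independent (assumed measure)."""
    p_ngram = ngram_count / total
    p_independent = math.prod(unigram_counts[t] / total for t in ngram)
    return math.log(p_ngram / p_independent)


def merge(tokens, ngram, joiner="_"):
    """Rewrite the token stream so each occurrence of ngram is one token."""
    n, out, i = len(ngram), [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == ngram:
            out.append(joiner.join(ngram))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def find_compounds(tokens, limit=3, min_count=2, min_score=1.0):
    """Return compound tokens found while the limit shrinks from `limit` to 2."""
    tokens = list(tokens)
    compounds = []
    while limit >= 2:
        total = len(tokens)
        unigram_counts = Counter(tokens)
        # Candidates: n-grams of exactly `limit` tokens that occur often
        # enough and whose tokens are sufficiently associated.
        candidates = [
            ngram
            for ngram, count in count_ngrams(tokens, limit).most_common()
            if count >= min_count
            and association(ngram, count, unigram_counts, total) >= min_score
        ]
        for ngram in candidates:
            merged = merge(tokens, ngram)
            if merged != tokens:  # skip n-grams consumed by an earlier merge
                compounds.append("_".join(ngram))
                tokens = merged
        limit -= 1  # assumed adjustment: look for shorter compounds next
    return compounds


corpus = ("new york city is big . new york city is old . "
          "i like new york city .").split()
print(find_compounds(corpus, limit=3, min_count=3))
```

Because accepted compounds are merged back into the token stream, a compound found at one limit counts as a single token in later passes, which mirrors the abstract's step of adding each compound to the vocabulary before the limit is adjusted.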

REFERENCES:
patent: 5842217 (1998-11-01), Light
patent: 5867812 (1999-02-01), Sassano
patent: 6173298 (2001-01-01), Smadja
patent: 6285999 (2001-09-01), Page
patent: 6349282 (2002-02-01), Van Aelten et al.
patent: 6754617 (2004-06-01), Ejerhed
patent: 2007/0067157 (2007-03-01), Kaku et al.
patent: 08-161340 (1996-06-01), None
Su, K., Wu, M., and Chang, J. 1994. A corpus-based approach to automatic compound extraction. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (Las Cruces, New Mexico, Jun. 27-30, 1994). Annual Meeting of the ACL. Association for Computational Linguistics, Morristown, NJ, 242-247.
Venkataraman, A. 2001. A statistical model for word discovery in transcribed speech. Comput. Linguist. 27, 3 (Sep. 2001), 352-372.
Gao, J., Goodman, J., Li, M., and Lee, K. 2002. Toward a unified approach to statistical language modeling for Chinese. ACM Transactions on Asian Language Information Processing (TALIP) 1, 1 (Mar. 2002), 3-33. DOI=http://doi.acm.org/10.1145/595576.595578.
Jurafsky, D., et al. (2000). Backoff. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson Hall Jerse, pp. 216-217.
Smadja, F. 1993. Retrieving collocations from text: Xtract. Comput. Linguist. 19, 1 (Mar. 1993), 143-177.
Frantzi, K. T. and Ananiadou, S. 1996. Extracting nested collocations. In Proceedings of the 16th Conference on Computational Linguistics, vol. 1 (Copenhagen, Denmark, Aug. 5-9, 1996). International Conference on Computational Linguistics. Association for Computational Linguistics, Morristown, NJ, 41-46. DOI=http://dx.doi.org/10.3115/9926.
Seretan, V., Nerima, L., and Wehrli, E. 2003. Extraction of Multi-Word Collocations Using Syntactic Bigram Composition. In Proceedings of the International Conference on Recent Advances in NLP (RANLP-2003), Borovets, Bulgaria, pp. 131-138.
C.D. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing,” Ch. 5, MIT Press (1999).
T. Dunning, “Accurate Methods for the Statistics of Surprise and Coincidence,” Comp. Ling., vol. 19, No. 1, pp. 61-74 (1993).

Profile ID: LFUS-PAI-O-4126306
