Lexical association metric for knowledge-free extraction of...

Data processing: speech signal processing – linguistics – language – Linguistics – Natural language

Reexamination Certificate

Details

C704S001000, C704S010000

active

07739103

ABSTRACT:
A method and system for determining a lexical association of phrasal terms are described. A corpus having a plurality of words is received, and a plurality of contexts including one or more context words proximate to a word in the corpus is determined. An occurrence count for each context is determined, and a global rank is assigned based on the occurrence count. Similarly, a number of occurrences of a word being used in a context is determined, and a local rank is assigned to the word-context pair based on the number of occurrences. A rank ratio is then determined for each word-context pair. A rank ratio is equal to the global rank divided by the local rank for a word-context pair. A mutual rank ratio is determined by multiplying the rank ratios corresponding to a phrase. The mutual rank ratio is used to identify phrasal terms in the corpus.
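The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: it handles two-word phrases only, treats the context of a word as the neighbouring word with a blank slot, and lets tied counts share the best rank among them. The function names `rank_desc` and `mutual_rank_ratios` are hypothetical, and the abstract does not specify tie handling, context width, or any normalization.

```python
from collections import Counter, defaultdict


def rank_desc(counts):
    """Map each key to its rank by descending count (1 = most frequent).

    Assumption: tied counts share the best rank among them.
    """
    ranks = {}
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    prev_count, prev_rank = None, 0
    for position, (key, count) in enumerate(ordered, start=1):
        if count != prev_count:
            prev_rank, prev_count = position, count
        ranks[key] = prev_rank
    return ranks


def mutual_rank_ratios(tokens):
    """Score every adjacent word pair in `tokens` by its mutual rank ratio."""
    bigrams = Counter(zip(tokens, tokens[1:]))

    context_counts = Counter()          # occurrences of each context overall
    pair_counts = defaultdict(Counter)  # occurrences of each word in each context
    for (w1, w2), n in bigrams.items():
        # In the bigram (w1, w2), w1 fills the context ("_", w2)
        # and w2 fills the context (w1, "_").
        for word, ctx in ((w1, ("_", w2)), (w2, (w1, "_"))):
            context_counts[ctx] += n
            pair_counts[word][ctx] += n

    # Global rank: contexts ranked by their overall occurrence count.
    global_rank = rank_desc(context_counts)
    # Local rank: for each word, its contexts ranked by word-context count.
    local_rank = {w: rank_desc(c) for w, c in pair_counts.items()}

    def rank_ratio(word, ctx):
        # Rank ratio = global rank divided by local rank.
        return global_rank[ctx] / local_rank[word][ctx]

    # Mutual rank ratio: product of the rank ratios belonging to the phrase.
    return {
        (w1, w2): rank_ratio(w1, ("_", w2)) * rank_ratio(w2, (w1, "_"))
        for (w1, w2) in bigrams
    }


scores = mutual_rank_ratios(["a", "b", "a", "c"])
```

Sorting `scores` by descending value would put the strongest phrasal-term candidates first. How a cut-off is chosen is not stated in the abstract, so any threshold here would be an additional assumption.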

REFERENCES:
patent: 5355311 (1994-10-01), Horioka
patent: 5406480 (1995-04-01), Kanno
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5675819 (1997-10-01), Schuetze
patent: 5867812 (1999-02-01), Sassano
patent: 6081775 (2000-06-01), Dolan
patent: 6101492 (2000-08-01), Jacquemin et al.
patent: 6697793 (2004-02-01), McGreevy
patent: 6859771 (2005-02-01), Xun et al.
patent: 6925433 (2005-08-01), Stensmo
patent: 7031910 (2006-04-01), Eisele
patent: 7197449 (2007-03-01), Hu et al.
patent: 2003/0065501 (2003-04-01), Hamdan
patent: 2003/0083863 (2003-05-01), Ringger et al.
patent: 2003/0236659 (2003-12-01), Castellanos
patent: 2004/0253569 (2004-12-01), Deane
patent: 2005/0049867 (2005-03-01), Deane
Ha, Le Quan et al., “Extension of Zipf's Law to Words and Phrases,” 2002, Procs. of the 19th International Conference on Computational Linguistics.
Kit, Chunyu, “Corpus Tools for Retrieving and Deriving Termhood Evidence,” 2002, 5th East Asia Conferences on Terminology.
“Rules of Probability” 2001, http://web.archive.org/web/20001003132730/http://library.thinkquest.org/11506/prules.html.
Smadja, Retrieving Collocations from Text: Xtract, 1993, Computational Linguistics, 19:143-177.
Dagan et al., Termight: Identifying and Translating Technical Terminology, ACM International Conference Proceeding Series: Proceedings of the fourth conference on Applied Natural Language Processing, pp. 34-40, (1994).
Justeson et al., Technical Terminology: Some Linguistic Properties and an Algorithm for Identification in Text, 1995, Natural Language Engineering, 1(1):9-27.
Daille, Study and Implementation of Combined Techniques for Automatic Extraction of Terminology, in The Balancing Act: Combining Symbolic and Statistical Approaches to Language, J. Klavans & P. Resnik, eds., pp. 49-66, (1996).
Daille, Study and Implementation of Combined Techniques for Automatic Extraction of Terminology, pp. 29-36, Talana, University Paris.
Jacquemin et al., Expansion of Multi-Word Terms for Indexing and Retrieval using Morphology and Syntax, Proceedings of ACL, 1997, pp. 24-31.
Jacquemin et al., NLP for Term Variant Extraction: Synergy between Morphology, Lexicon, and Syntax, 1999, Natural Language Information Retrieval, pp. 25-74.
Boguraev et al., Applications of Term Identification Technology: Domain Description and Content Characterisation, 1999, Natural Language Engineering 5(1): 17-44.
Frantzi et al., Automatic Recognition of Multi-Word Terms; the C-Value and NC-Value Method, 2000, International Journal on Digital Libraries 3(2): 115-130.
Maynard et al., Identifying Terms by Their Family and Friends, COLING 2000, pp. 530-536, (2000).
Church et al., Word Association Norms, Mutual Information, and Lexicography, 1990, Computational Linguistics, 16(1): 22-29.
Dunning, Accurate Methods for the Statistics of Surprise and Coincidence, 1993, Computational Linguistics, 19(1): 61-74.
Zipf, The Psycho-Biology of Language: an Introduction to Dynamic Philology, Houghton-Mifflin, Boston, Massachusetts, 1935.
Zipf, Human Behavior and the Principle of Least Effort, Addison-Wesley, Cambridge, Massachusetts, 1949.
Deane, Paul; A Nonparametric Method for Extraction of Candidate Phrasal Terms; Proc. of the 43rd Annual Meeting of the ACL; pp. 605-613; Jun. 2005.
Lewis, David, Jones, Karen Sparck; Natural Language Processing for Information Retrieval; Communications of the ACM; vol. 39, No. 1; pp. 92-101; Jan. 1996.
Lewis, David, Jones, Karen Sparck; Natural Language Processing for Information Retrieval; Communications of the ACM; Jul. 1993.
Baayen, R. H., Word Frequency Distributions, Kluwer: Dordrecht, 2001.
Choueka, Y., Looking for needles in a haystack or locating interesting collocation expressions in large textual databases, Proceedings of the RIAO, 1988, pp. 38-43.
Dias, G., S. Guilloré, and J.G. Pereira Lopes, Language independent automatic acquisition of rigid multiword units from unrestricted text corpora, TALN, 1999, pp. 333-338.
Evert, S., The Statistics of Word Cooccurrences: Word Pairs and Collocations, PhD Thesis, Institut für maschinelle Sprachverarbeitung, University of Stuttgart, 2004.
Evert, S. and B. Krenn, Methods for the Qualitative Evaluation of Lexical Association Measures, Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001, pp. 188-195.
Ferreira Da Silva, J. and G. Pereira Lopes, A local maxima method and a fair dispersion normalization for extracting multiword units from corpora, Sixth Meeting on Mathematics of Language, 1999, pp. 369-381.
Gil, A. and G. Dias, Efficient Mining of Textual Associations, International Conference on Natural Language Processing and Knowledge Engineering, Chengqing Zong (eds.), 2003a, pp. 26-29.
Gil, A. and G. Dias, Using masks, suffix array-based data structures, and multidimensional arrays to compute positional n-gram statistics from corpora, In Proceedings of the Workshop on Multiword Expressions of the 41st Annual Meeting of the Association of Computational Linguistics, 2003b, pp. 25-33.
Johansson, C., Catching the Cheshire Cat, in Proceedings of COLING 94, vol. II, 1994b, pp. 1021-1025.
Johansson, C., Good Bigrams, in Proceedings from the 16th International Conference on Computational Linguistics (COLING-96), 1996, pp. 592-597.
Krenn, B., Acquisition of Phraseological Units from Linguistically Interpreted Corpora, A Case Study on German PP-Verb Collocations, Proceedings of ISP-98, 1998, pp. 359-371.
Krenn, B., Empirical Implications on Lexical Association Measures, Proceedings of the Ninth EURALEX International Congress, 2000.
Krenn, B. and S. Evert, Can we do better than frequency? A case study on extracting PP-verb collocations, Proceedings of the ACL Workshop on Collocations, 2001, pp. 39-46.
Lin, D., Extracting Collocations from Text Corpora, First Workshop on Computational Terminology, 1998, pp. 57-63.
Lin, D., Automatic Identification of Non-compositional Phrases, In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999, pp. 317-324.
Manning, C.D. and H. Schütze, Foundations of Statistical Natural Language Processing, 1999, MIT Press, Cambridge, MA, USA.
Pantel, P. and D. Lin, A Statistical Corpus-Based Term Extractor, In: Stroulia, E. and Matwin, S. (eds.), AI 2001, Lecture Notes in Artificial Intelligence, 2001, pp. 36-46, Springer-Verlag.
Resnik, P., Selectional constraints: an information-theoretic model and its computational realization, 1996, Cognition 61: 127-159.
Schone, P. and D. Jurafsky, Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem? Proceedings of Empirical Methods in Natural Language Processing, 2001, pp. 100-108.
Sekine, S., J. J. Carroll, S. Ananiadou, and J. Tsujii, Automatic Learning for Semantic Collocation, Proceedings of the 3rd Conference on Applied Natural Language Processing, 1992, pp. 104-110.
Shimohata, S., T. Sugio, and J. Nagata, Retrieving collocations by co-occurrences and word order constraints, Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 1997, pp. 476-481.
Thanopoulos, A., N. Fakotakis and G. Kokkinakis, Comparative Evaluation of Collocation Extraction Metrics, Proceedings of the
