Method and platform for term extraction from large...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07395256

ABSTRACT:
A method and platform for statistically extracting terms from large sets of documents are described. An importance vector is determined for each document in the set of documents based on importance values for words in each document. A binary document classification tree is formed by clustering the documents into clusters of similar documents based on the importance vector for each document. An infrastructure is built for the set of documents by generalizing the binary document classification tree. The document clusters are determined by dividing the generalized tree of the infrastructure into two parts and cutting away the upper part. Statistically significant individual key words are extracted from the clusters of similar documents. Key words are treated as seeds, and terms are extracted by starting from the seeds and extending to their left or right contexts.
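The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: TF-IDF stands in for the unspecified importance value, cosine similarity with greedy pairwise merging stands in for the clustering step, and a simple frequency threshold (`min_count`) decides when a seed stops growing. The function names `importance_vectors`, `build_binary_tree`, and `extend_seed` are hypothetical, and the tree generalization, the cut, and the statistical key word selection are omitted.

```python
import math
from collections import Counter


def importance_vectors(token_lists):
    """One sparse vector per document. TF-IDF is an assumed importance value."""
    n = len(token_lists)
    df = Counter(word for tokens in token_lists for word in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_binary_tree(vectors):
    """Repeatedly merge the two most similar nodes into a nested-tuple tree."""
    # Each node is (subtree, summed vector); a leaf subtree is a document index.
    nodes = list(enumerate(vectors))
    while len(nodes) > 1:
        pairs = (
            (i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
        )
        i, j = max(pairs, key=lambda p: cosine(nodes[p[0]][1], nodes[p[1]][1]))
        (tree_a, vec_a), (tree_b, vec_b) = nodes[i], nodes[j]
        merged = {
            w: vec_a.get(w, 0.0) + vec_b.get(w, 0.0) for w in set(vec_a) | set(vec_b)
        }
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
        nodes.append(((tree_a, tree_b), merged))
    return nodes[0][0]


def extend_seed(seed, token_lists, min_count=2):
    """Grow a term from a seed word by adding its most frequent left or right
    neighbour, as long as that neighbour occurs at least min_count times."""
    term = [seed]
    while True:
        k = len(term)
        neighbours = Counter()
        for tokens in token_lists:
            for s in range(len(tokens) - k + 1):
                if tokens[s:s + k] == term:
                    if s > 0:
                        neighbours[("L", tokens[s - 1])] += 1
                    if s + k < len(tokens):
                        neighbours[("R", tokens[s + k])] += 1
        if not neighbours:
            return " ".join(term)
        (side, word), count = neighbours.most_common(1)[0]
        if count < min_count:
            return " ".join(term)
        term = [word] + term if side == "L" else term + [word]


# Toy corpus (invented for illustration only).
docs = [
    "statistical term extraction from documents",
    "term extraction uses key words",
    "cooking pasta with tomato sauce",
    "tomato sauce for fresh pasta",
]
token_lists = [d.lower().split() for d in docs]
tree = build_binary_tree(importance_vectors(token_lists))
term = extend_seed("extraction", token_lists)
print(tree)
print(term)  # -> term extraction
```

On this toy corpus the two extraction-related documents end up under one branch of the tree and the two cooking documents under the other, and the seed "extraction" grows leftward into the two-word term. A real system would pick seeds per cluster and use a statistical test rather than a raw count threshold.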

REFERENCES:
patent: 5287278 (1994-02-01), Rau
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5463773 (1995-10-01), Sakakibara et al.
patent: 5642518 (1997-06-01), Kiyama et al.
patent: 5799268 (1998-08-01), Boguraev
patent: 5926811 (1999-07-01), Miller et al.
patent: 6137911 (2000-10-01), Zhilyaev
patent: 6446061 (2002-09-01), Doerre et al.
patent: 2004/0117448 (2004-06-01), Newman et al.
patent: 1 304 627 (2003-04-01), None
International Preliminary Report on Patentability (PCT Article 36 and Rule 70) May 16, 2005.
T. Strzalkowski, “Natural Language Information Retrieval”, Information Processing and Management, vol. 31(3), pp. 397-417, 1995.
K. Church et al., “Word Association Norms, Mutual Information and Lexicography”, In proceedings of ACL, pp. 76-83, 1989.
T. Dunning, “Accurate Methods for the Statistics of Surprise and Coincidence”, Computational Linguistics, vol. 19(1), pp. 61-74, 1993.
L.F. Chien et al., “Internet-based Chinese Text Corpus Classification and Domain-Specific Keyterm Extraction”, Proceedings of Workshop on Computational Technology, pp. 71-75, 1998.
H. Schutze, “The Hypertext Concordance: A Better Back-of-the-Book Index”, Proceedings of Workshop on Computational Technology, pp. 101-104, 1998.
C. Jacquemin, “FASTR: A Unification-Based Front End to Automatic Indexing”, Proceedings of RIAO, pp. 34-47, 1994.
D. Bourigaut, “An Endogeneous Corpus-Based Method for Structural Noun Phrase Disambiguation”, Proceedings of EACL, pp. 187-213, 1993.
G. Grefenstette, “Explorations in Automatic Thesaurus Discovery”, Kluwer Academic Press, 1994, 35 pages.
J. Pustejovsky, “Lexical Semantic Techniques for Corpus Analysis”, Association for Computational Linguistics, vol. 19(2), pp. 331-358, 1993.
K. Frantzi et al., “Automatic recognition of multi-word terms: the C-value/NC-value method”, Journal of Digital Library, vol. 3, pp. 115-130, 2000.
