Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2008-07-01
Mofiz, Apu (Department: 2161)
C707S793000
active
10465567
ABSTRACT:
A method and platform for statistically extracting terms from large sets of documents is described. An importance vector is determined for each document in the set of documents based on importance values for words in each document. A binary document classification tree is formed by clustering the documents into clusters of similar documents based on the importance vector for each document. An infrastructure is built for the set of documents by generalizing the binary document classification tree. The document clusters are determined by dividing the generalized tree of the infrastructure into two parts and cutting away the upper part. Statistically significant individual key words are extracted from the clusters of similar documents. Key words are treated as seeds and terms are extracted by starting from the seeds and extending to their left or right contexts.
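The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: TF-IDF weights stand in for the "importance values", greedy centroid merging by cosine similarity stands in for building the binary classification tree, stopping the merges at a similarity threshold stands in for cutting away the upper part of the tree, and a simple "frequent inside the cluster, absent outside" rule stands in for the statistical significance test. All function names, the threshold, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def importance_vectors(token_lists):
    """One sparse vector per document; TF-IDF is an assumed importance measure."""
    n = len(token_lists)
    df = Counter(w for toks in token_lists for w in set(toks))
    return [{w: c * math.log(n / df[w]) for w, c in Counter(toks).items()}
            for toks in token_lists]

def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster(vectors, threshold):
    """Merge the two most similar clusters until similarity drops below threshold.
    Each cluster is (binary tree as nested tuples, member doc ids, summed vector)."""
    clusters = [(i, [i], dict(v)) for i, v in enumerate(vectors)]
    while len(clusters) > 1:
        sim, i, j = max((cosine(clusters[a][2], clusters[b][2]), a, b)
                        for a in range(len(clusters))
                        for b in range(a + 1, len(clusters)))
        if sim < threshold:  # everything above this level is "cut away"
            break
        (ta, ma, va), (tb, mb, vb) = clusters[i], clusters[j]
        merged = dict(va)
        for w, x in vb.items():
            merged[w] = merged.get(w, 0.0) + x
        del clusters[j], clusters[i]  # j > i, so delete j first
        clusters.append(((ta, tb), ma + mb, merged))
    return clusters

def key_words(members, token_lists, min_df=2):
    """Words in at least min_df cluster documents and in no outside document
    (a crude placeholder for a real significance test)."""
    inside = Counter(w for i in members for w in set(token_lists[i]))
    mem = set(members)
    outside = {w for i, toks in enumerate(token_lists) if i not in mem for w in toks}
    return sorted(w for w, c in inside.items() if c >= min_df and w not in outside)

def extend_seed(seed, seqs, min_count=2):
    """Grow a term from a seed word by one word to the left or right
    while the extended word sequence still occurs at least min_count times."""
    term = (seed,)
    while True:
        k, cands = len(term), Counter()
        for s in seqs:
            for i in range(len(s) - k + 1):
                if tuple(s[i:i + k]) == term:
                    if i > 0:
                        cands[(s[i - 1],) + term] += 1
                    if i + k < len(s):
                        cands[term + (s[i + k],)] += 1
        if not cands:
            break
        best, count = cands.most_common(1)[0]
        if count < min_count:
            break
        term = best
    return " ".join(term)

# Hypothetical sample documents, for illustration only.
docs = [
    "the binary tree stores sorted keys",
    "a binary tree supports fast search",
    "the stock market fell sharply today",
    "investors left the stock market today",
]
tokens = [d.lower().split() for d in docs]
clusters = cluster(importance_vectors(tokens), threshold=0.05)
for tree, members, _ in clusters:
    seqs = [tokens[i] for i in members]
    terms = sorted({extend_seed(s, seqs) for s in key_words(members, tokens)})
    print(tree, terms)
```

The nested tuples keep the merge history, so each surviving cluster still carries its own binary subtree. A realistic implementation would likely filter stop words before extending seeds, since a frequent function word can otherwise be absorbed into a term.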
REFERENCES:
patent: 5287278 (1994-02-01), Rau
patent: 5423032 (1995-06-01), Byrd et al.
patent: 5463773 (1995-10-01), Sakakibara et al.
patent: 5642518 (1997-06-01), Kiyama et al.
patent: 5799268 (1998-08-01), Boguraev
patent: 5926811 (1999-07-01), Miller et al.
patent: 6137911 (2000-10-01), Zhilyaev
patent: 6446061 (2002-09-01), Doerre et al.
patent: 2004/0117448 (2004-06-01), Newman et al.
patent: 1 304 627 (2003-04-01), None
International Preliminary Report on Patentability (PCT Article 36 and Rule 70) May 16, 2005.
T. Strzalkowski, “Natural Language Information Retrieval”, Information Processing and Management, vol. 31(3), pp. 397-417, 1995.
K. Church et al., “Word Association Norms, Mutual Information and Lexicography”, In proceedings of ACL, pp. 76-83, 1989.
T. Dunning, “Accurate Methods for the Statistics of Surprise and Coincidence”, Computational Linguistics, vol. 19(1), pp. 61-74, 1993.
L.F. Chien et al., “Internet-based Chinese Text Corpus Classification and Domain-Specific Keyterm Extraction”, Proceedings of Workshop on Computational Technology, pp. 71-75, 1998.
H. Schutze, “The Hypertext Concordance: A Better Back-of-the-Book Index”, Proceedings of Workshop on Computational Technology, pp. 101-104, 1998.
C. Jacquemin, “FASTR: A Unification-Based Front End to Automatic Indexing”, Proceedings of RIAO, pp. 34-47, 1994.
D. Bourigault, “An Endogeneous Corpus-Based Method for Structural Noun Phrase Disambiguation”, Proceedings of EACL, pp. 187-213, 1993.
G. Grefenstette, “Explorations in Automatic Thesaurus Discovery”, Kluwer Academic Press, 1994, 35 pages.
J. Pustejovsky, “Lexical Semantic Techniques for Corpus Analysis”, Association for Computational Linguistics, vol. 19(2), pp. 331-358, 1993.
K. Frantzi et al., “Automatic recognition of multi-word terms: the C-value/NC-value method”, Journal of Digital Library, vol. 3, pp. 115-130, 2000.
Ji Donghong
Nie Yu
Yang Lingpeng
Agency for Science Technology and Research
Mofiz Apu
Padmanabhan Kavita
Sughrue & Mion, PLLC
Method and platform for term extraction from large...