Creating taxonomies and training data for document...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

07409404

ABSTRACT:
Methods, apparatus, and systems are described for generating, from a set of training documents, a set of training data and a set of features for a taxonomy of categories. In the generated taxonomy, the degree of feature overlap among categories is minimized in order to optimize use with a machine-based categorizer. The categories still make sense to a human, however, because a human makes the decisions regarding category definitions. In an example embodiment, for each category, a plurality of training documents is selected using Web search engines, the documents are winnowed to produce a more refined set of training documents, and a set of features that is highly differentiating for that category within a set of categories (a supercategory) is extracted. The set of training documents or differentiating features is then used as input to a categorizer, which determines, for a plurality of test documents, the plurality of categories to which they best belong.
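
The feature-extraction and categorization steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how features are scored, so the scoring rule here (a term's document frequency in one category minus its highest document frequency in any sibling category) is an assumption chosen for simplicity, as are the function names, the naive tokenizer, and the toy training texts. The search-engine selection and winnowing steps are omitted. Because a term is kept only where its score is strictly positive, no term can be assigned to two categories, which loosely mirrors the goal of minimal feature overlap.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Deliberately naive tokenizer (an assumption, not from the patent):
    # split on whitespace, strip surrounding punctuation, lowercase.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def doc_frequency(docs):
    # Fraction of documents in which each term appears at least once.
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return {term: n / len(docs) for term, n in counts.items()}

def differentiating_features(training, top_k=3):
    # training: {category: [document text, ...]} for one supercategory.
    df = {cat: doc_frequency(docs) for cat, docs in training.items()}
    features = {}
    for cat, freqs in df.items():
        scored = []
        for term, freq in freqs.items():
            # Highest frequency of this term in any sibling category.
            sibling = max((df[o].get(term, 0.0) for o in df if o != cat),
                          default=0.0)
            score = freq - sibling  # hypothetical scoring rule
            if score > 0:
                scored.append((score, term))
        # Best score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        features[cat] = {term for _, term in scored[:top_k]}
    return features

def categorize(text, features):
    # Pick the category whose feature set matches the most tokens;
    # return None when no feature matches at all.
    tokens = set(tokenize(text))
    hits = {cat: len(tokens & feats) for cat, feats in features.items()}
    best = max(sorted(hits), key=hits.get)
    return best if hits[best] > 0 else None

# Toy training data, invented for illustration only.
training = {
    "cooking": [
        "bake the bread in the oven",
        "season the soup then bake it",
        "bake the cake using flour",
    ],
    "cycling": [
        "ride the bike on the road",
        "repair the bike tire",
        "ride the bike uphill",
    ],
}

features = differentiating_features(training)
print(features)
print(categorize("I want to bake a cake", features))
```

In this sketch the word "the" appears in every document of both categories, so its score is zero everywhere and it is dropped as a feature, while terms concentrated in one category survive.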

REFERENCES:
patent: 6360227 (2002-03-01), Aggarwal et al.
