System and method for generating taxonomies with...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000, C707S793000, C707S793000, C709S219000, C725S116000, C345S428000

active

06360227

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to generating graph taxonomies and to making content-based recommendations. In particular, related information is classified using a directed acyclic graph. Furthermore, the present invention relates to an automated system and method for generating a graph taxonomy and for recommending to a user a group of documents in a subject area which is related to a document given by the user.
BACKGROUND OF THE INVENTION
The increased capability to store vast amounts of information has led to a need for efficient techniques for searching for and retrieving information. For example, much information may be found in various databases and on the World Wide Web. Often information may be preprocessed and organized in order to provide users quicker access to relevant documents or data records. In particular, searching for and retrieving information may be facilitated by grouping similar data objects into clusters. Further, groups of similar data objects or clusters may be arranged in a hierarchy. Thus, a hierarchy of clusters may form an abstract representation of stored information.
Electronic documents, for example, may be represented by a tree hierarchy. Each node of the tree hierarchy may represent a cluster of electronic documents, such as, for example, a group of Web pages. Edges connecting nodes of the tree hierarchy may represent a relationship between nodes. Each node in the tree may be labeled with a subject category. Edges of the tree connect higher level nodes or parent nodes with lower level nodes or child nodes. A special node in a tree hierarchy is designated as the root node or null node. The root node has only outgoing edges (no incoming edges) and corresponds to the 0th or highest level of the tree. The level of a node is determined by the number of edges along a path connecting the node with the root node. The lowest level nodes of a tree are referred to as leaf nodes. Thus, a tree hierarchy may be used as a classification of information with the root node being the coarsest (all inclusive) classification and the leaf nodes being the finest classification.
FIG. 1 shows an exemplary tree hierarchy for data objects. In FIG. 1 the root node represents a cluster containing all the available information. Available information may be stored in data objects. Data objects may be, for example, Web pages or links. All data objects belong to the cluster represented by the root node (i.e. level 0). Data objects containing information relevant to the category “business” belong to a cluster represented by a level 1 node. Data objects containing information relevant to the category “recreation” also belong to a cluster represented by a level 1 node. Further, data objects containing information relevant to the category “education” belong to a cluster represented by a level 1 node. The nodes labeled “business”, “recreation”, and “education” are all child nodes of the root node. The category “business” may be further subdivided into the leaf categories of “large business” and “small business”, as indicated by two level 2 nodes. Nodes labeled “large business” and “small business” are both child nodes of the node labeled “business”. The category “recreation” may be further subdivided into the leaf categories of “movies”, “games”, and “travel”, as indicated by three level 2 nodes. Nodes labeled “movies”, “games”, and “travel” are all child nodes of the node labeled “recreation”. The category “education” may be further subdivided into the leaf categories of “High-Schools”, “colleges”, “Universities”, and “institutes”, as indicated by four level 2 nodes. Nodes labeled “High-Schools”, “colleges”, “Universities”, and “institutes” are all child nodes of the node labeled “education”.
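The hierarchy described for FIG. 1 can be written down as a small data structure. The following is a minimal sketch only: the nested-dictionary encoding, the label "root", and the helper names node_levels and leaf_labels are assumptions made for illustration and do not appear in the specification. It computes each node's level as the number of edges on the path from the root, as defined above.

```python
# Hypothetical encoding of the FIG. 1 hierarchy: each key is a node label and
# each value is the dict of its child nodes (an empty dict marks a leaf node).
FIG1_TREE = {
    "root": {
        "business": {"large business": {}, "small business": {}},
        "recreation": {"movies": {}, "games": {}, "travel": {}},
        "education": {
            "High-Schools": {},
            "colleges": {},
            "Universities": {},
            "institutes": {},
        },
    }
}


def node_levels(tree, level=0):
    """Map each label to its level: the number of edges between it and the root."""
    levels = {}
    for label, children in tree.items():
        levels[label] = level
        levels.update(node_levels(children, level + 1))
    return levels


def leaf_labels(tree):
    """Return the labels of all nodes without children, in depth-first order."""
    leaves = []
    for label, children in tree.items():
        if children:
            leaves.extend(leaf_labels(children))
        else:
            leaves.append(label)
    return leaves


levels = node_levels(FIG1_TREE)
leaves = leaf_labels(FIG1_TREE)
```

Here the root sits at level 0, the three broad categories at level 1, and the nine leaf categories at level 2.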
A tree hierarchy may serve as a guide for searching for a subject category of data objects in which a user may be interested. For example, a test document, containing keywords which indicate an area of interest, may be given by a user. Based on a test document a tree hierarchy of subject categories may be searched for a node which matches the subject area sought by the user. Once a matching subject area is found, information associated with the matching subject area may be retrieved by the user.
Typically, a tree hierarchy may be searched in a top-down fashion beginning with the root node and descending towards the leaf nodes. At each stage of a search, edges or branches are assigned a score. The branch with the highest score indicates the search (descent) direction of the tree. As higher levels of the tree are searched first, and as higher levels are often associated with broader subjects, errors in matching subject areas may lead to erroneous recommendations. In other words, as attaching a descriptive label to higher level nodes may be difficult, an error in matching a subject area to nodes at the beginning of a top-down search may lead to a search through irrelevant branches of the tree.
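The top-down search and its failure mode can be illustrated with a short sketch. The text does not specify how branches are scored, so the keyword-overlap score used here is an assumption, as are the toy tree, the per-node term lists, and the function names overlap_score and top_down_search.

```python
def overlap_score(keywords, node_terms):
    """Assumed scoring rule: count the query keywords found among a node's terms."""
    return len(set(keywords) & set(node_terms))


def top_down_search(tree, terms, keywords, start="root"):
    """Greedy descent: from each node follow the highest-scoring branch to a leaf."""
    path = [start]
    node = start
    while tree.get(node):  # stop once the current node has no children
        node = max(tree[node], key=lambda child: overlap_score(keywords, terms[child]))
        path.append(node)
    return path


# Hypothetical toy data: child lists per node and descriptive terms per node.
TREE = {
    "root": ["business", "recreation"],
    "business": ["large business", "small business"],
    "recreation": ["movies", "travel"],
}
TERMS = {
    "business": ["company", "market", "finance"],
    "recreation": ["leisure", "fun", "hobby"],
    "large business": ["corporation", "enterprise"],
    "small business": ["startup", "shop"],
    "movies": ["film", "cinema", "studio"],
    "travel": ["flight", "hotel"],
}

# A test document about film studios that also mentions finance.
path = top_down_search(TREE, TERMS, ["film", "studio", "finance"])
```

In this toy case the broad level 1 terms match only "finance", so the descent enters the "business" branch and never reaches "movies", even though that leaf shares two keywords with the test document. This is the kind of early mismatch the paragraph above describes.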
Forming a classification of data is referred to as generating a taxonomy (e.g. a tree hierarchy). The data which is used in order to generate a taxonomy is referred to as training data. The process of finding the closest matching subject area to a given test document is referred to as ‘making content-based recommendations’. Methods for taxonomy generation and applications to document browsing and to performing recommendations have been previously proposed in the technical literature. For example, Douglas R. Cutting, David R. Karger, and Jan O. Pedersen, “Constant interaction-time scatter/gather browsing of large document collections,” Proceedings of the ACM SIGIR, 1993; Douglas R. Cutting, David R. Karger, Jan O. Pedersen, and John W. Tukey, “Scatter/Gather: A cluster-based Approach to Browsing Large Document Collections,” Proceedings of the ACM SIGIR, 1992, pp. 318-329; Hearst Marti A., and Pedersen J. O., “Re-examining the cluster hypothesis: Scatter/Gather on Retrieval Results,” Proceedings of the ACM SIGIR, 1996, pp. 76-84; Anick P. G., and Vaithyanathan S., “Exploiting clustering and phrases for Context-Based Information Retrieval,” Proceedings of the ACM SIGIR, 1997, pp. 314-322; and Schutze H., and Silverstein C., “Projections for efficient document clustering,” Proceedings of the ACM SIGIR, 1997, pp. 74-81.
Exemplary applications of content-based recommendation methods are in facilitating a search by a user for information posted on the World Wide Web. The content of Web Pages may be analyzed in order to classify links to Web Pages in the appropriate category. Such a method is employed, for example, by WiseWire Corporation (recently acquired by Lycos Inc., http://www.lycos.com). Lycos builds a directory index for Web Pages using a combination of user feedback and so-called intelligent agents. Links to Web Pages may be organized in a hierarchical directory structure which is intended to deliver accurate search results. At the highest level of the hierarchy subject categories may be few and generic, while at the lower levels subjects may be more specific. A similar directory structure may be found in other search engines such as that employed by Yahoo Inc. (http://www.yahoo.com).
SUMMARY OF THE INVENTION
A graph taxonomy of information which is represented by a plurality of vectors is generated. The graph taxonomy includes a plurality of nodes and a plurality of edges. The plurality of nodes is generated, and each node of the plurality of nodes is associated with ones of the plurality of vectors. A tree hierarchy is established based on the plurality of nodes. A plurality of distances between ones of the plurality of nodes is calculated. Ones of the plurality of nodes are connected with other ones of the plurality of nodes by ones of the plurality of edges based on the plurality of distances.
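One possible reading of the final steps of this summary (calculating distances between nodes and connecting nodes by edges based on those distances) is sketched below. It is not the claimed method. It assumes that each node is summarized by a centroid vector, that distance is cosine distance, and that an edge is added between nodes on adjacent levels when their distance falls below a threshold; none of these choices is stated in the text above, and the names cosine_distance and add_cross_edges are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; an assumed measure, not one named in the text."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def add_cross_edges(centroids, levels, tree_edges, threshold):
    """Extend the tree edges with edges between close nodes on adjacent levels.

    Every edge points from level k to level k + 1, so the graph stays acyclic.
    """
    edges = set(tree_edges)
    for parent, p_vec in centroids.items():
        for child, c_vec in centroids.items():
            if levels[child] == levels[parent] + 1:
                if cosine_distance(p_vec, c_vec) <= threshold:
                    edges.add((parent, child))
    return edges


# Hypothetical node centroids over three made-up term dimensions.
CENTROIDS = {
    "business": [1.0, 0.0, 0.0],
    "recreation": [0.0, 1.0, 0.0],
    "travel": [0.6, 0.8, 0.0],
    "movies": [0.0, 1.0, 0.2],
}
LEVELS = {"business": 1, "recreation": 1, "travel": 2, "movies": 2}
TREE_EDGES = [("recreation", "travel"), ("recreation", "movies")]

graph_edges = add_cross_edges(CENTROIDS, LEVELS, TREE_EDGES, threshold=0.5)
```

With these made-up vectors, "travel" keeps its tree parent "recreation" and also gains an edge from "business", so the result is a directed acyclic graph rather than a strict tree.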

