Computer graphics processing and selective visual display system – Computer graphics processing – Graph generating
Reexamination Certificate
1997-03-07
2001-08-21
Nguyen, Phu K. (Department: 2671)
active
06278464
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to data visualization and data mining.
2. Related Art
Many data mining tasks require classification of data into classes. Typically, a classifier classifies the data into the classes. For example, loan applications can be classified into either “approve” or “disapprove” classes. The classifier provides a function that maps (classifies) a data item (instance) into one of several predefined classes. More specifically, the classifier predicts one attribute of a set of data given one or more attributes. For example, in a database of iris flowers, a classifier can be built to predict the type of iris (iris-setosa, iris-versicolor or iris-virginica) given the petal length, petal width, sepal length and sepal width. The attribute being predicted (in this case, the type of iris) is called the label, and the attributes used for prediction are called the descriptive attributes.
A classifier is generally constructed by an inducer. The inducer is an algorithm that builds the classifier from a training set. The training set consists of records with labels. The training set is used by the inducer to “learn” how to construct the classifier, as shown in FIG. 1. Once the classifier is built, it can be used to classify unlabeled records, as shown in FIG. 2.
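By way of illustration only (not the invention's implementation), the following Python sketch shows the two roles depicted in FIG. 1 and FIG. 2: an inducer builds a classifier from labeled training records, and the resulting classifier maps unlabeled records to predicted labels. The function names and the trivial majority-label "inducer" are hypothetical stand-ins for a real inducer such as a decision-tree inducer.

```python
from collections import Counter

def induce_classifier(training_set, label_attr):
    """Hypothetical inducer (cf. FIG. 1): 'learns' a classifier from labeled records.

    This toy inducer simply memorizes the most frequent label; a real inducer
    (e.g., a decision-tree inducer) would build a far richer structure.
    """
    majority = Counter(rec[label_attr] for rec in training_set).most_common(1)[0][0]

    def classifier(record):
        """Classifier (cf. FIG. 2): maps an unlabeled record to a predicted label."""
        return majority

    return classifier

# Labeled training records (cf. the iris example above).
training = [
    {"petal_length": 1.4, "petal_width": 0.2, "iris_type": "iris-setosa"},
    {"petal_length": 4.7, "petal_width": 1.4, "iris_type": "iris-versicolor"},
    {"petal_length": 1.3, "petal_width": 0.2, "iris_type": "iris-setosa"},
]
classify = induce_classifier(training, "iris_type")
print(classify({"petal_length": 1.5, "petal_width": 0.3}))  # prints a predicted label
```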
Inducers require a training set, which is a database table containing attributes, one of which is designated as the class label. The label attribute must be of a discrete type (e.g., binned values, character string values, or a small number of distinct integer values).
FIG. 3 shows several records from a sample training set.
Once a classifier is built, it can classify new records as belonging to one of the classes. These new records must be in a table that has the same attributes as the training set; however, the table need not contain the label attribute. For example, if a classifier for predicting iris_type is built, we can apply the classifier to records containing only the descriptive attributes, and a new column is added with the predicted iris type.
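By way of illustration only, the following Python sketch shows how a classifier might be applied to a table of records containing only the descriptive attributes, appending a new column with the predicted label. The function name, column name, and the stand-in classifier are hypothetical.

```python
def add_predicted_column(records, classify, new_column="predicted_iris_type"):
    """Apply 'classify' to each unlabeled record and append the prediction.

    The records need only contain the descriptive attributes; the label
    attribute itself is not required.
    """
    for record in records:
        record[new_column] = classify(record)
    return records

# A stand-in classifier for illustration; in practice this would be the
# classifier produced by the inducer.
def classify(record):
    return "iris-setosa" if record["petal_length"] < 2.5 else "iris-versicolor"

unlabeled = [
    {"sepal_length": 5.0, "sepal_width": 3.3, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 5.8, "sepal_width": 2.7, "petal_length": 4.1, "petal_width": 1.0},
]
add_predicted_column(unlabeled, classify)
print(unlabeled[0]["predicted_iris_type"])  # "iris-setosa"
```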
In a marketing campaign, for example, a training set can be generated by running the campaign in one city and generating label values according to the responses in that city. A classifier can then be induced, and campaign mail sent only to the people in a larger population, such as the entire U.S., who are labeled by the classifier as likely to respond. Such targeted mailing can yield substantial cost savings.
A well known type of classifier is the Decision-Tree classifier. The Decision-Tree classifier assigns each record to a class. The Decision-Tree classifier is induced (generated) automatically from data. The data, which is made up of records and a label associated with each record, is called the training set.
Decision-Trees are commonly built by recursive partitioning. A univariate (single-attribute) split is chosen for the root of the tree using some criterion (e.g., mutual information, gain-ratio, gini index). The data is then divided according to the chosen test, and the process repeats recursively for each child. After a full tree is built, a pruning step is executed that reduces the tree size.
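By way of illustration only, a minimal Python sketch of such a recursive-partitioning procedure follows. The gini index is used as the split criterion, splits are simple equality tests, and the pruning step is omitted; the helper names are hypothetical.

```python
from collections import Counter

def gini(records, label):
    """Gini index of the class distribution among 'records' (assumed non-empty)."""
    counts = Counter(r[label] for r in records)
    n = len(records)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(records, attrs, label):
    """Choose the univariate split that most reduces the weighted gini index."""
    best = None
    for attr in attrs:
        for value in {r[attr] for r in records}:
            left = [r for r in records if r[attr] == value]
            right = [r for r in records if r[attr] != value]
            if not left or not right:
                continue
            score = (len(left) * gini(left, label) +
                     len(right) * gini(right, label)) / len(records)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best  # (score, attribute, value) or None

def build_tree(records, attrs, label):
    """Recursive partitioning: choose a split, divide the data, recurse on each child."""
    if gini(records, label) == 0.0:        # pure node: stop with a leaf
        return {"label": records[0][label]}
    split = best_split(records, attrs, label)
    if split is None:                      # no useful split: majority-label leaf
        return {"label": Counter(r[label] for r in records).most_common(1)[0][0]}
    _, attr, value = split
    return {
        "test": (attr, value),
        "left": build_tree([r for r in records if r[attr] == value], attrs, label),
        "right": build_tree([r for r in records if r[attr] != value], attrs, label),
    }
```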
A major problem associated with decision-tree classifiers is that they are difficult to visualize using available visualizers. This is especially true of large decision-trees with thousands of nodes. Current two-dimensional displays are very limited. One product by AT&T, called Dotty, provides a two-dimensional visualization, but interaction is limited to simple scrolling of the canvas.
FIG. 4A shows an example of a small decision-tree generated by Dotty, and FIG. 4B shows an example of a large decision-tree generated by Dotty.
Another conventional technique generates a decision-tree as a simple two-dimensional ASCII display.
FIG. 4C shows a simple ASCII display generated by a conventional visualizer. It is difficult to comprehend from the two-dimensional ASCII display how much data was used to build parts of the decision-tree. Also, it is difficult to analyze the data from the ASCII display. A large decision-tree is a complex structure that typically has thousands of nodes. Such a large decision-tree typically does not fit into a display screen. Although graphical displays used to visualize decision-tree classifiers can scroll, they do not provide a good solution for large trees with thousands of nodes. This is because it is extremely difficult to follow a path in a tree having hundreds of nodes, and it is difficult to see the big picture when a tree has thousands of nodes. Other products that contain two-dimensional visualizations are SAS, SPLUS, Angoss' Knowledge Seeker, and IBM's Intelligent Data Miner.
SUMMARY OF THE INVENTION
As realized by the inventors, what is needed is a three-dimensional visualization of a decision-tree classifier, also referred to as a decision-tree visualizer, that is easy to view and comprehend. Further advantages are provided by interactive navigation and search capabilities incorporated into the visualizer.
The present invention provides a method, system, and computer program product for visualizing a decision-tree classifier. The decision-tree visualizer provides more information about the decision-tree classifier than the mere decision-tree structure. Specifically, the visualizer shows how much data was used to build each part of the decision-tree. This indicates the reliability and confidence of the subtrees in making predictions. Furthermore, the visualizer allows users to see interesting parts of the tree. For example, users can see nodes that represent skewed distributions of the data caused by special effects.
The present invention provides a decision-tree visualizer that displays quantitative and relational characteristics of data by showing them as hierarchically connected nodes in the decision-tree. Each node of the decision-tree contains graphical attributes whose heights and colors correspond to aggregations of data values. Lines connecting nodes show the relationship of one set of data to its subsets.
The decision-tree visualizer provides a fly-through environment for navigating through large trees. More importantly, users can view only a few levels or subtrees and dynamically expand the subtrees (each comprised of one or more nodes). This feature of dynamically expanding subtrees is referred to as the drill-down capability.
Thus, while a tree can be very large, with hundreds of thousands of nodes, the dynamic expansion feature lets users drill down and view more information in the subtrees of interest. For example, a user can view a subtree comprised of one or more nodes and expand the subtree in order to get more information about it. This removes the information overload that occurs in many two-dimensional, non-interactive visualizers.
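By way of illustration only, the following Python sketch shows one way such drill-down expansion might be represented on the visualizer side, with a subtree's child nodes materialized only when the user expands it. The class and method names are hypothetical, not the actual implementation.

```python
class TreeNode:
    """A visualizer-side node whose children are materialized only on demand."""

    def __init__(self, node_id, load_children):
        self.node_id = node_id
        self._load_children = load_children  # callback that fetches child nodes
        self.children = None                 # None = not yet loaded
        self.expanded = False

    def drill_down(self):
        """Expand this subtree: load its children if not already loaded."""
        if self.children is None:
            self.children = self._load_children(self.node_id)
        self.expanded = True
        return self.children

    def collapse(self):
        self.expanded = False

    def visible_nodes(self):
        """Only expanded subtrees contribute nodes to the display."""
        yield self
        if self.expanded and self.children:
            for child in self.children:
                yield from child.visible_nodes()

# Example: a trivial loader that creates two children per node, up to a fixed depth.
def make_loader(depth):
    def load(node_id):
        if depth == 0:
            return []
        return [TreeNode(f"{node_id}.{i}", make_loader(depth - 1)) for i in range(2)]
    return load

root = TreeNode("root", make_loader(2))
root.drill_down()                            # expand only the subtree of interest
print([n.node_id for n in root.visible_nodes()])  # ['root', 'root.0', 'root.1']
```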
Briefly stated, the decision-tree visualizer comprises an inducer which constructs a decision-tree classifier from a training set. The inducer has a mapping module which creates visualization files from the training set. The visualization files are then transformed into graphical attributes which are displayed with the decision-tree classifier's structure.
The graphical attributes are clustered hierarchical blocks that are located at the root node, child nodes, and leaf nodes of the decision-tree classifier. The graphical attributes represent various characteristics of the dataset. For example, the graphical attributes show the relationship of one set of data to its subsets. The graphical attributes also provide information such as distributions, purity of classification, and reliability of the classification.
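By way of illustration only, the following Python sketch shows the kind of per-node aggregates (class distribution, purity of classification, and amount of supporting data) from which such graphical attributes could be derived. The function name and the returned fields are hypothetical.

```python
from collections import Counter

def node_attributes(records, label):
    """Compute per-node aggregates that could drive graphical attributes.

    For the records reaching a node, returns the per-class counts (e.g., one
    bar per class, with bar height proportional to the count), the purity of
    the classification, and the record count (an indication of how much data
    backs the node's predictions).
    """
    counts = Counter(r[label] for r in records)
    total = len(records)
    purity = max(counts.values()) / total if total else 0.0
    return {
        "class_counts": dict(counts),  # e.g., bar heights, one per class
        "purity": purity,              # fraction of records in the majority class
        "support": total,              # how much data reaches this node
    }

# Records reaching a hypothetical node of the tree.
node_records = [
    {"iris_type": "iris-setosa"},
    {"iris_type": "iris-setosa"},
    {"iris_type": "iris-versicolor"},
]
print(node_attributes(node_records, "iris_type"))
# e.g., {'class_counts': {'iris-setosa': 2, 'iris-versicolor': 1}, 'purity': 0.66..., 'support': 3}
```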
In one embodiment of the present invention, the graphical attributes comprise a plurality of bars and bases. At each node of the decision-tree, there are one or more bars clustered together. The bars provide various information about the records at each node. For example, each bar represents a class of records. If, at a particular node, there are three bars, then there are three classes of records at that node. A split may be performed at that node. The split may be
Kohavi Ron
Tesler Joel D.
Nguyen Phu K.
Silicon Graphics Inc.
Sterne Kessler Goldstein & Fox P.L.L.C.