Method system and computer program product for visualizing...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


Details

C707S793000


active

06460049

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to machine learning, data mining, and data visualization.
2. Related Art
Many data mining tasks require classification of data into classes. A classifier provides a function that maps (classifies) a data item (instance) into one of several predefined classes (labels). More specifically, the classifier predicts one attribute of a set of data given one or more other attributes. For example, in a database of iris flowers, a classifier can be built to predict the type of iris (iris-setosa, iris-versicolor, or iris-virginica) given the petal length, petal width, sepal length, and sepal width. The attribute being predicted (in this case, the type of iris) is called the label, and the attributes used for prediction are called the descriptive attributes.
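In code, such a classifier is simply a function from descriptive attribute values to a predicted label. The sketch below is illustrative only: the threshold values are hypothetical, not learned from the actual iris data, and a real classifier would be produced by an inducer rather than hand-written.

```python
# A classifier maps a record's descriptive attributes to one of several
# predefined labels. The thresholds here are hypothetical placeholders.
def classify_iris(petal_length, petal_width, sepal_length, sepal_width):
    if petal_length < 2.5:
        return "iris-setosa"
    elif petal_width < 1.8:
        return "iris-versicolor"
    else:
        return "iris-virginica"
```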
A classifier is generally constructed by an inducer. The inducer is an algorithm that builds the classifier from a training set. The training set consists of records with labels. The training set is used by the inducer to “learn” how to construct the classifier, as shown in FIG. 1. Once the classifier is built, it can be used to classify unlabeled records, as shown in FIG. 2.
Inducers require a training set, which is a database table containing attributes, one of which is designated as the class label. The label attribute type must be discrete (e.g., binned values, character string values, or a small number of integers).
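A training set of this form can be sketched as a list of records, each carrying its descriptive attribute values plus a discrete class label. The attribute names and values below are illustrative, chosen to resemble the iris example rather than taken from the actual database.

```python
# A training set: labeled records with descriptive attributes and a
# discrete class label (iris_type). Values are illustrative only.
training_set = [
    {"petal_length": 1.4, "petal_width": 0.2,
     "sepal_length": 5.1, "sepal_width": 3.5, "iris_type": "iris-setosa"},
    {"petal_length": 4.7, "petal_width": 1.4,
     "sepal_length": 7.0, "sepal_width": 3.2, "iris_type": "iris-versicolor"},
    {"petal_length": 6.0, "petal_width": 2.5,
     "sepal_length": 6.3, "sepal_width": 3.3, "iris_type": "iris-virginica"},
]

# The label attribute must be discrete; continuous descriptive
# attributes are typically binned before induction.
labels = {record["iris_type"] for record in training_set}
```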
FIG. 3 shows several records from a sample training set pertaining to an iris database. The iris database was originally used in Fisher, R. A., “The use of multiple measurements in taxonomic problems,” Annals of Eugenics 7(1):179-188 (1936). It is a classical problem in many statistical texts.
Once a classifier is built, it can classify new unlabeled records as belonging to one of the classes. These new records must be in a table that has the same attributes as the training set; however, the table need not contain the label attribute. For example, if a classifier for predicting iris_type is built, the classifier can be applied to records containing only the descriptive attributes, and a new column is added with the predicted iris type. See, e.g., the general and easy-to-read introduction to machine learning, Weiss, S. M., and Kulikowski, C. A., Computer Systems that Learn, San Mateo, Calif.: Morgan Kaufmann Publishers, Inc. (1991), and the edited volume of machine learning techniques, Dietterich, T. G. and Shavlik, J. W. (eds.), Readings in Machine Learning, Morgan Kaufmann Publishers, Inc. (1990) (both of which are incorporated herein by reference).
A well-known type of classifier is an Evidence classifier, also called a Bayes classifier or a Naive-Bayes classifier. The Evidence classifier uses Bayes rule, or equivalents thereof, to compute the probability of each class given an instance. In determining a label, the Evidence classifier assumes the attributes are conditionally independent given the class. This can be the complete conditional independence assumption of a Naive-Bayes or Simple Bayes classifier. Alternatively, the complete conditional independence assumption can be relaxed to optimize classifier accuracy or to further other design criteria.
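Under complete conditional independence, Bayes rule lets the classifier score each label as the prior P(label) times the product of the per-attribute conditionals P(attribute value | label). The sketch below is a minimal, generic Naive-Bayes inducer over discrete attributes, not the patented implementation; the Laplace smoothing is an illustrative choice to keep unseen values from zeroing the product.

```python
from collections import Counter, defaultdict

def induce(records):
    """Build a Naive-Bayes classifier from (attr1, ..., attrN, label) tuples.

    Scores each label as P(label) * prod_i P(attr_i | label), with
    Laplace smoothing on the conditional estimates.
    """
    priors = Counter(label for *_, label in records)
    conds = defaultdict(Counter)  # (attr_index, label) -> value counts
    for *attrs, label in records:
        for i, value in enumerate(attrs):
            conds[(i, label)][value] += 1
    total = sum(priors.values())

    def classify(attrs):
        best_label, best_score = None, -1.0
        for label, count in priors.items():
            score = count / total  # prior P(label)
            for i, value in enumerate(attrs):
                seen = conds[(i, label)]
                # Smoothed estimate of P(value | label).
                score *= (seen[value] + 1) / (sum(seen.values()) + len(seen) + 1)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return classify
```

The returned `classify` function realizes the pattern of FIG. 2: it maps a tuple of descriptive attribute values to the most probable label.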
For more information on classifiers, see the following documents, each of which is incorporated by reference in its entirety herein: Kononenko, I., Applied Artificial Intelligence 7:317-337 (1993) (an introduction to the evidence classifier (Naive-Bayes)); Schaffer, C., “A Conservation Law for Generalization Performance,” in Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann Publishers, Inc., pp. 259-265 (1994) (a paper explaining that no classifier can be “best”); Taylor, C., et al., Machine Learning, Neural and Statistical Classification, Paramount Publishing International (1994) (a comparison of algorithms and descriptions); Langley et al., “An Analysis of Bayesian Classifiers,” Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 223-228 (1992) (a paper describing an evidence classifier (Naive-Bayes)); Good, I. J., The Estimation of Probabilities: An Essay on Modern Bayesian Methods, MIT Press (1965) (describing an evidence classifier); Duda, R. and Hart, P., Pattern Classification and Scene Analysis, Wiley (1973) (describing the evidence classifier); and Domingos, P. and Pazzani, M., “Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier,” Machine Learning: Proceedings of the 13th International Conference (ICML '96), pp. 105-112 (1996) (showing that, while the conditional independence assumption can be violated, the classification accuracy of the evidence classifier (called Simple Bayes in that paper) can be good).
Data mining applications and end-users now need to know how an evidence classifier maps each record to a label. Understanding how an evidence classifier works can lead to an even greater understanding of the data. Current classifier visualizers are directed to other types of classifiers, such as decision-tree classifiers. See, e.g., the AT&T product called Dotty, which displays a decision-tree classifier in a 2-D ASCII text display. For an introduction to decision-tree induction, see Quinlan, J. R., C4.5: Programs for Machine Learning, Los Altos, Calif.: Morgan Kaufmann Publishers, Inc. (1993); and the book on decision trees from a statistical perspective, Breiman et al., Classification and Regression Trees, Wadsworth International Group (1984).
What is needed is an evidence classifier visualizer.
SUMMARY OF THE INVENTION
An evidence classifier visualization tool is needed to display information representative of the structure of an evidence classifier including information pertaining to how an evidence classifier predicts a label for each unlabeled record.
The present invention provides a computer-implemented method, system, and computer program product for visualizing the structure of an evidence classifier. An evidence classifier visualization tool is provided that displays information representative of the structure of an evidence classifier. The evidence classifier visualization tool displays information pertaining to how an evidence classifier assigns labels to unlabeled records.
An evidence inducer generates an evidence classifier based on a training set of labeled records. Each record in the training set has one or more attribute values and a corresponding class label. Once the evidence classifier is built, the evidence classifier can assign class labels to unlabeled records based on attribute values found in the unlabeled records.
According to the present invention, the evidence inducer includes a mapping module that generates visualization data files used for visualizing the structure of the evidence classifier generated by the evidence inducer. In the present invention, an evidence visualization tool uses the visualization data files to display an evidence pane and/or a label probability pane. The evidence pane includes two different representations: a first evidence pane display view and a second evidence pane display view. The first evidence pane display view shows a normalized conditional probability of each label value for each attribute value. The second evidence pane display view shows relative conditional probabilities of a selected label value for each attribute value.
The label probability pane includes a first label probability pane display view and/or a second label probability pane display view. The first label probability pane display view shows prior probabilities of each label value based on the training set. The second label probability pane display view shows posterior probabilities of each label value based on at least one selected attribute value.
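The distinction between the two label probability pane views can be illustrated numerically. As a hedged sketch (the label names and conditional probabilities below are hypothetical, not drawn from the patent), priors are just label frequencies in the training set, and posteriors are those priors re-weighted by the evidence for a selected attribute value and renormalized:

```python
from collections import Counter

# Prior probabilities: label frequencies in an illustrative training set.
training_labels = ["setosa", "setosa", "versicolor", "virginica"]
counts = Counter(training_labels)
total = sum(counts.values())
priors = {label: n / total for label, n in counts.items()}

# Hypothetical conditionals P(petal_length = "short" | label) for one
# selected attribute value.
evidence = {"setosa": 0.9, "versicolor": 0.1, "virginica": 0.05}

# Posterior probabilities given the selected attribute value:
# prior * evidence, renormalized to sum to 1 (Bayes rule).
unnormalized = {label: priors[label] * evidence[label] for label in priors}
z = sum(unnormalized.values())
posteriors = {label: p / z for label, p in unnormalized.items()}
```

Selecting the attribute value shifts probability mass toward "setosa", which is exactly the kind of change the second label probability pane display view makes visible.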
According to one embodiment, the first evidence pane display view comprises a plurality of rows of charts. Each row corresponds to a respective attribute. Each row has a num
