System for constructing decision tree classifiers using...

Data processing: artificial intelligence – Neural network – Learning task

Reexamination Certificate


Details

C706S048000, C706S052000

Reexamination Certificate

active

06269353

ABSTRACT:

BACKGROUND AND SUMMARY OF THE INVENTION
The present invention relates generally to computer-implemented artificial intelligence systems. More particularly, the present invention relates to computer-implemented neural networks with classification capability.
Classification using decision trees is a widely used nonparametric method in pattern recognition for complex classification tasks. The decision tree methodology is also popular in machine learning as a means of automated knowledge acquisition for expert or knowledge-based systems. As shown in FIG. 1, a decision tree classifier 50 uses a series of tests or decision functions 54, 56, and 60 to determine the identity of an unknown pattern or object. The evaluation of decision functions 54, 56, and 60 is organized in such a way that the outcome of successive decision functions reduces uncertainty about the unknown pattern being considered for classification. Left branches (e.g., left branch 61) correspond to positive outcomes of the tests at the internal tree nodes. Right branches (e.g., right branch 63) correspond to negative outcomes of the tests at the internal tree nodes.
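The tree structure just described maps naturally onto a small data structure. Below is a minimal illustrative sketch, not taken from the patent (the Node and classify names are ours): internal nodes hold a decision function, leaves hold a class label, a positive test outcome follows the left branch, and a negative outcome follows the right branch.

```python
class Node:
    """One node of a decision tree classifier (illustrative sketch)."""
    def __init__(self, test=None, left=None, right=None, label=None):
        self.test = test      # decision function: pattern -> bool (None at a leaf)
        self.left = left      # subtree taken on a positive test outcome
        self.right = right    # subtree taken on a negative test outcome
        self.label = label    # class label stored at a leaf

def classify(node, x):
    """Route an unknown pattern x down the tree until a leaf is reached."""
    while node.test is not None:
        node = node.left if node.test(x) else node.right
    return node.label

# Example: a two-level tree applying two decision functions to a pattern vector x.
tree = Node(test=lambda x: x[0] > 0.5,
            left=Node(label="class_1"),
            right=Node(test=lambda x: x[1] > 0.3,
                       left=Node(label="class_2"),
                       right=Node(label="class_3")))
print(classify(tree, [0.2, 0.7]))  # -> class_2
```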
In addition to their capability to generate complex decision boundaries, it is the intuitive nature of decision tree classifiers, as evident from FIG. 1, that is responsible for their popularity and numerous applications. Applications of the decision tree methodology include character recognition, power system monitoring, estimating software-development effort, and top-quark detection in high-energy physics, among others.
While a decision tree classifier is occasionally determined heuristically, the common approach is to use a learning procedure to automatically configure a decision tree from a set of labeled pattern vectors, i.e., training examples or vectors. Several automatic decision tree induction algorithms exist for this purpose in the pattern recognition and machine learning literature (for example, see L. Breiman, J. Friedman, R. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth Int'l Group, Belmont, Calif., 1984; and S. B. Gelfand, C. S. Ravishankar, and E. J. Delp, "An iterative growing and pruning algorithm for classification tree design," IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, pp. 163-174, 1991).
However, most of these decision tree induction algorithms follow the top-down, divide-and-conquer strategy wherein the collection of labeled examples is recursively split to create example subsets of increasing homogeneity in terms of classification labels until predetermined terminating conditions are satisfied.
The top-down decision tree induction methodology basically consists of the following components: 1) a splitting criterion to determine the effectiveness of a given split on training examples, 2) a method to generate candidate splits, 3) a stopping rule, and 4) a method to set up a decision rule at each terminal node. The last component is handled by following the majority rule. Different decision tree induction methods essentially differ in terms of the remaining three components. In fact, the differences are generally found only in the splitting criterion and the stopping rule.
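To make these four components concrete, the following is a compact, runnable sketch of the generic top-down induction loop described above; it is not the patent's own procedure, and the helper names and the entropy-based splitting criterion are illustrative choices.

```python
from collections import Counter
import math

def entropy(labels):
    # Impurity of a collection of labeled examples, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def candidate_splits(X):
    # Component 2: candidate single-feature threshold splits "is x_i >= t?".
    for i in range(len(X[0])):
        for t in sorted({x[i] for x in X}):
            yield i, t

def split_gain(X, y, i, t):
    # Component 1: splitting criterion (reduction in entropy due to the split).
    left = [label for x, label in zip(X, y) if x[i] >= t]
    right = [label for x, label in zip(X, y) if x[i] < t]
    if not left or not right:
        return -1.0
    n = len(y)
    return entropy(y) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

def grow(X, y, depth=0, max_depth=5):
    # Component 3: stopping rule (pure node or depth limit);
    # Component 4: decision rule at a terminal node (majority label).
    if len(set(y)) == 1 or depth == max_depth:
        return ("leaf", Counter(y).most_common(1)[0][0])
    i, t = max(candidate_splits(X), key=lambda s: split_gain(X, y, *s))
    if split_gain(X, y, i, t) <= 0:
        return ("leaf", Counter(y).most_common(1)[0][0])
    keep = [x[i] >= t for x in X]
    left = grow([x for x, k in zip(X, keep) if k],
                [lab for lab, k in zip(y, keep) if k], depth + 1, max_depth)
    right = grow([x for x, k in zip(X, keep) if not k],
                 [lab for lab, k in zip(y, keep) if not k], depth + 1, max_depth)
    return ("node", i, t, left, right)

# Example: four one-dimensional training vectors from two classes.
X = [[0.1], [0.4], [0.6], [0.9]]
y = ["a", "a", "b", "b"]
print(grow(X, y))  # -> ('node', 0, 0.6, ('leaf', 'b'), ('leaf', 'a'))
```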
Three decision tree induction methodologies in the pattern recognition and machine learning literature are:
(1) AMIG (see I. K. Sethi and G. P. R. Sarvarayudu, "Hierarchical classifier design using mutual information," IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI-4, pp. 441-445, 1982);
(2) CART (see L. Breiman, J. Friedman, R. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth Int'l Group, Belmont, Calif., 1984); and
(3) ID3 (see J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81-106, 1986).
AMIG and ID3 both use an information-theory-based measure, the average mutual information gain, to select the desired partitioning or split of the training examples. Given training examples from c classes, and a partitioning P that divides them into r mutually exclusive partitions, the average mutual information gain of the partitioning, I(P), is given as
I(P) = \sum_{i=1}^{r} \sum_{j=1}^{c} p(r_i, c_j) \log_2 \frac{p(c_j | r_i)}{p(c_j)}

where p(r_i, c_j) and p(c_j | r_i), respectively, are the joint and conditional probabilities and p(c_j) is the class probability. Using the maximum likelihood estimates for the probabilities, the above measure can be written as
I(P) = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{n_{ij}}{N} \log_2 \frac{n_{ij} N}{N_i n_j}

where n_j is the number of training examples from class c_j, and n_{ij} is the number of examples of class c_j that lie in partition r_i. The quantity N is the total number of training examples, of which N_i lie in partition r_i. The split of training examples providing the highest value of I(P) is selected. The CART procedure uses the Gini index of diversity to measure the impurity of a collection of examples. It is given as
G = 1 - \sum_{j=1}^{c} p^2(c_j)
The split providing maximum reduction in the impurity measure is then selected. The advantage of this criterion is its simpler arithmetic.
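For illustration, both measures above can be computed directly from a contingency table n[i][j] giving the number of class-c_j examples that fall in partition r_i; the function names below are ours, and zero counts are simply skipped.

```python
import math

def average_mutual_information_gain(n):
    # I(P) estimated from counts n[i][j] (partitions r_i by classes c_j).
    N = sum(sum(row) for row in n)                # total number of training examples
    N_i = [sum(row) for row in n]                 # examples falling in partition r_i
    n_j = [sum(col) for col in zip(*n)]           # examples belonging to class c_j
    return sum((n_ij / N) * math.log2(n_ij * N / (N_i[i] * n_j[j]))
               for i, row in enumerate(n)
               for j, n_ij in enumerate(row) if n_ij > 0)

def gini_index(class_counts):
    # Gini index of diversity G = 1 - sum_j p^2(c_j) for one collection of examples.
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# A perfectly separating split of 10 examples (5 per class) into two partitions:
print(average_mutual_information_gain([[5, 0], [0, 5]]))  # -> 1.0 bit
print(gini_index([5, 5]))                                  # -> 0.5 (impurity before splitting)
```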
Determining when to stop the top-down splitting of successive example subsets is the other important part of a decision tree induction procedure. For stopping, the AMIG procedure relies on the following inequality, which specifies the lower limit on the mutual information, I(tree), to be provided by the induced tree:
I(tree) \ge -\sum_{j=1}^{c} p(c_j) \log_2 p(c_j) + p_{error} \log_2 p_{error} + (1 - p_{error}) \log_2 (1 - p_{error}) - p_{error} \log_2 (c - 1)
where p_{error} is the acceptable error rate. The tree growing stops as soon as the accumulated mutual information due to successive splits exceeds I(tree). CART and ID3 instead follow a more complex but better approach of growing and pruning to determine the final induced decision tree. In this approach, the recursive splitting of training examples continues until 100% classification accuracy on them is achieved. At that point, the tree is selectively pruned upwards to find the best subtree according to some specified cost measure.
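The right-hand side of the AMIG inequality is simple to evaluate from the class probabilities and the acceptable error rate; the sketch below, with an illustrative function name, computes that threshold.

```python
import math

def amig_information_threshold(class_probs, p_error):
    # Lower limit on I(tree): -sum_j p(c_j) log2 p(c_j)
    #   + p_error log2 p_error + (1 - p_error) log2 (1 - p_error)
    #   - p_error log2 (c - 1)
    c = len(class_probs)
    class_entropy = -sum(p * math.log2(p) for p in class_probs if p > 0)
    error_term = 0.0
    if 0 < p_error < 1:
        error_term = (p_error * math.log2(p_error)
                      + (1 - p_error) * math.log2(1 - p_error))
    correction = p_error * math.log2(c - 1) if c > 1 else 0.0
    return class_entropy + error_term - correction

# Two equiprobable classes and a 5% acceptable error rate:
print(amig_information_threshold([0.5, 0.5], 0.05))  # -> about 0.714 bits
```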
The generation of candidate splits at any stage of the decision tree induction procedure is done by searching for splits due to a single feature. For example, in AMIG, CART, and ID3, each top-down data split takes either the form "is x_i ≧ t?" when the attributes are ordered variables or the form "is x_i true?" when the attributes are binary in nature. The reason for using single feature splits is to reduce the size of the space of legal splits. For example, with n binary features, a single feature split procedure has to evaluate only n different splits to determine the best split. On the other hand, a multifeature split procedure must search through a very large number of Boolean combinations, 2^{2^n} logical functions if searching for all possible Boolean functions, to find the best split.
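The gap between the two search spaces grows extremely quickly with the number of binary features, as a quick check shows:

```python
# With n binary features: n candidate single-feature splits versus
# 2**(2**n) possible Boolean functions for an exhaustive multifeature search.
for n in (2, 3, 4, 5):
    print(f"n={n}: single-feature splits = {n}, Boolean functions = {2 ** (2 ** n)}")
```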
Due to single feature splits, the decision tree induction procedures in practice often create large unwieldy trees that translate into production rules or concept descriptions that are not concise and do not generalize well. Another deficiency of single feature splits is their relative susceptibility to noise in comparison with multifeature splits.
In addition to using only a single feature split scheme to determine successive splits of the training examples, the top-down induction procedures have no look-ahead component in their splitting strategy, i.e., the evaluation of the goodness of a split, single or multifeature, at any stage of partitioning does not take into account its effect on future partitions. This is a major drawback that in many instances leads to large decision trees which yield lower classification accuracy and do not clearly bring out the relationships present in the training examples. The set of labeled examples of Table 1 illustrates this point.
