System and method for approximating probabilities using a...

Data processing: artificial intelligence – Machine learning

Reexamination Certificate


Details

C706S014000, C706S046000

Status: active

Patent number: 06718315


TECHNICAL FIELD
The present invention generally relates to machine learning techniques and probabilistic reasoning under uncertainty. More particularly, the present invention relates to learning decision trees from data and using learned decision trees to approximate conditional probabilities.
BACKGROUND OF THE INVENTION
Machine learning techniques provide a mechanism by which accumulated data can be used for prediction and other analytical purposes. For example, web site browsing data can be used to determine which web sites particular types of users are likely to view. As another example, product purchase data can be employed to determine which products a consumer is likely to purchase, based on prior purchase history and other information.
One type of machine learning technique is decision-tree learning. A decision tree is a structure employed to encode a conditional probability distribution of a target attribute given a set of predictor attributes. For example, the set of predictor attributes may correspond to web sites a user has or has not viewed, or products a user has or has not purchased. The target attribute may then correspond to a web site or product that an analyst is examining to determine whether the user is likely to view or purchase, respectively. Once a decision tree has been constructed, it can be navigated by employing a particular target user's data to determine answers to future viewing or purchasing queries concerning the target user.
A decision tree 10 illustrated in Prior Art FIG. 1 was constructed by a decision-tree learning algorithm for the purpose of predicting a person's salary based on various attributes associated with the person. The learning algorithm constructed the decision tree 10 using a set of training data, where each record in the training data corresponded to a person. The set of known attributes in the training data includes Age, Gender, Job and Salary, where Age is a continuous attribute, Job is a categorical attribute with three states {Engineer, Lawyer, Researcher} and Salary is a categorical (binary) attribute with states {High, Low}. Salary is referred to as the target attribute for the tree because the tree is used to predict Salary. The other attributes employed in building the tree 10 are referred to as predictor attributes.
The decision tree 10 in Prior Art FIG. 1 encodes the conditional probability distribution p(Salary|Age, Gender, Job) learned from the training data. In particular, for assignments of the predictor attributes Age, Gender and Job, the decision tree 10 can be traversed from a root node 12 down to a leaf node 18, a leaf node 20 or a leaf node 16. The leaf nodes 18, 20 and 16 each store a probability distribution for Salary.
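As a concrete illustration, the tree 10 can be represented with two kinds of nodes: splits and leaves. The following Python sketch mirrors the structure just described; the distribution at leaf 18 is the one quoted in the worked example below, while the distributions at leaves 20 and 16, and the exact grouping of Job values at node 14, are not given in this excerpt and are placeholder assumptions.

class Leaf:
    def __init__(self, distribution):
        # distribution: dict mapping each Salary state to its probability
        self.distribution = distribution

class Split:
    def __init__(self, attribute, children):
        # attribute: the predictor attribute examined at this node
        # children: list of (test, child) pairs; each test is a predicate
        # on the attribute's value, and exactly one test matches any value
        self.attribute = attribute
        self.children = children

# Leaf 18 stores the distribution used in the worked example below;
# the distributions at leaves 20 and 16 are assumed for illustration.
leaf_18 = Leaf({"High": 0.65, "Low": 0.35})
leaf_20 = Leaf({"High": 0.30, "Low": 0.70})  # assumed values
leaf_16 = Leaf({"High": 0.80, "Low": 0.20})  # assumed values

# Node 14 splits on Job; separating Engineer from the other states is
# one plausible reading of the figure.
node_14 = Split("Job", [
    (lambda job: job == "Engineer", leaf_18),
    (lambda job: job in ("Lawyer", "Researcher"), leaf_20),
])

# Root node 12 splits on Age at thirty, per the worked example below.
node_12 = Split("Age", [
    (lambda age: age < 30, node_14),
    (lambda age: age >= 30, leaf_16),
])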
In general, a decision tree is traversed by starting at the root node and following child links until a leaf node is reached. Each non-leaf node is annotated with the name of a predictor attribute to be examined, and each out-going child link from that node is annotated with a value or a set of values for the predictor attribute. Every value of the predictor attribute corresponds to exactly one out-going child link. When the traversal reaches a non-leaf node, the known value of the corresponding predictor attribute is examined, and the appropriate (unique) child link is followed. Non-leaf nodes are referred to as split nodes (or simply splits) in the decision tree. Each split node is annotated with the name of a predictor attribute X, and the node is thus referred to as a split on X. Splits have at least two children. Prior Art FIG. 1 illustrates splits with two children, which create binary trees. It is to be appreciated by one skilled in the art that, although the application describes binary trees, the more general case of non-binary trees can be employed in accordance with the present invention.
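Against the node classes sketched above, this traversal procedure can be written in a few lines; the version below handles splits with any number of children, not just binary ones.

def traverse(node, predictors):
    # predictors: dict mapping predictor attribute names to known values.
    # Follows child links from 'node' until a leaf is reached and returns
    # the probability distribution stored there.
    while isinstance(node, Split):
        value = predictors[node.attribute]   # examine the known value
        for test, child in node.children:
            if test(value):                  # follow the unique matching link
                node = child
                break
        else:
            raise ValueError(f"no child link matches {node.attribute}={value!r}")
    return node.distribution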
To illustrate how to traverse and extract conditional probabilities from a decision tree, consider again the tree 10 in Prior Art FIG. 1. Assume that an analyst desires to predict the salary of a twenty-eight-year-old female engineer; that is, the analyst desires to use the tree 10 to determine p(Salary|Age=28, Gender=female, Job=Engineer). The traversal starts at the root node 12, which is a split on Age. Consequently, the known value of twenty-eight for Age is examined and compared to the values on the out-going edges of the root node 12. Because twenty-eight is less than thirty, the left child edge is traversed and the traversal moves to a node 14. The node 14 is a split on Job, and because Job=Engineer for the person in question, the traversal moves next to the node 18. The node 18 is a leaf node, so the traversal completes and the conditional probability for Salary can be obtained. In particular,
p(Salary=High|Age=28, Gender=female, Job=Engineer)=0.65
p(Salary=Low|Age=28, Gender=female, Job=Engineer)=0.35
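Running the traversal sketch above on the analyst's query reproduces these figures:

query = {"Age": 28, "Gender": "female", "Job": "Engineer"}
print(traverse(node_12, query))   # {'High': 0.65, 'Low': 0.35}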
Note that the decision tree 10 does not contain any splits on Gender. This means that the learning algorithm identified that Gender was not useful for predicting Salary, at least in the context of knowing Age and Job.
In general, given a decision tree for a probability distribution p(Y|X1, . . . , XN), then for values x1, . . . , xN the values p(Y|X1=x1, . . . , XN=xN) can be extracted by performing the traversal algorithm as described above and using the distributions stored in the leaf nodes. One skilled in the art will appreciate that p(Y|X1, . . . , XN) denotes either a discrete probability distribution or a probability density function, depending on whether Y is a discrete or a continuous attribute, respectively.
There are three problems with using decision trees as described.
A first problem arises because decision trees are constructed using a finite set of data that may not contain very many examples corresponding to a probability later requested from the decision tree. Since the probability distributions at the leaf nodes are estimated from the training data, a conventional decision tree may return a probability that is not a reliable estimate. This is known as the “inadequate training data” problem.
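To make the problem concrete, consider the standard error sqrt(p(1-p)/n) of a leaf probability estimated from n training records; the helper below is not part of the patent, only a statistical illustration of why sparsely populated leaves are untrustworthy.

import math

def leaf_estimate(high_count, low_count):
    # Maximum-likelihood estimate of p(Salary=High) at a leaf, together
    # with the standard error sqrt(p*(1-p)/n) of that estimate.
    n = high_count + low_count
    p = high_count / n
    return p, math.sqrt(p * (1 - p) / n)

print(leaf_estimate(650, 350))  # (0.65, ~0.015): ample data, tight estimate
print(leaf_estimate(2, 1))      # (~0.67, ~0.27): same ratio, unreliable estimate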
Another problem arises when the query does not contain a predictor value that is needed to traverse the decision tree and thus retrieve a stored probability. This problem typically occurs when not all of the predictor values (e.g., the values of the attributes that define the splits in the decision tree) are provided in a query, yet a conditional probability of the target attribute is still sought. The difficulty is that the conditional probability distribution p(W|X,Y) does not by itself provide adequate information about the probability distribution p(W|X). That is, if the values for one or more predictors are not known, a conventional decision tree may not be able to extract the desired probability. This is known as the missing predictor problem.
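With the traversal sketch above, the missing predictor problem surfaces as a hard failure: omitting Age from the query leaves the root split at node 12 with no value to examine.

partial_query = {"Gender": "female", "Job": "Engineer"}   # Age is missing
try:
    traverse(node_12, partial_query)
except KeyError as missing:
    # The root node 12 splits on Age, so a conventional traversal
    # cannot proceed without it.
    print("missing predictor:", missing)   # missing predictor: 'Age'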
A third problem arises because the domain (e.g., the set of possible values) for a predictor attribute may not be known when the decision tree is constructed, and these domains may have to be estimated from data. For example, if a decision tree is constructed for p(W|X,Y) using a set of training data, and in that data the attribute X appears in only two distinct states, the learning algorithm is likely to assume that X is a binary attribute. This is problematic if X in fact has more than two states and the tree is later used to extract p(W|X,Y) for a third value of X. This is known as the “new value” problem.
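The new value problem shows up the same way in the sketch: a Job state outside the learned domain {Engineer, Lawyer, Researcher} matches none of the child links at node 14.

novel_query = {"Age": 28, "Job": "Doctor"}   # Doctor never appeared in training
try:
    traverse(node_12, novel_query)
except ValueError as err:
    print(err)   # no child link matches Job='Doctor'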
These three problems can be illustrated in Prior Art FIG. 1. To illustrate the inadequate training data problem, assume that the training data contained no data wherein a lawyer was under thirty. In this case, assume that the split on Job in node 14 was chosen by the learning algorithm because it separates the engineers from the researchers, and this disti
