Finding collective baskets and inference rules for internet...

Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C706S048000, C707S793000

Reexamination Certificate

active

06263327

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to online searching for data dependencies in large databases and more particularly to an online method of data mining.
2. Discussion of the Prior Art
Data mining, also known as knowledge discovery in databases, has been recognized as an important new area for database research with broad applications. With the recent popularity of the internet the internet rule mining problem is significant because of its ability to gain access to large databases over the Internet. The ability to gain access to such large databases without significant access delay is a primary goal of an on-line data miner.
In general, data mining is a process of nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. The discovered knowledge can be applied to information management, query processing, decision making, process control, and many other applications. Furthermore, several emerging applications in information providing services, such as on-line services and the World Wide Web, also call for various data mining techniques to better understand user behavior, to meliorate the service provided, and to increase the business opportunities. Since it is difficult to predict what exactly could be discovered from a database, a high-level data mining query should be treated as a probe which may disclose some interesting traces for further exploration. Interactive discovery should be encouraged, which allows a user to interactively refine a data mining request for multiple purposes including the following; dynamically changing data focusing, flexibly viewing the data and data mining results at multiple abstraction levels and from different angles.
A data mining system can be classified according to the kinds of databases on which the data mining is performed. In general, a data miner can be classified according to its mining of knowledge from the following different kinds of databases: relational databases, transaction databases, object-oriented databases, deductive databases, spatial databases temporal databases, multimedia databases, heterogeneous databases, active databases, legacy databases, and the Internet information-base. In addition to the variety of databases available, several typical kinds of knowledge can be discovered by data miners, including association rules, characteristic rules, classification rules, discriminant rules, clustering, evolution, and deviation analysis. Moreover data miners can also be categorized according to the underlying data mining techniques. For example, it can be categorized according to the driven method into autonomous knowledge miner, data-driven miner, query-driven miner, and interactive data miner. It can also be categorized according to its underlying data mining approach into generalization based mining, pattern based mining, mining based on statistics or mathematical theories, and integrated approaches, etc.
Given a database of sales transactions, it is desirable to discover the important associations among items such that the presence of some items in a transaction will imply the presence of other items in the same transaction. A mathematical model was proposed in Agrawal R., Imielinski T., and Swami A. Mining association rules between sets of items in very large databases, Proceedings of the ACM SIGMOD Conference on Management of data, pages 207-216, Washington D.C., May 1993, to address the problem of mining association rules.
Let U={i
1
, i
2
, . . . , im} be a set of literals called items. Let D be a set of transactions; where each individual transaction T consists of a set of items, such that T is a subset of U. Note that the actual quantities of items bought in a transaction are not considered, meaning that each item is a binary (0 or 1) variable representing if an item was bought. Let U be a set of items. A transaction T is said to contain the set of items U if and only if U is a subset of T.
An association rule is an implication or query of the form X==>Y, where both X and Y are sets of items. The idea of an association rule is to develop a systematic method by which a user can figure out how to infer the presence of some sets of items, such as Y, given the presence of other items in a transaction, such as X. Such information is useful in making decisions such as customer targeting, shelving, and sales promotions.
The Rule X==>Y holds in the transaction set D with confidence c if c% of transactions in D that contain X also contain Y. For example, a rule has 90% confidence when 90% of the tuples containing X also contain Y. The rule has support s if s% of transactions in D contain (X union Y).
It is often desirable to pay attention to only those rules which may have reasonably large support. Such rules with high confidence and high support are referred to as association rules. These concepts were first introduced into the prior art, see Agrawal et al, infra. The task of mining association rules is essentially to discover strong association rules in large databases. The notions of confidence and support become very useful in formalizing the problem in a computational efficient approach called the large itemset method. The large itemset approach can be decomposed into the following two steps:
1) Discover the large item sets, i.e., the sets of item sets that have transaction support above a predetermined user defined minimum support, called minsupport.
2) Use the large item sets to generate the association rules for the database that have confidence above a predetermined user defined minimum confidence called minconfidence.
Given an itemset S={I1, I2, . . . , Ik}, we can use it to generate at most k rules of the type [S−{Ir}]==>Ir for each r in {1, . . . , k}. Once these rules have been generated, only those rules above a certain user defined threshold called minconfidence are retained.
The overall computational complexity of mining association rules is determined by the first step. After the large itemsets are identified, the corresponding association rules can be derived in a straightforward manner. Efficient counting of large itemsets was the focus of most prior work. Nevertheless, there are certain inherent difficulties with the use of these parameters in order to establish the strength of an association rule.
After the fundamental paper on the itemset methods see Agrawal et al. infra, a considerable amount of additional work was done based upon this approach; For example, faster algorithms for mining association rules were proposed in Aarawal R., and Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th International Conference on Very Large Data Bases, pages 478-499, Sep. 1994.
A secondary measure called the interest measure was introduced in Agrawal et. al. in Srikant R., and Aarawal R. Mining quantitative association rules in large relational tables. Proceedings of the 1996 ACM SIGMOD Conference on Management of Data. Montreal, Canada, June 1996. A rule is defined to be R-interesting, if its actual support and confidence is R-times that of the expected support and confidence. It is important to note here that the algorithms previously proposed for using the interest measure are such that the support level remained the most critical aspect in the discoverability of a rule, irrespective of whether or not an interest measure was used.
One of the difficulties of the itemset method is its inability to deal with dense data sets. Conversely, the success of the itemset approach relies on the sparsity of the data set. For example, if the probability of buying soup were around 2%, such occurrence would be considered to be statistically sparse and therefore amenable to an itemset approach. This is because for a k-dimensional database, a database with k purchasable items, there are 2{circumflex over ( )}k possibilities for itemsets. The sparsity of t

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Finding collective baskets and inference rules for internet... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Finding collective baskets and inference rules for internet..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Finding collective baskets and inference rules for internet... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2563028

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.