Mining of generalized disjunctive association rules

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06754651

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to mining of generalized disjunctive association rules. It relates generally to data processing, and more particularly to “computer database mining” in which association rules are discovered. In particular, this invention introduces the concepts of a disjunctive association rule and a generalized disjunctive association rule, and provides an efficient way to compute them.
BACKGROUND OF THE INVENTION
Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions, where each transaction $t$ is a subset of the set of items $I$. We say that a transaction $t$ contains $X$ ($X \subseteq I$) if $X \subseteq t$. We use $T(X)$ to denote the set of all transactions that contain $X$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ if $c\%$ of transactions in $D$ that contain $X$ also contain $Y$. The rule $X \Rightarrow Y$ has support $s$ in the transaction set if $s\%$ of transactions in $D$ contain $X \cup Y$. Given a set of transactions $D$, the problem of mining association rules is to generate all association rules that have support and confidence greater than the user-specified minimum support (minsupp) and minimum confidence (minconf) respectively [1,2,3]. In what follows, we use ‘item’ and ‘attribute’ interchangeably.
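The definitions of support and confidence above can be illustrated with a minimal Python sketch. The function names and the four example transactions are assumptions made for illustration only; they do not come from the cited work.

```python
def containing(transactions, itemset):
    """T(X): the transactions that contain every item of the itemset X."""
    return [t for t in transactions if itemset <= t]


def support(transactions, x, y):
    """Percentage of transactions in D that contain X union Y."""
    return 100.0 * len(containing(transactions, x | y)) / len(transactions)


def confidence(transactions, x, y):
    """Percentage of the transactions containing X that also contain Y."""
    with_x = containing(transactions, x)
    if not with_x:
        return 0.0
    return 100.0 * len(containing(with_x, y)) / len(with_x)


# Hypothetical transaction set D (illustrative data only).
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
X, Y = frozenset({"bread"}), frozenset({"milk"})

print(support(D, X, Y))     # 2 of 4 transactions contain both items
print(confidence(D, X, Y))  # 2 of the 3 bread transactions contain milk
```

With minsupp = 40 and minconf = 60, the rule bread ⇒ milk would be reported for this data, since its support is 50% and its confidence is about 66.7%.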
Mining algorithms have received considerable research attention. In one approach [2] the authors take into account the taxonomy (is-a hierarchy) on the items, and find associations between items at any level of the taxonomy. For example, given a taxonomy that says that jackets is-a outerwear is-a clothes, we may infer a rule that “people who buy outerwear tend to buy shoes”. This rule may hold even if rules that “people who buy jackets tend to buy shoes”, and “people who buy clothes tend to buy shoes” do not hold. Users are often interested only in a subset of association rules. For example, they may only want rules that contain a specific item or rules that contain children of a specific item in a hierarchy. In [3], the authors consider the problem of integrating constraints that are boolean expressions over the presence or absence of items into the association discovery algorithm.
Instead of applying these constraints as a post-processing step, the authors integrate the constraints into the algorithm, which reduces the execution time.
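The taxonomy example above (jackets is-a outerwear is-a clothes) can be made concrete with a small sketch. It assumes one simple way of handling a taxonomy, namely adding every ancestor of an item to the transaction before counting; the taxonomy table, the transactions and the thresholds are hypothetical, and the approach in [2] may differ in its details.

```python
# Hypothetical is-a hierarchy: child -> parent.
PARENT = {
    "jackets": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirts": "clothes",
}


def with_ancestors(transaction):
    """Return the transaction extended with all ancestors of its items."""
    extended = set(transaction)
    for item in transaction:
        node = item
        while node in PARENT:
            node = PARENT[node]
            extended.add(node)
    return frozenset(extended)


def rule_stats(transactions, x, y):
    """(support %, confidence %) of the single-item rule x => y."""
    with_x = [t for t in transactions if x in t]
    both = [t for t in with_x if y in t]
    supp = 100.0 * len(both) / len(transactions)
    conf = 100.0 * len(both) / len(with_x) if with_x else 0.0
    return supp, conf


# Illustrative purchase data.
D = [
    {"jackets", "shoes"},
    {"ski pants", "shoes"},
    {"jackets"},
    {"shirts"},
    {"shirts"},
    {"ski pants", "shoes"},
]
D_ext = [with_ancestors(t) for t in D]

for antecedent in ("jackets", "outerwear", "clothes"):
    print(antecedent, rule_stats(D_ext, antecedent, "shoes"))
```

On this data, with minsupp = 30 and minconf = 60, only "outerwear ⇒ shoes" passes both thresholds: the jackets rule has too little support and confidence, and the clothes rule has too little confidence.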
So far, knowledge discovery in data mining has focussed on association rules with conjuncts ($A \wedge B \rightarrow X \wedge Y$) only. Specifically, traditional association rules cannot capture contextual inter-relationships among attributes.
U.S. Pat. Nos. 5,794,209 and 5,615,341 describe a system and method for discovering association rules by comparing the ratio of the number of times each itemset appears in a dataset to the number of times particular subsets of the itemset appear in the database, in relation to a predetermined minimum confidence value. The specified system and method, however, are limited in the use of operators for defining the association rules. Logical completeness of association rule discovery requires a functionally complete set of operators ([$\wedge$ (and), $\vee$ (or), $\neg$ (not)], [$\oplus$ (xor), ]).
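One way to picture what these operators mean for rule discovery is to interpret each of them as a set operation on the transaction sets T(·) defined in the background. The sketch below is an illustration under that assumption; the helper name `tids` and the data are hypothetical.

```python
def tids(transactions, item):
    """T(item), represented as the set of indices of matching transactions."""
    return {i for i, t in enumerate(transactions) if item in t}


# Illustrative transaction set.
D = [{"A", "B"}, {"A"}, {"B"}, {"C"}]
all_ids = set(range(len(D)))

a, b = tids(D, "A"), tids(D, "B")

t_and = a & b          # transactions satisfying A and B
t_or = a | b           # transactions satisfying A or B
t_not_a = all_ids - a  # transactions satisfying not A
t_xor = a ^ b          # transactions satisfying exactly one of A, B

# xor can be expressed with and, or and not: (A or B) and not (A and B).
assert t_xor == t_or & (all_ids - t_and)

print(t_and, t_or, t_not_a, t_xor)
```

Under this reading, the support of a compound expression is the size of the resulting index set divided by the number of transactions.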
Furthermore, the method does not utilize contextual information to define the association rules and is therefore limited in the effectiveness of the result. U.S. Pat. No. 5,615,341 is further limited by the use of hierarchical taxonomies in the determination of the association rules.
THE OBJECT AND SUMMARY OF THE INVENTION
The object of this invention is to provide a system and method for mining a new kind of rule, called a disjunctive association rule, for analyzing data and discovering new kinds of relationships between data items.
Another object of the present invention is to incorporate the
,
, as well as the ⊕ operators in the discovery of the disjunctive association rules.
To achieve the said objective, this invention provides a method for mining data characterized in that it generates generalized disjunctive association rules to capture the relationships between data items with reference to a given context to provide improved data analysis independently of taxonomies, comprising the steps of:
generating a list of all possible data items that can influence said context,
discovering association rules for data items in said list that co-occur based on a defined overlap threshold within said context,
clustering said data items to form a set of generalized disjunctive rules based on a defined confidence (and/or support) threshold, and
iterating the above steps until all items in said list are covered.
The said list is generated by selecting those data items that have a significant overlap with said context.
The said association rules are discovered by merging data items that overlap above said defined threshold within said context and confirming that the strength of the relation is beyond a defined minimum support value.
The said clustering is agglomerative.
The discovery of said association rules uses a functionally complete set of operators including “AND”, “OR”, “NOT” and “EXCLUSIVE-OR”.
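The steps above can be sketched in Python. This is only one possible reading of the claimed steps, not the patented procedure. It makes several assumptions: the context is a single item; "overlap" with the context is the fraction of context transactions that contain an item; items are merged into conjunctions by Jaccard similarity of their in-context transaction sets; and the confidence of a disjunctive rule is the fraction of context transactions matched by at least one disjunct. The minimum-support check is omitted, only "AND" and "OR" are used, and all names and thresholds are hypothetical.

```python
from itertools import combinations


def tids(transactions, item):
    """Indices of the transactions that contain `item`."""
    return frozenset(i for i, t in enumerate(transactions) if item in t)


def mine_disjunctive_rule(transactions, context, min_ctx_overlap,
                          min_merge_overlap, min_conf):
    """Return (disjuncts, confidence) for context => d1 OR d2 OR ...

    Each disjunct is a frozenset of items, read as a conjunction.
    """
    ctx = tids(transactions, context)
    items = set().union(*transactions) - {context}

    # Step 1: list the items that overlap sufficiently with the context.
    within = {}
    for item in sorted(items):
        shared = tids(transactions, item) & ctx
        if ctx and len(shared) / len(ctx) >= min_ctx_overlap:
            within[frozenset({item})] = shared

    # Step 2: agglomeratively merge the pair of groups that co-occur most
    # strongly inside the context, while they stay above the threshold.
    while True:
        best = None
        for g1, g2 in combinations(sorted(within, key=sorted), 2):
            union = within[g1] | within[g2]
            if not union:
                continue
            sim = len(within[g1] & within[g2]) / len(union)
            if sim >= min_merge_overlap and (best is None or sim > best[0]):
                best = (sim, g1, g2)
        if best is None:
            break
        _, g1, g2 = best
        within[g1 | g2] = within.pop(g1) & within.pop(g2)

    # Step 3: add groups as disjuncts, largest new coverage first, until
    # the confidence threshold is met or every group has been used.
    covered, disjuncts, remaining = set(), [], dict(within)
    while remaining and len(covered) / len(ctx) < min_conf:
        group = max(sorted(remaining, key=sorted),
                    key=lambda g: len(remaining[g] - covered))
        covered |= remaining.pop(group)
        disjuncts.append(group)
    return disjuncts, (len(covered) / len(ctx) if ctx else 0.0)


# Illustrative data: "C" is the context item.
D = [{"C", "a", "b"}, {"C", "a", "b"}, {"C", "d"}, {"C", "d"},
     {"C", "e"}, {"a", "d"}]
rule, conf = mine_disjunctive_rule(D, "C", 0.3, 0.5, 0.8)
print(rule, conf)
```

For this data the sketch yields the rule C ⇒ (a ∧ b) ∨ d: items a and b always co-occur within the context and are merged, item e is dropped for insufficient overlap, and the two disjuncts together cover four of the five context transactions.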
The above method is applied to clustering of query results in a search engine where the query is the context, a word is mapped to an item, a document to a transaction, the recall is the confidence, and the resulting disjuncts are the labels of the clusters of documents.
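The mapping described above (query as context, word as item, document as transaction, recall as confidence) can be illustrated as follows. The tokenizer, the documents and the candidate labels are hypothetical, and recall is assumed here to mean the fraction of query results that contain at least one word of the disjunctive label.

```python
import re


def tokenize(text):
    """Map a document to a transaction: its set of lower-case words."""
    return frozenset(re.findall(r"[a-z]+", text.lower()))


# Illustrative document collection.
docs = [
    "Jaguar cars and engines",
    "Jaguar the big cat in the jungle",
    "Jaguar car dealers",
    "Apple pie recipes",
]
transactions = [tokenize(d) for d in docs]

# The query is the context: only documents containing it are considered.
query = "jaguar"
results = [i for i, t in enumerate(transactions) if query in t]


def recall(label_words):
    """Fraction of query results containing at least one label word."""
    hits = [i for i in results if transactions[i] & label_words]
    return len(hits) / len(results)


print(recall({"cars", "car"}))         # label covering the car documents
print(recall({"cat"}))                 # label covering the animal document
print(recall({"cars", "car", "cat"}))  # disjunction covering all results
```

Each disjunct then acts as a cluster label: the documents matched by "cars ∨ car" form one cluster of the results and those matched by "cat" form another.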
The said method is extended to interactive query refinement.
The above method is applied to customer targeting by determining generalized disjunctive association rules on data such as customer purchase history, customer segments, product information and the like.
The above method is further used for making recommendations to customers where the customer's purchase history is the context and the generalized disjunctive association rules provide the recommendations.
The above method is applied to gene analysis by finding the generalized disjunctive association rules from gene databases.
The instant method is applied to cause-and-effect analysis in applications such as medical analysis, market survey analysis and census analysis, by finding generalized disjunctive association rules from the database of causes and effects.
The method is applied to fraud detection by finding generalized disjunctive association rules from transaction databases.
The present invention further relates to a system for mining data characterized in that it generates generalized disjunctive association rules to capture the relationships between data items with reference to a given context to provide improved data analysis independently of taxonomies, comprising:
means for generating a list of all possible data items that can influence said context,
means for discovering association rules for data items in said list that co-occur based on a defined overlap threshold within said context,
means for clustering said data items to form a set of generalized disjunctive rules based on a defined confidence (and/or support) threshold, and
means for iterating the above steps until all items in said list are covered.
The said list is generated by means for selecting those data items that have a significant overlap with said context.
The said association rules are discovered by means for merging data items that overlap above said defined threshold within said context and for confirming that the strength of the relation is beyond a defined minimum support value.
The said clustering is agglomerative.
The discovery of said association rules uses a functionally complete set of operators including “AND”, “OR”, “NOT” and “EXCLUSIVE-OR”.
The above system is used for clustering of query results in a search engine where the query is the context, a word is mapped to an item, a document to a transaction, the recall is the confidence, and the resulting disjuncts are the labels of the clusters of documents.
The system is extended to interactive query refinement.
The said system is used for customer targeting by means for determining generalized disjunctive association
