Method and apparatus for generating weighted association rules

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Details

C707S793000

active

06173280

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to a method and apparatus for uncovering relationships or association rules between items in large databases, and in particular to a method and apparatus for providing preselected value “weights” to items and to database transaction records when generating association rules to identify sets of items and transactions having different levels of user importance.
BACKGROUND OF THE INVENTION
In recent years, commercial businesses have been increasing the use of information-driven marketing processes, managed by database technology, to develop and implement customized marketing strategies and programs. The progress of information automation has increased the size of commercial computer databases to the point where enormous amounts of commercial numbers, facts and statistics are collected and stored; unfortunately less information of any significance is being extracted from such databases because their size has become less and less manageable. The problem is that conventional computer databases are efficient in the manner in which they store data, but inefficient in the manner of searching through data to extract useful information. Simply stated, the use of computers in business and network applications has generated data at a rate that has far outstripped the ability to process and analyze it effectively.
Data “mining,” or knowledge discovery in databases, has been growing in response to this problem. Computer systems cannot efficiently and accurately undertake the intuitive and judgmental interpretation of data. They can, however, undertake the quantitative aspects of data mining because they can quickly and accurately perform certain tasks that demand too much time or concentration from humans. Data mining systems are ideally suited to the time-consuming and tedious task of breaking down vast amounts of data to expose categories and relationships within the data. These relationships can then be intuitively analyzed by human experts.
Data mining systems identify and extract important information from patterns or relationships contained in available databases by sifting through immense collections of data such as marketing, customer sales, production, financial and experimental data to “see” meaningful patterns or regularities and identify what is worth noting and what is not. For example, credit card companies, telephone companies and insurers are mining their enormous collections of data for subtle patterns within thousands of customer transactions to identify risky customers or even fraudulent transactions as they are occurring. Data mining is also being used to analyze the voluminous number of alarms that occur in telecommunications and networking alarm data. Progress in bar code technology use at retail organizations, such as supermarkets, has resulted in millions of electronic records which, when mined, can show purchasing relationships among the various items shoppers buy. Analysis of large amounts of supermarket basket data (the items purchased by an individual shopper) can show how often items are purchased together, such as, for example, milk, bread and butter. The results can be useful for decisions concerning inventory levels, product promotions, pricing, store layout or other factors that might be adjusted to changing business conditions.
Consider data mining of supermarket basket data. In such a situation, the supermarket contains a set of items (its products), of which each shopper transaction or purchase is a subset. In analyzing the volumes of subsets, it is desirable to find the sets of items that occur together in a significant percentage of transactions. The fraction of all transactions in which a particular set of items (also referred to as an “itemset”) occurs is known as the support of the itemset. An itemset is called large if its support exceeds a preselected threshold. All other combinations are known as small itemsets. The fraction of the transactions containing one itemset I that also contain another specific itemset J is known as the confidence. For example, in a market basket analysis of shopper transactions, if 60% of the transactions that contain milk also contain bread, and 15% of all transactions contain both of these items, then 15% is the support and 60% is the confidence.
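The support and confidence definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `confidence` and the 20-basket data set are assumptions chosen to reproduce the 15% and 60% figures in the milk and bread example, not part of the patent.

```python
def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    antecedent_hits = sum(1 for t in transactions if antecedent <= t)
    both_hits = sum(1 for t in transactions if both <= t)
    return both_hits / antecedent_hits


# Hypothetical data: 20 baskets, 5 contain milk, 3 of those also contain bread.
transactions = (
    [frozenset({"milk", "bread"})] * 3
    + [frozenset({"milk"})] * 2
    + [frozenset({"candy"})] * 15
)

print(support({"milk", "bread"}, transactions))       # 3 of 20 baskets
print(confidence({"milk"}, {"bread"}, transactions))  # 3 of the 5 milk baskets
```

With this data, 3 of 20 baskets contain both items (support of 15%) and 3 of the 5 milk baskets contain bread (confidence of 60%).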
The objective of data mining systems is to uncover relationships or associations (called “association rules”) between the presence of various itemsets in transactions, based on support and confidence factors. The end result of a data mining operation is the generation of association rules that satisfy user-specified minimum support and confidence constraints for itemsets. These rules are probabilistic: they indicate how frequently different items are associated with one another across the multitude of records.
One of the better known methods for finding large itemsets is the Apriori method described in the publication, Fast Algorithms for Mining Association Rules, by R. Agrawal and R. Srikant, Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994. To discover large itemsets, the Apriori method makes multiple passes over the transaction records. The first pass counts the support of individual items to determine which of them are large, i.e., have minimum support, and which of them are small. Each subsequent pass starts with a seed set of itemsets found to be large in the previous pass. This seed set is used for generating new potentially large itemsets, called “candidate” itemsets, and the actual support for these candidate itemsets is counted during the pass over the data. At the end of the pass over the transactions, the candidate itemsets that are actually large are identified, and they become the seed for the next pass.
A fundamental premise of the Apriori method is that any subset of a large itemset must also be large. Therefore, candidate itemsets can be generated by joining itemsets already found to be large, and eliminating those candidates that contain a subset which has not been found to be large. This process continues, pass after pass over the data, until no new large itemsets are found. Association rules are then constructed from the large itemsets uncovered, keeping those rules that exceed the confidence threshold.
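The pass structure described above (count, join, prune, repeat) can be sketched as follows. This is a simplified illustration rather than the published Apriori implementation: the function names `apriori` and `rules`, the use of "support at or above the threshold" as the test for a large itemset, and the five-basket data set are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every large itemset."""
    n = len(transactions)

    # First pass: count the support of individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(large)

    k = 2
    while large:
        seed = set(large)
        # Join: combine large (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in seed for b in seed if len(a | b) == k}
        # Prune: drop any candidate with a (k-1)-subset that is not large.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in seed for s in combinations(c, k - 1))
        }
        # Pass over the data: count actual support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(large)
        k += 1
    return result


def rules(large, min_confidence):
    """Build (antecedent, consequent, confidence) rules from large itemsets."""
    out = []
    for itemset, sup in large.items():
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                antecedent = frozenset(subset)
                # Every subset of a large itemset is large, so the lookup succeeds.
                conf = sup / large[antecedent]
                if conf >= min_confidence:
                    out.append((antecedent, itemset - antecedent, conf))
    return out


# Hypothetical baskets for illustration only.
transactions = [
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread"}),
    frozenset({"milk"}),
    frozenset({"candy"}),
]

large_itemsets = apriori(transactions, min_support=0.4)
for antecedent, consequent, conf in rules(large_itemsets, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

In this toy data, butter and candy each appear in only one of five baskets, so they are small after the first pass and never enter a candidate itemset in later passes.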
One shortcoming of the Apriori method is that as the size of the database increases, the number of items searched increases, as does the number of association rules that are generated. In very large databases, the user is left with a large amount of quantitative association information. In practice, however, users are often interested in only a subset of associations, for instance, those containing items from a subset of items that have very different levels of importance. In the market basket example, some items like caviar or lobster are of much higher value than items such as candy. Association rules involving {lobster, caviar} will have less support than those involving candy, but are much more significant in terms of profits earned by the store. Under the Apriori method, the itemset {lobster, caviar} has low support and will not be included in the association rules that are uncovered.
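A small sketch makes this shortcoming concrete. The 100-basket data set, the item "soda", and the 5% threshold below are hypothetical values chosen for illustration; the point is only that a single support threshold treats a rare, high-value itemset the same as any other infrequent combination.

```python
def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


# Hypothetical data: 100 baskets, only 2 of which contain lobster and caviar.
transactions = (
    [frozenset({"lobster", "caviar"})] * 2
    + [frozenset({"candy", "soda"})] * 40
    + [frozenset({"bread"})] * 58
)

MIN_SUPPORT = 0.05  # assumed user-specified threshold

for itemset in ({"lobster", "caviar"}, {"candy", "soda"}):
    s = support(itemset, transactions)
    label = "large" if s >= MIN_SUPPORT else "small (dropped)"
    print(sorted(itemset), s, label)
```

Here {lobster, caviar} is classified as small and discarded before any rule is formed, regardless of how valuable those purchases are to the store.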
A more recent data mining technique that attempts to avoid some of the limitations of the Apriori method is that disclosed by H. Toivonen in the paper, Sampling Large Databases for Association Rules, Proceedings of the 22nd VLDB Conference, Bombay, India, 1996. Toivonen presents a database mining method which picks a random sample of records from the database, uses it to determine the relationship or pattern on the assumption that it probably holds for the entire database, and then verifies the results with the rest of the database.
The method uses the random sample and makes a series of passes over the data to determine which items are frequently found. Each pass builds on the previous collection of frequently found items until the method finds a superset from the collection of frequently found subsets. This appr
