Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1998-08-21
2001-02-13
Amsbury, Wayne (Department: 2771)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000, C707S793000, C707S793000
Reexamination Certificate
active
06189005
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to data processing, and more particularly to “computer database mining” in which association rules which characterize a relationship between significant transactions, that are recorded in a database, are identified. In particular, the invention concerns the identification (i.e., mining) of itemsets or rules in a large database of transactions that are selected not because they are frequent, but because they show surprising patterns along time.
2. Description of the Related Art
Customer purchasing habits can provide invaluable marketing information for a wide variety of applications. For example, retailers can create more effective store displays and more effectively control inventory that otherwise would be possible if they know that, given a consumer's purchase of a first set of items (a first itemset), the same consumer can be expected, with some degree of likelihood of occurrence, to purchase a particular second set of items (a second itemset) along with the first set of items. In other words, it is helpful from a marketing standpoint to know the association between the first itemset and the second itemset (the association rule) in a transaction. For example, it would be desirable for a retailer of automotive parts and supplies to be aware of an association rule expressing the fact that 90% of the consumers who purchase automobile batteries and battery cables (the first itemset) also purchase post brushes and battery post cleansers (referred to as the “consequent” in the terminology of the invention).
Advertisers too may benefit from a thorough knowledge of such consumer purchasing tendencies since they may change their advertising based upon the information mined from the database. In addition, catalog companies may be able to conduct more effective mass mailings if they know the tendencies of consumers to purchase particular sets of items with other set of items. It is understood, however, that although this discussion focuses on the marketing applications of the invention, database mining and, hence, the principles of the invention, are useful in many other areas, such as business or science, for example.
Until recently, building large detailed databases that could chronicle thousands or even millions of consumer transactions was impractical. In addition, the derivation of useful information from these large databases (i.e., mining the databases) was highly impractical due to the large amounts of data in the database which required enormous amount of computer processing time to analyze. Consequently, in the past, marketing and advertising strategies have been based upon anecdotal evidence of purchasing habits, if any at all, and thus have been susceptible to inefficiencies in consumer targeting that have been difficult if not impossible to overcome.
Modem technology, such as larger, faster storage systems and faster microprocessors, have permitted the building of large databases of consumer transactions. In addition, the bar-code reader may almost instantaneously read so called basket data (i.e., when a particular item from a particular lot was purchased by a consumer, how many items the consumer purchased, and so on) so that the basket data may be stored. In addition, when the purchase is made with, for example, a credit card, the identity of the purchaser is also known and may be recorded along with the basket data.
However, building a transactions database is only part of the challenge. Another important part of the marketing is mining the database for useful information, such as the association rules. The database mining, however, becomes problematic as the size of the database expands into the gigabyte or terabyte size.
Not surprisingly, many methods have been developed for mining these large databases. The problem of mining association rules from large databases was first introduced in 1993 at the ACM SIGMOD Conference of Management of Data in a paper entitled, “Mining Association Rules Between Sets of Items in a Large Database” by Rakesh Agrawal, Tomasz Imielinski and Arun Swami. In general, the input, from which association rules are mined, consists of a set of transactions where each transaction contains a set of literals (i.e., items). An example of an association rule is that 30% of the transactions that contain beer and potato chips also contain diapers and that 2% of all transactions contains all of these items. In this example, 30% is the confidence of the association rule and 2% is the support of the rule. The typical problem is to find all of the association rules that satisfy user-specified minimum support and confidence constraints. As described above, this mining of association rules may be useful, for example, to such applications as market basket analysis, cross-marketing, catalog design, loss-leader analysis, fraud detection, health insurance, medical research and telecommunications diagnosis.
To better understand the context of the invention, a brief overview of typical association rules and their derivation is now provided. First let I={1
1
, 1
2
, . . . 1
m
} be a set of literals called items. Let D be a set of transactions, where each transaction T is a set of items such that T
I. Therefore, a transaction T contains a set A of some items in I if A
T. An association rule is an implication of the form A
B, where A⊂I, B⊂I, and A∩B=Ø. The rule A
B holds true in the transaction set D with a confidence “c” if c % of transaction in D that contain A also contain B (i.e., the confidence in the conditional probability p(B|A)). The rule A
B has support “s” in the transaction set D if s % of the transactions in D contain A∪B (i.e., the support is the probability of the intersection of the events). Given a set of transactions D, the computational task of a typical mining system is to generate all associations rules that have a support value and a confidence value greater than a user-specified minimum support value and minimum confidence value. Because a user of the system must estimate and select an appropriate confidence level and an appropriate support level, the typical association rule mining system may generate too few or too many association rules depending on the confidence and support values that are chosen. Therefore, the effectiveness of the typical association rule mining system is substantially dependent upon the user selecting the appropriate values for the confidence and support variables.
key limitation of existing itemset and rule mining systems are that they look for prevalent patterns, not surprising ones. These broad regularities (i.e., prevalent patterns) are already known to people who understand the particular domain (i.e., a domain expert). What these experts not want from large data warehouses are surprising, novel and non-trivial patterns which can indicate unexpected phenomena. Thus, there is a broad consensus that the success of a data mining system will depend critically on the ability to go beyond obvious patterns and find novel and useful patterns. Otherwise, as described above, the results of data mining will often be a large number of association rules which lack novelty, making it overwhelming difficult and unrewarding for the analyst to sift through them. In addition, a domain expert who is already familiar with the application domain is very unlikely to be satisfied with merely the prevalent patterns, because (1) presumably the company he works for is already exploiting them to the extent possible (because they already know about them) and (2) the competition also already knows about these patterns. Therefore, for data mining to be useful, it is desirable to discover the surprising patterns.
There are some systems and methods for discovering sequential patterns in transactions. These sequential pattern discovery systems discover a sequence of itemsets that are disjoint and a sequential pattern is characterized as interesting as long as its support/confidence, as described above, taken o
Chakrabarti Soumen
Dom Byron Edward
Sarawagi Sunita
Amsbury Wayne
International Business Machines - Corporation
Mizrahi Diane D.
Tran, Esq. Khanh Q.
LandOfFree
System and method for mining surprising temporal patterns does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for mining surprising temporal patterns, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for mining surprising temporal patterns will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2581931