Data mining for association rules and sequential patterns...

Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique

Reexamination Certificate

Details

C707S793000, C707S793000, C707S793000

active

06553359

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to data mining technology. More particularly, it relates to the area of mining for association rules and/or sequential patterns within data assets.
2. Description and Disadvantages of Prior Art
Over the past two decades there has been a huge increase in the amount of data stored in databases, as well as in the number of database applications in business and the scientific domain. This explosion in the amount of electronically stored data was accelerated by the success of the relational model for storing data and by the development and maturing of data retrieval and manipulation technologies. While technology for storing data developed quickly to keep up with demand, little attention was paid to developing software for analyzing the data until recently, when companies realized that a resource hidden within these masses of data was being ignored. Database management systems used to manage these data sets at present only allow the user to access information explicitly present in the databases, i.e. the data. The data stored in the database is only a small part of the ‘iceberg of information’ available from it. Contained implicitly within this data is knowledge about a number of aspects of a company's business, waiting to be harnessed and used for more effective business decision support. This extraction of knowledge from large data sets is called Data Mining or Knowledge Discovery in Databases and is defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. The obvious benefits of Data Mining have resulted in a lot of resources being directed towards its development.
Data mining involves the development of tools that analyze large databases to extract useful information from them. As an application of data mining, customer purchasing patterns may be derived from a large customer transaction database by analyzing its transaction records. Such purchasing patterns can provide invaluable marketing information. For example, retailers who know consumer purchase patterns can create more effective store displays and control inventory more effectively than would otherwise be possible. As a further example, catalog companies can conduct more effective mass mailings if they know that, given that a consumer has purchased a first item, the same consumer can be expected, with some degree of probability, to purchase a particular second item within a particular time period after the first purchase.
Data mining uses several techniques to find pieces of knowledge in large amounts of data. Two of these techniques are the so-called mining for association rules and the mining for sequential patterns.
Identifying association rules from a large database of transactions is an essential part of data mining. An association rule is an expression of the form X→Y, where X and Y are sets of items. In the retail domain, the data to be mined typically consist of transactions, where each transaction is characterized by a set of items. For example, the database may contain customers' sale transactions on shoes and jackets. A possible association rule may be of the form “30 percent of transactions that contain jackets also contain shoes; 10 percent of all transactions contain both shoes and jackets”. The 30 percent value is referred to as the confidence of the rule, while the 10 percent value is the support of the rule. The task of mining association rules involves finding all the association rules from the transactions that satisfy certain user-specified minimum support and confidence constraints.
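The support and confidence figures in the jackets-and-shoes example can be reproduced with a short calculation. The following is a minimal sketch, not part of the patent text: the function names and the 30-transaction data set are hypothetical and were chosen only so that the numbers match the example above.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= frozenset(t))


def support(transactions, x, y):
    """Fraction of ALL transactions that contain both X and Y."""
    if not transactions:
        return 0.0
    return count_containing(transactions, set(x) | set(y)) / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    with_x = count_containing(transactions, x)
    if with_x == 0:
        return 0.0
    return count_containing(transactions, set(x) | set(y)) / with_x


# Hypothetical data: 30 transactions, 10 with jackets, 3 of those also with shoes.
transactions = [{"jacket", "shoes"}] * 3 + [{"jacket"}] * 7 + [{"socks"}] * 20

rule_support = support(transactions, {"jacket"}, {"shoes"})        # 3 / 30
rule_confidence = confidence(transactions, {"jacket"}, {"shoes"})  # 3 / 10
```

A mining run would keep the rule jacket→shoes only if both values meet the user-specified minimum support and minimum confidence.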
Conceptually, the problem may be viewed as finding the association rules from a relational table of records. Each record may represent a transaction, as in the case of a retail transaction database, or other data items in the database. Each record has one or more attributes where each attribute corresponds to an item of the transaction.
Another essential part of data mining relates to the identification of sequential patterns. This involves rules that are based on temporal data. Suppose we have a database of natural disasters. If we conclude from such a database that whenever there was an earthquake in Los Angeles, Mt. Kilimanjaro erupted the next day, such a rule would be a sequence rule. Such rules are useful for making predictions, which in turn could be useful for making market gains or for taking preventive action against natural disasters. The factor that differentiates sequence rules from other rules is the temporal factor.
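How strongly a "next day" sequence rule holds in a set of dated events can be checked as follows. This is a minimal sketch under stated assumptions: the function name, the event labels and the dates are all invented for illustration and do not come from the patent or from real disaster records.

```python
from datetime import date, timedelta


def next_day_confidence(events, first, second):
    """Share of the days on which `first` occurred that were followed,
    exactly one day later, by an occurrence of `second`."""
    days_by_event = {}
    for day, name in events:
        days_by_event.setdefault(name, set()).add(day)

    first_days = days_by_event.get(first, set())
    second_days = days_by_event.get(second, set())
    if not first_days:
        return 0.0

    one_day = timedelta(days=1)
    followed = sum(1 for d in first_days if d + one_day in second_days)
    return followed / len(first_days)


# Hypothetical event log: (date, event label)
events = [
    (date(2000, 1, 10), "earthquake"),
    (date(2000, 1, 11), "eruption"),
    (date(2000, 3, 5), "earthquake"),
    (date(2000, 3, 6), "eruption"),
    (date(2000, 7, 20), "earthquake"),  # no eruption on the following day
]

seq_confidence = next_day_confidence(events, "earthquake", "eruption")
```

The order of the two events matters here, which is the temporal factor that separates a sequence rule from an association rule.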
Other applications of data mining include catalog design, add-on sales, store layout, customer segmentation based on buying patterns, and many more. Typically the databases involved in these applications are very large. It is imperative, therefore, to have fast algorithms for this task.
Although several methods of mining for association rules and mining for sequential patterns have been proposed, only methods derived from the so-called APRIORI approach (see R. Agrawal, R. Srikant, Fast Algorithms for Mining Association Rules, in Proceedings of the 20th VLDB Conference, 1994) have proven efficient enough to process large data volumes.
The APRIORI approach depends on a special format of the data called transaction format. In the case of associations, the transaction format conceptually consists of only two columns, namely a “transaction identifier” and an “item identifier”. In the case of sequential patterns, it conceptually consists of three columns, namely a “transaction group identifier”, a “transaction identifier”, and an “item identifier”. A much more serious drawback of the APRIORI approach according to the current state of the art is that it requires all of the “item identifiers” to relate to the same item type. As a result, the APRIORI approach is only capable of deriving association rules or sequences between items of the same type. If, for instance, the item identifier relates to a certain product bought by a certain customer, the APRIORI technique would be capable of deriving only rules of the form: if a customer buys PRODUCT 1 then he will also buy PRODUCT 2 with a probability of X%. The APRIORI approach would not be able to include in its generated rules items of other types, such as the gender, the age, the profession, the place of residence or other aspects of the customers. It can be expected that once a multitude of different item types can be included in the derivation of rules, the importance of the derived rules can be significantly increased, as they would be much more selective in nature.
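The two-column and three-column transaction formats described above can be pictured as plain rows. The sketch below is illustrative only: the row contents, the helper names, and the assumption that transaction identifiers increase over time within a group are not taken from the patent.

```python
from collections import defaultdict

# Associations: (transaction identifier, item identifier)
association_rows = [
    (1, "jacket"),
    (1, "shoes"),
    (2, "jacket"),
]

# Sequential patterns:
# (transaction group identifier, transaction identifier, item identifier)
sequence_rows = [
    ("customer_A", 1, "jacket"),
    ("customer_A", 2, "shoes"),
    ("customer_B", 3, "jacket"),
]


def group_transactions(rows):
    """Collect the items of each transaction into one set."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)


def group_sequences(rows):
    """Collect, per group, its transactions as an ordered list of item sets.
    Assumes transaction identifiers reflect time order within a group."""
    groups = defaultdict(lambda: defaultdict(set))
    for group_id, transaction_id, item in rows:
        groups[group_id][transaction_id].add(item)
    return {
        group_id: [txs[tid] for tid in sorted(txs)]
        for group_id, txs in groups.items()
    }


baskets = group_transactions(association_rows)
sequences = group_sequences(sequence_rows)
```

Every item identifier in these rows is a product, which is exactly the single-item-type restriction the text criticizes.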
OBJECTIVE OF THE INVENTION
The invention is based on the objective of providing a computerized method for data mining for association rules and/or sequential patterns of a multitude of records, wherein the multitude of records comprises transaction-items of different item-types.
SUMMARY AND ADVANTAGES OF THE INVENTION
The objectives of the invention are solved by the independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective subclaims.
The invention relates to a computerized method for data mining for association rules and/or sequential patterns of a multitude of records. The invention is applicable to records comprising a transaction-identification and at least one transaction-item with a corresponding item-type wherein said multitude of records comprise transaction-items of different item-types. The proposed method further comprises a preprocessing-step for transforming each record into one or more transaction-records of transaction-format. According to said transaction format for each transaction-item in said record a transaction-record is generated and said transaction-record comprises at least the transaction-identification of said record and an encoded transaction-item encoding said transaction-item and i
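The preprocessing step, as far as it is described above, might be sketched as follows. The source text breaks off before it states what the encoded transaction-item contains, so the `type=value` string encoding used here is purely an assumption for illustration, as are the record layout and the function name.

```python
def to_transaction_records(record):
    """Generate one transaction-record per transaction-item of `record`.

    Each transaction-record is a pair (transaction id, encoded item).
    ASSUMPTION: the encoded item is the string "<item type>=<item value>";
    the patent text available here does not specify the encoding.
    """
    transaction_id = record["transaction_id"]
    return [
        (transaction_id, f"{item_type}={value}")
        for item_type, value in record["items"]
    ]


# Hypothetical record mixing several item types.
record = {
    "transaction_id": 17,
    "items": [
        ("product", "jacket"),
        ("gender", "female"),
        ("profession", "engineer"),
    ],
}

rows = to_transaction_records(record)
```

With an encoding of this kind, items of different types end up in the same two-column layout, so a miner that expects transaction format could in principle process them together.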
