Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2002-05-10
2004-07-13
Channavajjala, Srirama (Department: 2177)
C707S793000, C706S046000, C706S047000, C706S056000, C706S060000, C706S061000
active
06763354
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is generally related to systems and processes of analyzing transactional database information to mine data item association rules and, in particular, to a system and method of backlinking reinforcement analysis of transactional data to establish emergent weighted association rules.
2. Description of the Related Art
Data mining systems and tools are utilized to determine associative relationships within data as contained in typically large-scale information databases. Where the source information represents, for example, commercial transactions conducted with respect to discrete items, association relationships between different items can be determined by analysis with relative degrees of accuracy and confidence. These association relationships can then be utilized for various purposes including, in particular, predicting likely consumer behaviors with respect to the set of items covered by the transaction data. In practical terms, the presentation and substance of product designs, marketing campaigns and the like can then be tailored efficiently to reflect consumer interest and demand.
Conventionally, the relationships mined from transactional information databases are collected as association rules within a reference database, generally referred to as an expert database. Each association rule is qualified, relative to the items in the relation, with a weight representing the significance or strength of the association between the items. A collected set of association rules can then be used to provide solutions to various problems presented as query assertions against the expert database. In conventional implementation, a relational trace through the expert database, discriminating between various relationship branches based on the associated relative weightings, allows a query to be resolved to a most highly correlated solution set of related items. The query itself may be represented as an identified item, item set, or attributes that are associated with the items identified within the expert database.
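The relational trace described above can be illustrated with a minimal Python sketch. It assumes a toy expert database held as a dictionary of weighted rules; the `RULES` contents, the `trace` function, and the choice to score a path by multiplying its weights are all hypothetical and not taken from any particular system. The sketch greedily follows the most heavily weighted branch from the query item.

```python
# Hypothetical expert database: antecedent item -> {consequent item: weight}.
# Each weight stands in for the strength of the association.
RULES = {
    "bread": {"butter": 0.8, "jam": 0.5},
    "butter": {"milk": 0.6, "eggs": 0.3},
    "jam": {"tea": 0.9},
}


def trace(query_item, rules, max_depth=3):
    """Greedily follow the highest-weighted branch starting at query_item.

    Returns the list of items visited and a path score computed as the
    product of the weights along the path (an illustrative choice).
    """
    path, score = [query_item], 1.0
    current = query_item
    for _ in range(max_depth):
        branches = rules.get(current)
        if not branches:
            break  # no outgoing rules: the trace ends here
        # Discriminate between branches by their relative weights.
        nxt, weight = max(branches.items(), key=lambda kv: kv[1])
        if nxt in path:
            break  # avoid revisiting an item (cycle guard)
        path.append(nxt)
        score *= weight
        current = nxt
    return path, score


path, score = trace("bread", RULES)
print(path, round(score, 2))
```

A greedy walk is only one way to discriminate between branches; a system could instead keep several candidate paths and rank complete solution sets.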
Automated association mining techniques, as opposed to manual processes of knowledge engineering used to create expert databases, are preferred particularly where the volume of data to be evaluated is large and where the usefulness of the mined associations degrades rapidly over time. Conventional automated association mining techniques, however, are subject to a variety of limitations. In particular, the number of associations identified by automated techniques tends to grow exponentially with the number of items identified within the transaction data, and the performance of queries against an expert database naturally degrades as the database grows. Furthermore, many of the association rules generated may be irrelevant to the defined, or even the likely, queries that will be asserted against the expert database.
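A short counting argument makes the exponential growth concrete. Assuming n distinct items, and counting only non-empty item sets and only rules of the single-consequent form X ⇒ I discussed in the Agrawal article (a simplification; rules with multi-item consequents add more), each of the n choices of I can pair with any non-empty X drawn from the remaining n − 1 items:

$$
N_{\text{itemsets}} = 2^{n} - 1, \qquad
N_{X \Rightarrow I} = n\left(2^{\,n-1} - 1\right)
$$

For n = 20, this already gives 1,048,575 candidate item sets and 20 × 524,287 = 10,485,740 candidate rules.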
Another problem is that variations in the underlying transactional data may affect the relative quality of the potential associations. The strength of an association as determined by the analysis may be distorted by the number of times particular items appear in the transactional data and by the distribution of those items across the larger set of transactions. Consequently, the confidence that can be placed in the relationship strengths determined by automated analysis can vary significantly.
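The distortion can be seen on a handful of toy transactions. In the sketch below (data and function names are hypothetical), the rule bread ⇒ bag reaches a confidence of 1.0 largely because the item bag occurs in 90% of all transactions. Comparing confidence with the consequent's own support, a ratio commonly called lift (a standard measure not named in this text), shows that the association adds little beyond the item's overall frequency.

```python
# Hypothetical toy data: "bag" appears in nearly every transaction.
transactions = [
    {"bag", "bread"}, {"bag", "milk"}, {"bag", "bread", "milk"},
    {"bag", "eggs"}, {"bag"}, {"bag", "bread"}, {"bag", "tea"},
    {"milk", "eggs"}, {"bag", "milk"}, {"bag", "jam"},
]


def support(itemset, txns):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in txns if itemset <= t) / len(txns)


def confidence(antecedent, consequent, txns):
    """support(antecedent | consequent) / support(antecedent)."""
    return support(antecedent | consequent, txns) / support(antecedent, txns)


conf = confidence({"bread"}, {"bag"}, transactions)
base = support({"bag"}, transactions)
print(f"confidence(bread => bag) = {conf:.2f}")  # every bread basket has a bag
print(f"support(bag)             = {base:.2f}")  # but 90% of all baskets do
print(f"ratio (lift)             = {conf / base:.2f}")
```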
In conventional systems, association rules are generated through algorithmic processing of a transaction data record set representing, for example, a series of commercial transactions. Depending on the nature of the source transactional data, item associations are initially identified based on the rate of occurrence of unique item pairings or, where a transaction involves multiple items, of item sets. The occurrence rate of a specific item set within the set of transaction data records is conventionally referred to as the item set support. As described in “Mining Association Rules between Sets of Items in Large Databases” by Agrawal, Imielinski and Swami, Proc. of the 1993 ACM SIGMOD Conf. on Management of Data, May 1993, pp. 207-216, a minimum support threshold can be established to filter out insignificant item sets. As described there, the threshold support value is selected empirically to represent a level of statistical significance chosen for business reasons. In the example provided, the minimum support threshold was set at 1%. Association rules with support below the threshold, representing associations of less than minimal significance, are discarded.
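Support counting with a minimum support threshold can be sketched in Python as follows. This is a brute-force illustration on assumed toy data (the `frequent_itemsets` helper and the `baskets` list are hypothetical); it enumerates every item set up to a fixed size rather than reproducing the candidate-generation algorithm of the Agrawal article. The default threshold mirrors the 1% value from that article, while the example call raises it to 40% so the effect is visible on five transactions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.01, max_size=2):
    """Return {itemset: support} for item sets meeting min_support.

    Brute-force counting of every item set of up to max_size items;
    an illustrative sketch, not the algorithm of the cited paper.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    # Discard item sets whose support falls below the threshold.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
result = frequent_itemsets(baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With the 40% threshold, eggs and the pairs occurring in only one basket are discarded; with the 1% default, every item set seen in these five baskets survives.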
The Agrawal article also describes the use of syntactic constraints to reduce the size of the generated expert database. The items that are of interest for queries or, conversely, the items that are not of interest may be known in advance of rule generation. A corresponding constraint on the generation of association rules is implemented in the algorithmic examination of transaction data records with the result that only association rules of interest are generated and stored to the expert database.
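A syntactic constraint of this kind amounts to filtering during counting rather than after it. The Python sketch below is hypothetical (the `constrained_pair_counts` name and the toy `baskets` are assumptions); it counts only item pairs that involve at least one item of interest. The complementary form, excluding items known to be uninteresting, would simply invert the membership test.

```python
from itertools import combinations


def constrained_pair_counts(transactions, items_of_interest):
    """Count item pairs, skipping any pair with no item of interest.

    Pairs that could never answer a query of interest are not counted
    at all, so no rules are generated for them.
    """
    counts = {}
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            if a in items_of_interest or b in items_of_interest:
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
pairs = constrained_pair_counts(baskets, items_of_interest={"butter"})
print(pairs)
```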
Finally, the Agrawal article describes a technique for assessing confidence in the strength of association rules. The technique presumes that, in discovering the solution set for a query, the relative validity of rule strengths along the solution paths can be normalized based on the relative representation of association rules within the transaction data set. As presented by Agrawal, the confidence of a given association rule is the fraction of the source transaction data records containing the rule's antecedent that also contain its consequent. That is, the confidence C of an association rule X ⇒ I, where X is an item set identified within a transaction data set T and I is a single item not in X, is the support of X ∪ {I} divided by the support of X:
$$C(X \Rightarrow I) = \frac{\mathrm{support}(X \cup \{I\})}{\mathrm{support}(X)}$$
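The confidence formula translates directly into code. The sketch below is a hypothetical illustration on assumed toy data (`count`, `rule_confidence` and `baskets` are illustrative names). It divides transaction counts rather than two supports; since both supports share the denominator |T|, the ratio is the same.

```python
def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def rule_confidence(x, i, transactions):
    """Confidence of the rule X => I, with I a single item not in X.

    support(X ∪ {I}) / support(X) equals count(X ∪ {I}) / count(X),
    because both supports are divided by len(transactions).
    """
    if i in x:
        raise ValueError("consequent item must not be in the antecedent")
    n_x = count(x, transactions)
    if n_x == 0:
        raise ValueError("antecedent never occurs; confidence undefined")
    return count(x | {i}, transactions) / n_x


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
print(rule_confidence({"bread"}, "milk", baskets))   # 0.75: 3 of 4 bread baskets
print(rule_confidence({"butter"}, "bread", baskets)) # 1.0: 2 of 2 butter baskets
```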
In the Agrawal article, the confidence determined for an association rule is used as a threshold for qualifying generated association rules for inclusion in the expert database. Association rules whose confidence exceeds a defined minimum value are, in effect, deemed minimally reliable. The threshold confidence level is again determined empirically, generally based on an evaluation of the statistical insignificance of the excluded rules.
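The two thresholds can be combined in a single qualification pass. The Python sketch below is hypothetical (`qualified_rules`, `count` and the toy `baskets` are assumptions, and antecedents are limited to one item for brevity). A rule must meet the minimum support to be considered relevant and the minimum confidence, treated here as inclusive, to be retained as reliable.

```python
from itertools import permutations


def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def qualified_rules(transactions, min_support, min_confidence):
    """Return (x, i, support, confidence) for rules x => i meeting both thresholds.

    For brevity, antecedents are restricted to a single item.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, i in permutations(items, 2):
        both = count({x, i}, transactions)
        if both / n < min_support:
            continue  # below minimum support: not minimally relevant
        conf = both / count({x}, transactions)
        if conf >= min_confidence:  # at or above threshold: minimally reliable
            rules.append((x, i, both / n, conf))
    return rules


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
rules = qualified_rules(baskets, min_support=0.4, min_confidence=0.7)
for x, i, sup, conf in rules:
    print(f"{x} => {i}: support={sup:.2f}, confidence={conf:.2f}")
```

Note that bread ⇒ butter passes the support threshold but fails on confidence (0.5), while its reverse, butter ⇒ bread, qualifies with confidence 1.0.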
The support and confidence values determined for the minimally relevant and reliable association rules are conventionally stored with the corresponding rules within the expert database. Subsequent evaluation of queries against the expert database can use these support and confidence values, in part, to determine optimal solution sets. U.S. Pat. No. 6,272,478, issued to Obata et al., describes a broadly similar application of assigned evaluation values to association rules. Specifically, cost and sales values are assigned as attributes of association rules to permit evaluation of additional criteria in determining an optimal set of association rules to use in reaching a solution set for an applied query. The evaluation of these additional criteria permits, for example, the selection of solution sets that optimize profitability. Where multiple items are specified in the antecedent and consequent terms of an association rule, mathematical formulas corresponding to the included item sets are used in evaluating the rule. Although the evaluation values and formulas may be stored in an item dictionary provided with the expert database, they are derived independently of the support and confidence values.
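The effect of attaching evaluation values to rules can be sketched as follows. The scoring used here, confidence multiplied by a sales-minus-cost margin, is a hypothetical criterion chosen for illustration and is not the formula of the Obata patent; the `Rule` class, the `best_rules` function and all values are likewise assumptions. The sketch shows how a lower-confidence rule can still be preferred when its margin is larger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: str
    support: float
    confidence: float
    sales: float  # hypothetical evaluation value assigned to the rule
    cost: float   # hypothetical evaluation value assigned to the rule

    def expected_profit(self) -> float:
        # Illustrative criterion only: margin weighted by rule confidence.
        return self.confidence * (self.sales - self.cost)


def best_rules(rules, query_items, top_k=1):
    """Rank rules applicable to query_items by the illustrative profit score."""
    applicable = [r for r in rules if r.antecedent <= query_items]
    return sorted(applicable, key=Rule.expected_profit, reverse=True)[:top_k]


RULEBOOK = [
    Rule(frozenset({"bread"}), "milk", 0.6, 0.75, sales=3.0, cost=2.0),
    Rule(frozenset({"bread"}), "butter", 0.4, 0.50, sales=5.0, cost=2.0),
    Rule(frozenset({"eggs"}), "bacon", 0.2, 0.90, sales=6.0, cost=3.0),
]
best = best_rules(RULEBOOK, frozenset({"bread"}))[0]
print(best.consequent, best.expected_profit())
```

Here bread ⇒ butter (score 1.5) outranks bread ⇒ milk (score 0.75) despite its lower confidence, and the eggs rule is never considered because its antecedent is not part of the query.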
The generation of an expert database with associations having defined minimum relevancy and reliability enables broad query assertions to be adequately resolved to solution sets of at least equal minimum relevancy and reliability. Any progressive evaluation of the support and confidence values of association rules applied in determining a solution set can be used to raise and change the minimum relevancy and reliability of the solution set reached. Furthe
AgentArts, Inc.
NewTechLaw
Rosenberg, Esq. Gerald B.