Method for mining association rules in data

Data processing: artificial intelligence – Knowledge processing system

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C706S006000, C706S045000

Reexamination Certificate

active

06185549

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to an electronic data processing method for mining an at least partially optimized association rule from a data set, and particularly for mining from the data set a plurality of separate (disjoint) such rules containing uninstantiated numeric attributes.
2. Discussion of the Related Art
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for discovering previously unappreciated aspects of the data, or, in other terms, determining correlations between attributes of a relation. Association rules are said to be optimized when focusing on characteristics that are, in some respect, the most interesting for a particular application. Such association rules have been applied in marketing, financing, and retailing, for example.
Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that the support, confidence, or gain of the rule is maximized. Instantiated attributes are attributes represented by certain stated elements of the data points in the data set and may be either numerical, such as a time period, or categorical, such as a city name. Uninstantiated attributes are attributes represented by numerical variables derivable from stated numerical elements of the data points in the data set, such as time periods that are variable combinations of the stated time periods. Support, confidence, and gain will be defined hereinafter.
The volume of critical business data stored in databases is expected to grow considerably in the near future. Many organizations have become unable to collect valuable insights from their own data, partly because most of the information is stored implicitly in the large amount of data. As a result, there arises a demand for new tools to enable business organizations to gather such valuable insights.
Association rules provide a useful mechanism for discovering correlations among the underlying data. In its most general form, as association rule can be viewed as a combination or conjunction of conditions employing the data in the database and satisfying user-specified minimum support and minimum confidence constraints. A large number of different schemes for providing such association rules have been proposed. Association rules that are termed “optimized” are useful for unraveling ranges for numeric attributes where certain trends or correlations are strong. A typical limitation of such previously proposed optimized association rules is that only a single optimum interval for a single numeric attribute can be determined. Such a solution may miss some very significant or very interesting local trends in the data. For many applications, it is desirable to find a plurality of different (disjoint) optimum intervals in the data and to find them as efficiently as possible.
SUMMARY OF THE INVENTION
According to the invention, a method is provided for mining an at least partially optimized association rule from a data set for every point of which support, confidence, and gain are available, the at least partially optimized association rule containing at least one uninstantiated condition A, comprising the steps of providing the data set in a form usable by an electronic central processing unit; computing in the central processing unit via a dynamic programming algorithm a plurality k of separate regions of the uninstantiated condition A having optimal support, each region having at least a minimum confidence, each optimal separate region being optSet(i, j, l), where each interval (i, j) contains at most l non-overlapping intervals,
the dynamic programming algorithm being:
procedure optSup1D(i, j, l)
begin
1.  if computed (optSet(i, j, l)) = true
2.   return optSet(i, j, l)
3.  if conf((i, j)) ≧ minConf {
4.  optSet(i, j, l): = ((i, j))
5.  return optSet(i, j, l)
6.  }
7.  tmpSet := 0
8.  if i < j {
9.  if l = 1
10.
tmpSet := maxSupSet(optSup1D(i, j − 1, 1), optSup1D(j +
1, j, 1))
11. else
12.   for r:= i to j − 1 do
13.
tmpSet := masSupSet(tmpSet, optSup1D(i, r, 1) ∪
optSup1D(r + 1, j, l − 1))
14. }
15. optSet(i, j, l) := maxSupSet(optSet(i, j, l), tmpSet)
16. return optSet (i, j, l)
end,
where the algorithm is initially invoked with i=1, J=n, where n is the total number of values of the same type in the data set, k=1 (or other desired value of k), and recursion occurs on i and j for each value of k; and delivering from the electronic central processing unit to a utilization apparatus a signal representing each optSet (i, j, l) for the plurality k of separate regions.
According to subsidiary features of the invention, bucketing and divide-and-conquer techniques are employed to reduce the complexity of the calculations.
According to a further aspect of the invention, a plurality of uninstantiated conditions can be used, in which case the partial (or local) optimums will be approximate if k exceeds 3.
It is believed to be significant to the success of the invention, and, in particular, to the success of the dynamic programming algorithm, that, as has been discovered pursuant to the present invention, that, for one uninstantiated condition, the following conditions are always satisfied by the use of the dynamic programming algorithm:
If, for i < j, conf((i, j)) < minConf, then the optimized set optSet(i, j, l) is
&Circlesolid; l = 1: optSet(i, j-1, 1), if sup(optSet(i, j-1, 1) < sup(optSet(i + 1, j, 1).
  (optSet(i + 1, j, 1), otherwise.
&Circlesolid; l = 1: optSet(i, r, 1) ∪ optSet(r + 1, j, l-1), where i ≦ r < j is such that
  sup(optSet(i, r, 1) ∪ optSet(r + 1, j, l − 1)) is maximum.


REFERENCES:
patent: 5724573 (1998-03-01), Agrawal et al.
patent: 5809499 (1998-09-01), Wong et al.
patent: 5813003 (1998-09-01), Chen et al.
patent: 5842200 (1998-11-01), Agrawal et al.
patent: 5920855 (1999-07-01), Aggarwal et al.
patent: 5943667 (1999-08-01), Aggarwal et al.
patent: 5970482 (1999-10-01), Pham et al.
patent: 6012058 (2000-01-01), Fayyad et al.
patent: 6061682 (2000-05-01), Agrawal et al.
patent: 6092064 (2000-07-01), Aggarwal et al.
Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proc. of the VLDB Conference, Santiago, Chile, Sep. 1994, 13 pages.
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, and Takesh Tokuyama. Mining optimized association rules for numeric attributes. InProc. of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Jun. 1996, pp. 182-191.
Brian Lent, Arun Swami, and Jennifer Widom. Clustering association rules. InInt'l Conference on Data Engineering, Birmingham, U.K., Apr. 1997, pp. 1-25.
Hannu Toivonen. Sampling large databases for association rules. InProc. of the 22nd VLDB Conference. Mumbai (Bombay), India, 1966, pp. 1-12.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for mining association rules in data does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Method for mining association rules in data, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for mining association rules in data will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2571277

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.