Systems and methods for pairwise analysis of event data

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Systems and methods for pairwise analysis of event data Systems and methods for pairwise analysis of event data

: 2001-10-12
: 2004-02-24
: Jung, David (Department: 2175)
: Data processing: database and file management or data structures
: Database design
: Data structure types

: C707S793000, C707S793000, C707S793000
: Reexamination Certificate
: active
: 06697802
: ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to data processing techniques and, more particularly, to methods and apparatus for performing pairwise analysis of data.
BACKGROUND OF THE INVENTION
As it becomes feasible to collect large volumes of data, businesses are increasingly looking for ways to capitalize on this data, especially market data. Thus, such businesses turn toward data mining techniques. As is known, data mining seeks to discover interesting and previously unknown patterns or information from a large amount of historical data often stored in a database, e.g., market data. Specifically, one key aspect of data mining involves searching for significant patterns embedded in the data. A pattern generally refers to a set of items denoted as pat={i
1
, i
2
, i
3
, . . . , i
k
}, where i
k
is the k-th item. In accordance with the present invention, we focus on pairwise patterns, i.e., patterns with two items.
To discover item sets that are of interest, two issues need to be addressed and, very often, these two issues also need to be compromised. The first issue relates to defining what patterns we intend to discover, and determining the level of interest associated with a particular pattern. To this end, many measurements have been developed: (1) frequency of item sets; (2) association rules; (3) dependency test; (4) mutual dependence; and so on. One measurement or a combination of measurements is needed to describe significant patterns for a specific application. For example, a marketing application may target the general population in order to optimize a profile. Thus, in this case, the frequency of patterns is the key measurement, see, e.g., R. Agrawal et al., “Mining Association Rules Between Sets of Items in Large Databases,” Proceeding of VLDB, pp. 207-216, 1993, and also see, e.g., an extension of R. Agrawal et al. in H. Mannila et al., “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, Vol. 1, No. 3, 1997, the disclosures of which are incorporated by reference herein.
In system management or intrusion detection tasks, discovery of infrequent, but significant, patterns may be required. Such patterns have been described in U.S. patent applications identified as: Ser. No. 09/918,253, entitled “Systems and Methods for Discovering Mutual Dependence Patterns,” and filed on Jul. 30, 2001; and Ser. No. 09/929,883, entitled “Systems and Methods for Discovering Fully Dependent Patterns,” and filed on Aug. 15, 2001, the disclosures of which are incorporated by reference herein. In both applications, patterns of interest are ones that occur non-randomly. It is known that a statistical test can be used to test whether a pattern occurs randomly. As a result, it is desirable to use the statistical test together with other measurements.
The second issue associated with pattern discovery that needs to be addressed involves determining mechanisms for efficiently mining a large amount of data, wherein the data may not be fully loaded into main memory. In one extreme, each possible pattern may be exhaustively examined in order to find all patterns satisfying a defined qualification function. However, since there are a huge number of possible item sets (to be precise, O(n
k
), where n is the number of distinct items and k is the length of the item sets), such a brute-force approach is not reasonable, even for a moderate number of distinct items.
R. Agrawal et al. (in the above-referenced “Mining Association Rules Between Sets of Items in Large Databases” article) devised the so-called “Apriori” algorithm, and demonstrated that it is possible to discover all frequent item sets which are above a predefined minimum support (minsup). The Apriori algorithm searches item set space in a level-wise manner. At level k, the algorithm constructs candidate patterns with a length k based on qualified patterns in the previous level, and scans data to find all qualified patterns. This effectively prunes the search space. The reason that such an algorithm can be surprisingly effective in pruning the search space lies in the fact that the frequency (or support) has the downward closure property or, equivalently, the anti-monitone property. That is, if an item set is frequent above a threshold, then all its subsets are frequent above the threshold. Clearly, such an algorithm can be generalized to other qualification functions that are downward closed. Unfortunately, very few desired statistical qualification functions have such a downward closure property. For example, the confidence measure (e.g., predictive power) and the commonly-used dependency test do not satisfy this property.
So far, most previous work focused on discovering all patterns that satisfy a qualification condition that is downward closed. For example, frequent association patterns use the minsup as a qualification criterion. Other previous work defined mutual dependent patterns (m-patterns) and fully dependent patterns (d-patterns). An m-pattern requires that the mutual dependence of a pattern be above a threshold; while a d-pattern requires that all subsets of a pattern satisfy a dependency test.
However, the previous work has certain limitations. First, although a level-wise approach can significantly eliminate the pattern space, it is still computationally expensive to discover all patterns with any length, especially for long patterns. This is because all subsets of a pattern are qualified patterns. As a result, the algorithm needs to enumerate all the subsets. For example, a pattern with length 20 has more than 10
30
qualified subpatterns, requiring enormous storage space and unrealistic computational time for a level-wise algorithm to discover such a pattern.
Second, to apply an efficient algorithm, a qualification measurement needs to have the downward closure property. Consequently, arbitrary conditions that are best for an application can not be designed. Many valuable measurements, such as predictive power and statistical testing, can not be applied in the mining process. Unfortunately, this results in missing many important patterns and discovering many false patterns.
Lastly, and most importantly, pattern discovery is an interactive and iterative process in which patterns found should be validated by domain experts before used in production. However, in practice, it is very difficult for humans to make sense of a long pattern because of the exponentially possible number of relationships of items. For example, in a pattern with three items, i.e., {a, b, c}, there can be many types of relationships: three items may be cascaded (e.g., a causes b and b causes c); two of the three items may trigger the third one (e.g., a and b cause c); and one item may trigger the other two (e.g., a causes b and c). Clearly, each type has many possible permutations. As a result, how to interpret a pattern with three or more items becomes a very difficult task for a domain expert.
SUMMARY OF THE INVENTION
The present invention provides techniques for mining or discovering two-item or pairwise patterns. Preferably, the pairwise patterns are mined or discovered from a large amount of temporal historical events.
In a first aspect of the invention, a technique for mining or discovering one or more patterns in an input data set, wherein the input data set is characterized by attributes, comprises the following steps. First, the technique includes mapping attributes of the input data set to mapping values. Then, one or more candidate patterns are formed as groupings of two mapping values that occur within a predefined time period. Next, for each of the one or more candidate patterns, a qualification function is computed and a result of the qualification function is compared with at least one predefined threshold value. The one or more candidate patterns whose qualification function results are greater than or equal to the predefined threshold value are identified as one or more qualified patterns.
In a second aspect of the invention, the mapping step comprises m

Affiliated with

Grabarnik Genady

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Hellerstein Joseph L.

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Ma Sheng

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Perng Chang-Shing

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Jung David

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Ryan & Mason & Lewis, LLP

Law Firm

[ 0.00 ] – not rated yet Voters 0 Comments 0

Zarick Gail H.

Attorney

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Systems and methods for pairwise analysis of event data does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Systems and methods for pairwise analysis of event data, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Systems and methods for pairwise analysis of event data will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-3325107

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure