Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique
Reexamination Certificate
2000-06-02
2003-03-11
Follansbee, John A. (Department: 2121)
Data processing: artificial intelligence
Knowledge processing system
Knowledge representation and reasoning technique
Reexamination Certificate
active
06532456
ABSTRACT:
BACKGROUND
1. Technical Field
The present invention generally relates to data mining and, more particularly, to identifying partial periodic patterns in an event sequence. Imperfect patterns may be transformed into perfect patterns through random replacement of events.
2. Background Description
Periodic pattern discovery is an important problem in mining time series data. A periodic pattern is a list of ordered events which repeats itself in an event sequence. Periodic pattern discovery is useful in characterizing the cyclic behavior of time series data. However, in practice, not every portion in the time series data contributes to the repetitiveness of a given pattern(s) in the time series. For example, the stock of a particular company may gain a few points at the beginning of each trading session but may not have much regularity at other times. This type of looser repetitiveness is often referred to as “partial periodicy”.
Moreover, due to random noise, a pattern may not always be perfectly repeated. Thus, the event sequence can be viewed as a series of perfect pattern repetitions with a few “random replacements”. That is, the event sequence would become a series of perfect repetitions of some pattern after a few replacements (of some of the events in the sequence). If the amount of “replacement” is below some reasonable threshold, the pattern may be considered to be “exhibited” in the event sequence.
The mining of time series data is a newly developing research area. Most of the previous work on mining time series data includes creating a mapping to the association rule mining technique, i.e., each event corresponds to an item and every certain number of consecutive events are treated as a transaction. Metrics referred to as “support” and “confidence” are used to identify significant patterns from the remaining data. The support of a pattern is the number of occurrences of that pattern within a sequence of events. The confidence of a pattern is defined as the fraction specified by the number of occurrences of the pattern within the sequence of events over the length of the sequence. Most association rule mining methods favor frequently occurring events. However, patterns involving infrequent events may also be as significant as (or even more significant than) frequent events in an event sequence. This issue becomes more critical when different events occur at divergent frequencies. As examples, consider the following three illustrative applications.
The first application corresponds to the workload on a cluster of web servers. In the example, the cluster consists of 5 servers, with the workload on each server being measured by the following 4 ranks: low; relatively low; relatively high; and high. Thus, there are 4
5
(1024) different events, one for each possible combination of server states. Examination of the cluster states over time may show that the fluctuations of the server states comply with some periodic behavior. Obtaining such knowledge would be very beneficial for understanding the behavior of the cluster and for improving its performance.
The second application corresponds to earthquakes. In general, earthquakes are classified by their magnitude and type. Scientists may be interested in knowing whether there exists any inherent seismic period so that future earthquake predictions can be made. Note that patterns involving big earthquakes are much more valuable than patterns involving small earthquakes, even though big earthquakes occur at a much lower frequency than small earthquakes.
The third application is the stock market. Stock investors would like to understand the movement of the stock market so that they can take advantage of the movement to maximize their profits. In the example, the movement of the market on any given day can be categorized by the following descriptions: significant increase; moderate increase; unchanged; moderate decrease; and significant decrease. Even though movements such as significant increase and significant decrease occur less frequently than the other types of movements, the importance of capturing the former types of movements is readily apparent.
The above examples share a common characteristic; that is, different events may exhibit different rates of occurrence. Further, the above examples also show that patterns involving infrequent events are also of great importance. Thus, it would be desirable and highly advantageous to identify any strong tendency incurred by an event (or combination of events). Consider the following event sequence, a
1
a
1
a
1
a
1
a
1
a
2
a
1
a
1
a
2
a
1
a
3
a
2
a
3
a
1
a
3
a
3
a
1
a
3
a
1
a
1
. The event a
1
occurs 12 times while the event a
2
occurs only 3 times. Despite its low frequency, the occurrence of a
2
follows a very strong pattern. In particular, an occurrence of a
2
is followed by another occurrence of a
2
three positions later. This type of tendency or pattern in the event sequence may be very crucial in some application domains.
A closer look will now be taken at the following two patterns (a
1
, *, *) and (a
2
, *, *), wherein * represents an “eternal event”. An eternal event is a virtual event that matches any event. Thus, an eternal event can be used to represent the “don't care” position in a pattern. An eternal event may also be referred to as “non-distinct”, in which case an event such as a
2
may also be referred to as “distinct”. Even though the support of (a
1
, *, *) is greater than that of (a
2
, *, *) (due to the fact that al occurs much more frequently than a
2
), no strong tendency is preserved by the occurrence of a
1
. In the support/confidence model, in order to find pattern (a
2
, *, *), the support threshold has to be set lower than 3. In general, this would lead to the examination of a larger number of patterns which do not exhibit a strong tendency (e.g., (a
1
, *, *)). This indicates that support is not necessarily a proper measurement in evaluating periodic patterns. Thus, a new metric to measure the significance of a pattern needs to be designed to meet this goal. The support/confidence thresholds can be viewed as “absolute” bounds applied to every event irrespective of their occurrence frequencies.
Thus, it would be desirable and highly advantageous to have a new metric for use in data mining which overcomes the deficiencies of the metrics of the prior art. Moreover, it would be desirable and highly advantageous to have a method for identifying significant patterns in subsequences of an event sequence.
SUMMARY OF THE INVENTION
The problems stated above, as well as other related problems of the prior art, are solved by the present invention, a method for identifying partial periodic patterns in event sequences.
According to a first aspect of the invention, there is provided a method for mining partial periodic patterns in an event sequence, wherein each pattern includes a list of events from the event sequence. At least one subsequence of the event sequence and at least one pattern in the at least one subsequence are identified, such that an information gain of the at least one subsequence with respect to the at least one pattern exceeds a predefined threshold. At least one of the at least one pattern and the at least one subsequence is stored.
According to a second aspect of the invention, each of the events in a given pattern is one of distinct and non-distinct, wherein each non-distinct event matches any of the distinct events.
According to a third aspect of the invention, the method further includes the step of determining the information gain of the at least one subsequence with respect to the at least one pattern. The determining step includes the step of dividing the at least one subsequence into contiguous segments, wherein each contiguous segment is a same length as the at least one pattern. For each contiguous segment, the information gain of the contiguous segment is calculated, when all of the events of the contiguous segment match the corresponding events of the at least one pattern. The information loss of the contiguo
Wang Wei
Yang Jiong
Yu Philip Shi-lung
F. Chau & Associates LLP
Follansbee John A.
Hirl Joseph P.
LandOfFree
Methods for identifying partial periodic patterns of... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Methods for identifying partial periodic patterns of..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Methods for identifying partial periodic patterns of... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3011576