Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique
Reexamination Certificate
2000-06-02
2004-04-06
Patel, Ramesh (Department: 2121)
Data processing: artificial intelligence
Knowledge processing system
Knowledge representation and reasoning technique
C706S055000
Reexamination Certificate
active
06718317
ABSTRACT:
BACKGROUND
1. Technical Field
The present invention generally relates to data mining and, more particularly, to identifying partial periodic patterns and corresponding event subsequences in an event sequence.
2. Background Description
Periodic pattern discovery is an important problem in mining time series data. The discovery of sequential patterns in time series data was first described by Agrawal et al., in “Mining Sequential Patterns”, Proceedings of the International Conference on Data Engineering (ICDE), Taipei, Taiwan, pp. 3-14, March 1995. The input data in a mining process is a set of sequences, called data sequences. Each data sequence is a list of transactions. Typically, there is a transaction time associated with each transaction. A sequential pattern also consists of transactions, i.e., sets of events. In general, time series data mining requires that all sequential patterns with a user-specified minimum “support” be found. The support of a sequential pattern is the number of occurrences of the pattern within an event sequence.
According to the prior art, methods have been created for efficiently mining partial periodic patterns by exploring properties related to partial periodicity, such as the Apriori Property and the max-subpattern hit set property. Such methods are described by Han et. al., in “Efficient Mining Partial Periodic Patterns in Time Series Database”, Proc. Int. Conf. on Data Engineering, 106-115, 1999. However, these methods require a predefined period. In addition, these methods assume that the disturbance within a series of repetitions of a pattern, if any, would not result in the loss of synchronization of subsequent occurrences of the pattern with respect to the previous occurrences of the pattern. For example, “Joe Smith reads the newspaper every morning” is a periodic pattern. Even though Joe may occasionally not read the newspaper every morning, this disturbance will not affect the fact that Joe reads the newspaper during subsequent mornings. In other words, disturbance is allowed only in terms of “missing occurrences” but not in terms of the “insertion of random noise events”. However, this assumption is often too restrictive since it results in a failure to detect an interesting pattern, when some occurrences of the pattern are misaligned due to inserted noise events.
With respect to a more desirable approach to data mining, consider the following example, which is directed to the application of inventory replenishment. The history of inventory refill orders can be regarded as an event sequence. Assume that the time between two replenishments of cold medicine is generally one month. The refill order is filed at the beginning of each month before a major outbreak of flu which, in turn, causes an additional refill at the third week. Afterwards, even though the replenishment frequency is back to once each month, the refill time shifts to the 3rd week of each month (and not the beginning of each month, as was the case previously). Therefore, it would be desirable for the pattern to still be recognized when the disturbance is within some reasonable threshold.
In addition, the system behavior may change over time. Some patterns may not be present all of the time (but rather within some time interval). Therefore, it would be desirable to find periodic patterns that are significant within a subsequence of events, even if such patterns contain disturbances having a length up to a certain predefined threshold.
SUMMARY OF THE INVENTION
The present invention is directed to identifying partial periodic patterns and corresponding event subsequences in an event sequence. That is, the subsequence in which each pattern is significant is also identified.
According to a first aspect of the invention, there is provided a method for identifying partial periodic patterns in an event sequence, wherein pattern includes a list of events from the event sequence. At least one subsequence of the event sequence and at least pattern in the at least one subsequence is identified, such that the at least one pattern exceeds a minimum number of repetitions in the at least one subsequence, and a distance between successive repetitions of the at least one pattern in the at least one subsequence does not exceed a predefined distance threshold. At least one of the at least one pattern and the at least one subsequence is stored.
According to a second aspect of the invention, each of the events in a given pattern is one of distinct and non-distinct, and each non-distinct event matches any of the distinct events.
According to a third aspect of the invention, there is provided a method for identifying partial periodic patterns in an event sequence, wherein each pattern includes a list of events from the event sequence. A set CPP of candidate pattern periods and a plurality of sets CE of candidate events are identified, wherein each candidate pattern period in the set CPP corresponds to one of the plurality of sets CE of candidate events. At least one pattern comprised of the candidate events in one of the plurality of sets CE and having a same period as the corresponding candidate pattern period in the set CPP is identified. Further, at least one subsequence of the event sequence that comprises the at least one pattern is identified. The identification of the at least one pattern and the at least one subsequence is such that the at least one pattern exceeds a minimum number of repetitions in the at least one subsequence, and a distance between successive repetitions of the at least one pattern in the at least one subsequence does not exceed a predefined distance threshold. At least one of the at least one pattern and the at least one subsequence is stored.
According to a fourth aspect of the invention, the period of each of the candidate pattern periods in the set CPP is shorter than a pre-specified maximum period length.
According to a fifth aspect of the invention, the at least one subsequence is the longest event subsequence including the at least one pattern.
According to a sixth aspect of the invention, the step of identifying the set CPP of candidate pattern periods and the plurality of sets CE of candidate events includes the step of initializing a plurality of distance counters. Each distance counter is associated with one of a plurality of event and period combinations, each combination including one of the periods from the set CPP and one of the events from the corresponding set CE of candidate events. A value of each of the distance counters is updated to reflect a distance from the associated event to a previous occurrence of the associated event. The event and period combinations are identified such that the value of the distance counters associated with each of the identified combinations exceeds the minimum number of repetitions.
According to a seventh aspect of the invention, each of the periods in the plurality of event and period combinations is less than the pre-specified maximum period length.
According to an eighth aspect of the invention, the step of identifying the set CPP of candidate pattern periods and the plurality of sets CE of candidate events includes the step of identifying singular patterns and subsequences that comprise the singular patterns, wherein each singular pattern has only one position filled by one of the events in the event sequence. Complex patterns are identified and validated, wherein each complex pattern has more than one position filled by one of the events in the event sequence.
According to a ninth aspect of the invention, the method further comprises the step of validating singular patterns.
REFERENCES:
patent: 5259040 (1993-11-01), Hanna
patent: 5379366 (1995-01-01), Noyes
patent: 5517642 (1996-05-01), Bezek et al.
patent: 5579441 (1996-11-01), Bezek et al.
patent: 5594837 (1997-01-01), Noyes
patent: 5615309 (1997-03-01), Bezek et al.
patent: 5819266 (1998-10-01), Agrawal et al.
patent: 5832482 (1998-11-01), Yu et al.
patent: 5878406 (1999-03-01), Noyes
patent: 6185559 (2001-02-01), Brin et al.
patent: 2001/0043721 (2001-11-01
Wang Wei
Yang Jiong
Yu Philip Shi-lung
F. Chau & Associates LLP
Holmes Michael B.
Patel Ramesh
LandOfFree
Methods for identifying partial periodic patterns and... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Methods for identifying partial periodic patterns and..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Methods for identifying partial periodic patterns and... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3197440