Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-12-17
2003-02-25
Robinson, Greta (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000, C707S793000
Reexamination Certificate
active
06526405
ABSTRACT:
TECHNICAL FIELD
The invention relates generally to the field of data mining. More particularly, the invention relates to the field of data mining for sequential data that represents event sequences.
BACKGROUND OF THE INVENTION
Most data mining research has concentrated on set-oriented tabular data. There are, however, important types of data that do not fit within this framework. One such form of data is event sequences that occur in many application areas. An event sequence is an ordered collection of events from a finite set of event types, with each event of the sequence having an occurrence time.
One example of an event sequence is the event or error log from a process such as telecommunications network management. Here the event types are the possible error messages, and the events are actual occurrences of errors at certain times. Also a web access log from a single session of a user can be viewed as an event sequence. In this case, the event types are the web pages, and an individual event is a request for a particular page at a particular time. Other examples of application areas in which event sequences occur are user interface design (event types are different user commands), criminology (types of crime), biostatistics (different symptoms), etc. In each of these applications, the data consists of one or several event sequences. Note that an event sequence is different from a time series in that a time series describes a variable with a continuous value over time, whereas an event sequence consists of discrete events happening in more or less random fashion.
One interesting aspect of event sequences is the degree of similarity between two event types. The definition of similarity may vary depending on the application. For example, if the application is web browsing behavior, two web pages may be considered similar if they convey the same type of information. If the example is computer user interface design, two commands may be similar if they accomplish the same function. Such similarity information is useful in itself, as it provides insight into the data. The similarity of web pages may be used to determine where links may be appropriate. The similarity of computer user commands may be used to select alternative commands to be suggested to the computer user. Moreover, similarities between event types can be used in various ways to make querying the data set more useful.
Because of the increasing difficulty and expense of determining similarity between event types in large sets of event sequences, it is desirable to provide a method of determining similarity between event types by merely scanning and interpreting the data in the database. Such a method would do away with the need to compare the content of event types to determine their similarity.
SUMMARY OF THE INVENTION
Determining a degree of relative similarity between event types in event sequences by examining the context of the event types facilitates similarity analysis of large sets of event sequences.
An event sequence may be defined as an ordered collection of events from a finite set of event types, with each event of the sequence having an occurrence time. An event may be defined as a tuple containing two elements, an event type and an occurrence time.
The present invention determines the similarity between a plurality of event types in at least one event sequence. The invention may be implemented in the form of program modules or computer-executable instructions stored on a computer readable medium or in the form of an apparatus having means to process data stored on a database.
In one exemplary embodiment of the invention, an event sequence is examined, and a context for each of a plurality of event types is compiled. These contexts are then compared to determine a degree of similarity between the corresponding event types.
A context set is compiled by recording the events which occur within a predetermined time period of each occurrence of the event type. The context set may include event types which occur during the time period before each occurrence of the event type or event types both before and after the event type.
To compare the contexts, the context sets are transformed into vector representations and then the distance between the vectors is calculated. The context sets may be transformed into vectors having the same number of dimensions as the finite set of event types has elements, with each dimension corresponding to an event type and the value of each dimension being based on the number of times the corresponding event type occurs in the context set. The context sets may be transformed into centroid vectors.
Conventional distance calculations may be employed for determining the distance between vectors, such as the Euclidean distance formula.
REFERENCES:
patent: 5664174 (1997-09-01), Agrawal et al.
patent: 5666442 (1997-09-01), Wheeler
patent: 5724573 (1998-03-01), Agrawal et al.
patent: 5742811 (1998-04-01), Agrawal et al.
patent: 5819266 (1998-10-01), Agrawal et al.
patent: 5933821 (1999-08-01), Matsumoto et al.
patent: 5940825 (1999-08-01), Castelli et al.
patent: 6023571 (2000-02-01), Matsumoto et al.
patent: 6092065 (2000-07-01), Floratos et al.
patent: 6108666 (2000-08-01), Floratos et al.
patent: 6122628 (2000-09-01), Castelli et al.
patent: 6134541 (2000-10-01), Castelli et al.
patent: 6278998 (2001-08-01), Ozden et al.
patent: 6289354 (2001-09-01), Aggarwal et al.
R. Agrawal, C. Faloutsos, and A. Swami. Efficiency similarity search in sequence databases. InProceedings of the 4th International Conference on Foundations of Data Organization and Algorithms(FODO '93), pp. 69-84, Chicago, Illinois, USA, Oct. 1993. Springer-Verlag.
R. Agrawal, K-I. Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. InProceedings of the 21st International Conference on Very Large Databases(VLDB '95), pp. 490-501, Zurich, Switzerland, Sep. 1995. Morgan Kaufmann.
D.Q. Goldin and P.C. Kanellakis. On similarity queries for time-series data: Constraint specification and implementation. InProceedings of the 1st International Conference on Principles and Practice of Constraint Programming(CP '95), Cassis, France, Sep. 1995, Springer-Verlag.
P. Laird. Identifying and using patterns in sequential data. InAlgorithmic Learning Theory, 4th International Workshop, pp. 1-18, Berlin, 1993. Springer-Verlag.
H. Mannila and P. Ronkainen. Similarity of event sequences. InProceedings of the Fourth International Workshop on Temporal Representation and Reasoning(Time '97), pp. 136-139, Daytona, Florida, USA, May 1997, IEEE Computer Society Press.
D. Rafiei and A. Mendelzon. Similarity-based queries for time series data. SigmodRecord(ACM Special Interest Group on Management of Data), 26(2):13-25, May 1997.
P. Ronkainen, Attribute similarity and event sequence similarity in data mining. PhLic thesis, Report C-1998-42, University of Helsinki, Department of Computer Science, Helsinki, Finland, Oct. 1998.
Mannila Heikki O.
Moen Pirjo
Le Debbie M.
Microsoft Corporation
Robinson Greta
Watts, Hoffmann, Fisher & Heinke Co. L.P.A.
LandOfFree
Determining similarity between event types in sequences does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Determining similarity between event types in sequences, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Determining similarity between event types in sequences will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3176738