System and method for systematic construction of correlation...

Data processing: artificial intelligence – Knowledge processing system – Knowledge representation and reasoning technique

Reexamination Certificate

Details

active

06697791

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to network and systems management and, more particularly, to techniques for generating correlation rules for use in detecting and resolving availability and performance problems.
BACKGROUND OF THE INVENTION
With the dramatic decline in the price of hardware and software, the cost of ownership for computing devices is increasingly dominated by network and systems management. Included here are tasks such as establishing configurations, help desk support, distributing software, and ensuring the availability and performance of vital services. The latter is particularly important since inaccessible and/or slow services decrease revenues and degrade productivity.
The first step in managing availability and performance is event management. Almost all computing devices have a capability whereby the onset of an exceptional condition results in the generation of a message so that potential problems are detected before they lead to widespread service degradation. Such exceptional conditions are referred to as “events.” Examples of events include: unreachable destinations, excessive central processing unit (CPU) consumption, and duplicate Internet Protocol (IP) addresses. An event message contains multiple attributes, for example: (a) the source of the event; (b) type of event; and (c) the time at which the event was generated.
Event messages are sent to an “event management system (EMS).” An EMS has an “adaptor” that parses the event message and translates it into a normalized form. This normalized information is then placed into an “event database.” Next, the normalized event is fed into a “correlation engine” that determines actions to be taken. This determination is typically driven by correlation rules that are kept in a “rule database.” Examples of processing done by correlation rules include:
1. Elimination of duplicate messages. “Duplicate” is interpreted broadly here. For example, if multiple hosts on the same local area network generate a destination-unreachable message for the same destination, then the events contain the same information.
2. Maintenance of operational state. “State” may be as simple as which devices are up (e.g., operating) and which are down (e.g., not operating). It may be more complex as well, especially for devices that have many intermediate states or special kinds of error conditions (e.g., printers).
3. Problem detection. A problem is present if one or more components of the system are not functioning properly. For example, the controller in a load balancing system may fail in a way so that new requests are always routed to the same back-end web server, a situation that can be tolerated at low loads but can lead to service degradation at a high load. Providing early detection of such situations is important in order to ensure that problems do not lead to widespread service disruptions.
4. Problem isolation. This involves determining the components that are causing the problem. For example, distributing a new release of an application that has software errors can result in problems for all end-users connecting to servers with the updated application. Other examples of causes of problems include: device failure, exceeding some internal limit (e.g., buffer capacity), and excessive resource demands.
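The event flow and the first two kinds of processing listed above (duplicate elimination and state maintenance) can be sketched as follows. This is a minimal illustration, not the design of any particular EMS: the pipe-delimited message format, the `subject` attribute, the `node-down`/`node-up` event types, and the 60-second duplicate window are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str       # (a) device that generated the event
    kind: str         # (b) type of event
    subject: str      # assumed extra attribute: what the event is about
    timestamp: float  # (c) generation time, in seconds


def adapt(raw: str) -> Event:
    """Adaptor: parse an assumed 'source|kind|subject|timestamp' message."""
    source, kind, subject, timestamp = raw.split("|")
    return Event(source, kind, subject, float(timestamp))


class CorrelationEngine:
    """Tracks up/down state and suppresses duplicates within a time window."""

    def __init__(self, window: float = 60.0) -> None:
        self.window = window
        # (kind, subject) -> timestamp of the last event that was forwarded
        self.last_forwarded: Dict[Tuple[str, str], float] = {}
        # subject -> "up" or "down"
        self.state: Dict[str, str] = {}

    def process(self, event: Event) -> bool:
        """Update state; return False when the event is a duplicate."""
        if event.kind == "node-down":
            self.state[event.subject] = "down"
        elif event.kind == "node-up":
            self.state[event.subject] = "up"

        # "Duplicate" is interpreted broadly: same kind and subject,
        # regardless of which host reported it.
        key = (event.kind, event.subject)
        previous = self.last_forwarded.get(key)
        if previous is not None and event.timestamp - previous <= self.window:
            return False
        self.last_forwarded[key] = event.timestamp
        return True


def handle(raw: str, event_db: List[Event], engine: CorrelationEngine) -> bool:
    """Normalize a raw message, store it, then pass it to the engine."""
    event = adapt(raw)
    event_db.append(event)
    return engine.process(event)
```

With this sketch, two hosts reporting `destination-unreachable` for the same destination a few seconds apart yield one forwarded event and one suppressed duplicate, while both remain in the event database.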
The correlation engine provides automation that is essential for delivering cost-effective management of complex computing environments. Existing art provides three kinds of correlation. The first employs operational policies expressed as rules, see, e.g., K. R. Milliken et al., “YES/MVS and the Automation of Operations for Large Computer Complexes,” IBM Systems Journal, vol. 25, no. 2, 1986. Rules are if-then statements in which the if-part tests the values of attributes of individual events, and the then-part specifies actions to take. An example of such a rule is: “If a hub generates an excessive number of interface-down events, then check if the software loaded on the hub is compatible with its hardware release.” Industry experience has been that such rules are difficult to construct, especially if they include installation-specific information.
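As a rough illustration of the if-then form, the hub rule quoted above might be encoded as below. The `Rule` class, the dictionary-based event layout, and the limit of two events are hypothetical choices for this sketch. They are not taken from YES/MVS or any other rule system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]  # assumed layout: {"source": ..., "type": ...}


@dataclass
class Rule:
    name: str
    # if-part: returns the sources for which the condition holds
    condition: Callable[[List[Event]], List[str]]
    # then-part: a description of the action to take
    action: str


def excessive_interface_down(limit: int) -> Callable[[List[Event]], List[str]]:
    """Build an if-part that fires for sources with more than `limit` events."""
    def condition(events: List[Event]) -> List[str]:
        counts = Counter(
            e["source"] for e in events if e["type"] == "interface-down"
        )
        return sorted(src for src, n in counts.items() if n > limit)
    return condition


def evaluate(rules: List[Rule], events: List[Event]) -> List[Tuple[str, str]]:
    """Return (action, source) pairs for every rule whose if-part matches."""
    return [
        (rule.action, source)
        for rule in rules
        for source in rule.condition(events)
    ]


hub_rule = Rule(
    name="hub-interface-flapping",
    condition=excessive_interface_down(limit=2),  # "excessive" is assumed
    action="check software/hardware release compatibility",
)
```

Even in this small form, the installation-specific part is visible: someone has to decide what count is "excessive" for a given site.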
Another approach has been developed by SMARTS (Systems Management Arts) based on the concept of a code book that matches a repertoire of known problems with event sequences observed during operation. This is described in U.S. Pat. No. 5,661,668 issued to Yemini et al. on Aug. 26, 1997 and entitled “Apparatus and Method for Analyzing and Correlating Events in a System Using a Causality Matrix.” Here, operational policies are models of problems and symptoms. Thus, accommodating new problems requires properly modeling their symptoms and incorporating their signatures into a code book. In theory, this approach can accommodate installation-specific problems. However, doing so in practice is difficult because of the high level of sophistication required to encode installation-specific knowledge into rules.
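The code book idea can be illustrated with a deliberately simplified sketch: each known problem is associated with a set of symptoms (its signature), and the observed events are matched to the closest signature. The problem names, symptom names, and the use of set symmetric difference as the distance measure are assumptions for illustration only. This is not the method claimed in the Yemini et al. patent.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical code book: problem -> signature (set of expected symptoms)
CODE_BOOK: Dict[str, FrozenSet[str]] = {
    "link failure": frozenset({"destination-unreachable", "interface-down"}),
    "server overload": frozenset({"high-cpu", "slow-response"}),
}


def distance(signature: FrozenSet[str], observed: Set[str]) -> int:
    """Count symptoms that are expected but missing, or seen but unexpected."""
    return len(signature ^ observed)


def diagnose(code_book: Dict[str, FrozenSet[str]], observed: Set[str]) -> str:
    """Return the problem whose signature is closest to the observed symptoms."""
    return min(code_book, key=lambda problem: distance(code_book[problem], observed))
```

Accommodating a new, installation-specific problem in this sketch means adding a new entry to `CODE_BOOK`, which is exactly the modeling step the text describes as difficult in practice.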
Recently, a third approach to event correlation has been proposed by Computer Associates International called “Neugents.” This approach trains a neural network to predict future occurrences of events based on factors characterizing their occurrence in historical data. Typically, events are specified based on thresholds, such as CPU utilization exceeding 90%. The policy execution system uses the neural network to determine the likelihood of one of the previously specified events occurring at some time in the future. While this technique can provide advanced knowledge of the occurrence of an event, it still requires specifying the events themselves. At a minimum, such a specification requires detailing the following:
1. The variable measured (e.g., CPU utilization);
2. The directional change considered (e.g., too large); and
3. The threshold value (e.g., 90%).
The last item can be obtained automatically from examining representative historical data. Further, graphical user interfaces can provide a mechanism to input the information in items (2) and (3). However, it is often very difficult for installations to choose which variables should be measured and the directional change that constitutes an exceptional situation.
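The three-part event specification, and the idea of deriving the threshold from historical data, can be sketched as follows. The `ThresholdSpec` class is hypothetical, and the nearest-rank percentile is only one assumed way of obtaining a threshold from history. The source does not say which method is used.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThresholdSpec:
    variable: str     # 1. the variable measured, e.g. "cpu_utilization"
    direction: str    # 2. the directional change: "above" or "below"
    threshold: float  # 3. the threshold value

    def is_exceptional(self, value: float) -> bool:
        """True when the value crosses the threshold in the given direction."""
        if self.direction == "above":
            return value > self.threshold
        return value < self.threshold


def nearest_rank_percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of historical samples (assumed method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical historical CPU utilization readings, in percent.
history = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

spec = ThresholdSpec(
    variable="cpu_utilization",
    direction="above",
    threshold=nearest_rank_percentile(history, 90),
)
```

Note that only the third field is computed here. The variable and the direction still have to be chosen by someone who knows the installation.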
To summarize, the above-described existing art for event management systems is of three types. The first type (e.g., as in the K. R. Milliken et al. article, 1986) requires that correlation rules be specified by experts, a process that is time-consuming and expensive. The second type (e.g., as in the Yemini et al. patent) reduces the involvement of experts but only for aspects of event management that share broad commonalities (e.g., IP connectivity). The third type (e.g., Computer Associates International's Neugent software, 1999) attempts to automate the construction of correlation rules for a broader range of management areas. However, to date, this has not been done in a manner that provides for customization by experts, especially in a way that avoids dealing with low-level details (e.g., specific threshold values, the choice of measurement values, and directional changes of interest for these variables).
Other work relating to the construction of correlation rules includes: (a) statistical process control, which provides a way to set baseline levels of continuously operating machines, e.g., D. M. Thompson et al., “Examination of the Potential Role of the Internet in Distributed SPC and Quality Systems,” Quality and Reliability Engineering International, vol. 16, no. 1, 2000; (b) visual programming for rule-based systems, which overcomes some of the syntactic problems of rule construction, e.g., W. Mueller et al., “A Visual Framework for the Scripting of Parallel Agents,” IEEE International Symposium on Visual Languages, Seattle, Wash., September 2000; and (c) event management design, which provides a process driven by human experts to construct correlation rules, e.g., D. Thoenen et al., “Event Relationship Networks: A Framework for Action Oriented Analysis in Event Management,” IBM Research Report RC 21843, October 2000.
SUMM

Profile ID: LFUS-PAI-O-3316916
