Electrical computers and digital processing systems: multicomput – Computer network managing – Computer network monitoring
Reexamination Certificate
Filed: 2000-05-23
Issued: 2004-07-20
Examiner: Alam, Hosain (Department: 2155)
U.S. Classification: C709S223000, C600S300000, C600S301000, C367S039000
Status: active
Patent Number: 06766368
FIELD OF THE INVENTION
The present invention relates to distributed event management in telecommunication and data networks, and more particularly to the use of knowledge-based and distributed systems technologies for performing event correlation and notification for network fault, performance and test management.
DESCRIPTION OF THE RELATED ART
Since the first computer network came “online,” there have been network problems, disorders, and anomalies that periodically occur in the network hardware, software, or both. They are sometimes spurious, transient, redundant, time-correlated, or too numerous to be handled at the same time. Given the size and dynamic nature of modern telecommunication and data networks, it is no wonder that the task of identifying network problems continues to baffle software engineers the world over. Exacerbating the problem is the reality that a single fault may sometimes result from a hardware problem and other times from a software problem. With the explosive growth in the size and complexity of networks, it is also not uncommon for a burst of alarms during a major network failure to reach 100, 200, or more alarms per second. Under these conditions, systems personnel of all experience levels confront an inability to follow the stream of incoming events, often leading to alarms being noticed too late, or not at all. When the alarms are eventually noticed, all too often corrective measures are determined based on a single alarm or an incomplete subset of the active alarms, potentially complicating the already onerous situation.
Such delays can be costly in large networks, which are heavily relied upon to quickly move vast amounts of data in short periods of time to carry out the normal course of business. For example, large financial institutions rely upon such systems to reflect the transfer of large sums of money electronically. Loss of that ability even for a relatively short period of time may be very costly to the institution and its clients. Similarly, airlines rely upon such systems to track passenger reservations. Loss of that ability can result in flight delays or cancellations and loss of customers.
In an effort to assist network management personnel in resolving these problems, a variety of network management systems to monitor network operations have been developed. These systems were generally capable of performing network surveillance and monitoring functions, and in some cases they were able to diagnose simple network faults.
As the size and complexity of networks grew, it became clear that the traditional network management systems could no longer simply report problems, and instead required intelligent analysis and diagnostic capabilities in order to be effective. Such a system must monitor network events, associate related events with each other, infer possible root causes of events, determine the impact of events on network traffic, present the current state of the network, and recommend appropriate actions. In other words, the network management systems must exhibit some level of intelligence in analyzing the incoming events, understanding the surrounding management context, testing connectivity between network elements, identifying patterns in the stream of events, and suggesting corrective actions. The systems should be able to explain their actions, learn from their past behavior, and present the results in a form easily comprehensible to the network management personnel. To a very large extent, many of the functions listed above are based on a fundamental capability of real-time event correlation. Formally, event correlation is a conceptual interpretation procedure that assigns new meaning to a set of events. Algorithmically, event correlation is a dynamic pattern matching process over a stream of events. These events may include: raw events, status and clear messages from network elements (NEs); events from mediation devices, subnetwork management systems, test systems, environmental sensors and other equipment; user action messages from network operator terminals; and system interrupts. In addition to the real-time events, the correlation patterns may include network topology information (e.g. network connectivity), diagnostic test data, data from external databases, and other ancillary information. Event correlation enables several event management tasks, including: (1) reducing information load by dynamic focus monitoring and context-sensitive event suppression and filtering; (2) increasing the semantic content of information through generalization of events; (3) fusing information from multiple sources; (4) real-time fault detection, causal fault diagnosis, and suggestion of corrective actions; (5) ramification analysis of events and prediction of system behavior; and (6) long-term trending of historic events.
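As an illustration of the "dynamic pattern matching over a stream of events" view described above, the following Python sketch matches simple multi-event patterns over incoming events and emits higher-level, correlated events. It is a minimal, hypothetical example rather than the system claimed in this patent; the names Event, Rule, and Correlator, the rule format, and the sliding time window are assumptions made purely for illustration.

```python
# Minimal illustrative sketch of rule-based event correlation over a stream.
# The class names, rule format, and time-window semantics are assumptions
# for illustration; they are not the patent's claimed implementation.
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                    # e.g. "LINK_DOWN", "LOSS_OF_SIGNAL"
    source: str                  # originating network element
    timestamp: float = field(default_factory=time.time)


@dataclass
class Rule:
    name: str
    required_kinds: set          # event kinds that must all be observed...
    window_seconds: float        # ...within this sliding time window
    conclusion: str              # higher-level event to emit on a match


class Correlator:
    """Matches simple multi-event patterns over an incoming event stream."""

    def __init__(self, rules):
        self.rules = rules
        self.history = []        # recent events kept for window matching

    def feed(self, event):
        self.history.append(event)
        now = event.timestamp
        # Drop events that have fallen outside every rule's window.
        max_window = max(r.window_seconds for r in self.rules)
        self.history = [e for e in self.history if now - e.timestamp <= max_window]

        correlated = []
        for rule in self.rules:
            recent_kinds = {e.kind for e in self.history
                            if now - e.timestamp <= rule.window_seconds}
            if rule.required_kinds <= recent_kinds:
                correlated.append(Event(rule.conclusion, event.source, now))
        return correlated


if __name__ == "__main__":
    rules = [Rule("fiber-cut", {"LINK_DOWN", "LOSS_OF_SIGNAL"}, 5.0,
                  "ROOT_CAUSE_FIBER_CUT")]
    correlator = Correlator(rules)
    correlator.feed(Event("LINK_DOWN", "NE-7"))
    print(correlator.feed(Event("LOSS_OF_SIGNAL", "NE-7")))  # -> correlated root-cause event
```

In this sketch the "pattern" is simply a set of event kinds that must co-occur within a time window; real correlation patterns, as the text notes, may also draw on topology, test data, and external databases.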
Real-time event correlation has been used for well over a decade, with applications in various fields, not the least of which is network management. Today, event correlation has become one of the most critical functions for managing the high volume of event messages. Practically speaking, no network management system can effectively conduct network surveillance and control procedures without some form of event correlation. In fact, event correlation has become so instrumental in identifying obscure network problems that network management software developers have begun to broaden the utility of event correlation to other aspects of network management, such as performance, configuration, testing, security, and service quality management.
An event, in the context of event correlation, reflects a change in the state of an object, system, or process. System-internal events, e.g. failures, may be manifested by associated external events (alarms). However, in very many cases internal failures are not signaled by any alarms at all. The opposite phenomenon arises when too many alarms are generated by cascaded network element failures caused by a single root failure. In this situation, appropriate alarm correlation and filtering methods should be applied in order to detect the root cause of the “alarm storm.” Event correlation is the process of observing a series of events that occur over a period of time and then interpreting those events. The act of interpreting the events ranges from the simple task of event compression to a complex pattern-matching operation.
A more detailed discussion of the specific classes of event correlation will now be provided with reference to FIG. 1. As shown in FIG. 1, the classes of event correlation include: compression, filtering, suppression, count, escalation, generalization, specialization, temporal relation, and clustering. Event compression is the task of reducing multiple occurrences of identical events into a single representation of the events. The number of occurrences of the event is not taken into account. The meaning of the compression correlation is almost identical to that of the single event “a,” except that additional contextual information is assigned to the event to indicate that this event happened more than once.
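As a concrete illustration of event compression, the following Python sketch collapses identical events into a single representative that merely records that the event occurred more than once, without keeping a count. The dictionary-based event representation and the "repeated" flag are assumptions for illustration, not terminology from the patent.

```python
# Illustrative sketch of event compression: multiple identical events are
# reduced to a single representative carrying a "repeated" marker.
def compress(events):
    """Collapse identical (kind, source) events into single representatives."""
    seen = {}
    for event in events:
        key = (event["kind"], event["source"])
        if key in seen:
            # Per the compression semantics above, we only record that the
            # event occurred more than once; the exact count is not kept.
            seen[key]["repeated"] = True
        else:
            seen[key] = dict(event, repeated=False)
    return list(seen.values())


alarms = [{"kind": "LINK_DOWN", "source": "NE-7"},
          {"kind": "LINK_DOWN", "source": "NE-7"},
          {"kind": "HIGH_CPU", "source": "NE-3"}]
print(compress(alarms))
# -> one LINK_DOWN event marked repeated=True, plus the HIGH_CPU event
```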
Event filtering provides that if a parameter p(a) (e.g., priority, type, etc.) of alarm “a” does not fall into the set of predefined values H, then alarm “a” is discarded or sent to a log file. In more sophisticated cases, the set H could be dynamic and depend on user-specified criteria or criteria calculated by the system.
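The filtering rule above maps directly onto a small predicate check. The following Python sketch forwards an alarm only if its selected parameter falls into the allowed set H and otherwise sends it to a log; the parameter name "priority", the contents of H, and the list-based log are assumptions for illustration.

```python
# Illustrative sketch of event filtering against a set of allowed values H.
def filter_alarm(alarm, H, parameter="priority", log=None):
    """Return the alarm if its parameter is in H; otherwise log and drop it."""
    if alarm.get(parameter) in H:
        return alarm
    if log is not None:
        log.append(alarm)          # discarded alarms go to a log
    return None


log = []
H = {"critical", "major"}          # in sophisticated cases H could be computed dynamically
print(filter_alarm({"kind": "LINK_DOWN", "priority": "critical"}, H, log=log))
print(filter_alarm({"kind": "HEARTBEAT", "priority": "info"}, H, log=log))  # -> None, logged
```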
Event suppression is a context-sensitive process in which event “a” is temporarily inhibited depending on the dynamic operational context C of the network. The context C is determined by the presence of other events, network management resources, management priorities, or other external requirements. A change in C could later lead to the reporting of the suppressed event. Temporary suppression of multiple events and control of the order in which they are presented are two techniques for dynamic focus monitoring of the network management process.
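To make the context-sensitivity concrete, the following Python sketch holds events while the operational context inhibits them and releases them when the context changes. Representing the context C as a set of flags, and the mapping from event kind to inhibiting flag, are assumptions made for illustration only.

```python
# Illustrative sketch of context-sensitive event suppression.
class Suppressor:
    def __init__(self, inhibiting_contexts):
        # Maps event kind -> context flag that temporarily inhibits it.
        self.inhibiting_contexts = inhibiting_contexts
        self.held = []                         # temporarily suppressed events

    def feed(self, event, context):
        flag = self.inhibiting_contexts.get(event["kind"])
        if flag is not None and flag in context:
            self.held.append(event)            # inhibit for now; may be reported later
            return []
        return [event]

    def on_context_change(self, context):
        # A change in the context C may release previously suppressed events.
        released = [e for e in self.held
                    if self.inhibiting_contexts[e["kind"]] not in context]
        self.held = [e for e in self.held if e not in released]
        return released


s = Suppressor({"PORT_DOWN": "card_maintenance"})
print(s.feed({"kind": "PORT_DOWN", "source": "NE-7"}, context={"card_maintenance"}))  # [] (suppressed)
print(s.on_context_change(context=set()))  # suppressed event is now reported
```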
Count is the process of counting and thresholding the number of repeated arrivals of identical events.
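A count correlation can be sketched as a per-event counter that raises a derived event when a threshold is crossed, as in the following Python example. The threshold value, the derived event name, and the (kind, source) key are assumptions for illustration.

```python
# Illustrative sketch of the "count" correlation: repeated arrivals of an
# identical event are counted and a derived event is raised at a threshold.
from collections import Counter


class CountCorrelation:
    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def feed(self, event):
        key = (event["kind"], event["source"])
        self.counts[key] += 1
        if self.counts[key] == self.threshold:
            return {"kind": f"REPEATED_{event['kind']}",
                    "source": event["source"],
                    "count": self.counts[key]}
        return None


counter = CountCorrelation(threshold=3)
for _ in range(3):
    derived = counter.feed({"kind": "LINK_FLAP", "source": "NE-7"})
print(derived)  # -> REPEATED_LINK_FLAP raised on the third identical event
```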
Inventors: Jakobson Gabriel; Pathak Girish
Examiners: Alam Hosain; Wang Liang-che
Attorneys: Finnegan, Henderson, Farabow; Suchyta, Esq. Leonard C.
Assignee: Verizon Laboratories Inc.