Reexamination Certificate
1998-10-28
2001-06-26
Iqbal, Nadeem (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C370S216000
active
06253339
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Technical Field of the Invention
This invention relates to software fault management systems and, more particularly, to a method of correlating multiple network alarms in a large communications network.
2. Description of Related Art
In communications networks, a single network fault may generate a large number of alarms over space and time. In large, complex networks, simultaneous network faults may occur, causing the network operator to be flooded with a high volume of alarms. The high volume of alarms greatly inhibits the ability to identify and locate the responsible network faults.
In the 1997 IEEE paper, “Fault Isolation and Event Correlation for Integrated Fault Management,” the authors, S. Katker and M. Paterok, describe a state-of-the-art algorithm for alarm correlation. The Katker and Paterok algorithm, however, has several disadvantages. First, the algorithm processes alarms very inefficiently. As noted above, a single fault may trigger a large number of network alarms. For example, one fibre cut can result in hundreds of thousands of alarms being reported from circuits supported by the fibre. The Katker and Paterok algorithm initiates a large number of computing threads, each of which ultimately reaches the same conclusion. Thus, an excessive amount of time and computational resources is consumed. Additionally, the Katker and Paterok algorithm fails to correlate network element (NE) alarms that are caused by a faulty NE that does not itself generate an alarm.
In order to overcome the disadvantages of existing solutions, it would be advantageous to have a system and method of correlating large numbers of network alarms which greatly reduces the time and computational resources utilized, and supports near real-time alarm correlation. The present invention provides such a system and method.
SUMMARY OF THE INVENTION
In one aspect, the present invention is a system for correlating alarms from a plurality of network elements (NEs) in a large communications network. The system comprises a plurality of alarm reporters that report alarms from the NEs when faults are detected, and an alarm correlator that partitions the alarms into correlated alarm clusters such that alarms of one cluster have a high probability that they are caused by one network fault.
In another aspect, the present invention is a method of correlating alarms from the NEs in a large communications network. The method includes the steps of collecting a plurality of uncorrelated alarms from the NEs, and partitioning the alarms into correlated alarm clusters such that alarms of one cluster have a high probability that they are caused by one network fault. The step of partitioning the alarms into correlated alarm clusters may include the steps of creating alarm sets, expanding the alarm sets into alarm domains, and merging the alarm domains into alarm clusters if predefined conditions are met. Two alarm domains are merged into one alarm cluster if, and only if, the two domains have at least one common NE, at least one of the common NEs is not tagged, and the majority (as defined by the network operator) of the NEs contained by that non-tagged common NE are not in an alarmed state.
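The three-part merge condition described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `NE` record, its field names (`tagged`, `alarmed`, `contained`), the functions `should_merge` and `merge_domains`, and the `majority` threshold of 0.5 are all hypothetical names chosen for illustration. The sketch takes alarm domains as ready-made sets of NE names and does not attempt the earlier steps of creating alarm sets or expanding them into domains, which this section does not detail. It also assumes that a non-tagged common NE with no contained NEs satisfies the third condition, a case the text leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class NE:
    """Hypothetical network element record; all field names are assumptions."""
    name: str
    tagged: bool = False
    alarmed: bool = False
    contained: list = field(default_factory=list)  # names of NEs this NE contains


def should_merge(dom_a, dom_b, nes, majority=0.5):
    """Return True if two alarm domains (sets of NE names) meet the merge rule."""
    for name in dom_a & dom_b:        # condition 1: the domains share an NE
        ne = nes[name]
        if ne.tagged:                 # condition 2: the shared NE must be non-tagged
            continue
        children = [nes[c] for c in ne.contained]
        if not children:
            # Assumption: with no contained NEs, condition 3 holds vacuously.
            return True
        quiet = sum(1 for c in children if not c.alarmed)
        # condition 3: an operator-defined majority of contained NEs is not alarmed
        if quiet / len(children) > majority:
            return True
    return False


def merge_domains(domains, nes, majority=0.5):
    """Repeatedly merge any pair of domains that satisfies should_merge."""
    clusters = [set(d) for d in domains]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_merge(clusters[i], clusters[j], nes, majority):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters
```

In this sketch, two domains that share a non-tagged NE whose contained NEs are mostly quiet collapse into one cluster, while domains that share only a tagged NE stay separate. Raising `majority` makes merging stricter, which reflects the operator-defined threshold mentioned in the text.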
REFERENCES:
patent: 5309448 (1994-05-01), Bouloutas et al.
patent: 5473596 (1995-12-01), Garafola et al.
patent: 5495470 (1996-02-01), Tyburski et al.
patent: 5500853 (1996-03-01), Engdahl et al.
patent: 5646864 (1997-07-01), Whitney
patent: 5737319 (1998-04-01), Croslin et al.
patent: 5748098 (1998-05-01), Grace
patent: 5768501 (1998-06-01), Lewis
patent: 5946373 (1999-08-01), Harris
patent: 5949759 (1999-09-01), Cretegny et al.
patent: 6000045 (1999-12-01), Lewis
patent: 6012152 (2000-01-01), Douik et al.
patent: 6124790 (2000-09-01), Golov et al.
patent: 0549937A1 (1993-07-01), None
patent: 2318479 (1998-04-01), None
S. Kätker and M. Paterok; “Fault Isolation and Event Correlation for Integrated Fault Management”; Proc. 5th IFIP/IEEE International Symposium on Integrated Network Management; 1997; pp. 583-595.
Gosselin Nicolas
Tse Edwin
Iqbal Nadeem
Smith, Danamraj & Youst, P.C.
Telefonaktiebolaget LM Ericsson (publ)