Distributed fault management architecture

Multiplex communications – Fault recovery

Reexamination Certificate


Details

C370S242000

active

06665262

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Technical Field of the Invention
The present invention relates in general to the communications field and, in particular, to a system and method for distributing control of the fault management functions throughout a communications network.
2. Description of Related Art
The tasks of managing and controlling the performance of distributed communications networks (e.g., distributed data networks or distributed telecommunications networks) are becoming increasingly complex due to a number of crucial factors, such as, for example, the increased complexity, dynamism and diversity of the network technologies, the spread of advanced services with very distinct requirements (e.g., live video, file transfers, etc.), and the heightened expectations of the users being served. Other crucial factors that impact network complexity are the progressive deregulation of the telecommunications industry, and the highly competitive market that has emerged as a result.
In order to survive in such an environment, a distributed communications network operator must manage the network so that its utilization is maximized (i.e., ensure a maximum return on the investment), while ensuring that all offered services perform within expected bounds. In order to perform such tasks, the operator's personnel need certain support tools that help them to manage the tasks with their complexities. In particular, certain distributed, dynamically changing networks, such as, for example, the next generation Internet and so-called third generation mobile communication networks will require a level of operational support that is not provided by today's support systems.
Operation and Support Systems (OSS) typically function to perform routine support tasks in data communications and telecommunications systems, such as, for example, traffic measurements, network supervision and performance management, analyses, fault diagnoses, administrative tasks, etc. The current approach used for network performance and fault management in the OSS industry typically involves a number of applications residing on a software platform. The software platform usually supports separate applications for monitoring network performance information, managing alarm conditions, and handling of common functions in order to initiate management operations for network resources. Normally, these applications are not integrated to a great extent, other than that they share the same platform facilities. Consequently, it is the operator who has to correlate the performance and alarm information, and where necessary, decide what actions are appropriate to take with regard to improving network performance.
As such, most of the support systems involved are centralized in a single, monolithic management center, or in some cases, distributed or spread across a relatively small number of geographically distinct management centers. In some of the distributed system cases, the main reason for the distribution is the distributed nature of the responsibilities in the corporate organizations involved.
Currently, in a typical telecommunication system, each network element gathers statistics about the traffic it is handling over a five- or fifteen-minute interval. The network element then makes this information available to the system as an output file, or stores it locally for later retrieval. Two of the original motives for structuring the telecommunication system performance measurement activities in this way were to minimize the sheer volume of information generated and to reduce the network element's processor load. Typically, the performance information is retrieved by a network element's management system, and stored in a database from which performance reports can be generated, either periodically or on demand. The result, however, is that network performance information is not available in real time.
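The interval-based collection described above can be illustrated with a minimal sketch. The class name `IntervalCounter`, its fields, and the counter names are hypothetical and are not taken from the patent; the sketch only shows why a measurement recorded early in an interval does not become visible until that interval closes.

```python
from collections import Counter


class IntervalCounter:
    """Hypothetical network-element counter that publishes only at interval boundaries."""

    def __init__(self, interval_s=900):
        self.interval_s = interval_s   # 900 s = fifteen minutes (300 s for five)
        self._start = 0                # start time of the interval being filled
        self._current = Counter()      # counters for the open interval
        self.reports = []              # closed intervals: (interval_start, counters)

    def record(self, now_s, counter_name, amount=1):
        # Close every interval that ended before this event arrived.
        while now_s >= self._start + self.interval_s:
            self.reports.append((self._start, dict(self._current)))
            self._current = Counter()
            self._start += self.interval_s
        self._current[counter_name] += amount


element = IntervalCounter(interval_s=900)
element.record(10, "calls")
element.record(100, "calls")
early_reports = list(element.reports)   # nothing is published yet
element.record(905, "calls")            # first event of the next interval
```

In this sketch the two events at 10 s and 100 s stay invisible to the management system until an event arrives after the 900 s boundary, which is the latency the paragraph above refers to.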
Detailed fault information (e.g., alarms), for both hardware and software faults, is also gathered by the various network elements and is sent up to a centralized fault management node, which is responsible for alarm filtering and alarm correlation. The central fault management node is also used to suggest actions to correct or to otherwise reduce the effect of the faults in response to an alarm or to a combination of alarms. In some cases, more or less intricate knowledge-based systems are designed to aid the operator with fault diagnosis. Existing fault management systems, however, generally rely upon operator input and are incapable of automatically correcting the faults or of reconfiguring the managed system, if needed. Moreover, because fault information is not available in real time and because the fault management process relies upon operator input, fault management systems are generally unable to react to and handle faults in real time.
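As a rough illustration of what alarm filtering and alarm correlation at a central node might involve, the following sketch suppresses repeated alarms and groups the remainder under the topmost alarming ancestor in a network topology. The `Alarm` fields, the 60-second window, and the `parent_of` topology map are all assumptions made for this example; the patent does not specify these rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source: str   # name of the network element that raised the alarm
    kind: str     # alarm type, e.g. "link_down"
    time_s: int   # time the alarm was raised, in seconds


def filter_duplicates(alarms, window_s=60):
    """Drop an alarm if the same (source, kind) was already kept within window_s."""
    last_kept = {}
    kept = []
    for alarm in sorted(alarms, key=lambda a: a.time_s):
        key = (alarm.source, alarm.kind)
        if key in last_kept and alarm.time_s - last_kept[key] < window_s:
            continue
        last_kept[key] = alarm.time_s
        kept.append(alarm)
    return kept


def correlate(alarms, parent_of):
    """Group alarms under the topmost ancestor that is itself alarming.

    parent_of maps child name -> parent name and is assumed to be a tree.
    """
    alarming = {a.source for a in alarms}
    groups = {}
    for alarm in alarms:
        root = alarm.source
        node = parent_of.get(alarm.source)
        while node is not None:
            if node in alarming:
                root = node
            node = parent_of.get(node)
        groups.setdefault(root, []).append(alarm)
    return groups


raw = [
    Alarm("bs1", "link_down", 0),
    Alarm("bs1", "link_down", 10),   # repeat within the window, filtered out
    Alarm("rnc1", "power", 5),
    Alarm("bs2", "link_down", 7),
]
topology = {"bs1": "rnc1", "bs2": "rnc1"}   # hypothetical hierarchy
kept = filter_duplicates(raw)
groups = correlate(kept, topology)
```

Here every raw alarm must first travel to the central node before any of this processing happens, and the output is still only a suggestion for an operator to act on.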
Data and telecommunication networks are becoming increasingly complex to manage in terms of their scale, the diversity of the networks and services they provide, and the resulting voluminous amount of information that must be handled by the fault management system. In order to address these complexities, certain semi-automated and automated fault management solutions will be needed to support a network operator's staff. Such support capabilities actually do not exist (to any significant extent) in the fault management solutions provided today.
Specifically, today's fault management systems effectively introduce an inherent latency or delay in the availability of alarms and other fault information. Consequently, these delays effectively limit the ability of network managers to respond to faults within their networks. Clearly, in operating dynamic telecommunication networks such as cellular networks, Internets, and broadband multi-media networks, these delays in identifying and resolving network faults are unacceptable. Furthermore, as the network fault management systems become increasingly automated, such delays in the delivery of fault information will become increasingly unacceptable. Instead, the fault detection intervals used should be dictated by the timing requirements of the problem domain, rather than by the solutions the network elements provide today.
In addition, today's telecommunication network management systems are deployed in a relatively small number of locations in the network. In other words, the fault management functions are centralized in a small number of network nodes. Although it might theoretically be possible to build real-time capabilities into a centralized management system, such a system would face several problems. First, an unacceptably large amount of bandwidth would be consumed by the alarm information that must be sent to the highest level of the fault management system. The large volume of alarm data that can be generated as the size of the communications system increases will also tend to cause the central processing of such data to become slow. Another problem with maintaining all of the fault management functions at a fully centralized operation and management (O&M) system is that the system lacks robustness; if the centralized O&M system breaks down, handling of fault management tasks will be suspended.
SUMMARY OF THE INVENTION
The present invention comprises a system and method for performing distributed fault management functions in a communications network. The communications network includes a plurality of nodes. In a cellular telecommunications network, for example, the nodes are usually arranged in a hierarchy and can comprise physical devices, such as base stations, radio network controllers, radio network managers, and the like. In accordance with the present invention, each node generally includes a fault agent and an associated configuration agent. In addition, each node can also be used to supervise one or more network resources, which can comprise logical resources (e.g., a cell border, a location area, etc.) and/or physical devices.
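The node structure described above can be sketched as a small data model. Everything beyond the pairing of a fault agent with a configuration agent, the node hierarchy, and the supervised resources is an assumption made for illustration: the `known_actions` table, the `report_fault` method, and the rule of escalating to the parent node when a fault cannot be handled locally are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConfigurationAgent:
    applied: list = field(default_factory=list)   # (resource, action) history

    def reconfigure(self, resource, action):
        self.applied.append((resource, action))


@dataclass
class FaultAgent:
    # Hypothetical table: fault kind -> corrective action this node can take.
    known_actions: dict = field(default_factory=dict)


@dataclass
class Node:
    name: str
    fault_agent: FaultAgent
    config_agent: ConfigurationAgent
    resources: list = field(default_factory=list)   # logical or physical resources
    parent: Optional["Node"] = None                 # next level up in the hierarchy

    def report_fault(self, resource, kind):
        """Return the name of the node that handled the fault, or None."""
        action = self.fault_agent.known_actions.get(kind)
        if action is not None:
            # Handle locally through this node's own configuration agent.
            self.config_agent.reconfigure(resource, action)
            return self.name
        if self.parent is not None:
            # Assumed policy: escalate one level when no local action is known.
            return self.parent.report_fault(resource, kind)
        return None


rnc = Node("rnc1", FaultAgent({"cell_outage": "reroute"}), ConfigurationAgent())
bs = Node(
    "bs1",
    FaultAgent({"link_flap": "restart_link"}),
    ConfigurationAgent(),
    resources=["cell_17"],
    parent=rnc,
)
handled_locally = bs.report_fault("cell_17", "link_flap")
handled_by_parent = bs.report_fault("cell_17", "cell_outage")
```

In this sketch a base station resolves a fault it recognizes without involving any higher node, so only unresolved faults consume bandwidth and processing further up the hierarchy.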
As faults are detected in the commun
