Network fault alerting system and method

Electrical computers and digital processing systems: multicomput – Computer network managing – Computer network monitoring

Reexamination Certificate

C709S202000, C709S223000, C714S039000, C714S043000, C714S047300

active

06813634

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention pertains to the arts of computer network management, and especially to the management of network bandwidth consumed by network management, status, and maintenance messages. More particularly, this invention relates to the arts of intelligent processing and diagnosis of network failures and problems based on fault analysis logic to more accurately detect and isolate computer network problems, to minimize the network bandwidth consumed by maintenance messages, and to effectively notify maintenance personnel of the most likely point of failure.
2. Description of the Related Art
Computer networks, such as local area networks (“LAN”), wide-area networks (“WAN”), intranets, and the Internet, typically include substantial maintenance and monitoring capabilities. Modern telephone networks, such as Signalling System 7 (“SS7”), Integrated Services Digital Network (“ISDN”), and many digital cellular networks including GSM, also include substantial equipment and software dedicated to the provisioning, monitoring, and maintenance of the network and its equipment. All of the above-named networks are packet-based networks and are well known within their respective arts.
Key to the architecture and operation of these networks are packet routers, which interconnect multiple physical networks and route and forward packets, or “messages”, from one network to another based upon addressing schemes defined by well-known protocols such as the Internet Protocol (“IP”) or LAPD for SS7 and ISDN. These addressing schemes can be generalized as schemes which define each data packet or message as having a header, a payload, and a tail. The destination address, origination address, packet sequence number, and payload size are typically included in the header section of the message. The payload section contains the actual computer data being transferred from one computer to another via the computer network, which may represent a portion of a computer file, a formatted message, or a section of a digitized signal such as voice, other audio, or video. The various message formats are defined by well-known standards promulgated by InterNIC, the International Telecommunication Union, Bellcore, and ANSI.
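The generalized header/payload/tail layout described above can be sketched as a simple record type. This is an illustration only, assuming hypothetical field names (`destination`, `origination`, `sequence_number`, `tail`); it does not reproduce the bit layout of IP, LAPD, or any other specific protocol.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Generic packet: header fields, a payload, and a tail.

    Field names are hypothetical; real protocols (IP, LAPD) define their own
    binary layouts.
    """

    destination: str      # header: destination address
    origination: str      # header: origination address
    sequence_number: int  # header: packet sequence number
    payload: bytes        # payload: the data actually being transferred
    tail: bytes = b""     # tail: e.g. a checksum or end marker (assumed)

    @property
    def payload_size(self) -> int:
        # The header's payload-size field, derived here from the payload itself.
        return len(self.payload)


msg = Message("10.0.1.7", "10.0.9.1", 42, b"part of a file")
print(msg.payload_size)  # 14
```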
In order to manage these networks, including monitoring network operation status, configuring and re-configuring network elements (routers, terminals, and switches), and provisioning new network sections, a number of well-known software and hardware products have been developed and placed on the market. Most of these products integrate specialized software onto network server platforms. The software uses the network connectivity and bandwidth provided by the network server platform to perform maintenance testing, messaging, status checking, and alert messaging. Often, the network that carries “real” traffic, such as computer file transmission or telephone call transmission, is used for the maintenance communications as well. In this case, the maintenance messages “mix in” with the “real” traffic and share its bandwidth. As a result, if maintenance messages come to consume significant bandwidth, network performance may be adversely affected. In other cases, separate networks dedicated to maintenance may be configured to avoid this problem. Even so, if maintenance messages exceed the expected bandwidth level, the dedicated maintenance network may fail.
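How quickly polling traffic adds up can be estimated with simple arithmetic. The sketch below assumes that each poll of a device costs one query plus one reply of fixed size; the function name and all figures in the example are hypothetical.

```python
def polling_bandwidth_bps(devices: int, message_bytes: int,
                          poll_interval_s: float,
                          messages_per_poll: int = 2) -> float:
    """Average bits per second consumed by periodic status polling.

    Assumes each poll of a device costs `messages_per_poll` messages of
    `message_bytes` bytes each (by default one query plus one reply).
    """
    bits_per_cycle = devices * messages_per_poll * message_bytes * 8
    return bits_per_cycle / poll_interval_s


# Hypothetical figures: 1,000 devices, 100-byte messages, polled every 10 s.
print(polling_bandwidth_bps(1000, 100, 10))  # 160000.0
```

At these assumed figures, routine polling alone averages 160 kbit/s, before any alert or trouble-ticket traffic is added on top of it.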
When network management software such as NetView/6000 or Hewlett-Packard's OpenView detects that a network device such as a router has gone off-line, it sends “node down” events or messages for all the workstations connected downstream from the off-line router to a network problem management server. The network problem management server provides correlation and processing for opening trouble tickets, and eventually it sends alerts to the appropriate maintenance personnel through pagers, e-mail, and/or telephone calls.
FIG. 1 shows the topology of prior art maintenance systems. A router (1) may have multiple ports to multiple networks. Each port is serviced by a network interface card (“NIC”), such as an Ethernet LAN interface card.
FIG. 1 shows an example of a router serving three networks, A, B, and C, each of which is a group of networked computer workstations or personal computers. For example, network A (5) has several “drops” to computers, and one drop or connection (6) to the router. Likewise, network B (4) is connected (3) to the router, and network C (2) is connected (7) to the router. Packets or messages received by the router are forwarded to other networks based on the addressing scheme of the network, such as IP in the case of many computer networks.
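The FIG. 1 arrangement can be modeled as a table of router ports, one per NIC, with forwarding chosen by destination address. This is a sketch only: the figure gives no addresses, so the subnets below are hypothetical, and the reference numerals are carried along purely as labels.

```python
import ipaddress

# Hypothetical address plan for the FIG. 1 networks (the figure gives none).
ROUTER_PORTS = {
    "A": {"connection": 6, "subnet": ipaddress.ip_network("10.0.1.0/24")},
    "B": {"connection": 3, "subnet": ipaddress.ip_network("10.0.2.0/24")},
    "C": {"connection": 7, "subnet": ipaddress.ip_network("10.0.3.0/24")},
    # Network D carries connection (8) to the maintenance server (9).
    "D": {"connection": 8, "subnet": ipaddress.ip_network("10.0.9.0/24")},
}


def forward_port(destination: str) -> str:
    """Return the network (router port) a packet for `destination` leaves on."""
    addr = ipaddress.ip_address(destination)
    for network, port in ROUTER_PORTS.items():
        if addr in port["subnet"]:
            return network
    raise LookupError(f"no route to {destination}")


print(forward_port("10.0.2.15"))  # B
```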
Also shown in FIG. 1 is a connection (8) to a maintenance server (9) such as a NetView 6000 server. In this example, the connection (8) attaches to the router (1) through the router's NIC for network D. The maintenance server (9) typically contains a connectivity database which holds the network addresses of all the elements on the other networks connected to the router, such as all the computers connected to networks A, B, and C. Using this database, the maintenance server (9) periodically sends status query messages, or “pings”, to each of the computers. If each computer is on-line, the router is functioning properly, and the network physical media (cable, RF links, etc.) are intact, a reply will be received from each computer almost immediately in response to the “ping”. If a reply is not received within a certain time after the “ping” is transmitted, the maintenance server (9) may assume that a problem exists with the computer, the router, or the network(s).
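The polling cycle described above can be sketched as a loop over the connectivity database with a reply deadline. Sending real ICMP echo requests needs raw sockets or an external tool, so this sketch assumes a hypothetical `send_ping` callable that returns the reply latency in seconds, or `None` when no reply arrives at all.

```python
from typing import Callable, Dict, Iterable, Optional


def poll_all(addresses: Iterable[str],
             send_ping: Callable[[str], Optional[float]],
             timeout_s: float) -> Dict[str, bool]:
    """Ping every address; True means a reply arrived within timeout_s."""
    status: Dict[str, bool] = {}
    for address in addresses:
        latency = send_ping(address)  # None models "no reply at all"
        status[address] = latency is not None and latency <= timeout_s
    return status


# Simulated replies standing in for a real pinger (hypothetical values).
fake = {"10.0.1.2": 0.004, "10.0.1.3": None, "10.0.2.4": 2.5}
status = poll_all(list(fake), fake.get, timeout_s=1.0)
print(status)  # {'10.0.1.2': True, '10.0.1.3': False, '10.0.2.4': False}
```

A late reply (2.5 s against a 1 s deadline) is treated the same as a missing one, which matches the text: the server only knows that no response arrived in time.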
For example, if all computers and the router are functioning correctly except for one computer, then only one response will be missing and all other responses will be received. However, if the router fails, no responses will be received from any of the computers. In the most basic maintenance system configurations, such as the basic NetView 6000 product, this scenario can result in a storm of events being sent to the problem management server, which correlates events and opens trouble tickets, leading to many useless and/or redundant e-mails and pages.
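The contrast between the two failure cases can be made concrete. Assuming, as in the basic configuration described above, that one “node down” event is emitted per computer that fails to reply, a single failed computer produces one event while a failed router produces one event per computer behind it. The addresses and function name are hypothetical.

```python
from typing import Dict, List


def node_down_events(status: Dict[str, bool]) -> List[str]:
    """Basic configuration: one "node down" event per computer with no reply."""
    return [f"node down: {addr}" for addr, up in status.items() if not up]


# Ten hypothetical workstations behind one router.
computers = [f"10.0.1.{i}" for i in range(2, 12)]

one_pc_failed = {addr: addr != "10.0.1.5" for addr in computers}  # only 10.0.1.5 silent
router_failed = {addr: False for addr in computers}                 # every reply lost

print(len(node_down_events(one_pc_failed)))  # 1
print(len(node_down_events(router_failed)))  # 10
```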
FIG. 2 illustrates this scenario. A normal “ping” (20) is forwarded from the NetView 6000 to the router, which forwards it (21) to the appropriate PC. The PC, if functioning properly, replies (22) via the router to the NetView 6000 (23) within a predetermined time limit t1. If the router has failed, the “ping” (24) will not be answered by any of the computers within time t1, which will result in the NetView 6000 sending multiple “computer down” messages (25) to the problem management server. The problem management server is configured to wait for a period of time t3 before escalating the event to notification of the maintenance personnel, in order to reduce the number of alerts made for temporary problems such as power glitches, computer reboots, etc. But if no “computer up” messages are received within time limit t3, the problem management server will send multiple pager messages and telephone calls, and may open multiple trouble tickets (26), as many as one per computer on the network. This does alert the maintenance personnel, but it leaves them confused as to which element has actually failed. Additionally, the network link between the NetView 6000 server and the problem management server has suffered unnecessary bandwidth consumption from all of the “computer down” messages.
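The hold-down behavior around t3 can be sketched as follows. This assumes a batch view of the message stream as (timestamp, computer, kind) tuples; the function name, the tuple layout, and the 300 s value for t3 are hypothetical. A “computer down” message is held for t3, a matching “computer up” within that window cancels it, and anything still pending after t3 becomes one trouble ticket per computer.

```python
from typing import Dict, List, Tuple

# (timestamp in seconds, computer, kind) where kind is "down" or "up".
Event = Tuple[float, str, str]


def escalate(events: List[Event], t3: float, now: float) -> List[str]:
    """Return the computers that would have a trouble ticket by time `now`."""
    pending: Dict[str, float] = {}  # computer -> time its "down" message arrived
    for ts, computer, kind in sorted(events):
        if kind == "down":
            pending.setdefault(computer, ts)
        elif kind == "up" and computer in pending and ts - pending[computer] <= t3:
            del pending[computer]  # recovered within t3, e.g. a reboot
    # Still pending after t3: escalate, one ticket per computer.
    return sorted(c for c, ts in pending.items() if now - ts > t3)


T3 = 300.0  # hypothetical hold-down period in seconds
events = [
    (0.0, "PC1", "down"),
    (0.0, "PC2", "down"),
    (0.0, "PC3", "down"),
    (120.0, "PC2", "up"),  # PC2 was only rebooting
]
print(escalate(events, T3, now=400.0))  # ['PC1', 'PC3']
```

If the router itself has failed, no “computer up” message ever arrives, so every computer behind it ends up on the escalation list. This is the multiple-ticket outcome the text describes.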
In an enhancement of the prior art network management technology, a product called Tivoli for Network Connectivity module (TFNC) by International Business Machines (“IBM”) employs a similar concept, but adds some intelligent processing to the maintenance server. With TFNC, all of the original “computer down” messages will be sent to the problem management server, but, as shown in FIG. 3, the Tivoli processing (30)
