Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-04-20
2004-03-02
Beausoliel, Robert (Department: 2184)
C714S006130, C714S045000, C714S047300
active
06701449
BACKGROUND OF THE DISCLOSURE
1. Field of the Invention
The invention relates to network appliances and, more particularly, the invention relates to a method and apparatus for monitoring and analyzing network appliance status information.
2. Description of the Background Art
Data processing and storage systems that are connected to a network to perform task-specific operations are known as network appliances. A network appliance may be a general-purpose computer that executes particular software to perform a specific network task, such as file server services, domain name services, data storage services, and the like. Because these network appliances have become important to the day-to-day operation of a network, they are generally required to be fault-tolerant.
Typically, fault tolerance is accomplished by using redundant appliances, so that if one appliance becomes disabled, another appliance takes over its duties on the network. However, transferring operations from one appliance to another can lead to a loss of network information. For instance, if a pair of redundant data storage units is operating on a network and one unit fails, the second unit must immediately perform the duties of the failed unit; any delay in transitioning from one storage unit to the other may cause some data to be lost.
One factor in achieving a rapid transition between appliances is enabling each redundant appliance to monitor the health of the other. Conventionally, this monitoring is accomplished through a single link over which one appliance is informed of a catastrophic failure of the other. Such a notification causes the surviving appliance to take over the network functions that were provided by the failed appliance. However, a single link is prone to false failure notifications and can carry only limited diagnostic information. For example, if the link between the appliances is severed, the surviving appliance may conclude that its peer has failed when it has not.
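To make the false-failure problem concrete, the following minimal Python sketch models a prior-art monitor that watches a single link with a fixed timeout. The class name `SingleLinkMonitor`, the `timeout` parameter, and the use of plain float timestamps are illustrative assumptions, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class SingleLinkMonitor:
    """Hypothetical prior-art monitor: one link, one timeout."""
    timeout: float
    last_heartbeat: float = 0.0

    def receive(self, now: float) -> None:
        # Record the arrival time of a heartbeat on the only link.
        self.last_heartbeat = now

    def peer_failed(self, now: float) -> bool:
        # With a single link, silence on that link cannot be told apart
        # from the peer appliance itself having failed.
        return now - self.last_heartbeat > self.timeout
```

In this sketch, if the peer sends a heartbeat at t = 5 and the link is then cut, `peer_failed(10.0)` returns `True` even though the peer may still be running; the monitor has no second path over which to check.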
Therefore, a need exists in the art for an improved method and apparatus for monitoring and analyzing status information of network appliances.
SUMMARY OF THE INVENTION
The disadvantages associated with the prior art are overcome by the present invention, which provides a method and apparatus for performing fault-tolerant network computing using a “heartbeat” generation and monitoring technique. The apparatus comprises a pair of network appliances coupled to a network. The appliances interact with one another to detect a failure in one appliance and immediately transition operations from the failed appliance to the functional appliance. Each appliance monitors the status of the other appliance using multiple, redundant communication channels.
In one embodiment of the invention, the apparatus comprises a pair of storage controller modules (SCMs) that are coupled to a storage pool, i.e., one or more data storage arrays. The storage controller modules are also coupled to a host network, or local area network (LAN), which interconnects a plurality of client computers. Each SCM comprises a status message generator and a status message monitor. The status message generators produce periodic status messages (referred to as heartbeat messages) on multiple communication channels. The status message monitors monitor all of the communication channels and analyze the received heartbeat messages to detect failed channels. Upon detecting a failed channel, the monitor executes a fault analyzer to determine the cause of the fault and a remedy.
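The generator/monitor arrangement described above can be sketched as follows. This is a minimal, hypothetical Python model, not the patented implementation: the class names, the channel names (`"lan"`, `"serial"`, `"storage"`), the timeout-based detection, and the three-way `Verdict` returned by the toy fault analyzer are all assumptions made for illustration. It shows why redundant channels help: silence on some channels while others still carry heartbeats points to a channel fault rather than a failed peer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    HEALTHY = "healthy"
    CHANNEL_FAULT = "channel_fault"  # some channels silent, peer still heard
    PEER_FAILED = "peer_failed"      # every channel silent


@dataclass
class HeartbeatMessage:
    sender: str
    channel: str
    seq: int


@dataclass
class StatusMessageGenerator:
    """Hypothetical generator: one heartbeat per channel each period."""
    sender: str
    channels: List[str]
    seq: int = 0

    def tick(self) -> List[HeartbeatMessage]:
        # The same sequence number is sent on every channel in a period.
        self.seq += 1
        return [HeartbeatMessage(self.sender, ch, self.seq) for ch in self.channels]


@dataclass
class StatusMessageMonitor:
    """Hypothetical monitor: tracks when each channel last delivered a heartbeat."""
    channels: List[str]
    timeout: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def receive(self, msg: HeartbeatMessage, now: float) -> None:
        self.last_seen[msg.channel] = now

    def failed_channels(self, now: float) -> List[str]:
        # A channel that has never delivered a heartbeat counts as failed.
        return [ch for ch in self.channels
                if now - self.last_seen.get(ch, float("-inf")) > self.timeout]

    def analyze(self, now: float) -> Verdict:
        """Toy fault analyzer: classifies the fault from per-channel status."""
        failed = self.failed_channels(now)
        if not failed:
            return Verdict.HEALTHY
        if len(failed) < len(self.channels):
            return Verdict.CHANNEL_FAULT
        return Verdict.PEER_FAILED
```

For example, if heartbeats stop arriving on `"lan"` but continue on the other two channels, `analyze` reports `Verdict.CHANNEL_FAULT` and `failed_channels` returns `["lan"]`; only when every channel falls silent does it report `Verdict.PEER_FAILED`, which is the point at which a takeover would be justified.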
Davis Daniel A.
Hai Xing
Beausoliel Robert
Ciprico, Inc.
Fredrikson & Byron PA
Puente Emerson