Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2002-12-16
2004-12-07
Bonzo, Bryce P. (Department: 2114)
active
06829720
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to computer storage systems.
2. Related Art
Computer storage systems are used to record and retrieve data. It is desirable for the services and data provided by the storage system to be available to the greatest degree possible. Accordingly, some computer storage systems provide a plurality of file servers, with the property that when a first file server fails, a second file server is available to provide the services and the data otherwise provided by the first. The second file server provides these services and data by taking over resources otherwise managed by the first file server.
One problem in the known art is that when two file servers each provide backup for the other, each of the two file servers must be able to reliably detect failure of the other and to smoothly handle any required takeover operations. It would be advantageous for this to occur without either of the two file servers interfering with proper operation of the other. This problem is particularly acute in systems where one or both file servers recover from a service interruption.
Accordingly, it would be advantageous to provide a storage system, and a method for operating a storage system, that provide for relatively rapid and reliable takeover among a plurality of independent file servers. This advantage is achieved in an embodiment of the invention in which each file server (a) maintains redundant communication paths to the others, (b) maintains its own state in persistent memory, at least some of which is accessible to the others, and (c) regularly confirms the state of the other file servers.
SUMMARY OF THE INVENTION
The invention provides a storage system, and a method for operating a storage system, that provide for relatively rapid and reliable takeover among a plurality of independent file servers. Each file server maintains a reliable (such as redundant) communication path to the others, preventing any single point of failure in communication among file servers. Each file server maintains its own state in reliable (such as persistent) memory, at least some of which is accessible to the others, providing a method for confirming that its own state information is up to date, and for reconstructing proper state information if not. Each file server regularly confirms the state of the other file servers, and attempts takeover operations only when the other file servers are clearly unable to provide their share of services.
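As an illustration of the last point, the following minimal Python sketch shows one way a file server might decide that a peer is "clearly unable" to provide service: it considers takeover only when the peer has been silent on every redundant path. The class name `PeerMonitor`, the path names, and the timeout value are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class PeerMonitor:
    """Tracks when a peer was last heard from on each communication path.

    Hypothetical helper for illustration; not the patented implementation.
    """
    paths: tuple           # names of the redundant paths (assumed)
    timeout: float         # seconds of silence before a path counts as stale (assumed)
    start: float = 0.0     # time at which monitoring began
    last_seen: dict = field(default_factory=dict)

    def __post_init__(self):
        # Treat the start of monitoring as the last contact on every path,
        # so a freshly started monitor does not report the peer as failed.
        for path in self.paths:
            self.last_seen.setdefault(path, self.start)

    def record_heartbeat(self, path: str, now: float) -> None:
        if path not in self.paths:
            raise ValueError(f"unknown path: {path}")
        self.last_seen[path] = now

    def stale_paths(self, now: float) -> list:
        return [p for p in self.paths if now - self.last_seen[p] > self.timeout]

    def should_attempt_takeover(self, now: float) -> bool:
        # Takeover is considered only when every path has gone silent.
        return len(self.stale_paths(now)) == len(self.paths)


monitor = PeerMonitor(paths=("interconnect", "disk"), timeout=5.0)
monitor.record_heartbeat("disk", now=4.0)
one_path_down = monitor.should_attempt_takeover(now=8.0)    # disk path still fresh
all_paths_down = monitor.should_attempt_takeover(now=10.0)  # both paths silent
```

Requiring silence on all paths means that the loss of a single path, by itself, never triggers a takeover attempt in this sketch.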
In a preferred embodiment, each file server sequences messages on the redundant communication paths, so as to allow other file servers to combine the redundant communication paths into a single ordered stream of messages. Each file server maintains its own state in its persistent memory and compares that state with the ordered stream of messages, so as to determine whether other file servers have progressed beyond the file server's own last known state. Each file server uses the shared resources (such as magnetic disks) themselves as part of the redundant communication paths, so as to prevent mutual attempts at takeover of resources when each file server believes the other to have failed.
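The sequencing idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Message` type, the function names `merge_paths` and `peer_is_ahead`, and the use of a single integer as the "last known state" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    seq: int       # sequence number assigned by the sending file server
    payload: str


def merge_paths(*paths):
    """Combine the messages received on each path into one stream.

    Copies of the same message arriving on several paths share a sequence
    number, so only the first copy seen is kept; the result is ordered by seq.
    """
    by_seq = {}
    for path in paths:
        for msg in path:
            by_seq.setdefault(msg.seq, msg)
    return [by_seq[seq] for seq in sorted(by_seq)]


def peer_is_ahead(last_known_seq, stream):
    """True if the stream holds a message newer than the persisted state."""
    return any(msg.seq > last_known_seq for msg in stream)


# Message 2 was lost on the interconnect but arrived via the shared disks.
interconnect = [Message(1, "up"), Message(3, "taking over")]
disk = [Message(1, "up"), Message(2, "heartbeat"), Message(3, "taking over")]

merged = merge_paths(interconnect, disk)
merged_seqs = [msg.seq for msg in merged]
```

Here `peer_is_ahead(2, merged)` is true, because the merged stream contains message 3, while `peer_is_ahead(3, merged)` is false. A server whose persisted state lags the stream would know that the other server has progressed beyond its own last known state.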
In a preferred embodiment, each file server provides a status report to the others when recovering from an error, so as to prevent the possibility of multiple file servers each repeatedly failing and attempting to seize the resources of the others.
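One possible reading of this behavior is sketched below in Python. The state names, the `StatusReport` type, and the rule in `may_seize_resources` are assumptions chosen for illustration; the text above does not specify the report format or the exact decision rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    NORMAL = "normal"
    RECOVERING = "recovering"


@dataclass(frozen=True)
class StatusReport:
    server: str
    state: State


def on_recovery(server: str) -> StatusReport:
    """Report a recovering server might send before touching shared resources."""
    return StatusReport(server=server, state=State.RECOVERING)


def may_seize_resources(own_state: State,
                        peer_report: Optional[StatusReport]) -> bool:
    """Hypothetical guard applied before any takeover attempt.

    A server that is itself recovering does not seize resources, and a
    server does not seize resources from a peer that has just reported
    that it is recovering.
    """
    if own_state is not State.NORMAL:
        return False
    if peer_report is not None and peer_report.state is State.RECOVERING:
        return False
    return True


report_from_b = on_recovery("B")
a_may_seize = may_seize_resources(State.NORMAL, report_from_b)
b_may_seize = may_seize_resources(State.RECOVERING, None)
```

Under this assumed rule, neither server attempts a takeover while one of them is recovering, which is the kind of repeated mutual seizure the status report is meant to prevent.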
Kleiman Steven R.
Rowe Alan
Schoenthal Scott
Bonzo Bryce P.
Network Appliance Inc.
Swernofsky Law Group PC