Reexamination Certificate
1998-04-17
2002-03-19
Sheikh, Ayaz (Department: 2155)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C709S249000, C709S239000, C714S014000, C714S057000
active
06360331
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster.
BACKGROUND OF THE INVENTION
A server cluster is a group of at least two independent servers connected by a network and managed as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server.
Other benefits include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Dynamic load balancing is also available. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
The failover of an application from one server (i.e., machine) to another may be automatic in response to a software or hardware failure on the first machine, or alternatively may be manually initiated by an administrator. In either case, failing over an application in a manner that is transparent to the application and to the client requires that the application's execution environment be recreated on the other machine. This execution environment comprises distinct parts having different characteristics from one another, a first part of which is the application code. The application code changes very rarely, and thus an application's code environment may be replicated either by installing the application on all of the machines in the cluster that may run it, or by installing the application on storage that is shared by all machines in the cluster. When an application needs to be restarted, the exact code is thus available to the cluster.
Another part of the execution environment is the application's data, which changes frequently. The application's data environment is best preserved by having the application store all of its data files on a shared disk, a task that is ordinarily accomplished by inputting appropriate information via the application's user interface. When an application needs to be restarted, the exact data is thus available to the cluster.
A third part of the execution environment is the application configuration information, which changes occasionally. Applications that are “cluster-aware” (i.e., designed with the knowledge that they may be run in a clustering environment) store their application configuration information in a cluster registry maintained on a shared disk, thus ensuring reliable failover.
However, existing applications that are not cluster-aware (i.e., legacy applications) use their local machine registry to store their application configuration information. For example, Windows NT applications use the WIN32 Registry. As a result, this configuration data is not available to the rest of the cluster. At the same time, it is impractical (and likely very dangerous) to attempt to modify these legacy applications so as to use the cluster registry instead of their local registry. Moreover, it is not feasible to transparently redirect each of the local registries in the various machines to the cluster registry, and costly to replicate copies of each of the local registries to the various machines. Nevertheless, in order to ensure correct and transparent behavior after a failover, the application configuration information needs to be recreated at the machine on which the application is being restarted.
SUMMARY OF THE INVENTION
The present invention provides a method and system for transparently failing over resource configuration information stored by a resource (such as an application) on a local machine. More particularly, the application configuration information written to a registry of a local machine is made available to other machines of the cluster. The other machines can rapidly obtain this application configuration information and use it to recreate the application's execution environment on another machine in the cluster, ensuring a rapid and transparent failover operation.
Briefly, the present invention transparently fails over a legacy application by tracking and checkpointing changes to application configuration information that is stored locally, such as in a system's local registry. When an application running on the first system makes a change to the application configuration information in a subtree of the registry, the change is detected by a notification mechanism. A snapshot mechanism is notified, takes a snapshot of the subtree's data, and causes it to be written to a storage device shared by systems of the cluster, such as a quorum disk. When the application is failed over to a second system, the snapshot for that application is retrieved from the shared storage device by a restore mechanism and written to the registry of the second system in a corresponding subtree. The application is then run on the second system using the restored application configuration information for that application.
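The checkpoint-and-restore flow summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the registry is simulated with nested dictionaries rather than the WIN32 Registry, the shared storage device is simulated with a temporary directory, and every name (`LocalRegistry`, `snapshot`, `restore`, `checkpoint_path`, `demo`, the `SOFTWARE\LegacyApp` subtree) is hypothetical.

```python
import json
import tempfile
from pathlib import Path


class LocalRegistry:
    """Toy stand-in for one machine's local registry (hypothetical).

    Keys and values share one nested dict, which is a simplification.
    """

    def __init__(self):
        self._root = {}
        self._watchers = {}  # watched subtree path -> callback

    def watch(self, subtree, callback):
        # Notification mechanism: call back on any change under subtree.
        self._watchers[subtree] = callback

    def set_value(self, key_path, name, value):
        node = self._root
        for part in key_path.split("\\"):
            node = node.setdefault(part, {})
        node[name] = value
        for subtree, callback in self._watchers.items():
            if key_path == subtree or key_path.startswith(subtree + "\\"):
                callback(subtree)

    def get_subtree(self, subtree):
        node = self._root
        for part in subtree.split("\\"):
            node = node.get(part, {})
        return node

    def put_subtree(self, subtree, data):
        parts = subtree.split("\\")
        node = self._root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = data


def checkpoint_path(shared_dir, app):
    return Path(shared_dir) / f"{app}.checkpoint.json"


def snapshot(registry, subtree, shared_dir, app):
    # Snapshot mechanism: serialize the whole subtree to shared storage.
    target = checkpoint_path(shared_dir, app)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(registry.get_subtree(subtree)))
    tmp.replace(target)  # swap in the new checkpoint in one step


def restore(registry, subtree, shared_dir, app):
    # Restore mechanism: load the checkpoint into the corresponding subtree.
    data = json.loads(checkpoint_path(shared_dir, app).read_text())
    registry.put_subtree(subtree, data)


def demo():
    subtree = "SOFTWARE\\LegacyApp"
    with tempfile.TemporaryDirectory() as shared:  # simulated shared disk
        first = LocalRegistry()
        first.watch(subtree, lambda st: snapshot(first, st, shared, "legacyapp"))
        first.set_value(subtree + "\\Settings", "Port", 8080)
        first.set_value(subtree, "Mode", "primary")

        # Failover: the second system rebuilds the subtree before restart.
        second = LocalRegistry()
        restore(second, subtree, shared, "legacyapp")
        return second.get_subtree(subtree)


if __name__ == "__main__":
    print(demo())
```

The sketch checkpoints the whole subtree on every change rather than recording individual writes, which mirrors the snapshot wording of the summary; writing to a temporary file and renaming it is an added assumption intended to avoid leaving a half-written checkpoint on the shared disk.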
Other benefits and advantages will become apparent from the following detailed description when taken in conjunction with the drawings.
Shrivastava Sunita
Vert John D.
Backer Firmin
Michalik & Wylie PLLC
Microsoft Corporation
Sheikh Ayaz