Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1998-04-09
2001-02-06
Beausoliel, Jr., Robert W. (Department: 2785)
C714S015000, C709S229000, C709S226000, C709S221000
active
06185695
ABSTRACT:
BACKGROUND
1. Field of the Invention
The present invention relates generally to distributed object operating systems, and more particularly to a system and method that supports transparent failover from a primary server to a secondary server during accesses to a remote object.
2. Related Art
As computer networks are increasingly used to link computer systems together, distributed operating systems have been developed to control interactions between computer systems across a computer network. Some distributed operating systems allow client computer systems to access resources on server computer systems. For example, a client computer system may be able to access information contained in a database on a server computer system. When the server fails, it is desirable for the distributed operating system to automatically recover from this failure. Distributed computer systems with distributed operating systems possessing an ability to recover from such server failures are referred to as “highly available systems.” Data objects stored on such highly available systems are referred to as “highly available data objects.”
For a highly available system to function properly, the highly available system must be able to detect a server failure and to reconfigure itself so accesses to objects on the failed server are redirected to backup copies on other servers. This process of switching over to a backup copy on another server is referred to as a “failover.”
Existing client-server systems typically rely on the client application program to explicitly detect and recover from server failures. For example, a client application program typically includes code that explicitly specifies timeout and retry procedures. This additional code makes client application programming more complex and tedious. It also makes client application programs particularly hard to test and debug due to the difficulty in systematically reproducing the myriad of possible asynchronous interactions between client and server computing systems. Furthermore, each client application program must provide such failover code for every access to a highly available object from a server.
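The burden described above can be illustrated with a minimal Python sketch of the recovery logic a client would have to write by hand. Everything in it is hypothetical and not taken from the patent: the `ServerDown` exception, the `make_server` stub that stands in for a remote server, and the `retries_per_server` parameter are assumptions chosen only to show the shape of explicit retry code.

```python
class ServerDown(Exception):
    """Raised by a hypothetical server stub that is unreachable or timed out."""


def make_server(name, alive):
    """Return a hypothetical callable standing in for a remote server."""
    def read(key):
        if not alive:
            raise ServerDown(name)
        return f"{key}@{name}"
    return read


def read_with_explicit_failover(servers, key, retries_per_server=2):
    """Client-written recovery: try each server in order, retrying on failure."""
    attempts = 0
    for read in servers:
        for _ in range(retries_per_server):
            attempts += 1
            try:
                return read(key), attempts
            except ServerDown:
                continue  # the client itself must decide to retry
    raise ServerDown("all servers failed")


primary = make_server("primary", alive=False)
secondary = make_server("secondary", alive=True)
result, attempts = read_with_explicit_failover([primary, secondary], "record-7")
```

In this sketch every call site that touches a highly available object would need to go through a wrapper like `read_with_explicit_failover`, which is the repetition the passage above objects to.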
Therefore, what is needed is a distributed-object operating system that recovers from server failures in a manner transparent to client application programs. Such a distributed system will allow client application programs to be written without the burden of providing and testing failure detection and retry code.
SUMMARY
One embodiment of the present invention provides a method and an apparatus that facilitate transparent failovers from a primary copy of an object on a first server to a secondary copy of the object on a second server when the first server fails or otherwise becomes unresponsive. The method includes detecting the failure of the first server; selecting the second server; and reconfiguring the second server to act as a new primary server for the object. Additionally, the method includes transparently retrying, on the second server, any uncompleted invocations to the object, without explicit retry commands from a client application program. A variation on this embodiment further includes winding up active invocations to the object before reconfiguring the second server to act as the new primary server. This winding-up process can include causing invocations to unresponsive nodes to unblock and complete. Another variation further includes blocking new invocations to the object after detecting the failure of the first server, and unblocking these new invocations after reconfiguring the second server to act as the new primary server. Hence, the present invention can greatly simplify programming of client application programs for highly available systems. It also makes it possible to use, in a highly available system, a client application program written for a system that is not highly available.
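The core steps of the summarized method (detect the failure, select a second server, reconfigure it as the new primary, and retry the uncompleted invocation) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Replica` and `FailoverProxy` classes, the `ServerFailure` exception, and the `alive` flag are all hypothetical names, and the winding-up and blocking variations are not modeled.

```python
class ServerFailure(Exception):
    """Hypothetical signal that a server did not respond."""


class Replica:
    """Hypothetical stand-in for one server's copy of an object."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.is_primary = False

    def invoke(self, method):
        if not self.alive:
            raise ServerFailure(self.name)
        return f"{method} handled by {self.name}"


class FailoverProxy:
    """Client-side reference that hides failover from the application."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.primary.is_primary = True
        self.secondaries = list(secondaries)
        self.failovers = 0

    def _fail_over(self):
        # Select a live secondary and reconfigure it as the new primary.
        while self.secondaries:
            candidate = self.secondaries.pop(0)
            if candidate.alive:
                self.primary.is_primary = False
                candidate.is_primary = True
                self.primary = candidate
                self.failovers += 1
                return
        raise ServerFailure("no live secondary available")

    def invoke(self, method):
        while True:
            try:
                return self.primary.invoke(method)
            except ServerFailure:
                # Failure detected: fail over, then retry the same
                # uncompleted invocation without involving the caller.
                self._fail_over()


first = Replica("server-1")
second = Replica("server-2")
obj = FailoverProxy(first, [second])
before = obj.invoke("get")
first.alive = False          # simulate the first server failing
after = obj.invoke("get")    # the caller issues no retry of its own
```

The application code calls `obj.invoke` the same way before and after the failure; the retry happens inside the proxy, which is the sense of "transparent" used in the summary.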
Still other embodiments of the present invention will become readily apparent to those skilled in the art from the following detailed description, which shows and describes only embodiments of the invention, by way of illustration of the best modes contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and several of its details are capable of modification in various obvious respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
REFERENCES:
patent: 5157663 (1992-10-01), Major et al.
patent: 5566297 (1996-10-01), Devarakonda et al.
patent: 5666479 (1997-09-01), Kashimoto et al.
patent: 5668943 (1997-09-01), Attanasio et al.
patent: 5682534 (1997-10-01), Kapor et al.
patent: 5737514 (1998-04-01), Stiffler
patent: 5796934 (1998-08-01), Bhanot et al.
patent: 5819019 (1998-10-01), Nelson
patent: 5852724 (1998-12-01), Glenn, II et al.
patent: 5907673 (1999-05-01), Hirayama et al.
patent: 5958070 (1999-09-01), Stiffler
patent: 0 817 024 A2 (1997-12-01), None
Thomas Becker, “Transparent Service Reconfiguration After Node Failure,” Configurable Distributed Systems, pp. 212-223, 1992.
Chin et al., “Transparency in a Replicated Network File System,” EUROMICRO-96, Beyond 2000: Hardware and Software Design Strategies; Proceedings of the 22nd EUROMICRO Conference, pp. 285-291, 1995.
Bernabeu-Auban Jose M.
Khalidi Yousef A.
Matena Vladimir
Murphy Declan J.
Talluri Madhusudhan
Beausoliel, Jr. Robert W.
Park & Vaughan
Revak Christopher
Sun Microsystems Inc.