Method for dynamically switching fault tolerance schemes

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate


C714S013000, C709S241000

active

06745339

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates generally to fault tolerant distributed computing systems, and in particular, to a method for dynamically switching fault tolerance schemes in a distributed system based on wait times of user interface events.
Fault tolerance is a key technology in distributed systems for ensuring reliable operation of user-critical applications such as e-commerce, database transactions and business-to-business (B2B) applications. A distributed system is a group of computing devices, interconnected by a communication network, that function together to implement an application. Fault tolerance provides reliable operation from the user's perspective by masking failures in critical system components. Known fault tolerance mechanisms for distributed systems can use different fault tolerance schemes, including different means of fault detection and recovery, to handle various types of failures, such as device and network failures.
However, different fault tolerance schemes are known to make different trade-offs between fault tolerance and performance. In interactive applications, a fault tolerance scheme can lengthen the time a user must wait for a system response after interacting with the system, particularly in mobile computing environments. This delay affects how users perceive the performance of a system, which matters because users are known to give up on applications if their requests are not met within certain time limits. Accordingly, it is desirable to limit detrimental trade-offs between fault tolerance and perceived system performance.
Furthermore, different applications may have different requirements for fault tolerance and performance, and these requirements may change over the course of a single application's execution. A single implementation of a fault tolerance mechanism may not perform well for all applications. It is therefore important to know when to switch fault tolerance schemes and which scheme to select dynamically.
Therefore, there is a need for a method of dynamically switching fault tolerance schemes that can improve the user perceived performance of a system while taking into account the desired level of fault tolerance.
SUMMARY
In one aspect of the invention, a method of dynamically switching among a plurality of fault tolerance schemes is provided. The fault tolerance schemes are associated with a fault tolerance mechanism that executes in a distributed system. The method comprises obtaining a wait time of at least one user interface event occurring in the distributed system. The wait time includes at least one of a communications time, a service time and a fault tolerance time. The method further comprises determining whether a mean of the wait time is greater than a predetermined mean wait time threshold. The method also comprises determining whether the communications time, the service time and the fault tolerance time are mutually independent when the mean of the wait time is greater than the predetermined mean wait time threshold. In addition, the method comprises determining whether the mean of the wait time can be improved by reducing a mean of the fault tolerance time when the communications time, the service time and the fault tolerance time are mutually independent. The method also comprises switching from a first fault tolerance scheme to a second fault tolerance scheme when the mean of the wait time can be improved by reducing the mean of the fault tolerance time.
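The quantities in this method can be made concrete with the following notation, which is not taken from the patent: for the i-th of n observed user interface events, C_i, S_i and F_i are the communications, service and fault tolerance times, θ is the predetermined mean wait time threshold, and A and B are the first and second fault tolerance schemes. The sketch assumes that the wait time is the sum of its parts, with an absent part counted as zero. The identity for the sample mean holds by linearity whether or not the parts are independent. A plausible reading, and only an assumption here, is that independence is what justifies treating the mean communications and service times as unchanged when only the scheme changes.

```latex
% Assumed additive decomposition; scheme A is currently active.
\begin{aligned}
W_i &= C_i + S_i + F_i^{(A)} \\
\bar{W} &= \frac{1}{n}\sum_{i=1}^{n} W_i = \bar{C} + \bar{S} + \bar{F}^{(A)} \\
\bar{W}' &= \bar{C} + \bar{S} + \bar{F}^{(B)}, \qquad
\bar{W} - \bar{W}' = \bar{F}^{(A)} - \bar{F}^{(B)}
\end{aligned}
% Under these assumptions, switch from A to B only if
% \bar{W} > \theta, C, S, F are mutually independent, and \bar{F}^{(B)} < \bar{F}^{(A)}.
```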
In another aspect of the invention, a fault tolerant distributed system capable of dynamically switching among a plurality of fault tolerance schemes associated with a fault tolerance mechanism is provided. The system comprises a means for obtaining a wait time of at least one user interface event occurring in the distributed system. The wait time includes at least one of a communications time, a service time and a fault tolerance time. The system further comprises a means for determining whether a mean of the wait time is greater than a predetermined mean wait time threshold. The system also comprises a means for determining whether the communications time, the service time and the fault tolerance time are mutually independent when the mean of the wait time is greater than the predetermined mean wait time threshold. In addition, the system comprises a means for determining whether the mean of the wait time can be improved by reducing a mean of the fault tolerance time when the communications time, the service time and the fault tolerance time are mutually independent. The system also comprises a means for switching from a first fault tolerance scheme to a second fault tolerance scheme when the mean of the wait time can be improved by reducing the mean of the fault tolerance time.


