Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
1999-07-19
2003-06-03
Jaroenchonwanit, Bunjob (Department: 2141)
C709S200000
active
06574744
ABSTRACT:
This invention relates to a method of determining a uniform global view of the system status of a distributed computer network comprising at least three computers. The invention further relates to a distributed computer network for carrying out the method.
INTRODUCTION
In distributed computer networks, changes in system status occasionally occur as a result of intended events (e.g., addition of a new computer) or unintended events (e.g., failure of a computer). On the occurrence of such a change, it must be ensured that the computers in the computer network get a uniform global view of the new system status as quickly as possible. The problem of how to bring about a uniform global view of the system status is frequently also referred to as a “membership problem”.
This membership problem is particularly important in distributed computer networks which are used to monitor and control processes critical with regard to safety, such as in railway signaling or in power plant technology. In such computer networks, the individual computers compare their results. Results are output to the process only if they were determined independently of each other by a majority of the computers. If, in a network of three computers, for example, one of the computers fails, the other two computers can continue to deliver results to the process. This requires, however, that these two computers have come to a uniform global view of the system status, i.e., there must be agreement upon which of the computers has failed and which of the computers are free from faults.
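The majority rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `vote`, the use of `None` for a computer that delivered no result, and the example values are hypothetical and do not come from the patent.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(results: Sequence[Optional[Hashable]], total: int) -> Optional[Hashable]:
    """Return the result reported by a strict majority of `total` computers.

    `results` holds one entry per computer; None (an assumption of this
    sketch) marks a computer that delivered nothing. Returns None when no
    value reaches a strict majority, in which case nothing is output.
    """
    counts = Counter(r for r in results if r is not None)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    # Strict majority of all computers in the network, not just of the responders.
    return value if 2 * n > total else None


# Three computers, one has failed: the remaining two still form a majority.
print(vote(["stop", "stop", None], total=3))
# Two surviving computers that disagree: no result is output.
print(vote(["stop", "go", None], total=3))
```

Counting against the full network size rather than against the number of responders is a design choice of this sketch: it keeps a single surviving computer from outputting results on its own.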
From a publication by L. E. Moser et al. entitled “Membership Algorithms for Asynchronous Distributed Systems”, 11th Int. Conf. on Distributed Computing Systems, Arlington, Tex., USA, May 1991, pages 480-488, different algorithms for solving the membership problem in an asynchronous distributed computer network are known. These algorithms are based on a failure hypothesis according to which a computer sends either no messages or correct messages. The case where a computer sends erroneous messages is not considered. The algorithms described use messages whose transmission is repeated if a receiver has not received the message. In addition, there are messages whose transmission is not repeated in such a case. This latter group includes, for example, the request messages by which a computer notifies the other computers that it wants to become a member again. The other computers grant admission in response to such a request via specific grant messages. The algorithms described there are limited to asynchronous computer networks and cannot readily be applied to synchronous or virtually synchronous distributed networks.
OBJECT
It is therefore an object of the invention to provide a method of determining a uniform global view of the system status of a synchronous or virtually synchronous distributed computer network comprising at least three computers. Another object of the invention is to provide a distributed computer network for carrying out the method.
SUMMARY OF THE INVENTION
These objects are attained, according to the invention, by a method wherein communication among the computers is implemented in the form of transmission rounds. A transmission round is characterized in that, in the absence of an error, each of the computers receives a message from each of the other computers during the round. Each of the computers evaluates the messages received from the other computers and, based on the result of the evaluation, assigns one of at least three differently defined computer states to each of the other computers. In this manner, each computer determines its own local view of the system status. The computers exchange these local views. Each computer then determines a global view of the system status from the received local views, for example by subjecting the local views to a majority decision. Because every computer then holds the same set of local views, they all come to the same global view of the system status.
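The two steps described above (forming a local view from one transmission round, then deriving a global view from the exchanged local views) might be sketched as follows. This is a minimal illustration, not the claimed method: the three state names, the `is_valid` message check, the function names, and the rule of leaving a computer undecided (`None`) when no state reaches a strict majority are all assumptions made for this example.

```python
from collections import Counter
from enum import Enum


class State(Enum):
    # Hypothetical state names; the text only requires at least three states.
    OK = "ok"          # a valid message was received in this round
    SILENT = "silent"  # no message was received in this round
    FAULTY = "faulty"  # a message was received but failed the check


def local_view(me, received, members, is_valid):
    """Assign a state to every other computer from the messages of one round.

    `received` maps sender name to message; a missing sender sent nothing.
    """
    view = {}
    for peer in members:
        if peer == me:
            continue
        if peer not in received:
            view[peer] = State.SILENT
        elif is_valid(received[peer]):
            view[peer] = State.OK
        else:
            view[peer] = State.FAULTY
    return view


def global_view(local_views, members):
    """Majority decision per computer over the exchanged local views."""
    result = {}
    for target in members:
        votes = Counter(v[target] for v in local_views.values() if target in v)
        total = sum(votes.values())
        if total == 0:
            result[target] = None  # nobody reported on this computer
            continue
        state, count = votes.most_common(1)[0]
        # Accept only a strict majority; otherwise leave the state undecided.
        result[target] = state if 2 * count > total else None
    return result


# Example round with three computers in which C has failed and sends nothing.
members = ["A", "B", "C"]
is_valid = lambda msg: msg == "alive"  # hypothetical message check
views = {
    "A": local_view("A", {"B": "alive"}, members, is_valid),
    "B": local_view("B", {"A": "alive"}, members, is_valid),
}
agreed = global_view(views, members)
print(agreed)
```

Since `global_view` is a deterministic function of the exchanged local views, every computer that holds the same set of views computes the same result, which is the property the paragraph above relies on.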
This method places no exacting requirements on the synchrony of the transmission. It only requires that within a period of time which need not be fixed but must be finite, each of the computers has received a message from each of the other computers. The method can thus be used with communications protocols according to which computers may send only during permanently assigned time slots, but also with communications protocols where such a fixed assignment does not exist.
Furthermore, use of the method according to the invention requires no specific sequences of operations to start up the distributed computer network, whereby the complexity of the computer network is reduced significantly.
Further advantageous features of the invention will be apparent from the description below and the appended claims.
REFERENCES:
patent: 3848116 (1974-11-01), Moder et al.
patent: 3921149 (1975-11-01), Kreis et al.
patent: 4375683 (1983-03-01), Wensley
patent: 4521745 (1985-06-01), Falconer
patent: 4583224 (1986-04-01), Ishii et al.
patent: 4754400 (1988-06-01), Wakahara et al.
patent: 4789795 (1988-12-01), Christiaan
patent: 4907232 (1990-03-01), Harper et al.
patent: 4967347 (1990-10-01), Smith et al.
patent: 5003533 (1991-03-01), Watanabe
patent: 5157667 (1992-10-01), Carusone et al.
patent: 5327553 (1994-07-01), Jewett et al.
patent: 5339404 (1994-08-01), Vandling, III
patent: 5423024 (1995-06-01), Cheung
patent: 5438680 (1995-08-01), Sullivan
patent: 5535217 (1996-07-01), Cheung et al.
patent: 5761439 (1998-06-01), Kar et al.
patent: 5864657 (1999-01-01), Stiffler
patent: 5903717 (1999-05-01), Wardrop
patent: 6035416 (2000-03-01), Abdelnour et al.
patent: 6134673 (2000-10-01), Chrabaszcz
patent: 6223304 (2001-04-01), Kling et al.
patent: 6295541 (2001-09-01), Bodnar et al.
patent: 6308223 (2001-10-01), Opgenoorth
patent: 6370571 (2002-04-01), Medin, Jr.
patent: 6442694 (2002-08-01), Bergman et al.
patent: EP 246218 (1987-11-01), None
patent: 197 45 963 (1999-04-01), None
patent: EP 0 674 601 (2000-01-01), None
patent: 0 246 218 (1987-11-01), None
patent: 258654 (1988-09-01), None
patent: 0 902 369 (1999-03-01), None
patent: 0 905 623 (1999-03-01), None
patent: 0 910 018 (1999-04-01), None
patent: 0 913 759 (1999-05-01), None
Hopkins, Jr. et al., “FTMP-A Highly Reliable Fault-Tolerant Multiprocessor for Aircraft”, IEEE, 1978, pp. 1221-1239.*
Guerraoui et al., “Total Order Multicast to Multiple Groups*”, IEEE, 1997, pp. 578-585.*
Lamport et al., “Synchronizing Clocks in the Presence of Faults”, Journal of the Association for Computing Machinery, vol. 32, No. 1, Jan. 1985, pp. 52-78.*
Weinstock, “SIFT: System Design and Implementation”, IEEE, 1996, pp. 29-30.*
Wensley, “Fault-Tolerant Computers: Industrial-control system does things in threes for safety”, Electronics, Jan. 1983, pp. 98-102.*
“Countdown for Space Shuttle: Velocity, Altitude Regimes To Push Computer Limits”, Aviation Week & Space Technology, Apr. 1981, pp. 49-51.*
Cooper et al., “Development of On-board Space Computer Systems”, On-Board Computer Evaluation, Jan. 1976, pp. 5-19.*
Sklaroff, “Redundancy Management technique for Space Shuttle computers”, IBM J. Res. Develop., pp. 20-28.*
Moser, et al., “Total Ordering Algorithms”, ACM, 1991, pp. 375-380.*
Moser et al., “Membership Algorithms for Asynchronous Distributed Systems”, 11th Int. Conf. on Distributed Computing Systems, Arlington, TX, USA, May 1991, pp. 480-488.
Siewiorek, Daniel P., “Architecture of Fault-Tolerant Computers: An Historical Perspective”, Proceedings of the IEEE, vol. 79, No. 12, Dec. 1991, pp. 1710-1734; cf. in particular p. 1729, Space Shuttle.
Kantz Heinz
Metzner Peter
Scheck Oliver
Alcatel