Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2001-01-12
2004-06-22
Beausoliel, Robert (Department: 2113)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S011000, C714S012000
Reexamination Certificate
active
06754845
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a network whose processor nodes exchange information in an asynchronous fashion, and more particularly to a method for achieving agreement among the processors, even in the presence of undetected faulty processors. Thus, it is applicable in a wide range of distributed computation systems, reaching from fault-tolerant database systems to intrusion tolerant e-commerce.
BACKGROUND OF THE INVENTION
Fault-tolerant systems use computer programs called protocols to ensure that the systems will operate properly even if there are individual processor failures. A fault-tolerant consensus protocol enables each processor or party to propose an action (via a signal) that is required to be coordinated with all other processors in the system. A fault-tolerant consensus protocol has as its purpose the reaching of a “consensus” on a common action (e.g., turning a switch off or on) to be taken by all non-faulty processors and ultimately the system. Consensus protocols are necessary because processors may send signals to only a single other processor at a time and a processor failure can cause two processors to disagree on the signal sent by a third failed processor. In spite of these difficulties, a fault-tolerant consensus protocol ensures that all non-faulty processors agree on a common action and that this action is one proposed by a non-faulty processor.
To reach consensus, consensus protocols first enable each processor or participating network device to propose an action (via a signal) that is later to be coordinated by all the processors or participating network devices in the system. The system then goes through the steps of the consensus protocol. After completing the consensus protocol steps, the common action of the consensus is determined. For example, in a flight-control system, there may be several processors, each equipped with its own sensor, that perform a calculation determining whether the aircraft needs to be moved up or down. In marginal situations, some processors may propose that the craft move up while others propose that it move down it is important that all non-faulty processors reach consensus on the direction and therefore act in concert in moving the craft.
The problem of consensus in a distributed system in spite of the presence of arbitrary failures was introduced in the context of aircraft control applications in 1978. L. Lamport, M. Pease and R. Shostak later isolated the problem and introduced the name “Byzantine Agreement” within their article “The Byzantine Generals Problem”, ACM Trans. Programming, Languages, Systems, vol. 4, no. 3, pp. 382-401, July 1982.
The “Byzantine Agreement”, also referred to as t-resilient binary Byzantine Agreement where t is the number of tolerable or corrupted participants or adversaries, is specified in the following:
Let &pgr; be a protocol for n parties for which each party P
i
has a private input b
i
&egr;{0, 1 }* It is said that &pgr; is a t-resilient Byzantine Agreement protocol if the following holds for all t-adversaries and for all inputs:
Validity: If no party is corrupted and all parties start transaction TID with a same input value then all parties decide &rgr; for transaction TID.
Agreement: If one uncorrupted party outputs &rgr; for transaction TID, then no uncorrupted party decides and outputs something other than &rgr; for the same transaction.
Termination: For every transaction TID that has been started by all uncorrupted parties, all uncorrupted parties eventually decide.
M. J. Fischer, N. A. Lynch and M. S. Paterson showed in their article “Impossibility of distributed consensus with one faulty process”, Journal of the ACM, 32(2): 374-382, April 1985, that no deterministic protocol can solve Byzantine Agreement in a fully asynchronous environment in the presence of failures.
Various types of protocols, such as synchronous, asynchronous, hybrid randomized, or deterministic protocols have been proposed whereby a few of them are addressed in the following.
Several synchronous system models have been proposed. The best reaches the deterministic optimum with min {f+2, t +1}rounds, where t is the maximum number of corrupted parties the protocol tolerates and f the number of corruptions that really occur.
As synchrony is a strong assumption, several timing models have been introduced to make the synchrony assumption more realistic. Later protocols isolated the timing assumptions in ‘failure detectors’ to abstract the protocols from the network properties, but an implementation of these failure detectors still requires time-outs. Most failure-detectors work in the crash failure model only, as failure-detectors do not work well with Byzantine corruptions so far.
Concerning asynchronous protocols, the first randomized protocols to solve fully asynchronous Byzantine Agreement where designed by M. Ben-Or and independently by M. O. Rabin and disclosed in their articles “Another advantage of free choice: Completely asynchronous agreement protocol (Extended Abstract)”, in Proceedings of the Second Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, pp. 27-30, Montreal, Canada, 17-19 Aug. 1983 and “Randomized Byzantine generals”, In 24th Annual Symposium on Foundations of Computer Science, pp. 403-409, Tuscon, Ariz., 7-9 Nov. 1983, IEEE.
While Ben-Or's protocol tolerates
[
n
5
]
-
1
corrupted parties, whereby this is called
[
n
5
]
-
1
resilient, with exponential expected running time, Rabin tolerates
[
n
8
]
-
1
corrupted parties with constant expected running time, but requires one previously generated secret value per transaction. Therefore, this protocol needs a trusted dealer after a constant number of transactions that generates new secrets.
In 1984, G. Bracha introduced a protocol for asynchronous broadcast with the article “An asynchronous [(n−1)/3]-resilient consensus protocol”, in Proceedings of the Third Annual ACM Symposium on Principles of Distributed Computing, pp. 154-162, Vancouver, Canada, 27-29 Aug. 1984. This protocol has become an important primitive for later protocols. However, it requires 3n
2
messages for one single broadcast, therefore no protocol using this primitive reaches agreement with less than O(n
3
) messages. R. Canetti and T. Rabin developed the first protocol with a resilience of
[
n
3
]
-
1.
This has been published under the title “Fast asynchronous byzantine agreement with optimal resilience”, In STOC93, pp. 42-51, 1993. Although the number of messages is polynomially bounded, this protocol is impractical, mainly due to the high cost for creating a common coin.
U.S. Pat. No. 4,569,015 describes a method for achieving a multiple processor agreement optimized for no faults wherein an originating processor broadcasts a value in a message with its unforgeable signature to all n active processors, including itself Receiving processors in the network pass such a message on with their own unforgeable signatures to all active processors, including themselves. If the number of signatures and phases is the same at each processor after the first two successive passings, then agreement as to the value with no fault is indicated, otherwise if after two passings, t+1 signatures have been collected, then these are signed and sent in the third passing, and in any case, each processor continues the steps of repeatedly sending messages when received, and appending its signature until t+2 passings have occurred. At that time, a processor will agree to the value if at least t+1 signatures append the message, otherwise a default value is adopted, t (n/2) being a reliability measure.
U.S. Pat. No. 5,598,529 discloses a computer system resilient to a wide class of failures. It includes a consensus protocol, a broadcast protocol and a fault tolerant computer system created by using the two protocols together in combination. The protocols are subject to certain validity conditions. The system in the state of consensus is guaranteed to have all non-faulty processors
Kursawe Klaus
Shoup Victor
Beausoliel Robert
Cameron Douglas W.
Dougherty Anne V.
Wilson Yolanda L
LandOfFree
Method of achieving optimistic multiple processor agreement... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method of achieving optimistic multiple processor agreement..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method of achieving optimistic multiple processor agreement... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3326720