Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2001-03-08
2004-03-09
Beausoliel, Robert (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S011000, C714S012000
Reexamination Certificate
active
06704887
ABSTRACT:
BACKGROUND OF THE INVENTION
This invention relates to distributed computer systems, fault-tolerance, and security.
Replication and majority voting are the conventional methods for achieving fault-tolerance in distributed systems. The system consists of a set of redundant processors all working on the same task in parallel, then voting on the individual processors' results to pick one as the correct answer. This technique was first proposed, in the context of electronic computing, by John von Neumann about 1945 and has been in use for some time.
Early examples of this technique used centralized voting. Each processor sent its vote to a central counter, which analyzed the votes and determined a majority. There are several problems with centralized voting. First, the central counter represents a single point of failure for the system; if it fails, so does the entire system. Second, the system cannot be readily reconfigured—once it is set up for centralized voting, it is difficult to employ for other tasks.
For these reasons, another technique developed, distributed voting. In a distributed voting system, there is no central counter. The processors communicate among themselves to determine the majority vote. Thus there is no longer a single point of failure for the system. If one processor drops out, the others operate without it. Another attractive feature of distributed systems is dual-mode operation. When a task is critical, as in a space vehicle's launch phase, the processors operate in fault-tolerant mode. When fault tolerance is not required, as in the space vehicle's cruise phase, the processors cease to be redundant and execute in parallel different subtasks. Such a system has been used in the Space Shuttle's primary computer system since the 1970s (cf. A. Spector, D. Gifford,
The Space Shuttle Primary Computer System,
27 Communications of the ACM, No. 9 (September 1984)).
Prior-art protocols for distributed voting fall into two main types: two-phase commit and Byzantine. Several other protocols have recently been proposed for secure distributed voting. Each of the prior-art protocols has shortcomings that the present invention overcomes.
Two-Phase Commit Protocols
Software voting has had several embodiments in the development of fault-tolerant computing (see J. Wensley,
SIFT: The Design and Analysis of a Fault-Tolerant Computer for Aircraft Control,
66 Proceedings of the IEEE (October 1978), 1240-1255; G. Farber,
Taskspecific Implementation of Fault Tolerance In Process Automation Systems,
M. Dal Cin and E. Dilger, ed., Self-Diagnosis and Fault Tolerance, Werkhefte Nr. 4 Attempto (Tubingen, 1981); and E. Ammann, et al.,
ATTEMPTO: A Fault-Tolerant Multiprocessor Working Station: Design and Concepts,
IEEE Computer Society, Digest of Papers FTCS-13, p10-13, (1983)). More recently, distributed voting has been used for fault diagnosis in linear processor arrays (cf. V. Roda, T. Lin,
The Distributed Voting Strategy for Fault Diagnosis and Reconfiguration of Linear Processor Arrays,
34 Microelectronics Reliability (No. 6, June 1994), 955-967) where, in the absence of a centralized voter, the array elements share error flags that stem from comparing outputs between connected elements.
Current schemes of distributed voting subscribe to a common protocol. Once voting is complete and a majority result determined, one processor is chosen to commit the majority result to user
4
. Thus all are examples of a two-phase commit protocol (see J. Gray and A., Reuter,
Transaction Processing: Concepts and Techniques
(Morgan Kaufmann, 1993)). Voting is the first phase, in which all participants share results. A second phase, in which the committal is executed, follows. A distinguished process coordinates the committal. This type of protocol is prominent in the prior art.
Byzantine Protocols
A second class of protocols for distributed voting are the so-called ‘Byzantine’ protocols. The SIFT (Software Implemented Fault Tolerance) Program (circa 1975) (described in J. Goldberg,
A History of Research in Fault-Tolerant Computing at SRI International,
A. Avizienis, H. Kopetz, J. C. Laprie, ed., The Evolution of Fault-Tolerant Computing (Springer, Wien, Austria, 1987)) was the first attempt to support fine-grained flexibility in fault-tolerant operation that entailed decentralized voting. This program was also the seedbed for solutions to important problems in fault-tolerant distributed systems (see L. Lamport et al.,
The Byzantine Generals Problem,
4 ACM Transactions on Programming Languages and Systems (No. 3, July 1982)). Byzantine protocols were developed to tolerate voters that can fail in a totally arbitrary manner, such as sending conflicting results to two or more different sets of participating voters. The goal of these protocols is to achieve Byzantine agreement—all participants should have a globally consistent view of the system. Byzantine faults are the most malicious kind of processor faults that can occur, and they are therefore are the most difficult to tolerate.
While Byzantine protocols are more secure than two-phase commit protocols, they are also more complex. The theoretical requirements necessary to guarantee correct system behavior in this situation can be summarized as follows: to tolerate f Byzantine faults, it is necessary to have 3f+1 independent participating voters in the Byzantine fault-tolerant scheme, where the voters are connected by 2f+1 disjoint communication paths, and exchange information f+1 times to arrive at exact consensus (see id.). To tolerate a single fault, the system must consist of four processors, each having three independent communication paths, and require two exchanges of information. To tolerant as few as three faults requires a system with 10 processors, each having seven independent communication paths, and four exchanges of information.
Because of the complexity inherent in a Byzantine fault-tolerant system, which implies huge costs, few such systems have been implemented commercially. This type of protocol is therefore impractical for a commercially viable (i.e., economically feasible) distributed voting system.
Secure Voting Algorithms
Several protocols have been proposed to overcome the problems described above. For example, M. Castro and B. Liskov (
Practical Byzantine Fault Tolerance,
Proceedings of the 3rd Symposium on Operating System Design and Implementation (February 1999) offered an algorithm whose action can be simplified as follows:
A client sends a request to one of the voters.
The voter multi-casts the request to the other voters.
The voters execute the request and send a reply to the client.
The client waits for f+1 replies with the same result, where f is the number of faults to be tolerated; this result is the final one.
This algorithm is not subject to the same problem as the two-phase commit protocol, since every voter can commit a result. However, it does require substantial computation by the client, which must compare all the replies until f+1 have arrived with the same result. Thus this algorithm does not scale well.
Another protocol is described by M. Reiter (
How to Securely Replicate Services,
16
ACM Transactions on Programming Languages and Systems,
No. 3 (May 1994)). Reiter makes use of a (k,n)-threshold signature. Informally; such a protocol generates a public key, along with n shares of a corresponding private key, each of which can produce a partial result on a signed message m. Any k of these partial results can then be used to reconstruct the whole of m. In this particular protocol, n is the number of voters, and k is one more than the number of tolerated faults. Each voter signs its result with its particular share of the private key and broadcasts it to the other voters. The voters then sort through the broadcast messages for k partial results that agree with its own result and combine them into the whole message m, where m would be the signed final result. The voter then sends m to the client, which accepts the first valid m sent.
This protocol
Hardekopf Benjamin C.
Kwiat Kevin A.
Beausoliel Robert
Mancini Joseph A.
The United States of America as represented by the Secretary of
Wilson Yolanda L.
LandOfFree
Method and apparatus for improved security in... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for improved security in..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for improved security in... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3199492