Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-04-13
2004-02-03
Baderman, Scott (Department: 2184)
C714S011000, C714S013000
active
06687851
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates in general to fault-tolerant computer systems and, more particularly, to mechanisms for upgrading the systems to include additional central processing units (“CPUs”) while the system is operative.
2. Background Information
The fault-tolerant systems of interest operate redundant CPUs in lock-step, that is, in cycle-to-cycle synchronism. Accordingly, before an off-line CPU is brought on-line, for example to upgrade the system from single to double modular redundancy or from double to triple modular redundancy, the off-line CPU must first be synchronized to the state of an on-line CPU. Similarly, an off-line CPU must be synchronized to the on-line CPU when, for example, a faulty CPU is replaced.
In previously known lock-step systems, the on-line CPU communicates directly with the off-line CPU using a special synchronization protocol. The CPU boards in these systems include dedicated synchronization hardware that allows the CPUs to communicate using that protocol, which makes the boards both time-consuming and expensive to design and manufacture.
Using the synchronization protocol, the on-line CPU directs the off-line CPU to set various components, such as certain registers and memory locations, to states that correspond to the states of the associated registers and memory locations of the on-line CPU. The on-line CPU thus controls a series of back-and-forth communications between the two CPUs, both to provide the state information to the off-line CPU and to instruct the off-line CPU to use that information to set its registers and memory locations to the appropriate states. Because the on-line CPU drives this exchange, its other processing operations may be disrupted during the synchronization process.
SUMMARY OF THE INVENTION
The inventive system includes an I/O subsystem that controls the synchronization of an off-line CPU to an on-line CPU, so that much of the synchronization operation takes place essentially as a background task for the on-line CPU. The I/O subsystem requests that the on-line CPU provide certain register and memory state information to general purpose registers on an I/O board. The I/O subsystem then copies the register contents to general purpose registers on the off-line CPU board, and the off-line CPU uses the information to set the states of certain of its registers and memory. The I/O subsystem further includes a DMA engine that, at a time set by the I/O subsystem, copies pages of memory from the on-line CPU to the off-line CPU.
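The staging path described above (on-line CPU to I/O-board registers, I/O board to off-line board registers, then a DMA copy of memory pages) can be modelled in a few lines. This is a minimal sketch, not the patented implementation: the classes `Cpu` and `IoBoard`, the function `synchronize`, and the register names are all hypothetical, and memory is simplified to a dictionary of page numbers to page contents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Hypothetical CPU board: register file, paged memory, board GP registers."""
    name: str
    registers: dict[str, int] = field(default_factory=dict)
    memory: dict[int, bytes] = field(default_factory=dict)  # page number -> contents
    gp_registers: dict[str, int] = field(default_factory=dict)


@dataclass
class IoBoard:
    """Hypothetical I/O board with general-purpose registers and a DMA engine."""
    gp_registers: dict[str, int] = field(default_factory=dict)

    def dma_copy_pages(self, src: Cpu, dst: Cpu, pages) -> None:
        # DMA engine: copy whole memory pages without either CPU doing the copy.
        for page in pages:
            dst.memory[page] = src.memory[page]


def synchronize(io: IoBoard, online: Cpu, offline: Cpu, reg_names) -> None:
    # 1. The on-line CPU deposits selected register state in the I/O board's
    #    general-purpose registers when asked by the I/O subsystem.
    for reg in reg_names:
        io.gp_registers[reg] = online.registers[reg]
    # 2. The I/O subsystem copies those values to the off-line board.
    offline.gp_registers.update(io.gp_registers)
    # 3. The off-line CPU loads its own registers from that staging area.
    for reg in reg_names:
        offline.registers[reg] = offline.gp_registers[reg]
    # 4. At a time chosen by the I/O subsystem, the DMA engine copies memory.
    io.dma_copy_pages(online, offline, list(online.memory))


# Usage: bring a blank board up to the on-line CPU's state.
online = Cpu("cpu0", registers={"pc": 0x1000, "sp": 0x7FF0},
             memory={0: b"boot", 1: b"data"})
offline = Cpu("cpu1")
synchronize(IoBoard(), online, offline, ["pc", "sp"])
```

The point the sketch makes is structural: no step requires the two CPUs to talk to each other directly, and the on-line CPU's only work is step 1.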
At the end of the synchronization operation, the off-line CPU is directed to write to a predetermined register on the I/O board. This write indicates that the off-line CPU is in a known state and ready to go on-line. The I/O subsystem then holds the off-line CPU in that known state by stalling the return of the acknowledgement of the write operation. When the on-line CPU later performs the same write operation, the on-line and off-line CPUs are in essentially the same state, and the I/O processor resets the CPUs so that the off-line CPU goes on-line and starts the next operating cycle in lock-step with the on-line CPU.
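The rendezvous on the predetermined register behaves like a small state machine: the off-line CPU's write is parked with no acknowledgement, and the matching write from the on-line CPU releases everyone through a common reset. The sketch below is a deterministic, single-threaded model under that reading; `RendezvousRegister`, `Mode`, and the return-value convention (`True` means the write is acknowledged immediately) are illustrative assumptions, not the patent's interface.

```python
from enum import Enum, auto


class Mode(Enum):
    OFFLINE = auto()
    HELD = auto()    # write received, acknowledgement withheld
    ONLINE = auto()


class RendezvousRegister:
    """Hypothetical model of the predetermined I/O-board register."""

    def __init__(self, online_cpus, offline_cpu):
        self.online = set(online_cpus)
        self.offline = offline_cpu
        self.mode = {cpu: Mode.ONLINE for cpu in self.online}
        self.mode[offline_cpu] = Mode.OFFLINE
        self.resets = 0

    def write(self, cpu) -> bool:
        """Handle a write; return True if it is acknowledged immediately."""
        if cpu == self.offline and self.mode[cpu] is Mode.OFFLINE:
            # Off-line CPU reached the known state: hold it there by
            # withholding the acknowledgement.
            self.mode[cpu] = Mode.HELD
            return False
        if cpu in self.online and self.mode[self.offline] is Mode.HELD:
            # The on-line CPU performed the same write, so both are in
            # essentially the same state: reset them together.
            self._reset_all()
            return True
        return True  # otherwise acknowledge normally

    def _reset_all(self) -> None:
        # After the common reset the new CPU joins the on-line set and all
        # CPUs begin the next cycle in lock-step.
        self.resets += 1
        self.online.add(self.offline)
        for cpu in self.online:
            self.mode[cpu] = Mode.ONLINE


reg = RendezvousRegister(["cpu0"], "cpu1")
held = reg.write("cpu1")      # off-line CPU parks; no acknowledgement
released = reg.write("cpu0")  # on-line CPU catches up; common reset
```

Withholding the acknowledgement is what makes the scheme cheap: the off-line CPU simply stalls on an ordinary bus write, so no special synchronization protocol is needed to keep it waiting.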
The I/O subsystem includes comparison logic that is updated when the off-line CPU changes its status to on-line as part of the reset operation. The comparison logic then compares the output streams from the previously on-line CPUs and the newly added on-line CPU. Accordingly, after the CPUs reset, the comparison logic compares two output streams if the system went from single to double modular redundancy, or three output streams if the system went from double to triple modular redundancy, and so forth. As discussed in more detail below, when the output streams do not agree, the comparison logic also handles voting appropriately for the number of on-line CPUs. The system thus dynamically changes its comparison method as CPUs are added to or removed from the system.
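A comparison step whose behaviour depends on how many CPUs are on-line can be sketched as a vote over one output word per CPU. The text does not spell out the exact voting rules here, so the policy below is an assumption: with one CPU there is nothing to compare, with two a mismatch cannot be attributed to either CPU, and with three or more a strict majority wins and the dissenters are reported. The function name `compare_outputs` is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def compare_outputs(outputs: dict[str, int]):
    """Vote on one output word from each on-line CPU (assumed policy).

    Returns (value, suspects): value is None when no strict majority
    exists; suspects are the CPUs whose output is considered faulty.
    """
    n = len(outputs)
    if n == 0:
        raise ValueError("no on-line CPUs")
    value, votes = Counter(outputs.values()).most_common(1)[0]
    if votes == n:
        return value, set()        # all streams agree (always true for n == 1)
    if votes * 2 <= n:
        # No strict majority, e.g. a two-CPU mismatch: the faulty CPU
        # cannot be identified, so every on-line CPU is suspect.
        return None, set(outputs)
    suspects = {cpu for cpu, out in outputs.items() if out != value}
    return value, suspects
```

Because the number of voters is just the size of `outputs`, adding or removing a CPU changes the comparison method without any other reconfiguration, which mirrors the dynamic behaviour described above.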
The communications between the on-line CPU and the I/O subsystem, and the I/O subsystem and the off-line CPU do not require a special synchronization communication protocol. Accordingly, the synchronization operation is less complex than the synchronization operations of the prior lock-step systems. Further, the components involved in the synchronization operation, namely, the general purpose registers and the DMA engine, are used for more than just the synchronization operation, and are thus not dedicated synchronization hardware. Also, the synchronization operation is controlled by the I/O subsystem, and thus, the processing operations of the on-line CPU are only minimally interrupted or disrupted. Finally, the comparison logic used to ensure valid output streams dynamically changes based on the number of on-line CPUs, and the system can thus be upgraded in the field.
Somers Jeffrey S.
Tetreault Mark D.
Wegner Timothy M.
Baderman Scott
Stratus Technologies Bermuda Ltd.
Testa Hurwitz & Thibeault LLP
Wilson Yolanda L.
Method and system for upgrading fault-tolerant systems