Reexamination Certificate
2000-04-13
2004-03-16
Baderman, Scott (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C714S043000, C710S316000
active
06708283
ABSTRACT:
FIELD OF THE INVENTION
The invention relates generally to fault-tolerant computer systems and, more particularly, to mechanisms for fault-tolerant access to system-critical devices on peripheral busses.
BACKGROUND OF THE INVENTION
Fault-tolerant computer systems are employed in situations and environments that demand high reliability and minimal downtime. Such computer systems may be employed in the tracking of financial markets, the control and routing of telecommunications and in other mission-critical functions such as air traffic control.
A common technique for incorporating fault-tolerance into a computer system is to provide a degree of redundancy to various components. In other words, important components are often paired with one or more backup components of the same type. As such, two or more components may operate in a so-called lockstep mode in which each component performs the same task at the same time, while only one is typically called upon for delivery of information. Where data collisions, race conditions and other complications may limit the use of lockstep architecture, redundant components may be employed in a failover mode. In failover mode, one component is selected as a primary component that operates under normal circumstances. If a failure in the primary component is detected, then the primary component is bypassed and the secondary (or tertiary) redundant component is brought on line. A variety of initialization and switchover techniques are employed to make a transition from one component to another during runtime of the computer system. A primary goal of these techniques is to minimize downtime and corresponding loss of function and/or data.
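The failover mode described above can be illustrated with a minimal sketch. It is not taken from the patent: the `Component` and `FailoverGroup` classes and the `healthy` flag (standing in for a real fault detector) are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical redundant component; `healthy` stands in for a real fault detector."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name} handled {request}"


class FailoverGroup:
    """Routes requests to the active component and bypasses it on failure."""

    def __init__(self, components):
        if not components:
            raise ValueError("at least one component is required")
        self.components = list(components)
        self.active_index = 0  # the first component acts as the primary

    @property
    def active(self) -> Component:
        return self.components[self.active_index]

    def handle(self, request: str) -> str:
        # Try the active component first, then each backup in order.
        for index in range(self.active_index, len(self.components)):
            try:
                result = self.components[index].handle(request)
            except RuntimeError:
                continue  # this component failed; try the next one
            self.active_index = index  # remember the component that worked
            return result
        raise RuntimeError("all redundant components have failed")
```

In this sketch a failed primary is simply skipped on later requests, so the caller keeps using the same `FailoverGroup` object while a secondary (or tertiary) component takes over.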
Fault-tolerant computer systems are often costly to implement since many commercially available components are not specifically designed for use in redundant systems. It is desirable to adapt conventional components and their built-in architecture whenever possible.
To reduce downtime, fault-tolerant systems are designed to include redundancy for connections and operations that would otherwise be single points of failure for the system. Accordingly, the fault-tolerant system may include redundant CPUs and storage devices. Certain devices on peripheral busses may also be single points of failure for the system. In a system that uses a Windows operating system, for example, the loss of a controller for peripheral busses and/or a video controller results in a system failure.
Devices such as a keyboard, mouse, monitor, floppy drives, CD-ROM drives, and so forth typically communicate with a system I/O bus, such as a PCI bus, over a variety of peripheral busses such as a USB and an ISA/IDE bus. The various peripheral busses connect to the PCI bus through a peripheral bus controller, such as an Intel PCI to ISA/IDE Xcelerator. The Windows operating systems require that the peripheral bus controller plug into location 0 on the system PCI bus, or what is commonly referred to as “PCI bus 0.”
A PCI-to-PCI bridge may be used to provide additional slots on a PCI bus. A bridge for use with the PCI bus 0, for example, provides slots for the system-critical peripheral bus controller and video controller, and various other devices. The PCI-to-PCI bridge is then a single point of failure, as are the peripheral bus controller and the video controller. While it is desirable to provide fault tolerance by including redundant paths to the peripheral devices, through redundant PCI-to-PCI bridges and associated peripheral bus controllers and video controllers, the operating system is not equipped to handle them. The operating system requires that all of the peripheral bus controllers connect to PCI bus 0, and redundant controllers alone thus cannot provide the desired, fully redundant paths to the peripheral devices. Accordingly, what is needed is a mechanism to achieve such redundancy within the confines of the commercially available operating systems.
SUMMARY OF THE INVENTION
The inventive system essentially hides redundant paths to the peripheral devices from the operating system, by reporting a single “virtual” path to the peripheral busses over PCI bus 0. The virtual path includes at least a virtual peripheral bus controller and a virtual video controller. The system also tells the operating system that the real controllers are on another PCI bus on an opposite side of a PCI-to-PCI bridge also connected to PCI bus 0. An I/O system manager selects one of the actual paths, which may, but need not, be connected to PCI bus 0, to handle communications with the peripheral devices.
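As a rough illustration of the reporting scheme, the sketch below builds the device view that might be presented to the operating system: virtual controllers on PCI bus 0, and the real controllers behind a PCI-to-PCI bridge on another bus. The `Device`, `Bus`, and `reported_topology` names, the controller labels, and the choice of bus number 1 for the bridged bus are assumptions made for this example and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    virtual: bool = False


@dataclass
class Bus:
    number: int
    devices: list = field(default_factory=list)


def reported_topology(real_paths):
    """Build the view given to the operating system.

    `real_paths` is a list of paths, each a list of real controller names.
    Bus number 1 for the bridged bus is an arbitrary illustrative choice.
    """
    bus0 = Bus(0, [
        Device("virtual peripheral bus controller", virtual=True),
        Device("virtual video controller", virtual=True),
        Device("PCI-to-PCI bridge"),
    ])
    # Every real controller is reported behind the bridge, never on bus 0.
    bridged = Bus(1, [Device(name) for path in real_paths for name in path])
    return [bus0, bridged]
```

With two real paths, the operating system in this sketch still sees exactly one peripheral bus controller and one video controller on PCI bus 0, both virtual.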
The I/O system manager maintains the controllers on the unselected path in an off-line or standby mode, in case of a failure of one or more of the controllers on the selected path. If a failure occurs, the I/O system manager performs a fail-over operation to change the selection of controllers, as discussed in more detail below. The operating system does not respond to the controller failure by declaring a system failure, however, because the operating system continues to look to the virtual path, with its virtual controllers, as a valid path to the peripheral devices. Accordingly, the fail-over operation does not adversely affect the overall operations of the system.
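The selection and fail-over behavior can be sketched as follows. This is a simplified model under stated assumptions, not the patented implementation: the `IOSystemManager` class, its method names, and the string path labels are hypothetical, and a real system would detect controller failures in hardware or drivers rather than through a `report_failure` call.

```python
class IOSystemManager:
    """Hypothetical manager: keeps one real path selected, the rest on standby."""

    VIRTUAL_PATH = ("virtual peripheral bus controller", "virtual video controller")

    def __init__(self, paths):
        if len(paths) < 2:
            raise ValueError("redundancy needs at least two real paths")
        self.paths = list(paths)
        self.failed = set()
        self.selected = self.paths[0]

    def os_view(self):
        # The operating system is only ever shown the virtual path.
        return self.VIRTUAL_PATH

    def standby(self):
        # Unselected paths that have not failed remain available.
        return [p for p in self.paths if p != self.selected and p not in self.failed]

    def report_failure(self, path):
        """Mark a path failed; fail over if it was the selected one."""
        self.failed.add(path)
        if path == self.selected:
            candidates = self.standby()
            if not candidates:
                raise RuntimeError("no standby path available")
            self.selected = candidates[0]
        return self.selected
```

The point of the sketch is that `os_view()` returns the same value before and after a fail-over, which mirrors why the operating system does not declare a system failure when a real controller is lost.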
As discussed in more detail below, the system also allows hot swapping of PCI bridges, and associated devices on the PCI bus and the peripheral busses.
Alden Andrew
Dolaty Mohsen
Edwards, Jr. John W.
Kement Michael W.
MacLeod John R.
Baderman Scott
Stratus Technologies Bermuda Ltd.
Testa Hurwitz & Thibeault LLP
Profile ID: LFUS-PAI-O-3188774