Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-04-19
2003-11-11
Beausoliel, Robert (Department: 2184)
C714S006130, C714S052000
active
06647516
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to fault tolerant data storage systems and methods of operating a fault tolerant data storage system.
BACKGROUND OF THE INVENTION
Redundant array of independent disks (RAID) subsystems have been used for many years. In a fault tolerant RAID subsystem, the primary objective is not to prevent every type of fault from occurring but to continue operating correctly in the presence of a component fault. There are many methods for achieving this goal. Even when designers keep the objective clearly in view, however, it is often not actually achieved.
For example, some faults are so severe that the system must be halted completely (e.g., a fire). Others are fairly isolated but may still corrupt the user's data stored on the RAID subsystem. Once data is corrupted, it is undesirable to return it to the host while advertising it as good. A system that is tolerant of all faults will not pass corrupted data back to the host.
In the past, fault tolerance was largely viewed as a means of providing robust and correct operation. It becomes even more important as the demand for complete data availability rises to extreme levels; for example, some systems guarantee no more than 5 minutes of downtime per year.
The storage subsystem is only one of many components in some large systems. For example, a RAID subsystem may be allocated only 1 minute of the system's total 5 minutes of yearly downtime, and the components within the RAID subsystem must share that 1 minute among themselves. It is typically unacceptable for data on the RAID storage subsystem ever to become unavailable, and the restrictions on loss of data availability continue to tighten dramatically over time.
In conventional arrangements, fault tolerance and continued operation could be provided by halting all operations in the system, initiating a subsystem-wide reset, reconfiguring the system to disable the failed component, and resuming operations after the “warm boot.” Rebooting takes on the order of a few seconds, which consumes a significant share of the availability budget and can make this reboot strategy unacceptable.
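The downtime figures above can be made concrete with a little arithmetic. The sketch below converts a yearly downtime budget into an availability percentage and counts how many full reboots fit inside the RAID subsystem's 1-minute share. The 3-second reboot time is an assumption standing in for "a few seconds"; the helper names are hypothetical. A 5-minute yearly budget works out to roughly 99.999% ("five nines") availability.

```python
# Downtime-budget arithmetic for the figures quoted in the background.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability(downtime_min_per_year: float) -> float:
    """Fraction of the year the system is available (hypothetical helper)."""
    return 1.0 - downtime_min_per_year / MINUTES_PER_YEAR


def reboots_allowed(budget_s: float, reboot_s: float) -> int:
    """Number of complete reboots that fit in a yearly downtime budget."""
    return int(budget_s // reboot_s)


system_budget_min = 5.0  # whole-system guarantee from the text
raid_budget_min = 1.0    # RAID subsystem's share from the text
reboot_s = 3.0           # assumption: "a few seconds" taken as 3 s

print(f"system availability: {availability(system_budget_min):.5%}")
print(f"RAID availability:   {availability(raid_budget_min):.5%}")
print("reboots per year within RAID budget:",
      reboots_allowed(raid_budget_min * 60, reboot_s))
```

Under these assumptions only a couple of dozen warm boots per year would exhaust the RAID subsystem's entire allocation, which is why avoiding a full reset matters.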
Accordingly, there exists a need to provide improved fault tolerant data storage systems and methods of operating fault tolerant data storage systems.
SUMMARY OF THE INVENTION
The invention provides fault tolerant data storage systems and methods of operating a fault tolerant data storage system.
In one aspect of the invention, a fault tolerant data storage system comprises: a plurality of coupled components individually including: an interface adapted to couple with a data connection and to selectively receive a plurality of transactions from the data connection; transaction processing circuitry coupled with the interface and configured to process transactions received from the interface; and analysis circuitry configured to detect error conditions within the transactions and to prevent entry of transactions individually including an error condition into the respective component responsive to the detection.
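A minimal software sketch of one such component follows, assuming that each transaction carries a CRC-32 checksum that the analysis step can recompute. The source does not say how error conditions are detected, so the checksum, the `Transaction`/`Component` classes, and the `make_transaction` helper are all hypothetical. The key point is that the check runs at the interface, so a failing transaction never reaches the processing logic.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Transaction:
    payload: bytes
    crc: int              # assumption: sender attaches a CRC-32 of the payload
    source: str = "host"


def make_transaction(payload: bytes, source: str = "host") -> Transaction:
    """Hypothetical helper: build a transaction with a correct checksum."""
    return Transaction(payload, zlib.crc32(payload), source)


@dataclass
class Component:
    name: str
    processed: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def has_error(self, txn: Transaction) -> bool:
        """Analysis step: recompute the checksum and compare it."""
        return zlib.crc32(txn.payload) != txn.crc

    def receive(self, txn: Transaction) -> bool:
        """Interface step: admit the transaction only if no error is found."""
        if self.has_error(txn):
            self.rejected.append(txn)  # blocked before entering the component
            return False
        self.process(txn)
        return True

    def process(self, txn: Transaction) -> None:
        """Stand-in for the transaction processing circuitry."""
        self.processed.append(txn.payload)


# Usage: a corrupted copy of a transaction is refused at the interface.
ctrl = Component("controller-a")
good = make_transaction(b"write block 42")
bad = make_transaction(b"write block 42")
bad.payload = b"write block 43"  # simulate corruption in transit
assert ctrl.receive(good) and not ctrl.receive(bad)
```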
In another aspect of the invention, a method of operating a fault tolerant data storage system comprises: providing a fault tolerant data storage system including a plurality of components configured to process transactions; providing the transactions for communication to respective components; detecting error conditions within the transactions; and preventing entry of transactions which individually include an error condition into respective components responsive to the detecting.
Another aspect of the invention provides a method of operating a fault tolerant data storage system comprising: providing a fault tolerant data storage system including a plurality of coupled components configured to process transactions; communicating transactions between coupled components; detecting an error condition within one of the transactions; and isolating the component which outputted the transaction including the error condition responsive to the detecting.
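The isolation aspect can be sketched in the same spirit. Here a hypothetical `Fabric` object routes transactions between components and, on detecting a bad checksum, isolates the component that emitted the transaction instead of resetting the whole subsystem. The CRC-32 check and the policy of isolating permanently on the first error are assumptions made for illustration; the source specifies neither.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Transaction:
    payload: bytes
    crc: int      # assumption: CRC-32 of the payload, attached by the sender
    source: str   # name of the component that emitted the transaction


class Fabric:
    """Hypothetical interconnect that isolates any component observed
    emitting a corrupted transaction, while the others keep running."""

    def __init__(self) -> None:
        self.isolated: set = set()
        self.delivered: list = []

    def send(self, txn: Transaction, dest: str) -> bool:
        if txn.source in self.isolated:
            return False                   # isolated components are cut off
        if zlib.crc32(txn.payload) != txn.crc:
            self.isolated.add(txn.source)  # isolate the emitter, no reboot
            return False
        self.delivered.append((dest, txn.payload))
        return True


# Usage: cache-1 emits one corrupted transaction and is then cut off.
fabric = Fabric()
ok = Transaction(b"read 7", zlib.crc32(b"read 7"), "cache-0")
bad = Transaction(b"read 8", zlib.crc32(b"read 7"), "cache-1")  # corrupted
fabric.send(ok, "disk-ctrl")
fabric.send(bad, "disk-ctrl")
later = Transaction(b"read 9", zlib.crc32(b"read 9"), "cache-1")
assert fabric.send(later, "disk-ctrl") is False  # cache-1 is now isolated
```

Unlike the warm-boot approach described in the background, the healthy components continue to exchange transactions throughout; only the faulty emitter loses access.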
Grund Christine
Johansson Christopher W.
Oldfield Barry J.
Rust Robert A.
Shrader Steven Lee
Beausoliel Robert
Hewlett-Packard Development Company, L.P.
Puente Emerson