Classification: Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Type: Reexamination Certificate
Filed: 2000-10-18
Issued: 2004-04-20
Examiner: Baderman, Scott (Department: 2184)
U.S. Class: 714/15
Status: active
Patent Number: 06725392
FIELD OF THE INVENTION
The present invention relates generally to the field of array storage devices in computer processing systems and networks. More specifically, the present invention relates to a controller fault recovery system for recovering from faults that cause unscheduled stops of a distributed file system operating on an array storage system having multiple controllers. The system provides a proxy arrangement to protect data integrity in the event of an unscheduled stop of just one controller in the array storage system, and an atomic data/parity update arrangement to protect data integrity in the event of an unscheduled stop of more than one controller.
BACKGROUND OF THE INVENTION
The use of array storage systems to store data in computer processing systems and networks is well known. Five different classes of array storage system architectures are described under the acronym “RAID” (Redundant Array of Independent/Inexpensive Disks). The primary purpose of a RAID system is to detect corrupted data and, depending upon the class of the RAID system and the extent of the data corruption, use the data stored on other drives in the disk array to correct the corrupted data. In a RAID 5 disk array, for example, a parity technique is used that protects against the failure of any single one of the disk drives that make up the disk array. Data is written to the disk array in multiple data blocks, with a defined number of data blocks (N) making up a parity group. Each parity group is protected by a parity block, which is also written to the disk array. The parity block is generated by an exclusive-or (XOR) operation on all of the data blocks in the parity group. When the parity group is read, the XOR operation is performed on its data blocks and the result is compared with the stored parity block to detect potentially corrupted data. In a RAID 5 disk array, each of the data blocks in a parity group, as well as the parity block, is stored on a different disk drive. Therefore, a RAID 5 array requires a minimum of N+1 disk drives for a parity group having N data blocks. In other words, for a disk array having N disk drives, there can be only N−1 data blocks in a parity group.
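To make the parity arithmetic concrete, the following is a minimal sketch (not taken from the patent; the helper names are illustrative) of RAID 5-style parity generation and single-drive reconstruction, using Python bytes objects to stand in for fixed-size disk blocks:

    from functools import reduce

    def xor_blocks(blocks):
        # XOR a list of equal-length blocks byte by byte.
        return bytes(reduce(lambda acc, blk: [a ^ b for a, b in zip(acc, blk)],
                            blocks, [0] * len(blocks[0])))

    def make_parity(data_blocks):
        # The parity block is the XOR of all N data blocks in the group.
        return xor_blocks(data_blocks)

    def reconstruct(surviving_blocks):
        # XOR is its own inverse, so the block on a lost drive is the
        # XOR of the N surviving blocks (data plus parity) of its group.
        return xor_blocks(surviving_blocks)

    # Example: a 4-drive array, so N = 3 data blocks per parity group.
    d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\x0f"
    p = make_parity([d0, d1, d2])
    assert reconstruct([d0, d2, p]) == d1   # recover d1 after losing its drive

Because the same XOR operation serves both verification and reconstruction, no redundancy beyond the single parity block is needed to survive one drive failure.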
Due to the large amount of processing that can be required to implement error detection or error correction techniques, most existing array storage systems are implemented as a set of disks uniquely attached to and managed by a specialized hardware disk controller. In addition to the normal buffers and input/output circuitry required for passing data between a requestor and the disk array, these specialized disk controllers typically have additional hardware, such as XOR circuitry for computing parity and nonvolatile RAM (NVRAM) for caching and logging. These types of array storage systems are often referred to as hardware RAID systems.
The use of error detection and error correction techniques in RAID systems was initially thought to provide a sufficient level of reliability for these storage systems. Unfortunately, even RAID systems are susceptible to corruption if an unscheduled stop due to a hardware or software error occurs during the period when updates or modifications are being made to either a data block or a parity block. Because unscheduled stops correlate strongly with hardware faults that may prevent all of the drives in a disk array from being accessed, it is often necessary after such a stop for the system to reconstruct data on a lost drive. For example, if one drive out of a four-drive RAID 5 disk array is inaccessible after an unscheduled stop, all of the information on that lost drive must be reconstructed using the data and parity on the remaining three drives. If the unscheduled stop occurs while updates or modifications are being made, the problem is deciding which contents of the parity-group blocks on the remaining three drives should be used to reconstruct the data. For example, if a new data block was written before a crash but the corresponding new parity block was not, then the recovered data would be inaccurate if the information on the lost drive were reconstructed from the new data block contents and the old parity block contents.
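This failure mode, often called the RAID 5 “write hole,” can be shown numerically with the illustrative xor_blocks helper from the sketch above (again an example under assumed values, not text from the patent):

    # Consistent starting state: parity covers the old data.
    old_d0, d1, d2 = b"\xaa", b"\x10", b"\x0f"
    old_p = xor_blocks([old_d0, d1, d2])

    # A crash lands between the data write and the parity write:
    new_d0 = b"\x55"        # the new data block reached the disk...
    stale_p = old_p         # ...but the matching parity update did not.

    # The drive holding d1 is lost in the crash.  Rebuilding it from the
    # new data and the stale parity silently yields the wrong contents.
    recovered = xor_blocks([new_d0, d2, stale_p])
    assert recovered != d1  # the "recovered" data is inaccurate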
The problem of knowing which data/parity blocks were successfully written and which were not is compounded by the fact that the controller's buffers hold data in volatile RAM during the relatively long period (of unpredictable length) required to actually write the data to the disk array. In the event of an unscheduled stop during this period, the data may or may not have been written to the disk array, and the contents of the volatile RAM are lost. Many hardware RAID systems simply ignore this problem and reconstruct the data from whatever contents of the data and parity blocks are present on the remaining drives. Other systems recognize the problem and attempt to reconstruct a version of the lost data using a predetermined data pattern, as described in U.S. Pat. No. 5,933,592. Some hardware RAID systems solve this problem by performing actions in order and using the NVRAM to maintain a log of the actions as they are performed. If an unscheduled stop or error occurs, the controller plays back the log in the NVRAM and attempts to restore the data to a known state. An example of this type of error recovery using non-volatile storage in the controller is described in U.S. Pat. No. 6,021,463. Other examples include the Enterprise Storage system from Network Appliance, which uses the WAFL file system and non-volatile RAM to keep a log of all requests processed since the last consistency point (i.e., every 10 seconds or so a snapshot of the file system is stored in the NVRAM), and the SPRITE file system from the University of California, Berkeley, which used a non-volatile recovery box to store critical system information.
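As a rough illustration of the NVRAM logging approach (a simplified sketch under assumed semantics, not the mechanism of the cited patents; the class and file layout are hypothetical), an intent entry is persisted before each data/parity write, and any entries still present after a restart identify the parity groups whose consistency must be restored:

    import json

    class IntentLog:
        # Toy write-ahead intent log standing in for controller NVRAM.
        def __init__(self, path):
            self.path = path

        def record(self, group_id, blocks):
            # Persist the intent before touching the disk array.
            with open(self.path, "a") as f:
                f.write(json.dumps({"group": group_id, "blocks": blocks}) + "\n")

        def clear(self):
            # All logged writes completed; empty the log.
            open(self.path, "w").close()

        def replay(self):
            # After an unscheduled stop, return the parity groups whose
            # updates may be incomplete so their parity can be recomputed.
            try:
                with open(self.path) as f:
                    return [json.loads(line)["group"] for line in f if line.strip()]
            except FileNotFoundError:
                return []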
While hardware RAID systems can be effective in solving many of the problems of array storage systems, such systems can be expensive, complicated, and less efficient than non-RAID storage systems. The use of NVRAM to periodically store file system information does allow for recovery at each consistency point, but does nothing to avoid or minimize the loss of data resulting from errors occurring between such consistency points. Additionally, the ability to scale hardware RAID systems or to use them in a larger network environment can be limited. In an effort to decrease the cost and complexity of hardware RAID systems, software implementations of RAID systems have been developed for use with disk arrays that have relatively simple hardware controllers.
Most software RAID systems are implemented as part of a centralized file system that governs how the data is to be stored. However, such software RAID systems are subject to the same problems as the hardware RAID systems described above. Software RAID systems can be designed to recover from an unscheduled stop, when all of the disks are available afterward, by scanning all of the data on the system and performing error detection or parity checks to verify the accuracy of the data. Unfortunately, this kind of recovery can take a very long time. To avoid having to scan all of the data for accuracy after an unscheduled stop or error, some software RAID systems use a bit map, stored along with the control information or meta-data for a file, to indicate whether the parity for the data blocks that make up that file is accurate. Examples of the use of such parity bit maps are described in U.S. Pat. Nos. 5,574,882 and 5,826,001.
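A hedged sketch of the parity bit map idea follows (the names are illustrative, not drawn from the cited patents): a per-file bitmap kept with the meta-data marks a parity group dirty before its blocks are updated and clean once both data and parity are on disk, so post-crash verification touches only the dirty groups rather than every block in the system:

    class ParityBitmap:
        # Toy per-file parity bitmap kept with the file's meta-data.
        # Bit i set   => parity group i may be inconsistent (update in flight).
        # Bit i clear => data and parity for group i are known to agree.
        def __init__(self, num_groups):
            self.bits = 0
            self.num_groups = num_groups

        def mark_dirty(self, group):
            self.bits |= 1 << group      # set before writing data/parity

        def mark_clean(self, group):
            self.bits &= ~(1 << group)   # clear once both writes complete

        def groups_to_verify(self):
            # After an unscheduled stop, only these groups need a parity
            # check, instead of a scan of the entire file system.
            return [g for g in range(self.num_groups) if self.bits & (1 << g)]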
Though parity bit maps can be effective in decreasing the time required for recovery, they do not address the problem of whether data in the controller's buffers was successfully written to the disk array. One solution to this problem is described in U.S. Pat. No. 6,041,423.
Inventors: Frey, Alexander H.; Graham, William A. P.; Olson, Leonard
Assignee: Adaptec, Inc.
Primary Examiner: Baderman, Scott
Law Firm: Martine & Penilla LLP