Dual-drive fault tolerant method and system for assigning...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S770000, C711S114000

active

06453428

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The disclosed invention relates to architectures for arrays of disk drives, and more particularly, to disk array architectures that provide two-drive fault tolerance.
2. Description of the Related Art
Since the beginning of computers, data protection has been one of the main concerns in designing data storage systems. Valuable data stored in hard drives can be lost due to abnormal occurrences such as human errors, equipment failures, and adverse environmental conditions. With the advent of on-line, interactive computing, the protection of data against failure has become an even more important consideration in designing data storage systems. For example, modern e-commerce enables companies to conduct all or a sizable portion of their business over the Internet using computers. In such a scenario, if hard drives on a company's server computer fail, the company's business may come to a standstill. This may lead to a substantial loss in business and goodwill of its customers.
To guard against such disastrous events and enhance I/O performance, many computer systems implement a Redundant Array of Independent Disks (RAID) system, which is a disk system that includes a collection of multiple disk drives and an array controller. The disk drives are organized into a disk array and managed by the common array controller. The array controller presents the array to the user as one or more virtual disks. Disk arrays are the framework to which RAID functionality is added in functional levels to produce cost-effective, highly available, high-performance disk systems.
In RAID systems, the data are distributed over multiple disk drives to allow parallel operation, thereby enhancing disk access performance and providing fault tolerance against drive failures. Currently, a variety of RAID levels from RAID level 0 through level 6 have been specified in the industry. For example, RAID level 0 is a performance-oriented striped data mapping technique. Uniformly sized blocks of storage are assigned in a regular sequence to all of the disks in the array. RAID 0 provides high I/O performance at low cost. Reliability of a RAID 0 system is less than that of a single disk drive because failure of any one of the drives in the array can result in a loss of data.
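The regular block sequence of RAID 0 can be illustrated with a minimal sketch. It assumes the simplest round-robin mapping, with one block per stripe unit; the function name `raid0_locate` and the zero-based numbering are hypothetical choices for illustration, not taken from any particular controller.

```python
def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a stripe unit of one block and
    zero-based numbering (both are illustrative assumptions).
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# Consecutive logical blocks land on consecutive disks, so a large
# sequential transfer can keep every drive busy at once.
layout = [raid0_locate(block, 4) for block in range(8)]
```

Because no block is stored twice and no parity is kept, losing any one disk in this layout loses every stripe that touches it.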
On the other hand, RAID level 1, also called mirroring, provides simplicity and a high level of data availability. A mirrored array includes two or more disks wherein each disk contains an identical image of the data. A RAID level 1 array may use parallel access for high data transfer rates when reading. RAID 1 provides good data reliability and improves performance for read-intensive applications, but at a relatively high cost.
RAID level 2 is a parallel mapping and protection technique that employs error correction codes (ECC) as a correction scheme, but is considered unnecessary because off-the-shelf drives come with ECC data protection. For this reason, RAID 2 has no current practical use, and the same performance can be achieved by RAID 3 at a lower cost. As a result, RAID 2 is rarely used.
RAID level 3 adds redundant information in the form of parity data to a parallel accessed striped array, permitting regeneration and rebuilding of lost data in the event of a single-disk failure. One chunk of parity protects corresponding chunks of data on the remaining disks. RAID 3 provides high data transfer rates and high data availability. Moreover, the cost of RAID 3 is lower than the cost of mirroring since there is less redundancy in the stored data.
RAID level 4 uses parity concentrated on a single disk to allow error correction in the event of a single drive failure (as in RAID 3). Unlike RAID 3, however, member disks in a RAID 4 array are independently accessible. Thus RAID 4 is more suited to transaction processing environments involving short file transfers. RAID 4 and RAID 3 both have a write bottleneck associated with the parity disk, because every write operation modifies the parity disk.
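The parity-disk write bottleneck follows from how parity is kept current on a small write. A minimal sketch, assuming the usual read-modify-write update for XOR parity (the helper names `xor_bytes` and `updated_parity` are hypothetical):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity after one data chunk changes: P' = P xor D_old xor D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Two data chunks and their parity chunk on the dedicated parity disk.
d0, d1 = b"\x0f", b"\xf0"
parity = xor_bytes(d0, d1)

# Overwriting d0 requires a write to the parity disk as well.
new_d0 = b"\x01"
new_parity = updated_parity(parity, d0, new_d0)
```

Every data write, whichever member disk it targets, ends with a write to the same parity disk, which is why that disk limits write throughput.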
In RAID 5, parity data is distributed across some or all of the member disks in the array. Thus, the RAID 5 architecture achieves performance by striping data blocks among N disks, and achieves fault tolerance by using 1/N of its storage for parity blocks, each calculated as the exclusive-or (XOR) of all data blocks in the parity disk's row. The write bottleneck is reduced because parity write operations are distributed across multiple disks.
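The XOR parity calculation and single-drive reconstruction described above can be sketched as follows. This is an illustrative example with made-up two-byte chunks; the function names `xor_chunks` and `rebuild_missing` are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of any number of equally sized chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    size = len(chunks[0])
    if any(len(chunk) != size for chunk in chunks):
        raise ValueError("chunks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data chunk of a row from the survivors and parity."""
    return xor_chunks(list(surviving) + [parity])


# One row of three data chunks plus its parity chunk.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_chunks(data)

# Simulate losing the drive that held data[1] and rebuild its chunk.
recovered = rebuild_missing([data[0], data[2]], parity)
```

XOR is its own inverse, so XOR-ing the parity with the surviving chunks cancels them out and leaves the lost chunk. The same identity cannot separate two unknown chunks, which is why a single XOR parity tolerates only one failure per row.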
As is well known in the art, RAID levels 1 through 5 provide single-drive fault tolerance. That is, these RAID levels allow reconstruction of the original data if any one of the disk drives fails. Sometimes, however, more than one drive may fail in a RAID system. For example, dual drive failures are becoming more common occurrences as RAID systems incorporate an increasing number of disk drives.
To provide, in part, dual fault tolerance to such failures, a RAID level 6 has been specified in the industry. The RAID 6 architecture is similar to RAID 5, but RAID 6 can overcome the failure of any two disk drives by using an additional parity block for each row (for a storage loss of 2/N). The first parity block (P) is calculated by performing an XOR operation on a set of assigned data chunks. Likewise, the second parity block (Q) is generated by using Reed-Solomon codes on a set of assigned data chunks. When a pair of disk drives fails, the conventional dual-fault tolerant RAID systems reconstruct the data of the failed drives using the parity sets. The RAID systems are well known in the art and are amply described, for example, in The RAID Book, A Storage System Technology Handbook, by Paul Massiglia, 6th Ed. (1997), which is incorporated herein by reference.
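To make the difference between the two parity blocks concrete, the following sketch computes P and Q for one row. The source does not specify the Reed-Solomon parameters; the Galois field GF(2^8), the field polynomial 0x11d, and the generator 2 used here are assumptions chosen for illustration, and the names `gf_mul` and `pq_parity` are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the assumed polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def pq_parity(chunks: list[bytes]) -> tuple[bytes, bytes]:
    """Return (P, Q) for one row of equally sized data chunks.

    P is the plain XOR of the chunks. Q weights chunk i by g**i in
    GF(2^8), with g = 2 (an assumed generator).
    """
    size = len(chunks[0])
    if any(len(chunk) != size for chunk in chunks):
        raise ValueError("chunks must have the same length")
    p = bytearray(size)
    q = bytearray(size)
    coeff = 1  # g**0
    for chunk in chunks:
        for j, byte in enumerate(chunk):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
        coeff = gf_mul(coeff, 2)  # advance to the next power of g
    return bytes(p), bytes(q)


p, q = pq_parity([b"\x01", b"\x01", b"\x01"])
```

P needs one XOR per byte, while Q needs a finite-field multiplication per byte in addition to the XOR. That extra per-byte work is the kind of cost that an XOR-only scheme avoids.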
Conventional RAID systems implementing the RAID level 6, however, generally require costly and complex array controllers because the Reed-Solomon codes are complex and may require significant computational resources. That is, the complexity of Reed-Solomon codes may preclude the use of such codes in software and necessitate the use of expensive special purpose hardware. Thus, implementation of Reed-Solomon codes in a disk array increases the cost and complexity of the array. Unlike the simpler XOR codes, Reed-Solomon codes cannot easily be distributed among dedicated XOR processors. In a dual XOR RAID scheme described in U.S. patent application Ser. No. 09/250,657, which was previously incorporated by reference, the efficiency of reconstructing the original data depends largely on the scheme used to associate parity sets with data chunks.
Thus, what is needed is a generalized method and system that can efficiently assign column parity sets to data chunks in a dual-drive tolerant RAID system so as to allow efficient reconstruction of the original data in the event of disk drive failures.
SUMMARY OF THE INVENTION
The present invention fills these needs by providing a method and system for assigning column parity sets to data chunks in a dual-fault tolerant RAID system. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer readable medium. Several inventive embodiments of the present invention are described below.
In one embodiment, the present invention provides a method for assigning data chunks to column parity sets in a dual-drive fault tolerant storage disk drive system having N disk drives, where N is a prime number. In this method, each of the N disk drives is organized into N chunks such that the N disk drives are configured as one or more N×N arrays of chunks. The array has chunks arranged in N rows from row 1 to row N and in N columns from column 1 to column N. Each row includes a plurality of data chunks for storing data, a column parity chunk for storing a column parity set, and a row parity chunk for storing a row parity set. These data chunks are assigned in a predetermined order. The data chunks in each row are assigned to the row parity set. Each column parity
