Multiple drive failure recovery for a computer system having...

Error detection/correction and fault detection/recovery – Pulse or data error handling – Error/fault detection technique

Reexamination Certificate

Details

C714S758000

active

06694479

ABSTRACT:

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not applicable.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to computer systems having multiple storage drives. More specifically, the invention relates to recovering from multiple drive failures in a computer system having an array of storage drives. More specifically still, the invention relates to a failure recovery scheme where it is assured missing data can be recovered regardless of which particular drives fail.
2. Background of the Invention
Early computer systems typically had only one hard drive or fixed storage device. Even today, computer systems having a single fixed storage drive are standard for personal computer systems. However, commercial and industrial computer users require greater data stability. That is, commercial and industrial computer users want some assurance that information stored on hard disks will not be lost in spite of drive failures.
Some users ensure data stability by performing periodic backups onto tape drive systems. For example, a user may make a complete backup of their hard drive contents on a weekly basis. The user may further make copies of only the changes since the last backup, commonly known as an incremental backup, on a daily basis. However, even this method leaves open the possibility that some information may be lost if there is a failure of the hard drive between data backups. Data stability demands drove computer manufacturers to make computer systems having multiple fixed storage devices.
FIG. 1A represents one approach computer manufacturers take in storing data in a computer system having multiple hard drives. In FIG. 1A, each of the large boxes represents a hard drive in a computer system. One block of data D, being the set of data [d0, d1, d2], is divided into small subsets and distributed across the hard drives of the computer system. Thus, the information is stored on an array of disks. This configuration is commonly known as a Redundant Array of Inexpensive Disks (“RAID”), and may also be known as a Redundant Array of Independent Disks. The system exemplified in FIG. 1A is commonly known as RAID 0. The disadvantage of the RAID 0 system is that upon failure of any one of the disk drives, the overall data D cannot be recovered.
FIG. 1B represents, in matrix format, the storage system of RAID 0. Carrying out the matrix multiplication of FIG. 1B reveals that d0=d0, d1=d1 and d2=d2, which is mathematically uneventful, but is important in other systems as described below. As compared to a single hard drive computer system, RAID 0 actually increases the probability of data loss in that a failure of any one of the drives results in a complete data loss. RAID 0 exemplifies an important concept in multiple disk arrays, that concept being “striping”. With reference to FIG. 1A, data D is the combination of the smaller portions of data being [d0, d1, d2]. Placing small portions on each drive of a multiple drive system is known as striping. That is, data is striped across multiple drives.
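The striping described above can be sketched in a few lines of Python. This is only an illustration, not the layout of FIG. 1A: the function names `stripe` and `unstripe`, the `chunk_size` parameter, and the round-robin placement are assumptions introduced here.

```python
def stripe(data: bytes, num_drives: int, chunk_size: int = 1) -> list[bytes]:
    """Distribute consecutive chunks of `data` round-robin across drives.

    Hypothetical helper: the chunk size and round-robin order are assumptions.
    """
    drives = [bytearray() for _ in range(num_drives)]
    for start in range(0, len(data), chunk_size):
        drive_index = (start // chunk_size) % num_drives
        drives[drive_index] += data[start:start + chunk_size]
    return [bytes(d) for d in drives]


def unstripe(drives: list[bytes], chunk_size: int = 1) -> bytes:
    """Reassemble the data by reading chunks back in the same round-robin order."""
    total = sum(len(d) for d in drives)
    offsets = [0] * len(drives)
    out = bytearray()
    drive_index = 0
    while len(out) < total:
        start = offsets[drive_index]
        out += drives[drive_index][start:start + chunk_size]
        offsets[drive_index] += chunk_size
        drive_index = (drive_index + 1) % len(drives)
    return bytes(out)
```

Because every drive holds a distinct portion and nothing is duplicated, removing any one element of the list returned by `stripe` leaves `unstripe` without the bytes it needs, which mirrors the RAID 0 weakness noted above.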
Manufacturers may address the problem associated with a striped RAID 0 system by “mirroring”. In a mirrored system, there are duplicate drives containing complete sets of duplicate information. For example, an array of drives may consist of four drives, data D may be striped across two drives, and likewise striped again across the other two drives. In this way, as many as two drives may fail without loss of data, so long as the failed drives are not the two drives holding duplicates of the same information. Fault tolerance implemented in this configuration is known as “RAID 1+0”, “RAID 0+1” or “RAID 10.” While a RAID 1+0 ensures greater data stability than a RAID 0 or a single disk system, the overhead associated with implementing such a system is high. In the exemplary system described, the effective storage utilization capacity of the four disk drives is only 50%. What was needed in the industry was a fault tolerance scheme that had a higher storage utilization capacity, which would therefore make it less expensive to implement.
FIG. 2A represents a series of hard drives in a computer system that has the same number of hard drives as described with respect to mirroring; however, this specific system reaches a 75% utilization capacity. In this system the data represented by D[d0, d1, d2] is striped across the first three of the four disk drives. The system of FIG. 2A further writes error correction or parity information to the fourth disk drive. Such a system is referred to as having three data drives and one parity drive. It is noted that having three data drives is merely an exemplary number and more or fewer data drives are possible. However, fewer data drives translate into lower storage utilization. Likewise, a greater number of data drives represents higher storage utilization. Indeed, as the number of data drives significantly increases, with one parity drive, it is possible that the storage utilization may approach, but never actually reach, 100%.
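The utilization figures above follow from a simple ratio of data drives to total drives. A minimal Python sketch, assuming all drives have equal capacity (the function name `parity_utilization` is introduced here for illustration):

```python
from fractions import Fraction


def parity_utilization(data_drives: int, parity_drives: int = 1) -> Fraction:
    """Fraction of total capacity that holds user data.

    Assumes every drive in the array has the same capacity.
    """
    if data_drives < 1 or parity_drives < 0:
        raise ValueError("need at least one data drive and no negative counts")
    return Fraction(data_drives, data_drives + parity_drives)
```

With three data drives and one parity drive the ratio is 3/4, matching the 75% figure; with n data drives it is n/(n+1), which grows toward 1 as n increases but remains below it for any finite n.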
The subset of data written to the parity drive of FIG. 2A is related to the data written to each of the data drives. FIG. 2B shows the relationship, in matrix format, of each data subset written to the data drives and the value of the parity subset written to the parity drive. Carrying out the matrix multiplication of FIG. 2B reveals that d0=d0, d1=d1, d2=d2 and P=d0^d1^d2, where “^” represents the logical exclusive-OR (XOR) function. Thus, as is indicated in the figure and shown above, the value of the parity subset is the XOR of each of the smaller subsets of the overall data. A system implementing the configuration of FIGS. 2A, 2B is capable of recovery from a single drive failure. Loss of the parity drive does not affect stability of the data. However, loss of any one of the data drives is a recoverable error inasmuch as the data lost on the failed drive may be calculated using the remaining subsets of information in combination with the parity information. Such a fault tolerance scheme is known as RAID 4.
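The parity relationship P=d0^d1^d2 and the single-drive recovery it permits can be sketched in Python as follows. The helper names `xor_blocks` and `recover_missing` are hypothetical, and the sketch assumes each subset is a byte string of equal length.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (used to compute the parity P)."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one lost data block from the survivors and the parity.

    Works because XOR is its own inverse: d1 = P ^ d0 ^ d2 when d1 is lost.
    """
    return xor_blocks(surviving + [parity])
```

For example, with `d0, d1, d2` as three equal-length byte strings and `p = xor_blocks([d0, d1, d2])`, the call `recover_missing([d0, d2], p)` yields `d1`. If two data blocks are lost at once, the single parity equation has two unknowns and this approach no longer suffices.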
In a RAID 4 system any write operation to any of the data drives also requires a write to the parity drive. This is true even if only one of the data drives is written. In the three data drive system exemplified in FIG. 2A, data throughput is not significantly hampered by this requirement. However, as the number of data drives increases, system performance suffers as write commands to the parity drive accumulate.
In computer systems requiring more than a handful of data drives, the RAID 4 system is less desirable because of the throughput limitations associated with queuing of write requests at the parity drive. Manufacturers address this problem by rotating the parity drive. That is, rather than having designated data and parity drives, the particular hard drive containing the parity information shifts for each block of parity data. Such a distributed parity system is known as RAID 5. Although parity information is written for each write of a subset of data, no one hard drive becomes the receptacle for all those parity writes. In this way, system throughput is not limited by one parity drive having numerous writes stacked in its input queue.
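The difference between a dedicated parity drive and a rotating one can be illustrated by counting where parity writes land. The text does not specify a rotation rule, so the one used in `raid5_parity_drive` below is an assumption chosen for simplicity; all function names are hypothetical.

```python
from collections import Counter
from typing import Callable


def raid4_parity_drive(stripe: int, num_drives: int) -> int:
    """Dedicated parity: every stripe's parity goes to the last drive."""
    return num_drives - 1


def raid5_parity_drive(stripe: int, num_drives: int) -> int:
    """Rotating parity (assumed rule): step back one drive per stripe."""
    return (num_drives - 1 - stripe) % num_drives


def parity_write_counts(
    placement: Callable[[int, int], int], num_stripes: int, num_drives: int
) -> Counter:
    """Count parity writes per drive when each stripe is written once."""
    return Counter(placement(s, num_drives) for s in range(num_stripes))
```

Writing eight stripes to a four-drive array sends all eight parity writes to drive 3 under the dedicated scheme, whereas the rotating rule sends two to each drive, so no single input queue absorbs every parity write.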
The disk arrays discussed to this point may each be desirable in particular systems. That is to say, a RAID 5 system may be overkill for an application where there is a somewhat limited amount of data to be stored. It may be more economical in this circumstance to implement a RAID 1 system. Likewise, where large amounts of data must be stored, a RAID 5 system may be more desirable.
Except for the two-drive mirroring technique discussed with respect to RAID 1, the systems discussed to this point have only had the capability of recovering from a single drive failure in the array. For systems having a relatively small
