Electrical computers and digital processing systems: memory – Storage accessing and control – Specific memory composition
Reexamination Certificate
2002-10-16
2004-11-23
Bataille, Pierre (Department: 2186)
Electrical computers and digital processing systems: memory
Storage accessing and control
Specific memory composition
C714S006130, C714S770000, C714S784000
Reexamination Certificate
active
06823425
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to storage systems. More specifically, the present invention pertains to systems and methods for implementing advanced RAID using a set of unique matrices as coefficients.
2. Description of the Related Art
The speed of most processors is increasing faster than the speed of virtually all input/output (I/O) devices. As a response to this widening gap, the invention of the so called RAID (Redundant Array of Independent Disks) system was invented, which is aimed to increase the speed of disk drive reads and writes by replacing single disk drive unit with an array of a plurality of disks with smaller storage capacity. The data contained therein is accessed in parallel.
However, a drawback which is inherent with this solution is while the amount of parallelism (hence efficiency) increases with the size of the array (i.e., the number of disks it contains), the probability of disk failure increases as well. Disk failure may result in the loss of data. Therefore, a method that is able to recover the information contained in any one of the disks needs to be provided.
One known method of insuring the recovery of the data loss in anyone of the disks in an array is to always have two disks containing exactly the same data. This technique (i.e. the so-called mirroring) is used in RAID level
1
architectures. The drawback of this solution is that half of the space needs to be allocated for redundancy.
Another known method is to have a single disk reserved for parity. Here the RAID system will maintain the parity disk to contain the bitwise XOR of all the other disks at all times. If any disk of the array fails, the information contained therein can be recovered along with the information of all the other disks (including the parity disk). The parity information is usually “striped” over all the disks of the array to avoid a bottleneck on one singled out parity disk (RAID level
5
). It should be noted that maintaining the system in its “safe” state requires the parity disk to be updated as soon as any data is updated in other disks.
The Advanced RAID systems deal with the capability to recover data when two or more disks fail in a predetermined period before recovering occurs. One known approach to this problem is to group data disks into different “parity groups” each of which has it's corresponding parity disk. In this approach, the system maintains the bitwise XOR of the data of every disk of the same parity group on its corresponding parity disk. Much research has been done in this area. For example, a good tutorial on Reed-Solomon methods in RAID-like systems can be seen in “A tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems.” By James S. Plank (Technical Report UT-CS-96-332, Jul. 19, 1996). It can be proven that this method requires log
2
(N) parity disks to take care of N data disks.
It is desirous to have a system that is able to recover from disk failures originating from any two disks and maintaining a minimum number of possible redundancy disks; for example with only two redundancy disks. One standard Advanced RAID scheme using the bare minimum amount of redundancy information uses Reed-Solomon codes. The following is a brief description of the same.
First, a word size is fixed. The disks may be conceptualized as a sequence of chunks of memory, and each chunk (or word) having a fixed size. The redundancy words will then by computed over the data words of a same line. A line is defined as a sequence of words; and each word of the sequence comes from a distinct disk. Basically, as far as the coding scheme is concerned, one can think of our disks as containing only one word of fixed size. The details of implementation—such as the actual size of read and writes, stripping method, etc.—are irrelevant as far as the coding scheme is concerned.
Let D
1
, D
2
, . . . , D
N
be the n data words of the N respective disks of the array. Further let P and Q be the two corresponding redundancy words. In a Reed-Solomon based scheme, P and Q are maintained to be
P=D
1
+D
2
+ . . . +D
N
and
Q=c
1
*D
1
+c
2
*D
2
+ . . . +c
N
*D
N
,
where all operations take place in a carefully chosen Galois field, and c
1
, c
2
, . . . , c
N
are carefully chosen distinct words of this Field.
In the context of Galois Fields, it helps to think of the following three elements (i.e., the bits of our data, redundant backup, and coefficient words) as being coefficients of a polynomial over GF(2). For example, if our words are of size 8, the words 11001101, 01101001 and 00111100 correspond respectively to polynomials:
X
7
+X
6
+X
3
+X
2
+1,
X
6
+X
5
+X
3
+1, and
X
5
+X
4
+X
3
+X
2
.
The degree of a polynomial is the largest exponent appearing as a superscript of the X's. The degrees of the three above polynomials are respectfully 7, 6, and 5.
The sum of two words then corresponds to summing the two corresponding polynomials (in GF(2), where, since 2=0, we have X
i
+X
i
=0); this corresponds to the bitwise XORing of both words. For example, the sum of the words 11001101 and 01101001 is 10100100. From now on the reader must understand the sum of two words as being what has just been described.
The * operand corresponds to polynomial multiplication, and a final reduction modulo by the fixed irreducible polynomial which generates the Field. An irreducible polynomial is a polynomial which can not be obtained by multiplying two other polynomials together (just as a prime number is one which is not the product of two other numbers). For example, let
F=X
8
+X
5
+X
4
+X
2
+X
1
be our chosen irreducible polynomial (field generator). To obtain the word 11001101*01101001 we first find the product M of the corresponding polynomials; and M is as shown in FIG.
0
A.
One then computes the remainder of M through the Euclidian division by F. This corresponds to adding shifts of F (polynomials of the form X
a
F for some natural number a) to M until we get rid of all the terms of degree higher or equal to the degree of F. The computation is shown by FIG.
0
B. Thus 11001101*01101001=01101111 in the field generated by F.
The point of working in a field is that every element of a field is invertible. That is to say, for any word there is a corresponding “inverse word” w
−1
such that w*w
−1
=00 . . . 01. This relationship may be used to precisely recover lost information resulting from such events as failed disks.
A further example is provided to depict how one can recover data when the relations
P=D
1
+D
2
+ . . . +D
N
and
Q=c
1
*D
1
+c
2
*D
2
+ . . . +c
N
*D
N
,
are maintained. Assuming disk
1
and disk
2
fail. This means that the information contained in D
1
and D
2
are lost. By solving the above system of equations for D
1
and D
2
, we get
D
1
=(
c
1
+c
2
)
−1
*(
c
2
*P+Q+c
3
*D
3
+ . . . +c
N
*D
N
)
and
D
2
=P+D
1
+D
3
+ . . . +D
N
.
As can be seen, D
1
and D
2
can be computed if P, Q, D
3
, D
4
, . . . , D
N
are not lost (or are known).
It can further be seen that in order to maintain the system in a safe state the * operation must be done not only at every update of a data word, but also needs to be done many times during recovery. Therefore, the field generating polynomial and the N coefficients must be chosen carefully so as to minimize the time needed to compute the * operation between two words.
Lookup table methods have been used to compute the * operation, but this method becomes unreasonable as the word size increases. For example, too many calculations are needed in a too limited period of time. Yet a smaller word size induces a larger number of operations per data block, hence the need to increase the word size or pipe line the operations is required
U.S. Pat. No. 5,499,253, entitled “System and Method for Calculating
Ghosh Sukha
Jain Arvind
Whalen Thor
Anderson Brian J.
Bataille Pierre
iVivity, Inc.
LandOfFree
System and method for implementing advanced RAID using a set... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with System and method for implementing advanced RAID using a set..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for implementing advanced RAID using a set... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3297270