System and method for handling temporary errors on a...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S770000

active

06532548

ABSTRACT:

TECHNICAL FIELD
The present invention relates generally to redundant arrays of independent disks (RAID) and, more particularly, to a system and method for handling temporary errors on a redundant array of independent tapes (RAIT).
BACKGROUND ART
A data storage array is a collection of storage elements that are accessible by a host computer as a single storage unit. The individual storage elements can be any type, or a combination of types, of storage devices such as hard disk drives, semiconductor memory, optical disk drives, magnetic tape drives, and the like. A common storage array comprises a plurality of hard disk drives, i.e., a disk array.
A disk array includes a collection of disks and a disk array controller. The controller controls the operation of the disks and presents them as a virtual disk to a host operating environment. The host operating environment is typically a host computer that executes an operating system and application programs. A virtual disk is an abstract entity realized by the controller and the disk array. A virtual disk is functionally identical to a physical disk from the standpoint of application software executing on the host computer.
One such disk array is a redundant array of independent disks (RAID). RAID comes in various operating levels which range from RAID level 0 (RAID-0) to RAID level 6 (RAID-6). Additionally, there are multiple combinations of the various RAID levels that form hybrid RAID levels such as RAID-5+, RAID-6+, RAID-10, RAID-53 and so on. Each RAID level represents a different form of data management and data storage within the RAID disk array.
In a RAID-5 array, data is generally mapped to the various physical disks in data “stripes” across the disks and vertically in a “strip” within a single disk. To facilitate data storage, a serial data stream is partitioned into blocks of data; the size of each block is generally defined by the host operating environment. Typically, one or more blocks of data are stored together to form a “chunk” or “segment” of data at an address within a given disk. Each chunk is stored on a different disk as the data is striped across the disks. Once all the disks in a stripe have been given chunks of data, the storage process returns to the first disk in the stripe, and stripes across all the disks again. As such, the input data stream is stored in a raster scan pattern onto all the disks in the array.
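The raster scan pattern described above can be illustrated with a minimal sketch. It assumes a plain round-robin layout with zero-based disk and stripe numbering and ignores parity placement; the function name chunk_location is hypothetical and does not come from the patent.

```python
from typing import Tuple


def chunk_location(chunk_index: int, num_disks: int) -> Tuple[int, int]:
    """Map a chunk of the serial stream to (disk, stripe).

    Assumes simple round-robin striping with no parity strip, so
    consecutive chunks go to consecutive disks and wrap to the first
    disk once every disk in the stripe has received a chunk.
    """
    if num_disks < 1:
        raise ValueError("an array needs at least one disk")
    if chunk_index < 0:
        raise ValueError("chunk_index must be non-negative")
    stripe, disk = divmod(chunk_index, num_disks)
    return disk, stripe


# Six chunks written to a four-disk array: the fifth chunk wraps
# back to disk 0 and starts the next stripe.
layout = [chunk_location(i, 4) for i in range(6)]
```

Here layout lists one (disk, stripe) pair per chunk in stream order, which is the left-to-right, top-to-bottom order of a raster scan.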
In a RAID-5 array, data consistency and redundancy are assured using parity data that is distributed amongst all the disks. Specifically, a RAID-5 array contains N member disks. Each stripe of data contains N−1 data strips and one parity strip. The parity segments of the array are distributed across the array members, usually in cyclic patterns. For example, in an array containing five disks, the first parity strip is located on member disk four, the second parity strip on member disk three, the third parity strip on member disk two, and so on.
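One cyclic pattern consistent with the five-disk example can be sketched as follows. The sketch assumes member disks and stripes are numbered from zero, so that "member disk four" is the last of five members; the text does not state the numbering, and the function name parity_disk is hypothetical.

```python
def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Return the member disk holding the parity strip of a stripe.

    Assumes zero-based numbering and a rotation that starts on the
    last member and moves one disk toward the first on each stripe,
    wrapping around after disk 0.
    """
    if num_disks < 2:
        raise ValueError("parity needs at least two member disks")
    if stripe_index < 0:
        raise ValueError("stripe_index must be non-negative")
    return (num_disks - 1 - stripe_index) % num_disks


# Parity placement for the first six stripes of a five-disk array.
placement = [parity_disk(s, 5) for s in range(6)]
```

Other rotation orders are equally valid cyclic patterns; this one is chosen only because it reproduces the three placements named in the example.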
RAID-5 parity is generated using an exclusive OR (XOR) function. In general, parity data is generated by taking an XOR function of the user data strips within a given data stripe. Using the parity information, the contents of any strip of data on any single one of the data disks in the array can be regenerated from the contents of the corresponding strips on the remaining disks in the array. Consequently, if the XOR of the contents of all corresponding blocks in the data stripe except one is computed, the result is the content of the remaining block. Thus, if disk three in the five-disk array should fail, for example, the data it contains can still be delivered to applications by reading corresponding blocks from all the surviving members and computing the XOR of their contents. As such, the RAID-5 array is said to be fault tolerant, i.e., the loss of one disk in the array does not impact data availability.
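The regeneration property can be checked with a short sketch. The strip contents below are made-up sample bytes and the helper name xor_strips is hypothetical; only the XOR relationship itself comes from the description.

```python
from functools import reduce
from typing import Sequence


def xor_strips(strips: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equal-length strips."""
    if len({len(s) for s in strips}) > 1:
        raise ValueError("all strips must have the same length")
    # zip(*strips) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# Four data strips of one stripe in a five-disk array (sample values).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
parity = xor_strips(data)

# Suppose the disk holding data[2] fails: XOR the surviving data
# strips with the parity strip to regenerate the missing contents.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_strips(survivors)
```

Because XOR is its own inverse, rebuilt equals the lost strip data[2], whichever single strip is treated as missing.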
A problem with typical data storage element arrays is that in the event of a failed data storage element, data communication to all of the storage elements is stopped until the failed storage element executes its error recovery. The error recovery involves using the contents from the other storage elements to reconstruct the contents of the failed storage element. The probability of the data being corrected is high. However, a failed storage element can be unresponsive for significant periods. Consequently, the reading and writing of data from and to the data storage array are slowed.
What is needed are a method and system for handling temporary errors in a data storage element array that continuously communicate data to and from the array while a storage element is in error recovery.
SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a method and system for storing data in an array of storage elements arranged in parallel having data storage elements and redundant information storage elements in which data to be written to one of the data storage elements is written to one of the redundant information storage elements in place of redundant information if the one of the data elements is unresponsive for receiving data.
It is another object of the present invention to provide a method and system for storing data in an array of storage elements arranged in parallel having data storage elements and redundant information storage elements in which data is written to one of the redundant information storage elements in place of redundant information as long as one of the data elements is unresponsive for receiving data.
In carrying out the above objects and other objects, the present invention provides a method for storing data in a storage system having N storage elements arranged in parallel for concurrent access, where N is an integer greater than three. The method includes determining first redundancy information based on a first row of data to be striped across N−2 storage elements. The method then determines which of the N−2 storage elements are responsive for receiving data. The first row of data is then striped across the responsive storage elements of the N−2 storage elements if at least N−3 of the N−2 storage elements are responsive for receiving data. The first redundancy information is then written to the N−1 storage element. The data to be received by one of the N−2 storage elements is then written to the Nth storage element if the one of the N−2 storage elements is unresponsive for receiving data. The first redundancy information is written to the Nth storage element if all of the N−2 storage elements are responsive for receiving data.
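The write decision for a single row can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: XOR stands in for the redundancy computation, which this passage does not specify, and the names xor_parity and plan_row_write are hypothetical. The returned list has one entry per storage element, with None marking an element that is skipped.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_parity(chunks: Sequence[bytes]) -> bytes:
    """Assumed redundancy function: bytewise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def plan_row_write(
    row: Sequence[bytes], responsive: Sequence[bool]
) -> Optional[List[Optional[bytes]]]:
    """Plan what each of the N elements receives for one row.

    row holds the N-2 data chunks and responsive holds one flag per
    data element. Returns None when fewer than N-3 data elements are
    responsive, because the row cannot be written in that case.
    """
    if len(row) < 2 or len(row) != len(responsive):
        raise ValueError("need N-2 >= 2 chunks and one flag per chunk")
    down = [i for i, ok in enumerate(responsive) if not ok]
    if len(down) > 1:
        return None
    redundancy = xor_parity(row)
    # Data elements: write the chunk, or skip the unresponsive element.
    writes: List[Optional[bytes]] = [
        chunk if ok else None for chunk, ok in zip(row, responsive)
    ]
    # Element N-1 always receives the redundancy information.
    writes.append(redundancy)
    # Element N receives the displaced chunk, or the redundancy
    # information again when every data element is responsive.
    writes.append(row[down[0]] if down else redundancy)
    return writes


row = [b"\x01", b"\x02", b"\x04"]  # N = 5, so N-2 = 3 data chunks
all_up = plan_row_write(row, [True, True, True])
one_down = plan_row_write(row, [True, False, True])
```

In one_down the second data element is skipped and its chunk appears in the last position, so the row remains fully recorded while that element is in error recovery.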
In one embodiment, the method further includes determining second redundancy information based on a second row of data to be striped across the N−2 storage elements. The second row of data is then striped across the responsive storage elements of the N−2 storage elements if at least N−3 of the N−2 storage elements are responsive for receiving data. The second redundancy information is then written to the N−1 storage element. The data of the second row to be received by one of the N−2 storage elements is then written to the Nth storage element if the one of the N−2 storage elements is unresponsive for receiving data.
In another embodiment, the method further includes determining second redundancy information based on a second row of data to be striped across the N−2 storage elements and then determining if the one of the N−2 storage elements is still unresponsive for receiving data. The second row of data is then striped across the responsive storage elements of the N−2 storage elements if at least N−3 of the N−2 storage elements are responsive for receiving data. The second redundancy information is then written to the N−1 storage element. The data of the second row to be received by the one of the N−2 storage elements is then written to the Nth storage element if the one of the N−2 storage element
