Error detection/correction and fault detection/recovery – Pulse or data error handling – Digital data error correction
Reexamination Certificate
2001-05-09
2004-07-20
Dildine, R. Stephen (Department: 2133)
C711S114000, C714S006130, C714S763000, C714S766000, C714S767000
active
06766491
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to performance enhancements for redundant array of inexpensive disks (RAID) storage systems and more particularly to a method and system for enhancing performance of mirroring operations between controllers in an active-active controller pair.
BACKGROUND OF THE INVENTION
A typical data processing system generally includes one or more storage units which are connected to a host computer either directly or through a control unit and a channel. The function of the storage units is to store data and other information (e.g., program code) which the host computer uses in performing particular data processing tasks.
Various types of storage units are used in current data processing systems. A typical system may include one or more large capacity tape units and/or disk drives connected to the system through respective control units for storing data. However, a problem exists if one of the storage units fails such that information contained in that unit is no longer available to the system. Generally, such a failure will shut down the entire computer system, which can create a problem for systems which require data storage systems to have high availability.
This problem has been overcome to a large extent by the use of Redundant Array of Inexpensive Disks (RAID) systems. RAID systems are widely known, and several different levels of RAID architectures exist, including RAID 1 through RAID 5, which are also widely known. A key feature of a RAID system is redundancy, which is achieved through the storage of a data file over several disk drives and parity information stored on one or more drives. If one disk drive fails, then the RAID system is generally able to reconstruct the data which was stored on the failed drive from the remaining drives in the array.
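The reconstruction described above can be illustrated with a minimal sketch. It assumes byte-wise XOR parity, as commonly used in RAID 5; the text itself does not name the parity function, and the helper names `xor_blocks` and `reconstruct_block` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (assumed parity function)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct_block(surviving_blocks):
    """Rebuild the block of a single failed drive.

    surviving_blocks holds the remaining data blocks of the stripe
    plus the stripe's parity block.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks and one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails; rebuild it from the rest.
rebuilt = reconstruct_block([data[0], data[2], parity])
```

Because XOR is its own inverse, the same function that generates parity also recovers a single missing block; this scheme tolerates only one failed drive per stripe.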
High availability is a key concern because in many applications users rely heavily on the data stored on the RAID system. In these types of applications, unavailability of data stored on the RAID system can result in significant loss of revenue and/or customer satisfaction. Employing a RAID system in such an application enhances availability of the stored data, since if a single disk drive fails, data may still be stored and retrieved from the system. In addition to the use of a RAID system, it is common to use redundant RAID controllers to further enhance the availability of such a storage system. In such a situation, two or more controllers are used in a RAID system, where if one of the controllers fails, the remaining controller will assume operations for the failed controller. Such a platform enhances the availability of a RAID system because the system can sustain a failure of a controller and continue to operate. When using dual controllers, each controller may conduct independent read and write operations simultaneously, which is known as an active-active configuration. It can be advantageous in certain applications to use the active-active configuration, as the RAID system can support relatively high rates of data transfer between the disks and host. However, employing an active-active configuration requires mirroring of data and parity between controllers to maintain redundancy, as will be described in detail below.
With reference to FIG. 1, a RAID system 100 having an active-active controller pair is described. The RAID system 100 is connected to a host computer 104 through a host channel 108. The RAID system 100 includes a first active controller 112, a second active controller 116, and a disk array 120. The disk array 120 is connected to the first active controller 112 by a first disk channel 124 and a second disk channel 128, and to the second active controller 116 by the first and second disk channels 124, 128. The disk array 120 contains a number of disk drives 132, 136, 140, 144, 148 that are used for data storage. Within the first active controller 112, there is a processor 152 and a nonvolatile random access memory (NVRAM) 156, and within the second active controller 116 there is a processor 160 and an NVRAM 164. It should be understood that the number of drives shown in FIG. 1 is for the purpose of discussion only, and that a RAID system 100 may contain more or fewer disk drives than shown in FIG. 1. Data is written to the disk array 120 in such a way that if one drive fails, data can continue to be read from and written to the disk array 120. How this redundancy is accomplished depends upon the level of RAID architecture used, and is well known in the art.
When storing data, generally, a controller receives the data and breaks the data down into blocks which will be stored on the individual disk drives 132, 136, 140, 144, 148. The blocks of data are then arranged to be stored on the drives 132, 136, 140, 144, 148. In arranging the blocks of data, the controller organizes the blocks into stripes and generates a parity block for each stripe. The data is written across several drives, and the parity for that stripe is written to one disk drive. In certain cases, the data may not be large enough to fill a complete stripe on the RAID system. This is known as a non-full stripe write. When the data sent to the controller occupies a full stripe, the data is simply written over existing data and the parity is written over the existing parity. Additionally, in certain cases, the controller may aggregate several small writes together to create a full stripe of data, which the controller treats as a full stripe of data for purposes of generating parity. However, in the case of a non-full stripe write, modifying the stripe of data requires several steps, and is a disk intensive activity.
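The difference between a full stripe write and a non-full stripe write can be sketched as follows. This is an illustration only: it assumes XOR parity and the commonly used read-modify-write sequence (read old data and old parity, compute new parity, write both), neither of which is spelled out in the text above, and all function names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks (assumed parity function)."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks):
    """Full stripe write: parity is computed from the new data alone."""
    parity = bytes(len(data_blocks[0]))  # block of zero bytes
    for block in data_blocks:
        parity = xor_bytes(parity, block)
    return parity


def partial_stripe_parity(old_data, new_data, old_parity):
    """Non-full stripe write: the old data block and old parity must first
    be read from disk, which is what makes this case disk intensive.

    new parity = old parity XOR old data XOR new data
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Hypothetical stripe of four data blocks.
stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = full_stripe_parity(stripe)

# Modify only the second block (for example, one customer record).
new_block = b"bbbb"
updated_parity = partial_stripe_parity(stripe[1], new_block, parity)
stripe[1] = new_block

# The incremental update matches a parity recomputed over the whole stripe.
parity_matches = updated_parity == full_stripe_parity(stripe)
```

Under these assumptions the partial update costs two reads and two writes for a single modified block, whereas a full stripe write needs no reads at all.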
The occurrence of non-full stripe writes is common in many applications, such as financial, reservation and retail systems, where relatively small data records are widely used and are accessed and modified at random. When an individual customer record needs to be revised, it may reside in a stripe of data that contains several other customer data records. In such a case, only a portion of the stripe needs to be modified, while the remainder of the stripe remains unaffected by the modification of the data.
As mentioned above, when using an active-active controller pair in a RAID system, in order to maintain redundancy, data and parity must be mirrored between the controllers in the active-active system. In such a system, when the host computer 104 sends data to be written to the disk array 120, the data is typically sent to either the first active controller 112 or the second active controller 116. Where the data is sent depends upon the location in the disk array 120 to which the data will be written. In active-active systems, typically one controller is zoned to a specific array of drives, or a specific area within an array of drives. Thus, if data is to be written to the array that the first active controller 112 is zoned to, the data is sent to the first active controller 112. Likewise, if the data is to be written to an array that the second active controller 116 is zoned to, the data is sent to the second active controller 116. In order to maintain redundancy between the two controllers 112, 116, the data sent to the first active controller 112 must be copied onto the second active controller 116. Likewise, the data sent to the second active controller 116 must be copied onto the first active controller 112. The data is copied between controllers because, for example, if the first active controller 112 suffers a failure, the second active controller 116 can then use the copy of the data to complete any data writes which were outstanding on the first active controller 112 when it failed. This process of copying data, as well as parity, is known as mirroring.
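The mirroring and takeover behavior described above can be modeled with a small in-memory sketch. It is not the patented method: the `Controller` class, its attributes, and the dictionary standing in for the disk array are all hypothetical, and parity mirroring is omitted for brevity.

```python
class Controller:
    """Toy model of one controller in an active-active pair."""

    def __init__(self, name):
        self.name = name
        self.pending = {}   # own outstanding writes: address -> data
        self.mirror = {}    # copy of the partner's outstanding writes
        self.partner = None

    def receive_write(self, address, data):
        """Accept a host write and mirror it to the partner controller."""
        self.pending[address] = data
        self.partner.mirror[address] = data  # the mirroring step

    def flush(self, disk):
        """Commit own outstanding writes; the mirrored copies are then stale."""
        disk.update(self.pending)
        for address in self.pending:
            self.partner.mirror.pop(address, None)
        self.pending.clear()

    def take_over(self, disk):
        """Complete the failed partner's outstanding writes from the mirror."""
        disk.update(self.mirror)
        self.mirror.clear()


def make_pair():
    """Create two controllers that mirror to each other."""
    first, second = Controller("112"), Controller("116")
    first.partner, second.partner = second, first
    return first, second


disk_array = {}                      # stands in for disk array 120
first, second = make_pair()
first.receive_write(7, b"record")    # host write lands on controller 112
# Controller 112 fails before flushing; controller 116 completes the write.
second.take_over(disk_array)
```

In this model the write survives the failure only because it was copied to the partner before the first controller went down, which is the purpose of mirroring stated above.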
Mirroring in such a system is typically necessary because when the host 104 sends data to be written, the controller that receives the data stores the data in a memory location and sends a reply to the host 104 that
Dildine R. Stephen
Dot Hill Systems Corp.
Sheridan & Ross P.C.