Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1999-09-13
2003-04-22
Corrielus, Jean M. (Department: 2172)
C707S793000, C709S221000
active
06553389
ABSTRACT:
TECHNICAL FIELD
The present invention relates generally to distributed data storage systems and more particularly to determining resource availability in such systems during transient failures.
BACKGROUND
In the past, the storage network, which linked a host such as a computer and its related data storage system, was fairly slow. As a result, data was generally not distributed across the data storage system because such distribution would increase access time.
With the advent of fiber optic technology for the data storage network, it has become possible to have extremely fast access to data even when the data is stored remotely and placed on a number of different data storage devices. One of the advantages of placing, or “striping”, data across a number of different data storage devices is that the devices can be accessed in parallel, so the bandwidth can be substantially increased. Essentially, striping involves placing a first set of bytes on the first data storage device, the next set of bytes on the next data storage device, and so on, then wrapping around so that, with three devices, the fourth set of bytes is placed back on the first data storage device. With three data storage devices, there is essentially three times the bandwidth of a single data storage device. This is essentially how a RAID array (redundant array of inexpensive disks) works.
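The round-robin layout described above can be sketched in a few lines. The stripe-unit size, device count, and `locate` helper below are illustrative assumptions, not details from the patent:

```python
# Hypothetical sketch of striping: a logical byte offset in the virtual
# store is mapped to a (device, device_offset) pair by rotating fixed-size
# "stripe units" across the devices, so sequential I/O hits all devices
# in parallel.

STRIPE_UNIT = 4      # bytes per stripe unit (tiny, for illustration)
NUM_DEVICES = 3      # a three-device stripe set, as in the text

def locate(offset):
    """Map a logical byte offset to (device index, offset on that device)."""
    unit = offset // STRIPE_UNIT                 # which stripe unit holds the byte
    device = unit % NUM_DEVICES                  # units rotate round-robin
    device_offset = (unit // NUM_DEVICES) * STRIPE_UNIT + offset % STRIPE_UNIT
    return device, device_offset

# The fourth stripe unit (offsets 12..15) wraps back to the first device:
print(locate(0))    # (0, 0)
print(locate(4))    # (1, 0)
print(locate(8))    # (2, 0)
print(locate(12))   # (0, 4)
```

Because consecutive stripe units land on different devices, a large sequential read can be issued to all three devices at once, which is the source of the tripled bandwidth.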
In addition, RAID arrays have hardware support for striping and for “mirroring”. Mirroring replicates data on a number of separate devices for more rapid access and for redundancy. Hardware support is required for mirroring because, while read-only data can be read from whichever data storage device is faster, a write must be propagated to all copies. Further, if two hosts are writing to the same data at the same time, the writes must be kept consistent. While the hardware support for these storage arrays is fairly well developed, the same is not true for networks of data storage devices.
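A minimal sketch of the read/write asymmetry just described; the `MirroredStore` class and its method names are hypothetical, used only to illustrate why writes need coordination while reads do not:

```python
# Hypothetical sketch of mirroring: a read may be served by any single
# replica, but a write must reach EVERY replica or the copies diverge.

class MirroredStore:
    def __init__(self, n_replicas):
        # Each replica is modeled as an independent key/value store.
        self.replicas = [dict() for _ in range(n_replicas)]

    def read(self, key):
        # Any replica will do -- pick whichever is fastest (here, the first).
        return self.replicas[0].get(key)

    def write(self, key, value):
        # The write is propagated to ALL copies to keep them consistent.
        for replica in self.replicas:
            replica[key] = value

store = MirroredStore(3)
store.write("block42", b"data")
# After the write, every replica holds the same value:
assert all(r["block42"] == b"data" for r in store.replicas)
```

The hard cases the text alludes to (two hosts writing concurrently, or a replica that misses an update) are exactly what this naive loop does not handle, which is why arrays provide hardware support for it.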
For distributed data storage systems, problems occur when some data storage devices fail. The failed devices stop responding to messages and send no further messages, which has the effect of logically separating them from the rest. Portions of the data storage network can also fail, which can lead to a “partitioning”: the hosts and the data storage devices in the data storage system are split into two or more “partitions”. Within a partition, all the hosts and data storage devices can communicate with each other, but no communication is possible between partitions. In many cases, the data storage system cannot distinguish between a partitioning and the failure of one or more data storage devices, and thus cannot determine resource availability.
In particular, data storage systems that provide “virtual stores” to users present a special problem. A “virtual store” is a logical structure that appears to the host application as if it were a data storage device of a given capacity, but in reality the data in the virtual store is spread over multiple real data storage devices. For example, data can be mirrored to improve its availability and striped to improve bandwidth. Both approaches result in multiple data storage devices being involved in storing the data for the virtual store. When the virtual store is updated, all the data storage devices holding part of the virtual data space being updated must be updated. If not, a data storage device will lose synchronization with the others, and a host that tries to read from that device will see inconsistent data.
During a partitioning, a host will be able to reach some data storage devices, but not necessarily all of them. Further, two hosts in two different partitions will only be able to reach devices in their own partitions. If access is left uncontrolled and the two hosts write only to the devices within their own partitions, the data storage devices in the different partitions will lose synchronization. If data are supposed to be mirrored, or if there are consistency requirements between different data, this is a major problem.
The typical solution is to “lock out” access to data in all but at most one partition. That is, at most one partition is chosen as “active”, and only hosts in that partition can access data. In all other partitions, hosts will be locked out or denied access until the data storage devices or the network are repaired.
The most common way of ensuring that the data are accessible in at most one partition is to require that there be a “quorum” of data storage devices in the partition. Typically, a “quorum” is defined as a majority of the data storage devices that store copies of the data. It is, however, entirely possible that no partition contains a majority of the devices, in which case the data will be totally inaccessible.
In a distributed data storage system, a quorum is not enough for correct operation. In addition, it is important that all of the data space in the virtual store be covered by data storage devices in the partition. For example, a virtual store can have its data space divided into three parts. Each part is mirrored so that six separate data storage devices each hold a portion of the data for the virtual store. A simple majority of the data storage devices can be formed by taking both of the mirrors of the first two-thirds of the data space. However, there may be no devices in the partition storing any of the last third of the data. This means that all the data would be unavailable despite having a quorum because of the lack of complete “coverage” of the data. Thus, a distributed data storage system requires both a quorum of devices and coverage of the data space.
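The six-device example above can be checked mechanically. The following sketch tests both conditions; the function name, the device-to-extent mapping, and the part labels are illustrative assumptions:

```python
# Hypothetical sketch of the two conditions described above: a partition
# may allow access only if it holds (a) a strict majority (quorum) of the
# devices and (b) at least one device for every part of the data space.

def quorum_and_coverage(all_devices, partition, extents_by_device, data_parts):
    """all_devices: set of device ids storing this virtual store.
    partition: subset of device ids reachable in this partition.
    extents_by_device: device id -> set of data-space parts it holds.
    data_parts: the set of parts the virtual store is divided into."""
    has_quorum = len(partition) * 2 > len(all_devices)   # strict majority
    covered = set()
    for dev in partition:
        covered |= extents_by_device[dev]
    has_coverage = covered >= data_parts                 # every part present
    return has_quorum and has_coverage

# Six devices, data space in three parts, each part mirrored on two devices.
extents = {1: {"A"}, 2: {"A"}, 3: {"B"}, 4: {"B"}, 5: {"C"}, 6: {"C"}}
parts = {"A", "B", "C"}
devices = set(extents)

# Both mirrors of parts A and B form a majority of four devices, yet no
# device in the partition holds part C -- quorum without coverage:
print(quorum_and_coverage(devices, {1, 2, 3, 4}, extents, parts))  # False
# One device per part plus a spare mirror gives quorum AND coverage:
print(quorum_and_coverage(devices, {1, 3, 5, 6}, extents, parts))  # True
```

The first call reproduces the failure case in the text: a simple majority of devices is present, but the last third of the data space has no surviving copy.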
In the past, mechanisms for establishing a quorum were only concerned with the replication of a single datum.
The data storage system was considered as moving through a sequence of “epochs” with a failure or repair defining the transition from one epoch to the next. At each epoch boundary, a protocol is run in each partition to determine what data storage devices are available in the partition and whether access will be allowed in the partition during that epoch.
At the end of the protocol, the data storage devices in at most one partition will have determined that they have a quorum so that access can continue in that partition. Those data storage devices may then elect to regenerate replicas into other data storage devices in that partition so that a proper degree of redundancy is available. This complicates the protocol for deciding when a partition has a quorum, because the population of data storage devices from which a quorum must be drawn changes over time. To handle the changing population of replicas, each replica maintains an epoch number and a list of the replicas active in that epoch.
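The per-replica epoch state just described can be sketched as follows. The `Replica` class, the `try_advance_epoch` helper, and the strict-majority rule are assumptions made for illustration; the actual protocol in the patent may differ:

```python
# Hypothetical sketch of epoch-based quorum with a changing replica
# population: each replica records the epoch number and the set of
# replicas active in that epoch, so at an epoch boundary a quorum is
# judged against the most recently recorded population, not the original.

from dataclasses import dataclass

@dataclass
class Replica:
    device_id: int
    epoch: int
    active_set: frozenset   # replica ids that were active in `epoch`

def try_advance_epoch(reachable):
    """Run in one partition at an epoch boundary; returns True iff this
    partition may continue (and, if so, installs the new epoch)."""
    newest = max(r.epoch for r in reachable)
    active = next(r.active_set for r in reachable if r.epoch == newest)
    present = {r.device_id for r in reachable} & active
    if len(present) * 2 <= len(active):
        return False                      # no quorum: lock out access here
    new_set = frozenset(r.device_id for r in reachable)
    for r in reachable:                   # the reachable replicas become
        r.epoch = newest + 1              # the population of the next epoch
        r.active_set = new_set
    return True

# Three replicas in epoch 1; a partition holding two of them has a quorum
# and starts epoch 2 with a population of two.
a, b, c = (Replica(i, 1, frozenset({1, 2, 3})) for i in (1, 2, 3))
assert try_advance_epoch([a, b])          # 2 of 3: quorum, epoch advances
assert a.epoch == 2 and a.active_set == frozenset({1, 2})
assert not try_advance_epoch([c])         # 1 of 3: locked out
```

Note that after the first boundary the quorum for epoch 3 would be drawn from the two-replica population `{1, 2}`, which is exactly the complication the text describes.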
Protocols of this type are known to provide good availability as long as there are three or more replicas. When there are only two replicas, both must be available in a partition to have more than half available, so that the failure of at least one renders the data unavailable. This results in lower availability than with a single replica. Thus, there is no truly effective way of determining data storage resource availability during data system failures for distributed data storage systems.
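Illustrative arithmetic for the availability claim above, under the common simplifying assumption that each replica fails independently with availability `p` (the value 0.9 is an arbitrary example):

```python
# With a majority-quorum rule, two replicas are available only when BOTH
# are up, which is strictly worse than a single unreplicated copy; three
# replicas need any two up, which is strictly better.

p = 0.9                                      # assumed per-device availability

single = p                                   # one copy: up when the device is up
two_of_two = p * p                           # 2 replicas: majority means both up
two_of_three = p**3 + 3 * p**2 * (1 - p)     # 3 replicas: all three up, or any 2 of 3

print(round(single, 3))        # 0.9
print(round(two_of_two, 3))    # 0.81  -- worse than a single replica
print(round(two_of_three, 3))  # 0.972 -- better than a single replica
```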
DISCLOSURE OF THE INVENTION
The present invention provides a data storage system including a virtual data store having a plurality of portions of data, and a plurality of data storage devices connectable to said virtual store and capable of storing portions of said data of said virtual store. A coordinator is connectable to at least one of said plurality of data storage devices and is responsive to information therein to allow recovery of said data storage system after a partitioning of said plurality of data storage devices when said at least one of said plurality of data storage devices contains all of said plurality of portions of said data.
Borowsky Elizabeth Lynn
Golding Richard Andrew
Hewlett-Packard Company