Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-08-10
2003-12-30
Beausoliel, Robert (Department: 2184)
C714S006130, C713S001000, C713S002000
active
06671820
ABSTRACT:
TECHNICAL FIELD
The present disclosure relates in general to the field of computer networks and, more particularly, to a system and method for the backup and recovery of data in a multi-computer environment.
BACKGROUND
Computer networking environments such as Local Area Networks (LANs) and Wide Area Networks (WANs) permit many users, often at remote locations, to share communication, data, and resources. A storage area network (SAN) may be used to provide centralized data sharing, data backup, and storage management in these networked computer environments. This combination of a LAN or WAN with a SAN may be referred to as a shared storage network. A storage area network is a high-speed subnetwork of shared storage devices. A storage device is any device that principally contains a single disk or multiple disks for storing data for a computer system or computer network. The collection of storage devices is sometimes referred to as a storage pool. The storage devices in a SAN can be collocated, which allows for easier maintenance and easier expandability of the storage pool. The network architecture of most SANs is such that all of the storage devices in the storage pool are available to all the servers on the LAN or WAN that is coupled to the SAN. Additional storage devices can be easily added to the storage pool, and these new storage devices will also be accessible from any server in the larger network.
In a computer network that includes a SAN, the server can act as a pathway or transfer agent between the end user and the stored data. Because much of the stored data of the computer network resides in the SAN, rather than in the servers of the network, the processing power of the servers can be used for applications. Network servers can access a SAN using the Fibre Channel protocol, taking advantage of the ability of a Fibre Channel fabric to serve as a common physical layer for the transport of multiple upper layer protocols, such as SCSI, IP, and HIPPI, among other examples.
The storage devices in a SAN may be structured in a RAID configuration. When a system administrator configures a shared data storage pool into a SAN, the storage devices may be grouped into one or more RAID volumes, and each volume is assigned a SCSI logical unit number (LUN) address. If the storage devices are not grouped into RAID volumes, each storage device will typically be assigned its own LUN. The system administrator or the operating system for the network will assign a volume or storage device and its corresponding LUN to each server of the computer network. Each server will then have, from a memory management standpoint, logical ownership of a particular LUN and will store the data generated from that server in the volume or storage device corresponding to the LUN owned by the server.
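The LUN assignment described above can be illustrated with a minimal sketch. The class `StoragePool`, its methods, and the volume and server names are hypothetical and chosen only for illustration; the disclosure does not define such an interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoragePool:
    """Hypothetical model of a shared storage pool addressed by LUN."""

    # LUN -> name of the RAID volume or single storage device behind it.
    luns: Dict[int, str] = field(default_factory=dict)
    # LUN -> the one server that has logical ownership of it.
    owners: Dict[int, str] = field(default_factory=dict)

    def add_volume(self, name: str) -> int:
        """Register a volume or device and give it the next free LUN."""
        lun = len(self.luns)
        self.luns[lun] = name
        return lun

    def assign(self, lun: int, server: str) -> None:
        """Give one server logical ownership of a LUN."""
        if lun not in self.luns:
            raise KeyError(f"unknown LUN {lun}")
        current = self.owners.get(lun)
        if current is not None and current != server:
            raise ValueError(f"LUN {lun} is already owned by {current}")
        self.owners[lun] = server

    def luns_owned_by(self, server: str) -> List[int]:
        """List the LUNs a given server owns, in ascending order."""
        return sorted(l for l, s in self.owners.items() if s == server)


pool = StoragePool()
raid_lun = pool.add_volume("raid-volume-0")      # several disks grouped as RAID
disk_lun = pool.add_volume("standalone-disk-7")  # ungrouped device, own LUN
pool.assign(raid_lun, "server-a")
pool.assign(disk_lun, "server-b")
```

In this sketch a second assignment of an owned LUN to a different server is rejected, which mirrors the one-owner-per-LUN arrangement the paragraph describes.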
When a server is initialized, the operating system assigns all visible storage devices to the server. For example, if a particular server detects several LUNs upon initialization, the operating system of that server will assume that each LUN is available for use by the server. Thus, if multiple servers are attached to a shared data storage pool, each server can detect each LUN on the entire shared storage pool and will assume that it owns, for storage purposes, each LUN and the associated volume or storage device. Each server can then store the user data associated with that server in any volume or storage device in the shared data storage pool. Difficulties occur, however, when two or more servers attempt to write to the same LUN at the same time. If two or more servers access the same LUN at the same time, the data stored in the volume or storage device associated with that LUN will be corrupted. The disk drivers and file system drivers of each server write a data storage signature on the storage device accessed by the server to record information about how data is stored on the storage system. A server must be able to read this signature in order to access the previously written data on the storage device. If multiple servers attempt to write signatures to the same storage device, the data storage signatures will conflict with each other. As a result, none of the servers will be able to access the data stored in the storage device because the storage device no longer has a valid data storage signature. The data on the storage device is now corrupted and unusable.
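The signature conflict described above can be modeled with a small simulation. This is a simplified sketch, not an actual driver: the classes `StorageDevice` and `Server` are hypothetical, and the rule that a device is valid only when exactly one server has written a signature is an assumption made to reflect the outcome the text describes.

```python
class StorageDevice:
    """Hypothetical storage device identified by a LUN."""

    def __init__(self, lun):
        self.lun = lun
        # Names of servers whose drivers have written a signature here.
        self.signature_writers = set()

    def write_signature(self, server_name):
        self.signature_writers.add(server_name)

    def has_valid_signature(self):
        # Simplifying assumption: signatures from more than one server
        # conflict, leaving no valid signature on the device.
        return len(self.signature_writers) == 1


class Server:
    """Hypothetical server with no LUN masking in place."""

    def __init__(self, name, detected_devices):
        self.name = name
        # Without masking, the OS claims every device it detects at startup.
        for device in detected_devices:
            device.write_signature(name)

    def can_read(self, device):
        return (device.has_valid_signature()
                and self.name in device.signature_writers)


shared = StorageDevice(lun=0)
server_a = Server("server-a", [shared])
readable_before = server_a.can_read(shared)  # only one signature so far

server_b = Server("server-b", [shared])      # second server claims the same LUN
readable_after = (server_a.can_read(shared), server_b.can_read(shared))
```

Under these assumptions `server-a` can read the device while it is the sole writer, and neither server can read it once `server-b` has also written a signature.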
To avoid the problem of data corruption that results from access conflicts, conventional storage consolidation software employs LUN masking software. LUN masking software runs on each server and masks the LUNs in order to prevent the operating system from automatically assigning the LUNs. In effect, LUN masking software masks or hides a device from a server. The system administrator may then use the storage consolidation software to assign LUNs to each server as needed. Because a server can access only those devices that it sees on the network, no access conflicts can arise if each LUN is masked to all but one server.
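The effect of LUN masking can be sketched as a filter between the LUNs a server detects and the LUNs its operating system is allowed to claim. The function names `apply_mask` and `conflicting_luns`, and the server names, are hypothetical; real LUN masking software is not described at this level in the text.

```python
from collections import defaultdict


def apply_mask(detected_luns, unmasked_luns):
    """Return only the detected LUNs that are not hidden from this server."""
    return sorted(set(detected_luns) & set(unmasked_luns))


def conflicting_luns(unmasked_by_server):
    """Return LUNs left visible to more than one server, with those servers."""
    seen = defaultdict(list)
    for server, luns in unmasked_by_server.items():
        for lun in luns:
            seen[lun].append(server)
    return {lun: sorted(servers)
            for lun, servers in seen.items() if len(servers) > 1}


# Every server detects every LUN in the shared pool.
detected = [0, 1, 2, 3]
# Hypothetical administrator assignment: each LUN is unmasked for one server.
unmasked_by_server = {"server-a": {0, 1}, "server-b": {2}, "server-c": {3}}

visible = {server: apply_mask(detected, luns)
           for server, luns in unmasked_by_server.items()}
```

When each LUN is unmasked for exactly one server, `conflicting_luns` returns an empty mapping, which corresponds to the condition under which no access conflicts can arise.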
As storage available to a computer network increases, the need for adequate backup storage also increases. Often a computer network employs dedicated backup storage devices, such as tape storage devices. Storing data on tapes is considerably cheaper than storing data on disks. Tapes also have large storage capacities, ranging from a few hundred kilobytes to several gigabytes. Because tapes are sequential-access media, accessing data on tapes is much slower than accessing data on disks. As a result, tape storage devices are more appropriate for long-term storage and backup while disk drives are more appropriate for storing data to be used on a regular basis (such as a storage device for a SAN).
During backup operations, some or all of the storage devices available to the network transmit all or a portion of stored data to the dedicated backup storage devices. Backup operations are implemented to safeguard computer systems against disasters or other events that result in data loss. In the event of a disaster, data may be recovered from the dedicated backup storage devices. Examples of disasters that are caused by hardware failures include memory errors, system timing problems, resource conflicts, and power loss. Disasters may also be caused by software failure, file system corruption, accidental deletion, computer virus infection, theft, sabotage, or even natural disasters. One of the most common disasters occurs when a server on the LAN or WAN experiences a software failure or crash or suffers some other serious failure that causes the server to stop working or abort an application unexpectedly. Regardless of the cause of the disaster, user data may be lost. To restore the affected server to its previous state, the system administrator or user must copy the backup data to the affected server.
During the recovery process, backup data must be read from the dedicated backup storage devices on the storage network. As discussed above, a server normally runs LUN masking software to prevent the server from seeing and interfering with storage devices on the SAN that the server does not have the right to use because such interference can cause data corruption. But after a disaster, an affected server may no longer be running LUN masking software. Unfortunately, this creates a “catch-22” situation in the recovery of backup data. The LUN masking software must be recovered from the dedicated backup storage device on the storage network, yet the LUN masking software must already be running on the affected server in order for the affected server to safely interact with the storage network.
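The circular dependency described above can be made concrete with a short check. This is an illustrative sketch only: the functions `visible_luns` and `safe_to_restore`, and the LUN numbers, are hypothetical. It assumes that a server without running masking software sees every LUN it detects.

```python
def visible_luns(detected_luns, masking_running, unmasked_luns):
    """LUNs the server's operating system will see and claim."""
    if not masking_running:
        # No masking software: every detected LUN is visible.
        return sorted(detected_luns)
    return sorted(set(detected_luns) & set(unmasked_luns))


def safe_to_restore(detected_luns, masking_running, unmasked_luns):
    """True only if the server sees nothing beyond the LUNs it may use."""
    visible = visible_luns(detected_luns, masking_running, unmasked_luns)
    return set(visible) <= set(unmasked_luns)


detected = [0, 1, 2, 3]
# Hypothetical assignment: LUN 1 is the server's own volume and LUN 3 is
# its dedicated backup storage device.
allowed = {1, 3}

healthy = safe_to_restore(detected, masking_running=True, unmasked_luns=allowed)
crashed = safe_to_restore(detected, masking_running=False, unmasked_luns=allowed)
```

Here `healthy` is true and `crashed` is false: the crashed server needs the masking software in order to be safe on the storage network, yet that software is among the data it must restore from the network.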
To prevent the affected server from accessing storage devices that are already claimed by another server and subsequently corrupting the data stored on those storage devices, system administrators frequently follow the steps of disconnecting the affected server from the fabric and connecting it to its associated dedicated backup storage device. Only then can the system administrator initiate the recovery process and restore the affected server. This
Baker & Botts L.L.P.
Beausoliel Robert
Dell Products L.P.
Wilson Yolanda L.
Profile ID: LFUS-PAI-O-3161955