Testing components of a computerized storage network system...

Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability

Reexamination Certificate

Details

C714S043000

active

06754853

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to data storage in a computerized storage area network (SAN) or system utilizing multiple controllers. More particularly, the present invention relates to a new and improved technique of determining whether one of the controllers or a device connected to the controller is functioning properly. Rather than merely detecting a lack of response to a data access request and inferring that something is not working, a test of certain capabilities of the controller is initiated so that particular problems can be diagnosed.
BACKGROUND OF THE INVENTION
In a computerized storage area network (SAN), various storage devices, such as hard drives, compact disc (CD) drives, tape drives and the like, are used to store data. The storage devices are typically arranged in groups, such as a RAID (Redundant Array of Independent Disks) configuration. One or more redundant disk array controllers (RDACs) are connected to each group of storage devices to control access to the storage devices. The groups are sometimes contained in storage units, such as storage arrays, so the controllers handle data accesses between the individual storage devices within the storage array and other components of the SAN outside of the storage array.
The storage area network (SAN) also typically includes a plurality of host devices connected through a switched, or network, fabric to the storage arrays. The host devices access a plurality of logical data volumes present on the storage devices in the storage arrays, usually on behalf of a plurality of client devices which are typically connected to each host device. Each storage array is connected at the controllers to one or more host devices through the network fabric.
Each host device can typically transfer data with each storage array and the logical data volumes stored therein through more than one data path. Each data path extends through the switched fabric to one of the controllers in the storage array. Since the storage array typically contains two (and possibly more) of the controllers, the host device typically has two (and possibly more) data paths to each storage array. The controllers are “redundant” because typically either one can satisfy data access requests from any host device to any storage device or logical data volume on the storage array.
The redundancy ensures that the logical data volumes will be available to the host devices in the event that one of the data paths develops a problem or fails to operate. If a host device detects a failure in one of the data paths to a storage array, the host device switches to the other data path to access the storage array.
The host device typically detects the failure when it sends a data access request through the data path and either no response is returned within a predetermined time period or the response includes an error notification. The problem that caused the error or failure may have occurred in the data path (e.g., in the switched fabric, a networking device, a cable or another component of the data path), in the host device (e.g., in a network interface card or host bus adapter through which the host device accesses the switched fabric) or in the storage array (e.g., in the array controller, a storage device or another component of the storage array). However, the host device makes no determination regarding the cause of the failure. Instead, a notification is sent to a system administrator indicating which data path is not responding. It is then typically left to the system administrator to perform the burdensome task of diagnosing or troubleshooting the problem that caused the failure.
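The conventional host-side behavior described above (treat a timeout or an error response as a failure, then switch to another data path) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Response`, `send_with_failover`, `dead_path`, `error_path` and `healthy_path` are hypothetical, and a data path is modelled as a plain function that returns `None` when no response arrives within the time limit.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Response:
    """Hypothetical reply to a data access request."""
    ok: bool
    payload: bytes = b""
    error: str = ""


# Assumption: a data path is a callable (request, timeout_s) -> Response,
# returning None when nothing comes back within the time limit.
DataPath = Callable[[bytes, float], Optional[Response]]


def send_with_failover(
    request: bytes, paths: List[DataPath], timeout_s: float = 1.0
) -> Tuple[Optional[Response], List[int]]:
    """Try each data path in order.

    Returns the first successful response (or None if every path failed)
    together with the indices of the paths that failed. Note that the host
    only learns WHICH path failed, not WHY it failed.
    """
    failed: List[int] = []
    for index, path in enumerate(paths):
        response = path(request, timeout_s)
        if response is None or not response.ok:
            failed.append(index)  # timeout or error notification
            continue
        return response, failed
    return None, failed


# Stand-in data paths for demonstration only.
def dead_path(request: bytes, timeout_s: float) -> Optional[Response]:
    return None  # simulates a response that never arrives


def error_path(request: bytes, timeout_s: float) -> Optional[Response]:
    return Response(ok=False, error="error notification")


def healthy_path(request: bytes, timeout_s: float) -> Optional[Response]:
    return Response(ok=True, payload=request[::-1])
```

The list of failed indices corresponds to the notification sent to the system administrator: it names the unresponsive data paths but carries no diagnosis.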
It is with respect to these and other background considerations that the present invention has evolved.
SUMMARY OF THE INVENTION
The present invention relieves some of the burden from the system administrator for troubleshooting the problem that caused a failure in a data path by automatically initiating a test of one or more of the array controllers in the storage array and disabling certain non-functional equipment when a problem is detected. The present invention also monitors the functional condition or status of the storage array by periodically initiating the test of the array controller(s), so the status of the storage array can be determined even before the host device has detected a failure or error.
One of the array controllers initiates the test of the other array controller, so if the controller under test is not functioning properly, the controller initiating the test can provide explanatory results of the test to the host device or the system administrator. The test checks the operation of parts of the array controller, the storage devices and the network fabric, so if the problem exists in one of these components of the storage area network, the explanatory results can provide the location of the problem for the system administrator, who can then quickly correct the problem. Even if the test identifies no problem in any of the checked components after the host device has detected a failure, the test will have eliminated those components as the source of the problem, so the system administrator can focus any troubleshooting efforts elsewhere.
These and other improvements are achieved by testing the operational condition of one of the controllers in a computerized system that has at least two controllers and one or more storage devices. The controllers are for controlling access to computerized data stored on the storage devices. The second controller sends a test command to the first controller to cause the first controller to execute predetermined operating functions. In response, the first controller attempts to perform the predetermined operating functions, preferably by directing certain data access commands to the storage devices. The outcome of the attempted predetermined operating functions is analyzed to determine whether the first controller was successful in performing the predetermined operating functions. The operational condition of the first controller is then determined based on whether the first controller was successful in performing the predetermined operating functions.
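The sequence in the preceding paragraph (test command sent, predetermined functions attempted, outcome analyzed, operational condition determined) might be sketched as below. This is a simplified model, not the patented firmware: `ControllerUnderTest`, `handle_test_command` and `run_controller_test` are hypothetical names, and each predetermined operating function is assumed to be a callable that returns `True` on success.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ControllerUnderTest:
    """Hypothetical model of the first controller.

    `functions` maps a label to a predetermined operating function that
    returns True on success (an assumption made for this sketch).
    """
    functions: Dict[str, Callable[[], bool]]

    def handle_test_command(self) -> Dict[str, bool]:
        """Attempt every predetermined function and record the outcome."""
        outcome: Dict[str, bool] = {}
        for name, function in self.functions.items():
            try:
                outcome[name] = bool(function())
            except Exception:
                # An exception counts as an unsuccessful attempt.
                outcome[name] = False
        return outcome


def run_controller_test(target: ControllerUnderTest) -> Tuple[str, List[str]]:
    """Played by the second controller: send the test command, analyze the
    outcome, and determine the operational condition of the target."""
    outcome = target.handle_test_command()
    failed = sorted(name for name, ok in outcome.items() if not ok)
    condition = "operational" if not failed else "faulty"
    return condition, failed
```

Returning the names of the failed functions alongside the overall condition mirrors the idea of explanatory results: the initiating controller can report which capability failed and not merely that something did.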
The controller under test preferably performs a read operation and/or a write operation on one or more of the storage devices to test its ability to access the storage devices. For the read operation, the controller initiating the test preferably writes some test data to the storage devices and then passes some test information to the controller under test with which the controller under test can check the test data after reading the test data from the storage devices. For the write operation, the controller under test preferably generates additional test data from the same test information and writes the additional test data to the storage devices, so the controller initiating the test can read the additional test data and check it with the original test information. Additionally, to perform either or both of the read and write operations, the controller under test preferably issues read and/or write commands to itself, to which the controller under test responds in a normal fashion as if the read and/or write commands were generated externally. Furthermore, the computerized system is preferably part of a networked storage system, and the controller under test preferably sends the read and/or write commands to an external device, such as a network device, that returns, or “loops back,” the commands to the controller under test.
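The read and write checks described above can be illustrated with a small sketch. Several assumptions are made here that are not in the source: the "test information" is modelled as a short seed, test data is derived from the seed with SHA-256, and the storage devices are replaced by an in-memory `SharedStorage` stand-in. All function names are hypothetical, and the self-addressed and looped-back commands are not modelled.

```python
import hashlib
from typing import Dict


def make_pattern(seed: bytes, label: bytes, length: int = 64) -> bytes:
    """Derive reproducible test data from the shared test information.

    Assumption: hashing seed + label + counter is one convenient way for
    both controllers to regenerate the same data independently.
    """
    data = b""
    counter = 0
    while len(data) < length:
        data += hashlib.sha256(seed + label + counter.to_bytes(4, "big")).digest()
        counter += 1
    return data[:length]


class SharedStorage:
    """Dict-backed stand-in for storage devices both controllers can reach."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def write(self, address: int, data: bytes) -> None:
        self.blocks[address] = data

    def read(self, address: int) -> bytes:
        return self.blocks.get(address, b"")


# Read test: the initiating controller writes, the controller under test
# reads the data back and checks it using only the seed.
def initiator_prepare_read_test(storage: SharedStorage, seed: bytes, address: int) -> None:
    storage.write(address, make_pattern(seed, b"read"))


def under_test_read_check(storage: SharedStorage, seed: bytes, address: int) -> bool:
    return storage.read(address) == make_pattern(seed, b"read")


# Write test: the controller under test generates additional data from the
# same seed and writes it; the initiating controller reads and verifies it.
def under_test_write(storage: SharedStorage, seed: bytes, address: int) -> None:
    storage.write(address, make_pattern(seed, b"write"))


def initiator_verify_write(storage: SharedStorage, seed: bytes, address: int) -> bool:
    return storage.read(address) == make_pattern(seed, b"write")
```

Because both sides regenerate the expected data from the same seed, only the seed needs to pass between the controllers; the test data itself travels solely through the storage devices being exercised.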
The previously mentioned and other improvements are also achieved in a storage array for servicing data access requests received from host devices through a network. The storage array includes an array of storage devices, two array controllers and a memory device (e.g., RAM). The array controllers are connected to each other, the network, the array of storage devices and the memory device. The memory device contains firmware instructions that cause the array controllers to perfo
