System and method for avoiding storage failures in a storage...


Filing date: 1999-05-27
Issue date: 2002-08-27
Examiner: Beausoleil, Robert (Department: 2785)
Classification: Error detection/correction and fault detection/recovery; Data processing system error or fault handling; Reliability and availability
Classification codes: C714S005110, C714S006130, C714S006130
Type: Reexamination Certificate
Status: active
Patent number: 06442711
ABSTRACT:

FIELD OF THE INVENTION
This invention relates to storage systems and more particularly relates to a system and method for executing preventive maintenance of storage array systems.
BACKGROUND OF THE INVENTION
Redundant Arrays of Independent Disks (RAID) store large amounts of user data across a collection of disks. RAID is defined at a plurality of levels, such as levels 0 to 5, which differ in reliability, data availability, and cost performance.
In terms of reliability, the RAID protects the user data against loss or inaccessibility due to disk failures. Part of the RAID's physical storage capacity is used to store redundant or back-up data about the user data stored on the remainder of the physical storage capacity. The redundant data enables regeneration of the user data in the event that one of the array's member disks or the access path to it fails.
For example, a RAID system of level 4 (hereinafter referred to as “RAID 4”) usually includes a plurality of data disks for storing user data received from a host computer, a parity disk for storing parity data, and a spare disk for replacing one of the other disks if it fails. In RAID 4, the user data is divided into a plurality of data blocks, each having a predetermined sequential address and a predetermined size. RAID 4 creates a parity block by carrying out exclusive OR (XOR) operations on a set of corresponding data blocks sequentially addressed on different data disks. The set of corresponding data blocks and the parity block make up a “parity group”. Furthermore, the data blocks and the parity block are distributed across the data disks and the parity disk, respectively, in a predetermined order.
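The parity computation described above can be illustrated with a minimal sketch. It is not taken from the patent: the function name `xor_blocks` and the two-byte sample blocks are assumptions chosen only to show how one parity block is derived from the data blocks of a parity group.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte (hypothetical helper)."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of bytes per byte position across all blocks.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One corresponding data block from each of three data disks (illustrative values).
data_blocks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]

# The parity block that would be written to the parity disk for this parity group.
parity_block = xor_blocks(data_blocks)
```

XOR-ing the parity block with all data blocks of the group yields all zero bytes, which is the property the regeneration step relies on.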
In the event that one of the plurality of data disks or the parity disk fails completely and data on it becomes entirely unusable, RAID 4 regenerates a data block or a parity block of the failed disk using the remaining data blocks in the corresponding parity group and stores the regenerated data on the spare disk. This operation is referred to as “Hot Spare Operation”.
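As a rough sketch of the regeneration step, assuming byte-wise XOR parity as described for RAID 4, the block of a failed disk can be rebuilt from the surviving blocks of the same parity group. The names `xor_blocks` and `regenerate_block`, and the sample data, are hypothetical and serve only as illustration.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together (hypothetical helper)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def regenerate_block(surviving_blocks):
    """Rebuild the missing member of a parity group from all surviving members."""
    # XOR is its own inverse, so XOR-ing every surviving block (data and parity)
    # reproduces the one block that is missing.
    return xor_blocks(surviving_blocks)


data_blocks = [b"AB", b"CD", b"EF"]   # one block per data disk (illustrative)
parity = xor_blocks(data_blocks)      # block held on the parity disk

failed_disk = 1                       # assume the second data disk has failed
survivors = [b for i, b in enumerate(data_blocks) if i != failed_disk] + [parity]

# This is the block that would be written to the spare disk.
spare_block = regenerate_block(survivors)
```

The same routine rebuilds a lost parity block, since in that case the survivors are simply all of the data blocks.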
The Hot Spare Operation is usually carried out when an actual disk failure occurs. However, in addition to recovery from an actual failure, it is also applicable to an exchange of disks in a preventive maintenance routine of the RAID. When it is applied to the preventive maintenance routine, the RAID detects and counts the total number of errors of every disk. In the event that the total number of errors of a disk exceeds a predetermined value (“threshold value”), the RAID system either raises an alarm indicating that the particular disk must be treated as failed and exchanged for a new disk, or automatically executes the Hot Spare Operation.
However, the RAID system judges when to execute the preventive maintenance only from the total number of errors, compared against a specified maximum number of errors. Consequently, the RAID cannot clearly distinguish a case in which errors are occurring at a normal rate from a case in which errors are occurring at an abnormal rate that requires preventive maintenance. There is therefore a possibility that the RAID fails to recognize a symptom of a fatal failure.
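The limitation of a count-only check can be illustrated with a short sketch. The threshold value, the per-hour rate, and the two sample disks below are assumptions made for illustration and do not come from the patent.

```python
ERROR_COUNT_THRESHOLD = 50  # hypothetical "threshold value"


def needs_maintenance_by_count(total_errors, threshold=ERROR_COUNT_THRESHOLD):
    """Conventional check: looks only at the accumulated number of errors."""
    return total_errors > threshold


def errors_per_hour(total_errors, hours_in_service):
    """One possible error rate: accumulated errors divided by operating time."""
    return total_errors / hours_in_service


# Two disks with the same total error count but very different histories.
steady_disk = {"errors": 40, "hours": 4000.0}    # errors spread over a long period
bursting_disk = {"errors": 40, "hours": 4.0}     # same count within a few hours

count_verdicts = [
    needs_maintenance_by_count(steady_disk["errors"]),
    needs_maintenance_by_count(bursting_disk["errors"]),
]
rates = [
    errors_per_hour(steady_disk["errors"], steady_disk["hours"]),
    errors_per_hour(bursting_disk["errors"], bursting_disk["hours"]),
]
```

Both disks receive the same verdict from the count-based check, although their rates differ by a factor of about a thousand in this example.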
Furthermore, after executing the Hot Spare Operation, the RAID generally disconnects the failed disk from the system. Consequently, the RAID has no tolerance for recovering from another disk failure until a new spare disk is attached. If another failure occurs before the new spare disk is attached, that failure causes an irretrievable data loss.
SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to provide a system and method for executing preventive maintenance of the conventional storage array system to achieve higher reliability.
A storage array system, consistent with the present invention, comprises a plurality of data storage devices for storing data and a control unit for controlling input and/or output operations of the plurality of data storage devices. The control unit includes means for storing a history of self recovered errors for each of the plurality of data storage devices, means for calculating an error rate of each of the plurality of data storage devices on the basis of the history of errors, and means for judging a reliability of operation of each of the plurality of data storage devices from the error rate.
A storage array system, consistent with the present invention, comprises a plurality of data storage devices for storing data, a spare storage device for replacing one of the plurality of data storage devices, and a control unit for controlling input and/or output operations of the plurality of data storage devices and the spare storage device. The control unit includes means for storing a history of self recovered errors for each of the plurality of data storage devices, means for calculating an error rate of each of the plurality of data storage devices on the basis of the history of errors, means for judging a necessity to execute preventive maintenance of each of the plurality of data storage devices from the error rate, and means for executing the preventive maintenance.
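The three means named in the summary (storing an error history, calculating an error rate, and judging from that rate) might be sketched as follows. This excerpt does not state how the error rate is calculated, so the time-window rate, the class name `DiskErrorHistory`, and the limit values are all assumptions made for illustration rather than the patented method.

```python
class DiskErrorHistory:
    """Per-disk history of self recovered errors (hypothetical sketch)."""

    def __init__(self, window_seconds, max_errors_per_hour):
        self.window_seconds = window_seconds            # assumed observation window
        self.max_errors_per_hour = max_errors_per_hour  # assumed rate limit
        self.timestamps = []                            # the stored error history

    def record_error(self, timestamp):
        """Store the time (in seconds) at which a self recovered error occurred."""
        self.timestamps.append(timestamp)

    def error_rate(self, now):
        """Errors per hour, counted over the most recent window."""
        start = now - self.window_seconds
        recent = sum(1 for t in self.timestamps if start < t <= now)
        return recent * 3600.0 / self.window_seconds

    def needs_preventive_maintenance(self, now):
        """Judge from the rate, not the total count, whether to act."""
        return self.error_rate(now) > self.max_errors_per_hour


history = DiskErrorHistory(window_seconds=3600.0, max_errors_per_hour=5.0)
for t in range(0, 600, 60):          # ten errors within ten minutes
    history.record_error(float(t))

burst_rate = history.error_rate(now=600.0)
act_now = history.needs_preventive_maintenance(now=600.0)
act_later = history.needs_preventive_maintenance(now=10000.0)
```

In this sketch a control unit would keep one such history per data storage device and could trigger the Hot Spare Operation when `needs_preventive_maintenance` returns true. The total number of stored errors never decreases, but the verdict changes once the burst falls outside the window.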

REFERENCES:
patent: 5422890 (1995-06-01), Klingsporn et al.
patent: 5611069 (1997-03-01), Matoba
patent: 5717850 (1998-02-01), Apperley et al.
patent: 5727144 (1998-03-01), Brady et al.

Inventors: Kinjo Morishige; Sasamoto Kyouichi
Examiners: Beausoleil Robert; Wilson Yolanda L.
