Error detection/correction and fault detection/recovery – Data processing system error or fault handling – Reliability and availability
Reexamination Certificate
2000-05-17
2003-04-22
Beausoliel, Robert (Department: 2184)
C714S052000, C714S770000, C707S793000
active
06553511
BACKGROUND OF THE INVENTION
This invention relates to mass information storage systems used in high-performance computer systems. More particularly, this invention relates to a new and improved technique of using sequence number and revision number metadata for assuring high data integrity against data path errors or drive data corruption errors which may inadvertently occur during the transfer of data to, or the retrieval of data from, storage media, such as a redundant array of independent disks (RAID) mass storage system.
In high-performance computer systems, a mass storage system must be capable of rapidly supplying information to each processor of the computer system in a “read” operation, rapidly transferring information for storage in a “write” operation, and performing both read and write operations with a high level of integrity so that the information is not corrupted or lost. Incorrect, corrupted or lost information defeats or undermines the effectiveness of the computer system. The reliability of the information is an absolute necessity in most if not all business computing operations.
A variety of high-performance mass storage systems have been developed to assure rapid information storage and retrieval operations. The information storage and retrieval operations are generally the slowest operations performed by a computer system; consequently, the information storage and retrieval operations limit the speed and functionality of the computer system itself.
One popular mass storage system which offers relatively rapid information storage and retrieval capabilities at moderate cost, as well as the capability for assuring a relatively high integrity of the information against corruption and loss, is a redundant array of independent or inexpensive disks (RAID) mass storage system. In general, a RAID mass storage system utilizes a relatively large number of individual, inexpensive disk drives which are controlled separately and simultaneously. The information to be written is separated into smaller components and recorded simultaneously or nearly simultaneously on multiple ones of the disk drives. The information to be read is retrieved almost simultaneously in the smaller components from the multiplicity of disk drives and then assembled into a larger total collection of information requested. By separating the total information into smaller components, the time consumed to perform reading and writing operations is reduced. On the other hand, one inherent aspect of the complexity and speed of the read and write operations in a RAID mass storage system is an increasing risk of inadvertent information corruption and data loss arising from the number of disk drives and the number and complexity of the input/output (I/O) operations involved.
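The separation of information into smaller components, and its reassembly on a read, can be illustrated with a minimal sketch. The function names `stripe` and `assemble`, the round-robin placement, and the use of in-memory lists in place of disk drives are assumptions made for illustration only; they are not taken from this document.

```python
from typing import List


def stripe(data: bytes, num_disks: int, chunk_size: int) -> List[List[bytes]]:
    """Split data into chunk_size pieces and deal them round-robin across disks."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        disks[chunk_index % num_disks].append(data[offset:offset + chunk_size])
    return disks


def assemble(disks: List[List[bytes]]) -> bytes:
    """Collect the chunks in the order they were dealt out and rejoin them."""
    pieces = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                pieces.append(disk[row])
    return b"".join(pieces)


# Hypothetical usage: ten bytes spread over three simulated drives.
layout = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
restored = assemble(layout)
```

In a real array the per-drive transfers would proceed in parallel, which is where the reduction in time comes from; the sketch only shows the bookkeeping.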
Various error correction and integrity-assuring software techniques have been developed to assure that inadvertent errors can be detected and that the corrupted information can be corrected. The importance of such integrity-assuring techniques increases with higher performance mass storage systems, because the complexity of the higher performance techniques usually involves an inherent increased risk of inadvertent errors. Some of these integrity-assuring techniques involve the use of separate software which is executed concurrently with the information storage and retrieval operations, to check and assure the integrity of the storage and retrieval operations. The use of such separate software imposes a performance penalty on the overall functionality of the computer system, because the concurrent execution of the integrity-assuring software consumes computer resources which could otherwise be utilized for processing, reading or writing the information. Another type of integrity-assuring technique involves attaching certain limited metadata to the data to be written, but then requiring a sequence of separate read and write operations involving both the new data and the old data. The number of I/O operations involved diminishes performance of the computer system. Therefore, it is important that any integrity-assuring software impose only a small performance degradation on the computer system. Otherwise, the advantages of the higher performance mass storage and computing system will be lost or diminished.
Although the integrity-assuring software techniques used in most mass storage systems are reliable, there are a few classes of hardware errors which seem to arise inadvertently and which are extremely difficult to detect or correct on a basis which does not impose a performance degradation. These types of errors seem prone to occur to the disk drives, almost inexplicably. One example of this type of an error involves the disk drive accepting information in a write request and acknowledging that the information has been correctly written, without actually writing the information to the storage media. Another example involves the disk drive returning information in response to a read request that is from an incorrect disk memory location. A further example involves the disk drive writing information to the wrong address location. These types of errors are known as “silent” errors, and are so designated because of the apparent, but nevertheless incorrect, accuracy of the operations performed.
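The three examples of silent errors described above can be modeled with a toy in-memory drive. The class `FlakyDrive` and its fault-injection attributes are hypothetical and exist only to show why the caller cannot notice the fault from the drive's response alone.

```python
class FlakyDrive:
    """Toy in-memory drive that can misbehave silently (hypothetical model)."""

    def __init__(self, num_blocks: int):
        self.blocks = [b""] * num_blocks
        self.drop_next_write = False      # acknowledge the write but store nothing
        self.misdirect_write_to = None    # store the data at this address instead
        self.misdirect_read_to = None     # return the data at this address instead

    def write(self, address: int, data: bytes) -> bool:
        if self.drop_next_write:
            self.drop_next_write = False
        elif self.misdirect_write_to is not None:
            self.blocks[self.misdirect_write_to] = data
            self.misdirect_write_to = None
        else:
            self.blocks[address] = data
        # Success is reported in every case, which is what makes the error silent.
        return True

    def read(self, address: int) -> bytes:
        if self.misdirect_read_to is not None:
            address, self.misdirect_read_to = self.misdirect_read_to, None
        return self.blocks[address]


# Hypothetical usage: a dropped write leaves stale data behind a "successful" status.
drive = FlakyDrive(num_blocks=4)
drive.write(0, b"old")
drive.drop_next_write = True
acknowledged = drive.write(0, b"new")
stale = drive.read(0)
```

After this sequence `acknowledged` is true while `stale` still holds the old contents, so nothing in the drive's own responses reveals the fault.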
The occurrence of silent errors is extremely rare. However, such errors must be detected and/or corrected in computer systems where absolute reliability of the information is required. Because of the extremely infrequent occurrence of such silent errors, it is not advantageous to concurrently operate any integrity-assuring software or technique that imposes a continuous and significant penalty of performance degradation on the normal, error-free operations of the computer system.
Apart from silent errors, there are other situations in which data and parity inconsistency is detected due to incomplete write operations, failed disk input/output (I/O) operations or other general firmware and hardware failures. In such circumstances, it is desirable to utilize a technique to make determinations of consistency in the data and parity. Parity is additional information, stored along with the data, that defines the data and allows for reconstruction of the data. By knowing either the correct data or the correct parity, it is possible to correctly regenerate the correct version of incorrect data or parity. While a variety of integrity-assuring software techniques are available to regenerate the correct data or the correct parity, it is desirable to avoid the performance degradation penalty of continually executing separate software to check data and parity.
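The relationship between data and parity can be sketched as follows. The use of byte-wise XOR is an assumption, since this document does not state how parity is computed, although XOR is the scheme commonly used in RAID systems. The function names are likewise hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity is the XOR of all data blocks (assumed scheme)."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Regenerate one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


def is_consistent(data_blocks, parity):
    """Check whether the stored parity still agrees with the stored data."""
    return make_parity(data_blocks) == parity


# Hypothetical usage: three data blocks, one of which is later regenerated.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(data)
recovered = rebuild_missing([data[0], data[2]], parity)
```

A consistency check of this kind reports that data and parity disagree, but by itself it cannot say which of the two is the stale one.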
It is with respect to these and other background considerations that the present invention has evolved.
SUMMARY OF THE INVENTION
The present invention involves creating a sequence number and a revision number and storing the sequence number and revision number as metadata along with the data itself in a mass storage system, such as a RAID system. The invention also involves utilizing the sequence number and the revision number in an effective way which does not impose a significant performance degradation penalty on the computer system or the mass storage system to detect and correct silent errors and errors of data and parity inconsistency.
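The idea of storing a sequence number and a revision number as metadata along with the data can be sketched as follows. The class names, the rule for when each number changes, and the notion of a separately kept expected copy of the metadata are assumptions made for illustration; this excerpt does not spell out how the numbers are assigned or compared.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    sequence_number: int   # assumed to identify a particular write of the stripe
    revision_number: int   # assumed to count later updates to this block


@dataclass
class StoredBlock:
    user_data: bytes
    metadata: Metadata


def matches_expected(block: StoredBlock, expected: Metadata) -> bool:
    """Return True if the metadata read back agrees with the expected copy."""
    return block.metadata == expected


# Hypothetical scenario: an update is acknowledged but never reaches the media.
on_media = StoredBlock(b"old", Metadata(sequence_number=7, revision_number=0))
expected = Metadata(sequence_number=7, revision_number=0)

# The writer intends to store new data with revision 1 and records that
# expectation, but the drive silently drops the write, so on_media is unchanged.
expected = Metadata(sequence_number=7, revision_number=1)

stale_detected = not matches_expected(on_media, expected)
```

Because the check is a comparison of two small integers that travel with the data already being read, it adds little work to normal, error-free operations.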
One aspect of the present invention pertains to a method of creating metadata from user data to detect errors arising from input/output (I/O) operations performed on information storage media contained in a mass storage system. The method involves creating at least two user data structures and a parity data structure. Each user data structure contains user data and metadata which describes the user data contained in that same user data structure. The parity data structure is associated with the two or more user data structures and contains metadata and parity information which describes separately and collectively the user data and metadata in each of the two or more user data structures. A sequence number and a revision number are included as part of the me
DeKoning Rodney A.
Greenfield Scott E.
Langford, II Thomas L.
Beausoliel Robert
John R. Ley, LLC
LSI Logic Corporation
Wilson Yolanda L.