Reexamination Certificate
2000-04-07
2003-05-27
Iqbal, Nadeem (Department: 2184)
Error detection/correction and fault detection/recovery
Data processing system error or fault handling
Reliability and availability
C711S163000
active
06571351
ABSTRACT:
FIELD OF THE INVENTION
This invention is related to non-volatile, mass electronic storage systems, also known as secondary storage systems.
BACKGROUND INFORMATION
Secondary storage systems are widely used with different types of client applications, including on-line transaction processing and multimedia storage. Transaction processing includes, for instance, credit card processing, which is characterized by the client requesting a relatively large number of small data transactions. In contrast, multimedia storage, such as video and music file access, generally requires significantly larger transactions. In a typical operation, the client application sends a high level request for reading or writing a file to a file system. The file system maintains a file system name space and maps file reads/writes to lower level block accesses. These block accesses are then fed to a storage engine that is typically part of a secondary storage system that includes a rotating magnetic disk drive. The storage engine typically has no knowledge of whether the file system is using a particular block in the disk drive, and as such can be described as being independent of the file system.
A currently popular technique for implementing a high performance, large capacity, and low cost secondary storage system is the Redundant Array of Inexpensive Disks (RAID). In a RAID, a set of rotating magnetic disk drives (referred to here as simply “disks”) is organized into a single, large, logical disk. Each disk in the set typically has the same number of platters and the same number of tracks on each platter, where the data is actually stored. The data is “striped” across multiple disks to improve read/write speeds, and redundant information is stored on the disks to improve the availability of data (reliability) in the event of catastrophic disk failures. The RAID secondary storage system can typically rebuild the failed data disk, without involving the file system, by regenerating each bit of data in each track and platter of the failed disk (using its knowledge of the redundant information), and then storing each such bit in the corresponding location of a new, replacement disk.
Several RAID architectures (known in the industry as “levels”) have been developed to provide various combinations of cost, performance, and reliability. For instance, in a Level I RAID, a block of data received from the file system by an input/output (I/O) controller of the RAID is stored entirely in one disk and replicated in a check disk. Level I thus uses twice as many disks as a nonredundant disk array, but provides speedy access (either disk by itself may be used to retrieve the block) and high availability. Accordingly, Level I is frequently used with applications such as on-line transaction processing where availability and transaction rate are more important than storage capacity efficiency.
In contrast, in a Level III RAID architecture, the block of data is spread bit-wise over a number of data disks, and a single parity disk is added to tolerate any single disk failure. The use of parity rather than a replica of the data lowers the availability in comparison with a Level I architecture. However, storage efficiency is greatly improved, as only one parity disk is used for several data disks. Thus, Level III may particularly suit applications that require the storage of large amounts of data and high throughput where data is accessed sequentially most of the time, as in digital video file storage.
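The single-parity idea can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names (xor_blocks, stripe_with_parity, rebuild_strip) are hypothetical, and byte-wise interleaving is assumed in place of the bit-wise spreading described above, since the XOR parity arithmetic is the same either way.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_data_disks):
    """Interleave data across n_data_disks strips and compute one parity strip.

    Assumption: byte-wise interleaving stands in for bit-wise spreading.
    """
    # Pad so every strip has the same length.
    padded_len = -(-len(data) // n_data_disks) * n_data_disks
    padded = data.ljust(padded_len, b"\0")
    strips = [padded[i::n_data_disks] for i in range(n_data_disks)]
    return strips, xor_blocks(strips)


def rebuild_strip(surviving_strips, parity):
    """Regenerate the one missing data strip from the survivors and parity."""
    return xor_blocks(surviving_strips + [parity])


strips, parity = stripe_with_parity(b"digital video frame", 4)
lost = strips[2]                                    # pretend disk 2 failed
recovered = rebuild_strip(strips[:2] + strips[3:], parity)
assert recovered == lost
```

Because the parity strip is the XOR of all data strips, XOR-ing the parity with the surviving strips cancels them out and leaves the missing strip. This only works when exactly one strip is lost, which is why a second failure during a rebuild is fatal.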
There are two problems with the above described RAID architectures. First, as the storage capacity of the typical disk drive steadily increases, disk rebuild times also increase. Because the storage system is unprotected while a failed disk is being rebuilt (that is, a second disk failure implies the total failure of the storage system), longer rebuild times can become a serious problem. This problem becomes even greater as the likelihood of failure increases with larger RAID sets having greater numbers of disk drives.
Another problem is reduced read/write performance with applications such as television production, where large transactions involving requests to retrieve or store media files, such as television program and commercial files, are combined with smaller transactions that access text files or “metadata” files which describe the commercial or program contained in a particular media file. Although RAID Level I provides speedy access for both large and small transactions, duplicating the large media files makes inefficient use of the total storage space as compared to that which can be obtained using Level III. Performance using RAID Level III, however, suffers for small, randomly addressed transactions due to the need to access the data in a random fashion over several disks rather than just one.
SUMMARY
An embodiment of the invention described below benefits from the concept of closely coupling a fault tolerant, mass storage engine (such as a RAID engine) with a file system to achieve greater overall throughput in storage/database applications having a mix of large, sequential access transactions and small, random access transactions. In addition, disk rebuild time may be greatly reduced using such an embodiment.
A method according to an embodiment of the invention includes dividing a logical storage space representing a storage area in a set of non-volatile storage devices into nonoverlapping storage allocation units (SAUs), each SAU to overlay all devices in the set. Different fault tolerant storage methodologies (FTSMs) are assigned to access (i.e. read/write) data in the different SAUs, respectively. Access to the data is done based on the particular FTSM for the SAU that is being accessed.
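As a rough illustration of dividing a logical space into SAUs and choosing the access method per SAU, consider the following Python sketch. The class and function names, the equal SAU size, and the two FTSM labels are assumptions made for the example and are not specified in the text; the sketch also models only the logical block range, not how each SAU overlays the physical devices.

```python
from dataclasses import dataclass
from enum import Enum


class FTSM(Enum):
    """Hypothetical labels for two fault tolerant storage methodologies."""
    MIRROR = "mirror"  # replicate each block, as in Level I
    PARITY = "parity"  # stripe with a single parity device, as in Level III


@dataclass
class SAU:
    index: int
    first_block: int
    n_blocks: int
    ftsm: FTSM


def divide_into_saus(total_blocks, sau_blocks, default=FTSM.PARITY):
    """Split the logical block range into nonoverlapping SAUs."""
    return [
        SAU(i, start, min(sau_blocks, total_blocks - start), default)
        for i, start in enumerate(range(0, total_blocks, sau_blocks))
    ]


def ftsm_for_block(saus, block, sau_blocks):
    """Look up the methodology governing the SAU that contains `block`."""
    return saus[block // sau_blocks].ftsm


SAU_BLOCKS = 256
saus = divide_into_saus(total_blocks=1000, sau_blocks=SAU_BLOCKS)
saus[0].ftsm = FTSM.MIRROR  # e.g. small, randomly accessed metadata files

small_access = ftsm_for_block(saus, 10, SAU_BLOCKS)   # falls in SAU 0
large_access = ftsm_for_block(saus, 300, SAU_BLOCKS)  # falls in SAU 1
```

Here a read or write is dispatched by first finding the SAU that contains the target block and then using whatever methodology was assigned to that SAU, so mirrored and parity-protected regions can coexist on the same set of devices.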
In a particular embodiment, an allocation table can be shared by both the file system and the RAID engine by virtue of the table being made public to the RAID engine. This allows the file system to choose the optimal fault tolerant storage methodology for storing a particular data stream or data access pattern, while simultaneously allowing the RAID engine to properly recover the data should a disk fail (by referring to the allocation table to determine which fault tolerant storage methodology was used to store the file). Also, when the RAID engine rebuilds a failed disk, only the SAUs that are indicated in the allocation table as being used by the file system are rebuilt, thus saving rebuild time, which is particularly advantageous when large capacity individual disk drives are being used.
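The selective rebuild described above might be sketched as follows in Python. The table layout (a mapping from SAU index to an entry with an in-use flag and a methodology label) and the names Entry and rebuild_plan are hypothetical; the text does not specify the allocation table's actual format.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One hypothetical allocation table row, shared with the RAID engine."""
    in_use: bool  # set by the file system when it allocates the SAU
    ftsm: str     # methodology the file system chose for this SAU


def rebuild_plan(allocation_table):
    """List (SAU index, methodology) pairs for the SAUs that need rebuilding.

    SAUs the file system has not allocated are skipped entirely.
    """
    return [
        (index, entry.ftsm)
        for index, entry in sorted(allocation_table.items())
        if entry.in_use
    ]


table = {
    0: Entry(in_use=True, ftsm="mirror"),
    1: Entry(in_use=False, ftsm="parity"),
    2: Entry(in_use=True, ftsm="parity"),
    3: Entry(in_use=False, ftsm="parity"),
}
plan = rebuild_plan(table)
assert plan == [(0, "mirror"), (2, "parity")]
```

In this example only two of the four SAUs are regenerated, and each is recovered with the methodology recorded for it. An engine that is independent of the file system would instead have to regenerate every block of the failed disk.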
REFERENCES:
patent: 5835940 (1998-11-01), Yorimitsu et al.
patent: 0485110 (1992-05-01), None
patent: 0584804 (1994-03-01), None
patent: 0701198 (1996-03-01), None
patent: 0768606 (1997-04-01), None
patent: 0801344 (1997-10-01), None
patent: WO91/13404 (1991-09-01), None
patent: WO97/07461 (1997-02-01), None
“RAID: High-Performance, Reliable Secondary Storage”, P.M. Chen et al., ACM Computing Surveys 26:2 (Jun. 1994), pp. 145-185.
“A Case for Redundant Arrays of Inexpensive Disks (RAID)”, David A. Patterson et al., Tech. Rep. UCB/CSD 87/391 (1987), Univ. of Calif.
Mitaru Alexandru
Powell Michael
Stallkamp Richard W.
Blakely, Sokoloff, Taylor & Zafman LLP
Iqbal Nadeem
Omneon Video Networks