System and method for storing data sectors with header and...
Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique
Reexamination Certificate
2001-01-19
2003-03-25
Yoo, Do Hyun (Department: 2187)
C711S112000, C711S114000, C710S068000
active
06539460
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to storage server systems, for example those having a data cache in which the data is maintained in compressed form, and particularly to an improved method for storing units of data in the data cache, where the unit of storage is the size of a disk sector and headers and trailers carrying metadata and redundancy-check information are employed.
2. Discussion of the Prior Art
Storage servers are computer systems that securely and efficiently manage large amounts of data stored on hard disks, optical disks, magnetic tapes, or other forms of mass storage media.
FIG. 1 is a block diagram depicting the typical structure of a storage server device 100. As shown in FIG. 1, the storage server 100 is connected to the hosts via one or more host adapters 101, which perform the communication tasks required by the specific protocol of the selected interconnection network (e.g., Gigabit Ethernet, Token Ring, or Fibre Channel). The host adapters 101 are connected to one or more processors or processor clusters 103 via a cluster interconnection network 102, which provides the media, protocols, and services that ensure the communication between host adapters and processor clusters. To ensure continuous service in case of failure, a storage server usually has two or more processors or processor clusters 103, each operating across different power boundaries, so that lack of power at one cluster does not affect the rest of the system. A Non-Volatile Store (NVS) 107 may additionally be used to speed up write operations while maintaining high reliability. In medium-to-large storage servers, processor clusters are used instead of individual processors. Processor clusters may be arranged, for instance, in a Symmetric Multi-Processor (SMP) configuration, where multiple processors share the same memory. The processor clusters provide all the functionality required to guarantee data integrity, including data recovery in case of failure. Each processor cluster is connected to one or more device adapters 104, which control the operations of hard disks, optical disks, magnetic tapes, or other mass storage media devices 105. Processor clusters may additionally share device adapters. The device adapters can provide additional data-integrity functionality, such as RAID services.
The hosts served by a storage server are often heterogeneous. Data is transferred to and from the storage server in atomic units, usually pages containing 4 Kilobytes (KB) of data. Pages are in turn divided into sectors, usually containing 512 Bytes of data each, because a (disk) sector is the atomic unit of I/O supported by the disks. Some operating systems add headers or trailers to the sectors. For example, as described in F. G. Soltis, "Inside the AS/400", Duke Press, Loveland, CO, 1996, p. 217, the operating system OS/400, the operating system of the IBM AS/400, adds an 8-Byte system header in front of each 512 Bytes of sector data, so that each sector contains 520 Bytes. To reduce the risk of data corruption, further headers or trailers may be added to the data within the storage server. For example, the host adapters can compute cyclic redundancy check bits and append them to each sector, further increasing the size of the sector, e.g., to 524 Bytes. The disk cache may thus simultaneously contain sectors of different sizes (each nevertheless holding 512 Bytes of sector data), depending on which type of host wrote them, which complicates cache management. Alternatively, dummy headers and trailers may be added to sectors so that all sectors have the same size; this approach wastes a small amount of space but significantly simplifies cache management.
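To make the uniform-slot idea concrete, the following is a minimal sketch (names and the padding policy are illustrative assumptions, not the patent's implementation) that pads 512-, 520-, and 524-Byte sectors to a single 524-Byte slot, using the sizes given in the text:

```python
# A minimal sketch (not from the patent) of padding heterogeneous
# sectors to one uniform slot size so the cache manages a single
# sector geometry; sizes follow the examples in the text.

SECTOR_DATA = 512           # payload bytes common to every sector
OS400_HEADER = 8            # OS/400 prepends an 8-byte system header
CRC_TRAILER = 4             # host adapter may append CRC bytes (524 - 520)
SLOT_SIZE = SECTOR_DATA + OS400_HEADER + CRC_TRAILER  # 524-byte slot

def pad_to_slot(sector: bytes) -> bytes:
    """Pad a 512-, 520-, or 524-byte sector with dummy bytes to SLOT_SIZE."""
    if len(sector) > SLOT_SIZE:
        raise ValueError("sector larger than uniform slot")
    return sector + b"\x00" * (SLOT_SIZE - len(sector))

# Both a bare 512-byte sector and a 520-byte OS/400 sector occupy one
# 524-byte slot; at most 12 dummy bytes are wasted per sector.
assert len(pad_to_slot(b"\x00" * 512)) == 524
assert len(pad_to_slot(b"\x00" * 520)) == 524
```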
While in the past 20 years the speed of processors has increased by a factor of 1000 or more, the speed of disks has barely increased by a factor of 3 to 4. Consequently, accessing data on disk is in general a very expensive operation in terms of latency. To hide part of this latency, a disk cache 106 may be employed in the storage server 100 of FIG. 1, as taught, for instance, by A. J. Smith in "Disk cache—miss ratio analysis and design considerations", ACM Trans. Comput. Syst. 3 (Aug. 1985), pp. 161-203. A disk cache is a fast memory (for example, DRAM) that contains a copy of part of the data stored on disk. Usually, the most recently read parts of the disk are kept in the cache, and prefetching algorithms can be used to load into the cache data whose addresses are close to those of the most recently read data. If a host request is for data contained in the cache (an event called a "cache hit"), the latency of the data transfer equals the time required to process the request plus the time to read the data from memory and transmit it. If the data is not in the cache (a "cache miss"), the time to serve the host request is dominated by the disk access latency. A cache miss can have a latency three to four orders of magnitude larger than a cache hit.
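The impact of the hit/miss gap on average service time can be illustrated with a back-of-envelope calculation; the latencies below are assumed for illustration only:

```python
# Back-of-envelope sketch with assumed latencies: a miss is taken to be
# ~10,000x slower than a hit (three to four orders of magnitude).
HIT_TIME_US = 1.0          # assumed: process request + read from DRAM
MISS_TIME_US = 10_000.0    # assumed: disk access latency

def avg_latency_us(miss_rate: float) -> float:
    """Average service time as a weighted mix of hits and misses."""
    return (1 - miss_rate) * HIT_TIME_US + miss_rate * MISS_TIME_US

for m in (0.10, 0.05, 0.01):
    print(f"miss rate {m:4.0%}: avg {avg_latency_us(m):8.1f} us")
# Even a 1% miss rate leaves average latency ~100x the hit time, so
# reducing the miss rate dominates overall performance.
```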
As described in the above-mentioned reference by A. J. Smith, the larger the cache, the smaller the miss rate, and the better the performance of the overall system. However, the cost of RAM is about two orders of magnitude higher than the cost of disk of the same capacity, and the gap appears to be growing. It therefore seems beneficial to increase the effective capacity of the cache by compressing its content.
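The benefit can be sketched numerically; the compression ratio and the miss-ratio rule of thumb below are assumptions for illustration, not figures from the text:

```python
# Back-of-envelope sketch, all numbers assumed: a 2:1 compression ratio
# doubles effective cache capacity for the same RAM cost, and a common
# rule of thumb (an assumption, not from the text) is that each doubling
# of cache size cuts the miss ratio by roughly 30%.
ram_gb = 64
compression_ratio = 2.0
miss_before = 0.05
miss_after = miss_before * 0.7   # one capacity doubling (assumed rule)
print(f"effective capacity: {ram_gb * compression_ratio:.0f} GB")
print(f"miss ratio: {miss_before:.3f} -> {miss_after:.3f}")
```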
Compressed caches are taught in the art, for example, in U.S. Pat. No. 5,875,454 to D. J. Craft and R. Greenberg, entitled "Compressed Data Cache Storage System". FIG. 2 illustrates an example of a system architecture for a compressed cache 200, for a processor accessing data at high speed and in small memory-block units, and a mass storage medium holding data in large transfer units. Uncompressed data is read from a mass storage system 201 in large transfer units (e.g., 64K to 200K bytes). The data so received is divided into 4K blocks, which are individually compressed by a Lempel-Ziv-type lossless compressor 202. The compressed 4K blocks are stored in the cache 203 in an integer, variable number of allocation units 204, which are fixed-size sections of contiguous memory of, for example, 512 bytes each. A compressed block need not be stored in contiguous allocation units.
The actual location within the cache of the first allocation unit for a transfer unit is recorded in the directory 205; the other allocation units for the transfer unit are connected through a linked list. When a read request is received via the I/O interface 208, the data is read from the allocation units where the required block is stored, decompressed by a fast decompressor 207, and sent to the computer via the I/O interface 208. All of these operations are controlled by a compressed data cache controller 206, which also maintains the cache directory 205 and performs the usual caching functions (such as replacement policies, free-space management, etc.).
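Continuing the sketch above, the read path consults the directory for the first allocation unit, walks the linked list to collect the remaining units, and hands the pieces to the decompressor (again, all names are assumptions):

```python
# Read path for the CompressedCache sketched earlier: the directory
# yields only the first unit; the rest are reached via the linked list.
import zlib

def read_block(cache: "CompressedCache", block_id) -> bytes:
    idx = cache.directory[block_id]   # first allocation unit
    chunks = []
    while idx != -1:                  # follow the per-unit links
        chunks.append(cache.units[idx])
        idx = cache.next_unit[idx]
    return zlib.decompress(b"".join(chunks))

# Usage: cache = CompressedCache(1024); cache.store(7, b"\x00" * 4096)
# read_block(cache, 7) then returns the original 4K block.
```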
Downsides of the scheme described in U.S. Pat. No. 5,875,454 include the cost of the special-purpose hardware that must be developed, the use of a moderate-speed compressor, and the use of a linked list to connect the allocation units for each transfer unit. The described prior-art scheme is nonetheless useful for moderate-size caches (for example, several MB, as described in U.S. Pat. No. 5,875,454). In an enterprise-class storage server, however, it is desirable to have disk caches with a capacity equal to 0.1% to 2% of that of the entire disk subsystem. Typical disk subsystem sizes are on the order of terabytes and growing; hence the desired disk cache size is on the order of gigabytes to hundreds of gigabytes, and will grow in the future. Additionally, such servers are designed to serve a large number of hosts. Hence, compressor speed becomes very important, data integrity is essential, the management of the cache becomes more complex, and new services must be provided.
Inventors: Vittorio Castelli; Peter A. Franaszek; Philip Heidelberger; John T. Robinson
Assignee: International Business Machines Corporation
Attorneys: Derek S. Jennings, Esq.; Scully, Scott, Murphy & Presser
Examiners: Do Hyun Yoo; Ngoc V. Dinh