Electrical computers and digital processing systems: memory – Storage accessing and control – Memory configuring
Reexamination Certificate
1998-06-08
2001-01-23
Chan, Eddie P. (Department: 2751)
Electrical computers and digital processing systems: memory
Storage accessing and control
Memory configuring
C711S170000, C711S171000, C711S172000, C710S068000
Reexamination Certificate
active
06178489
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to storage devices such as disk drives and to LRU or LFU organized storage subsystems. More particularly, this invention relates to managing writes in place and their updates of fixed-length symbol strings under selective compression and recency of usage constraints at either the linear address-mapped storage device or subsystem level.
DESCRIPTION OF THE RELATED ART
As used in the specification, the terms “data string” or “symbol string” are used synonymously. Also, an uncompressed data or symbol string comprises a fixed number of symbols. Relatedly, the terms “data strings”, “page”, “unit of work”, and “unit of storage” are likewise used synonymously. Furthermore, a disk drive is defined as a cyclic, multitracked storage device and is synonymously known as a direct access storage device or DASD. Lastly, a nominal compression ratio is an average compression ratio value associated with a given compression facility or device. An effective compression ratio is usually a lesser value than the nominal ratio and is applied as a compression measure to storage devices and subsystems.
Aspect of Compression
The term “compression” refers to a series of operations to reduce the number of symbols representing a fixed-length string of data. The compression may be lossless or lossy. Lossless compression enables recapture of the original symbol strings from their compressed image, while lossy compression means that some portion of the original symbol string cannot be recovered from the compressed version. While ultimately all forms of compression are founded upon a statistical characterization of an information source, practical compression utilizes either dictionary-based techniques or statistically-based techniques.
Dictionary-based techniques utilize a pointer to a word in a dictionary or codebook, the length of the pointer to the word being shorter than the word itself. In contrast, statistical compression substitutes codewords for the original symbols or symbol string. In the statistical case, length of the substitute codewords is inverse to the frequency of occurrence of the source symbols or symbol sequence from the source alphabet or language.
Typically, symbol strings may represent alphanumerically-coded text or raster-scanned image data. Alphanumerically-coded or text data usually is expected to be lossless since the information may not be elsewhere repeated for recovery purposes. However, raster-scanned image or voice data contains significant redundancy. Thus, in the case of image data, one or more datapoint losses could be recovered from the uncompressed image by signal scaling, averaging, or other techniques. A simple illustration is the difference exemplified between an error in a bank account and a slight blemish in a photograph or a weather map.
It is also well appreciated that prior art compression methods shorten alphanumeric or text data strings into a range from 1.5/1 to 4.0/1, while raster-scanned or image data is more highly compressible frequently up to 10/1 to 15/1. One form of irreversible compression, also called “compaction”, involves removing space or other syntactical characters between groups of text characters. It would not be unusual for an uncompressed symbol string to achieve a 4/1 compressed image by first compacting the symbol string and then compression encoding it.
Global Data Compression Characteristics
Compression is to be found in communications and storage applications because of their economies of scale. However, the high capital costs of communications and storage facilities require simplicity in design and operation. This has led to storage systems being tuned to conservative compression ratios of expected text and image usages. Thus, any general compression coding is more likely to achieve a compression ratio of 2/1 or 3/1 than 10/1 or 15/1.
The choice of a nominal compression ratio is discussed in Brodie et al., U.S. Pat. No. 5,666,114, “Method and Means for Managing Linear Mapped Address Spaces Storing Compressed Data at the Storage Subsystem Control Unit or Device Level”, issued Sep. 9, 1997. The essential point is that the symbol probability distributions, even for a first order approximation for English text, may be characterized as Gaussian or Laplacian and skewed or peaked to the left. When this distribution is superimposed on an ordered set of compression ratios, as in
FIG. 2
of the Brodie '114 patent, the distribution peak nominally falls in the 1.5/1 to 4/1 compression range.
Unpredictability of Compression Effects on Storage Performance
Clearly, compression increases the storage capacity of any memory device such as a buffer, cache, or disk-backing store. Also, it is clear that memory and storage accessing and management are enhanced when the units of storage are fixed and uniform. However, compression operations map fixed-length strings into variable-length strings. Furthermore, some of the strings, pages, blocks, records, or tracks are expected to be write modified or updated. This means that the compressed images of changed symbol strings may change in length. They may even expand under compression. Indeed, it has been the unpredictability of the length of the compressed image, especially when a modified symbol string has been recompressed, that has daunted the use of compression in disk-based storage.
Storage Management and Hierarchical Storage
Storage subsystems comprise an alterable storage medium, an accessing mechanism, and an addressing protocol. Uncompressed data is either written to an address in a storage model mapped onto the subsystem or is read from an address in the storage subsystem. The subsystem manages all aspects of data storage within its boundary. This includes managing the storage model, accessing operations such as address conversions, transducer positioning, data compression and decompression, and error detection and correction.
CPUs send read and write requests to storage subsystems in either random or sequential patterns. These patterns influence the storage subsystem design. Random referencing assumes that the largest fraction of repeat referencing requests will be made to a small percentage of clusterable data. This is termed “locality of referencing”. Based on the locality assumption, subsystems include a high-speed electronic random access memory or “front store” for storing a fraction of either the most recently used (MRU) or the most frequently used (MFU) data strings. The front store is coupled to a very large capacity but substantially slower speed “backing store”.
The larger storage subsystems maintain separately-tuned paths for read and write requests. In this regard, update writes are written out from the front store to a nonvolatile store (NVS) while the storage subsystem sends a write complete signal to the CPU. This permits the storage subsystem to record the update writes on the backing store asynchronously. However, a choice must be made whether to write out the changes on an update in place or on a scatter write basis. Updating in place preserves the locality of reference for future accessing purposes, while scatter writing assists the speed with which updates can be recorded on the backing store.
Compression Management in Linear Address Mapped Storage
The aforementioned Brodie et al. '114 patent is directed to managing a fixed-size storage facility under compression ordinarily accessed as a linear-mapped, addressable space. Brodie seeks to control the compression and recompression length variability and to preserve locality of referencing in the presence of update writes. First, he partitions the storage facility into a linear addressable space for compressed symbol string images and a linked list addressable space for overflow portions of each compressed image. The linear addressable space includes a plurality of fixed equal-size locations. Second, a token to the overflow portion is embedded in the linear address. The system dynamically manages space by periodically readjusting the percentage of space allocated between the linearly
Bluestone Randall J.
Chan Eddie P.
Felsman Bradley Vaden Gunter & Dillon, LLP
International Business Machines - Corporation
Klein Esther E.
LandOfFree
Method and apparatus for managing linear address mapped... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method and apparatus for managing linear address mapped..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method and apparatus for managing linear address mapped... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2554721