Title: System and method for recycling stale memory content in...
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Type: Reexamination Certificate
Filed: 2000-03-20
Issued: 2002-09-24
Examiner: Yoo, Do Hyun (Department: 2187)
US Classes: C711S119000, C711S130000, C711S141000, C711S146000, C711S155000, C711S159000, C711S202000, C711S206000, C709S213000
Status: active
Patent Number: 06457104
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to the field of compressed memory architecture in computer systems, and more specifically to an improved method and apparatus for managing compressed main memory.
2. Discussion of the Prior Art
Computer systems generally consist of one or more processors that execute program instructions stored within a memory medium. This medium is most often constructed of the lowest cost-per-bit, yet slowest, storage technology. To increase processor performance, a higher speed, yet smaller and more costly memory, known as a cache memory, is placed between the processor and final storage to provide temporary storage of recently and/or frequently referenced information. As the difference between processor speed and access time of the final storage increases, more levels of cache memory are provided, each level backing the previous level to form a storage hierarchy. Each level of the cache is managed to maintain the information most useful to the processor. Often more than one cache memory will be employed at the same hierarchy level, for example when an independent cache is employed for each processor. Cache memory systems in computing devices have evolved into quite varied and sophisticated structures, but they always address the tradeoff between speed and both cost and complexity, while functioning to make the most useful information available to a processor as efficiently as possible. Typically only large “mainframe” computers employ memory hierarchies greater than three levels. However, systems are now being created using commodity microprocessors that benefit greatly from a third level of cache in the memory hierarchy. This level is best situated between the processor bus and the main memory; being shared by all processors, and in some cases the I/O system too, it is called a shared cache. Each level of memory requires several times more storage than the level it backs to be performance effective; therefore, the shared cache requires several tens of megabytes of memory. To remain cost effective, the shared cache is implemented using low cost Dynamic Random Access Memory (DRAM), organized as a separate array or as a portion of the system main memory.
Recently, cost-reduced computer system architectures have been developed that more than double the effective size of the main memory by employing high-speed compression/decompression hardware, based on common compression algorithms, in the path of information flow to and from the main memory. Processor access to main memory within these systems is performed indirectly through the compressor and decompressor apparatuses, both of which add significantly to the processor access latency costs.
Referring now to FIG. 1, a block diagram of a prior art computer system 100 is shown. The computer system includes one or more processors 101 connected to a common shared memory controller 102 that provides access to a system main memory 103 through a shared cache 114. The shared memory controller contains a compressor device 104 for compressing fixed size information blocks into as small a unit as possible for ultimate storage into the main memory 103, a decompressor device 105 for reversing the compression operation after the stored information is later retrieved from the main memory, and a cache controller 115 for managing a cache memory to contain uncompressed information. The cache controller 115 is connected to the memory controller 106 through at least a read request 119 and read request address 120 to signal the memory controller to read a quantity of information from the main memory for placement into the cache 114 via bus 117. Information may be transferred to the processor data bus 108 from the cache 114 through bus 117, or from the main memory 103, either through or around the decompressor 105 via a multiplexor 111. Similarly, information may be transferred to the cache from the main memory 103 or from the processor data bus 108. Information may be transferred to the main memory 103 from the processor data bus 108 or the cache 114, either through or around the compressor 104 via a multiplexor 112. The processor data bus 108 is used for transporting uncompressed information between other processors and/or the shared memory controller 102, and the shared cache 114.
The main memory 103 is typically constructed of dynamic random access memory (DRAM) with access controlled by a memory controller 106. Addresses appearing on the processor address bus 107 and cache address bus 116 are known as Real Addresses, and are understood and known to the programming environment. Addresses appearing on the main memory address bus 109 are known as Physical Addresses, and are used and relevant only between the memory controller and the main memory DRAM. Memory Management Unit (MMU) hardware within the memory controller 106 is used to translate the real processor addresses to the virtual physical address space. This translation provides a means to allocate the physical memory in small increments for the purpose of efficiently storing and retrieving compressed and, hence, variable size information.
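One way to picture this translation is as a table indexed by real block address, whose entries list the variable number of fixed-size physical sectors holding each compressed block. The sketch below is a hypothetical software model of that idea, not the patented hardware; the class name, sector size, and pool layout are all illustrative assumptions.

```python
SECTOR_SIZE = 256   # bytes per physical sector (assumed for illustration)
BLOCK_SIZE = 1024   # fixed uncompressed block size, as in the text


class SectorMemory:
    """Toy model: real addresses map to variable-length lists of sectors."""

    def __init__(self, num_sectors: int):
        self.free = list(range(num_sectors))   # pool of free physical sectors
        self.table = {}                        # block index -> list of sector ids
        self.sectors = {}                      # sector id -> stored bytes

    def store(self, real_addr: int, compressed: bytes) -> None:
        idx = real_addr // BLOCK_SIZE          # translation-table index
        # Release sectors held by any previous version of this block.
        for s in self.table.pop(idx, []):
            self.free.append(s)
        # Allocate only as many fixed-size sectors as the compressed data needs.
        needed = -(-len(compressed) // SECTOR_SIZE)   # ceiling division
        sectors = [self.free.pop() for _ in range(needed)]
        for n, s in enumerate(sectors):
            self.sectors[s] = compressed[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE]
        self.table[idx] = sectors

    def load(self, real_addr: int) -> bytes:
        idx = real_addr // BLOCK_SIZE
        return b"".join(self.sectors[s] for s in self.table.get(idx, []))
```

A 300-byte compressed block, for example, would occupy two 256-byte sectors; re-storing the block returns its old sectors to the free pool, which is the small-increment allocation the paragraph above describes.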
The compressor 104 operates on a fixed size block of information, say 1024 bytes, by locating and replacing repeated byte strings within the block with a pointer to the first instance of a given string, and encoding the result according to a protocol. This process occurs through a byte-wise compare over a fixed length and is paced by a sequence counter, resulting in a constant completion time. The post-process output block ranges from just a few bytes up to the original block size, the latter occurring when the compressor could not reduce the starting block enough to warrant compressing at all. The decompressor 105 reverses the compressor operation, decoding the compressor output block to reconstruct the original information block by inserting byte strings back into the block at the positions indicated by the noted pointers. Even in the very best circumstances, the compressor is generally capable of only ¼-½ the data rate bandwidth of the surrounding system. The compression and decompression processes are also inherently linear and serial, implying quite lengthy memory access latencies through the hardware.
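The byte-string replacement just described resembles LZ77-style dictionary compression. The following toy sketch is a software analogy of that idea only, not the patented hardware algorithm; the function names, the (distance, length) token format, and the match limits are invented for illustration.

```python
def toy_compress(data: bytes, min_match: int = 4):
    """Return tokens: raw bytes, or (distance, length) pointers to an
    earlier instance of the same byte string within the block."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Byte-wise compare against earlier positions in the block.
        for j in range(max(0, i - 255), i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]
                   and length < 255):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            out.append((best_dist, best_len))   # pointer to first instance
            i += best_len
        else:
            out.append(data[i])                 # literal byte
            i += 1
    return out


def toy_decompress(tokens) -> bytes:
    """Reverse the operation: copy strings back in at pointed positions."""
    buf = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):
                buf.append(buf[-dist])          # handles overlapping copies
            continue
        buf.append(t)
    return bytes(buf)
```

The nested search also shows why such a process is naturally serial: each output token depends on all of the block decoded or scanned before it, which is the latency source the paragraph above notes.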
FIG. 2 depicts a prior art main memory partitioning scheme 200.
The main memory 205 is a logical entity because it includes the processor(s) information as well as all the required data structures necessary to access the information. The logical main memory 205 is physically partitioned from the physical memory address space 206. In many cases the main memory partition 205 is smaller than the available physical memory, to provide a separate region to serve as a cache with either an integral directory, or one that is implemented externally 212. It should be noted that the cache storage may be implemented as a region 201 of the physical memory 206, as a managed quantity of uncompressed sectors, or as a separate storage array 114. In any case, when implemented, the cache controller requests accesses to the main memory in a similar manner as a processor would if the cache were not present.
The logical main memory 205 is partitioned into the sector translation table 202, with the remaining memory allocated to sector storage 203, which may contain compressed or uncompressed information, free sector pointers, or any other information, as long as it is organized into sectors. The sector translation table region size varies in proportion to the real address space size, which is defined by a programmable register within the system. Particularly, equation 1) governs the sector translation table region size as follows:

    sector_translation_table_size = (real_memory_size / compression_block_size) · translation_table_entry_size    1)
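As a worked example of equation 1), take a 1 GiB real address space, the 1024-byte compression block size mentioned earlier, and an assumed 16-byte translation table entry (the real memory and entry sizes here are illustrative, not from the patent):

```python
real_memory_size = 1 << 30            # 1 GiB real address space (assumed)
compression_block_size = 1024         # bytes per compression block (from text)
translation_table_entry_size = 16     # bytes per table entry (assumed)

# Equation 1): one entry per compression block of real memory.
table_size = (real_memory_size // compression_block_size) \
             * translation_table_entry_size
print(table_size)                     # 16 MiB for these sizes
```

Because the table holds one entry per block of real memory, doubling the real address space register doubles the table region, which is why the region size must track that programmable register.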
Each entry is directly mapped to a fixed address range in the processor's real address space.
Inventors: Tremaine, R. Brett; Wazlowski, Michael; Jennings, Derek S.
Attorney/Agent: Scully, Scott, Murphy & Presser
Examiners: Song, Jasmine; Yoo, Do Hyun