Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2001-05-08
2003-01-21
Ellis, Kevin L. (Department: 2188)
C711S105000
active
06510492
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method and structure for implementing a memory system. More specifically, the invention relates to a second level cache memory.
2. Description of the Prior Art
High-speed computer systems frequently use fast, small-capacity cache (buffer) memory to transmit signals between a fast processor and a slow (and low cost), large-capacity main memory. Cache memory is typically used to temporarily store data which has a high probability of being selected next by the processor. By storing this high probability data in a fast cache memory, the average speed of data access for the computer system is increased. Thus, cache memory is a cost effective way to boost system performance (as compared to using all high speed, expensive memories). In more advanced computer systems, there are multiple levels (usually two levels) of cache memory. The first level cache memory, typically having a storage of 4 Kbytes to 32 Kbytes, is ultra-fast and is usually integrated on the same chip with the processor. The first level cache is faster because it is integrated with the processor and therefore avoids any delay associated with transmitting signals to and receiving signals from an external chip. The second level cache is usually located on a different chip than the processor, and has a larger capacity, usually from 64 Kbytes to 1024 Kbytes.
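The effect of a fast cache on average access speed can be illustrated with a simple weighted-average estimate. The sketch below is illustrative only: the function name, the 90% hit rate, and the 10 ns and 100 ns access times are assumptions chosen for the example and are not taken from this disclosure.

```python
def average_access_time_ns(hit_rate, cache_ns, main_ns):
    """Weighted average of cache and main memory access times.

    Simplifying assumption: a miss costs only the main memory access time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * main_ns


if __name__ == "__main__":
    # Hypothetical figures: 90% of accesses hit a 10 ns cache,
    # the rest are served by a 100 ns main memory.
    print(average_access_time_ns(0.9, 10.0, 100.0))
```

Under these assumed figures the average is roughly one fifth of the main memory access time, which is why a small cache can be a cost effective substitute for building the entire memory from fast devices.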
FIG. 1 is a block diagram of a prior art computer system 100 using an SRAM second level cache configuration. The CPU or microprocessor 101 incorporates on-chip SRAM first level cache 102 to support the very fast internal CPU operations (typically from 33 MHz to 150 MHz).
First level cache 102 typically has a capacity of 4 Kbytes to 32 Kbytes and performs very high speed data and instruction accesses (typically in 5 to 15 ns). For a first level cache miss or other non-cacheable memory access, the memory read and write operations must go off-chip through the much slower external CPU bus 104 (typically from 25 MHz to 60 MHz) to the SRAM second level (L2) cache 106 (typically with 128 Kbytes to 1024 Kbytes capacity) with the additional latency (access time) penalty of round-trip off-chip delay.
The need for CPU 101 to manage the delay penalty of off-chip operation dictates that in almost all modern microprocessors, the fastest access cycle (read or write) through the CPU bus 104 is 2-1-1-1. That is, the first external access will consume at least 2 clock cycles, and each subsequent external access will consume a single clock cycle. At higher CPU bus frequencies, the fastest first external access may take 3 or more clock cycles. A burst cycle having 4 accesses is mentioned here for purposes of illustration only. Some processors allow shorter (e.g., 2) or longer (e.g., 8 or more) burst cycles. Pipelined operation, where the parameters of the first external access of the second burst cycle are latched into CPU bus devices while the first burst cycle is still in progress, may hide the longer access latency for the first external access of the second burst cycle. Thus, the first and second access cycles may be 2-1-1-1, 1-1-1-1, respectively.
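The clock counts implied by these burst patterns can be tallied as follows. This is a minimal sketch: the function names and the 50 MHz bus frequency are assumptions made for illustration (50 MHz simply falls inside the 25 MHz to 60 MHz range given for the CPU bus).

```python
def burst_clocks(pattern):
    """Total bus clocks for one burst; (2, 1, 1, 1) takes 5 clocks."""
    return sum(pattern)


def burst_time_ns(patterns, bus_mhz):
    """Time in ns for back-to-back bursts at the given bus frequency."""
    clock_period_ns = 1000.0 / bus_mhz
    total_clocks = sum(burst_clocks(p) for p in patterns)
    return total_clocks * clock_period_ns


# Two consecutive 4-access bursts, without and with pipelining.
NON_PIPELINED = [(2, 1, 1, 1), (2, 1, 1, 1)]
PIPELINED = [(2, 1, 1, 1), (1, 1, 1, 1)]

if __name__ == "__main__":
    bus_mhz = 50.0  # assumed example frequency
    print(burst_time_ns(NON_PIPELINED, bus_mhz))
    print(burst_time_ns(PIPELINED, bus_mhz))
```

In this sketch pipelining saves one clock on the second burst (9 clocks in total rather than 10), which corresponds to hiding the extra latency of its first external access.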
The cache tag memory 108 is usually relatively small (from 8 Kbytes to 32 Kbytes) and fast (typically from 10 to 15 ns) and is implemented using SRAM cells. Cache tag memory 108 stores the addresses of the cache lines of second level cache 106 and compares these addresses with an access address on CPU bus 104 to determine if a cache hit has occurred. This small cache tag memory 108 can be integrated with the system logic controller chip 110 for better speed and lower cost. An integrated cache tag memory operates in the same manner as an external cache tag memory. Intel's 82430 PCI set for the Pentium processor is one example of a logic controller chip 110 which utilizes an SRAM integrated cache tag memory.
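The tag comparison described above can be modeled in a few lines. The sketch below assumes a direct-mapped organization, a 32-byte line, and a 256 Kbyte second level cache; none of these choices is specified in the text, and the class and method names are hypothetical.

```python
LINE_BYTES = 32    # assumed cache line size
NUM_LINES = 8192   # assumed: 256 Kbytes / 32 bytes per line


class CacheTagMemory:
    """Direct-mapped tag store (an assumption; the text does not fix the mapping)."""

    def __init__(self, num_lines=NUM_LINES, line_bytes=LINE_BYTES):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # None marks an empty entry

    def _split(self, address):
        """Return (index, tag) for a byte address."""
        line_number = address // self.line_bytes
        return line_number % self.num_lines, line_number // self.num_lines

    def is_hit(self, address):
        """Compare the stored tag with the tag of the access address."""
        index, tag = self._split(address)
        return self.tags[index] == tag

    def fill(self, address):
        """Record that the line containing this address is now cached."""
        index, tag = self._split(address)
        self.tags[index] = tag
```

Only the tag of each line is stored and compared, which is why the tag memory can be much smaller than the cache it describes.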
One reason for the slower operating frequency of CPU bus 104 is the significant loading caused by the devices attached to CPU bus 104. Second level (L2) SRAM cache memory 106 provides loading on the data and address buses (through latch 112) of CPU bus 104. Cache tag memory 108 provides loading on the address bus, system logic controller chip 110 provides loading on the control, data and address buses, and main memory DRAM 114 provides loading on the data bus (through latch 116).
In prior art computer system 100, the system logic chip 110 provides an interface to a system (local) bus 118 having a typical operating frequency of 25 MHz to 33 MHz. System bus 118 may be attached to a variety of relatively fast devices 120 (such as graphics, video, communication, or fast disk drive subsystems). System bus 118 can also be connected to a bridge or buffer device 122 for connecting to a general purpose (slower) extension bus 124 (at 4 MHz to 16 MHz operating frequency) that may have many peripheral devices (not shown) attached to it.
Traditional high speed cache systems, whether first level or second level, are implemented using static random access memories (SRAMs) because SRAMs are fast (with access times ranging from 7 to 25 nanoseconds (ns) and cycle times equal to access times). SRAMs are suitable for storing and retrieving data for high-speed microprocessors having bus speeds of 25 to 100 megahertz. Traditional dynamic random access memories (DRAMs) are less expensive than SRAMs on a per bit basis because DRAM has a much smaller cell size. For example, a DRAM cell is typically one quarter of the size of an SRAM cell using comparable lithography rules. DRAMs are generally not considered to be suitable for high speed operation because DRAM accesses inherently require a two-step process having access times ranging from 50 to 120 ns and cycle times ranging from 90 to 200 ns.
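The gap between these SRAM and DRAM timings becomes clearer when they are expressed in bus clocks. The following sketch uses the access and cycle times quoted above; the function name and the 50 MHz bus frequency are assumptions for illustration, and setup and propagation delays are ignored.

```python
import math


def clocks_needed(time_ns, bus_mhz):
    """Whole bus clocks needed to cover time_ns at the given bus frequency."""
    period_ns = 1000.0 / bus_mhz
    return math.ceil(time_ns / period_ns)


if __name__ == "__main__":
    bus_mhz = 50.0  # assumed example; within the 25 to 100 MHz range above
    for label, fast_ns, slow_ns in [
        ("SRAM access/cycle", 7, 25),
        ("DRAM access", 50, 120),
        ("DRAM cycle", 90, 200),
    ]:
        print(label, clocks_needed(fast_ns, bus_mhz), "to",
              clocks_needed(slow_ns, bus_mhz), "clocks")
```

At the assumed 50 MHz, a traditional DRAM random access cycle occupies several times as many bus clocks as an SRAM cycle, which is the difficulty the paragraph above describes.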
Access speed is a relative measurement. That is, while DRAMs are slower than SRAMs, they are much faster than other earlier-era memory devices such as ferrite core and charge-coupled devices (CCD). As a result, DRAM could theoretically be used as a “cache” memory in systems which use these slower memory devices as a “main memory.” The operation modes and access methods, however, are different from the operation modes and access methods disclosed herein.
In most computer systems, the second level cache operates in a fixed and rigid mode. That is, any read or write access to the second level cache is of a few constant sizes (line sizes of the first and second level caches) and is usually in a burst sequence of 4 or 8 words (i.e., consecutive reads or writes of 4 or 8 words) or in a single access (i.e., one word). These types of accesses allow standard SRAMs to be modified to allow these SRAMs to meet the timing requirements of very high speed processor buses. One such example is the burst or synchronous SRAM, which incorporates an internal counter and a memory clock to increment an initial access address. External addresses are not required after the first access, thereby allowing the SRAM to operate faster after the first access is performed. The synchronous SRAM may also have special logic to provide preset address sequences, such as Intel's interleaved address sequence. Such performance enhancement, however, does not reduce the cost of using SRAM cells to store memory bits.
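The two kinds of burst address generation mentioned above can be sketched as follows. The function names are hypothetical, wrap-around within the burst is an assumption for the counter-style sequence, and the interleaved order is modeled as the starting word address XORed with the beat count, which is how Intel's interleaved sequence is commonly described rather than something stated in this text.

```python
def linear_burst(start, length=4):
    """Counter-style sequence: increment the word address, wrapping in the burst."""
    base = start - (start % length)
    return [base + (start + i) % length for i in range(length)]


def interleaved_burst(start, length=4):
    """Interleaved sequence: XOR the start address with the beat count.

    length is assumed to be a power of two.
    """
    return [start ^ i for i in range(length)]


if __name__ == "__main__":
    for start in range(4):
        print(start, linear_burst(start), interleaved_burst(start))
```

In both cases only the first address has to come from the bus; the remaining addresses are derived inside the memory device, which is what allows the later accesses of a burst to complete in a single clock.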
Synchronous DRAMs (SDRAM) have adopted similar burst-mode operation. Video RAMs (VRAM) have adopted the serial port operation of dual-port DRAMs. These new DRAMs are still not suitable for second level cache operation, however, because their initial access time and random access cycle time remain much slower than necessary.
It would therefore be desirable to have a structure and method which enables DRAM memory to be used as a second level cache memory.
Prior art computer systems have also included multiple levels of SRAM cache memory integrated on the same chip as the CPU. For example, DEC's Alpha 21164 processor integrates 16 Kbytes of first level SRAM cache memory and 96 Kbytes of second level SRAM m
Hsu Fu-Chieh
Leung Wingyu
Ellis Kevin L.
Monolithic System Technology, Inc.
Skjerven Morrill LLP