Embedded DRAM cache

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories


Details

Patent number: 06789168
Type: Reexamination Certificate
Status: active
Classification: C711S158000, C711S167000, C711S168000

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to cache memory structures for a processor based system and, more particularly, to an apparatus that utilizes embedded dynamic random access memory (eDRAM) as a level three (L3) cache in the system chipset of a processor based system.
BACKGROUND OF THE INVENTION
The ability of processors to execute instructions has typically outpaced the ability of memory systems to supply the instructions and data to the processors. Due to the discrepancy in the operating speeds of the processors and system memory, the processor system's memory hierarchy plays a major role in determining the actual performance of the system. Most of today's memory hierarchies utilize cache memory in an attempt to minimize memory access latencies.
Cache memory is used to provide faster access to frequently used instructions and data, which helps improve the overall performance of the system. Cache technology is based on the premise that programs frequently reuse the same instructions and data. When data is read from main memory, a copy is usually saved in the cache memory (a cache tag is usually updated as well). The cache then monitors subsequent requests for data (and instructions) to see if the requested information has already been stored in the cache. If the data has been stored in the cache, it is delivered with low latency to the processor. If, on the other hand, the information is not in the cache, it must be fetched at a much higher latency from the system main memory.
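The following is a minimal sketch of that hit/miss behavior, assuming a hypothetical direct-mapped cache; the sizes, field widths, and names are illustrative only and do not come from the patent.

```c
/* Minimal sketch of the lookup described above, assuming a hypothetical
 * direct-mapped cache; sizes and field widths are illustrative only. */
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 64                        /* bytes per cache line        */
#define NUM_LINES 512                       /* 32 KB cache / 64-byte lines */

struct cache_line {
    bool     valid;                         /* has this line been filled?  */
    uint32_t tag;                           /* upper address bits          */
    uint8_t  data[LINE_SIZE];               /* copy of the memory block    */
};

static struct cache_line cache[NUM_LINES];

/* read_byte: return data with low latency on a hit; on a miss, fill the
 * line from (slower) main memory first, then return the data. */
uint8_t read_byte(uint32_t addr, const uint8_t *main_memory)
{
    uint32_t offset = addr % LINE_SIZE;
    uint32_t index  = (addr / LINE_SIZE) % NUM_LINES;
    uint32_t tag    = addr / (LINE_SIZE * NUM_LINES);

    struct cache_line *line = &cache[index];

    if (line->valid && line->tag == tag)    /* cache hit: fast path        */
        return line->data[offset];

    /* Cache miss: fetch the block from main memory (high latency), save a
     * copy in the cache, and update the tag, as described above. */
    memcpy(line->data, &main_memory[addr - offset], LINE_SIZE);
    line->tag   = tag;
    line->valid = true;
    return line->data[offset];
}
```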
In more advanced processor based systems, there are multiple levels (usually two levels) of cache memory. The levels are organized such that a small amount of very high speed memory is placed close to the processor while denser, slower memory is placed further away. In the memory hierarchy, the closer to the processor that the data resides, the higher the performance of the memory and the overall system. When data is not found in the highest level of the hierarchy and a miss occurs, the data must be accessed from a lower level of the memory hierarchy. Since each level contains increased amounts of storage, the probability increases that the data will be found. However, each level typically increases the latency or number of cycles it takes to transfer the data to the processor.
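A back-of-the-envelope example makes the latency trade-off concrete; the cycle counts and hit rates below are illustrative assumptions, not figures from the patent.

```c
/* Average access latency of a hypothetical two-level hierarchy: each level
 * that misses adds the next (slower) level's latency. */
#include <stdio.h>

int main(void)
{
    double l1_latency  = 2.0,   l1_hit_rate = 0.90;   /* on-chip L1    */
    double l2_latency  = 10.0,  l2_hit_rate = 0.95;   /* off-chip L2   */
    double mem_latency = 100.0;                       /* system memory */

    double avg = l1_latency
               + (1.0 - l1_hit_rate) * (l2_latency
               + (1.0 - l2_hit_rate) * mem_latency);

    printf("average access latency: %.1f cycles\n", avg);  /* ~3.5 */
    return 0;
}
```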
The first cache level, or level one (L1) cache, is typically the fastest memory in the system and is usually integrated on the same chip as the processor. The L1 cache is faster because it is integrated with the processor, which avoids delays associated with transmitting information to, and receiving information from, an external chip. The lone caveat is that the L1 cache must be small (e.g., 32 KB in the Intel® Pentium® III processor, 128 KB in the AMD Athlon™ processor) since it resides on the same die as the processor.
A second cache level, or level two (L2) cache, is typically located on a different chip than the processor and has a larger capacity than the L1 cache (e.g., 512 KB in the Intel® Pentium® III and AMD Athlon™ processors). The L2 cache is slower than the L1 cache, but because it is relatively close to the processor, it is still many times faster than the main system memory. Recently, small L2 cache memories have been placed on the same chip as the processor to speed up the performance of L2 cache memory accesses.
Many current processor systems consist of a processor with an on-chip L1 static random access memory (SRAM) cache and a separate off-chip L2 SRAM cache. In some systems, a small L2 SRAM cache has been moved onto the same chip as the processor and L1 cache, in which case the reduced latency is traded for a smaller L2 cache size. In other systems, the size of the L1 cache has been increased by moving it onto a separate chip, thus trading off a larger L1 cache for the increased latency and reduced bandwidth that result from off-chip accesses. These options are attempts to achieve the highest system performance by optimizing the memory hierarchy. In each case, various tradeoffs between size, latency, and bandwidth are made in an attempt to deal with the conflicting requirements of obtaining more, faster, and closer memory.
FIG. 1 illustrates a typical processor based system 10 having a memory hierarchy with two levels of cache memory. The system 10 includes a processor 20 having an on-board L1 cache 22. The processor 20 is coupled to an off-chip or external L2 cache 24. The system 10 includes a system chipset comprised of a north bridge 60 and a south bridge 80. As known in the art, the chipset is the functional core of the system 10. As will be described below, the bridges 60, 80 are used to connect two or more buses and are responsible for routing information to and from the processor 20 and the other devices in the system 10 over the buses they are connected to.
The north bridge 60 contains a PCI (peripheral component interconnect) to AGP (accelerated graphics port) interface 62, a PCI to PCI interface 64 and a host to PCI interface 66. Typically, the processor 20 is referred to as the host and is connected to the north bridge 60 via a host bus 30. The system 10 includes a system memory 50 connected to the north bridge 60 via a memory bus 34. The typical system 10 may also include an AGP device 52, such as, e.g., a graphics card, connected to the north bridge 60 via an AGP bus 32. Furthermore, the typical system 10 may include a PCI device 56 connected to the north bridge 60 via a PCI bus 36a.
The north bridge 60 is typically connected to the south bridge 80 via a PCI bus 36b. The PCI buses 36a, 36b may be individual buses or may be part of the same bus if so desired. The south bridge 80 usually contains a real-time clock (RTC) 82, a power management component 84 and the legacy components 86 (e.g., a floppy disk controller and certain DMA (direct memory access) and CMOS (complementary metal-oxide semiconductor) memory registers) of the system 10. Although not illustrated, the south bridge 80 may also contain interrupt controllers, such as the input/output (I/O) APIC (advanced programmable interrupt controller).
The south bridge 80 may be connected to a USB (universal serial bus) device 92 via a USB bus 38, an IDE (integrated drive electronics) device 90 via an IDE bus 40, and/or an LPC (low pin count) device 94 via an LPC/ISA (industry standard architecture) bus 42. The system's BIOS (basic input/output system) ROM 96 (read only memory) is also connected to the south bridge 80 via the LPC/ISA bus 42. The BIOS ROM 96 contains, among other things, the set of instructions that initialize the processor 20 and other components in the system 10. Examples of a USB device 92 include a scanner or a printer. Examples of an IDE device 90 include floppy disk and hard disk drives, and examples of LPC devices 94 include various controllers and recording devices. It should be appreciated that the type of device connected to the south bridge 80 is system dependent.
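For readers without the drawing, the bus and bridge relationships described above can be summarized as a small lookup table; the sketch below simply restates them, using the reference numerals from the description, and is otherwise hypothetical.

```c
/* Rough sketch of the FIG. 1 topology: which device hangs off which bus
 * and bridge. Names mirror the reference numerals in the description. */
#include <stddef.h>
#include <stdio.h>

struct link { const char *device; const char *bus; const char *bridge; };

static const struct link topology[] = {
    { "processor 20 (host)", "host bus 30",    "north bridge 60" },
    { "system memory 50",    "memory bus 34",  "north bridge 60" },
    { "AGP device 52",       "AGP bus 32",     "north bridge 60" },
    { "PCI device 56",       "PCI bus 36a",    "north bridge 60" },
    { "south bridge 80",     "PCI bus 36b",    "north bridge 60" },
    { "USB device 92",       "USB bus 38",     "south bridge 80" },
    { "IDE device 90",       "IDE bus 40",     "south bridge 80" },
    { "LPC device 94",       "LPC/ISA bus 42", "south bridge 80" },
    { "BIOS ROM 96",         "LPC/ISA bus 42", "south bridge 80" },
};

int main(void)
{
    for (size_t i = 0; i < sizeof topology / sizeof topology[0]; i++)
        printf("%-22s -- %-14s -- %s\n",
               topology[i].device, topology[i].bus, topology[i].bridge);
    return 0;
}
```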
As can be seen from FIG. 1, when the processor 20 cannot access information from one of the two caches 22, 24, it is forced to access the information from the system memory 50. This means that at least two buses 30, 34 and the components of the north bridge 60 must be involved to access the information from the system memory 50, which increases the latency of the access. Increased latency reduces the system bandwidth and overall performance. Accordingly, there is a desire and need for a third level of high speed cache memory ("L3 cache") that is closer to the processor 20 than the system memory 50 is. Moreover, it is desirable that the L3 cache be much larger than the L1 and L2 caches 22, 24, yet not substantially increase the size of the system 10.
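Extending the earlier back-of-the-envelope example with a hypothetical third level between the L2 cache and the system memory shows why such an L3 cache is attractive; again, all cycle counts and hit rates are illustrative assumptions, not figures from the patent.

```c
/* Extends the earlier two-level sketch with a hypothetical L3 between the
 * L2 cache and system memory; all numbers are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    double l1 = 2.0,  h1 = 0.90;     /* on-chip L1                        */
    double l2 = 10.0, h2 = 0.95;     /* off-chip L2                       */
    double l3 = 30.0, h3 = 0.90;     /* large L3 in the chipset           */
    double mem = 100.0;              /* system memory over buses 30, 34   */

    double without_l3 = l1 + (1 - h1) * (l2 + (1 - h2) * mem);
    double with_l3    = l1 + (1 - h1) * (l2 + (1 - h2) * (l3 + (1 - h3) * mem));

    printf("without L3: %.2f cycles\n", without_l3);  /* 3.50 */
    printf("with L3:    %.2f cycles\n", with_l3);     /* 3.20 */
    return 0;
}
```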
Additionally, it should be noted that memory access times are further compounded when other devices, e.g., the AGP device 52 or PCI device 56, are competing with the processor 20 for access to the system memory 50.
