Embedded DRAM cache memory and method having reduced latency

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate


Classifications: C711S140000, C711S144000, C711S145000


active

06789169


TECHNICAL FIELD
The present invention relates generally to cache memory for a processor-based system and, more particularly, to an apparatus and method that efficiently utilizes embedded dynamic random access memory (“eDRAM”) as a level three (L3) cache in the system controller of a processor based system.
BACKGROUND OF THE INVENTION
The speed at which processors can execute instructions has typically outpaced the speed at which memory systems can supply the instructions and data to the processors. Due to this discrepancy in the operating speeds of the processors and system memory, the system memory architecture plays a major role in determining the actual performance of the system. Most current memory hierarchies utilize cache memory in an attempt to minimize memory access latencies.
Cache memory is used to provide faster access to frequently used instructions and data, which helps improve the overall performance of the system. Cache memory is able to provide faster access for two primary reasons. First, cache memory is generally implemented with static random access memory (“SRAM”), which is substantially faster than dynamic random access memory (“DRAM”) that is normally used as system memory. Second, cache memory is normally coupled to the processor directly through a processor bus and thus has a hierarchy that places it closer to the processor. In memory hierarchy, the closer to the processor that the memory resides, the higher the performance of the memory and the overall system. Cache memory is effective to increase the speed at which programs can be executed because programs frequently reuse the same instructions and data. When data or instructions are read from main memory, a copy is usually saved in the cache memory (a cache tag is usually updated as well). The cache then monitors subsequent requests for data and instructions to see if the requested information has already been stored in the cache. If the data has been stored in the cache, which is known as a “cache hit,” it is delivered with low latency to the processor. If, on the other hand, the information is not in the cache, which is known as a “cache miss,” it must be fetched at a much higher latency from the system memory.
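The hit/miss flow described above can be sketched as a toy direct-mapped cache. Everything below — the block size, line count, and the dictionary standing in for system memory — is a hypothetical illustration, not part of the patent:

```python
# Illustrative sketch: a direct-mapped cache lookup with tag checking.
# BLOCK_SIZE, NUM_LINES, and the fake "system memory" are hypothetical.

BLOCK_SIZE = 4      # words per cache line
NUM_LINES = 8       # lines in the cache

system_memory = {addr: addr * 10 for addr in range(256)}  # stand-in for DRAM

# Each valid cache line holds (tag, block of words).
cache = {}  # line_index -> (tag, [words])

def read(addr):
    """Return (value, 'hit' or 'miss') for a word address."""
    block = addr // BLOCK_SIZE
    index = block % NUM_LINES        # which cache line the block maps to
    tag = block // NUM_LINES         # identifies which block occupies the line
    line = cache.get(index)
    if line is not None and line[0] == tag:
        # Cache hit: the copy saved earlier is returned with low latency.
        return line[1][addr % BLOCK_SIZE], "hit"
    # Cache miss: fetch the whole block from (slower) system memory,
    # save a copy in the cache, and update the tag.
    base = block * BLOCK_SIZE
    data = [system_memory[base + i] for i in range(BLOCK_SIZE)]
    cache[index] = (tag, data)
    return data[addr % BLOCK_SIZE], "miss"

print(read(7))   # first access to this block: miss, fetched from memory
print(read(5))   # same block, now cached: hit
```

Monitoring subsequent requests then amounts to the tag comparison above: a matching tag is a hit, anything else falls through to system memory at much higher latency.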
In more advanced processor based systems, there are multiple levels (usually two levels) of cache memory. The first cache level, or level one (L1) cache, is typically the fastest memory in the system and is usually integrated on the same chip as the processor. The L1 cache is faster because it is integrated with the processor and thus has a higher level of hierarchy. This higher level of hierarchy avoids delays associated with transmitting information to, and receiving information from, an external chip. Also, it generally operates at the usually faster speed of the processor. However, since it resides on the same die as the processor, the L1 cache must be relatively small (e.g., 32 KB in the Intel® Pentium® III processor, 128 KB in the AMD Athlon™ processor).
A second cache level, or level two (L2) cache, is typically located on a different chip than the processor and has a larger capacity than the L1 cache (e.g., 512 KB in the Intel® Pentium® III and AMD Athlon™ processors). The L2 cache is slower than the L1 cache, but because it is relatively close to the processor, it is still many times faster than the system memory, which has an even lower level of memory hierarchy. Recently, small L2 cache memories have been placed on the same chip as the processor to speed up the performance of L2 cache memory accesses.
When data is not found in the highest level of the memory hierarchy and a cache miss occurs, the data must be accessed from a lower level of the memory hierarchy. Since each level contains increased amounts of storage, the probability increases that the data will be found. However, each level typically increases the latency or number of cycles it takes to transfer the data to the processor.
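The escalating latency across levels can be quantified as an average memory access time (AMAT). The cycle counts and miss rates below are hypothetical, chosen only to illustrate how each level's penalty is paid by the fraction of accesses that miss every level above it:

```python
# Back-of-the-envelope sketch: AMAT for a two-level cache hierarchy.
# All latencies (cycles) and miss rates are hypothetical illustration.

L1_HIT = 1      # cycles to service an L1 hit
L2_HIT = 10     # additional cycles to reach L2 on an L1 miss
MEM = 100       # additional cycles to reach system memory on an L2 miss

l1_miss_rate = 0.05   # 5% of accesses miss L1
l2_miss_rate = 0.40   # 40% of L1 misses also miss L2

# Each level's penalty is weighted by the probability of reaching it.
amat = L1_HIT + l1_miss_rate * (L2_HIT + l2_miss_rate * MEM)
print(f"AMAT = {amat:.2f} cycles")  # 1 + 0.05*(10 + 0.4*100) = 3.50
```

Even with high hit rates, the system-memory penalty dominates the average, which is why each added cache level aims to intercept misses before they reach main memory.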
FIG. 1 illustrates a typical processor based system 10 having two levels of cache memory hierarchy. The system 10 includes a processor 20 having an onboard L1 cache 22 that is fabricated on the same chip as the processor 20. The processor 20 is coupled to an off-chip or external L2 cache 24. The system 10 includes a system chipset comprised of a system controller 60 (also known as a “north bridge”) and a bus bridge 80 (also known as a “south bridge”). As known in the art, the chipset is the functional core of the system 10. As will be described below, the system controller 60 and bus bridge 80 are used to connect two or more busses and are responsible for routing information to and from the processor 20 and the other devices in the system 10 over the busses to which they are connected.
The system controller 60 contains an accelerated graphics port (“AGP”) interface 62, a PCI interface 64 and a host interface 66. Typically, the processor 20 is referred to as the host and is connected to the host interface 66 of the system controller 60 via a host bus 30. The system 10 includes a system memory 50 connected to a memory controller 67 in the system controller 60 via a memory bus 34. The typical system 10 may also include an AGP device 52, such as, e.g., a graphics card, connected to the AGP interface 62 of the system controller 60 via an AGP bus 32. Furthermore, the typical system 10 may include a PCI device 56 connected to the PCI interface 64 of the system controller 60 via a PCI bus 36.
The PCI interface 64 is also typically connected to the bus bridge 80 via the PCI bus 36. A single PCI bus 36 may be used, as shown in FIG. 1, or, alternatively, individual PCI busses may be used if so desired. The bus bridge 80 may be coupled through an expansion bus, such as an industry standard architecture (“ISA”) bus 42, to a real-time clock (“RTC”) 82, power management component 84 and various legacy components 86 (e.g., a floppy disk controller and certain direct memory access (“DMA”) and complementary metal-oxide semiconductor (“CMOS”) memory registers) of the system 10. A basic input/output system (“BIOS”) read only memory (“ROM”) 96 and a low pin count (“LPC”) device 94 are also connected to the bus bridge 80 via the ISA bus 42. Examples of LPC devices 94 include various controllers and recording devices. The BIOS ROM 96 contains, among other things, the set of instructions that initialize the processor 20 and other components in the system 10. Although not illustrated, the bus bridge 80 may also contain interrupt controllers, such as the input/output (“I/O”) advanced programmable interrupt controller (“APIC”). The bus bridge 80 may also be connected to a universal serial bus (“USB”) device 92 via a USB bus 38, and an integrated drive electronics (“IDE”) device 90 may be connected via an IDE bus 40. Examples of a USB device 92 include a scanner or a printer. Examples of an IDE device 90 include floppy disk or hard drives. It should be appreciated that the type of device connected to the bus bridge 80 is system dependent.
As can be seen from FIG. 1, when the processor 20 cannot access information from one of the two caches 22, 24, it is forced to access the information from the system memory 50. As a result, at least two buses 30, 34 and the components of the system controller 60 must be involved to access the information from the system memory 50, thereby increasing the latency of the access. Increased latency reduces the system bandwidth and overall performance. Memory access times are further compounded when other devices, e.g., the AGP device 52 or PCI device 56, are competing with the processor 20 by simultaneously requesting information from the cache and system memories.
Attempts have been made to solve, or at least alleviate, the above-described problems by integrating a third level of cache, known as “L3 cache” 68, in the system controller 60, and preferably as part of the memory controller 67. This L3 cache is also known as an “eDRAM” cache because it is implemented with embedded DRAM.
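The intended benefit of such an L3 cache can be seen with a hedged average-memory-access-time (AMAT) calculation. All of the cycle counts and miss rates below are hypothetical illustration, not figures from the patent:

```python
# Illustrative only: how an L3 cache in the system controller might reduce
# average latency. All latencies (cycles) and miss rates are hypothetical.

L1, L2, L3, MEM = 1, 10, 30, 100    # access latencies at each level (cycles)
m1, m2, m3 = 0.05, 0.40, 0.50       # miss rates at L1, L2, and L3

# Without an L3, every L2 miss pays the full system-memory latency.
without_l3 = L1 + m1 * (L2 + m2 * MEM)

# With an L3, only the fraction of L2 misses that also miss L3 reach memory.
with_l3 = L1 + m1 * (L2 + m2 * (L3 + m3 * MEM))

print(f"AMAT without L3: {without_l3:.2f} cycles")  # 1 + 0.05*(10 + 0.4*100) = 3.50
print(f"AMAT with L3:    {with_l3:.2f} cycles")     # 1 + 0.05*(10 + 0.4*(30 + 0.5*100)) = 3.10
```

Because the L3 sits in the system controller, close to the memory controller, even a modest L3 hit rate intercepts a share of the accesses that would otherwise cross the memory bus at full system-memory latency.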
