Dynamic replacement technique in a shared cache

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate

Details

C711S135000, C711S136000, C711S128000

active

06591347

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to a unified or shared cache and more specifically to a dynamically configurable replacement technique that reduces domination of the cache by a particular functional unit or application (e.g. caching instructions or data) by limiting the eviction ability of that unit or application to selected cache regions based on its over- and/or under-utilization of the cache.
2. Description of Related Art
The following background information is provided to aid in the understanding of the application of the present invention and is not meant to be limiting to the specific examples set forth herein. Displaying 3D graphics is typically characterized by a pipelined process having tessellation, geometry and rendering stages. The tessellation stage is responsible for decomposing an object into geometric primitives (e.g. polygons) for simplified processing while the geometry stage is responsible for transforming (e.g. translating, rotating and projecting) the tessellated object. The rendering stage rasterizes the polygons into pixels and applies visual effects such as, but not limited to, texture mapping, MIP mapping, Z buffering, depth cueing, anti-aliasing and fogging.
The entire 3D graphics pipeline can be embodied in software running on a general purpose CPU core (i.e. integer and floating point units), albeit at an unacceptably slow speed. To accelerate performance, the stages of the graphics pipeline are typically shared between the CPU and a dedicated hardware graphics controller (a.k.a. graphics accelerator). The floating-point unit of the CPU typically handles the vector and matrix processing of the tessellation and geometry stages while the graphics controller generally handles the pixel processing of the rendering stage.
Reference is now made to FIG. 1 that depicts a first prior art system of handling 3D graphics display in a computer. Vertex information stored on disk drive 100 is read over a local bus (e.g. the PCI bus) under control by chipset 102 into system memory 104. The vertex information is then read from system memory 104 under control of chipset 102 into the L2 cache 108 and L1 cache 105 of CPU 106. The CPU 106 performs geometry/lighting operations on the vertex information before caching the results along with texture coordinates back into the L1 cache 105, the L2 cache 108 and ultimately back to system memory 104. A direct memory access (DMA) is performed to transfer the geometry/lighting results, texture coordinates and texture maps stored in system memory 104 over the PCI bus into local graphics memory 112 of the graphics controller 110 for use in rendering a frame on the display 114. In addition to storing textures for use with the graphics controller 110, local graphics memory 112 also holds the frame buffer, the z-buffer and commands for the graphics controller 110.
A drawback with this approach is inefficient use of memory resources since redundant copies of texture maps are maintained in both system memory 104 and the local graphics memory 112. Another drawback with this approach is that the local graphics memory 112 is dedicated to the graphics controller 110, is more expensive than generalized system memory and is not available for general-purpose use by the CPU 106. Yet another drawback with this approach is the attendant bus contention and relatively low bandwidth associated with the shared PCI bus. Efforts have been made to ameliorate these limitations by designating a “swap area” in local graphics memory 112 (sometimes misdescriptively referred to as an off chip L2 cache) so that textures can be prefetched into local graphics memory 112 from system memory 104 before they are needed by the graphics controller 110 and swapped with less recently used textures residing in the texture cache of the graphics controller 110. The local graphics memory swap area merely holds textures local to the graphics card (to avoid bus transfers) and does not truly back the texture cache as would a second level in a multi-level texture cache. This approach leads to the problem, among others, of deciding how to divide the local graphics memory 112 into texture storage and swap area. Still yet another drawback with this approach is that the single level texture cache in prior art graphics controllers consumes large amounts of die area since the texture cache must be multi-ported and be of sufficient size to avoid performance issues.
Reference is now made to FIG. 2 that depicts an improved but not entirely satisfactory prior art system of handling 3D graphics display in a computer. The processor 120, such as the Pentium II™ processor from Intel Corporation of Santa Clara, Calif., comprises a CPU 106 coupled to an integrated L2 cache 108 over a so-called “backside” bus 126 that operates independently from the host or so-called “front-side” bus 128. The system depicted in FIG. 2 additionally differs from that in FIG. 1 in that the graphics controller 110 is coupled over a dedicated and faster AGP bus 130 through chipset 102 to system memory 104. The dedicated and faster AGP bus 130 permits the graphics controller 110 to directly use texture maps in system memory 104 during the rendering stage rather than first pre-fetching the textures to local graphics memory 112.
Although sourcing texture maps directly out of system memory 104 mitigates local graphics memory constraints, some amount of local graphics memory 112 is still required for screen refresh, Z-buffering and front and back buffering since the AGP bus 130 cannot support such bandwidth requirements. Consequently, the system of FIG. 2 suffers from the same drawbacks as the system of FIG. 1, albeit to a lesser degree. Moreover, there is no way for the graphics controller 110 to directly access the L2 cache 108 that is encapsulated within the processor 120 and connected to the CPU 106 over the backside bus 126.
From the foregoing it can be seen that memory components, bus protocols and die size are the ultimate bottlenecks for presenting 3D graphics. Accordingly, there is a need for a highly integrated multimedia processor having tightly coupled central processing and graphical functional units that share a relatively large cache to avoid slow system memory access and the requirement to maintain separate and redundant local graphics memory. Moreover, there is a need to avoid the pollution of the shared cache that results from storing so much graphics data in the shared cache that a significant amount of non-graphics data needed by the central processing unit is evicted and the performance of the central processing unit is adversely affected.
SUMMARY OF THE INVENTION
To overcome the limitations of the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a dynamically configurable cache replacement technique in a shared or unified cache. The technique reduces domination of the cache by a particular functional unit or an application, such as unified instruction/data caching, by limiting the eviction ability of that unit or application to selected cache regions based on its over- and/or under-utilization of the cache. A specific application of the present invention includes a highly integrated multimedia processor employing a tightly coupled shared cache between central processing and graphics units wherein the eviction ability of the graphics unit is limited to selected cache regions when the graphics unit over-utilizes the cache. Dynamic configurability can take the form of a programmable register that enables any one of a plurality of replacement modes based on captured statistics such as measurement of cache misses and/or hits by a particular functional unit or application.
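The replacement technique summarized above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a single four-way cache set with LRU ordering, a mode value standing in for the programmable register, and a graphics-miss threshold as the captured statistic. All names (CacheSet, choose_mode, GRAPHICS_WAYS, MISS_THRESHOLD, run) and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical replacement modes, standing in for a programmable register.
UNRESTRICTED = 0  # any unit may evict from any way
RESTRICTED = 1    # the graphics unit may evict only from GRAPHICS_WAYS

NUM_WAYS = 4          # assumed associativity
GRAPHICS_WAYS = (3,)  # assumed region the graphics unit is confined to
MISS_THRESHOLD = 8    # assumed trigger for treating graphics as over-utilizing


@dataclass
class CacheSet:
    """One set of a set-associative cache with LRU ordering."""
    tags: list = field(default_factory=lambda: [None] * NUM_WAYS)
    # Way indices ordered from least to most recently used.
    lru: list = field(default_factory=lambda: list(range(NUM_WAYS)))

    def access(self, tag, unit, mode):
        """Return True on a hit; on a miss, fill a way the unit may evict."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            allowed = range(NUM_WAYS)
            if unit == "gfx" and mode == RESTRICTED:
                allowed = GRAPHICS_WAYS
            # Victim: least recently used way among those the unit may evict.
            way = next(w for w in self.lru if w in allowed)
            self.tags[way] = tag
            hit = False
        # Mark the touched way as most recently used.
        self.lru.remove(way)
        self.lru.append(way)
        return hit


def choose_mode(gfx_misses, threshold=MISS_THRESHOLD):
    """Pick a mode from a captured statistic (here, graphics miss count)."""
    return RESTRICTED if gfx_misses > threshold else UNRESTRICTED


def run(mode):
    """Fill the set with CPU lines, then stream graphics lines through it."""
    cache_set = CacheSet()
    for tag in ("c0", "c1", "c2", "c3"):
        cache_set.access(tag, "cpu", mode)
    for tag in ("g0", "g1", "g2", "g3"):
        cache_set.access(tag, "gfx", mode)
    return cache_set.tags


print(run(UNRESTRICTED))  # graphics fills displace every CPU line
print(run(RESTRICTED))    # graphics fills are confined to way 3
```

In this sketch the unrestricted mode lets four graphics fills evict all four CPU lines, whereas the restricted mode leaves three CPU lines resident because the graphics unit can replace only way 3. A real design would choose the restricted region and the statistic that triggers the mode change according to its own workload measurements.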
A feature of the present invention is providing the graphics unit access to data generated by the central processing unit
