Computer graphics processing and selective visual display system – Computer graphics display memory system – Texture memory
Reexamination Certificate
1998-10-09
2002-11-19
Chauhan, Ulka J. (Department: 2671)
C345S537000, C711S122000, C711S130000
active
06483516
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to a highly integrated multimedia processor having a shared cache and tightly coupled central processing and graphical units and more specifically to employing a portion of the shared cache as a secondary level in a hierarchical texture cache architecture.
2. Description of Related Art
The following background information is provided to aid in the understanding of the application of the present invention and is not meant to be limiting to the specific examples set forth herein. Displaying 3D graphics is typically characterized by a pipelined process having tessellation, geometry and rendering stages. The tessellation stage is responsible for decomposing an object into geometric primitives (e.g. polygons) for simplified processing while the geometry stage is responsible for transforming (e.g. translating, rotating and projecting) the tessellated object. The rendering stage rasterizes the polygons into pixels and applies visual effects such as, but not limited to, texture mapping, MIP mapping, Z buffering, depth cueing, anti-aliasing and fogging.
The entire 3D graphics pipeline can be embodied in software running on a general purpose CPU core (i.e. integer and floating point units), albeit unacceptably slowly. To accelerate performance, the stages of the graphics pipeline are typically shared between the CPU and a dedicated hardware graphics controller (a.k.a. graphics accelerator). The floating-point unit of the CPU typically handles the vector and matrix processing of the tessellation and geometry stages while the graphics controller generally handles the pixel processing of the rendering stage.
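By way of illustration only, the geometry stage described above (translating, rotating and projecting a tessellated object) can be sketched in a few lines of Python. The function names, the choice of rotation about the z axis and the simple pinhole projection with a `focal_length` parameter are assumptions made for this sketch and are not taken from the specification.

```python
import math

def rotate_z(vertex, angle):
    """Rotate a 3D vertex about the z axis by `angle` radians."""
    x, y, z = vertex
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(vertex, offset):
    """Translate a 3D vertex by a 3D offset."""
    return tuple(a + b for a, b in zip(vertex, offset))

def project(vertex, focal_length=1.0):
    """Perspective-project a 3D vertex onto a 2D image plane (assumes z != 0)."""
    x, y, z = vertex
    return (focal_length * x / z, focal_length * y / z)

def geometry_stage(polygon, angle, offset, focal_length=1.0):
    """Apply rotate -> translate -> project to every vertex of one polygon."""
    return [
        project(translate(rotate_z(v, angle), offset), focal_length)
        for v in polygon
    ]

# One triangle produced by a (hypothetical) tessellation stage.
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
screen_points = geometry_stage(triangle, angle=0.0, offset=(0.0, 0.0, 2.0))
```

This is the kind of per-vertex vector and matrix arithmetic that the floating-point unit of the CPU typically handles, leaving the per-pixel work of the rendering stage to the graphics controller.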
Reference is now made to FIG. 1 that depicts a first prior art system of handling 3D graphics display in a computer. Vertex information stored on disk drive 100 is read over a local bus (e.g. the PCI bus) under control by chipset 102 into system memory 104. The vertex information is then read from system memory 104 under control of chipset 102 into the L2 cache 108 and L1 cache 105 of CPU 106. The CPU 106 performs geometry/lighting operations on the vertex information before caching the results along with texture coordinates back into the L1 cache 105, the L2 cache 108 and ultimately back to system memory 104. A direct memory access (DMA) is performed to transfer the geometry/lighting results, texture coordinates and texture maps stored in system memory 104 over the PCI bus into local graphics memory 112 of the graphics controller 110 for use in rendering a frame on the display 114. In addition to storing textures for use with the graphics controller 110, local graphics memory 112 also holds the frame buffer, the z-buffer and commands for the graphics controller 110.
A drawback with this approach is inefficient use of memory resources since redundant copies of texture maps are maintained in both system memory 104 and the local graphics memory 112. Another drawback with this approach is that the local graphics memory 112 is dedicated to the graphics controller 110, is more expensive than generalized system memory and is not available for general-purpose use by the CPU 106. Yet another drawback with this approach is the attendant bus contention and relatively low bandwidth associated with the shared PCI bus. Efforts have been made to ameliorate these limitations by designating a “swap area” in local graphics memory 112 (sometimes misdescriptively referred to as an off chip L2 cache) so that textures can be prefetched into local graphics memory 112 from system memory 104 before they are needed by the graphics controller 110 and swapped with less recently used textures residing in the texture cache of the graphics controller 110. The local graphics memory swap area merely holds textures local to the graphics card (to avoid bus transfers) and does not truly back the texture cache as would a second level in a multi-level texture cache. This approach leads to the problem, among others, of deciding how to divide the local graphics memory 112 into texture storage and swap area. Still yet another drawback with this approach is that the single level texture cache in prior art graphics controllers consumes large amounts of die area since the texture cache must be multi-ported and be of sufficient size to avoid performance issues.
Reference is now made to FIG. 2 that depicts an improved but not entirely satisfactory prior art system of handling 3D graphics display in a computer. The processor 120, such as the Pentium II™ processor from Intel Corporation of Santa Clara, California, comprises a CPU 106 coupled to an integrated L2 cache 108 over a so-called “backside” bus 126 that operates independently from the host or so-called “front-side” bus 128. The system depicted in FIG. 2 additionally differs from that in FIG. 1 in that the graphics controller 110 is coupled over a dedicated and faster AGP bus 130 through chipset 102 to system memory 104. The dedicated and faster AGP bus 130 permits the graphics controller 110 to directly use texture maps in system memory 104 during the rendering stage rather than first pre-fetching the textures to local graphics memory 112.
Although sourcing texture maps directly out of system memory 104 mitigates local graphics memory constraints, some amount of local graphics memory 112 is still required for screen refresh, Z-buffering and front and back buffering since the AGP bus 130 cannot support such bandwidth requirements. Consequently, the system of FIG. 2 suffers from the same drawbacks as the system of FIG. 1, albeit to a lesser degree. Moreover, there is no way for the graphics controller 110 to directly access the L2 cache 108 that is encapsulated within the processor 120 and connected to the CPU 106 over the backside bus 126.
From the foregoing it can be seen that memory components, bus protocols and die size are the ultimate bottlenecks for presenting 3D graphics. Accordingly, there is a need for a highly integrated multimedia processor having tightly coupled central processing and graphical functional units that share a relatively large cache to avoid slow system memory access and the requirement to maintain separate and redundant local graphics memory, and to leverage the relatively large shared cache in a hierarchical texture cache architecture.
SUMMARY OF THE INVENTION
To overcome the limitations of the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a highly integrated multimedia processor employing a shared cache between the central processing and graphics units which may be used in, among other things, a hierarchical texture cache scheme. A dynamically configurable portion of the shared cache is engaged as a secondary level in a hierarchical texture cache architecture. The graphics unit includes a small multi-ported L1 texture cache local to its 2D/3D pipeline that is backed by the relatively large, single ported portion of the shared cache. The graphics unit issues data reads with physical addresses to locations that are cached in the shared cache. If the shared cache misses, a cache fill is performed similar to a cache fill that occurs with a miss from the central processor unit. Regions in the shared cache can also be selectively locked down (thereby disabling eviction or invalidation of data from a selected region) to provide the graphics unit with a local scratchpad area for applications such as composite buffering for blending texture maps in multi-pass rendering. Other 3D applications for the locked down regions may include but are not limited to, bump mapping, Z buffering, W buffering and 2D applications such as blit buffering.
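The two-level arrangement summarized above can be illustrated with a small behavioral model. The following Python sketch is not the disclosed hardware: the class names, the least-recently-used replacement policy, the per-address granularity and the dictionary standing in for system memory are all assumptions chosen to keep the example short. It shows only the behaviors the summary names: a small L1 texture cache backed by a larger shared cache, a fill from memory when the shared cache misses, and locked-down entries that are exempt from eviction.

```python
from collections import OrderedDict

class SharedCache:
    """Toy model of the shared-cache portion used as the secondary texture level."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory          # stand-in for system memory: address -> data
        self.lines = OrderedDict()    # cached entries, least recently used first
        self.locked = set()           # addresses exempt from eviction
        self.fills = 0                # number of fills performed on misses

    def read(self, address):
        if address in self.lines:     # hit: refresh recency and return
            self.lines.move_to_end(address)
            return self.lines[address]
        self.fills += 1               # miss: perform a cache fill from memory
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[address] = self.memory[address]
        return self.lines[address]

    def lock(self, address):
        """Lock down an entry so it can serve as scratchpad storage."""
        self.read(address)
        self.locked.add(address)

    def _evict(self):
        # Assumed policy: evict the least recently used entry that is not locked.
        victim = next((a for a in self.lines if a not in self.locked), None)
        if victim is None:
            raise RuntimeError("every entry is locked down")
        del self.lines[victim]

class L1TextureCache:
    """Toy model of the small texture cache local to the graphics pipeline."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing        # the secondary level (SharedCache)
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        self.misses += 1              # L1 miss: ask the secondary level
        data = self.backing.read(address)
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # drop the least recently used entry
        self.lines[address] = data
        return data

# Hypothetical texel data keyed by physical address.
memory = {addr: f"texel{addr}" for addr in range(8)}
l2 = SharedCache(capacity=4, memory=memory)
l1 = L1TextureCache(capacity=2, backing=l2)
for addr in (0, 1, 2, 0):
    l1.read(addr)
```

In this run the final read of address 0 misses in the small L1 cache but is served by the shared cache without a further fill from memory, which is the effect a secondary cache level is meant to provide.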
A feature of the present invention is that the shared cache can be leveraged as a secondary level texture cache to reduce die size without significant sacrifice in performance.
Another feature of the present invention
Chauhan Ulka J.
National Semiconductor Corporation