Multimedia processor employing a shared CPU-graphics cache

Computer graphics processing and selective visual display system – Computer graphics display memory system – Cache

Reexamination Certificate

Details

C345S503000, C345S531000, C345S537000, C711S122000, C711S128000, C711S118000

active

06801207

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to a highly integrated multimedia processor having tightly coupled functional units and more specifically to sharing a cache between the central processing and graphics units.
2. Description of Related Art
The following background information is provided to aid in the understanding of the application of the present invention and is not meant to be limiting to the specific examples set forth herein. Displaying 3D graphics is typically characterized by a pipelined process having tessellation, geometry and rendering stages. The tessellation stage is responsible for decomposing an object into geometric primitives (e.g. polygons) for simplified processing while the geometry stage is responsible for transforming (e.g. translating, rotating and projecting) the tessellated object. The rendering stage rasterizes the polygons into pixels and applies visual effects such as, but not limited to, texture mapping, MIP mapping, Z buffering, depth cueing, anti-aliasing and fogging.
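As an illustration of the geometry stage described above, the following minimal sketch rotates, translates and perspective-projects the vertices of a single tessellated primitive. The function names, the choice of a rotation about the y axis and the pinhole projection with a `focal_length` parameter are assumptions made for the example and are not taken from the specification.

```python
import math

def translate(v, t):
    """Shift vertex v = (x, y, z) by offset t."""
    return (v[0] + t[0], v[1] + t[1], v[2] + t[2])

def rotate_y(v, angle):
    """Rotate vertex v about the y axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x + s * z, y, -s * x + c * z)

def project(v, focal_length=1.0):
    """Perspective-project v onto the plane z = focal_length (camera looks down +z)."""
    x, y, z = v
    if z <= 0:
        raise ValueError("vertex is at or behind the camera")
    return (focal_length * x / z, focal_length * y / z)

def geometry_stage(primitive, angle, offset):
    """Apply rotate, then translate, then project to each vertex of one primitive."""
    return [project(translate(rotate_y(v, angle), offset)) for v in primitive]

# One triangle produced by a (hypothetical) tessellation stage.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# No rotation; push the triangle two units in front of the camera.
screen = geometry_stage(triangle, angle=0.0, offset=(0.0, 0.0, 2.0))
```

The resulting 2D coordinates in `screen` are what a rendering stage would then rasterize into pixels.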
The entire 3D graphics pipeline can be embodied in software running on a general purpose CPU core (i.e. integer and floating point units), albeit unacceptably slowly. To accelerate performance, the stages of the graphics pipeline are typically shared between the CPU and a dedicated hardware graphics controller (a.k.a. graphics accelerator). The floating-point unit of the CPU typically handles the vector and matrix processing of the tessellation and geometry stages while the graphics controller generally handles the pixel processing of the rendering stage.
Reference is now made to FIG. 1 that depicts a first prior art system of handling 3D graphics display in a computer. Vertex information stored on disk drive 100 is read over a local bus (e.g. the PCI bus) under control by chipset 102 into system memory 104. The vertex information is then read from system memory 104 under control of chipset 102 into the L2 cache 108 and L1 cache 105 of CPU 106. The CPU 106 performs geometry/lighting operations on the vertex information before caching the results along with texture coordinates back into the L1 cache 105, the L2 cache 108 and ultimately back to system memory 104. A direct memory access (DMA) is performed to transfer the geometry/lighting results, texture coordinates and texture maps stored in system memory 104 over the PCI bus into local graphics memory 112 of the graphics controller 110 for use in rendering a frame on the display 114. In addition to storing textures for use with the graphics controller 110, local graphics memory 112 also holds the frame buffer, the z-buffer and commands for the graphics controller 110.
A drawback with this approach is inefficient use of memory resources since redundant copies of texture maps are maintained in both system memory 104 and the local graphics memory 112. Another drawback with this approach is that the local graphics memory 112 is dedicated to the graphics controller 110, is more expensive than generalized system memory and is not available for general-purpose use by the CPU 106. Yet another drawback with this approach is the attendant bus contention and relatively low bandwidth associated with the shared PCI bus. Efforts have been made to ameliorate these limitations by designating a “swap area” in local graphics memory 112 (sometimes misdescriptively referred to as an off chip L2 cache) so that textures can be prefetched into local graphics memory 112 from system memory 104 before they are needed by the graphics controller 110 and swapped with less recently used textures residing in the texture cache of the graphics controller 110. The local graphics memory swap area merely holds textures local to the graphics card (to avoid bus transfers) and does not truly back the texture cache as would a second level in a multi-level texture cache. This approach leads to the problem, among others, of deciding how to divide the local graphics memory 112 into texture storage and swap area. Still yet another drawback with this approach is that the single level texture cache in prior art graphics controllers consumes large amounts of die area since the texture cache must be multi-ported and be of sufficient size to avoid performance issues.
Reference is now made to FIG. 2 that depicts an improved but not entirely satisfactory prior art system of handling 3D graphics display in a computer. The processor 120, such as the Pentium II™ processor from Intel corporation of Santa Clara Calif., comprises a CPU 106 coupled to an integrated L2 cache 108 over a so-called “backside” bus 126 that operates independently from the host or so-called “front-side” bus 128. The system depicted in FIG. 2 additionally differs from that in FIG. 1 in that the graphics controller 110 is coupled over a dedicated and faster AGP bus 130 through chipset 102 to system memory 104. The dedicated and faster AGP bus 130 permits the graphics controller 110 to directly use texture maps in system memory 104 during the rendering stage rather than first pre-fetching the textures to local graphics memory 112.
Although sourcing texture maps directly out of system memory 104 mitigates local graphics memory constraints, some amount of local graphics memory 112 is still required for screen refresh, Z-buffering and front and back buffering since the AGP bus 130 cannot support such bandwidth requirements. Consequently, the system of FIG. 2 suffers from the same drawbacks as the system of FIG. 1, albeit to a lesser degree. Moreover, there is no way for the graphics controller 110 to directly access the L2 cache 108 that is encapsulated within the processor 120 and connected to the CPU 106 over the backside bus 126.
From the foregoing it can be seen that memory components, bus protocols and die size are the ultimate bottlenecks for presenting 3D graphics. Accordingly, there is a need for a highly integrated multimedia processor having tightly coupled central processing and graphical functional units that share a relatively large cache to avoid slow system memory access and the requirement to maintain separate and redundant local graphics memory.
SUMMARY OF THE INVENTION
To overcome the limitations of the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a highly integrated multimedia processor employing a shared cache between the central processing and graphics units which may be used in, among other things, a hierarchical texture cache scheme. The graphics unit issues data reads with physical addresses to locations that are cached in the shared cache. If the shared cache misses, a cache fill is performed similar to a cache fill that occurs with a miss from the central processor unit. Regions in the shared cache can also be selectively locked down (thereby disabling eviction or invalidation of data from a selected region) to provide the graphics unit with a local scratchpad area for applications such as line buffering to hold decompressed video for further combination (e.g. filtering) with frame buffer data and composite buffering for blending texture maps in multi-pass rendering. Other 3D applications for the locked down regions may include, but are not limited to, bump mapping, Z buffering, W buffering and 2D applications such as blit buffering.
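The read, fill and lock-down behavior summarized above can be sketched with a toy software model. This is only an illustration under stated assumptions: the class name `SharedCache`, the fully associative organization, the least-recently-used replacement policy and the 32-byte line size are hypothetical choices for the example and are not taken from the specification.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per cache line (illustrative value, an assumption)

class SharedCache:
    """Toy fully associative cache shared by a CPU and a graphics unit."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory        # dict: line address -> data, stands in for system memory
        self.lines = OrderedDict()  # line address -> data, least recently used first
        self.locked = set()         # line addresses exempt from eviction
        self.fills = 0              # number of fills from system memory
        self.misses = {}            # requester label -> miss count

    def _line_addr(self, phys_addr):
        return phys_addr - (phys_addr % LINE_SIZE)

    def _fill(self, addr):
        """Bring one line in, evicting the oldest unlocked line if the cache is full."""
        if len(self.lines) >= self.num_lines:
            victim = next((a for a in self.lines if a not in self.locked), None)
            if victim is None:
                raise RuntimeError("every line is locked; nothing can be evicted")
            del self.lines[victim]
        self.lines[addr] = self.memory.get(addr, 0)
        self.fills += 1

    def read(self, phys_addr, requester):
        """Read by physical address; CPU and graphics misses share the same fill path."""
        addr = self._line_addr(phys_addr)
        if addr not in self.lines:
            self.misses[requester] = self.misses.get(requester, 0) + 1
            self._fill(addr)
        self.lines.move_to_end(addr)  # mark as most recently used
        return self.lines[addr]

    def lock(self, phys_addr):
        """Lock down the line holding phys_addr so it cannot be evicted."""
        addr = self._line_addr(phys_addr)
        if addr not in self.lines:
            self._fill(addr)
        self.locked.add(addr)

    def unlock(self, phys_addr):
        self.locked.discard(self._line_addr(phys_addr))

memory = {0: "scratchpad line", 32: "texel block B", 64: "texel block C"}
cache = SharedCache(num_lines=2, memory=memory)
cache.lock(0)               # reserve a scratchpad line for the graphics unit
cache.read(32, "cpu")       # miss, filled from system memory
cache.read(64, "graphics")  # miss; evicts line 32 because line 0 is locked
```

In this model the locked line survives even though it is the least recently used entry, which is the property that makes a locked region usable as a scratchpad.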
A feature of the present invention is that the shared cache provides the graphics unit access to data generated by the central processing unit before the data is written back or written through to system memory.
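The feature above can be illustrated with a small write-back model in which the graphics unit observes CPU-generated data while system memory still holds the stale copy. The class and method names (`WriteBackCache`, `cpu_write`, `graphics_read`, `write_back`) are hypothetical and serve only as a sketch of the idea.

```python
class WriteBackCache:
    """Toy write-back cache shared by a CPU (writer) and a graphics unit (reader)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value, stands in for system memory
        self.lines = {}       # address -> cached value
        self.dirty = set()    # addresses whose cached value is newer than memory

    def cpu_write(self, addr, value):
        """The CPU stores a result in the cache; system memory is not updated yet."""
        self.lines[addr] = value
        self.dirty.add(addr)

    def graphics_read(self, addr):
        """The graphics unit reads through the same cache, so it sees dirty data."""
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]  # ordinary fill on a miss
        return self.lines[addr]

    def write_back(self):
        """Flush every dirty line to system memory."""
        for addr in sorted(self.dirty):
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()

memory = {0x100: "untransformed vertex"}
cache = WriteBackCache(memory)
cache.cpu_write(0x100, "transformed vertex")
seen_by_graphics = cache.graphics_read(0x100)
memory_before_flush = memory[0x100]
cache.write_back()
```

Here `seen_by_graphics` holds the CPU's new result while `memory_before_flush` still holds the old value, so the graphics unit did not need to wait for a system memory round trip.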
Another feature of the present invention is reduction of the system memory bandwidth required by the central processing and graphics units.
Another feature of the present invention is pushing data transfer bottlenecks needed for 3D graphics display into system memory such that system performance will scale as more advanced memories become available.
These and various other objects,
