Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories
Reexamination Certificate
2001-11-15
2004-06-22
Nguyen, Hiep T. (Department: 2187)
C711S138000, C711S133000, C711S141000
Reexamination Certificate
active
06754772
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to caching processed data, and, more particularly, to a method and apparatus for distributing the functions of cache to different stages in an execution pipeline such that the stages can operate independently while saving processing time and preventing data hazards.
BACKGROUND OF THE INVENTION
The function of a 3D graphics processing pipeline for a computer is to represent a 3-dimensional scene on a 2-dimensional computer display screen. In order to change view-point within the 3-dimensional scene, the graphics processing pipeline must redraw the scene according to the new view-point; this can affect perspective, lighting, reflections, the obscuring or revealing of objects and so forth. This requires a large amount of calculation.
Consider the polygon shown in FIG. 1a. The symbols a, b, c, and d represent the vertices of the polygon. The characteristics of each vertex include such information as its position in 3-dimensional space (x, y, z Cartesian co-ordinates), its color, its transparency, and its texture.
When drawing a scene, the 3D graphics processor works with flat surfaces. The most complex surface that is guaranteed to be flat is a triangle. Therefore, part of the process of displaying the polygon abcd involves tessellating it into triangles. This process is shown in FIG. 1b. The polygon abcd is now represented by the two triangles abc and adc. For more complex shapes, tessellation can lead to the introduction of new vertices, as shown in the tessellation of FIG. 1c.
The task of drawing the polygon is now replaced by the task of drawing the two triangles abc and adc. One way of representing the task to the 3D graphics pipeline is to produce a sequence of triplets representing the vertices to be drawn: abcadc. Each identifier a, b, c, and d must uniquely define a particular vertex within a scene, but a vertex can be shared with several triangles. In addition, the identifier allows the characteristics of the vertex (the position, color, transparency, etc.) to be retrieved from storage elsewhere in the system.
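The tessellation and triplet stream described above can be sketched as a simple triangle fan. This is an illustrative sketch, not the patent's algorithm: the function names are invented, and a fan emits the second triangle as acd, which names the same triangle the text calls adc with the opposite winding.

```python
def tessellate(polygon):
    """Split a convex polygon (a list of vertex identifiers) into a
    triangle fan anchored at the first vertex, e.g. abcd -> abc, acd."""
    first = polygon[0]
    return [(first, polygon[i], polygon[i + 1])
            for i in range(1, len(polygon) - 1)]

triangles = tessellate(["a", "b", "c", "d"])
# Flatten the triangles into the triplet stream passed to the pipeline.
stream = [v for tri in triangles for v in tri]
```

Note that each identifier may appear in several triplets: here a and c are shared by both triangles, which is what makes caching worthwhile later on.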
Part of the functions performed by a 3D-graphics processor are shown in FIG. 2. The sequence of triplets is passed to data producer 20. Producer 20 is responsible for translating a vertex identifier “a” into the vertex characteristics “A”. “a” represents a relatively small data item (for example, an 8-bit value) while “A” represents a number of relatively large data items (for example, 16 32-bit values). The translation process is costly in terms of time and occupied system resources; for example, it might require a number of memory accesses and data conversion operations.
The vertex characteristics “A” are processed by a processor and the processed vertex “A″” is passed to the consumer (the next stage in the pipeline). The processing performed by the processor is costly in terms of time.
The vertices “a” and “c” are each processed twice for polygon abcd—once for triangle abc and once for triangle acd. The result of processing a given vertex is identical, irrespective of the triangle it is being processed for. It wastes resources to translate the vertex identifier and process the vertex multiple times, so performance would be improved by maintaining a cache of transformed vertices.
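The saving from caching transformed vertices can be made concrete with a short sketch. This is a hypothetical illustration (the names and the trivial transform are invented): vertices shared between triangles are transformed once and reused thereafter.

```python
def process_stream(stream, transform):
    """Process a vertex-identifier stream, memoizing transformed vertices
    so each unique vertex incurs the expensive transform only once."""
    cache = {}   # vertex identifier -> transformed vertex ("A''" above)
    work = 0     # number of expensive transforms actually performed
    out = []
    for v in stream:
        if v not in cache:
            cache[v] = transform(v)
            work += 1
        out.append(cache[v])
    return out, work

# The stream abcadc consumes six vertices, but only four are distinct.
out, work = process_stream(list("abcadc"), str.upper)
```

For polygon abcd this cuts the transform count from six to four, and the saving grows with meshes in which vertices are shared by many triangles.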
The rate at which producer 20, processor 21, and consumer 22 handle data is different and may vary on a triangle-by-triangle or vertex-by-vertex basis. With the system shown in FIG. 2, the slowest unit determines the rate at any time, and faster units are stalled. This is an inefficient use of resources.
One way to more efficiently utilize the resources of a 3D-graphics pipeline is by using the cache as shown in FIG. 3. Producer 30 translates a vertex identifier “a” and writes the translated vertex data “A” to data FIFO 31. Data FIFO 31 is a first-in first-out queue. Data is sent from producer 30 to data FIFO 31, where processor 32 can access it when ready. When producer 30 translates vertices faster than processor 32 processes them, multiple data units can be stored in data FIFO 31. Similarly, when processor 32 begins to process data units at a faster rate than producer 30 translates vertices, processor 32, rather than stalling immediately, continues to read translated vertices from data FIFO 31 until it is empty.
Processor 32 maintains a cache tag storage 36 with tags containing the minimum amount of information A′, B′, C′, D′ required to uniquely identify a data unit incoming to processor 32, i.e., data A, B, C, D. The minimum amount of information required to uniquely identify a data unit that is stored in cache tag storage 36 can be the data unit itself, a shortened form of the data unit, or an identifier derived from the data unit. For each element A′, B′, C′, D′ of tag storage, there is a corresponding element of processed data A″, B″, C″, D″ stored in cache data storage 37. Thus, cache data storage 37 contains processed data A″, B″, C″, D″ corresponding to input data A, B, C, D previously processed by processor 32. The tags A′, B′, C′, D′ must be stored as well as the data A″, B″, C″, D″ so that the tags can be compared to incoming data units before they are processed; this allows processor 32 to determine whether there is cached data A″, B″, C″, D″ corresponding to incoming data A, B, C, D.
When processor 32 removes a data unit from data FIFO 31, it checks whether the data unit has a corresponding tag A′, B′, C′, D′ currently in cache tag storage 36. If there is no corresponding tag in cache tag storage 36, that is, processor 32 gets a cache “miss,” processor 32 stores a tag for the incoming data unit from FIFO 31 in cache tag storage 36, either by using a currently unused storage location or by reallocating a location that currently holds an old tag. Processor 32 also processes the incoming data unit and stores the newly processed data unit as cache data in the corresponding location in cache data storage 37. The processed data is also passed through multiplexor 33, under control of processor 32, to a FIFO 34 for processed data and from there to a data consumer 35. If processor 32 finds a cache tag in tag storage 36, that is, it gets a cache “hit,” then processor 32 operates multiplexor 33 and cache data storage 37 so that the cache data corresponding to the cache tag for which there was a “hit” is passed through FIFO 34 to consumer 35. Consumer 35 can then take each processed data unit from data FIFO 34 in the correct order. For a 3D graphics processing pipeline, processor 32 might transform vertex data “A” according to the point of view for which the scene is to be rendered and according to the way in which the scene is lit to produce processed data A″.
There are, however, still some problems with the method discussed above. While a cache “hit” saves the latency associated with processor 32 having to reprocess previously processed data, it does not save the time taken by producer 30 to generate the stream ABCD from the incoming data stream abcd, or the time taken by processor 32 to check for cache hits and misses. In addition, the values A′, B′, C′, D′ are likely to be larger than a, b, c, d, requiring greater storage capacity in cache tag storage 36.
Thus, there exists a desire and need for a system and method for more efficiently performing cache functions.
BRIEF SUMMARY OF THE INVENTION
The present invention mitigates the problems associated with the prior art and provides a unique method and system of distributed cache.
In accordance with an exemplary embodiment of the present invention, a cache tag storage is provided which is maintained by a producer which sends data to the processor. A cache data storage is also provided which holds data output from a processor.
Crook Neal A.
Wootton Alan
Dickstein, Shapiro, Morin & Oshinsky, LLP
Micron Technology, Inc.
Nguyen Hiep T.