Circuit and method for prefetching data for a texture cache

Electrical computers and digital processing systems: memory – Addressing combined with specific memory configuration or... – Addressing cache memories

Reexamination Certificate


Details

C711S118000, C711S137000, C711S202000, C711S216000, C345S552000, C345S557000, C345S568000, C345S582000

active

06629188

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates in general to graphics systems, and in particular to methods and apparatus for prefetching cache lines in a graphics system.
The sophistication of the market for computer and video graphics and games has exploded over the last few years. The time when simple games such as “Pong” were marketable products is far in the past. Today's gamers and computer users expect realistic three dimensional (3-D) images, whether the images are of a football game, race track, or new home's interior. Accordingly, this appetite has focused designers' efforts on improving the quality of the images produced by graphics systems in computers and video games.
Increasing the realism of video requires a higher screen resolution as well as displaying items as 3-D contoured objects, rather than simple two dimensional (2-D) pictures. These 3-D objects can be separated into 3-D shapes covered by a 2-D or 3-D texture.
A monitor's maximum resolution is set by the number of pixels on its screen. In color monitors, each pixel is made up of a red, green and blue “dot” in close proximity to one another. By varying the intensity of the “dots”, the color and brightness of the pixel can be changed. The more pixels on a screen, the more realistic an image will appear. For example, if a typical tire on a race car is represented on the screen by one pixel, that pixel will be black. A single black speck on a screen would not make for a very impressive tire. But if the tire is represented by many pixels, then details such as shape, hub caps, and lug nuts can be seen, and the image is more convincing. To add more realism, a texture, for example tire tread, can be added. Where the rubber meets the road, an asphalt texture may be used.
These textures are stored in memory, and are retrieved as required by the graphics system. They may be two dimensional or three dimensional. Two dimensional textures are two dimensional images, and the dimensional coordinates are typically labeled either s and t, or u and v. In systems using a conventional bilinear filter, four pieces of texture information, referred to as texels, are used to determine the texel value, which is the texture information for one pixel. 16 bits is a common size for each texel. Alternately, texels may be 4, 8, 32, or any other integral number of bits in size. Three dimensional textures are sets of two dimensional textures, and the coordinates are usually labeled s, t, and r. Trilinear filtering is common in systems supporting three dimensional textures, and uses 8 texels to determine the texture information for one pixel.
But this means that a huge amount of information is needed to supply the textures for a video image. For example, a conventional monitor screen having a 1280 × 1024 pixel resolution with a 75 Hz refresh rate requires about 100M pixels per second. Since four 16 bit texels are used for each pixel, such a system operates at 6,400M bits per second, or 800M bytes per second.
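The arithmetic in this example can be checked with a short Python sketch. The function name and its parameters are illustrative only and do not come from the patent. The exact pixel rate works out to about 98.3M pixels per second, which the text rounds to 100M before computing the bandwidth.

```python
def texel_bandwidth(width, height, refresh_hz, texels_per_pixel, bits_per_texel):
    """Return (pixels/s, bits/s, bytes/s) when every pixel needs its own texels."""
    pixels_per_second = width * height * refresh_hz
    bits_per_second = pixels_per_second * texels_per_pixel * bits_per_texel
    return pixels_per_second, bits_per_second, bits_per_second // 8


# Exact figures for a 1280 x 1024 screen at 75 Hz with bilinear filtering
# (four 16 bit texels per pixel).
exact = texel_bandwidth(1280, 1024, 75, 4, 16)

# Figures using the rounded rate of 100M pixels per second from the text.
rounded_bits = 100_000_000 * 4 * 16
rounded_bytes = rounded_bits // 8
```

With the rounded pixel rate, `rounded_bits` is 6,400M and `rounded_bytes` is 800M, matching the numbers quoted above. The unrounded rate gives slightly lower values.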
This texel information is stored in memory for fast access by the graphics controller. Preferably it would all be stored in memory on the same chip as the other graphics system elements, using fast circuitry, such as static random access memory (SRAM). But SRAMs are large, and have high operating currents, so the die area and power costs are prohibitive.
A conventional solution to the problem of making a fast but cost effective memory is to use an architecture type known as a memory hierarchy. The concept behind memory hierarchy is to use a smaller amount of SRAM, preferably on-chip, and have a larger memory off-chip using less expensive circuitry, such as dynamic random access memory (DRAM). This way, some data needed quickly by the graphics controller is readily available in the on-chip fast SRAM, while the bulk of the data waits in the DRAM. If the controller needs data that is not available in the SRAM, it can pull the data from the DRAM and overwrite existing data in the SRAM. In this system, the SRAM is known as the cache, and the DRAM is the main memory. Memory hierarchy systems using cache may be used for storing texels in graphics systems.
FIG. 1 is a block diagram illustrating one such conventional system. Central processing unit (CPU) 100 can access data directly from cache memory 110. If the required data is not present, a copy is moved from the main memory 120, to the cache memory 110. Extra capacity and storage when the system is powered down is provided by an input output device such as a disk 130. Each element in the memory hierarchy from left to right has a slower access time, but has a lower per bit storage cost. In this way a system may be optimized for both access time and cost.
The CPU 100 uses the data in the cache memory 110 by making requests for data to cache 110 and reading data from the same. If the CPU 100 requests data not present in cache 110, a cache miss is said to have occurred. In this case, the cache will retrieve data from the main memory 120, store it, and provide it to the CPU 100. Similarly, if the main memory 120 does not contain the required data, the main memory 120 will retrieve data from the disk 130. If CPU 100 requests data which is present in cache 110, a cache hit is said to have occurred, and the data does not need to be retrieved from the main memory 120.
Data may be found in the main memory and stored in cache according to its frame address. A frame address may be divided into three portions: the tag, index, and offset. Generally, the tag is the higher order bits of the frame address, the offset is the lower order bits, and the index is between them. The index determines the location of a data block in cache; the location is referred to as a cache line. The offset identifies the location of a texel in a cache line. The tag specifies which data block in memory provided the data in the cache line. The tag is generally stored in a table, such that the tag for the data block stored in each cache line may be read.
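The division of a frame address into tag, index, and offset can be sketched as follows. The field widths (5 offset bits and 8 index bits) and the function names are assumptions chosen for illustration; the text does not specify them.

```python
OFFSET_BITS = 5  # assumed width: 32 addressable positions per cache line
INDEX_BITS = 8   # assumed width: 256 cache lines


def split_address(addr):
    """Split a frame address into (tag, index, offset), high bits to low bits."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def join_address(tag, index, offset):
    """Rebuild the frame address from its three fields."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset
```

For instance, under these assumed widths `split_address(0x12345)` yields tag 9, index 26, and offset 5, and `join_address(9, 26, 5)` returns the original address.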
A required texel's address is used in finding that texel in cache. The index is used to identify which cache line may be holding the required texel. The tags of these cache lines are compared against the tag of the required texel. If there is a match, the required texel can be found in the matching cache line at the offset. If there is no match, the data block with the matching tag is retrieved from memory and placed in cache.
There are two methods by which data blocks in the DRAM are written into cache. These are referred to as direct and associative. In the direct mapped method, the index determines the location in cache where a data block may be placed. Each data block in the main memory has one cache line where it may be placed. That is, each cache line is uniquely identified by the index portion of the frame address. The tag identifies the frame address of the data block stored in a cache line. The direct method has the benefit of simplicity because once a block's main memory address is known, the location where it may be placed in cache is also known.
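The lookup procedure and the direct mapped placement rule described above might be modeled as in the sketch below. The class name, the tiny default sizes, and the hit and miss counters are hypothetical and exist only to make the behavior visible. Division and modulo are used in place of bit fields; the two are equivalent when the sizes are powers of two.

```python
class DirectMappedCache:
    """Toy direct mapped cache in front of a flat list standing in for main memory."""

    def __init__(self, memory, num_lines=4, block_size=4):
        self.memory = memory
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines   # tag table, one entry per cache line
        self.lines = [None] * num_lines  # data block held by each cache line
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        offset = addr % self.block_size
        block = addr // self.block_size
        index = block % self.num_lines   # the only line this block may occupy
        tag = block // self.num_lines
        if self.tags[index] == tag:      # a single tag comparison
            self.hits += 1
        else:
            # Miss: fetch the block from memory and overwrite the line.
            self.misses += 1
            start = block * self.block_size
            self.lines[index] = self.memory[start:start + self.block_size]
            self.tags[index] = tag
        return self.lines[index][offset]
```

In this model, two addresses whose blocks share an index evict each other even if every other cache line is empty, which is the cost of the direct method's simplicity.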
The associative method comes in two varieties. In the fully associative method, a data block from memory can be placed in any cache line. In a fully associative cache there is no index signal. This has the advantage of being very flexible, but requires complex circuitry to locate each data block. For example, when attempting to access a texel in cache, the tag for that texel is compared against the tags for every cache line in the cache. In the direct method, since a texel can be placed in only one cache line, only one tag is compared.
A compromise between the direct and fully associative methods is n-way associativity. For example, in 2-way associativity, a data block may be written into one of two locations in cache. In n-way associativity, there is the advantage that a block in the main memory may be written into more than one location in cache. Furthermore, not all cache line tags need to be compared when looking for a texel; rather, only n tags are checked.
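A minimal sketch of n-way associativity follows. The text does not say which line is overwritten when all n candidate locations are occupied, so this example assumes a least recently used (LRU) policy. The class name and default sizes are likewise hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy n-way set associative cache with an assumed LRU replacement policy."""

    def __init__(self, memory, num_sets=2, ways=2, block_size=4):
        self.memory = memory
        self.num_sets = num_sets
        self.ways = ways
        self.block_size = block_size
        # One ordered map per set: tag -> data block, least recently used first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        offset = addr % self.block_size
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:                   # stands in for comparing up to n tags
            self.hits += 1
            lines.move_to_end(tag)         # mark as most recently used
        else:
            self.misses += 1
            if len(lines) >= self.ways:
                lines.popitem(last=False)  # evict the LRU block (assumed policy)
            start = block * self.block_size
            lines[tag] = self.memory[start:start + self.block_size]
        return lines[tag][offset]
```

Setting `ways=1` reduces this model to the direct method, and setting `num_sets=1` makes it fully associative, since every block then competes for the same group of lines and no index is needed.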
An inh
