Title: Method and apparatus for loading data from memory to a cache
Classification: Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique
Patent type: Reexamination Certificate
Filed: 2000-06-30
Granted: 2004-07-20
Examiner: Elmore, Reba I. (Department: 2187)
Cross-reference classifications: C711S126000, C711S118000, C711S158000
Status: Active
Patent number: 06766427
TECHNICAL FIELD OF THE INVENTION
The invention relates generally to processing information using computers and more specifically to a technique to load data from memory to a cache.
BACKGROUND OF THE INVENTION
Computer system performance is sometimes limited by the rate and latency at which data can be transferred between memory and a processor. In an effort to increase the rate at which data can be provided to the processor and reduce access latency, a cache allowing faster access to a relatively small amount of data is often interposed between the memory and the processor. However, such a configuration can impede system performance under certain conditions.
Conventional caching memory architectures are described in Chapter 5 (pp. 373-484) of David A. Patterson and John L. Hennessy, Computer Architecture: A Quantitative Approach, Second Edition, Morgan Kaufmann Publishers, Inc., San Francisco, Calif., 1990, 1996, which is incorporated herein by reference.
Conventional caching memory architectures have been developed under the assumption of a random pattern of data access. However, some applications, for example multimedia (e.g., graphics, video, and/or audio) processing, involve different data access patterns for which conventional caching architectures are suboptimal. For example, processing of multimedia data typically occurs in well-defined regular patterns that may be known even before the multimedia data is actually processed. The regularity encountered in the processing of multimedia data not only makes conventional caching architectures suboptimal, but even leads to serious degradation in performance in such architectures.
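By way of illustration only (the frame dimensions and the operation below are hypothetical, not taken from the patent), a single pass over a video frame shows the kind of regular, single-use access pattern described above: every element is visited exactly once, in an order known before processing begins.

/* Illustrative only: a regular, single-pass access pattern over
   multimedia data; dimensions and operation are assumptions. */
#include <stddef.h>
#include <stdint.h>

#define WIDTH  640
#define HEIGHT 480

/* Each pixel is visited exactly once, in a pattern known in advance. */
void brighten(uint8_t frame[HEIGHT][WIDTH], uint8_t delta)
{
    for (size_t y = 0; y < HEIGHT; y++)
        for (size_t x = 0; x < WIDTH; x++) {
            unsigned v = frame[y][x] + delta;
            frame[y][x] = (uint8_t)(v > 255 ? 255 : v); /* saturate */
        }
}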
Moreover, conventional caching architectures are generally insensitive to the manner in which data will be used once they are loaded into the cache. For example, some types of data (e.g., multimedia data), which may be referred to as short-term data, are typically used once and not needed thereafter. However, other types of data (e.g., program code or program state variables), which may be referred to as long-term data, will be accessed repeatedly. Since conventional caching architectures are insensitive to these differences, they tend to keep short-term data in cache longer than necessary, causing cache pollution and resulting in eviction from the cache of long-term data when not desired. Since data that are not needed for later use are retained, while data that are needed for later use are evicted, conventional caching architectures operate inefficiently under such circumstances.
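The pollution effect can be sketched with a toy simulation (all sizes and tags below are arbitrary assumptions of the sketch, not the patent's design): a small fully associative cache with LRU replacement holds a long-term working set until a burst of single-use lines streams through and evicts it.

/* Illustrative simulation of cache pollution: a tiny fully associative
   LRU cache. Streaming one-time ("short-term") lines evict the
   "long-term" working set. */
#include <stdio.h>

#define LINES 4

static long cache[LINES];   /* tag stored per line, -1 = empty   */
static long stamp[LINES];   /* last-use time for LRU replacement */
static long now;

static int touch(long tag)  /* returns 1 on hit, 0 on miss */
{
    int lru = 0;
    for (int i = 0; i < LINES; i++) {
        if (cache[i] == tag) { stamp[i] = ++now; return 1; }
        if (stamp[i] < stamp[lru]) lru = i;
    }
    cache[lru] = tag;        /* miss: evict least recently used */
    stamp[lru] = ++now;
    return 0;
}

int main(void)
{
    for (int i = 0; i < LINES; i++) cache[i] = -1;
    touch(100); touch(101);          /* long-term lines (e.g., code) */
    for (long t = 0; t < 8; t++)     /* short-term streaming lines   */
        touch(200 + t);
    /* The streaming lines have now pushed 100 and 101 out: */
    printf("long-term line 100: %s\n", touch(100) ? "hit" : "miss (evicted)");
    return 0;
}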
FIG. 1 is a block diagram illustrating a system architecture of the prior art. The system architecture includes memory 101, cache 102, memory management unit (MMU) 103, and processor 104. Processor 104, which may also be referred to as a central processing unit (CPU), includes registers 105. Memory 101 is coupled to cache 102 via bus 106. Cache 102 is coupled to MMU 103 via bus 107. MMU 103 is coupled to processor 104 via bus 108. Alternatively, processor 104 may be coupled to cache 102 via bus 109.
Processor 104 can execute an instruction to cause a data element stored in memory 101 to be loaded to one of registers 105 via cache 102 and MMU 103. Processor 104 can also execute an instruction to cause data stored in one of registers 105 to be written to memory 101 via MMU 103 and cache 102. When loading a data element from memory 101 or writing data to memory 101, the information passes through and is stored in cache 102.
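A minimal model of this load path, written for illustration only (the line size, cache geometry, and memory array are assumptions of the sketch, not the patent's design), shows how every load fills a cache line before the requested byte reaches a register.

/* Illustrative model of the FIG. 1 load path: a register load always
   passes through (and fills) the cache. Geometry is assumed. */
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32
#define NUM_LINES  8

struct line { uint32_t tag; int valid; uint8_t data[LINE_BYTES]; };
static struct line cache[NUM_LINES];
static uint8_t memory[1 << 16];   /* stand-in for main memory */

/* Load one byte into a "register": on a miss the whole line is
   copied from memory into the cache first, then read from there. */
uint8_t load_byte(uint32_t addr)
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);
    struct line *l = &cache[index];
    if (!l->valid || l->tag != tag) {           /* miss: fill line */
        memcpy(l->data, &memory[addr - addr % LINE_BYTES], LINE_BYTES);
        l->tag = tag;
        l->valid = 1;
    }
    return l->data[addr % LINE_BYTES];          /* hit path */
}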
A cache generally allows faster access to information than regular memory; storing information in a cache can therefore improve system performance by speeding subsequent accesses to that information. However, processing of large arrays of data often requires only a single access to each data element, so use of a cache for such operations can impede system performance.
FIG. 2 is a block diagram illustrating a system of the prior art. The system includes data input array 201, data input array 202, data output array 203, cache 102, and central processing unit (CPU) 104. Data input array 201 includes data element 206. Data input array 202 includes data element 207. Data output array 203 includes data element 208. Data input array 201 exists in memory with software variables 209. Software variables 209 include local variable 210. Cache 102 includes cache location 211. CPU 104 includes registers 105.
In the prior art, data elements from data input arrays 201 and 202 are stored in cache 102 prior to processing by CPU 104. When software executed by CPU 104 processes these data elements, a local copy of the value of the data elements is stored in software variables 209. A copy of a local variable in local variables 209 is stored in cache 102. When a result is computed by CPU 104, the result is written to cache 102 and, subsequently, to data output array 203. A technique, such as direct mapping or least recently used (LRU) set-associativity, is provided to determine where in cache 102 data are to be stored. Unfortunately, this technique does not provide a safeguard to ensure that data from data input array 201, data from data input array 202, and software variables from software variables 209 do not map to the same cache location in cache 102. As large arrays of data, spanning several blocks of memory, are processed, it is almost inevitable that various data elements will map to the same cache location at the same time. This is referred to as cache aliasing. When this happens, it results in thrashing, where data read from a data array into cache 102 is immediately replaced with other data from a different data array without being used. Many additional, unanticipated accesses to the data arrays are then required to process the data, and these accesses degrade system performance because processor 104 must stall while waiting for the data to arrive.
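The aliasing arithmetic can be made concrete (the cache size, line size, and array base addresses below are assumed values for the sketch): in a direct mapped cache, addresses that differ by a multiple of the cache size map to the same line, so interleaved accesses to two such arrays evict each other on every reference.

/* Illustrative arithmetic behind cache aliasing: two arrays whose
   base addresses differ by a multiple of the cache size map
   element-for-element to the same direct-mapped lines. */
#include <stdio.h>
#include <stdint.h>

#define CACHE_BYTES (8 * 1024)
#define LINE_BYTES  32

static uint32_t line_index(uint32_t addr)
{
    return (addr % CACHE_BYTES) / LINE_BYTES;
}

int main(void)
{
    uint32_t in1 = 0x10000;              /* base of data input array 1  */
    uint32_t in2 = in1 + CACHE_BYTES;    /* array 2, one cache-size away */
    for (uint32_t off = 0; off < 3 * LINE_BYTES; off += LINE_BYTES)
        printf("offset %3u -> line %u and line %u (collision)\n",
               (unsigned)off,
               (unsigned)line_index(in1 + off),
               (unsigned)line_index(in2 + off));
    return 0;
}

Alternately reading in1[i] and in2[i] under this layout evicts the shared line on every access, which is the thrashing behavior described above.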
FIG. 3 is a block diagram illustrating a technique of the prior art. As in FIG. 2, a data input array 201 includes a data element 206. The value stored at data element 206 is read into cache location 301 of cache 102. When the value at cache location 301 is processed, it is copied to a local variable used in the course of processing: the value at cache location 301 is copied to cache location 302, which corresponds to local variable 210 of local variables 209. When the value stored at cache location 302 is to be evicted from cache 102, it is evicted to local variable 210 of local variables 209.
Movement of the same data between multiple locations in cache 102, data input array 201, and local variables 209 impedes system performance because the data occupies multiple locations in cache 102, resulting in more cache traffic and thrashing. Thus, this prior art technique has significant disadvantages.
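For illustration only (the function, its types, and the use of volatile are hypothetical choices of this sketch), the double occupancy described for FIG. 3 appears whenever an array element is copied into a memory-resident local variable: the line backing the array and the line backing the stack slot both occupy the cache at once.

/* Illustration of the FIG. 3 double occupancy; 'volatile' forces the
   local variable to be memory-resident, as in the figure. */
#include <stdint.h>

uint8_t process_element(const uint8_t *input, int i)
{
    volatile uint8_t local = input[i]; /* input[i]'s line is cached
                                          (location 301); the stack slot
                                          of 'local' is cached separately
                                          (location 302) */
    return (uint8_t)(local / 2);       /* arbitrary stand-in computation */
}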
Other prior art techniques have been attempted. One example is the use of uncached loads and stores with a single register. This technique is disadvantageous in that it can require loading the same cache line of data several times (e.g., four times to load four registers).
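A rough sketch of that cost (the function is hypothetical, and volatile is used here only as a stand-in for an uncached access): four adjacent words that would fit in a single cache line each trigger a separate memory access.

/* Illustrative cost of uncached single-register loads: four adjacent
   words, all in the same line's worth of memory, each fetched alone. */
#include <stdint.h>

void load_four(volatile const uint32_t *p,
               uint32_t *r0, uint32_t *r1, uint32_t *r2, uint32_t *r3)
{
    *r0 = p[0];   /* memory access 1 */
    *r1 = p[1];   /* memory access 2: same line, fetched again */
    *r2 = p[2];   /* memory access 3 */
    *r3 = p[3];   /* memory access 4 */
}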
Another prior art technique is the use of an associative cache. A two-way set associative cache, in which each memory location maps to two possible cache locations, may be used. However, associative cache is more expensive and more complicated. Moreover, thrashing problems can still occur with a two-way set associative cache when multiple sets of data are being processed through the cache. A fully associative cache may be used, but it is expensive and complicated. LRU and pseudo-LRU techniques used with associative caches are not necessarily valid for multimedia data because such data are generally accessed only once, so there is no need for persistence of the data in the cache.
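A two-way lookup can be sketched as follows (the geometry and the one-bit LRU policy are assumptions of the sketch): each memory line may reside in either of two ways of its set, yet a third conflicting line still forces an eviction, so thrashing remains possible when three or more data streams share a set.

/* Illustrative two-way set-associative lookup with one-bit LRU. */
#include <stdint.h>

#define SETS 64

struct set { uint32_t tag[2]; int valid[2]; int lru; };
static struct set sets[SETS];

int access_line(uint32_t line_addr)   /* 1 = hit, 0 = miss */
{
    struct set *s = &sets[line_addr % SETS];
    uint32_t tag = line_addr / SETS;
    for (int w = 0; w < 2; w++)
        if (s->valid[w] && s->tag[w] == tag) {
            s->lru = !w;              /* other way becomes LRU */
            return 1;
        }
    int victim = s->lru;              /* miss: replace LRU way */
    s->tag[victim] = tag;
    s->valid[victim] = 1;
    s->lru = !victim;
    return 0;
}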
Use of a direct mapped cache is another technique. A direct mapped cache provides only one location in cache for each memory location, so data elements whose addresses map to the same cache location evict one another, again giving rise to the aliasing and thrashing problems described above.
Inventors: Wang, Avery; Webb, Richard W.
Assignee: ATI International SRL
Law firm: Vedder Price Kaufman & Kammholz P.C.