Memory access request reordering to reduce memory access latency

Electrical computers and digital processing systems: memory – Storage accessing and control – Hierarchical memories

Reexamination Certificate



U.S. classification: C711S169000, C711S122000

Status: active

Patent number: 06487640

ABSTRACT:

FIELD OF THE INVENTION
The invention is generally related to data processing systems and processors therefor, and in particular to retrieval of data from a multi-level memory architecture.
BACKGROUND OF THE INVENTION
Computer technology continues to advance at a remarkable pace, with numerous improvements being made to the performance of both microprocessors—the “brains” of a computer—and the memory that stores the information processed by a computer.
In general, a microprocessor operates by executing a sequence of instructions that form a computer program. The instructions are typically stored in a memory system having a plurality of storage locations identified by unique memory addresses. The memory addresses collectively define a “memory address space,” representing the addressable range of memory addresses that can be accessed by a microprocessor.
Both the instructions forming a computer program and the data operated upon by those instructions are often stored in a memory system and retrieved as necessary by the microprocessor when executing the computer program. The speed of microprocessors, however, has increased relative to that of memory devices to the extent that retrieving instructions and data from memory can often become a significant performance bottleneck. To reduce this bottleneck, it is desirable to use the fastest memory devices available. However, both memory speed and memory capacity are typically directly related to cost, and as a result, many computer designs must balance memory speed and capacity against cost.
A predominant manner of obtaining such a balance is to use multiple “levels” of memories in a memory architecture to attempt to decrease costs with minimal impact on system performance. Often, a computer relies on a relatively large, slow and inexpensive mass storage system such as a hard disk drive or other external storage device; an intermediate main memory that uses dynamic random access memory devices (DRAMs) or other volatile memory storage devices; and one or more high speed, limited capacity cache memories, or caches, implemented with static random access memory devices (SRAMs) or the like. In some instances, instructions and data are stored in separate instruction and data cache memories to permit instructions and data to be accessed in parallel. One or more memory controllers are then used to swap the information from segments of memory addresses, often known as “cache lines”, between the various memory levels to attempt to maximize the frequency with which requested memory addresses are stored in the fastest cache memory accessible by the microprocessor. Whenever a memory access request attempts to access a memory address that is not cached in a cache memory, a “cache miss” occurs. As a result of a cache miss, the cache line for a memory address typically must be retrieved from a relatively slow, lower level memory, often with a significant performance hit.
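To make the cache-line mechanics concrete, the following sketch models a direct-mapped cache in which an address is split into a byte offset, a line index, and a tag; a lookup that finds no valid line with a matching tag is a cache miss. The line size, line count, and all names here are illustrative assumptions, not details taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 64U   /* assumed cache line size, in bytes       */
#define NUM_LINES  512U  /* assumed number of lines in the cache    */

struct cache_line {
    bool     valid;
    uint64_t tag;
    uint8_t  data[LINE_BYTES];
};

static struct cache_line cache[NUM_LINES];

/* Returns true on a hit; on a miss, the entire line containing addr
 * must be fetched from the next (slower) level of the hierarchy.   */
static bool cache_lookup(uint64_t addr)
{
    uint64_t line_addr = addr / LINE_BYTES;   /* strip the byte offset */
    uint64_t index     = line_addr % NUM_LINES;
    uint64_t tag       = line_addr / NUM_LINES;

    return cache[index].valid && cache[index].tag == tag;
}

int main(void)
{
    if (!cache_lookup(0x1234))
        puts("cache miss: fetch the whole cache line from the next level");
    return 0;
}
```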
Some conventional cache designs are “pipelined”, permitting the cache to begin servicing subsequent missed memory access requests before earlier misses have completed. Pipelining reduces memory access latency, or delay, since the handling of a cache miss for a subsequent memory access request often does not need to wait until the processing of an earlier miss is complete.
One limitation to cache pipelining arises from the fact that oftentimes memory access requests that are issued closely in time are for data stored in a narrow range of memory addresses, and often within the same cache line. As a result, pipelining may still not effectively reduce memory access latency in some instances because multiple memory access requests may be waiting on the same cache line to be returned from the cache, thus preventing requests for other cache lines from being processed by the cache.
For example, many processors support the use of complex memory instructions that in essence request data from a sequential range of memory addresses. Conventional processor designs typically handle a complex memory instruction by sequentially issuing memory access requests for each memory address in the range specified by the instruction. Moreover, conventional processor designs typically have request queues that are finite in length, so that only a limited number of requests may be pending at any given time.
In many instances, complex memory instructions request data spanning more than one cache line. A significant performance hit can occur, however, when a portion of the requested data is stored in a cache line other than the first requested cache line, and that portion is not currently stored in the cache. Specifically, with a fixed-length request queue, a request for a subsequent cache line may not be issued until after the first cache miss has completed. No pipelining of the cache misses occurs, and as a result, performance is degraded.
Put another way, assume a complex memory instruction generates a sequence of memory access requests for 0…N words of data starting, for example, at the beginning of a cache line, and that each cache line is capable of storing n words of data. A conventional processor would issue the 0…N memory access requests as follows:
0, 1, 2, …, n−1, n, n+1, n+2, …, 2n−1, 2n, 2n+1, 2n+2, …, N.
Assuming that memory access request 0 misses the cache, all other memory access requests to the same cache line (requests 1…n−1) will also miss. As a result, the request queue typically fills up with requests that are all waiting on the same cache line to be retrieved, stalling the processor until the cache miss completes. Once the cache line is retrieved, the queued requests can be handled in sequence, until the first request for the next cache line (request n) can be issued. If that request also misses the cache, the next cache line must be retrieved as well; but because the request could not be issued until after the first cache miss had completed, the two cache misses are not pipelined. As a result, memory access latency increases and overall performance suffers.
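In code, the conventional behavior is nothing more than a sequential loop over the requested words. In this sketch, issue_request is a hypothetical stand-in for handing a request to the cache; in a real design it would occupy a slot in the finite request queue described above:

```c
#include <stdio.h>

/* Hypothetical stand-in for handing one word request to the cache. */
static void issue_request(unsigned word)
{
    printf("request word %u\n", word);
}

/* Conventional in-order issue of requests for words 0 .. N-1.  If
 * word 0 misses, words 1 .. n-1 pile up behind the same line fill,
 * and the miss for word n cannot be issued until that fill is done. */
static void issue_in_order(unsigned N)
{
    for (unsigned word = 0; word < N; word++)
        issue_request(word);
}

int main(void)
{
    issue_in_order(12);   /* e.g. N = 12 words */
    return 0;
}
```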
Therefore, a significant need exists in the art for a manner of reducing the memory access latency associated with sequences of closely-spaced memory access requests, particularly with respect to complex memory instructions and the like.
SUMMARY OF THE INVENTION
The invention addresses these and other problems associated with the prior art by providing a data processing system, circuit arrangement, integrated circuit device, program product, and method that reduce the memory access latency of sequences of memory access requests by processing at least one request in a sequence out of order relative to another request based upon whether the other request is likely to be for data maintained in the same organizational block (e.g., a cache line) as data requested by an earlier memory access request in the sequence.
Embodiments of the invention attempt to increase the overall throughput of memory access requests by attempting to process certain memory access requests during times in which other memory access requests might otherwise be stalled. For example, in a multi-level memory architecture where data is passed between memory levels in organizational blocks such as cache lines, it is known that whenever a particular memory access request misses a higher level of memory, any other memory access request that requests data in the same organizational block will likewise miss that level of memory. Consequently, any such other memory access request cannot be processed until the block containing the data requested by the first request is retrieved from a lower level of memory. Therefore, by attempting to process other memory access requests prior to one or more requests directed to data in the same organizational block as a particular request, the likelihood increases that those other memory access requests can be processed during any dead time that would otherwise occur while such a block is being retrieved from a lower level of memory.
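As a sketch of this idea (my own illustration, assuming N words of data with n words per cache line and the same hypothetical issue_request helper as above, not the patent's actual circuitry), the requests can be issued so that each cache line is touched once up front, letting any misses overlap, before the remaining words of each line are requested:

```c
#include <stdio.h>

/* Hypothetical stand-in for handing one word request to the cache. */
static void issue_request(unsigned word)
{
    printf("request word %u\n", word);
}

/* Reordered issue for words 0 .. N-1 with n words per cache line:
 * first probe each line once (words 0, n, 2n, ...) so every potential
 * miss is outstanding as early as possible and the line fills can
 * pipeline, then go back and issue the remaining words of each line. */
static void issue_reordered(unsigned N, unsigned n)
{
    for (unsigned word = 0; word < N; word += n)   /* one probe per line */
        issue_request(word);

    for (unsigned base = 0; base < N; base += n)   /* then fill in the rest */
        for (unsigned word = base + 1; word < base + n && word < N; word++)
            issue_request(word);
}

int main(void)
{
    issue_reordered(12, 4);   /* e.g. N = 12 words, n = 4 words per line */
    return 0;
}
```

For N = 12 and n = 4, the issue order becomes 0, 4, 8, 1, 2, 3, 5, 6, 7, 9, 10, 11: all three potential line misses are launched before the processor begins consuming the remaining words, so their retrievals can proceed in parallel rather than back to back.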
