Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique
Reexamination Certificate
2000-03-10
2001-10-02
Yoo, Do Hyun (Department: 2185)
C711S137000, C711S151000
active
06298424
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to memory latency issues within computer systems.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. On the other hand, superpipelined microprocessor designs divide instruction execution into a large number of subtasks which can be performed quickly, and assign pipeline stages to each subtask. By overlapping the execution of many instructions within the pipeline, superpipelined microprocessors attempt to achieve high performance.
Superscalar microprocessors demand low memory latency due to the number of instructions attempting concurrent execution and due to the increasing clock frequency (i.e. shortening clock cycle) employed by the superscalar microprocessors. Many of the instructions include memory operations to fetch (read) and update (write) memory operands. The memory operands must be fetched from or conveyed to memory, and each instruction must originally be fetched from memory as well. Similarly, superpipelined microprocessors demand low memory latency because of the high clock frequency employed by these microprocessors and the attempt to begin execution of a new instruction each clock cycle. It is noted that a given microprocessor design may employ both superscalar and superpipelined techniques in an attempt to achieve the highest possible performance characteristics.
Microprocessors are often configured into computer systems which have a relatively large, relatively slow main memory. Typically, multiple dynamic random access memory (DRAM) modules comprise the main memory system. The large main memory provides storage for a large number of instructions and/or a large amount of data for use by the microprocessor, providing faster access to the instructions and/or data than may be achieved from disk storage, for example. However, the access times of modern DRAMs are significantly longer than the clock cycle length of modern microprocessors. The memory access time for each set of bytes being transferred to the microprocessor is therefore long. Accordingly, the main memory system is not a low latency system. Microprocessor performance may suffer due to high memory latency.
In order to allow low latency memory access (thereby increasing the instruction execution efficiency and ultimately microprocessor performance), computer systems typically employ one or more caches to store the most recently accessed data and instructions. Additionally, the microprocessor may employ caches internally. A relatively small number of clock cycles may be required to access data stored in a cache, as opposed to a relatively larger number of clock cycles required to access the main memory.
Low memory latency may be achieved in a computer system if the cache hit rates of the caches employed therein are high. An access is a hit in a cache if the requested data is present within the cache when the access is attempted. On the other hand, an access is a miss in a cache if the requested data is absent from the cache when the access is attempted. Cache hits are provided to the microprocessor in a small number of clock cycles, allowing subsequent accesses to occur more quickly as well and thereby decreasing the effective memory latency. Cache misses require the access to receive data from the main memory, thereby increasing the effective memory latency.
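The relationship between hit rate and effective memory latency described above can be illustrated with a minimal sketch. The function name, the single cache level, and the cycle counts are assumptions chosen for illustration; none of them come from the patent text.

```python
def effective_latency(hit_rate, hit_cycles, miss_cycles):
    """Average cycles per access for one cache level (illustrative model).

    hit_cycles  -- assumed cycles to satisfy an access that hits in the cache
    miss_cycles -- assumed total cycles for an access that must go to main memory
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Weighted average of the hit and miss cases.
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


if __name__ == "__main__":
    # Hypothetical figures: 2-cycle hits, 100-cycle misses.
    for rate in (0.80, 0.90, 0.99):
        print(rate, effective_latency(rate, 2, 100))
```

Under these assumed numbers, a small increase in hit rate removes a large share of the average latency, which is why the techniques discussed next aim at raising the hit rate.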
In order to increase cache hit rates, computer systems may employ prefetching to “guess” which data will be requested by the microprocessor in the future. The term prefetch, as used herein, refers to transferring data (e.g. a cache line) into a cache prior to a request for the data being received by the cache in direct response to executing an instruction (either speculatively or non-speculatively). A request is in direct response to executing the instruction if the definition of the instruction according to the instruction set architecture employed by the microprocessor includes the request for the data. A “cache line” is a contiguous block of data which is the smallest unit for which a cache allocates and deallocates storage. If the prefetched data is later accessed by the microprocessor, then the cache hit rate may be increased due to transferring the prefetched data into the cache before the data is requested.
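As a rough illustration of how prefetching can raise the hit rate, the sketch below counts hits for a sequence of cache line numbers with and without a simple next-line guess. The next-line policy, the unbounded cache, and all names are assumptions made for this example; the text does not specify a particular prefetch algorithm.

```python
def count_hits(line_numbers, prefetch_next_line):
    """Count cache hits over a sequence of cache line numbers.

    The cache is modeled as an unbounded set (no evictions), and the
    prefetcher, when enabled, guesses that the next sequential line
    will be requested. Both are simplifying assumptions.
    """
    cache = set()
    hits = 0
    for line in line_numbers:
        if line in cache:
            hits += 1
        else:
            cache.add(line)        # demand fetch on a miss
        if prefetch_next_line:
            cache.add(line + 1)    # prefetch ahead of any request for this line
    return hits


if __name__ == "__main__":
    sequential = list(range(8))
    print(count_hits(sequential, prefetch_next_line=False))
    print(count_hits(sequential, prefetch_next_line=True))
```

For a purely sequential access pattern the guess is almost always right; for an irregular pattern the same prefetches would bring in lines that are never used.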
Unfortunately, prefetching can consume memory bandwidth at an inopportune time with respect to the occurrence of non-speculative memory operations. For example, a prefetch memory operation may be initiated just slightly prior to the initiation of a non-prefetch memory operation. As the prefetch memory operation is occupying the memory system already, the latency of the non-prefetch memory operation is increased by the amount of time the memory system is occupied with the prefetch request. Particularly if the prefetch is incorrect (i.e. the prefetched data is not used later by the requester), the increased latency may decrease performance of the microprocessor (and the overall computer system).
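The timing problem described above can be sketched with a small model in which the memory system services one operation at a time and cannot be interrupted. The function name and the cycle counts are hypothetical and serve only to illustrate the effect.

```python
def fetch_latency_without_priority(transfer_cycles, prefetch_start, fetch_arrival):
    """Latency seen by a fetch when an earlier prefetch cannot be interrupted.

    transfer_cycles -- assumed cycles the memory system is occupied per operation
    prefetch_start  -- cycle at which the prefetch begins its transfer
    fetch_arrival   -- cycle at which the non-prefetch operation arrives
    """
    prefetch_done = prefetch_start + transfer_cycles
    # The fetch must wait until the memory system is free.
    fetch_start = max(fetch_arrival, prefetch_done)
    return fetch_start + transfer_cycles - fetch_arrival


if __name__ == "__main__":
    # Fetch arrives one cycle after the prefetch started.
    print(fetch_latency_without_priority(8, prefetch_start=0, fetch_arrival=1))
    # Fetch arrives long after the prefetch finished.
    print(fetch_latency_without_priority(8, prefetch_start=0, fetch_arrival=20))
```

In the first case the fetch waits for nearly the whole prefetch transfer before its own transfer begins; in the second it sees only its own transfer time.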
SUMMARY OF THE INVENTION
The problems outlined above are in large part solved by a computer system in accordance with the present invention. The computer system includes one or more microprocessors. The microprocessors assign a priority level to each memory operation as the memory operations are initiated. In one embodiment, the priority levels employed by the microprocessors include a fetch priority level and a prefetch priority level. The fetch priority level is higher priority than the prefetch priority level, and is assigned to memory operations which are the direct result of executing an instruction. The prefetch priority level is assigned to memory operations which are generated according to a prefetch algorithm implemented by the microprocessor. As memory operations are routed through the computer system to main memory and corresponding data transmitted, the elements involved in performing the memory operations are configured to interrupt the transfer of data for the lower priority memory operation in order to perform the data transfer for the higher priority memory operation.
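A minimal sketch of the priority scheme summarized above follows. It assumes a memory controller that transfers one unit of data ("beat") per cycle and can suspend a lower priority transfer at any cycle boundary. The class names, the beat-level model, and the resume-where-it-left-off behavior are assumptions for illustration, not details taken from the patent.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    PREFETCH = 0   # generated by a prefetch algorithm
    FETCH = 1      # direct result of executing an instruction


@dataclass
class MemOp:
    name: str
    priority: Priority
    arrival: int   # cycle at which the operation reaches the controller
    beats: int     # data-transfer cycles the operation needs


def simulate(ops):
    """Return {name: completion_cycle} for a one-beat-per-cycle controller.

    Each cycle, the pending operation with the highest priority transfers
    one beat, so a newly arrived fetch interrupts an in-progress prefetch.
    """
    remaining = {op.name: op.beats for op in ops}
    done = {}
    cycle = 0
    while len(done) < len(ops):
        ready = [op for op in ops if op.arrival <= cycle and op.name not in done]
        if ready:
            # Highest priority wins; ties go to the earlier arrival.
            current = max(ready, key=lambda op: (op.priority, -op.arrival))
            remaining[current.name] -= 1
            if remaining[current.name] <= 0:
                done[current.name] = cycle + 1
        cycle += 1
    return done


if __name__ == "__main__":
    ops = [
        MemOp("prefetch", Priority.PREFETCH, arrival=0, beats=8),
        MemOp("fetch", Priority.FETCH, arrival=1, beats=8),
    ]
    print(simulate(ops))
```

In this model the fetch that arrives at cycle 1 completes at cycle 9, so it sees only its own 8-cycle transfer time, while the interrupted prefetch resumes afterward and completes at cycle 16. A real memory system would add overheads for suspending and resuming a transfer that this sketch ignores.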
Advantageously, even though memory bandwidth is consumed by the prefetch memory operations, the latency experienced by the fetch memory operations may not be significantly impacted due to the interrupting of the prefetch memory operations to perform the fetch memory operations. Performance of the computer system may be increased due to the lack of impact on the latency of the fetch memory operations by the prefetch memory operations. Furthermore, more aggressive prefetch algorithms (e.g. algorithms which generate more prefetch memory operations) may be employed because concerns that interference by the prefetch memory operations will increase the memory latency of non-prefetch memory operations are substantially allayed. The more aggressive prefetch algorithms may lead to increased prefetch effectiveness, further decreasing overall effective memory latency. Performance of the microprocessors employing the more aggressive prefetch algorithms may thereby be increased, and overall performance of the computer system may accordingly be improved.
While one embodiment of the computer system employs at least a fetch priority and a prefetch priority, the concept of applying priority levels to various memory operations and interrupting data transfers of lower priority memory operations in favor of higher priority memory operations may be extended to other types of memory operations, even if prefetching is not employed within the computer system. For example, speculative memory operations may be prioritized lower than non-speculative memory operations throughout the computer system. Performance of the computer system may thereby be increased.
Broadly speaking, the present invention contemplates a method for transferring data in a computer system. A first memory operation having
Lewchuk W. Kurt
McMinn Brian D.
Pickett James K.
Advanced Micro Devices , Inc.
Conley Rose & Tayon PC
Encarnacion Yamir
Merkel Lawrence J.
Yoo Do Hyun
Profile ID: LFUS-PAI-O-2568807