Reexamination Certificate
1997-12-31
2001-08-07
Banankhah, Majid (Department: 2151)
Electrical computers and digital processing systems: multicomput
Computer-to-computer data routing
Least weight routing
C712S023000, C712S205000, C712S229000, C709S241000
Reexamination Certificate
active
06272520
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to processors, and in particular, to methods for implementing multi-threading in processors.
2. Background Art
Modern high-performance processors are designed to execute a large number of instructions per clock, and to this end they typically provide extensive execution resources. Additional execution resources are often provided on the processor to boost the absolute level of performance, even though those resources are not fully utilized across all target applications of interest. Processor execution is often marred by stalls for instruction fetches, data cache misses, unresolved data dependencies, and branch latencies. On application workloads that stress the memory subsystem, the latency of delivering instructions and data from the outer levels of the memory hierarchy can be extremely high (100-200 clock cycles). This leads to long pipeline stalls, which leave execution resources on the chip under-utilized. For example, on contemporary processors, over 30% of the execution time of OLTP-TPC-C (an on-line transaction processing benchmark) may be spent waiting for main memory to return instructions or data to the processor. This under-utilization of resources represents a loss in performance.
One proposed solution to exploit under-utilized resources enhances the processor to execute instructions from multiple process threads simultaneously. This solution is commonly referred to as multi-processors (MP)-on-a-chip or simultaneous multi-threading (SMT). In MP-on-a-chip, a single physical processor chip (“chip”) appears as if it contains two or more logical processors, each executing its own process. In the following discussion, a distinct process executing on a distinct logical processor is referred to as a thread. The chip hardware resources are assigned to a new thread when a currently executing thread stalls waiting for dependent operations. Simultaneous multi-threading processors can even schedule resource utilization at the single instruction slot level.
Another approach to increasing resource utilization implements a coarse grained form of multi-threading. Coarse grained multi-threading switches utilization of chip resources from the currently executing thread to a new thread when the currently executing thread initiates a long latency operation. This reduces the likelihood of long pipeline stalls by allowing the second thread to execute while the long latency operation of the first thread completes.
Switching processor resources from one thread to another incurs a performance penalty: the current thread's instructions must be flushed or drained from the pipeline, the thread's architectural state must be preserved, the new logical processor must be activated, and instructions from the new thread must be provided to the processor's resources. These steps can take tens of clock cycles (typically 20-40) to complete. Coarse-grained multi-threading thus enhances performance only when threads are switched on operations that would otherwise stall the processor longer than the time required to switch threads.
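The break-even reasoning above can be sketched as a small model. The cycle counts and function name below are illustrative assumptions, not figures from the patent:

```python
# Hypothetical break-even model for coarse-grained thread switching.
# A switch pays off only if the stall it would hide exceeds the cost
# of performing the switch itself (assumed ~30 cycles, mid-range of
# the 20-40 cycle penalty described above).
SWITCH_COST_CYCLES = 30

def switch_is_profitable(expected_stall_cycles: int,
                         switch_cost_cycles: int = SWITCH_COST_CYCLES) -> bool:
    """Return True if switching threads would save cycles overall."""
    return expected_stall_cycles > switch_cost_cycles

# A 150-cycle main-memory miss justifies a switch; a short L2 stall does not.
```

Under this model, only operations with stall times well above the switch penalty (such as main-memory accesses) are worthwhile triggers, which is why trigger selection matters.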
Various events have been proposed for triggering thread switches. For example, long latency load operations, such as loads that miss in various stages of a processor's caches, may be used to trigger thread switches. However, not all such loads actually stall the pipeline, and even those operations that do stall the pipeline may not stall it long enough to justify the delay incurred by the thread switch operation. If the thread switch condition is not selected carefully, unnecessary thread switches can reduce or eliminate any performance advantage provided by multi-threading.
Thus, there is a need for methods that trigger thread switches to avoid long pipeline stalls without generating unnecessary thread switches, thereby maximizing the benefits of coarse-grained multi-threading.
SUMMARY OF THE INVENTION
The present invention is a method for detecting thread switch conditions that support the efficient implementation of coarse-grained multi-threading.
In accordance with the present invention, a load generated to return data to a register is tracked, and a bit associated with the register is set if the load misses in a selected processor cache. Register read instructions are monitored, and a thread switch condition is indicated when a register read instruction to the register is detected while the associated bit is set.
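The mechanism described above can be sketched in software as follows. This is a minimal illustration assuming a simple register-file model; the class, method, and parameter names are hypothetical and not taken from the patent:

```python
class ThreadSwitchDetector:
    """Sketch of the described mechanism: a bit associated with a register
    is set when a load targeting that register misses in the selected
    cache; a later read of that register while the bit is set indicates
    a thread switch condition."""

    def __init__(self, num_regs: int = 32):
        # One miss bit per architectural register.
        self.miss_bit = [False] * num_regs

    def on_load(self, dest_reg: int, cache_hit: bool) -> None:
        # Track the load; set the bit only if it missed the selected cache.
        self.miss_bit[dest_reg] = not cache_hit

    def on_register_read(self, reg: int) -> bool:
        # A read of a register whose miss bit is set means the consumer
        # would stall on the outstanding load: signal a switch condition.
        if self.miss_bit[reg]:
            self.miss_bit[reg] = False  # clear once the condition is reported
            return True
        return False
```

The key property of this scheme is that the switch is triggered only when a missing load's result is actually consumed, so loads that miss but are never promptly read do not cause unnecessary switches.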
REFERENCES:
patent: 5361337 (1994-11-01), Okin
patent: 5835705 (1998-11-01), Larsen
patent: 5918033 (1999-06-01), Heeb et al.
patent: 5933627 (1999-08-01), Parady
patent: 6018759 (2000-01-01), Doing
patent: 6088788 (2000-07-01), Borkenhagen
“Reducing Memory Latency via Non-blocking and Prefetching Caches” Tien-Fu Chen and Jean-Loup Baer, 1992 Seattle, WA, Univ. of Washington, Dep. of Comp. Science.*
“Evaluation of Multithreaded Uniprocessors for Commercial Application Environment”, Richard J. Eickemeyer, et al., 1996.*
“Characterization of Alpha AXP Performance Using TP and SPEC Workload.” Zarka Cvetanovic et al., IEEE, 1994.*
“Simultaneous Multithreading: A Platform For Next-Generation Processors”, Eggers, et al., Dept. of Computer Science and Engineering, Seattle, WA, pp. 1-15.
“Compilation Issues For A Simultaneous Multithreading Processor”, Lo, et al., Dept. of Computer Science and Engineering, Seattle, WA, 2 pp.
“Converting Thread-Level Parallelism To Instruction-Level Parallelism Via Simultaneous Multithreading”, Lo, et al., Dept. of Computer Science and Engineering, Seattle, WA, pp. 1-25.
“Simultaneous Multithreading: Maximizing On-Chip Parallelism”, Tullsen, et al., Dept. of Computer Science and Engineering, Seattle, WA, pp. 1-12.
“Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor”, Tullsen, et al., Dept. of Computer Science and Engineering, Seattle WA, pp. 1-12.
“Increasing Superscalar Performance Through Multistreaming”, Yamamoto, et al., Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques, pp. 1-10.
Arora Judge K.
Gupta Rajiv
Sharangpani Harshvardhan
Banankhah Majid
Intel Corporation
Novakoski Leo V.
Method for detecting thread switch events