Thread switch logic in a multiple-thread processor

Electrical computers and digital processing systems: processing – Processing control – Context preserving (e.g. – context swapping – checkpointing,...

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S219000, C712S023000, C712S245000, C712S229000, C709S241000, C709S241000, C709S241000

Reexamination Certificate

active

06341347

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to processor or computer architecture. More specifically, the present invention relates to multiple-threading processor architectures and methods of operation and execution.
2. Description of the Related Art
In many commercial computing applications, a large percentage of time elapses during pipeline stalling and idling, rather than in productive execution, due to cache misses and latency in accessing external caches or external memory following the cache misses. Stalling and idling are most detrimental, due to frequent cache misses, in database handling operations such as OLTP, DSS, data mining, financial forecasting, mechanical and electronic computer-aided design (MCAD/ECAD), web servers, data servers, and the like. Thus, although a processor may execute at high speed, much time is wasted while idly awaiting data.
One technique for reducing stalling and idling is hardware multithreading to achieve processor execution during otherwise idle cycles. Hardware multithreading involves replication of some processor resources, for example replication of architected registers, for each thread. Replication is not required for most processor resources, including instruction and data caches, translation look-aside buffers (TLB), instruction fetch and dispatch elements, branch units, execution units, and the like.
Unfortunately duplication of resources is costly in terms of integrated circuit consumption and performance.
Accordingly, improved multithreading circuits and operating methods are needed that are economical in resources and avoid costly overhead which reduces processor performance.
SUMMARY OF THE INVENTION
A processor includes a thread switching control logic that performs a fast thread-switching operation in response to an L
1
cache miss stall. The thread switching control logic is typically implemented as a control processor, microcontroller, microcode control logic, a logic circuit, or the like, all of which are well known in the electronics arts. The fast thread-switching operation implements one or more of several thread-switching methods. A first thread-switching operation is “oblivious” thread-switching for every N cycle in which the individual flip-flops locally determine a thread-switch without notification of stalling. The oblivious technique avoids usage of an extra global interconnection between threads for thread selection. A second thread-switching operation is “semi-oblivious” thread-switching for use with an existing “pipeline stall” signal (if any). The pipeline stall signal operates in two capacities, first as a notification of a pipeline stall, and second as a thread select signal between threads so that, again, usage of an extra global interconnection between threads for thread selection is avoided. A third thread-switching operation is an “intelligent global scheduler” thread-switching in which a thread switch decision is based on a plurality of signals including: (1) an L
1
data cache miss stall signal, (2) an instruction buffer empty signal, (3) an L
2
cache miss signal, (4) a thread priority signal, (5) a thread timer signal, (6) an interrupt signal, or other sources of triggering. In some embodiments, the thread select signal is broadcast as fast as possible, similar to a clock tree distribution. In some systems, a processor derives a thread select signal that is applied to the flip-flops by overloading a scan enable (SE) signal of a scannable flip-flop.
A processor includes anti-aliasing logic coupled to an L
1
cache so that the L
1
cache is shared among threads via anti-aliasing. The L
1
cache is a virtually-indexed, physically-tagged cache that is shared among threads. The anti-aliasing logic avoids hazards that result from multiple virtual addresses mapping to one physical address. The anti-aliasing logic selectively invalidates or updates duplicate L
1
cache entries.
A processor reduces wasted cycle time resulting from stalling and idling, and increases the proportion of execution time, by supporting and implementing both vertical multithreading and horizontal multithreading. Vertical multithreading permits overlapping or “hiding” of cache miss wait times. In vertical multithreading, multiple hardware threads share the same processor pipeline. A hardware thread is typically a process, a lightweight process, a native thread, or the like in an operating system that supports multithreading. Horizontal multithreading increases parallelism within the processor circuit structure, for example within a single integrated circuit die that makes up a single-chip processor. To further increase system parallelism in some processor embodiments, multiple processor cores are formed in a single die. Advances in on-chip multiprocessor horizontal threading are gained as processor core sizes are reduced through technological advancements.
The described processor structure and operating method may be implemented in many structural variations. For example two processor cores are combined with an on-chip set-associative L
2
cache in one system. In another example, four processor cores are combined with a direct RAMBUS interface with no external L
2
cache. A countless number of variations are possible. In some systems, each processor core is a vertically-threaded pipeline.
In a further aspect of some multithreading system and method embodiments, a computing system may be configured in many different processor variations that allocate execution among a plurality of execution threads. For example, in a “1C2T” configuration, a single processor die includes two vertical threads. In a “4C4T” configuration, a four-processor multiprocessor is formed on a single die with each of the four processors being four-way vertically threaded. Countless other “nCkT” structures and combinations may be implemented on one or more integrated circuit dies depending on the fabrication process employed and the applications envisioned for the processor. Various systems may include caches that are selectively configured, for example as segregated L
1
caches and segregated L
2
caches, or segregated L
1
caches and shared L
2
caches, or shared L
1
caches and shared L
2
caches.
In an aspect of some multithreading system and method embodiments, in response to a cache miss stall a processor freezes the entire pipeline state of an executing thread. The processor executes instructions and manages the machine state of each thread separately and independently. The functional properties of an independent thread state are stored throughout the pipeline extending to the pipeline registers to enable the processor to postpone execution of a stalling thread, relinquish the pipeline to a previously idle thread, later resuming execution of the postponed stalling thread at the precise state of the stalling thread immediately prior to the thread switch.
In another aspect of some multithreading system and method embodiments, a processor include a “four-dimensional” register structure in which register file structures are replicated by N for vertical threading in combination with a three-dimensional storage circuit. The multi-dimensional storage is formed by constructing a storage, such as a register file or memory, as a plurality of two-dimensional storage planes.
In another aspect of some multithreading system and method embodiments, a processor implements N-bit flip-flop global substitution. To implement multiple machine states, the processor converts 1-bit flip-flops in storage cells of the stalling vertical thread to an N-bit global flip-flop where N is the number of vertical threads.
In one aspect of some processor and processing method embodiments, the processor improves throughput efficiency and exploits increased parallelism by introducing multithreading to an existing and mature processor core. The multithreading is implemented in two steps including vertical multithreading and horizontal multithreading. The processor core is retrofitted to support multiple machine states. System embodiments that exploit retro

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Thread switch logic in a multiple-thread processor does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Thread switch logic in a multiple-thread processor, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Thread switch logic in a multiple-thread processor will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2826585

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.