Branch and return on blocked load or store

Electrical computers and digital processing systems: processing – Processing control – Context preserving (e.g., context swapping, checkpointing, ...)

Reexamination Certificate


Details

Classification: C712S205000
Type: Reexamination Certificate
Status: active
Patent number: 06578137

ABSTRACT:

BACKGROUND OF THE INVENTION
The present invention relates to microprocessors which execute multi-threaded programs, and in particular to the handling of blocked (waiting required) memory accesses in such programs.
Many modern computers support “multi-tasking” in which two or more programs are run at the same time. An operating system controls the alternating between the programs, and a switch between the programs or between the operating system and one of the programs is called a “context switch.”
Additionally, multi-tasking can be performed within a single program, where it is typically referred to as “multi-threading.” Multi-threading allows multiple actions to be processed concurrently.
Most modern computers include at least a first level, and typically a second level, cache memory system for storing frequently accessed data and instructions. With multi-threading, multiple threads share the cache memory, and thus the data or instructions of one thread may overwrite those of another, increasing the probability of cache misses.
The cost of a cache miss, measured in wasted processor cycles, is increasing. This is because processor speeds have been increasing at a higher rate than memory access speeds over the last several years, a trend expected to continue into the foreseeable future. Thus, as speeds increase, memory accesses require more processor cycles, not fewer. Accordingly, memory accesses are becoming a limiting factor on processor execution speed.
In addition to multi-threading or multi-tasking, another factor which increases the frequency of cache misses is the use of object-oriented programming languages. These languages allow the programmer to assemble a program at a level of abstraction removed from the steps of moving data around and performing arithmetic operations, which limits the programmer's control over whether a sequence of instructions or data occupies a contiguous area of memory at execution time.
One technique for limiting the effect of slow memory accesses is a “non-blocking” load or store (read or write) operation. “Non-blocking” means that other operations can continue in the processor while the memory access is being done. Other load or store operations are “blocking” loads or stores, meaning that processing of other operations is held up while waiting for the results of the memory access (typically a load will block, while a store won't). Even a non-blocking load will typically become blocking at some later point, since there is a limit on how many instructions can be processed without the needed data from the memory access.
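As a rough illustration (an assumed example, not taken from the patent text), the following C fragment shows how a non-blocking load lets independent work overlap the memory access, and where the load finally blocks:

    /* Assumed illustration of blocking vs. non-blocking behavior:
     * the load of *p can be issued and left in flight while the
     * independent multiply proceeds; execution only stalls at the
     * first use of the loaded value if the data has not arrived. */
    long overlap_example(const long *p, long a, long b)
    {
        long x = *p;      /* non-blocking load issued; may still be in flight */
        long t = a * b;   /* independent work overlaps the memory access */
        return x + t;     /* first use of x: a cache miss would block here */
    }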
Another technique for limiting the effect of slow memory accesses is a thread switch. A discussion of the effect of multi-threading on cache memory systems is set forth in the article “Evaluation of Multi-Threaded Uniprocessors for Commercial Application Environments” by R. Eickemeyer et al. of IBM, May 22-24, 1996, 23rd Annual International Symposium on Computer Architecture. The IBM article shows the beneficial effect of a thread switch in a multi-threaded processor upon a level 2 cache miss. The article points out that the use of separate registers for each thread, and instruction dispatch buffers for each thread, affects efficiency. The article assumes a non-blocking level 2 cache, meaning that the level 2 cache can continue an access for a first thread while also processing a cache request for a second thread at the same time, if necessary.
The IBM article points out that there exist fine-grain multi-threading processors which interleave different threads on a cycle-by-cycle basis. Coarse-grain multi-threading interleaves the instructions of different threads on some long-latency event(s).
As pointed out in the IBM article, switching in the Tera supercomputer, which switches every cycle, is done in round-robin fashion. The Alewife project is cited as handling thread switching in software using a fast trap.
It would be desirable to have an efficient mechanism for switching between threads upon long-latency events.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for switching between threads of a program in response to a long-latency event. In one embodiment, the long-latency events are load or store operations which trigger a thread switch when there is a miss in the level 2 cache. A miss in the level 1 cache, or a hit in the level 2 cache, will not trigger a thread switch.
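A minimal sketch of this decision rule, with assumed names (nothing here is taken verbatim from the patent), might look like:

    #include <stdbool.h>

    /* Hypothetical model of the thread-switch trigger: only a level 2
     * cache miss is treated as a long-latency event; an L1 miss that
     * hits in L2, or an L1 hit, does not cause a switch. */
    typedef enum { L1_HIT, L2_HIT, L2_MISS } cache_result_t;

    static bool should_switch_thread(cache_result_t result)
    {
        return result == L2_MISS;
    }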
In addition to providing separate groups of registers for multiple threads, a group of program address registers, each pointing to a different thread, is provided. A switching mechanism switches between the program address registers in response to the long-latency events.
In one embodiment, the next program address register to be switched to is indicated in a thread field within the long-latency instruction itself. In an alternate embodiment, the program address registers are switched in a round-robin fashion.
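The two selection policies could be modeled as follows; the thread-field encoding and the NUM_THREADS value are assumptions for illustration only:

    #define NUM_THREADS 4

    /* Embodiment 1: the long-latency instruction carries a thread field
     * naming the program address register to switch to next. */
    static unsigned next_thread_from_field(unsigned thread_field)
    {
        return thread_field % NUM_THREADS;
    }

    /* Embodiment 2: the program address registers are cycled round-robin. */
    static unsigned next_thread_round_robin(unsigned current_thread)
    {
        return (current_thread + 1) % NUM_THREADS;
    }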
Preferably, in addition to the program address registers for each thread and the register files for each thread, instruction buffers are provided for each thread. In a preferred embodiment, there are up to four sets of registers to support four threads.
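A rough data-structure sketch of the per-thread resources described above, assuming the four-thread embodiment (register counts and buffer sizes are illustrative, not from the patent):

    #include <stdint.h>

    #define NUM_THREADS   4
    #define NUM_REGS      32   /* assumed register-file size */
    #define IBUF_ENTRIES  8    /* assumed instruction-buffer depth */

    /* Each thread owns its own program address register, register file,
     * and instruction buffer; the switching mechanism selects which
     * thread's program address register drives instruction fetch. */
    typedef struct {
        uint64_t program_address;
        uint64_t registers[NUM_REGS];
        uint32_t instr_buffer[IBUF_ENTRIES];
    } thread_context_t;

    static thread_context_t threads[NUM_THREADS];
    static unsigned         active_thread;  /* index selected on a long-latency event */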
For a further understanding of the nature and advantages of the invention, reference should be made to the following description taken in conjunction with the accompanying drawings.


REFERENCES:
patent: 5057997 (1991-10-01), Chang et al.
patent: 5361337 (1994-11-01), Okin
patent: 5515538 (1996-05-01), Kleiman
patent: 5524250 (1996-06-01), Chesson et al.
patent: 5535361 (1996-07-01), Hirata et al.
patent: 5546593 (1996-08-01), Kimura et al.
patent: 5553305 (1996-09-01), Gregor et al.
patent: 5574939 (1996-11-01), Keckler et al.
patent: 5724565 (1998-03-01), Dubey et al.
patent: 5742822 (1998-04-01), Motomura
patent: 5796970 (1998-08-01), Higaki et al.
patent: 5872985 (1999-02-01), Kimura
patent: 6253313 (2001-06-01), Morrison et al.
Hirata et al., “An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads,” Computer Architecture News, vol. 20, No. 2, May 1992, pp. 136-145.
Eickemeyer et al., “Evaluation of multithreaded uniprocessors for commercial application environments,” pp. 203-212, May 1996.
Kawano et al., “Fine-grain multi-thread processor architecture for massively parallel processing,” pp. 308-317, May 1995.
“Register Banking for IBM System/370,” IBM Technical Disclosure Bulletin, vol. 34, No. 4B, Sep. 1991, pp. 372-373.
