Electrical computers and digital processing systems: multicomput – Computer-to-computer data routing – Least weight routing
Reexamination Certificate
1997-07-10
2001-05-15
Banankhah, Majid A. (Department: 2151)
C712S207000
active
06233599
BACKGROUND OF THE INVENTION
1. Technical Field of the Invention
The present invention relates in general to a method and apparatus for partitioning a processor register set to improve the performance of multi-threaded operations. More particularly, the present invention relates to a method and apparatus for retrofitting multi-threaded operations on a conventional computer architecture. Still more particularly, the present invention relates to a method and apparatus for partitioning the processor register set and managing the register subsets to improve multi-threading performance of a computer.
2. Description of Related Art
Single tasking operating systems have been available for many years. In single tasking operating systems, a computer processor executes computer programs or program subroutines serially. In other words, a computer program or program subroutine must be completely executed before execution of another program or subroutine can begin.
Single tasking operating systems are inefficient because the processor must wait during the execution of some steps. For example, some steps cause the processor to wait for a data resource to become available or for a synchronization condition to be met. To keep the processor busy and increase efficiency, multi-threaded operating systems were invented.
In multi-threaded operating systems, the compiler breaks a task into a plurality of threads. Each of the threads performs a specific task which may be executed independently of the other threads. Although the processor can execute only one thread at a time, if the thread being executed must wait for the occurrence of an external event such as the availability of a data resource or a synchronization event, then the processor switches threads. Although thread switching itself requires a few processor cycles, if the waiting time exceeds this switching time, then processor efficiency is increased.
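The break-even condition described above can be illustrated with a small calculation. This is only a sketch under a simplified model: the function names and the cycle counts are hypothetical and are not taken from the patent, and the model charges a single switching cost against the time the processor would otherwise spend stalled.

```python
def net_cycles_gained(wait_cycles: int, switch_cycles: int) -> int:
    """Cycles recovered by switching threads instead of stalling.

    Simplified model (an assumption): the processor would otherwise idle
    for `wait_cycles`, and switching costs `switch_cycles` of overhead.
    A negative result means switching costs more than it saves.
    """
    return wait_cycles - switch_cycles


def should_switch(wait_cycles: int, switch_cycles: int) -> bool:
    """Switching improves efficiency only if the wait exceeds the switch cost."""
    return net_cycles_gained(wait_cycles, switch_cycles) > 0


# Hypothetical numbers: a long memory wait versus a short stall.
long_wait = should_switch(wait_cycles=200, switch_cycles=10)
short_wait = should_switch(wait_cycles=5, switch_cycles=10)
```

Under these assumed numbers, a 200-cycle wait justifies a 10-cycle switch, while a 5-cycle stall does not.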
Accessing internal state, for example on-chip processor registers, generally requires fewer processor clock cycles than accessing external state, for example cache or memory. Increasing the number of registers inside the processor generally decreases the probability of external accesses to cache or memory. In other words, to decrease the number of external memory requests, the prior art generally increases the number of processor registers.
For example, the latest generations of instruction set architectures, including RISC (Reduced Instruction Set Computers) and VLIW (Very Long Instruction Word) processors, typically improve execution of a single task by increasing the number of registers. Such processors often have 64 to 256 registers capable of retaining integer and/or floating point values.
Computer system architectures and programming trends are moving toward multi-threaded operations rather than single, sequential tasks. To multithread an operation, each task is decomposed by the compiler into more than one thread. Because threads tend to run for much shorter intervals before being completed than a single large task, threads tend to have a smaller associated state per thread. In other words, each thread of a multithreaded operation tends to require fewer associated registers than a single large task, which generally requires a large number of registers to execute.
Threads typically are allowed to run until a thread switch event occurs. A thread switch event occurs, for example, when a referenced memory location is not found in the cache or a program-defined synchronization condition is not met. For example, when an L2 cache miss occurs, then the main memory must be accessed, which is, of course, very time consuming. Instead of waiting, the processor switches threads.
When a thread is suspended due to a thread switch event, its inactive or NOT READY state may be retained within the processor registers. In the prior art, however, if a given thread does not resume execution within a few thread commutations, the finite register storage available within the processor leads to swapping of thread state between the processor and memory. In other words, the prior art swaps the entire thread context between the inactivated thread and the next thread to be processed.
Thread switching requires several processor cycles and directly competes for processor, bus and memory resources. Because the prior art switches the entire thread state upon a thread switch event, good multithreading performance dictates a reduced internal state or, in other words, a smaller number of registers within the processor.
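To make the scaling argument concrete, the following sketch counts register transfers for a full-context swap. It assumes, purely for illustration, that saving or loading each register costs one transfer and that special purpose registers and burst transfers are ignored; the function name is hypothetical. The register counts are the 64 and 256 figures cited earlier for RISC and VLIW processors.

```python
def full_context_swap_transfers(num_registers: int) -> int:
    """Register transfers needed to swap an entire thread context.

    Assumed model: every register of the outgoing thread is saved and
    every register of the incoming thread is loaded, one transfer each.
    """
    if num_registers < 0:
        raise ValueError("num_registers must be non-negative")
    return 2 * num_registers


# Cost grows linearly with the size of the register set.
small_set_cost = full_context_swap_transfers(64)
large_set_cost = full_context_swap_transfers(256)
```

In this model, quadrupling the register set quadruples the traffic generated by every thread switch, which is why a full-context swap favors a small register set.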
Thus, there is a conflict between established processor instruction set architectures optimized for a single task which require a large number of internal processor registers and the demands of newer, multithreaded architectures and programming systems which require relatively few internal processor registers for high-performance, multithreading operations.
Furthermore, the computer industry has a tremendous investment in software and hardware embodying existing instruction set architectures. As a result, it is very difficult to successfully introduce hardware and software which embodies a new and incompatible instruction set architecture.
For example, adding hardware to duplicate the register set is a known technique for increasing multithreaded performance. In other words, the prior art duplicates the entire register set including special purpose registers and general purpose registers so that each thread has its own dedicated register set to facilitate thread switching. Register set duplication, however, greatly increases the circuit complexity and makes the circuit layout more difficult to implement.
SUMMARY OF THE INVENTION
The present invention retrofits multithreaded operations on a computer utilizing an existing instruction set architecture. Introducing a specially designed multithreaded computer requiring an incompatible instruction set architecture may encounter marketing difficulties. To retrofit multithreaded operations, the invention partitions an existing processor register set into register subsets, including an overlapping register. Because the existing instruction set may be utilized, the marketability of the present invention is enhanced.
After partitioning an existing register set into register subsets, the invention allocates the register subsets to a plurality of threads such that each thread has an associated register subset which stores that thread's resources. Partitioning the processor registers into register subsets permits the processor to have thread resources for each of the various threads readily at hand in the processor registers. To increase the capacity of the partitioned registers, the invention permits overlapping register subsets wherein some or all of the registers are allocated to more than one thread. This invention has clear advantages over the prior art because the entire state for each thread does not have to be exchanged and, instead, the state of each thread is maintained in register subsets within the processor registers.
After loading the register subsets, including the overlapping registers, with corresponding thread resources, the invention manages the register subsets during thread switching.
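The partitioning and management steps summarized above can be sketched in software as follows. This is an illustrative model only, not the patented apparatus: the names `partition_registers` and `PartitionedRegisterFile`, the contiguous layout in which each subset extends `overlap` registers into the next one, and the dictionary standing in for memory are all assumptions made for the example.

```python
def partition_registers(num_registers, num_threads, overlap=0):
    """Return one list of register indices per thread.

    Assumed layout: contiguous subsets of equal stride, each extended by
    `overlap` registers into the following subset. Registers left over
    when num_registers is not divisible by num_threads stay unallocated.
    """
    stride = num_registers // num_threads
    if stride == 0:
        raise ValueError("more threads than registers")
    return [
        list(range(t * stride, min(num_registers, (t + 1) * stride + overlap)))
        for t in range(num_threads)
    ]


class PartitionedRegisterFile:
    """Toy register file that keeps each thread's subset resident."""

    def __init__(self, subsets):
        self.subsets = subsets
        self.owner = {}    # register index -> thread whose value is resident
        self.values = {}   # register index -> resident value
        self.backing = {}  # (thread, register) -> spilled value (stands in for memory)

    def switch_to(self, thread):
        """Make `thread` runnable and return the number of registers swapped.

        Only registers in the thread's subset that currently hold another
        thread's value (or nothing yet) are touched; the rest stay in place.
        """
        swapped = 0
        for reg in self.subsets[thread]:
            current = self.owner.get(reg)
            if current == thread:
                continue  # already resident, nothing to exchange
            if current is not None:
                self.backing[(current, reg)] = self.values[reg]  # spill sharer
            self.values[reg] = self.backing.pop((thread, reg), 0)
            self.owner[reg] = thread
            swapped += 1
        return swapped


subsets = partition_registers(16, 4, overlap=2)
rf = PartitionedRegisterFile(subsets)
first_load = rf.switch_to(0)   # initial load of thread 0's six registers
rf.switch_to(1)                # thread 1 takes over the two shared registers
return_cost = rf.switch_to(0)  # only the shared registers are exchanged
```

In this model, returning to thread 0 exchanges only the two registers it shares with thread 1, and with `overlap=0` a switch between already-loaded threads exchanges nothing. The overlap trades that extra exchange for a larger subset per thread.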
It is another object of the present invention to provide an improved data processing system and method for implementing multithreaded operations.
It is still another object of the present invention to improve multithreading performance of a processor implementing a conventional instruction set architecture.
It is, therefore, an object of the present invention to partition the processor registers into either overlapping or non-overlapping register subsets and allocate the partitioned register subsets to the plurality of threads.
It is a further object of the present invention to provide a method and apparatus for improving multithreaded performance which avoids swapping an entire thread context by swapping thread resources only when the thread resource or portion thereof is not within the corresponding register subset.
It is still another object of the pre
Nation George Wayne
Newshutz Robert N.
Willis John Christopher
Banankhah Majid A.
International Business Machines Corporation
Ojanen Karuna