Electrical computers and digital processing systems: multicomput – Computer-to-computer data routing – Least weight routing
Reexamination Certificate
1998-10-30
2003-05-06
Oberley, Alvin (Department: 2126)
Electrical computers and digital processing systems: multicomput
Computer-to-computer data routing
Least weight routing
C712S010000
Reexamination Certificate
active
06560629
ABSTRACT:
BACKGROUND OF THE INVENTION
The invention relates to multi-thread processing, more especially to the handling of threads of control within a computer architecture that supports multi-thread processing in hardware.
Multiprogramming, or running multiple processes on a computer having fewer actual processors via time-slicing under operating-system control has been used on computer systems for a long time. Traditionally processes have distinct address spaces and are only permitted to interact via tightly constrained means.
More recently it has become common to provide for separating individual processes into threads of control which share more resources, e.g. address spaces, and so have faster communication between them to enable cooperation on a job, yet still be runnable in parallel on multiple processors. Several existing operating systems, such as Unix and SunOS, can support multi-threading by explicit or compiler-generated programming, whereby multiple, apparently concurrent threads are supported by time-slicing upon fewer actual processors.
An additional use of threads is to follow two or more possible branches of a process. For example, if a process or thread makes a control-path decision based on a binary result which is not yet established, the thread may be copied into two threads so that each can follow one possible value of the binary result. This is referred to as thread duplication. Duplicated threads may run in parallel if two processors are available. The duplicated threads may subsequently replicate further, depending on the subsequent instructions of the thread concerned. Once the binary result becomes established, the thread or threads which followed from that one of the binary results now known to be incorrect can be resolved, i.e. terminated.
To support multi-threading, a computer system thus needs a mechanism by which threads can be managed, including means for creating and terminating threads. The general goal of the designer is, of course, to provide an efficient processing capability.
SUMMARY OF THE INVENTION
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Features of the dependent claims may be combined with those of the independent claims as appropriate and in combinations other than those explicitly set out in the claims.
According to one aspect of the invention a plurality of processing units is provided which are interconnected by communication paths via which the contents of active threads can be transferred between processing units under the control of logic and driver circuits which form part of the processing units and which are arranged to act as intelligent transceiver elements controlling transfer of active threads over the inter-unit communication paths. In operation, thread duplication, for example, will proceed by running one of the threads on the same processing unit and by running the other of the threads on another processing unit. To do this the entire thread state of the originating processing unit is copied to the other processing unit.
Transferring entire thread state of threads from one processing unit to another while the threads are current provides a mechanism by which processing activity in a multi-thread process can be distributed among plural processing units so as to improve processing performance.
A thread transfer may be effected in the course of thread replication or migration. Thread duplication is a special case of thread replication which may involve the splitting of a thread into any plural number of threads. Thread migration involves the movement of a thread of control from one processing unit to another. A thread migration operation is effected by copying the entire state of the originating processing unit to the recipient processing unit and then terminating the thread in the originating processing unit.
In a preferred embodiment of the invention, the concept of activity values is used to provide each processing unit with information that can be used to decide on thread transfer to neighbor processing units. Activity values are passed between neighboring processing units via activity signals to provide each processing unit with an indication of the level of processing activity of the processing units in its vicinity.
For example, if a processing unit is running a thread and encounters an operation on an unknown binary result, then the processing unit may wish to consider whether (a) to generate a thread duplication by transferring one thread to a neighbor processing unit, (b) to stall the thread, or (c) to continue with only one of the possible two branches. To consider option (a), the processing unit can look at the activity values received from each of its neighbor processing units. One or more of the activity values may indicate that the neighbor processing unit concerned is already busy, i.e. running a thread of its own, in which case thread transfer to that processing unit would be excluded. On the other hand, so long as at least one of the received activity values indicates availability to run a thread, there is at least one neighbor processor to which a thread could be transferred. Only if all the received activity values indicate “busy”-ness, will it be necessary to consider options (b) and (c).
An important advantage of the processing unit designs herein described is their scaleability. Scaleability is possible, because each processor unit is autonomous as regards decisions on thread transfer. In particular, the activity value concept as herein described is scaleable provided that the time scales associated with the distribution of activity value signals within the array of processing units are kept shorter than the time scales associated with thread propagation within the array.
Not only may the activity value convey information on nearest, i.e. direct, neighbor processing units but, preferably, also on processing units less directly connected to the processing unit concerned, for example second neighbors or more distant neighbors. This is possible with digital activity values having more than one bit.
The composition of the activity value will depend on the protocol applied by each processing unit for generating activity values for Output. Generally, if a processing unit is active, i.e. currently running a thread, then it will output a BUSY activity value to each of its neighbor processing units. On the other hand, if the processing unit is inactive, i.e. not currently running a thread, then it will output a NOT BUSY activity value to each of its neighbor processing units. Preferably, there will be a plurality of NOT BUSY activity value levels. In this case, the different NOT BUSY activity values may indicate activity levels of processing units in the immediate or more remote proximity of the processing unit concerned and may carry a plurality of contributions from different processing units weighted according not only to proximity, i.e. distance, but also relative position, i.e. direction. The activity value may thus be a composite entity, which, although it will most usually be a scalar quantity, may contain vectorial information. A number of different activity value protocols are discussed further below.
The activity value may be discrete or continuous. In a system designed to operate with discrete activity values, bi-directional digital communication paths are provided between neighboring processing units. In one embodiment, the activity values are discrete, adopting values between 0 and 15, and are conveyed via 4-bit bi-directional communication paths. Other discrete value sets of activity values may be adopted using a multi-bit communication paths of appropriate width. In one embodiment, a two-bit communication path is used to convey a four-state activity value. A primitive implementation, adopted in another embodiment, has only a single bit of width for conveying a two-state activity value. In this case, the activity value conveys only neighbor activity information. Embodiments with a single bit activity value can be implemented wi
O'Melveny & Myers LLP
Oberley Alvin
Sun Microsystems Inc.
Zhen Li
LandOfFree
Multi-thread processing does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Multi-thread processing, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Multi-thread processing will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3006268