Prefix sums and an application thereof

Electrical computers: arithmetic processing and calculating – Electrical digital calculating computer – Particular function performed

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Reexamination Certificate

active

06542918

ABSTRACT:

FIELD AND BACKGROUND OF THE INVENTION
The present invention relates to computer architecture and, more particularly, to a method for computing prefix sums and the application of the method to allocating computational resources to concurrently executable tasks, for example, ensuring conflict-free access to multiple single-ported register files.
The definition of a prefix sum is as follows: given an input base, x, and an input array y of k elements, y
1
through y
k
, the prefix sum of the base and the array is an output array, of k+1 elements, whose elements are:
z
0
=x
z
1
=x+y
1
z
2
=x+y
1
+y
2
. . .
z
k
=x+y
1
+y
2
+ . . . +y
k
The latest generation of computer processors is capable of overlapping the performance of several instructions, and even of issuing several instructions per clock cycle. These instructions may belong to one or several “computational threads”. Methods known to the art for coordination among instructions include scoreboarding and Thomasulo's algorithm for uniprocessors, used primarily to efficiently enforce precedence among performed instructions; and “fetch and add” for multiprocessors. These methods are described in standard references, such as “Computer Architecture, a Quantitative Approach”, by John L. Hennessy and David A. Patterson, and “Highly Parallel Computing”, by George S. Almasi and Allan Gottlieb. As the instruction level parallelism of uniprocessors increases, some of the coordination efforts needed on uniprocessors will begin to somewhat resemble those needed on multiprocessors. A sufficiently low-level implementation of prefix summation would be a valuable additional tool for coordinating multiple instructions from a single thread or from multiple computational threads, as well as being a useful primitive operation for higher-level applications to use. It would be highly advantageous to have both a low level implementation of prefix summation and methods of coordinating multiple computational threads that exploit such an implementation.
SUMMARY OF THE INVENTION
According to the present invention there is provided a method for performing a prefix sum of a base and an input array including at least one input element, comprising the step of providing a prefix sum instruction in an instruction set architecture of a microprocessor.
According to the present invention, there is provided a method for performing a base-zero prefix sum of an array including at least one element, comprising the step of providing a base-zero prefix sum instruction in an instruction set architecture of a microprocessor.
According to the present invention, there is provided a method for performing a base-zero suffix sum of an array including at least one element, comprising the step of providing a base-zero prefix sum instruction in an instruction set architecture of a microprocessor.
According to the present invention, there is provided a functional unit for performing the prefix sum of a base and an input array including at least one input element, comprising: (a) a first logical unit, for performing a base-zero prefix sum of the input array, thereby providing a first intermediate array including at least one element, one of the at least one element being a last element; and (b) a second logical unit, for performing a base-zero suffix sum of the input array, thereby providing a second intermediate array including at least one element.
According to the present invention, there is provided a method of providing conflict-free access to a plurality of register files for a plurality of computational cycles, each of the computational cycles requiring access to at least one value having an ordinal number, there being a largest of the at least one ordinal number, each of the computational cycles having a counter, the method comprising the steps of: for each computational cycle: for each value, adding the counter to the ordinal number, thereby providing a serial number; thereby providing, for each computational cycle, a largest serial number.
To date, no microprocessor has included a prefix sum instruction in its instruction set architecture. The present invention is precisely that: the inclusion of a prefix sum instruction in the instruction set architecture. Such an instruction is expressed in instruction code using syntax of the form
PSR
i
R
j
.
Here, “PS” is the instruction name, and R
i
and R
j
are registers. One PS instruction by itself adds the value in register R
i
to the value in register R
j
, returns the result to register R
i
, and stores the original value of R
i
in R
j
. Any other syntax having the same meaning, and including an instruction label field and at least two operand fields, may be used instead. In and of itself, this instruction has an effect similar to that of an “add” instruction. The difference between the PS instruction and an “add” instruction is that several PS instructions may be cascaded into a multiple-PS instruction: for example, the sequence of instructions
PSR
0
R
1
PSR
0
R
2
PSR
0
R
3
. . .
PSR
0
R
k
performs the prefix sum of the input base, stored in register R
0
, and the input array, stored in registers R
1
through R
k
. Upon completion of this sequence of instructions, register R
1
contains the value originally stored in R
0
; register R
2
contains the sum of the values originally stored in registers R
0
and R
1
; register R
3
contains the sum of the values originally stored in registers R
0
through R
2
; and so on until register R
k
, which contains the sum of the values originally stored in registers R
0
through R
k−1
. Finally, R
0
contains the sum of all the values originally stored in all k+1 registers R
0
through R
k
. The microprocessor, through static analysis and dynamic analysis, and through permuting the order of instructions, recognizes a plurality of consecutive single PS instructions, which can comprise a multiple-PS instruction without changing the semantics of the original code. The PS instructions then are decoded collectively as a prefix summation, if possible replacing groups of two or more consecutive single PS instructions by multiple-PS instructions (the allowed multiplicity in a multiple-PS instruction may vary greatly among microprocessors whose instruction set architecture includes a prefix sum).
Prefix summation may be implemented using the instructions of existing instruction sets, for example as a balanced binary tree. Balanced binary tree algorithms for prefix summation are well-known in the art, being presented, for example, in Joseph J
]
J
]
, “An Introduction to Parallel Algorithms”, pp. 43-49, which is incorporated by reference for all purposes as if fully set forth herein. Preferred embodiments of the present invention, however, implement prefix summation in hardware, so that short prefix summations may be completed within one clock cycle and therefore used, for example, for the allocation of computational resources to concurrent tasks, including the allocation of memories and functional units, and including load balancing among tasks coming from single or multiple computational threads. For example, suppose there are three computational threads, each needing three independent adders for its next three instructions, suppose that there are five computational units that can serve as adders, and suppose that each addition requires one clock cycle. Prefix sums may be used to assign unique serial numbers to the instructions so that the adders can be allocated to the instructions and the nine additions completed in two clock cycles. Note that in this application it does not matter in which order the elements of the prefix sum output array are computed.
In a microprocessor configured to do more than one prefix sum concurrently, the two register fields of the prefix sum instruction can serve to indicate to the microprocessor that several prefix sums are to be done in parallel. Specifically, if two or more different values appear in the base field of cascaded PS instructions, and the sets of values i

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Prefix sums and an application thereof does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Prefix sums and an application thereof, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Prefix sums and an application thereof will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3102252

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.