Cycle segmented prefix circuits

Electrical computers and digital processing systems: processing – Processing architecture – Superscalar

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C712S003000, C712S012000, C712S221000, C708S421000, C708S632000

Reexamination Certificate

active

06609189

ABSTRACT:

1. BACKGROUND OF THE INVENTION
This invention draws from two different areas: parallel-prefix circuits and superscalar-processor circuits. Parallel-prefix circuits have, in the past, been most often used for parallel computation in machine such as the Connection Machine CM-5 supercomputer. (See, for example, [25, 18, 16, 7, 2].) Throughout this patent, numbers enclosed in square brackets refer to the references cited in Section 4.4 below, each of which is incorporated by reference herein.
Superscalar processor circuits are used to implement processors that exploit instruction-level parallelism (ILP) using out-of order execution. Instruction-level parallelism is the parallelism that can be found in a serial instruction stream because certain of the serial chain of operations are actually independent. One strategy to exploit ILP is to execute the instructions in a different order from that specified by the original serial execution, hence “Out-of-order execution”. A mechanism for out-of-order execution was first described in [44].
The rest of this section first describes parallel-prefix circuits and then describes today's superscalar-processor circuits.
Notation
We use the following notation:
O: The “big-Oh” notation is used to indicate how fast a function grows, ignoring constant factors. Intuitively, we write “g(x) is O(f(x))” if g(x) grows no faster than f(x), for large x ignoring a constant multiplicative factor.
&OHgr;: The “big-Omega” notation is used similarly. Intuitively, we write “g(x) is &OHgr;(f(x))” if g(x) grows no slower than f(x), for large x ignoring a constant multiplicative factor.
&THgr;: The “big-Theta” notation is the intersection of big-Oh and big-Omega. g(x) is &THgr;(f(x)) exactly when g(x) is O(f(x)) and g(x) is &OHgr;(f(x)). Intuitively, this means that g(x) grows at the same speed as f(x) for large x ignoring a constant multiplicative factor. See [5] for a complete and exact treatment of the O, &OHgr;, and &THgr; notation.
lg I is the logarithm, base 2, of I.
log I is the logarithm of I in the natural base e.
Thus, lg I=log
2
I. Note that choice of the base of a logarithm makes only a difference of a constant factor. E.g., lg I≈0.693 log I, which means that lg I is &THgr;(log I) and log I is &THgr;(lg I). In this document we generally use the log base two and we use binary trees. There may be engineering reasons to use a trees of higher degree because the base of the log changes which gives a constant-fold change in the performance. Our designs work for any base and any degree of the tree, including trees of mixed degree.
Ceiling: We write ┌x┐ (pronounced “the ceiling of x”) to be the smallest integer greater than or equal to x.
append lists: If a and b are two lists, then {a,b} is the concatenation of the two lists.
The set of integers x such that a≦x≦b is denoted by [a . . . , b].
Base numerals: When we write 0010
2
the string “0010” should be interpreted as a base two number. Thus 1010
2
=12
8
=10
10
=10.
1.1 Parallel Prefix
This section contains a tutorial on parallel-prefix circuits. First we define the prefix problem, then show how to implement it to run fast. Parallel-prefix circuit design is a technique that can often convert linear-time circuits into logarithmic-time circuits. (See, for example, [5] for a discussion of log-depth parallel-prefix circuits. Segmented parallel prefix circuits are the subject of an exercise of [5], and were implemented in the CM-5 supercomputer [25, 18, 16, 7].)
The prefix problem is as follows. One is given an associative operator {circle around (×)} with an identity value, I. Given some inputs x
0
, x
1
, . . . , x
n−1
we need to compute y
0
, y
1
, . . . , y
n
as: y
i
=x
0
{circle around (×)} x
1
{circle around (×)} . . . {circle around (×)}x
i−1
, where y
0
is defined to be the identity value for the {circle around (×)}. (For example, if {circle around (×)} is addition (and the identity for addition is 0), then y
i
=&Sgr;
j=0
i−1
x
j
.)
Sometimes one wants special initial and final values. One can formulate the prefix problem as having an initial value z that is passed to the circuit. In this case we have y
i
=z{circle around (×)}x
0
{circle around (×)}x
1
. . . {circle around (×)}x
i−1
. This can be viewed as the earlier case simply by renumbering the subscripts so that we have
x
i

=
{
z
if



i
=
0
,


and
x
i
-
1
otherwise
.
and then performing a parallel prefix on the x′ values. Similarly, one would like to get a final output value w from the circuit which is defined to be w=z{circle around (×)}x
0
{circle around (×)}x
1
. . . {circle around (×)}x
n
.
Again, this can be implemented by the earlier case by manipulating subscripts. We simply extend the subscript range to n+1 and compute w as y
n+1
.
The natural and easy thing to do is to compute the y
i
's serially. First one computes each y
i−1
and uses that to compute y
i
as
y
i
=
{
the



identity



value
if



i
=
0
,


and
y
i
-
1



x
i
-
1
otherwise
.
FIG. 1
shows a circuit
10
that computes the prefix operation in linear time. Circuit
10
comprises a plurality of function generators
15
, each having two inputs and one output. Each output is connected as one of the inputs to the next function generator and the other input is an x value. It is easy to see that prefix can be computed in time linear in n. It is surprising to many people that the prefix problem can be solved in time logarithmic in n by using a circuit structure known as parallel prefix. The next three sections review the parallel prefix circuit.
1.1.1 Log-Time Parallel Prefix
Before reviewing the construction of parallel-prefix circuits in general, we present an example.
FIG. 2
shows a parallel-prefix circuit
20
that takes eight inputs, x
0
, x
1
, . . . , x
7
, and computes their prefix sums
y
i
=

j
=
0
i
-
1



x
j
.
Circuit
20
comprises fourteen two-input adders
25
connected in a tree-like structure by signal wires
27
as shown. The inputs x
i
are provided at the bottom of the circuit, and the outputs y
i
come out the bottom, with output y
i
coming out just to the left of where input x
i
goes in. The identity (zero) goes in at the top and the sum of all the values (y
8
) comes out at the top. The critical-path length of this circuit is logarithmic in the number of inputs. This circuit can be laid out in VLSI using an H-tree layout [23] with a resulting area of about A=O(n
2
b
2
) where b is the number of bits in the result y
n
. The resulting wire delay is about O({square root over (A)}). We can further optimize the parallel-prefix sum circuit of FIG.
2
. If we use a redundant representation (such as the carry-save adder as used in Wallace-tree multipliers), with a single final sum at the end, we can perform the entire parallel-prefix sum in only O(log n) gate delays as opposed to O(log
2
n). Furthermore, often the width of the data values is smaller at the inputs than at the outputs (for example, when the inputs x, to a sum are only one bit each, but the output is log n bits, a case which will come up later in this patent), then we can carefully size the ALUs so that they take just the right number of bits as inputs and produce the right number of bits as outputs, which will save area and power. One important special case is when the x
i
's are one-bit each. The problem of summing one-bit inputs is often referred to as the enumeration problem.
In general, a parallel prefix circuit comprises a tree as shown in FIG.
3
. The tree
30
comprises a plurality of treefix modules
35
at its vertices connected by signal wires
37
as shown. The x
i
values are input at the leaves of the tree (at the bottom of the figure). The r

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Cycle segmented prefix circuits does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with Cycle segmented prefix circuits, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Cycle segmented prefix circuits will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3101952

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.