Apparatus and methods for high throughput self-timed domino...

Electronic digital logic circuitry – Clocking or synchronizing of logic stages or gates – Field-effect transistor

Utility Patent


Details

C326S119000, C326S121000

active

06169422

ABSTRACT:

I. BACKGROUND OF THE INVENTION
A. Field of the Invention
The present invention relates to the field of asynchronous circuits. More particularly, apparatus and methods consistent with the present invention relate to logic circuits designed for high-speed asynchronous operation.
B. Description of the Prior Art
Advances in semiconductor fabrication technology allow increasing numbers of logic gates to operate at increasing speeds. Synchronous design methodologies require a global clock signal to keep all gates operating in lock-step, and distributing that clock is becoming a greater challenge at such high speeds. Asynchronous design methodologies use local control to determine when a gate may operate, thereby eliminating the global clock distribution problem and potentially offering improved speed, lower power, reduced electromagnetic interference, and a host of other benefits.
There are two classes of asynchronous circuits: “self-timed” and “timed.” Self-timed circuits, also referred to as delay-insensitive circuits, use a handshake between data and control circuits to guarantee that the control does not request an operation until the data is ready. Timed circuits attempt to match the delays of control and data circuits so that the control does not activate until the data is ready. Self-timed circuits are therefore more robust because they do not depend on accurate matching of delays.
In order to use self-timing, data signals must indicate not only a value, but also validity so that the control can check for data validity before proceeding. This can be done by encoding a data bit on two signals rather than one: X_H and X_L. This is called dual-rail signaling. When both signals are low, the data is invalid. When X_H is high, the data bit is a valid high level. When X_L is high, the data bit is a valid low level. X_H and X_L are never simultaneously high.
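The dual-rail convention just described can be sketched in a few lines of Python. This is only an illustration of the encoding rule, not circuitry from the patent; the function names `encode`, `decode`, and `is_valid` are hypothetical, and `None` is an assumed stand-in for "no valid data yet."

```python
from typing import Optional, Tuple


def encode(bit: Optional[bool]) -> Tuple[int, int]:
    """Return the rail pair (X_H, X_L) for a bit; None means invalid data."""
    if bit is None:
        return (0, 0)          # both rails low: data is invalid
    return (1, 0) if bit else (0, 1)


def is_valid(x_h: int, x_l: int) -> bool:
    """Data is valid when exactly one rail is high."""
    return bool(x_h) != bool(x_l)


def decode(x_h: int, x_l: int) -> Optional[bool]:
    """Recover the bit, or None if the rails do not yet carry valid data."""
    if x_h and x_l:
        # The text states X_H and X_L are never simultaneously high.
        raise ValueError("illegal dual-rail state: both rails high")
    if x_h:
        return True
    if x_l:
        return False
    return None
```

A control circuit only needs `is_valid` to decide whether it may proceed, which is what lets validity travel with the data instead of being inferred from a clock.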
A popular way of building such data circuits is with dual-rail domino logic. Dual-rail domino gates, also known as dynamic differential cascode voltage switch (DCVS) gates or simply domino gates, accept a control signal and dual-rail inputs. They compute a function of the inputs and produce one or more dual-rail outputs. When the control signal is low, the domino gate is precharged such that both outputs are low. When the control signal is high, the domino gate evaluates, causing one of the two output rails to rise. Such domino gates evaluate quickly, allowing low latency computation.
A variety of approaches exist for building self-timed circuits with dual-rail domino gates. The approaches involve control circuits which apply control signals to the dual-rail domino gates so that the gates evaluate and precharge at the correct times. See, for example, Williams, T. E., “Self-Timed Rings and Their Application to Division,” Computer Systems Laboratory, Departments of Electrical Engineering and Computer Science, Stanford University, Technical Report No. CSL-TR-91-482, May 1991. Using certain control schemes, Williams achieves zero-overhead latency, meaning that the delay from the input of a path to the output consists only of the delays of each gate in the path. Computation does not have to wait for control signals or latch delays.
Unfortunately, these control schemes have poor throughput (that is, long cycle times) compared to aggressive synchronous designs, because they spend excessive time handshaking with data to guarantee the data is ready. Therefore, existing self-timed domino circuits are too slow to be generally competitive with synchronous systems.
FIG. 1 shows a circuit schematic of a conventional dual-rail domino logic gate with completion detection suitable for use in a self-timed system. The particular gate in the illustration computes an AND/NAND function on inputs A and B. The gate accepts a request signal R and dual-rail inputs A_H, A_L, B_H, and B_L. It produces dual-rail outputs OUT_H and OUT_L, which are true and complementary versions of the function A AND B, along with a done signal {overscore (D)} indicating completion of processing by the circuit, and thus validity of the output data. In this context, the true version means A AND B, while the complementary version means {overscore (A AND B)}. Request R is low during the precharge phase, at which time the gate precharges, pulling both outputs low and setting {overscore (D)} high to indicate that the output is invalid. Request R is high during the evaluation phase, and if suitable inputs are high then either OUT_H or OUT_L will evaluate high, and {overscore (D)} will fall to indicate the output is valid.
The gate comprises series n-channel field effect transistors (NFETs) 101-102 coupled between nodes 120 and 122 and parallel NFETs 103-104 coupled between nodes 121 and 122. Precharge p-channel field effect transistors (PFETs) 105 and 106 pull nodes 120 and 121, respectively, to a high level when request R is low. Series evaluation NFET 107 allows node 122 and hence either node 120 or 121 to pull low only when request R is high. Output inverter 108 is coupled between node 120 and output OUT_H, while output inverter 109 is coupled between node 121 and output OUT_L. NOR gate 110 coupled between the output nodes OUT_H and OUT_L and the active low done output {overscore (D)} senses completion.
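The behavior of the FIG. 1 gate can be approximated with a small logic-level model. This is a sketch under stated assumptions, not a transistor-level simulation: the text does not say which inputs drive which transistors, so the model assumes the series NFETs 101-102 are gated by A_H and B_H and the parallel NFETs 103-104 by A_L and B_L, which is what an AND/NAND function requires. It also ignores charge storage on the dynamic nodes, which is harmless as long as inputs only rise during evaluation. The function name `domino_and` is hypothetical.

```python
def domino_and(r, a_h, a_l, b_h, b_l):
    """Logic-level model of the dual-rail AND/NAND domino gate.

    Returns (out_h, out_l, done_n), where done_n is the active-low done.
    """
    if not r:
        # Precharge: PFETs 105 and 106 pull nodes 120 and 121 high.
        node_120 = node_121 = 1
    else:
        # Evaluate: NFET 107 connects node 122 to ground.
        # Assumed wiring: series stack conducts only if A_H and B_H are high.
        node_120 = 0 if (a_h and b_h) else 1
        # Assumed wiring: parallel pair conducts if A_L or B_L is high.
        node_121 = 0 if (a_l or b_l) else 1
    out_h = 1 - node_120                      # inverter 108
    out_l = 1 - node_121                      # inverter 109
    done_n = 0 if (out_h or out_l) else 1     # NOR gate 110
    return out_h, out_l, done_n


# Precharge: both outputs low, done_n high (output invalid).
print(domino_and(0, 1, 0, 1, 0))   # (0, 0, 1)
# Evaluate with A=1, B=1: OUT_H rises, done_n falls.
print(domino_and(1, 1, 0, 1, 0))   # (1, 0, 0)
# Evaluate with A=1 but B not yet valid: the gate waits.
print(domino_and(1, 1, 0, 0, 0))   # (0, 0, 1)
```

The last call shows why completion detection matters: with only one input valid, neither rail rises and the done signal stays high, so downstream control cannot proceed early.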
FIG. 2 is a block diagram of a self-timed domino system showing the interaction of control 210 and data circuits in the datapath 212. Datapath 212 comprises multiple stages, each stage comprising one or more domino gates sharing the same request signal R. The done signal D_i from stage i is computed from the done signals of each gate in the stage to indicate that the entire stage is done. The done signals from each stage of datapath 212 are communicated to control circuits (not shown) in control 210, which generate appropriate request signals as inputs to datapath 212. Control 210 comprises generalized control elements (C-elements) (not shown) corresponding to each stage of datapath 212. There are many conventional control schemes. Two schemes, proposed by Williams, PC0 and PS0, and the cycle time of each, will be discussed.
FIG. 3 shows a C-element control circuit for a conventional PC0 self-timed domino control scheme. The C-element may be used to implement control 210 of FIG. 2, and is responsible for computing request signal R for a particular stage of datapath 212. The circuit shown in FIG. 3 computes request signal R_i for datapath stage i. It comprises a generalized C-element, including NFETs 301 and 302 coupled between node 305 and ground and PFETs 303 and 304 coupled between node 305 and power. The output inverter 306 is coupled between node 305 and the output R_i. Input inverter 307 is coupled from the done signal {overscore (D)}_(i−1) of the previous stage to transistors 302 and 303. Done signal {overscore (D)}_(i+1) of the next stage is coupled to transistors 301 and 304. These connections allow stage i to evaluate when stage i+1 is done precharging, and when stage i−1 is done evaluating. Stage i may precharge when stage i+1 is done evaluating and stage i−1 is done precharging.
FIG. 4 is a portion of the flat dependency graph for the PC0 self-timed domino control scheme of FIG. 3, used to compute the cycle time of the scheme. The nodes of the graph represent the delays of particular transitions, where R is the delay of the generalized C-element computing a request, F is the delay of a stage, or functional block, in the datapath, and D is the delay of completion detection. When an up arrow or down arrow follows the letter, the delay refers specifically to the delay of the rising or falling transition. For convenience, we refer to the rising delay of F as the evaluation time, E, and the falling delay of F as the precharge time, P. Directed edges between nodes represent constraints enforced on the stage.
Edge 401 indicates that a stage must have a high request before it can evaluate. Edge 402 indicates that a stage must complete e
