Electrical computers and digital processing systems: processing – Processing control – Branching
Reexamination Certificate
2001-03-23
2004-07-20
Coleman, Eric (Department: 2183)
active
06766445
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to computer hardware, and more particularly, to a hardware structure for use in custom hardware accelerators used to expedite loops.
BACKGROUND OF THE INVENTION
Computer programs often make use of “loops” to process data. A loop consists of a network of operations that are repeatedly applied to a stream of input data to generate a stream of results. Custom integrated circuits likewise make use of such loops.
Hardware arrangements designed to accelerate the computation of loops are known to the art. In general, these hardware structures employ a plurality of function units working on different iterations of the loop to reduce the time needed to compute the loop by overlapping the computations of a number of loop iterations. The highest degree of overlap is obtained when a distinct function unit executes each operation within the body of the loop, and a new iteration is initiated on every clock cycle. In this case, there is a simple one-to-one correspondence between hardware function units and operations within the program graph as well as a simple correspondence between dataflow edges in the program graph and actual hardware datapaths. Simple one-to-one solutions are very efficient because they feature a minimal set of resources that are all busy on every cycle. Such designs, however, are often too costly. Less costly designs utilize schemes in which a plurality of function units are used to provide overlapped computations; however, the ensemble of function units only initiates a loop iteration every II cycles, where II>1.
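The trade-off between the one-to-one design (II = 1) and the less costly designs (II > 1) can be illustrated with a small calculation. The sketch below is not taken from the patent: it assumes every iteration has the same fixed latency and that a new iteration starts exactly every II cycles. The function names `start_cycles` and `total_cycles` and the `latency` parameter are hypothetical, introduced only for illustration.

```python
def start_cycles(num_iterations, ii):
    """Cycle on which each loop iteration is initiated.

    Assumption: iteration k starts exactly k * ii cycles after the first.
    """
    if ii < 1:
        raise ValueError("II must be at least 1")
    return [k * ii for k in range(num_iterations)]


def total_cycles(num_iterations, ii, latency):
    """Cycles needed to finish all iterations when they overlap.

    Assumption: every iteration takes `latency` cycles from start to finish,
    so the last iteration starts at (num_iterations - 1) * ii and ends
    `latency` cycles later.
    """
    if num_iterations == 0:
        return 0
    return start_cycles(num_iterations, ii)[-1] + latency


if __name__ == "__main__":
    # 100 iterations, each 5 cycles long.
    print(total_cycles(100, 1, 5))  # II = 1: one function unit per operation
    print(total_cycles(100, 2, 5))  # II = 2: fewer function units, slower loop
```

Under these assumptions, raising II roughly multiplies the run time by II for long loops, which is the price paid for sharing function units among operations.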
In general, one iteration of the loop generates values that are needed in subsequent computations, either in the current iteration or in a subsequent iteration. These values must be stored in some form of high-speed storage that is accessible to all of the function units that require these values. The cost of this storage represents a significant fraction of the cost of a hardware loop accelerator.
Broadly, it is the object of the present invention to provide an improved hardware accelerator architecture for accelerating loops.
It is a further object of the present invention to provide a high-speed storage system for use in hardware accelerators and the like.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.
SUMMARY OF THE INVENTION
The present invention is a computational unit for use in loop computations. The computational unit includes a function unit, a plurality of phase lines, and a storage register. The computational unit is programmed to initiate one iteration of the loop every II cycles. The function unit has a result output for outputting one computational result each cycle. There is one phase line corresponding to each of the II cycles. The storage register includes a linearly connected array of shift cells having a first shift cell. Each shift cell has an input port, an output port, a shift control port, and an OR gate. Each shift cell receives the value to be stored in the shift cell on the input port, the value being stored in response to a control signal on the shift control port. The OR gate has an output connected to the shift control port and one input for each cycle on which that shift cell is to receive the control signal, that input being connected to the phase line corresponding to that cycle. The input port of the first shift cell is connected to the result output. A plurality of such computational units can be connected together to form a loop accelerator. The accelerator includes a cross-connect circuit for coupling at least one shift cell output of one of the computational units to an input of a function unit of another of the computational units on a selected one of the II cycles.
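The behavior of the storage register described above can be approximated in software. The following is a minimal behavioral sketch, not the patented circuit: it assumes phase line p is asserted on cycles where cycle mod II equals p, that all shift cells update simultaneously from the values held before the clock edge, and that each cell's input port is fed by the previous cell's output port. The class name `StorageRegister` and its parameters are hypothetical.

```python
class StorageRegister:
    """Behavioral model of a linear array of shift cells gated by phase lines."""

    def __init__(self, ii, enable_phases):
        # enable_phases[i] lists the phases (0 .. ii-1) wired into the
        # OR gate of shift cell i; cell 0 is the first shift cell.
        for phases in enable_phases:
            if any(p < 0 or p >= ii for p in phases):
                raise ValueError("phase index out of range")
        self.ii = ii
        self.enable_phases = [frozenset(p) for p in enable_phases]
        self.cells = [None] * len(enable_phases)

    def step(self, cycle, result):
        """Advance one clock cycle; `result` is the function unit's output."""
        # Assumption: exactly one phase line is high on each cycle.
        phase_lines = [cycle % self.ii == p for p in range(self.ii)]
        # Cell 0 sees the function unit result; cell i sees cell i-1's output.
        inputs = [result] + self.cells[:-1]
        self.cells = [
            new if any(phase_lines[p] for p in phases) else old  # OR gate
            for new, old, phases in zip(inputs, self.cells, self.enable_phases)
        ]
        return list(self.cells)


if __name__ == "__main__":
    # II = 2: cell 0 shifts on both phases, cell 1 only on phase 0.
    reg = StorageRegister(2, [{0, 1}, {0}])
    for cycle, value in enumerate("abcd"):
        print(cycle, reg.step(cycle, value))
```

In this example the second cell captures only every other result, which shows how wiring a cell's OR gate to a subset of the phase lines lets it hold a value across several cycles instead of shifting on each one.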
REFERENCES:
patent: 3944989 (1976-03-01), Yamada
patent: 4078258 (1978-03-01), Lindsey et al.
patent: 4097920 (1978-06-01), Ozga
patent: 4437166 (1984-03-01), O'Brien
patent: 5958048 (1999-09-01), Babaian et al.
patent: 6226776 (2001-05-01), Panchul
patent: 0286356 (1988-10-01), None
patent: 0416513 (1991-03-01), None
patent: 0254123 (1998-01-01), None
S. Aditya et al., "Automatic Architectural Synthesis of VLIW and EPIC Processors," System Synthesis 1999, Proceedings 12th International Symposium, Nov. 1999, pp. 107-113.
Gupta Shail Aditya
Kathail Vinod Kumar
Schlansker Michael Steven
Coleman Eric
Hewlett-Packard Development Company, L.P.