Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
2001-08-21
2004-11-16
Dam, Tuan (Department: 2122)
Data processing: software development, installation, and managem
Software program development tool
Translation of code
C717S144000, C717S149000, C717S156000
Reexamination Certificate
active
06820257
ABSTRACT:
BACKGROUND OP THE INVENTION
1. Field of the Invention
The invention relates a method of compiling a source program to produce hardware.
2. Description of the Related Art
High-level synthesis (also referred to as behavioral synthesis, and described in D. Gajski, A. Wu, N. Dutt, and S. Lin. High-level synthesis: introduction to chip and system design. Kluwer Academic Publishers, 1992.), involves automatically compiling a description of the behaviour of a circuit in a language such as Behavioural VHDL, C/C++ or Java down to equivalent register transfer level (RTL for short) code (e.g. K. Nishida, K. Okada, M. Ohnishi, A. Kay, P. Boca, A. Yamada, and T. Kambe. A high-level synthesis method considering synchronous communications between threads. In Proceedings of VLD '99, 1999). RTL is data providing a low level description of the desired circuit. Compiling a source program down to RTL code typically involves subjecting it to a number of steps, as follows:
(a) First, the source program has to be turned into an internal representation. There are several such representations to choose from, but typically control and data-flow graphs (CDFGs) (see D. Gajskl , A. Wu, N. Dutt, and S. Lin. High-level synthesis: introduction to chip and system design. Kluwer Academia Publishers, 1992.) are used. The CDFG is produced by a process known as data flow analysis.
The CDFG nodes which the source-level optimiser produces (see
FIG. 4
) are abstract operations which correspond to source-level concepts such as multiplication, shifting, addition, communication and so on. For example, in
FIG. 4
, the boxes with “*” and “*” in them are nodes.
(b) The CDFG is then optimised. The optimisations, which are performed on the behavioural description, fall into two categories those known in software compiler technology and those specifically tailored towards hardware. It is prudent to apply as many optimisations as possible before one commits to a particular hardware architecture, as the size (in terms of silicon area, i.e. number of gates) of expensive operations such as multipliers and hardware shifters can be significantly reduced.
(c) Binding. This maps operations in the CDFGs to possible functional units (FUs for short) which may be required to realise the design in hardware.
During binding each abstract operation is assigned an appropriate bound operator. For example an 8-bit abstract add operation may be assigned to an 8-bit adder, it could equally well be assigned to a more general bound operator, such as a combined add/subtract unit.
(d) Scheduling. This works out in which clock cycle each of the operations in the CDFG is to be performed, subject to constraints such as clock speed and available resources.
(e) Allocation. This assigns an actual FU to each operation in the CDFG. To minimise the number of FUs used, sharing is carried out. This involves using the same FU for two different operations of the same kind. For example, if in one clock cycle there is a 3 bit add and in another a 6 bit add, a single 6 bit adder FU can be used for both operations.
During allocation each bound operator is allocated to a particular instance of a circuit component, called a functional unit. Functional units usually correspond to instantiations of library components in the RTL. There is a notion of a general circuit component, e.g. a multiplier, and the allocator assigns a bound operator to a particular instance of that component. For example, a 3×3→3 bound multiplier may be allocated to a 3×3→3 functional unit.
Thus a single functional unit in the low-level RTL might be responsible for implementing several of the abstract operations in the optimised CDFG.
(f) The final stage of high-level synthesis is to generate RTL code from the scheduled CDFG.
Referring to
FIG. 2
, steps (a) and (b) above correspond to the source-level optimisation stage, and steps (c) to (f) correspond to the high-level synthesis stage.
A logic-synthesis tool (such as Synopsys' Design Compiler) can then be used to turn the RTL into a gate-level description. From such a description, a chip
10
(FIG.
1
)can be produced. (Note: not all high-level synthesis systems treat binding, allocation and scheduling an three distinct stages some of them combine these stages.)
Some examples of high-level synthesis systems include the following; Behavioral Compiler,(see synopsys.com on the World Wide Web); FrontierDesign; (see frontierd.com on the World Wide Web); C level Design, (see cleveldesian.com on the World Wide Web); and A. Kay, “Hardware Compiler”, UK Patent Application No. 2317245, filed Sep. 12, 1996.
In the high-level languages mentioned above, the multiplication operations are typically homogeneously typed. That is to say, the types (the word “type” in this specification is used to indicate sign and bit width) of the inputs to these operators are the same as the types of the output. Compilers for such languages (e.g. the Bach compiler A. Yamada, K Nishida, R. Sakurai, A. Kay, T. Nomura, and T. Kambe. “Hardware synthesis with the Bach system”. In ISCAS '99, 1999) insert type casts automatically so that multiplications are homogenous. Users are encouraged to use homogenous operators rather than to optimise their code by hand for performance. This is because homogeneity reduces the complexity of the language, leading to designs that are more likely to be correct. However, the cost one pays is that the design may not be efficient.
FIG. 2
shows the two stages of high-level hardware design to be source-level optimisation followed by high level synthesis. This is then followed by low-level synthesis. Typically some kind of data-flow representation is used between the source level and the first synthesis level, for example CDFG.
A multiplier takes two integers as its input, and returns their product as its output . Multipliers have different implementations depending on whether inputs and output are to be regarded as signed or unsigned integers. There are 8 different cases in all. To simplify matters we consider here a simpler model where all inputs and outputs are unsigned. The specific embodiments described below cover more cases.
In previous tools the “alphabet” of possible multipliers available to be synthesised was always of the form (a) r×r→r or (b) r×r→2r where p×q→r stands for a multiplier whose inputs are of width p and q and whose output is of width r. We call these types (a) and (b) homogeneous multipliers.
Non-homogeneous multipliers can be of the form p×q→r where p and q may be different, and unrelated in size to r.
High-level synthesis systems apply a number of optimizations to designs to reduce their area. One such optimisation is replacing power-of-two multiplications by left-shift operations. (A left-shift operation is written using the symbol <<.) For example: x*8=×<<3. This is known as strength reduction (see G. Micheli. “Synthesis and Optimization of Digital Circuits.” McGraw Hill, 1994).
Strength reduction also applies to multipliers where one of the operands is expressible as the sum or difference of powers of 2. In such cases, the multiplier can be replaced by a sum or difference of two shifts. For example: x*7=x <<3−x, (Note; x=x<<0.)
Previous optimisers may perform strength reduction to completely remove a multiplier, but always generate a homogeneous representation to pass to the high-level synthesis stage. Typically, any particular tool will use type (a) or type (b) but not both.
The invention provides a method of compiling a source program to produce hardware, including the steps of:
(a) carrying out data flow analysts of the source program to produce a data flow representation of the source program, which data flow representation comprises a number of multipliers each arranged to accept first and second input arguments having first and second input bit widths respectively, and to produce an output having an output bit width; and
(b) optimising the data flow representation so tha
Boca Paul Philip
Kay Andrew
Dam Tuan
Renner , Otto, Boisselle & Sklar, LLP
Sharp Kabushiki Kaisha
Vo Ted T.
LandOfFree
Optimized production of hardware from source programs... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Optimized production of hardware from source programs..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Optimized production of hardware from source programs... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3360035