Optimized production of hardware from source programs...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Optimized production of hardware from source programs... Optimized production of hardware from source programs...

: 2001-08-21
: 2004-11-16
: Dam, Tuan (Department: 2122)
: Data processing: software development, installation, and managem
: Software program development tool
: Translation of code

: C717S144000, C717S149000, C717S156000
: Reexamination Certificate
: active
: 06820257
: ABSTRACT:

BACKGROUND OP THE INVENTION
1. Field of the Invention
The invention relates a method of compiling a source program to produce hardware.
2. Description of the Related Art
High-level synthesis (also referred to as behavioral synthesis, and described in D. Gajski, A. Wu, N. Dutt, and S. Lin. High-level synthesis: introduction to chip and system design. Kluwer Academic Publishers, 1992.), involves automatically compiling a description of the behaviour of a circuit in a language such as Behavioural VHDL, C/C++ or Java down to equivalent register transfer level (RTL for short) code (e.g. K. Nishida, K. Okada, M. Ohnishi, A. Kay, P. Boca, A. Yamada, and T. Kambe. A high-level synthesis method considering synchronous communications between threads. In Proceedings of VLD '99, 1999). RTL is data providing a low level description of the desired circuit. Compiling a source program down to RTL code typically involves subjecting it to a number of steps, as follows:
(a) First, the source program has to be turned into an internal representation. There are several such representations to choose from, but typically control and data-flow graphs (CDFGs) (see D. Gajskl , A. Wu, N. Dutt, and S. Lin. High-level synthesis: introduction to chip and system design. Kluwer Academia Publishers, 1992.) are used. The CDFG is produced by a process known as data flow analysis.
The CDFG nodes which the source-level optimiser produces (see
FIG. 4
) are abstract operations which correspond to source-level concepts such as multiplication, shifting, addition, communication and so on. For example, in
FIG. 4
, the boxes with “*” and “*” in them are nodes.
(b) The CDFG is then optimised. The optimisations, which are performed on the behavioural description, fall into two categories those known in software compiler technology and those specifically tailored towards hardware. It is prudent to apply as many optimisations as possible before one commits to a particular hardware architecture, as the size (in terms of silicon area, i.e. number of gates) of expensive operations such as multipliers and hardware shifters can be significantly reduced.
(c) Binding. This maps operations in the CDFGs to possible functional units (FUs for short) which may be required to realise the design in hardware.
During binding each abstract operation is assigned an appropriate bound operator. For example an 8-bit abstract add operation may be assigned to an 8-bit adder, it could equally well be assigned to a more general bound operator, such as a combined add/subtract unit.
(d) Scheduling. This works out in which clock cycle each of the operations in the CDFG is to be performed, subject to constraints such as clock speed and available resources.
(e) Allocation. This assigns an actual FU to each operation in the CDFG. To minimise the number of FUs used, sharing is carried out. This involves using the same FU for two different operations of the same kind. For example, if in one clock cycle there is a 3 bit add and in another a 6 bit add, a single 6 bit adder FU can be used for both operations.
During allocation each bound operator is allocated to a particular instance of a circuit component, called a functional unit. Functional units usually correspond to instantiations of library components in the RTL. There is a notion of a general circuit component, e.g. a multiplier, and the allocator assigns a bound operator to a particular instance of that component. For example, a 3×3→3 bound multiplier may be allocated to a 3×3→3 functional unit.
Thus a single functional unit in the low-level RTL might be responsible for implementing several of the abstract operations in the optimised CDFG.
(f) The final stage of high-level synthesis is to generate RTL code from the scheduled CDFG.
Referring to
FIG. 2
, steps (a) and (b) above correspond to the source-level optimisation stage, and steps (c) to (f) correspond to the high-level synthesis stage.
A logic-synthesis tool (such as Synopsys' Design Compiler) can then be used to turn the RTL into a gate-level description. From such a description, a chip
10
(FIG.
1
)can be produced. (Note: not all high-level synthesis systems treat binding, allocation and scheduling an three distinct stages some of them combine these stages.)
Some examples of high-level synthesis systems include the following; Behavioral Compiler,(see synopsys.com on the World Wide Web); FrontierDesign; (see frontierd.com on the World Wide Web); C level Design, (see cleveldesian.com on the World Wide Web); and A. Kay, “Hardware Compiler”, UK Patent Application No. 2317245, filed Sep. 12, 1996.
In the high-level languages mentioned above, the multiplication operations are typically homogeneously typed. That is to say, the types (the word “type” in this specification is used to indicate sign and bit width) of the inputs to these operators are the same as the types of the output. Compilers for such languages (e.g. the Bach compiler A. Yamada, K Nishida, R. Sakurai, A. Kay, T. Nomura, and T. Kambe. “Hardware synthesis with the Bach system”. In ISCAS '99, 1999) insert type casts automatically so that multiplications are homogenous. Users are encouraged to use homogenous operators rather than to optimise their code by hand for performance. This is because homogeneity reduces the complexity of the language, leading to designs that are more likely to be correct. However, the cost one pays is that the design may not be efficient.
FIG. 2
shows the two stages of high-level hardware design to be source-level optimisation followed by high level synthesis. This is then followed by low-level synthesis. Typically some kind of data-flow representation is used between the source level and the first synthesis level, for example CDFG.
A multiplier takes two integers as its input, and returns their product as its output . Multipliers have different implementations depending on whether inputs and output are to be regarded as signed or unsigned integers. There are 8 different cases in all. To simplify matters we consider here a simpler model where all inputs and outputs are unsigned. The specific embodiments described below cover more cases.
In previous tools the “alphabet” of possible multipliers available to be synthesised was always of the form (a) r×r→r or (b) r×r→2r where p×q→r stands for a multiplier whose inputs are of width p and q and whose output is of width r. We call these types (a) and (b) homogeneous multipliers.
Non-homogeneous multipliers can be of the form p×q→r where p and q may be different, and unrelated in size to r.
High-level synthesis systems apply a number of optimizations to designs to reduce their area. One such optimisation is replacing power-of-two multiplications by left-shift operations. (A left-shift operation is written using the symbol <<.) For example: x*8=×<<3. This is known as strength reduction (see G. Micheli. “Synthesis and Optimization of Digital Circuits.” McGraw Hill, 1994).
Strength reduction also applies to multipliers where one of the operands is expressible as the sum or difference of powers of 2. In such cases, the multiplier can be replaced by a sum or difference of two shifts. For example: x*7=x <<3−x, (Note; x=x<<0.)
Previous optimisers may perform strength reduction to completely remove a multiplier, but always generate a homogeneous representation to pass to the high-level synthesis stage. Typically, any particular tool will use type (a) or type (b) but not both.
The invention provides a method of compiling a source program to produce hardware, including the steps of:
(a) carrying out data flow analysts of the source program to produce a data flow representation of the source program, which data flow representation comprises a number of multipliers each arranged to accept first and second input arguments having first and second input bit widths respectively, and to produce an output having an output bit width; and
(b) optimising the data flow representation so tha

Affiliated with

Boca Paul Philip

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Kay Andrew

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

Dam Tuan

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

Renner , Otto, Boisselle & Sklar, LLP

Law Firm

[ 0.00 ] – not rated yet Voters 0 Comments 0

Sharp Kabushiki Kaisha

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

Vo Ted T.

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Optimized production of hardware from source programs... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Optimized production of hardware from source programs..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Optimized production of hardware from source programs... will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-3360035

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure