Computer-aided design and analysis of circuits and semiconductor – Nanotechnology related integrated circuit design
Reexamination Certificate
1999-07-23
2002-07-16
Niebling, John F. (Department: 2812)
Computer-aided design and analysis of circuits and semiconductor
Nanotechnology related integrated circuit design
C716S030000
Reexamination Certificate
active
06421809
ABSTRACT:
FIELD OF THE INVENTION
The invention relates to methods for designing essentially digital devices, and focuses on memory related design issues, more in particular with respect to power consumption of said digital devices.
BACKGROUND OF THE INVENTION
An essentially digital device comprises at least of a memory organization (an amount of memories with their sizes and an interconnection pattern); and registers. Such a memory organization is determined during the design process of said digital device. The operation of an essentially digital system can essentially be described as a set of data access operations or instructions on data structures or variables, being stored in said memories.
In [L. Stok, Data path synthesis, integration, the VLSI journal, Vol.1.18, pp.1-71, June 1994.] register allocation, starting from a filly scheduled flow graph (thus ordered data access operations or instructions are used as input), are resented. Said allocation techniques are scalar oriented. Many of these techniques construct a scalar conflict or compatibility graph and solve the problem using graph coloring or clique partitioning. This conflict graph is fully determined by the schedule which is fixed before. This means that no effort is spent to come up with an optimal conflict graph and thus the potential optimization by reconsidering the schedule is. not exploited. Moreover only register allocation is addressed and not memories.
In the less explored domain of memory allocation and assignment for hardware systems, the current techniques start from a given schedule [L, Ramachandran, D. Gajski, V. Chaiyakul, An algorithm for array variable clustering, Proceedings European Design and Test Conference, pp.262-266, Paris, March. 1994.],[P. Lippens, J. van Meerbergen, W. Verhaegh, A. van der Werf, Allocation of Multiport Memories for Hierarchical Data Streams, Proceedings IEEE International Conference on Computer-Aided Design, pp.728-735, Santa Clara, November 1993.],[O. Sentieys, D. Chillet, J. P. Diguet, J. Philippe, Memory module selection for high-level synthesis, Proceedings IEEE workshop on VLSI signal processing, Monterey Calif., Oct. 1996.] or perform first a bandwidth estimation step [F. Balasa, F. Catthoor, H. DeMan, Dataflow-driven memory allocation for multi-dimensional processing systems”, Proceedings IEEE International Conference on Computer Aided Design}, San Jose Calif., November 1994.] which is a kind of crude ordering that does not really optimize the conflict graph either. These techniques have to operate on groups of signals instead of on scalars to keep the complexity acceptable.
In the parallel compiler domain [M. Al-Mouhamed, S. Seiden, A Heuristic Storage for Minimizing Access Time of Arbitrary Data Paterns, IEEE Transactions on Parallel and Distributed Systems, Vol.8, No.4, pp.441-447, Apr. 1997.] proposes a technique to partition arrays into groups of data that have to be assigned to different memories such that they can be accessed simultaneously for an SIMD architecture. They combine the constraints of a number of given access patterns into a single linear address transformation that calculates for every data element the memory in which it should be stored to minimize the total access time. This technique allows to avoid the allocation of multi-port memories for storing data with self-conflicts, by explicitly splitting arrays into smaller arrays that can be assigned to single port memories. However said method does not exploit all optimization opportunities for instance by rescheduling data access instructions.
In the scheduling domain, the techniques optimizing for the number of resources given the cycle budget mostly operate on the scalar level. Many of these techniques try to reduce the memory related cost by estimating the required number of registers for a given schedule. Only few of them try to reduce the required memory bandwidth, which they do by minimizing the number of simultaneous data accesses. They do not take into account which data is being accessed simultaneously. Also no real effort is spent to optimize the data access conflict graphs such that subsequent register/memory allocation tasks can do a better job.
[S. Pinter, Register Allocation with Instruction Scheduling: a New Approach, ACM SIGPLAN Notices, Vol.28, pp.248-25, June 1993.] optimizes a conflict graph in the context of scalar register allocation by removing weighted edges in a coloring problem prior to scheduling. However, the conflicts in their initial conflict graph are determined by the sequential ordering of the input code. Also this idea was not applied to groups of scalars.
The Improved Force Directed Scheduling (IFDS) [W. Verhaegh, P. Lippens, E. Aarts, J. Korst, J. van Meerbergen, A. van der Werf, Improved Force-Directed Scheduling in High-Throughput Digital Signal Processing, IEEE Transactions on CAD and Systems, Vol.14, No.8, August 1995.] shows a method wherein scheduling intervals are gradually reduced until the desired result is obtained. The cost function used to determine which scheduling interval has to be reduced at each iteration only takes the number of parallel data accesses to reduce the required memory bandwidth into account. (I)FDS does not take into account which data is being accessed. Balancing the number of simultaneous data accesses is a local optimization which can be very bad globally. In IFDS all data is treated equally, although in practice some simultaneous data accesses are more expensive in terms of memory cost than other. Also the required number of memories cannot be estimated accurately by looking locally only, as is done in IFDS, because all conflicts have to be considered for this.
SUMMARY OF THE INVENTION
In a first aspect of the invention a method and a design system for determining an optimized memory organization of an essentially digital device is presented. The design system may be a suitable computer such as a workstation for carrying out the method. The design system is adapted to carry out each of the method steps. Said method and system exploit a representation, comprising at least data access instructions on groups of scalar signals, of the functionality of said digital device, which is under construction. As the method and system focuses on data transfer and storage, it is sufficient to have a control flow graph representation, although the method is not limited to such representation. For said data access instructions the scheduling intervals are optimized, meaning modified, in order to optimize a certain optimization criterion, with the restriction that the execution of said functionality with said digital device is within a predetermined cycle budget or timing. The method and design system according to the present invention provides sufficient memory bandwidth (parallel memory ports) such that the application can be scheduled within the cycle budget during further digital device design steps. The method and design system according to the present invention solves a Storage-Bandwidth Optimization (SBO) problem. The method and system determines for which data parallel access capabilities should be provided such that the cycle budget can still be met with minimum bandwidth requirements on the memory architecture. These requirements are expressed as conflicts in a conflict graph. Access conflicts may be described as single or intra-cycle conflicts as conflicts not in the same cycle are not necessarily considered as conflicts in accordance with the present invention, i.e. the lifetime of the data is not considered in a first approximation. In said evaluation criterion the conflict cost between basic groups and self-conflicts of basic groups can be weighted separately. The task of SBO is to come up with an optimized conflict graph, allowing the memory allocation and assignment tasks to come up with a cheaper memory architecture with fewer memories and ports. In the method and design system, optimized scheduling intervals are determined by optimizing an extended co
Catthoor Francky
De Man Hugo
Wuytack Sven
Interuniversitaire Micro-Elektronica Centrum (IMEC VZW)
Knobbe Martens Olson & Bear LLP
Niebling John F.
Whitmore Stacy
LandOfFree
Method for determining a storage bandwidth optimized memory... does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for determining a storage bandwidth optimized memory..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for determining a storage bandwidth optimized memory... will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-2905750