Method for determining an optimized data organization

Electrical computers and digital processing systems: memory – Storage accessing and control – Control technique

Reexamination Certificate

Rate now

[ 0.00 ] – not rated yet Voters 0 Comments 0

Details Method for determining an optimized data organization Method for determining an optimized data organization

: 1999-07-23
: 2001-11-27
: Nguyen, Hiep T. (Department: 2187)
: Electrical computers and digital processing systems: memory
: Storage accessing and control
: Control technique

: C711S118000
: Reexamination Certificate
: active
: 06324629
: ABSTRACT:

FIELD OF THE INVENTION
The invention relates to methods for determining a data organization in an essentially digital system with caches.
BACKGROUND OF THE INVENTION
Many software compilers try to come up with the best array layout in memory for optimal cache performance [C. Ancourt, D. Barthou, C. Guettier, F. Irigoin, B. Jeannet, J. Jourdan, J. Mattioli, Automatic data mapping of signal processing applications, Proc. ntnl. Conf. on Applic.-Spec. Array Processors}, Zurich, Switzerland, pp.350-362, July 1997. ], [M. Cierniak, W. Li, Unifying data and control transformations for distributed shared-memory machines, Proc. of the SIGPLAN'95 Conf. on Programming Language Design and Implementation, La Jolla, pp.205-217, February 1995. ], [J. Z. Fang, M. Lu, An iteration partition approach for cache or local memory thrashing on parallel processing, IEEE Trans. On Computers, Vol.C-42, No.5, pp.529-546, May 1993. ] for approach based on compile-time analysis) but they do not try to directly reduce the storage requirements as memory is allocated based on the available variable declarations. Said compilers choose for each array as a whole a column- or row-oriented storage order and potentially some offset, for optimizing the cache performance. Moreover, the number of transfers to large memories or the amount of cache misses is not fully minimized this way. Consequently, there is a big loss in power consumption and overhead cycles (due to off-chip access).
Most of the work related to efficient utilization of caches has been directed towards optimization of the throughput by means of (local) loop nest transformations to improve locality [J. Anderson, S. Amarasinghe, M. Lam, Data and computation transformations for multiprocessors, in 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.39-50, August 1995. ],[N. Manjikian and T. Abdelrahman, Array data layout for reduction of cache conflicts, In Proc. of 8th Int'l conference on parallel and distributed computing systems, September, 1995. ] The most commonly known loop transformation for improving cache usage is loop locking [M. Lam, E. Rothberg and M. Wolf, The cache performance and optimizations of blocked algorithms, In Proc. ASPLOS-IV, pp.63-74, Santa Clara, Calif., 1991.]. Such approaches are focussed on changing the execution order of the application for cache usage optimization.
Some work [P. R. Panda, N. D. Dutt and A. Nicolau, Memory data organization for improved cache performance in embedded processor applications], In Proc. ISSS-96, pp.90-95, La Jolla, Calif., November 1996.] has been reported on the data organization for improved cache performance in embedded processors but they do not take into account a power oriented model. Instead they try to reduce cache conflicts in order to increase the throughput. Such approach is based on merging or clustering arrays found in the application in order to optimize cache performance. K. Pettis and R. C. Hansen, Profile guided code positioning, In ACM SIGPLAN'90 Conference on Programming Language and Design Implementation, pp. 16-27, June 1990. propose basic-block layout heuristics (they pack frequently used looping sequences) as well as procedure layout (they lay together procedures calling each other to minimize cache interference risks). But they still do explore this type of strategy for every individual data. Besides [D. C. Burger, J. R. Goodman and A. Kagi, The declining effectiveness of dynamic caching for general purpose mutliprocessor, Technical Report, University of Wisconsin, Madison, 1261, 1995] and [D. N. Truong, F. Bodin and A. Seznec, Accurate data distribution into blocks may boost cache performance, Technical Report, RISA, Rennes (France), 1996. ], very little has been done to measure the impact of data organization (or layout) on the cache performance. Moreover, said approaches deal with dynamic caching and do not try to influence data organization statically. Hence, the actual cycle time (or the number of iterations) taken to obtain a good data layout is quite large.
One can conclude that in the current state of the art compile-time system-level optimization strategies focussed on cache utilization, aiming at low power, based on static data organization of the main memory and fully exploiting the knowledge of cache parameters and program characteristics, is not found.
AIM OF THE INVENTION
An aim of the invention is to provide a method for determining an optimized data organization in an essentially digital system with caches, such that an optimal cache performance is obtained.
SUMMARY OF THE INVENTION
In a first aspect of the invention a method and a design system for determining an optimized data organization in at least one first memory of an essentially digital system is proposed. Said digital system comprises at least of said first memory and a second memory. Said second memory acts as cache for said first memory. Said optimized data organization is characteristic for an application, to be executed by said digital system. In said method and said design system first a representation of said application is loaded. Said representation comprises at least data access instructions on arrays. Said arrays are partitioned in a plurality (one or more) subarrays. Said subarrays are distributed over said first memory such that an optimal cache performance of said second memory is obtained. Said distribution defines said optimized data organization. With optimal cache performance is meant that the least number of conflict misses is obtained while executing said application on said digital system. Each of said subarrays is located in said first memory at locations with subsequent (contiguous) addresses. Said second memory or cache is characterized by the amount of lines and the size of said lines. Said lines are grouped in sets. Said grouping is characterized by the amount of lines per set, and defines the associativitiy of said cache memory. The product of said amount of lines and the size of said lines defines the cache size. As typically another data organization different from the initial data organization, defined by said representation, is obtained, it can be said that the method modifies said representation, and more in particular the storage order of said arrays in said representation is changed. This should not be confused with modifying the execution order of the data access instructions.
In an embodiment of the invention a condition on said subarray distribution is presented. As the transfer of data from said first to said second memory is typically determined by a modulo operation on the base addresses of said data in the first memory, an optimal cache performance can be obtained by organizing said subarrays in said first memory such that each of the subarrays within an address range being equal to the cache size originate from different arrays.
In some embodiments of the invention methods for determining the size of each subarray is proposed. Said size of each subarray can be proportional with the size of the related array. The proportionality factor is defined by said cache size divided by the sum of the sizes of said arrays. Again as the transfer of data from said first to said second memory is typically determined by a modulo operation on the base addresses of said data in the first memory, the cache performance is optimized by choosing the size of each of said subarrays to be a multiple of said line sizes. The propotionality condition and the multiplicity condition can be merged by stating that the size of each of said subarrays is the nearest rounded number of the size of the related array multiplied with the cache size divided by the sum of the sizes of said arrays. Said rounded number is a multiple of said line sizes. The merging of proportionality and multiplicity results in a possible initial set of sizes of said subarrays. However, further optimization of said sizes can be obtained by determining said sizes via optimi

Affiliated with

Catthoor Francky

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Danckaert Koen

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

De Man Hugo

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Kulkarni Chidamber

Inventor

[ 0.00 ] – not rated yet Voters 0 Comments 0

Also associated with

CoWare N.V., Frontier Design BYBA

Corporate Assignee

[ 0.00 ] – not rated yet Voters 0 Comments 0

Knobbe Martens Olson & Bear LLP

Law Firm

[ 0.00 ] – not rated yet Voters 0 Comments 0

Nguyen Hiep T.

Examiner

[ 0.00 ] – not rated yet Voters 0 Comments 0

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

Method for determining an optimized data organization does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Method for determining an optimized data organization, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Method for determining an optimized data organization will most certainly appreciate the feedback.

Rate now

Comments { 0 }

Profile ID: LFUS-PAI-O-2584601

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.

Canada

Charities
Companies
MP Candidates
Patents
Employee Salary Disclosure

World

Places of the World
Scientific Papers

United States

Banks
Companies
Counties
Patents
Employee Salary Disclosure