System and method for automatically optimizing software...

Electrical computers and digital processing systems: memory – Storage accessing and control – Memory configuring

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C711S005000, C711S156000, C709S241000, C709S241000

Reexamination Certificate

active

06347366

ABSTRACT:

BACKGROUND OF THE INVENTION
A. Field of the Invention
The present invention relates generally to optimizing memory allocation, and more particularly to using evolutionary algorithms to automatically optimize the memory allocation of a software application in a multiple memory architecture.
B. Description of the Related Art
Allocating application processing to various processors and memory banks in complex embedded multiprocessor systems to achieve optimum performance is an extremely difficult problem. When developing a real-time application on a multi-processor non-uniform memory access (NUMA) system, there are a number of factors that can affect application execution speed including processor types, memory types, concurrent utilization of memory and bus resources, and hierarchical interconnection schemes. These factors all affect application execution in ways that are both difficult to model and to predict. Even when these effects can be measured, it can be difficult for a programmer to use this information to minimize the combined effect of all the factors on execution time.
FIG. 1
shows an example of a multi-processor NUMA system
100
, such as a VME based quad Texas Instruments C
40
Digital Signal Processor (DSP) board manufactured by Mizar Inc. (Model MZ7772), on which an application may be run. System
100
includes a plurality of ALU processor cards
110
each having an ALU
112
, a small and very fast on chip static RAM (SRAM)
114
, a private, small and fast memory
116
and a shared, small and fast memory
118
. A buffer
120
interconnects each of the cards
110
to a local bus
130
. A large, slow shared memory
140
, such as a global SRAM or hard disk drive, is coupled directly to local bus
130
. System
100
also includes an asynchronous interface
150
, which couples local bus
130
with a VME bus
160
to which a very large and slow memory
170
is connected.
As shown in
FIG. 1
, the memory hierarchy in system
100
includes four distinct banks of memory, each with different performance levels including small and very fast on chip SRAM
114
, small but not as fast memories
116
and
118
, large and slow shared memory
140
, and very large slow globally shared memory
170
. In an architecture such as system
100
in
FIG. 1
, the choices a programmer makes in selecting where data and intermediate results are stored can have a significant impact on program performance, often by an order of magnitude or more. DSPs can use, for example, the Harvard bus architecture, which has multiple buses that allow a CPU to retrieve data from two different memory banks in parallel. Furthermore, instructions are used that can fetch two pieces of data and perform a mathematical calculation all in one clock cycle.
In a simple implementation of a DSP algorithm, only one bank of memory is used for both the program code and data. This implementation exhibits poor performance due to excessive contention for the single memory bank. The situation can be exacerbated by the use of on-chip direct memory access (DMA) engines, which are commonly used to move input and output data between processors and I/O subsystems and can cause further contention on the various data buses available to the system.
Performance can be improved by separating data that is retrieved in parallel into different banks. Another option is to place the buffers that require the greatest amount of access into the fastest memory banks, although these fast memory banks may not be large enough to allow this allocation in some memory-restricted architectures. Thus, the buffer allocation problem can be viewed as a combination of an optimal packing problem and a contention reduction problem.
The manual approach to optimizing the data buffer allocation is commonly done by trying various placements of buffers to different memory banks and measuring the resulting execution time of the software. Many iterations are usually necessary to optimize performance. The difficulty of the problem rapidly increases with the number of buffers that need to be allocated. With many modem applications requiring hundreds of buffers, the manual approach is beyond what can be effectively solved by a programmer, requiring the use of sub-optimal techniques such as local optimization of key execution loops, rather than the entire program.
SUMMARY OF THE INVENTION
A method consistent with the present invention for optimizing the allocation of data buffers referenced by a software program among a plurality of memory banks in a target system generates randomly a population of trial allocations of the data buffers among the plurality of memory banks, determines a fitness score for each of the trial allocations, creates a new trial allocation based on at least one of the trial allocations in the existing population, repeats the determining and creating steps until the occurrence of a predetermined event, and identifies the trial allocation having the best fitness score.
Both the foregoing general description and the following detailed description provide examples and explanations only. They do not restrict the claimed invention.


REFERENCES:
patent: 5319781 (1994-06-01), Syswerda
patent: 5586285 (1996-12-01), Hasbun et al.
patent: 5682328 (1997-10-01), Roeber et al.
D. Cousins et al., “Multiprocessor Event Collection and Analysis (MECA) System and Embedded Genetic Allocator (EGA) for Advanced Embedded Architectures,” DARPA AO F277, pp. 1-35, Oct. 8, 1997.
“Multiprocessor Event Collection and Analysis System and Embedded Genetic Allocator for Advanced Embedded Architectures,” Award/Contract, F30602-97-C-0296, Sep. 9, 1997.
“Multiprocessor Event Collection and Analysis System and Embedded Genetic Allocator for Advanced Embedded Architectures,” Proposal Abstract, Dec. 1996.
“Multiprocessor Event Collection and Analysis (MECA) System and Embedded Genetic Allocator (EGA) for Advanced Embedded Architectures,” Response to DARPA/ITO BAA 97-06, Feb. 7, 1997.
D. Cousins et al., “The Embedded Genetic Allocator—A System to Automatically Optimize the Use of Memory Resources in High Performance, Scalable Computing Systems,” Conference Proceedings of the 1998 IEEE International Conference on Systems, Man and Cybernetics, Oct. 11-14, 1998.

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for automatically optimizing software... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method for automatically optimizing software..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for automatically optimizing software... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-2953188

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.