System and method for automatically optimizing heterogenous...

Data processing: software development – installation – and managem – Software program development tool – Translation of code

Reexamination Certificate

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

C717S143000, C709S241000

Reexamination Certificate

active

06539542

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to a system and method for optimizing software performance, and more particularly to using evolutionary algorithms to automatically allocate application processing in complex multiprocessor environments.
DESCRIPTION OF RELATED ART
Allocating application processing among the various processors in complex multiprocessor systems to achieve optimum performance is an extremely difficult problem. When developing applications for multi-processing systems, the software design is driven both by algorithmic considerations and by system performance considerations. With modern multi-stage interconnect systems (e.g., Mercury's RACEWAY, Sky's SkyChannel, and the Intel Paragon) the effect of inter-processor data routing on system performance is often difficult, if not impossible to predict, particularly for large systems.
As an example, consider the Simple Signal Processing (SSP) application
100
of FIG.
1
. SSP application
100
includes a plurality of “matched” filters
110
, a plurality of 2-D Fast Fourier Transform (FFT) beamformers
120
, and a plurality of detectors
130
. The output of every processor in each stage is broadcast to all processors in the next stage via data broadcasting functions
140
and
150
. Data from M channels are fed through a simple time domain matched filter
110
using an overlap-add FFT technique (FFT length N). The overlap-add technique is a conventional signal processing technique as discussed in
Digital Signal Processing
by Alan V. Oppenheim and Ronald W. Schafer, Prentis Hall, Copyright 1975, incorporated herein by reference. All channels of each sample time are then fed or broadcast, via data broadcasting function
140
to a simple 2-D FFT beamformer
120
, producing B beams. The beam data is then broadcast via data broadcasting function
150
to a simple threshold detector
130
that sets thresholds in each beam by examining adjacent beams.
Mapping this algorithm onto P processors involves the consideration of several tradeoffs, such as the size of the individual tasks, the relative speed of the processors, the interconnect, and the location of processors on the interconnect topology. Task size is chosen to maximize processor usage and minimize idle time. The ability to keep these processors busy is driven by the ability of the interconnect to move input and output data from functional units operating on different processors. Balancing I/O requirements and task processing size requires balancing utilization of the interconnection bandwidth and the processor execution speed. Proper placement of functional components on the multiprocessor can reduce the amount of communication necessary by confining intermediate results to a single processor. Many modem multi-processor connection circuitry divide the processor pool into sub-groups in such a way that communication with processors that are closely located is preferable (in terms of contention and speed) to communication with more remote processors. This affects function placement by imposing a preferred assignment so that functions that must share data are located close to one another.
Using the SSP application
100
as an example to illustrate the tradeoffs inherent in the processor placement, divide the filter processing into M parallel filters. Similarly, divide the beamformers
120
into S tasks, each generating all the beams for one time sample. Finally, break the detectors
130
into B parallel tasks, again one per beam. Thus, we would have M+S+B=T parallel tasks.
Several problems arise with this assignment scheme. First, the samples from each of the M matched filter
110
tasks must be routed to all of the B beamformer
120
tasks. Additionally, since many multiprocessor connection circuits are most efficient transferring contiguous blocks of data (as opposed to single data words), for certain systems it will be far more efficient to have the beamformer
120
tasks work on a contiguous block of samples (as opposed to single time slices). To simplify programming, this block size is often tied to the size of the FFT used for the matched filter
110
convolution (usually N divided by a power two), and is chosen to correspond to an efficient transfer block size. Therefore, the number of parallel tasks for beamforming becomes N/2
i
, where i is a design parameter. Finally, the output of the beamformer
120
tasks must each be sent to some of the B detectors
130
, since these detectors use neighboring beams for threshold setting (in the most extreme case, all the beams would be used for each detection). Again, for performance and programming simplicity, this is best done in blocks.
An engineer with some knowledge of the system can make reasonable estimates for allocating application processing to achieve optimum performance given these processing parameters. However, optimum selections of these values will vary from machine to machine, and usually have to be determined by manually timing the tasks and hand-tuning the values. The difficulty of the problem rapidly increases with the number of processors that need to be allocated. With many modem applications requiring hundreds of processors, the manual approach is beyond the capability of even the most skilled engineer.
SUMMARY OF THE INVENTION
A method consistent with the present invention optimizes the allocation of processes among a plurality of processors in a target system. In operation, the method first generates a random population of trial allocations of the processes among the plurality of processors in a target system. Next, a fitness score is determined for each of the trial allocations. Then, a new trial allocation is created based on at least one of the trial allocations in the existing population. Finally, the system identifies the trial allocation having the best score.
Additional desires, features and advantages of the invention are set forth in the following description, apparent from the description, or may be learned by practicing the invention.
Both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.


REFERENCES:
patent: 4590555 (1986-05-01), Bourrez
patent: 5103393 (1992-04-01), Harris et al.
patent: 5283897 (1994-02-01), Georgiadis et al.
patent: 5301324 (1994-04-01), Dewey et al.
patent: 5452461 (1995-09-01), Umekita et al.
patent: 5590063 (1996-12-01), Golio et al.
patent: 5745757 (1998-04-01), Lecourtier
patent: 5822570 (1998-10-01), Lacey
patent: 5838968 (1998-11-01), Culbert
patent: 5848402 (1998-12-01), Pao et al.
patent: 6009507 (1999-12-01), Brooks et al.
patent: 6078945 (2000-06-01), Hinsley
patent: 6112023 (2000-08-01), Dave et al.
patent: 6185601 (2001-02-01), Wolff
patent: 6205465 (2001-03-01), Shoening et al.
patent: 6321317 (2001-11-01), Boronsky et al.
patent: 6347366 (2002-02-01), Cousins
Oppenheim, Alan V., et al., “Digital Signal Processing,” Prentice-Hall, Inc. (1975).

LandOfFree

Say what you really think

Search LandOfFree.com for the USA inventors and patents. Rate them and share your experience with other people.

Rating

System and method for automatically optimizing heterogenous... does not yet have a rating. At this time, there are no reviews or comments for this patent.

If you have personal experience with System and method for automatically optimizing heterogenous..., we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and System and method for automatically optimizing heterogenous... will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFUS-PAI-O-3045291

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.