Method for automatic parallelization of software

Data processing: software development, installation, and management – Software program development tool – Translation of code


Details

Type: Reexamination Certificate
Status: active
Patent number: 06708331

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to parallel programs, and specifically, to a method for automatic parallelization of certain programs.
BACKGROUND OF THE INVENTION
Parallel programming is a difficult and expensive process. While tools have been developed to assist with parallelization, these tools still require that the user have a deep knowledge of parallelization, a topic beyond the grasp of most users who could benefit from it, especially when the programs at issue are inherently parallel. Furthermore, the target platforms for these programs are often specialty multiprocessing computers.
A program which is inherently parallel implements algorithms that lend themselves to straightforward parallel implementation. This type of program does not require use of a specialty multiprocessing computer, and can be effectively implemented on a network of workstations (NOW) instead. In parallelizing programs for a NOW, a number of difficulties arise: communication is not cheap, faults and slowdowns are common, and machines may join and leave the computation at any time.
The most straightforward way to provide for communication in a parallel program is through the use of a shared memory. This is easy to implement on a multiprocessor, because every memory location has a unique id in the global address space. On a NOW, a software layer called a Distributed Shared Memory (DSM) can create an illusion of a physical global address space.
Further, a method known as Two-Phase Idempotent Eager Scheduling (TIES) was created to address the problems of slowdowns and failures, as well as machines joining and leaving the computation, when parallelizing on a NOW. The traditional method of parallelization called for division of a sequential application into alternating serial and parallel steps. Each parallel step was composed of a number of independent threads of control that could be executed simultaneously. While serial steps would be executed locally, individual threads of parallel steps were scheduled on available processors; when the last thread terminated, the parallel step was finished. The TIES system instead simultaneously assigns several copies of each individual thread to several machines, which guarantees termination as long as at least one copy continues to make progress. When the first copy has successfully reported its results, the other machines are freed for other tasks. In order to guarantee correctness, the starting memory state must be identical for each copy, and thus this original memory state is preserved until the parallel step has successfully concluded. Not only does this address the issues of failure, slowdown, and machines joining and leaving the computation, but the TIES scheme also minimizes communication during the parallel step, because the complete memory image for each thread is available at the start of the parallel step.
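To make the eager-scheduling idea concrete, the following is a minimal single-machine sketch in C, assuming a POSIX system; run_thread and the copy count are hypothetical stand-ins, and the patent's runtime targets a NOW rather than local processes. Note that fork() gives every copy an identical starting memory image, mirroring the TIES requirement that all copies begin from the same state.
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define COPIES 3                     /* redundant copies of one thread */

/* Hypothetical stand-in for one thread of a parallel step. */
static void run_thread(void) { /* ... compute and report results ... */ }

int main(void)
{
    pid_t pids[COPIES];
    for (int c = 0; c < COPIES; c++) {
        pids[c] = fork();
        if (pids[c] == 0) {          /* child: one copy of the thread    */
            run_thread();            /* every copy starts from the same  */
            _exit(0);                /* memory image (fork semantics)    */
        }
    }
    /* First copy to report wins; the rest are freed for other tasks. */
    pid_t winner = wait(NULL);
    for (int c = 0; c < COPIES; c++)
        if (pids[c] != winner)
            kill(pids[c], SIGTERM);
    while (wait(NULL) > 0)           /* reap the cancelled copies */
        ;
    return 0;
}
Because each copy is side-effect-free until its results are reported, duplicated execution is idempotent, which is what makes this first-result-wins scheme safe.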
An embarrassingly parallel program is a program whose execution time is dominated by a small number of coarse-grained parallel loops. The lack of internode communication requirements within the main loops of such programs makes them ideal for use with the TIES methodology, where no inter-iteration communication is permitted.
Another problem that arises in parallelization is the issue of data dependence, which occurs when two iterations access the same memory location and at least one of the accesses is a write. Some dependencies can be satisfied through a technique called privatization, which allocates an individual copy of a variable to each loop iteration. However, if an earlier iteration writes a value which is to be read by a later iteration, a flow dependence is indicated, and a loop containing this type of dependence may not be parallelizable. A dependence may also be categorized as either loop-independent, if it exists independently of any loop inside which it is nested, or loop-carried, if not. While a loop-independent dependence does not inhibit parallelization, a loop-carried dependence may inhibit parallelization.
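As a simple illustration (not taken from the patent), the loop below reuses a single temporary, creating spurious cross-iteration conflicts that privatization removes:
/* Before privatization: every iteration reuses the single temporary  */
/* t, so iterations conflict even though no value flows between them. */
void scale_shared(const double *a, double *b, int n)
{
    double t;
    for (int i = 0; i < n; i++) {
        t = a[i] * 2.0;
        b[i] = t + 1.0;
    }
}

/* After privatization: t is local to each iteration, so iterations   */
/* are independent and the loop may be run in parallel.               */
void scale_private(const double *a, double *b, int n)
{
    for (int i = 0; i < n; i++) {
        double t = a[i] * 2.0;
        b[i] = t + 1.0;
    }
}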
Detecting dependencies has traditionally been done by comparing each possibly overlapping array reference with every other. Array summarization techniques, which represent aggregate activity for a set of accesses, have been proposed. However, in real-world applications, detecting flow dependence has been shown to be undecidable in general. In part, this is because static analysis cannot completely address the analysis issues which arise in complex real-world applications.
The following program illustrates this problem.
#include <stdlib.h>

extern int *A;

int main(int argc, char *argv[])
{
    for (int i = 0; i < 5; i += argc)
        A[i] = A[i + atoi(argv[1])];
    return 0;
}
The main loop of this program has different dependencies depending on the runtime value of argv[1]. If argv[1] is between −5 and −1, there is a loop-carried flow dependence. If argv[1] is between 1 and 5, there is a loop-carried anti-dependence, which means an earlier iteration reads a value which is written by a later iteration. If argv[1] is 0, there is a loop-independent flow dependence. Other values of argv[1] show no dependence. Clearly, static analysis may not be able to provide sufficient information to permit parallelization.
Because static analysis is not adequate, various approaches to runtime analysis have been suggested. In the previous example, a runtime test on the value of argv[1] would determine whether a serial or parallel version of the loop was selected. One strategy is to use “inspector loops”: side-effect-free skeletons of the original loop in which only the address calculations are performed. The inspector loop executes quickly, and the resulting pattern of memory accesses can then be used to allow or disallow parallelization of the original loop. The drawback to this strategy is that the inspector loop may be difficult or impractical to extract. Another strategy is speculative parallelization, in which loops with possible dependencies are parallelized anyway, and the actual existence of those dependencies is checked at run-time. A shadow array is allocated for every array that might contain a flow dependence. As the elements in the original array are accessed, the shadow array records the access type. When the parallel step is over, the access patterns of all iterations can be compared to determine whether there is an actual flow dependence. If the parallel execution turns out to be illegal, a backup copy of the shared memory is used to re-execute the loop sequentially. Drawbacks to this method include mangling of the code and high overhead for parallelization of coarse-grained loops.
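The post-step check can be sketched as follows, assuming one access-flag entry per iteration and per array element; the flag encoding and the function name are illustrative, not the patent's:
#define READS 1   /* iteration read the element before writing it (exposed read) */
#define WROTE 2   /* iteration wrote the element */

/* shadow[it * n + e] holds the access flags that iteration `it`     */
/* recorded for array element e during the speculative parallel step. */
int has_flow_dependence(const unsigned char *shadow, int iters, int n)
{
    for (int e = 0; e < n; e++) {
        int seen_writer = 0;          /* some earlier iteration wrote e */
        for (int it = 0; it < iters; it++) {
            unsigned char acc = shadow[it * n + e];
            if ((acc & READS) && seen_writer)
                return 1;             /* an earlier iteration's write   */
                                      /* reaches a later read: flow dep */
            if (acc & WROTE)
                seen_writer = 1;
        }
    }
    return 0;     /* no flow dependence: the parallel results are valid */
}
If the check reports a flow dependence, the speculative results are discarded and the backup copy of shared memory is used to re-execute the loop sequentially, as described above.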
Thus, a method that will automatically parallelize code for implementation on a NOW, while avoiding the problems and drawbacks detailed above, is desirable.
SUMMARY OF THE INVENTION
The present invention provides a scalable, automated, network-friendly method for building parallel applications from embarrassingly parallel serial programs. The steps of an exemplary method follow. First, the application loops with significant potential parallelism are identified. Second, from the set of loops identified, those loops which can be statically shown not to be parallelizable are disqualified. Next, the program is transformed into a parallel form in which the remaining identified loops are optimistically parallelized and packaged into per-iteration functions. Variables that need to be shared are segregated into a particular section of memory via simple source code transformation. These steps are performed by the compiler, which further inserts calls to the runtime system for scheduling and for shared memory allocation and set-up.
The server application, then, contains all code outside of any parallel loop, as well as calls to the scheduling subsystem that replace each parallelized loop. The client applications contain the loop bodies of the parallelized loops (the per-iteration functions). Each client application is constructed to accept a consecutive range of the iteration space, as well as the iteration step, as input parameters.
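The following is a minimal sketch of this transformation; ties_schedule is a hypothetical name for the scheduling entry point (the patent does not specify this API), and here it simply runs the range locally where the real runtime would farm ranges out to client machines.
/* Original serial loop in the source program:
 *     for (int i = 0; i < N; i++)
 *         B[i] = f(A[i]);
 */

#define N 1000

double A[N], B[N];   /* shared variables, segregated into shared memory */

static double f(double x) { return 2.0 * x + 1.0; }   /* example work */

/* Per-iteration function packaged into the client application: it     */
/* accepts a consecutive range of the iteration space plus the step.   */
static void loop1_body(int start, int end, int step)
{
    for (int i = start; i < end; i += step)
        B[i] = f(A[i]);
}

/* Stand-in for the runtime's scheduling entry point (illustrative     */
/* name): the real subsystem would farm consecutive ranges out to      */
/* client machines and eagerly reschedule them under TIES.             */
static void ties_schedule(void (*body)(int, int, int), int lo, int hi, int step)
{
    body(lo, hi, step);              /* executed locally in this sketch */
}

/* In the server application, the parallel loop is replaced by a call  */
/* to the scheduling subsystem.                                        */
int main(void)
{
    ties_schedule(loop1_body, 0, N, 1);
    return 0;
}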
At run-time, ma
