Data processing: software development – installation – and managem – Software program development tool – Translation of code
Reexamination Certificate
2000-02-08
2003-09-16
Nguyen-Ba, Hoang-Vu Anthony (Department: 2122)
Data processing: software development, installation, and managem
Software program development tool
Translation of code
C717S106000, C717S119000, C717S160000, C712S028000, C712S032000
Reexamination Certificate
active
06622301
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to a method of generating by a paralleling computer a parallel program from a source program, and in particular, to a parallel program generating method capable of optimizing data locality using data distribution and a recording media on which a program of the method is stored.
As a method of a logically shared, physically distributed memory for a distributed shared memory parallel computer, there has been a method in which a virtual memory space to be logically shared among a plurality of processors (nodes) is subdivided into units called pages such that the pages are allocated to physically distributed memories of respective processors. To determine allocation of pages to the processors, there have been known two methods as follows.
A first data distribution method is called first touch method in which when data is first referred to, a page including the data is distributed to a memory of a processor which refers to the data.
In a second data distribution method, a data distribution indicating statement or sentence is explicitly used to specify a data distribution format.
Assume, for example, a sequential execution source program
11
shown in
FIG. 9
is inputted. Assume that the system includes a distributed shared memory parallel computer including four processors and a page size is five array elements. Array elements are allocated to processors Pe
0
to Pe
3
according to first touch data distribution. Elements of array A are first referred to by a processor in an initialization loop (lines
23
to
25
of
FIG. 9
) of procedure init. Therefore, the elements of array A, i.e., A(
1
:
25
), A(
26
:
50
), A(
51
:
75
), and A(
76
:
100
) are allocated to pe
0
to pe
3
, respectively. In this connection, pe
0
to pe
3
represent processors
0
to
3
, respectively.
When the array elements are simply allocated according to an initialization loop first referred to by a processor as above, the data is distributed such that the elements
1
:
100
are equally distributed, i.e., 25 elements are distributed to each of processors pe
0
to pe
3
.
On the other hand, when a data distribution indicating statement “c$distibute A(block)” is inserted in a program declarative section of a sequential execution source program (e.g.,
1
:
25
,
26
:
50
,
51
:
75
, and
76
:
100
are specified in lines
4
to
7
of
FIG. 11
, which will be described later), the data are equally distributed to processors pe
0
to pe
3
in the same way as for FIG.
10
A.
The data distribution method of the first touch scheme and that using the data distribution indicating statement have been described, for example, in pages 334 to 345 of “Data Distribution Support on Distributed Shared Memory Multiprocessors” written by Rohit Chandra, Ding-Kai Chen, Robert Cox, Dror E. Maydan, Nedeljkovic, and Jennifer M. Anderson (Sigplan'97 Conference on Programming Language Design and Implementation (PLDI) Las Vegas, Nev., Jun. 15-18, 1997).
In the simple first touch data distribution method described above, if a data access pattern in the initialization loop does not match that in a kernel loop (a loop requiring a longest execution time among the loops in the entire program), when a parallel program obtained by converting a sequential execution source program is executed, data locality in the kernel loop is deteriorated. In the simple first touch scheme, this consequently is one of the causes which hinder improvement of the parallel program processing speed. For example, in a situation in which a program is equally distributed to four processors pe
0
to pe
3
as shown in
FIG. 10A
, when a subroutine of a kernel loop in which variable i in lines
33
to
35
of
FIG. 9
ranges from one to 60 for repetitious processing is 10000 times repeatedly executed, if the elements of array A are not entirely allocated to the respective memories of processors pe which execute the processing, it is necessary to access a faraway memory location to acquire the elements. This resultantly lowers the processing speed.
Moreover, in the data distribution method using the simple data distribution indicating statement, there possibly exists data distribution which cannot be easily expressed by an indicating statement. Therefore, data cannot be optimally distributed. In such a situation, when the simple data distribution indicating statement is used, data locality is possibly deteriorated. This results in one of causes which prevent improvement of the processing speed of the parallel program generated.
For example, when sequential execution source program
11
shown in
FIG. 9
is inputted to a compiler and is converted into a parallel program, if there are four processors and the first touch data distribution is adopted, elements of array A are allocated as shown in
FIG. 10A
by an initial loop (lines
23
to
25
of
FIG. 9
) of procedure init which first refers to array A. Namely, A(
1
:
25
), A(
26
:
50
), A(
51
:
75
), and A(
76
:
100
) are allocated to pe
0
to pe
3
, respectively. However, a kernel loop (lines
33
to
35
of
FIG. 9
) of procedure kernel refers to array A in the following ranges, i.e., A(
41
:
55
), A(
56
:
70
), A(
71
:
85
), and A(
86
:
100
) for pe
0
to pe
3
, respectively. As can be seen from
FIG. 10C
, (
41
:
70
) and (
76
:
85
) of array A are data reference objects assigned to another processor, namely, are associated with remote reference (R). Resultantly, 66.7% of all data reference is made through the remote reference (R). In the situation of
FIG. 10B
, local reference (L) to access data allocated to own processor takes place quite little, namely, only the entire data of processor pe
3
and part of data of processor pe
2
are accessed by local reference (L) In the data allocation employing a simple data distribution indicating statement, it is difficult to indicate data distribution shown in FIG.
10
B.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a parallel program generating method in which data is optimally distributed by the kernel loop to thereby improve data locality to increase the processing speed of the parallel program.
To achieve the object in accordance with the present invention, there is provided a parallel program generating method in which loops to be paralleled are detected and then a kernel loop is detected in the loops. Next, a first touch control code is generated and then the code is placed before a first execution loop of a main program, for example, before a first position of execution statements of the main program or the code is placed immediately before the kernel loop to thereby produce a parallel program. By this operation, when sequential execution source program
11
of
FIG. 9
is inputted to a compiler, A(
1
:
25
) and A(
41
:
55
) are allocated to pe
0
, A(
26
:
40
) and A(
56
:
70
) are allocated to perl, and A(
71
:
85
) and A(
86
:
100
) are respectively allocated to pe
2
to pe
3
as shown in FIG.
10
D. This improves data locality in the kernel loop and can resultantly increases the parallel program processing speed.
Additionally, in the parallel program generating method of the present invention, it is also possible that profile information, compiler static analysis information, or user indication information is obtained to generate a first touch control code such that a parallel program is generated by placing the code, for example, at a first position of execution statements.
Moreover, in the parallel program generating method of the present invention, it is also possible that profile information, compiler static analysis information, or user indication information is obtained to produce a page allocation information to generate a parallel program in which the page allocation information is inserted.
First, description will be given of terms used in the following embodiments and a correspondence thereof to drawings.
{circle around (1)} A paralleling compiler (
2
of
FIG. 10
is a compiler which receives as an input thereto a sequential execution source program (
1
Hirooka Takashi
Iitsuka Takayoshi
Kikuchi Sumio
Ohta Hiroshi
Antonelli Terry Stout & Kraus LLP
Hitachi , Ltd.
Nguyen-Ba Hoang-Vu Anthony
LandOfFree
Parallel program generating method does not yet have a rating. At this time, there are no reviews or comments for this patent.
If you have personal experience with Parallel program generating method, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Parallel program generating method will most certainly appreciate the feedback.
Profile ID: LFUS-PAI-O-3009506