the following questions: How can this level of parallelism be expressed conveniently by the programmer? How can this level of parallelism be modeled and implemented by the compiler and runtime system?
To answer the first question, we express programs in Id, a parallel, mostly-functional programming language. Parallelism is not specified explicitly by the programmer, but is implicit: all sub-expressions whose control dependences are satisfied may execute in parallel, subject only to data dependences. All local variables and most data structures are single-assignment, and synchronization is implicit in access to individual data structure components.
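This implicit, dataflow-style parallelism can be loosely illustrated outside of Id. The sketch below (in Python, purely for illustration; the functions `f` and `g` are hypothetical sub-expressions, not anything from Id) shows two independent sub-expressions running in parallel, with the consumer implicitly synchronizing on their results:

```python
# Illustration only: Id expresses this implicitly, but the scheduling it
# implies can be mimicked with futures. Sub-expressions with no mutual
# data dependence may run in parallel; a consumer waits on its inputs.
from concurrent.futures import ThreadPoolExecutor

def f(x):          # hypothetical sub-expression
    return x * x

def g(x):          # hypothetical sub-expression, independent of f
    return x + 1

with ThreadPoolExecutor() as pool:
    a = pool.submit(f, 3)   # these two submissions may execute
    b = pool.submit(g, 4)   # in parallel: no data dependence
    # the sum depends on both values, so it implicitly synchronizes
    result = a.result() + b.result()

print(result)
```

In Id no `submit`/`result` calls appear; the compiler and runtime extract this parallelism and insert the synchronization automatically.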
To answer the second question, our compiler uses P-RISC graphs as the principal intermediate language. These are parallel, structured, executable control-flow graphs that model locality in distributed memory parallel machines. They are a suitable target for many source languages, including Id. Parallelism in P-RISC graphs occurs both at a coarse grain, where each procedure and loop iteration is a unit of parallelism that may be distributed across processors, and at a fine grain, within basic blocks. The latter (fine-grain parallelism) allows function calls and global accesses to occur in parallel, and allows these operations, which possibly involve remote computation and synchronization, to be overlapped with local computation. These properties support dynamic and irregular parallelism.
In this paper, we describe aspects of the implementation of Id using P-RISC graphs. In Section 2 we briefly describe P-RISC graphs by way of an example recursive Id procedure. In Section 3 we describe the implementation of P-RISC graphs on distributed memory machines. In Section 4 we describe various optimizations. In Section 5.1, we draw comparisons with the TAM approach, which is quite close in spirit to ours. Finally, in Section 6 we close with some remarks about implementation status and plans.
Although we use Id for concreteness, the emphasis in this paper is on the multithreaded implementation of P-RISC graphs. The discussions and techniques are equally relevant for other source languages.
2 P-RISC graphs
To set the stage, we first describe the P-RISC machine model and runtime state, since variables in P-RISC graphs refer to components of the runtime state, and the graphs themselves reflect the locality assumptions of the machine model.
2.1 P-RISC machine model and runtime state
Figure 1 shows the abstract machine model for P-RISC graphs. Each processor can access its attached local memory as usual, and can send non-blocking messages to other processors via the communication network. Global memory accesses are split-phase: they involve separate request and response messages, may have long latencies, and are not ordered, i.e., a processor can issue multiple requests, and responses may arrive in any order. Global memory also supports synchronized accesses, which we will describe in more detail later.
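The split-phase discipline can be sketched as follows. This is a minimal simulation (not the paper's implementation): requests carry a tag, the memory server replies in an arbitrary order, and the issuing processor matches each response back to its request by tag. All names here (`memory_server`, the address values) are illustrative assumptions.

```python
# Minimal sketch of split-phase global memory access: a processor issues
# several tagged requests without blocking, and matches out-of-order
# responses to requests by tag.
import queue
import random
import threading

requests = queue.Queue()
responses = queue.Queue()
memory = {10: 'a', 20: 'b', 30: 'c'}   # hypothetical global memory

def memory_server():
    # collect all requests, then answer them in a shuffled (arbitrary) order
    pending = [requests.get() for _ in range(3)]
    random.shuffle(pending)
    for tag, addr in pending:
        responses.put((tag, memory[addr]))

server = threading.Thread(target=memory_server)
server.start()

# issue multiple requests without waiting for replies (non-blocking sends)
for tag, addr in enumerate([10, 20, 30]):
    requests.put((tag, addr))

# collect responses; arrival order is unspecified, so match by tag
received = {}
for _ in range(3):
    tag, value = responses.get()
    received[tag] = value
server.join()

print(received)
```

The tag plays the role of the synchronization context that a multithreaded runtime uses to resume the correct waiting computation when a long-latency response finally arrives.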