page 1  (12 pages)
2to next section

Expanding the text here will generate a large amount of data for your browser to display

Evaluation of Release Consistent

Software Distributed Shared Memory on

Emerging Network Technology

Sandhya Dwarkadas, Pete Keleher, Alan L. Cox, and Willy Zwaenepoel

Department of Computer Science

Rice University ?


We evaluate the effect of processor speed, network characteristics, and software overhead on the performance of release-consistent software distributed shared memory. We examine five different protocols for implementing release consistency: eager update, eager invalidate, lazy update, lazy invalidate, and a new protocol called lazy hybrid. This lazy hybrid protocol combines the benefits of both lazy update and lazy invalidate.

Our simulations indicate that with the processors and networks that are becoming available, coarse-grained applications such as Jacobi and TSP perform well, more or less independent of the protocol used. Mediumgrained applications, such as Water, can achieve good performance, but the choice of protocol is critical. For sixteen processors, the best protocol, lazy hybrid, performed more than three times better than the worst, the eager update. Fine-grained applications such as Cholesky achieve little speedup regardless of the protocol used because of the frequency of synchronization operations and the high latency involved.

While the use of relaxed memory models, lazy implementations, and multiple-writer protocols has reduced the impact of false sharing, synchronization latency remains a serious problem for software distributed shared memory systems. These results suggest that future work on software DSMs should concentrate on reducing the amount of synchronization or its effect.

1 Introduction

Although several models and algorithms for software distributed shared memory (DSM) have been published, performance reports have been relatively rare. The few performance results that have been published

?This work was supported in part by NSF Grants CCR-9116343 and CCR-9211004, Texas ATP Grant No. 0036404013 and by a NASA Graduate Fellowship.

consist of measurements of a particular implementation in a particular hardware and software environment [3, 5, 6, 13]. Since the cost of communication is very important to the performance of a DSM, these results are highly sensitive to the implementation of the communication software. Furthermore, the hardware environments of many of these implementations are by now obsolete. Much faster processors are commonplace, and much faster networks are becoming available.

We are focusing on DSMs that support release consistency [9], i.e., where memory is guaranteed to be consistent only following certain synchronization operations. The goals of this paper are two-fold: (1) to gain an understanding of how the performance of release consistent software DSM depends on processor speed, network characteristics, and software overhead, and (2) to compare the performance of several protocols for supporting release consistency in a software DSM.

The evaluation is done by execution-driven simulation [7]. The application programs we use have been written for (hardware) shared memory multiprocessors. Our results may therefore be viewed as an indication of the possibility of porting" shared memory programs to software DSMs, but it should be recognized that better results may be obtained by tuning the programs to a DSM environment. The application programs are Jacobi, Traveling Salesman Problem (TSP), and Water and Cholesky from the SPLASH benchmark suite [14]. Jacobi and TSP exhibit coarsegrained parallelism, with little synchronization relative to the amount of computation, whereas Water may be characterized as medium-grained, and Cholesky as finegrained.

We find that, with current processors, the bandwidth of the 10-megabit Ethernet becomes a bottleneck, limiting the speedups even for a coarse-grained application such as Jacobi to about 5 on 16 processors. With a 100- megabit point-to-point network, representative of the ATM LANs now appearing on the market, we get good speedups even for small sizes of coarse-grained prob-