Published in: Proceedings of Performance Evaluation of Parallel Systems PEPS'93, pages 94-101, University of Warwick, UK, 1993
Performance Evaluation of Parallel Programs on the
Data Diffusion Machine
Paul W.A. Stallard Henk L. Muller David H.D. Warren
Department of Computer Science, University of Bristol.
Working at: PACT/SRF 10 Priory Road, Bristol. BS8 1TU, UK.
A tool set for the monitoring and performance evaluation of parallel programs has been developed for the Data Diffusion Machine (DDM), a virtual shared memory architecture. The tool set has a layered structure, allowing the user to observe the machine at various levels of detail. The tools are built on top of a software emulation of the DDM. The emulator provides realistic timings because certain parts of it are artificially slowed down; this gives us the time to extract highly detailed statistics at run time without disturbing program execution.
The tool set consists of a number of low-level interrogation tools and an X Windows-based graphical interface. The graphical display shows the essential characteristics of the DDM, such as the hit rate and the load of each processor, during program execution. On request, the usage of individual memory items is monitored and displayed in an animated histogram. This facility has proved very useful for identifying problems in application programs, an example of which is included as a case study at the end.
The lack of quality evaluation tools makes the optimisation of parallel programs very difficult. A programmer can easily measure how fast the program executes, or its efficiency on a certain number of processors, but the exact reason why the program does not run faster is quite often unclear. A tool that can unravel (some of) these reasons can prove very valuable, as shown by the success of tools like gprof [Graham82] used for optimising sequential programs.
The problems of developing comparable tools for parallel systems are well known [Hayes91]: it is harder to get the information out of the machine; the amount of statistical data that can be obtained is orders of magnitude larger because of the multiple nodes (especially when evaluating bulk parallel systems); and the structure of the statistical data is more complex. Dependencies between nodes, communication overheads and the interconnection network (in both shared and distributed memory machines) all add to this problem.
In this article we present a set of tools developed to assess the performance of programs running on the Data Diffusion Machine, a virtual shared memory machine designed to run efficiently with hundreds of nodes. Although the DDM has not yet