Measuring and Analyzing Parallel Computing Scalability ?
Xiaodong Zhang Yong Yan Qian Ma
High Performance Computing and Software Laboratory
The University of Texas at San Antonio
San Antonio, Texas 78249
In this paper we compare a group of existing scalability metrics and evaluation models through experiments and analyses. This work is mainly motivated by an experimental metric using network latency to measure and evaluate the scalability of parallel programs and architectures. Experiments of running a large physics simulation program and its scaling variations on the KSR-1 multiprocessor have been intensively conducted in this study. The scalability data are collected and visualized by an integrated software environment, called S-Graph. Our objectives are to address fundamental issues of parallel computing scalability, such as evaluation models, analytical and experimental metrics, and measurement methodology; to examine latency patterns inherent in the architecture and the program and their effects on scalability; and to evaluate relative merits and limits of several existing metrics. Our comparisons indicate that both time-bound and memory-bound models have their limitations for evaluating scalability performance, while efficiency-based model could obtain reasonably precise measurements.
Scalability metrics measure the ability of a parallel architecture to increase the number of processors for solving a problem of a increasing size where the parallelism of a given algorithm has already been effectively exploited. An evaluation of the scalability can also be used to predict the performance of large problems on large systems, based on the performance of small problems on small systems. A rigorous scalability definition and metric provides an important guideline for precisely understanding the nature of the scalability,
?This work is supported in part by the National Science Foundationunder research grants CCR-9008991, CCR-9102854, and under instrumentation grant DUE-9250265, by the U.S. Air Force under research agreement FD-204092-64157, and by a visiting scholar Fellowship from the United Nations Children's Fund. Part of the experiments were conducted on the KSR-1 machines at Cornell University, and at the University of Washington.
and for effectively measuring the scalability in practice. Isoefficiency  and Isospeed  are two useful scalability metrics. The former evaluates the performance of an algorithm-machine combination through modeling an isoefficiency function. The latter evaluates the performance of an algorithm-machine combination through measuring the workload increment with a change of the machine size under the condition of the isospeed. Isoefficiency is considered to be an analytical method for algorithm scalability evaluation. Although the isospeed metric is an experimental metric, it may be unable to measure certain real machine factors in practice. In this paper, we first give an overview of an experimental metric using network latency for measuring and evaluating parallel program and architecture scalability .
In the process of evaluating the scalability using a
metric, different models may be applied in the parallel
program execution measurement. The major models
include problem size bound model, memory-bound
model, time-bound model and efficiency-conserved
model. Here we briefly introduce these measurement
Problem size bound scaling measurement. This basic measurement method quantitatively observes the execution performance changes of a fixed size problem as the number of processors increases . Amdahl's Law roughly models the performance of this measurement,
T (p) = ts + tp
where the execution time T (p) is divided into two parts: ts is the amount of time spent on serial parts of a program, tp is the amount of time spent on parts of the program that can be executed in parallel, and p is the number of processors used for the execution. This model indicates that as the number of processors is increased, the execution time eventually hits a minimum, after which adding processors can only cause the program to take a longer time to complete. The maximum number of processors used to achieve the minimum execution time for the same program is different on different shared-memory architectures. Thus, architecture scalabilities of different machines can be distinguished by running the same program on different machines, based on this basic measurement