
Efficient Oblivious Parallel Sorting on the MasPar MP1 ?
Klaus Brockmann
Heinz Nixdorf Institute
Paderborn University
33095 Paderborn, Germany
email: brockm@hni.unipaderborn.de
Rolf Wanka
y
International Computer Science Institute
1947 Center Street
Berkeley, CA 947041198, USA
email: wanka@icsi.berkeley.edu
Abstract
We address the problem of sorting a large number N of keys on a MasPar MP1 parallel
SIMD machine of moderate size P where the processing elements (PEs) are interconnected
as a toroidal mesh and have 16KB local storage each. We present a comparative study of
implementations of the following deterministic oblivious sorting methods: Bitonic Sort,
OddEven Merge Sort, and FastSort. We successfully use the guarded split&merge operation
introduced by R?ub. The experiments and investigations in a simple, parameterized,
analytical model show that, with this operation, from a certain ratio N=P upwards both
OddEven Merge Sort and FastSort become faster on average than the up to the present
fastest, sophisticated implementation of Bitonic Sort by Prins. Though it is not as efficient
as OddEven Merge Sort, FastSort is to our knowledge the first method specially
tailored to the mesh architecture that can be, when implemented, competitive on average
with a meshadaptation of Bitonic Sort for large N=P .
1 Introduction
The problem. Sorting is one of the most investigated problems in computer science. In the area of parallel computing, sorting is also a classical topic. Its roots can be traced back to the Fifties [9, p. 244]. Richards' bibliography [14] covers the extensive literature until 1986. In many parallel algorithms, parallel sorting is one of the subroutines that determine the overall performance. E. g., it is used in applications like the computation of convex hulls, parallel data bases, and certain imageprocessing methods, to name a few. Many parallel sorting circuits and other sorting methods on networks are described in Leighton's book [10].
One type of parallel computer is the SIMD (Single Instruction, Multiple Data) machine, in contrast to the MIMD (Multiple Instruction, Multiple Data) type of machines. In SIMD machines, all processing elements (PEs) execute in one time step on their local data the same operation that is sent to them by an external control unit or are idle. Typically, SIMD machines have a large number of PEs, but the amount of local memory is modest. The time to set up a communication between PEs is usually small compared to the setup times on MIMD machines. Wellknown examples of SIMD machines are the MasPar MP1 and the Thinking
?This work is supported by the DFG Sonderforschungsbereich 376: "Massive Parallelit?at: Algorithmen,
Entwurfsmethoden, Anwendungen".
ySupported by DFG Leibniz Grant Me 872/61 and by EU ESPRIT Long Term Research Project 20244
(ALCOMIT). On leave from Dept. of Mathematics and Computer Science, Paderborn University, 33095
Paderborn, Germany, email: wanka@unipaderborn.de.