Efficient Oblivious Parallel Sorting on the MasPar MP-1 ?
Heinz Nixdorf Institute
33095 Paderborn, Germany
International Computer Science Institute
1947 Center Street
Berkeley, CA 94704-1198, USA
We address the problem of sorting a large number N of keys on a MasPar MP-1 parallel SIMD machine of moderate size P where the processing elements (PEs) are interconnected as a toroidal mesh and have 16KB local storage each. We present a comparative study of implementations of the following deterministic oblivious sorting methods: Bitonic Sort, Odd-Even Merge Sort, and FastSort. We successfully use the guarded split&merge operation introduced by R?ub. The experiments and investigations in a simple, parameterized, analytical model show that, with this operation, from a certain ratio N=P upwards both Odd-Even Merge Sort and FastSort become faster on average than the up to the present fastest, sophisticated implementation of Bitonic Sort by Prins. Though it is not as efficient as Odd-Even Merge Sort, FastSort is to our knowledge the first method specially tailored to the mesh architecture that can be, when implemented, competitive on average with a mesh-adaptation of Bitonic Sort for large N=P .
The problem. Sorting is one of the most investigated problems in computer science. In the area of parallel computing, sorting is also a classical topic. Its roots can be traced back to the Fifties [9, p. 244]. Richards' bibliography  covers the extensive literature until 1986. In many parallel algorithms, parallel sorting is one of the subroutines that determine the overall performance. E. g., it is used in applications like the computation of convex hulls, parallel data bases, and certain image-processing methods, to name a few. Many parallel sorting circuits and other sorting methods on networks are described in Leighton's book .
One type of parallel computer is the SIMD (Single Instruction, Multiple Data) machine, in contrast to the MIMD (Multiple Instruction, Multiple Data) type of machines. In SIMD machines, all processing elements (PEs) execute in one time step on their local data the same operation that is sent to them by an external control unit or are idle. Typically, SIMD machines have a large number of PEs, but the amount of local memory is modest. The time to set up a communication between PEs is usually small compared to the set-up times on MIMD machines. Well-known examples of SIMD machines are the MasPar MP-1 and the Thinking
?This work is supported by the DFG Sonderforschungsbereich 376: "Massive Parallelit?at: Algorithmen,
ySupported by DFG Leibniz Grant Me 872/6-1 and by EU ESPRIT Long Term Research Project 20244 (ALCOM-IT). On leave from Dept. of Mathematics and Computer Science, Paderborn University, 33095 Paderborn, Germany, email: firstname.lastname@example.org.