A Scalable MIMD Volume Rendering Algorithm
Craig M. Wittenbrink Michael Harrington
Dept. of Electrical Engineering, Applied Physics Laboratory University of Washington University of Washington Seattle, WA 98195 Seattle, WA 98105 e-mail: email@example.com firstname.lastname@example.org
Volume rendering is a compute intensive graphics algorithm with wide application. Researchers have sought to speed it up using parallel computers. Our new algorithm distributes the data for storage efficiency, avoids bottlenecks, and scales to more processors than rays. The main contribution is explicit partitioning of the input volume for higher memory utilization, while retaining viewpoint freedom and speedup. The largest volumes processed on our MIMD (multiple instruction multiple data) machine (Proteus) are 512x512x128 voxels (32 Mbytes). Performance measurements show a speedup of 22 over our sequential code on 32 Intel i860 processors. We have used no preprocessing or data dependent optimization. The efficiency results from nonconflicting communication, a permutation warp, that remains efficient with larger data sets, larger parallel machines, and high order filters showing scalability can be achieved through object space partitioning.
Keywords: Parallel Algorithms, Object Space Partitioning, Permutation Warping.
Volume rendering  is transparency visualization of sampled 3D data. Inputs are point samples, or volume elements called voxels. A variety of applications create voxels such as magnetic resonance imaging and finite element analysis. Most applications are calculation intensive because of the size of the data sets (up to one billion voxels), and even the simplest calculations on workstations take too much time for interactive visualization. Volume rendering is described fully elsewhere .
Parallelism is difficult to use because partitioning may result in lower quality (See  for filter comparison), or large memory requirements  (and  where shared memory requires large caches and high cache overhead). We have developed a practical solution that does not sacrifice quality or memory. In this paper we discuss the performance attained on the Proteus Supercomputer  and
how to maintain full parallelism on scalable MIMD (multiple instruction multiple data) machines. We also give a brief overview of Proteus, the parallel software development environment, and existing methods for parallel volume rendering.
2 MIMD Volume Rendering Algorithm
In data parallel machines such as the CM-2 and MasPar MP-1, our preferred solution is a one step warping phase to align to the view direction , but in high granularity machines such as Proteus  small messages create network congestion. Instead we use an algorithm that takes advantage of the high virtualization ratio and sends large messages. The approach of sending large messages, using object space partitioning, and evaluating compositing by parallel product is new for a MIMD volume rendering algorithm. Paper reviewers have also made us aware of recent results of others using large messages , object space partitioning , and parallel compositing . Our approach uses an innovative resampling step without communication congestion that scales in input and machine size.
The resampling is performed by permutation warping. We have worked on permutation warping for data parallel machines , and this paper extends that work to MIMD (multiple instruction multiple data) machines. Permutation warping is better than previous parallel resampling algorithms because it is simultaneously memory efficient, processor efficient, general, and accurate. The algorithm requires a strong interconnection network, which many parallel computers including Proteus provide. Permutation, or one-to-one, communication is used to communicate resampled data from object space to screen space aligned frames. Resampling is required for volume rendering because the input voxels are a discrete sampling of a continuous function. The resampling is an interpolation to find the values in between the provided voxel values. Imagine looking through a lattice of points. Each pixel for the image resulting from that view is a ray