Processor Lower Bounds for Array Computations
with Linear Schedules
Peter Cappello and ?Omer E>=gecio>=glu
Department of Computer Science
University of California at Santa Barbara
Using a directed acyclic graph (dag) model of algorithms, this paper treats a problem related to precedence-constrained multiprocessor schedules for array computations: Given a dag and a valid linear schedule for it, compute a lower bound on the number of processors required by the schedule. This is an important practical problem: A good schedule uses time efficiently; a good mapping of nodes to processors uses energy efficiently. The lower bounds obtained are good; we know of no case where they are not exactly tight. Actually, a harder problem is solved: Given a parametrized family of dags and a correspondingly parametrized valid linear schedule, symbolically generate a processor lower bound formula. To visualize the method for computing this formula, the nodes, representing computational tasks, are viewed as a set of lattice points in a convex polyhedron. The number of tasks that are scheduled for execution during any given time step of a linear schedule is the number of non-negative integer solutions to a set of linear Diophantine equations parametrized by the number of nodes of the dag. An algorithm, based on construction of generating functions, is presented for computing these numbers. This is the first such algorithm to be reported. Using this algorithm and a symbolic algebra package, formulas for processor lower bounds are obtained automatically.
The algorithm has been implemented as a Mathematica program. Example runs for MatrixVector Product, and Triangular Matrix Product problems are given. While widely applicable and elegant, it has worst case exponential running time. This is not surprising however; the simpler computation: Are any processors scheduled for a particular time step?", which is equivalent to Is a particular coefficient of the generating function nonzero?" is already known to be an NP-complete problem.
Keywords: Parallel processing, array computation, lower bound, linear schedule, Diophantine equation, generating function.
We consider array computations, often referred to as systems of uniform recurrence equations . Parallel execution of uniform recurrence equations has been studied extensively, from at least as far back as 1966 (see, e.g., [8, 11, 10, 1, 17, 2, 3, 18, 19, 4]). In such computations, the tasks to be computed are viewed as the nodes of a directed acyclic graph, where the data dependencies are represented as arcs. Given a dag G = (N; A), a multiprocessor schedule assigns node v for processing during step o(v) on processor ss(v). A valid multiprocessor schedule is subject to two constraints:
Causality: A node can be computed only when its children have been computed at previous steps:
(u; v) 2 A ) o(u) < o(v): (1)