
Parallel Very Fast Simulated Reannealing by Temperature Block
Partitioning*
Sandra G. Dykes and Bruce E. Rosen
Division of Mathematics, Computer Science, and Statistics
The University of Texas at San Antonio
San Antonio, Texas 78249
Abstract
This work describes a parallel implementation of Very Fast Simulated Reannealing (VFSR), an advanced simulated annealing method for optimization of nonlinear, multidimensional functions with large numbers of local minima. Parallel VFSR speedups on a CM-2 Connection Machine are reported for eight functions: De Jong's test suite, a 10D parabolic function, and two multimodal, highly nonlinear functions. Within the test set, the function characteristic most affecting parallel VFSR performance is the number of optimized function parameters. Low-dimension functions profited least from parallelization, exhibiting speedups from 2 to 78 (where speedups are based on the number of function evaluation cycles). Speedups for the three 10D cost functions increased to 410, 823, and 1124. On a stochastic high-dimensional (D = 30) quartic cost function, the cycle ratio was over 19000. We present results of a systematic study of the dimensionality effect on three test functions.
1 Introduction
Simulated annealing (SA) is a probabilistic optimization technique well-suited to multimodal, discrete, nonlinear and nondifferentiable functions. SA's main strength is its statistical guarantee of global minimization, even in the presence of many local minima. The price one pays for global minimization is execution time: simulated annealing methods are notoriously slow. Hence there is mounting interest in using parallel algorithms and architectures to speed up the annealing process. This work introduces a parallel
*This work is supported in part by a grant from the University of Texas at San Antonio Faculty Research Awards and Ph.D. Strategic Fund.
implementation of the advanced simulated annealing method known as Very Fast Simulated Reannealing (VFSR), which has previously been shown to outperform standard Boltzmann and Cauchy simulated annealing algorithms [4, 6, 8], as well as standard Genetic Algorithms [6].
Simulated annealing is an inherently iterative algorithm, making it difficult to parallelize efficiently. In each annealing cycle a trial set of function parameters is generated stochastically from the current state and a generating distribution. This trial state may be either rejected or accepted as the new current state, depending probabilistically upon the difference in function value between the current and trial states. All trial states have a finite probability of acceptance, no matter how large their corresponding function value. However, trial states with lower function values are more likely to be accepted. Typically, the ratio of accepted to generated states is initially large, but drops rapidly in the first few annealing cycles. The search process continues for many more cycles until a global solution (or a "good" approximation to it) is found. After the first few annealing cycles a low cost state is usually located. From then on, an overwhelming number of trial states are rejected and the current state is seldom changed.
Our parallel implementation of the VFSR algorithm exploits this predominant rejection of trial states. In the parallel algorithm, a block or sequence of generating temperatures is computed from the annealing schedule, starting at a current global generating temperature Tgen. Each parallel processor is assigned a unique Tgen,i from the sequence, and uses this temperature to stochastically generate a local trial state. The processors simultaneously evaluate their local cost functions and then probabilistically accept or reject their local trial states. If one or more local states are accepted, the global current state is replaced by the accepted trial state with the highest generation temperature Tgen,i. If no local state is accepted, the global current state remains unchanged.
We refer to this parallel VFSR algorithm as temperature block partitioning. This parallelization strictly follows its sequential VFSR counterpart. That is, the solution route employs only independent parallel transitions because only the first accepted state in a parallel block is used.
In the following sections we describe in detail the parallel VFSR algorithm and report its performance on eight test functions: De Jong's test suite, a 10D parabolic function, and two multimodal, highly nonlinear functions. Our results indicate parallel speedup depends less upon function complexity than upon size of the parameter search space.
2 General Simulated Annealing
Consider a general simulated annealing procedure for optimization of a one dimensional function as shown in Figure 1. For optimizing functions with real parameters, the generating temperatures Tgen describe the width of each parameter's generation distribution. Each generating temperature is initialized to a very high value, resulting in a relatively flat generating distribution. As optimization progresses and the temperatures are reduced, these distributions narrow, making it more likely the trial state will lie near the current state. Acceptance of a trial state xtrial depends upon the trial state cost f(xtrial), the current state cost f(xcurrent), and the acceptance temperature Taccept. The acceptance function h(t) is defined as:
h(x_trial) = [1 + exp((f(x_trial) - f(x_current)) / T_accept)]^(-1).   (1)
A trial state is accepted if h(x_trial) exceeds a uniformly distributed random value. The acceptance ratio is initially large, but falls as T_accept is reduced. At every acceptance temperature, however, there is a finite probability of accepting the trial state.
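The acceptance test of Eq. (1) can be sketched directly. This is a minimal illustration, not the authors' code; the function name and the overflow guard are our own additions.

```python
import math
import random

def accept(f_trial, f_current, T_accept, rng=None):
    # Acceptance test of Eq. (1):
    #   h(x_trial) = [1 + exp((f(x_trial) - f(x_current)) / T_accept)]^(-1)
    rng = rng or random.Random()
    delta = (f_trial - f_current) / T_accept
    if delta > 700.0:            # math.exp would overflow; h is effectively 0
        return False
    h = 1.0 / (1.0 + math.exp(delta))
    # The trial state is accepted when h exceeds a uniform random draw.
    return h > rng.random()
```

Note that a strongly downhill move (f_trial much less than f_current) drives h toward 1 and is almost always accepted, while an extreme uphill move drives h toward 0.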
Because the algorithm occasionally chooses states uphill (i.e. having a higher function value) from its current state, it can escape from local minima and more effectively search the state space to find the global minimum. Thus, SA algorithms are often well suited to solving multimodal, nonlinear optimizations.
GENERAL SIMULATED ANNEALING
Initialize acceptance temperature to high value
Initialize generating temperatures to high values
Initialize function parameters to random values
Calculate function cost
DO UNTIL (user specified termination)
Stochastically generate a trial state
Calculate its function cost
Probabilistically accept trial state
IF (trial state accepted) THEN
Replace current state with trial state
Anneal acceptance temperature
Anneal generating temperature
END
Return function minimum
Figure 1: The general simulated annealing algorithm.
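The loop of Figure 1 can be made concrete for a one-dimensional cost function. This is a hedged sketch, not the paper's implementation: the Gaussian generator, the geometric cooling factor, and all parameter defaults are our assumptions for illustration only.

```python
import math
import random

def simulated_annealing(f, x0, T_gen=10.0, T_accept=10.0,
                        cooling=0.99, max_cycles=5000, rng=None):
    # Minimal version of the general SA loop in Figure 1, for a 1-D cost f.
    rng = rng or random.Random(7)
    x_cur, f_cur = x0, f(x0)
    best_x, best_f = x_cur, f_cur
    for _ in range(max_cycles):
        # Stochastically generate a trial state; width set by T_gen.
        x_tri = x_cur + rng.gauss(0.0, T_gen)
        f_tri = f(x_tri)
        # Probabilistic acceptance per Eq. (1).
        h = 1.0 / (1.0 + math.exp(min((f_tri - f_cur) / T_accept, 700.0)))
        if h > rng.random():
            x_cur, f_cur = x_tri, f_tri
            # Anneal temperatures only on acceptance, as in Figure 1.
            T_accept *= cooling
            T_gen *= cooling
            if f_cur < best_f:
                best_x, best_f = x_cur, f_cur
    return best_x, best_f
```

For example, minimizing (x - 3)^2 from x0 = 0 quickly locates states far below the starting cost of 9.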
3 Sequential VFSR
SA methods often differ in their choice of generating distributions g(). The annealing schedule, and therefore the performance of a specific SA algorithm, depends critically upon g(). For function minimization tasks, Boltzmann Annealing (standard SA) uses a Gaussian probability density for state generation. Global minimization can be achieved by standard SA if the annealing temperature is lowered at a rate no faster than T(k) = T0 / ln k, where k is the annealing cycle index. A faster annealing schedule is obtained using a Cauchy distribution [9], which uses the annealing schedule T(k) = T0 / k.
Very Fast Simulated Reannealing (VFSR) generates states from a distribution that allows an exponential annealing schedule, T(k) = T0 exp(-k). As a result, sequential VFSR runs faster than both Boltzmann and Cauchy Annealing.
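The three schedules can be compared side by side. This is an illustrative sketch; the function names are ours, and the `c` and `D` parameters in the VFSR schedule follow Ingber's multidimensional form T(k) = T0 exp(-c k^(1/D)), which reduces to the exponential schedule above for c = 1, D = 1.

```python
import math

def boltzmann_schedule(T0, k):
    # Standard SA: T(k) = T0 / ln k (valid for k >= 2)
    return T0 / math.log(k)

def cauchy_schedule(T0, k):
    # Fast (Cauchy) annealing: T(k) = T0 / k
    return T0 / k

def vfsr_schedule(T0, k, c=1.0, D=1):
    # VFSR: T(k) = T0 * exp(-c * k^(1/D)); purely exponential when D = 1
    return T0 * math.exp(-c * k ** (1.0 / D))
```

At any fixed cycle k the VFSR temperature is far below the Cauchy temperature, which in turn is below the Boltzmann temperature; this ordering is what makes VFSR's annealing so much faster.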
Unlike Boltzmann Annealing and Cauchy Annealing, VFSR was primarily designed to search bounded and constrained search spaces and to minimize multidimensional functions [3]. In VFSR, a trial state is formed by independently generating D scalar parameters x_trial,j using the random variables y_j.
x_trial,j = x_current,j + (B_j - A_j) y_j,   x_j ∈ [A_j, B_j],   (2)

where A_j and B_j are the minimum and maximum of the jth dimensional range. Values of y_j are generated from generating temperature T_j and uniform random values u_j:

y_j = sgn(u_j - 1/2) T_j [(1 + 1/T_j)^|2u_j - 1| - 1],

where y_j ∈ [-1, 1]. The VFSR generating distribution is defined as

g_T(y) = Π_{j=1}^{D} 1 / [2(|y_j| + T_j) ln(1 + 1/T_j)].   (3)
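Trial-state generation per Eq. (2) can be sketched as follows. This is an illustrative transcription, with our own function name; following common VFSR practice, out-of-range trial values are simply redrawn, which is an assumption on our part.

```python
import math
import random

def vfsr_generate(x_current, T, bounds, rng=None):
    # Generate one VFSR trial state (Eq. 2):
    #   y_j = sgn(u_j - 1/2) * T_j * [(1 + 1/T_j)^|2u_j - 1| - 1]
    #   x_trial,j = x_current,j + (B_j - A_j) * y_j
    rng = rng or random.Random(1)
    x_trial = []
    for j, (A, B) in enumerate(bounds):
        while True:
            u = rng.random()
            y = (math.copysign(1.0, u - 0.5) * T[j]
                 * ((1.0 + 1.0 / T[j]) ** abs(2.0 * u - 1.0) - 1.0))
            x = x_current[j] + (B - A) * y
            if A <= x <= B:   # redraw until the trial value is in range
                x_trial.append(x)
                break
    return x_trial
```

Since |2u_j - 1| lies in [0, 1], the bracketed factor lies in [0, 1/T_j], so each y_j indeed falls in [-1, 1] as the text states.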
The main advantages of VFSR are its fast annealing time and mathematical guarantee of convergence. A detailed description of the VFSR algorithm can be found in [3, 4, 6].
4 Parallel VFSR
Figure 2 shows an overview of the parallel VFSR temperature block partitioning algorithm. In each parallel temperature block, the global generating temperature Tgen and current state x_current are broadcast to all processors. Each parallel processor computes a unique local Tgen,i from the VFSR annealing schedule, using the current global generating temperature as the starting point and its processor index as the annealing sequence number.
In parallel, the processors stochastically generate their local trial states x_trial,i, evaluate the corresponding local function costs and probabilistically accept or reject their local trial states. If one or more local states are accepted, the global current state is replaced by the accepted trial state whose parameters were generated at the highest temperature Tgen,i. In addition, the acceptance temperature T_accept is lowered according to its annealing schedule and the global generation temperature is replaced by Tgen,i+1. Should no local state be accepted, the global current state remains unchanged and the global generating temperature is updated to reflect the number of rejected trial states.
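One temperature block can be sketched as follows. This is a hedged, sequential simulation of the parallel step, not the CM-2 implementation: the helper names (`schedule`, `generate`, `accept`) are placeholders for the VFSR schedule, Eq. (2) generation, and Eq. (1) acceptance, and the loop over "processors" stands in for the actual data-parallel evaluation.

```python
def temperature_block(x_current, f, T_gen, T_accept,
                      schedule, generate, accept, P=8, k0=0):
    # One temperature block: "processor" i uses its own T_gen,i from the
    # annealing schedule, then generates and tests a local trial state.
    accepted = []
    for i in range(P):
        T_i = schedule(T_gen, k0 + i)       # local generating temperature
        x_i = generate(x_current, T_i)      # local trial state
        if accept(f(x_i), f(x_current), T_accept):
            accepted.append((T_i, x_i))
    if accepted:
        # Keep the accepted state generated at the highest temperature,
        # i.e. the earliest in the block, matching the sequential path.
        T_i, x_new = max(accepted, key=lambda t: t[0])
        return x_new, T_i
    # All rejected: advance the schedule past the whole block.
    return x_current, schedule(T_gen, k0 + P)
```

Because the schedule is decreasing in the sequence index, taking the accepted state with the highest T_gen,i is the same as taking the first accepted state in the block, which is why the parallel solution path mirrors the sequential one.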
5 Methods
The parallel VFSR algorithm was implemented and tested on a Connection Machine 2 (CM-2), attached to a SUN 4 interface. Each parallel test used 8K of the 64K parallel processors available on the CM-2, and ran in exclusive mode. Sequential VFSR tests were performed
PARALLEL VFSR
Initialize acceptance temperature to high value.
Initialize generating temperatures to high values.
Initialize function parameters to random values
Calculate function cost
DO UNTIL (user specified termination)
Broadcast state and annealing temperatures
BEGIN PARALLEL
Calculate local Tgen;i
Stochastically generate local trial state
Calculate local function cost
Probabilistically accept local trial state
END PARALLEL
IF (any state accepted) THEN
Replace current state with accepted local
trial state of highest Tgen;i
Anneal generating temperature starting
from new state's Tgen;i
Anneal Taccept
ELSE
Anneal generating temperature starting
from lowest Tgen;i in block.
Aperiodically reanneal generating temperatures
END
Return function minimum
Figure 2: The parallel VFSR algorithm.
exclusively on the SUN 4 computer serving as the CM-2 interface.
The probabilistic nature of simulated annealing requires multiple trials with different pseudorandom number generator seeds to produce statistically significant results. Therefore, for each test function we measured sequential and parallel VFSR performance using ten different random number generator (RNG) seeds and reported the average values. All tests used the default VFSR optimization parameter set, as supplied with sequential VFSR source [5]. VFSR has multiple exit criteria, of which the most commonly met criteria are 1) reaching sufficiently low annealing temperatures, 2) not accepting a trial state after a sufficient number of iterations, and 3) exceeding a set number of total loop iterations.
5.1 Performance Measure
As in previous simulated annealing studies, performance is measured by the number of function evaluation cycles required to meet standard program termination criteria [3, 4, 6]. For sequential VFSR, one function evaluation is performed per iteration of the annealing loop. In the parallel VFSR temperature block partitioning algorithm, the P parallel processors simultaneously evaluate the cost function for their local set of parameter values. Each block evaluation is considered a function evaluation cycle. Because function evaluations after the block's first accepted state are not used, the number of parallel blocks can vary from C_s down to C_s/P, where C_s is the number of sequential cycles and P the number of parallel processors. (This ignores minor variations due to the algorithm's probabilistic nature.)
5.2 Test Functions
Parallel VFSR and sequential VFSR performances were compared on a set of eight optimization test functions, including the five functions from De Jong's test suite typically used for benchmarking Genetic Algorithms [2], a ten-dimensional paraboloid, Katsuura's nowhere-differentiable function [7] and Corana's highly multimodal function [1]. This set was designed to test optimization on functions that are (1) continuous / discontinuous, (2) linear / nonlinear, (3) low / high dimensional, and (4) deterministic / stochastic.
5.3 Corana's function in 10D
Our test used the objective function of Corana et al. [1] with coefficients s_i, t_i, d_i, and c defined such that the function resembles a paraboloid with holes that increase in depth near the origin. The large number of local minima (approximately 10^(5n)) makes optimization of this function very difficult for most methods. The formula for Corana's function is given below, and a one-dimensional version is pictured in Figure 3. Corana's function is
f(x_1, ..., x_10) = Σ_{i=1}^{10} { (t_i sgn(z_i) + z_i)^2 c d_i   if |x_i - z_i| < |t_i|
                                   d_i x_i^2                      otherwise }        (4)

z_i = ⌊|x_i / s_i| + 0.49999⌋ sgn(x_i) s_i,
s_i = 0.2,   t_i = 0.05,   c = 0.15,
d_i = {1, 1000, 10, 100, 1, 1000, 10, 100, 1, 1000},
-1000 ≤ x_i ≤ 1000.
Figure 3: Corana's function in one dimension.
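Eq. (4) transcribes directly into code. This is an illustrative sketch with our own function name; we take sgn(0) = 0, which makes f vanish at the origin.

```python
import math

def _sgn(v):
    # Sign function with sgn(0) = 0, needed so that f(0) = 0.
    return (v > 0) - (v < 0)

def corana(x, d=(1.0, 1000.0, 10.0, 100.0, 1.0, 1000.0, 10.0, 100.0, 1.0, 1000.0),
           s=0.2, t=0.05, c=0.15):
    # Corana et al. test function (Eq. 4): a paraboloid pocked with
    # rectangular "holes" whose depth increases toward the origin.
    total = 0.0
    for xi, di in zip(x, d):
        zi = math.floor(abs(xi / s) + 0.49999) * _sgn(xi) * s
        if abs(xi - zi) < t:
            total += (t * _sgn(zi) + zi) ** 2 * c * di
        else:
            total += di * xi * xi
    return total
```

For instance, at x_1 = 0.5 (all other coordinates zero) the point falls outside its nearest hole (z_1 = 0.4, |0.5 - 0.4| = 0.1 > t), so the paraboloid term d_1 x_1^2 = 0.25 applies.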
5.4 Katsuura's function in 4D
Katsuura's function is a variation of a contraction mapping that is continuous and nowhere differentiable [7]. It is defined as

f(x) = Π_{k=1}^{D} (1 + k Σ_{n=0}^{β} |2^n x_k - ⌊2^n x_k⌋| / 2^n),   -1 < x_k < 1.   (5)

For β = ∞, Katsuura's function contains an infinite number of local minima. Our test function used D = 4 and β = 30.
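A sketch of Eq. (5), using the floor convention shown there; the function name is ours.

```python
import math

def katsuura(x, beta=30):
    # Katsuura's continuous, nowhere-differentiable function (Eq. 5):
    # f(x) = prod_k (1 + k * sum_{n=0}^{beta} |2^n x_k - floor(2^n x_k)| / 2^n)
    prod = 1.0
    for k, xk in enumerate(x, start=1):
        s = sum(abs(2 ** n * xk - math.floor(2 ** n * xk)) / 2 ** n
                for n in range(beta + 1))
        prod *= 1.0 + k * s
    return prod
```

At the origin every summand vanishes, so the product collapses to 1, the function's minimum value.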
5.5 Parabolic function in 10D
The ten-dimensional parabolic function

f(x_1, ..., x_10) = Σ_{i=1}^{10} x_i^2,   -1 ≤ x_i ≤ 1,   (6)

is a smooth, differentiable function with a single minimum. This type of function is more efficiently optimized by standard gradient techniques and was included to compare parallel VFSR performance on easy and difficult functions.
5.6 De Jong's 5 function test suite
De Jong's test set [2] contains five functions designed to benchmark performances of Genetic Algorithms, another global optimization technique. Function f1 is the 3-dimensional parabolic function with a single minimum value of 0 at x_i = 0:

f1(x_1, x_2, x_3) = Σ_{i=1}^{3} x_i^2,   -5.12 ≤ x_i ≤ 5.12.   (7)
Function f2 is the classical function of Rosenbrock and Chebyquad in two dimensions with a minimum value of 0 at x_i = 1.0:

f2(x_1, x_2) = 100(x_1^2 - x_2)^2 + (1 - x_1)^2,   -2.048 ≤ x_i ≤ 2.048.   (8)
Function f3 is the plateau function, generated as the sum of integer threshold values. The five dimensional space has one minimum and is discontinuous.
f3(x_1, ..., x_5) = 30 + Σ_{i=1}^{5} ⌊x_i⌋,   -5.12 ≤ x_i ≤ 5.12.   (9)
Function f4 is a noisy quartic function of 30 variables with one minimum located at 0. Random noise η is added from a uniform distribution bounded by [0, 1). The stochastic nature of this function provides a difficult test for optimization techniques:

f4(x_1, ..., x_30) = Σ_{i=1}^{30} x_i^4 + η,   -1.28 ≤ x_i ≤ 1.28.   (10)
Function f5 is a 2-dimensional function similar to Corana's function. It has 25 local minima, and a global minimum value of ≈ 0.998004:

f5(x_1, x_2) = [0.002 + Σ_{i=1}^{25} [i + (x_1 - a_{i1})^6 + (x_2 - a_{i2})^6]^(-1)]^(-1),   (11)

a_{i1} = {-32, -16, 0, 16, 32, -32, -16, 0, 16, 32, ..., -32, -16, 0, 16, 32},
a_{i2} = {-32, -32, -32, -32, -32, -16, -16, -16, -16, -16, ..., 32, 32, 32, 32, 32},
-65.536 ≤ x_i ≤ 65.536.
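Eq. (11) is Shekel's foxholes: 25 deep wells on a 5x5 grid. A sketch (our own function name and grid construction, following the a_{i1}, a_{i2} patterns above):

```python
def dejong_f5(x1, x2):
    # De Jong's f5 (Eq. 11): 25 local minima on a 5x5 grid of "foxholes";
    # global minimum ~0.998004 near (-32, -32).
    grid = [-32.0, -16.0, 0.0, 16.0, 32.0]
    a1 = [c for _ in range(5) for c in grid]   # a_i1 cycles -32..32
    a2 = [r for r in grid for _ in range(5)]   # a_i2 repeats each value 5 times
    s = sum(1.0 / (i + (x1 - a1[i - 1]) ** 6 + (x2 - a2[i - 1]) ** 6)
            for i in range(1, 26))
    return 1.0 / (0.002 + s)
```

At (-32, -32) the i = 1 term of the inner sum is essentially 1 and all others are negligible, giving f5 ≈ 1/1.002 ≈ 0.998, consistent with the stated global minimum.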
6 Results
Results of parallel VFSR and sequential VFSR for optimizations of the test functions are given in Table 1. In all eight functions, the number of parallel cycles was substantially lower than the number of sequential cycles. This parallel speedup was expected because of the low acceptance rate of trial states in VFSR. Parallel VFSR optimizations converge more rapidly than their sequential counterparts. Figure 4 displays the course of a parallel and sequential VFSR optimization for one of the more difficult test functions, Corana's multimodal function with ten parameters (10D). Notice the sequential optimization does not move substantially closer to the global minimum
FUNCTION EVALUATION CYCLES

Optimized Function    Seq. VFSR    Par. VFSR    Speedup
Katsuura 4D               17376          223         78
Corana 10D               319483          779        410
Parabolic 10D            626404          761        823
De Jong f1 3D              4875          312         16
De Jong f2 2D             53695          990         54
De Jong f3 5D              1728          599          3
De Jong f4 30D          3346340          171      19569
De Jong f5 2D              1476          607          2
Table 1: Performance of Parallel VFSR as compared to Sequential VFSR using number of function evaluation cycles at program exit (see text). ND indicates an N dimensional function (i.e., N optimized parameters). Each value is the average of ten trials using different RNG seeds.
until after 10^3 cycles. Conversely, the parallel optimization finds states with much lower function values after only a few temperature blocks.
Efficiency of the parallel VFSR algorithm increases with increasing dimensionality of the function. As shown in Table 1, the parallel speedups (in terms of sequential/parallel cycle ratios) were less than 100 for the 2D, 3D, and 4D functions, while the 10D functions exhibited speedups between 400 and 900. For the 30D De Jong f4 function the speedup was over 19500.
The 30D stochastic function of De Jong (f4) appears to exhibit superlinear speedup of annealing cycles. However, in our comparison the sequential VFSR optimization used a single RNG to supply random values for both the objective function and VFSR's generating and acceptance functions. Conversely, each processor in the parallel VFSR implementation used a different RNG, each uniquely seeded. Thus the stochastic noise in function f4 differed between the sequential and parallel implementations. This noise affected convergence time dramatically. For example, sequential VFSR optimizations of f4 using different RNG seeds varied between 4 and 60 hours. Future optimizations of stochastic functions should use separate RNGs for the objective function and the VFSR generating and acceptance functions.
Figure 4: Course of Sequential VFSR and Parallel VFSR optimizations for Corana's function in 10D.
Parallel VFSR Speedups for Varying Function Dimensionality

Function      4D     5D     7D    10D    20D
Parabolic      -     48    298    823   1421
Corana        14      -      -    410      -
Katsuura      78      -      -   1124      -
Table 2: Effect of Function Dimension on Parallel VFSR Efficiency. Table values are ratios of Sequential VFSR cycles to Parallel VFSR cycles at program exit for an average of ten trials using different RNG seeds.
6.1 Effects of Dimensionality
The results in Table 1 indicate that our algorithm is more efficient for functions with more parameters. We examined this effect in more detail by measuring parallel and sequential VFSR performance on the parabolic, Corana, and Katsuura functions with varying dimensionality. Results are shown in Table 2, reported as ratios of function evaluation cycles for sequential VFSR to parallel VFSR.
In general, increasing the function dimensionality allows more potential states with high function values than with low function values. Thus at any given temperature, adding more parameters increases the probability of generating a rejected parameter set. This increased rejection ratio increases the performance gain of the parallel VFSR algorithm.¹
¹If the number of function parameters is increased without changing VFSR program parameters, the average accuracy of the optimization deteriorates. Good optimization of high-dimension functions requires reducing two of the VFSR program parameters, TEMPERATURE_RATIO_SCALE and TEMPERATURE_REANNEAL_SCALE.
7 Conclusions
The sequential VFSR algorithm is a powerful tool for optimizing difficult functions. Parallelization improves its performance substantially, especially when optimizing higher dimensional functions. What makes the VFSR temperature block partitioning method effective is VFSR's low acceptance rate of generated trial states. Hence temperature partitioning appears to be an efficacious use of parallel resources for VFSR optimizations.
References
[1] A. Corana, M. Marchesi, C. Martini, and S. Ridella, "Minimizing Multimodal Functions of Continuous Variables with the 'Simulated Annealing' Algorithm", ACM Trans. on Mathematical Software, Vol. 13, No. 3, pp. 272-280, 1987.
[2] K. A. De Jong, "An Analysis of the Behavior of a Class of Genetic Adaptive Systems", Ph.D. Dissertation, Department of Computer and Communication Sciences, University of Michigan, Ann Arbor, MI, 1981.
[3] L. Ingber, "Very Fast Simulated Reannealing", Mathl. Comput. Modelling, Vol. 12, No. 8, pp. 967-973, 1989.
[4] L. Ingber, "Simulated Annealing: Practice versus Theory", Statistics and Computing, 1993.
[5] L. Ingber and B. Rosen, "Very Fast Simulated Reannealing (VFSR)", [ftp ringer.cs.utsa.edu: /pub/rosen/vfsr.tar.Z], 1993.
[6] L. Ingber and B. E. Rosen, "Genetic Algorithms and Very Fast Simulated Reannealing: A Comparison", Mathematical and Computer Modelling, Vol. 16, No. 11, pp. 87-100, 1992.
[7] H. Katsuura, "Continuous Nowhere-Differentiable Function: An Application of Contraction Mappings", The American Mathematical Monthly, Vol. 98, No. 5, 1991.
[8] B. Rosen, "Function Optimization Based on Advanced Simulated Annealing", IEEE Press, Workshop on Physics and Computation, PhysComp '92, Dallas, Texas, 1992.
[9] H. Szu and R. Hartley, "Fast Simulated Annealing", Phys. Lett. A, Vol. 122, pp. 157-162, 1987.