
Parallel Very Fast Simulated Reannealing by Temperature Block Partitioning*

Sandra G. Dykes and Bruce E. Rosen

Division of Mathematics, Computer Science, and Statistics

The University of Texas at San Antonio

San Antonio, Texas 78249

Abstract

This work describes a parallel implementation of Very Fast Simulated Reannealing (VFSR), an advanced simulated annealing method for optimization of non-linear, multi-dimensional functions with large numbers of local minima. Parallel VFSR speed-ups on a CM-2 Connection Machine are reported for eight functions: De Jong's test suite, a 10-D parabolic function, and two multi-modal, highly non-linear functions. Within the test set, the function characteristic that most affects parallel VFSR performance is the number of optimized function parameters. Low-dimensional functions profited least from parallelization, exhibiting speed-ups from 2 to 78 (where speed-ups are based on the number of function evaluation cycles). Speed-ups for the three 10-D cost functions increased to 410, 823, and 1124. On a stochastic, high-dimensional (D=30) quartic cost function, the cycle ratio was over 19000. We also present a systematic study of the effect of dimensionality on three of the test functions.

1 Introduction

Simulated annealing (SA) is a probabilistic optimization technique well-suited to multi-modal, discrete, non-linear and non-differentiable functions. SA's main strength is its statistical guarantee of global minimization, even in the presence of many local minima. The price one pays for global minimization is execution time: simulated annealing methods are notoriously slow. Hence there is mounting interest in using parallel algorithms and architectures to speed up the annealing process. This work introduces a parallel implementation of the advanced simulated annealing method known as Very Fast Simulated Reannealing (VFSR), which has previously been shown to outperform standard Boltzmann and Cauchy simulated annealing algorithms [4, 6, 8], as well as standard Genetic Algorithms [6].

*This work is supported in part by a grant from the University of Texas at San Antonio Faculty Research Awards and Ph.D. Strategic Fund.


Simulated annealing is an inherently iterative algorithm, making it difficult to parallelize efficiently. In each annealing cycle a trial set of function parameters is generated stochastically from the current state and a generating distribution. This trial state may be either rejected or accepted as the new current state, depending probabilistically upon the difference in function value between the current and trial states. All trial states have a finite probability of acceptance, no matter how large their corresponding function values. However, trial states with lower function values are more likely to be accepted. Typically, the ratio of accepted to generated states is initially large, but drops rapidly in the first few annealing cycles. The search process continues for many more cycles until a global solution (or a "good" approximation to it) is found. After the first few annealing cycles a low cost state is usually located. From then on, an overwhelming number of trial states are rejected and the current state is seldom changed.

Our parallel implementation of the VFSR algorithm exploits this predominant rejection of trial states. In the parallel algorithm, a block, or sequence, of generating temperatures is computed from the annealing schedule, starting at the current global generating temperature T_gen. Each parallel processor is assigned a unique T_gen,i from the sequence and uses this temperature to stochastically generate a local trial state. The processors simultaneously evaluate their local cost functions and then probabilistically accept or reject their local trial states. If one or more local states are accepted, the global current state is replaced by the accepted trial state with the highest generating temperature T_gen,i. If no local state is accepted, the global current state remains unchanged.

We refer to this parallel VFSR algorithm as temperature block partitioning. The parallelization strictly follows its sequential VFSR counterpart: the solution route employs only independent parallel transitions, because only the first accepted state in a parallel block is used.

In the following sections we describe the parallel VFSR algorithm in detail and report its performance on eight test functions: De Jong's test suite, a 10-D parabolic function, and two multi-modal, highly non-linear functions. Our results indicate that parallel speed-up depends less upon function complexity than upon the size of the parameter search space.

2 General Simulated Annealing

Consider a general simulated annealing procedure for optimization of a one dimensional function as shown in Figure 1. For optimizing functions with real parameters, the generating temperatures T_gen describe the width of each parameter's generating distribution. Each generating temperature is initialized to a very high value, resulting in a relatively flat generating distribution. As optimization progresses and the temperatures are reduced, these distributions narrow, making it more likely that the trial state will lie near the current state. Acceptance of a trial state x_trial depends upon the trial state cost f(x_trial), the current state cost f(x_current), and the acceptance temperature T_accept. The acceptance function h(x_trial) is defined as

$$h(x_{trial}) = \left[ 1 + \exp\left( \frac{f(x_{trial}) - f(x_{current})}{T_{accept}} \right) \right]^{-1}. \qquad (1)$$
A trial state is accepted if h(x_trial) exceeds a uniform random value on [0, 1). The acceptance ratio is initially large, but falls as T_accept is reduced. At every acceptance temperature, however, there is a finite probability of accepting the trial state.

Because the algorithm occasionally chooses states uphill (i.e. having a higher function value) from its current state, it can escape from local minima and more effectively search the state space to find the global minimum. Thus, SA algorithms are often well suited to solving multi-modal, non-linear optimizations.
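As a concrete illustration, the acceptance test of Eq. (1) takes only a few lines of Python. This is a sketch of the general mechanism, not the authors' implementation; numpy's default generator stands in for the annealing RNG:

    import numpy as np

    rng = np.random.default_rng(0)

    def accept(f_trial, f_current, t_accept, rng=rng):
        # Eq. (1): logistic acceptance probability. Uphill moves
        # (f_trial > f_current) retain a finite chance of acceptance,
        # which shrinks as t_accept is annealed toward zero.
        h = 1.0 / (1.0 + np.exp((f_trial - f_current) / t_accept))
        return rng.random() < h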

GENERAL SIMULATED ANNEALING

Initialize acceptance temperature to high value

Initialize generating temperatures to high values

Initialize function parameters to random values

Calculate function cost

DO UNTIL (user specified termination)

Stochastically generate a trial state

Calculate its function cost

Probabilistically accept trial state

IF (trial state accepted) THEN

Replace current state with trial state
Anneal acceptance temperature

Anneal generating temperature

END

Return function minimum

Figure 1: The general simulated annealing algorithm.
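The loop of Figure 1 translates into a minimal runnable sketch. This is illustrative only: the Gaussian generating distribution, the geometric cooling rates, and the fixed cycle count below are placeholder choices rather than those of any particular SA variant; accept() and rng come from the previous sketch.

    def simulated_annealing(cost, x0, t_accept=10.0, t_gen=10.0,
                            n_cycles=10000, rng=rng):
        x_current = np.asarray(x0, dtype=float)
        f_current = cost(x_current)
        for k in range(n_cycles):                  # user-specified termination
            # Stochastically generate a trial state and evaluate its cost.
            x_trial = x_current + t_gen * rng.standard_normal(x_current.shape)
            f_trial = cost(x_trial)
            if accept(f_trial, f_current, t_accept, rng):
                x_current, f_current = x_trial, f_trial
                t_accept *= 0.99                   # anneal acceptance temperature
            t_gen *= 0.999                         # anneal generating temperature
        return x_current, f_current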

3 Sequential VFSR

SA methods often differ in their choice of generating distribution g(). The annealing schedule, and therefore the performance of a specific SA algorithm, depends critically upon g(). For function minimization tasks, Boltzmann Annealing (standard SA) uses a Gaussian probability density for state generation. Global minimization can be achieved by standard SA if the annealing temperature is lowered at a rate no faster than T(k) = T_0/ln k, where k is the annealing cycle index. A faster annealing schedule is obtained using a Cauchy distribution [9], which allows

$$T(k) = T_0 / k.$$

Very Fast Simulated Reannealing (VFSR) generates states from a distribution that permits the exponential annealing schedule

$$T(k) = T_0 \exp(-c\, k^{1/D}),$$

where D is the number of function parameters and c is a program constant.

As a result, sequential VFSR runs faster than both Boltzmann and Cauchy annealing.
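The practical gap between the three schedules is easy to see numerically. In this sketch, T0 = 1, and the VFSR constants c = 1 and D = 4 are arbitrary assumptions for illustration:

    T0, c, D = 1.0, 1.0, 4
    for k in (10, 100, 1000, 10000):
        t_boltzmann = T0 / np.log(k)                    # logarithmic: very slow
        t_cauchy    = T0 / k                            # power law: faster
        t_vfsr      = T0 * np.exp(-c * k ** (1.0 / D))  # exponential in k**(1/D)
        print(f"k={k:6d}  {t_boltzmann:.4f}  {t_cauchy:.6f}  {t_vfsr:.2e}")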

Unlike Boltzmann Annealing and Cauchy Annealing, VFSR was primarily designed to search bounded and constrained search spaces and to minimize multidimensional functions [3]. In VFSR, a trial state is formed by independently generating the D scalar parameters x_trial,j using random variables y_j:

$$x_{trial,j} = x_{current,j} + (B_j - A_j)\, y_j, \qquad x_j \in [A_j, B_j], \qquad (2)$$

where A_j and B_j are the minimum and maximum of the jth dimensional range. Each y_j is generated from the generating temperature T_j and a uniform random value u_j:

$$y_j = \mathrm{sgn}\left(u_j - \tfrac{1}{2}\right) T_j \left[ (1 + 1/T_j)^{|2u_j - 1|} - 1 \right], \qquad y_j \in [-1, 1].$$

The VFSR generating distribution is defined as

$$g_T(y) = \prod_{j=1}^{D} \frac{1}{2(|y_j| + T_j)\ln(1 + 1/T_j)}. \qquad (3)$$
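In Python, Eq. (2) and the y_j generator read as follows (a sketch, vectorized over the D parameters; for brevity, out-of-range trial parameters are clipped here, whereas VFSR implementations typically resample them):

    def vfsr_trial(x_current, t_gen, lo, hi, rng=rng):
        # y_j = sgn(u_j - 1/2) T_j [(1 + 1/T_j)^|2u_j - 1| - 1],  y_j in [-1, 1]
        u = rng.random(x_current.shape)
        y = (np.sign(u - 0.5) * t_gen
             * ((1.0 + 1.0 / t_gen) ** np.abs(2.0 * u - 1.0) - 1.0))
        x_trial = x_current + (hi - lo) * y        # Eq. (2)
        return np.clip(x_trial, lo, hi)            # keep each x_j in [A_j, B_j]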

The main advantages of VFSR are its fast annealing time and mathematical guarantee of convergence. A detailed description of the VFSR algorithm can be found in [3, 4, 6].

4 Parallel VFSR

Figure 2 shows an overview of the parallel VFSR temperature block partitioning algorithm. In each parallel temperature block, the global generating temperature T_gen and current state x_current are broadcast to all processors. Each parallel processor computes a unique local T_gen,i from the VFSR annealing schedule, using the current global generating temperature as the starting point and its processor index as the annealing sequence number.

In parallel, the processors stochastically generate their local trial states x_trial,i, evaluate the corresponding local function costs, and probabilistically accept or reject their local trial states. If one or more local states are accepted, the global current state is replaced by the accepted trial state whose parameters were generated at the highest temperature T_gen,i. In addition, the acceptance temperature T_accept is lowered according to its annealing schedule and the global generating temperature is replaced by T_gen,i+1. Should no local state be accepted, the global current state remains unchanged and the global generating temperature is updated to reflect the number of rejected trial states.

5 Methods

The parallel VFSR algorithm was implemented and tested on a Connection Machine 2, attached to a SUN-4 interface. Each parallel test used 8k of the 64k parallel processors available on the CM-2, and ran in exclusive mode. Sequential VFSR tests were performed exclusively on the SUN-4 computer serving as the CM-2 interface.

PARALLEL VFSR

Initialize acceptance temperature to high value

Initialize generating temperatures to high values

Initialize function parameters to random values

Calculate function cost

DO UNTIL (user specified termination)

Broadcast state and annealing temperatures

BEGIN PARALLEL

Calculate local T_gen,i
Stochastically generate local trial state
Calculate local function cost
Probabilistically accept local trial state

END PARALLEL

IF (any state accepted) THEN

Replace current state with accepted local
trial state of highest T_gen,i
Anneal generating temperature starting
from new state's T_gen,i
Anneal T_accept

ELSE

Anneal generating temperature starting
from lowest T_gen,i in block

Aperiodically reanneal generating temperatures

END

Return function minimum

Figure 2: The parallel VFSR algorithm.
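The temperature block of Figure 2 can be emulated on a single machine by vectorizing over a processor axis. This is a sketch under our reading of the algorithm: processor i anneals T_gen forward i steps from the global value, the winner is the accepted state of highest T_gen,i (i.e., lowest processor index), and the geometric schedule() below is a placeholder for the VFSR schedule.

    def temperature_block(cost, x_current, f_current, t_gen, t_accept,
                          lo, hi, n_proc=8192,
                          schedule=lambda t, i: t * 0.999 ** i, rng=rng):
        idx = np.arange(n_proc)
        t_local = schedule(t_gen, idx)                 # T_gen,i per processor
        # Generate all local trial states at once (Eq. 2), one row per processor.
        u = rng.random((n_proc,) + x_current.shape)
        y = (np.sign(u - 0.5) * t_local[:, None]
             * ((1.0 + 1.0 / t_local[:, None]) ** np.abs(2.0 * u - 1.0) - 1.0))
        x_trial = np.clip(x_current + (hi - lo) * y, lo, hi)
        f_trial = np.apply_along_axis(cost, 1, x_trial)
        h = 1.0 / (1.0 + np.exp((f_trial - f_current) / t_accept))
        accepted = rng.random(n_proc) < h
        if accepted.any():
            i = idx[accepted][0]       # first acceptance = highest T_gen,i
            return x_trial[i], f_trial[i], t_local[i]
        return x_current, f_current, t_local[-1]       # all rejected: advance T_gen

In the terminology of Section 5.1, one call to this function corresponds to one function evaluation cycle.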


The probabilistic nature of simulated annealing requires multiple trials with different pseudo-random number generator seeds to produce statistically significant results. Therefore, for each test function we measured sequential and parallel VFSR performance using ten different random number generator (RNG) seeds and report the average values. All tests used the default VFSR optimization parameter set, as supplied with the sequential VFSR source [5]. VFSR has multiple exit criteria, of which the most commonly met are 1) reaching sufficiently low annealing temperatures, 2) failing to accept a trial state after a sufficient number of iterations, and 3) exceeding a set number of total loop iterations.

5.1 Performance Measure

As in previous simulated annealing studies, performance is measured by the number of function evaluation cycles required to meet standard program termination criteria [3, 4, 6]. For sequential VFSR, one function evaluation is performed per iteration of the annealing loop. In the parallel VFSR temperature block partitioning algorithm, the P parallel processors simultaneously evaluate the cost function for their local sets of parameter values. Each block evaluation is counted as one function evaluation cycle. Because function evaluations after the block's first accepted state are not used, the number of parallel blocks can vary from C_s to C_s/P, where C_s is the number of sequential cycles and P the number of parallel processors. (This ignores minor variations due to the algorithm's probabilistic nature.)

5.2 Test Functions

Parallel and sequential VFSR performance was compared on a set of eight test functions: the five functions from De Jong's test suite typically used for benchmarking Genetic Algorithms [2], a ten-dimensional paraboloid, Katsuura's nowhere-differentiable function [7], and Corana's highly multi-modal function [1]. This set was designed to test optimization on functions that are (1) continuous / discontinuous, (2) linear / nonlinear, (3) low / high dimensional, and (4) deterministic / stochastic.

5.3 Corana's function in 10-D

Our test used the objective function of Corana et al. [1] with coefficients s_i, t_i, d_i, and c defined such that the function resembles a paraboloid with holes that increase in depth near the origin. The large number of local minima (approximately $10^{5n}$) makes optimization of this function very difficult for most methods. The formula for Corana's function is given below, and a one-dimensional version is pictured in Figure 3. Corana's function is
$$f(x_1, \ldots, x_{10}) = \sum_{i=1}^{10} \begin{cases} (t_i\,\mathrm{sgn}(z_i) + z_i)^2\, c\, d_i, & |x_i - z_i| < |t_i| \\ d_i x_i^2, & \text{otherwise} \end{cases} \qquad (4)$$

$$z_i = \left\lfloor \left|\frac{x_i}{s_i}\right| + 0.49999 \right\rfloor \mathrm{sgn}(x_i)\, s_i, \qquad s_i = 0.2, \quad t_i = 0.05, \quad c = 0.15,$$

$$d_i = \{1, 1000, 10, 100, 1, 1000, 10, 100, 1, 1000\}, \qquad -1000 \le x_i \le 1000.$$
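A direct transcription of Eq. (4) into Python (a sketch; the coefficients are those listed above):

    CORANA_S, CORANA_T, CORANA_C = 0.2, 0.05, 0.15
    CORANA_D = np.array([1., 1000., 10., 100., 1., 1000., 10., 100., 1., 1000.])

    def corana(x):
        # z_i snaps x_i onto a grid of cell width s_i = 0.2.
        z = np.floor(np.abs(x / CORANA_S) + 0.49999) * np.sign(x) * CORANA_S
        in_hole = np.abs(x - z) < CORANA_T
        hole = (CORANA_T * np.sign(z) + z) ** 2 * CORANA_C * CORANA_D
        bowl = CORANA_D * x ** 2                   # underlying paraboloid
        return float(np.sum(np.where(in_hole, hole, bowl)))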


Figure 3: Corana's function in one dimension.

5.4 Katsuura's function in 4-D

Katsuura's function is a variation of a contraction mapping that is continuous yet nowhere differentiable [7]. It is defined as

$$f(x) = \prod_{k=1}^{D} \left( 1 + k \sum_{n=0}^{\beta} \frac{\left| 2^n x_k - \lfloor 2^n x_k \rfloor \right|}{2^n} \right), \qquad -1 < x_i < 1. \qquad (5)$$

For $\beta = \infty$, Katsuura's function contains an infinite number of local minima. Our test function used D = 4 and $\beta = 30$.
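Eq. (5), with the series truncated after β terms, transcribes directly (a sketch; D = 4 and β = 30 match the test above):

    def katsuura(x, beta=30):
        # Inner sum of Eq. (5): |2^n x_k - floor(2^n x_k)| / 2^n, n = 0..beta.
        n = np.arange(beta + 1)
        total = 1.0
        for k, xk in enumerate(x, start=1):
            s = np.sum(np.abs(2.0 ** n * xk - np.floor(2.0 ** n * xk)) / 2.0 ** n)
            total *= 1.0 + k * s
        return total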

5.5 Parabolic function in 10-D

The ten-dimensional parabolic function

$$f(x_1, \ldots, x_{10}) = \sum_{i=1}^{10} x_i^2, \qquad -1 \le x_i \le 1, \qquad (6)$$

is a smooth, differentiable function with a single minimum. Functions of this type are more efficiently optimized by standard gradient techniques; it was included to compare parallel VFSR performance on easy as well as difficult functions.

5.6 De Jong's 5 function test suite

De Jong's test set [2] contains five functions designed to benchmark the performance of Genetic Algorithms, another global optimization technique. Function f1 is the 3-dimensional parabolic function with a single minimum value of 0 at x_i = 0:

$$f_1(x_1, x_2, x_3) = \sum_{i=1}^{3} x_i^2, \qquad -5.12 \le x_i \le 5.12. \qquad (7)$$

Function f2 is the classical function of Rosenbrock and Chebyquad in two dimensions, with a minimum value of 0 at x_i = 1.0:

$$f_2(x_1, x_2) = 100(x_1^2 - x_2)^2 + (1 - x_1)^2, \qquad -2.048 \le x_i \le 2.048. \qquad (8)$$

Function f3 is the plateau function, generated as the sum of integer threshold values. The five-dimensional space has one minimum and is discontinuous:

$$f_3(x_1, \ldots, x_5) = 30 + \sum_{i=1}^{5} \lfloor x_i \rfloor, \qquad -5.12 \le x_i \le 5.12. \qquad (9)$$

Function f4 is a noisy quartic function of 30 variables with one minimum located at 0. Random noise $\eta$ is added from a uniform distribution bounded by [0, 1). The stochastic nature of this function provides a difficult test for optimization techniques:

$$f_4(x_1, \ldots, x_{30}) = \sum_{i=1}^{30} x_i^4 + \eta, \qquad -1.28 \le x_i \le 1.28. \qquad (10)$$

Function f5 is a 2-dimensional function similar to Corana's function. It has 25 local minima and a global minimum value of approximately 0.998004:

$$f_5(x_1, x_2) = \left[ 0.002 + \sum_{i=1}^{25} \frac{1}{i + (x_1 - a_{i1})^6 + (x_2 - a_{i2})^6} \right]^{-1}, \qquad (11)$$

$$a_{i1} = \{-32, -16, 0, 16, 32, -32, -16, 0, 16, 32, \ldots, -32, -16, 0, 16, 32\},$$

$$a_{i2} = \{-32, -32, -32, -32, -32, -16, -16, -16, -16, -16, \ldots, 32, 32, 32, 32, 32\},$$

$$-65.536 \le x_i \le 65.536.$$
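Eq. (11) transcribes almost verbatim, with the 25 pairs (a_i1, a_i2) forming the 5 x 5 grid over {-32, -16, 0, 16, 32} (a sketch):

    FOX_GRID = np.array([-32., -16., 0., 16., 32.])
    A1 = np.tile(FOX_GRID, 5)       # a_i1 cycles through the grid values
    A2 = np.repeat(FOX_GRID, 5)     # a_i2 holds each value for five terms

    def dejong_f5(x):
        i = np.arange(1, 26)
        s = np.sum(1.0 / (i + (x[0] - A1) ** 6 + (x[1] - A2) ** 6))
        return 1.0 / (0.002 + s)    # minimum of about 0.998004 at (-32, -32)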

6 Results

Results of parallel and sequential VFSR optimizations of the test functions are given in Table 1. For all eight functions, the number of parallel cycles was substantially lower than the number of sequential cycles. This parallel speedup was expected because of the low acceptance rate of trial states in VFSR. Parallel VFSR optimizations converge more rapidly than their sequential counterparts. Figure 4 displays the course of a parallel and a sequential VFSR optimization for one of the more difficult test functions, Corana's multi-modal function with ten parameters (10-D). Notice that the sequential optimization does not move substantially closer to the global minimum until after $10^3$ cycles. Conversely, the parallel optimization finds states with much lower function values after only a few temperature blocks.

FUNCTION EVALUATION CYCLES

Optimized Function     Seq. VFSR    Par. VFSR    Speedup
Katsuura 4-D               17376          223         78
Corana 10-D               319483          779        410
Parabolic 10-D            626404          761        823
De Jong f1 3-D              4875          312         16
De Jong f2 2-D             53695          990         54
De Jong f3 5-D              1728          599          3
De Jong f4 30-D          3346340          171      19569
De Jong f5 2-D              1476          607          2

Table 1: Performance of Parallel VFSR as compared to Sequential VFSR using number of function evaluation cycles at program exit (see text). N-D indicates an N dimensional function (i.e., N optimized parameters). Each value is the average of ten trials using different RNG seeds.


Efficiency of the parallel VFSR algorithm increases with increasing dimensionality of the function. As shown in Table 1, the parallel speedups (in terms of sequential/parallel cycle ratios) were less than 100 for the 2-D, 3-D, and 4-D functions, while the 10-D functions exhibited speedups between 400 and 900. For the 30-D De Jong f4 function the speedup was over 19500.

The 30-D stochastic function of De Jong (f4) appears to exhibit superlinear speedup of annealing cycles. However, in our comparison the sequential VFSR optimization used a single RNG to supply random values for both the objective function and VFSR's generating and acceptance functions. Conversely, each processor in the parallel VFSR implementation used a different RNG, each uniquely seeded. Thus the stochastic noise in function f4 differed between the sequential and parallel implementations. This noise affected convergence time dramatically. For example, sequential VFSR optimizations of f4 using different RNG seeds varied between 4 and 60 hours. Future optimizations of stochastic functions should use separate RNGs for the objective function and the VFSR generating and acceptance functions.
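That recommendation is straightforward with independent generator streams, e.g. via numpy's seed-sequence spawning (a sketch; the stream names are ours):

    root = np.random.SeedSequence(1234)
    anneal_seq, noise_seq = root.spawn(2)
    rng_anneal = np.random.default_rng(anneal_seq)  # drives generation/acceptance
    rng_noise = np.random.default_rng(noise_seq)    # drives the noise term in f4

    def dejong_f4_noisy(x, rng=rng_noise):
        # Eq. (10) with its own noise stream, isolated from the annealing RNG.
        return float(np.sum(x ** 4) + rng.random())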


Figure 4: Course of sequential and parallel VFSR optimizations of Corana's function in 10-D: function value versus function evaluation cycles (log-log scale).

Parallel VFSR Speedups for Varying Function Dimensionality

Function      4-D    5-D    7-D    10-D    20-D
Parabolic      --     48    298     823    1421
Corana         14     --     --     410      --
Katsuura       78     --     --    1124      --

Table 2: Effect of Function Dimension on Parallel VFSR Efficiency. Table values are ratios of Sequential VFSR cycles to Parallel VFSR cycles at program exit, averaged over ten trials using different RNG seeds.

6.1 Effects of Dimensionality

The results in Table 1 indicate that our algorithm is more efficient for functions with more parameters. We examined this effect in more detail by measuring parallel and sequential VFSR performance on the parabolic, Corana, and Katsuura functions with varying dimensionality. Results are shown in Table 2, reported as ratios of function evaluation cycles for sequential VFSR to parallel VFSR.

In general, increasing the function dimensionality yields many more potential states with high function values than with low function values. Thus at any given temperature, adding more parameters increases the probability of generating a rejected parameter set. This increased rejection ratio increases the performance gain of the parallel VFSR algorithm.¹

¹If the number of function parameters is increased without changing VFSR program parameters, the average accuracy of the optimization deteriorates. Good optimization of high dimension functions requires reducing two of the VFSR program parameters, TEMPERATURE RATIO SCALE and TEMPERATURE REANNEAL SCALE.

7 Conclusions

The sequential VFSR algorithm is a powerful tool for optimizing difficult functions. Parallelization improves its performance substantially, especially when optimizing higher dimensional functions. What makes the VFSR temperature block partitioning method effective is VFSR's low acceptance rate of generated trial states. Hence temperature block partitioning appears to be an efficient use of parallel resources for VFSR optimizations.

References

[1] A. Corana, M. Marchesi, C. Martini, and S. Ridella, "Minimizing Multimodal Functions of Continuous Variables with the 'Simulated Annealing' Algorithm", ACM Transactions on Mathematical Software, Vol. 13, No. 3, pp. 272-280, 1987.

[2] K. A. De Jong, "An Analysis of the Behavior of a Class of Genetic Adaptive Systems", Ph.D. dissertation, Department of Computer and Communication Sciences, University of Michigan, Ann Arbor, MI, 1981.

[3] L. Ingber, "Very Fast Simulated Re-annealing", Mathl. Comput. Modeling, Vol. 12, No. 8, pp. 967-973, 1989.

[4] L. Ingber, "Simulated Annealing: Practice versus Theory", Statistics and Computing, 1993.

[5] L. Ingber and B. Rosen, "Very Fast Simulated Reannealing (VFSR)", [ftp ringer.cs.utsa.edu: /pub/rosen/vfsr.tar.Z], 1993.

[6] L. Ingber and B. E. Rosen, "Genetic Algorithms and Very Fast Simulated Reannealing: A Comparison", Mathematical and Computer Modeling, Vol. 16, No. 11, pp. 87-100, 1992.

[7] H. Katsuura, "Continuous Nowhere Differentiable Function - An Application of Contraction Mappings", The American Mathematical Monthly, Vol. 98, No. 5, 1991.

[8] B. Rosen, "Function Optimization Based on Advanced Simulated Annealing", IEEE Workshop on Physics and Computation (PhysComp '92), Dallas, Texas, 1992.

[9] H. Szu and R. Hartley, "Fast Simulated Annealing", Physics Letters A, Vol. 122, pp. 157-162, 1987.
