Y. Gao and J. Culberson
Volume 17, 2002
Links to Full Text:Journal of Artificial Intelligence Research 17 {2002} 309-332 Submitted 06/02;
published 10/02
An Analysis of Phase Transition in NK Landscapes
Yong Gao ygao@cs.ualbert a.ca
Joseph Culberson joe@cs.ualbert a.ca
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada, T6G 2H1
Abstract In this paper, we analyze the decision version of the NK landscape
model from the perspective of threshold phenomena and phase transitions under
two random distributions, the uniform probability model and the fixed ratio
model. For the uniform probability
model, we prove that the phase transition is easy in the sense that there
is a polynomial
algorithm that can solve a random instance of the problem with the probability
asymptotic
to 1 as the problem size tends to infinity. For the fixed ratio model, we
establish several
upper bounds for the solubility threshold, and prove that random instances
with parameters
above these upper bounds can be solved polynomially. This, together with
our empirical
study for random instances generated below and in the phase transition region,
suggests thatthe phase transition of the fixed ratio model is also easy.
1. Introduction
The NK landscape is a fitness landscape model devised by Kauffman {1989}.
An appealing property of the NK landscape is that the \ruggedness" of the
landscape can be tuned
bychanging some parameters. Over the years, theNK landscape model itself
has been
studied from the perspectives of statistics and computational complexity
{Weinberger, 1996;
Wright, Thompson,& Zhang, 2000}. In the study of genetic algorithms, NK
landscape
models have been used as a prototype and benchmark in the analysis of the
performance of
different genetic operators and the effects of different encoding methods
on the algorithm's
performance {Altenberg, 1997; Hordijk,1997; Jones, 1995}. In the field of
combinatorial search and optimization, one of the interesting discoveries
isthe threshold phenomena and phase transitions. Roughly speaking, a phase
transition in
combinatorial search refers to the phenomenon that the probability that
a random instance
of the problem has a solution drops abruptly from 1 to 0 as the order parameter
of the
random model crosses a critical value called the threshold. Closely related
to this phase
transition in solubility is the hardness of solving the problems. There
has been strong empir- ical evidence and theoretical arguments showing that
the hardest instances of the problems usually occur around the threshold
and instances generated with parameters far away from the threshold are relatively
easy. Since the seminal work of Cheeseman et al. {Cheese-
man, Kanefsky, & Taylor, 1991}, many NP-complete combinatorial search problems
have beenshown to have the phase transition and the associated easy-hard-easy
pattern {Cook & Mitchell, 1997; Culberson & Gent, 2001; Freeman, 1996; Gent,
MacIntyre, Prosser, &
c
fl2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.Gao
& Culberson Walsh, 1998; Kirkpatrick & Selman, 1994; Mitchell, Selman, &
Levesque, 1992; Vandegriend
& Culberson, 1998}. In this paper, we analyze the NK landscape model from
the perspective of threshold phenomena and phase transitions. We establish
two random models for the decision problem
of NK landscapes and study the threshold phenomena and the associated hardness
of the
phasetransitions in these two models.
The rest of the paper is organized as follows. In Section 2, we introduce
the NK fitness
landscape and our probabilistic models, the uniform probability model and
the fixed ratio
model. In Section 3 and Section 4, the threshold phenomena and phase transitions
in
NK landscapes are analyzed. For the uniform probability model, we prove
that the phase
transition of the uniform probability model is easy in the sense that there
is a polynomial
algorithm that can solve a random instance of the problem with the probability
asymptotic
to 1 as the problem size tends to infinity. For the fixed ratio model, we
establish two upper
bounds for the solubility threshold, and prove that random instances with
parameters above
these upper bounds can be solved polynomially. This, together withour empirical
study for random instances generated below and in the phase transition region,
suggests that the phase transition of the fixed ratio model is also easy.
In Section 5, we report our
experimental results on typical hardness of the fixed ratio model. In Section
6, we conclude
our investigation and discuss implications of our results. 2. NK Landscapes
and their Probabilistic Models
An NK landscape f {x} =
n P
i=1
f i
{x
i ; 005{x
i }); is a real-valued function defined on binary strings
of fixed length, where n > 0is a positive integer and x = {x
1
; 001 001 001 ; x
n
} 2 f0; 1g
n . It is the sum
of n local fitness functions f i
; 1 024 i 024 n. Each local fitness function f
i
{x
i
; 005{x
i
}) depends on the main variable x i
and its neighborhood 005{x
i
} 032 P k
{fx
1
; 001 001 001 ; x
n
gnfx i
g} where P k
{X }
denotes the set of all subsets of size k from X . The most important parameters
of an NK
landscape are the number of variables n, and the size of the neighborhood
k = j005{x
i }j.
In an NK landscape, the neighborhood 005{x i
} can be chosen in two ways: the random
neighborhood,where the k variables are randomly chosen from the set fx
1
; 001 001 001 ; x
n
gnfx i
g;
and the adjacent neighborhood, where k variables with indices nearest to
i {modulo n} are
chosen. For example, for any even integer k, the k variables in 005{x
i
} can be defined as
x
{(n+i000
k2
} mod n} ; 001 001 001 ; x {(n+i+
k2
} mod n}
: Once the variables in the neighborhood are deter- mined, the local fitness
function f
i
is determined by a fitness lookup table which specifies the function value
f i
for each of the 2 k+1
possible assignments to the variables x
i and 005{x
i
}.
Throughout this paper, we consider NK landscapes with random neighborhoods.
To
simplify the discussion, we further assume that the local fitness functions
take on binary
values. Given an NK landscape f , the corresponding decision problem is
stated as follows: Is the maximum of f {x} equal to n? An NK landscape decision
problem is insoluble if there
is no solution for it.
It has been proved that the NK landscape model is NP complete for k 025
2{e.g.,
Weinberger, 1996; Wright et al., 2000}. The proofs were based on a reduction
from SAT to
the decision problem of NK landscapes. To study the typical hardness of
the NK landscape
decision problems in the framework of thresholds and phase transitions,
we introduce two 310Phase Transition in NK Landscapes
random models. In both of the models defined below, the neighborhood set
005{x
i
} of a variable x
i
is selected by randomly choosing without replacement k = j005{x
i
}j variables
from xnfx i
g.
Definition 2.1. The Uniform Prob ability ModelN {n; k; p}: In this model,
the fitness value
of the loc al fitness function f
i
{x
i
; 005{x
i
}) is determined as fol lows: For each assignment y 2Dom{f
i
} = f0; 1g
k+1
, let f
i
{y} = 0 with the prob ability p and f
i
{y} = 1 with the
prob ability 1 000 p, where this is done for each possible assignment and
each loc al fitness
function independently.
Definition 2.2. The Fixed Ratio Model N {n; k; z}: In this model, the par
ameter z takes
on values from [0; 2
k+1 ]. If z is an integer, we spe cify the loc al fitness function f
i
{x
i
; 005{x
i
}) by randomly choosing without replac ement z tuplesof possible assignments
Y = {y
1
; 001 001 001 ; y
z
} from Dom{f i
} = f0; 1g
k+1
, and defining the loc al fitness function as fol lows:
f
i {y} =
032 0; if y 2 Y ;
1; else:
For a non-integer z = {1 000 ff}[z] + ff[z +1] where [z] is the integer
part of z, we choose r andomly without replac ement [{1 000 ff}n] loc al
fitness functions and determine their fitness
values acc or ding to N {n; k; [z]}. The rest of the loc al fitness functions
are determined ac-
cor ding to N {n; k; [z] + 1}.
In the theory of random graphs, there are two related random models G{n;
p} where
each of the
n{n0001}2 possible edges is included in the graph independently with probability
p,and G{n; m} where exactly m edges are chosen randomly and without replacement
from
the set of n{n0001}2
possible edges. It is well known that for most of the monotone graph
properties, results proved in G{n; p} {or G{n; m}) also hold asymptotically
for G{n; N p}
{correspondingly, G{n;
mN
}) where N =
n{n0001}2
. However, wecannot expect that similar relations exist between the two
random models of NK landscapes defined above unless the
parameter k tends to infinity. As a result, the asymptotic behaviors of
the two NK landscape
models are significantly different for fixed k.
We conclude this section by establishing a relation between the decision
problem of NK landscapes and the SAT problem. A decision problem of the NK
landscape f {x} =
n
X
i=1 f
i
{x i
; 005{x i
});
\is the maximum of f {x} equal or greater than n?", can be reduced to a
{k+1}-SAT problem
as follows:
{1} For each local fitness function f
i
{x
i
; 005{x
i
}), construct a conjunction C
i =
z
V
j=1
C
j i
of clauses with exactly k + 1 variable-distinct literals from the set of
variables fx i
; 005{x i
}g,
where z is the number of zero values that f
i takes and C
j i
is such that for any assignment
y
j
2 f0; 1g
k+1
that falsifies C j
i
, we have f
i
{y
j
} = 0.
{2} The {k+1}-SAT is the conjunction ' =
n
V
i=1
C
i .
311Gao & Culbersonxyzf
iClauses0000x _ y _z001101010110x _ 026y _ 026z10011010026 x _ y _ 026z1100026
x _ 026y _z1111Table 1: A local fitness function and its equivalent 3-clauses.
Table 1 shows an example of the fitness assignment of a local fitness function
f
i
= f
i
{x; y; z}
and its associated equivalent 3-SAT clauses. It is easy to see that for
any assignment s to
the variables x; y; z, f
i
{s} = 1 if and only if the assignment satisfies the formula x _ y _ z; x
_ 026y _ 026z; 026x _ y _ 026z; 026x _ 026y _ z: 3. Analysis of The
Uniform Probability Model
In the uniform probability modelN {n; k; p}, the parameter p determines
how many zero
values a local fitness function can take. We are interested in how the solubility
and hardness
of the NK landscape decision problem change as the parameter p increases
from 0 to 1. It
turns out that for fixed p > 0, the decision problem is asymptotically trivially
insoluble.
This is quite similar to the phenomena in the random models of the constraint
satisfaction
problem observed by Achlioptas et al. {1997}.
To gain some more insight into the problem, we consider the case where p
= p{n} is a
function of the problem size n with lim
n p{n}= 0. Our analysis shows that the solubility of the problem depends
on how fast p{n} decreases:
{1} If lim
n
p{n}n
12
k+1
= +1; {3.1}
the problem is still asymptotically trivially insoluble because with the
probability asymp- totic to 1, there is at least one local fitness function
that always has a fitness value 0;
{2} On the other hand if p{n} decreases fast enough, i.e., lim
n
p{n}n
12
k+1
< +1; {3.2}
the problem can be decomposed into a set of independent sub-problems. In
either case the problem can be solved in polynomial time. The case of {3.1}
is not difficult to prove, but
to prove the case of {3.2}, we need to make use of the following concepts
and results.
Definition 3.1. The connection graph of an NK landscap e instance f {x}
=
n
P
i=1
f i
{x
i ; 005{x
i })
is a graph G = G{V ; E } satisfying
312Phase Transition in NK Landscapes
{1} Each vertex v2V corr esp onds to a loc al fitness function; and
{2}There is an edge betwe en v
i
; v
j
if and only if the corr esp onding loc al fitness functions f
i
; f
j
share variables, i.e., the neighborho o ds 005{x
i } and 005{x
j } of x
i
and x
j
have a non-empty intersection, and both of them have at least one zero value.
Definition 3.2. Let f {x} =
n
P
i=1
f
i
{x
i
; 005{x
i
}) be an NK landscap e instance with the con-
ne ction graph G = G{V ; E }. Let G
1
; 001 001 001 ; G
l
be the conne cte d comp onents of G. Since the
vertices of G corr esp ond toloc al fitness functions, we can re gar d G
i as a set of loc al fitness functions. For each 1 024 i 024 l, let U
i
032 x = {x
1
; 001 001 001 ; x
n
} be the set of variables that appe ar
in the definition of the loc al fitness functions in G
i
. It is easy to see that {U
1
; 001 001 001 U
l
} excluding independent vertices forms a disjoint partition of {a subset
of } the variables x = {x
1 ; 001 001 001 ; x n
}, and that the local fitness functions in G
i only depend on the variables in U
i
. Furthermore, the NK decision problem is soluble if and
only if for each 1 024 i 024 l, there is an assignment s
i
2f0; 1g
jU i
j
to the variables in U
i
such
that for each local fitness function g 2G
i , g{s}= 1. Theorem 3.1 summarizes the result on the uniform probability
model.
Theorem 3.1. For any p{n} such that lim
n p{n}n
12
k+1
exists, k fixed,ther e is a polynomial
time algorithm that succ essful ly solves a random instance ofN {n; k; p}
with prob ability
asymptotic to 1 as n tends to infinity.
Proof: We consider two cases: lim
n
p{n}n 12
k+1
= +1 and lim n
p{n}n
12 k+1
<+1.
{1} The case of lim n
p{n}n
12 k+1
= +1.
Let A
i
be the event that f
i
{y} = 0 for each possibleassignment y 2f0; 1g
k+1
and let
A =
n
S
i=1
A
i
be the event that at least one of the A
i
's occurs. We have
lim
n!1
P rfAg = 1 000 lim
n!1
P rf
n
i=1 A
c
i
g
= 1 000 lim n!1
{1 000 p{n}
2
k+1
}
n :
It can be shown that if k is fixed and lim
n
p{n}n 12
k+1
= +1, then lim n!1
P rfAg = 1. It follows that with probability asymptotic to one, there is
at least one local fitness function which
takes on values 0 for any possibleassignments. We can therefore show that
in this case, the
NK decision problem is insoluble by checking the local fitness functions
one by one. And
this only takes lineartime.
{2} The case of lim
n p{n}n
12
k+1 < +1.
Consider an algorithm that first finds the connected components G
i
; 1 024 i 024 l of the connection graph G of the NK model, and then uses
brute force to find an assignment
s
i
2 f0; 1g
jU i
j
to the variables in U
i
such that for each local fitness function g 2G
i
, g{s} = 1. The time complexity of this algorithm is O{n
2
+ n 003 2
M{n;k;p}
} where M{n; k; p} =
max{jU
i
j; 1 024 i 024 l} is the maximum size of the subsets {U
i ; 1 024 i 024 l} associated with the
313Gao & Culberson connected components of the connection graph. To prove
the theorem, we only need to show
that M{n; k; p} 2 O{log n}. In the following, we willshow that for lim
n
p{n}n
12
k+1
< +1,
we have lim
n!1
P rfM{n; k; p} 024 2
k
+ 2g = 1
Consider the connection graph G = G{V ; E } of the NK model. It is a random
graph and
there is an edge between two nodes if and only if the two correspondinglocal
fitness functions
share variables and both of the local fitness functions take at least one
zero as their fitness
value. However, underthis definition the edge probabilitiesare not independent.
If vx 2 E then we know that f
x
has at least one zero and so the probability that xw is in E is greater
than if there were no other edge on x.
To deal with this we resort to the following proof construction. Let C
m = fv
1
; : : : ; v
m
g
be a subset of V of size m. Let 031 be an ordering {permutation} of v
1
: : : v m
. We say that C
m
is variable conne cte d with resp e ct to the ordering 031, denoted as
C {C
m
; 031}, if for each
i; 2 024 i 024 m there is either
1. a j < i such that f
031{j} and f
031{i}
share a variable; or 2. a j; 1 024 j 024 i such that the variable x
j
is one of the k random variables in f i
.
Lemma If the induce d subgr aph G[C
m
] is conne cte d then there exists at least one ordering 031 ofv
1 : : : v
m
such that C {C
m
; 031}.
As proof, consider the ordering of vertices of any depth first search of
a connected subgraph. In this case, the connections are all by case 1.
The expected number of permutations 031 for which C {C
m
; 031} is
E
c = E[jf031 : C {C
m
; 031}gj] = m!PrfC {C
m
; 031}g
We then observe that the expected number of connected induced graphs on
m vertices is
less than p
m
0
000 n
m
001
E
c
, where p 0
is the probability that f
i
takes at least one value zero. We
show this value goes to zero in the limit if m 025 2
k
+ 2. Finally, since if there is a connected
subgraph on m vertices then there must be one for each i< m, it follows
that the largest
connected component has size at most 2
k
+ 1.
For a randomly generated permutation 031 of C
m , let C
i
be the set of the first i vertices of the permutation. For i025 2 define
P
i
to be the probability that f
031{i} shares at least one
variable with f
031{j} for some j < i given that C {C
i0001 ; 031=1; 001 001 001 ; i 000 1}. Let P 1
=1. {A one vertex subgraphis always connected.} For i > 1 we have P
i
=Prf9j < i; f
031{i}
and f
031{j}
share variables, given C {C i0001
; 031} or
one of the k random variables in f
031{i}
is in fx 1
: : : x
m
g 000 fx
i
gg.
PrfC {C
m
; 031}g =
m
Y i=2
P
i :
Finally, for i > 1 we note that C
i0001
has at most {i 000 1}k distinct other variables. If C
i0001
is connected then the number of variables may be less than this. Thus,
P
i 024 1000
000 n000k{i0001}000m
k
001000
n0001
k
001 :
314Phase Transition in NK Landscapes
The combinatorial part reduces to {n 000 k{i 000 1} 000 m} : : : {n 000
k{i 000 1} 000 m 000 k + 1}{n 000 1} : : : {n 000 k} 025
022
n 000 ki 000 m + 1n 000 1
k
:
So, PrfC {C
m
; 031}g is
024 m
Y
i=2
1 000
022
n 000 ki 000 m + 1n 000 1
k
!
024
1 000
022
1 000
km + m 000 2n 000 1
k
!
m0001
2 O
022
1n
m0001
!
; m; k fixed. Noting that p
m
0
2 O
020 n
000m2
k
+1 021
, we see that the expected number of connected subgraphs of size m is bounded
by p
m
0
022
n
m
E
c
2 O
n
m
n
000m2
k
+1
022
1n
m0001
!
whichgoes to zero if m = 2
k +2. It follows that M{n; k; p} is less than 2 k
+2 with probability asymptotic to 1. This completes the proof.
4. Analysis of The Fixed Ratio Model
Ashas been discussed in the previous section, the uniform probability modelN
{n; k; p} of
NK landscapes is asymptotically trivial. Part of the cause of this asymptotic
triviality lies in
the fact that if the parameter p does not decrease very quickly with n,
then asymptotically
there willbe at least one local fitness function that takes the value 0
for all the possible
assignments, making the whole decision problem insoluble. In this section,
we study the
fixedratio model N {n; k; z}. In this model, we require that each local
fitness function has
fixed number of zero values so that the trivially insoluble situation in
the uniform probability model is avoided. We note that the same idea has
been used in the study of the flawless
CSP {Gent et al., 1998}.
Recall that in the fixed ratio model, we choose the neighborhood structure
for each local
fitness in the same way as in the uniform probability modelN {n; k; p}.
To determine the
fitness value for a local fitness function f
i
, we randomly without replacement select exactly
z tuples fs
1
; 001 001 001 ; s z
g from f0; 1g
k+1
, and let f
i
{s
j
}= 0 for each 1024 j 024 z and f
i
{s} = 1 for
every other s 2 f0; 1g
k+1
.
For the fixed ratio model, we are interested in how the probability of an
instance of
N {n; k; z} being soluble changes as the parameter z increases from 0 to
2 k+1
. It is easy to see that the property \There exists an assignment x such
that f {x} =
n
P
i=1
f i
{x
i ; 005{x
i }) = n"
315Gao & Culberson is monotone in the parameter z - the number of tuples
at which a local fitness function takes
zero. Actually, we have the following Lemma on the property of the solubilityprobability
of the fixed ratio model: Lemma 4.1. For the fixed ratio model, if z 1
> z
2 , then
P rfN {n; k; z
1
}is solubleg 024 P rfN {n; k; z 2
}is solubleg:
Furthermor e, we have P rfN {n; k; z}is solubleg =
032
1; if z0241;
0; if z=2
k+1
:
Based on the above Lemma and in parallel to the study of the threshold phenomena
in
other random combinatorial structures such as 3-Coloring of random graphs
and random 3-SAT, we suggest the following conjecture:
Conjecture 4.1. There exists a threshold z
c
such that lim
n!1
P rfN {n; k; z}is solubleg =
032 1; if z z
c
:
Conjectures like this are the starting point of the study of phase transition
in many
random combinatorial structures such as 3-coloring of random graphs and
random SAT,
but the existence of the thresholds is still an open question {Achlioptas,
1999; Cook &
Mitchell, 1997}. However, bounding the thresholds has been an important
topic in the
studyof phase transition {Achlioptas, 1999, 2001; Dubois, 2001; Franco &
Gelder, 1998; Franco & Paul, 1983; Frieze & Suen, 1996; Kirousis, P.Kranakis,
D.Krizanc, & Y.Stamation,
1994}. In this section, we will establish two upper bounds on the threshold
of the parameter
z
c
, and theoretically prove that random instances generated with the parameter
z above
theseupper bounds can be solved with probability asymptotic to 1 by polynomial
{even
linear} algorithms. Characterizing the sharpness of the thresholds is also
of great interest in the study of the phase transition. After proving that
every monotone graph property has a threshold behavior {Friedgut & Kalai,
1996}, Friedgut {1999} established a necessary and sufficient
condition for a monotone graph property to have sharp threshold, which has
been used to
provethe sharpness of the thresholds of 3-colorability and 3-SAT problems
{Friedgut, 1999; Achlioptas, 1999}. For the fixed ratio model discussed in
this paper, we suspect that it will
exhibit a coarse threshold behavior, and would like to leave a detailed
investigation into
this problem as a future research direction. 4.1 The Upper Bound of z =3:0
The derivation of this upper bound is based on the concept of a conflicting
pair of local fitness functions. We say that two local fitness functions
f i
and f
j conflict with each other
if
1. f
i
and f
j
share at least one variable x; and
316Phase Transition in NK Landscapes
2. For any assignment s 2 f0; 1g n
, we have f
i
{s}f
j
{s} = 0.
It is obvious that an instance of the NK decision problem is insoluble if
there exists a pair
of conflicting local fitness functions.
Based on the second moment method in the theory of probability {Alon & Spencer,
1992}, wecan prove the following upper bound result. As it takes linear
time to check if
there is a pair of conflicting local fitness functions, we can see that
the fixed ratio model
N {n; 2; z} is linearly solvable when z > 3:0.
Theorem 4.1. Define A to be the event that there is a conflicting pair of
loc al fitness
functions in N {n; 2; z}. For the fixed ratio model N {n; 2; z} with z =
3:0+ ", we have lim
n
P rfAg = 1
and thus the problem is insoluble with prob ability asymptotic to 1.
Proof: Without loss of generality, we may write f as where f
i
has 4 zeroes in its fitness
value assignment for 1024 i 024 "n, and 3 zeroes for "n + 1 024 i 024
n. Let I
ij
be the indicator
function of the event that f
i
and f
j
conflicts with each other, i.e.,
I
ij =
032
1; if f
i
and f
j
conflicts with each other;
0; else: and S =
P
1024i;j024"n
I
ij
. We claim that lim
n!1 P rfS = 0g = 0.
By Chebyschev's inequality, we have
P rfS = 0g 024 P rfjS 000 E {S }j 025 E {S }g
024
V ar{S }{E {S }
2 }
:
{4.3} Since for each 1 024 i 024 "n; f
i
has exactly 4 zeros in its fitness value assignment, we know
that two local fitness function f
i
; f
j
; 1 024 i; j 024 "n, conflict with each other if and only if
they have exactly one common variable x such that one of the following is
true: {1}f
i
{s} =
0{or 1}; f j
{s} = 1{or 0} for all the assignments s such that x = 1{respectively x =
0}; and
{2}f
i
{s} = 1{or 0}; f
j
{s} = 0{or 1} for all the assignments s such that x = 1{respectively x =
0};
Since the probability that two local fitness functions share at least one
variable is equal
to
1 000
000 n0002
2
001000
n0004 2
001000
n0001
2
001000
n0001
2
001
; we have
P rfI
ij
= 1g =
1 000 000
n0002 2
001000
n0004
2
001000
n0001 2
001000
n0001
2
001
! 001 2
1000
8 4
001
!
2
= 012{
1n
}; " > 0; 1 024 i; j 024 "n;
{4.4} 317Gao & Culberson and hence,
E {S } =
X
1024i;j024"n
E {I
ij
} =
X
1024i;j024"n
P rfI
ij
= 1g 2 012{n}:
We now consider the variance of S . Since S =
P
1024i;j024"n
I
ij , we have
V ar{S } =
P
i;j
V ar{I ij
} + 2
P {i;j}6={l;m}
[E fI
ij
I
lm
g 000 E fI
ij gE fI
lm g]{E {S })
2
:
Let
A
1
=
P
i;j V ar{I
ij }{E {S })
2
and
A
2
=
2
P
{i;j}6={l;m} [E fI
ij I
lm
g 000 E fI
ij gE fI
lm g]{E {S })
2
:
It is easy to see that lim
n!1
A
1
= 0. To prove lim
n!1
A 2
= 0, we consider two cases:
Case 1: i 6= j 6= m 6= l. In this case, the two random variables I
ij
and I
lm
are actually independent. It follows that E fI
ij
I lm
g 000 E fI
ij
gE fI
lm
g = 0.
Case 2: {i; j } 6= {l; m}, but they have one in common, say j = l. In this
case, we have
E fI
ij I
lm
g 000 E fI
ij
gE fI
lm
g = P rfI
ij = 1gP rfI jm
= 1jI ij
= 1g 000 012
022
1n
2
!
= 012 022
1n
P rfI
jm
= 1jI
ij
= 1g000 012
022 1n
2
!
Given that f
i
and f
j
conflict with each other, the conditional probability that f
j
and f m
conflictwith each other is still in 012{
1n
}.
Since there are only C
3
"n pairs of I
ij
and I
jm
satisfying the condition in Case 2, we know
that
P
{i;j}6={l;m}
[E fI
ij
I
lm
g 000 E fI
ij
gE fI
lm
g] is in 012{n}. And therefore, lim
n!1
A 2
= 0: It follows that
lim
n!1 P rfS = 0g 024 lim
n!1
V ar{S }{E {S }
2 }
= 0:
Since the event fS > 0g implies that there exists a conflicting pair of
local fitness functions,
the theorem follows.4.2 2-SAT Sub-problems in N {n; 2; z} and a Tighter
Upper Bound
In this subsection, we establish a tighter upper bound z >2:837 for the
threshold of the fixed
ratio model N {n; 2; z} by showing that asymptotically N {n; 2; z} contains
an unsatisfiable
2-SAT sub-problem with probability 1 for any value of z greater than 2.873.
This also gives us a polynomialtime algorithm which determines that N {n;
2; z} is insoluble with
probability asymptotic to 1 for z >2:837.
318Phase Transition in NK Landscapes
Recallfrom Section 2 that each instance of N {n; 2; z} has an equivalent
3-SAT instance.
The idea is to show that with probability asymptotic to 1, an instance of
N {n; 2; z} will
contain a set of specially structured 3-clauses, called a t-3-module {Definition
10.3, Franco
& Gelder, 1998}: M = fM
1 ; : : : ; M
3p+2
g; t = 3p+ 2;
where
M 1
= { 026u 1
_ u
2 _ z
1
; 026u
1
_ u
2
_ 026z
1
};
001 001 001
M
p0001
= { 026u
p0001
_ u
p
_ z p0001
; 026u
p0001
_ u
p
_ 026z
p0001 };
M
p = { 026u
p _ 026u
0 _ z
p
; 026u
p
_ 026u
0
_ 026z
p
}; M
p+1
= { 026u
p+1 _ u
p+2
_ z
p+1
; 026u
p+1
_ u
p+2
_ 026z
p+1 };
001 001 001 M
3p0001 = { 026u
3p0001
_ u 3p
_ z
3p0001
; 026u
3p0001 _ u
3p
_ 026z
3p0001
};
M 3p
= { 026u
3p
_ u
0
_ z
3p
; 026u 3p
_ u
0
_ 026z 3p
}
M 3p+1
= { 026u
0
_ u
1
_ z 3p+1
; 026u
0
_ u 1
_ 026z 3p+1
};
M
3p+2
= {u
0
_ u p+1
_ z
3p+2
; u
0
_ u
p+1
_ 026z
3p+2
};
and u
1
; 001 001 001 ; u
3p+1
; z
1
; 001 001 001 ; z
3p+1
are binary variables. Notice that a t-3-module can be re-
ducedto a 2-SAT problem containing two contradictory cycles and hence is
unsatisfiable.
The result is proved in two steps. In the first step, it is shown that for
z > 2:837 the average number of t-3-modules contained in N {n; 2; z} tends
to infinityas n increases. In
the second step, weuse a result established by Alon and Spencer {1992} on
the second
moment method to prove that for z > 2:837 the probability that N {n; 2;
z} contains at least
one t-3-module tends to 1.
Let us start with the first step to show that the average number of t-3-modules
contained
in N {n; 2; z} tends to infinity as n increases. Definition 4.1. Given a
t-3-module M and an NK landscap e instance f =
n P
i=1
f i
; k =2, a
sequenc e of loc al fitness functions
g = {g
1
; 001 001 001 ; g
t
} 032 {f
1
; 001 001 001 ; f
n
}
is said to be a possiblematch{PM} if for each 1 024 m 024 t, the main
variable of g
m
is one of the thre e variables that oc cur in the 3-module M m
. A subsequenc e {h
1
; 001 001 001 ; h
l } of a possible
match g is legal if for any 1 024 m < j 024 l; h
m
6= h j
.
Lemma 4.2. Let f {x} =
n
P
i=1
f
i
{x
i
; 005{x
i
}) be an instance of N {n; 2; z} and M be a t-3-
module. Then the number of possible matches for the t-3-module M is 3
t
. Further, the
number of legal possible matches is 002 020
{
3+ p52
}
t 021
.
319Gao & Culberson Proof: For each 1 024 m 024 t, there are exactly 3
possiblechoices for g
m
:
f
i 1
{x
i 1
; 005{x i
1
}); f i
2
{x i
2
; 005{x
i
2
}); f
i
3
{x
i
3
; 005{x
i
3 });
where x i
1
; x
i
2
; and x
i
3
correspond to the three variables that occur in the 3-module M
m
. Therefore, there are 3
t possible matches for the t-3-module. To prove the second conclusion, we
divide the t-3-module into 3 parts M = {M
1
; M
2
; M 3
},
where M 1
= {M
m ; 1 024 m 024 p}, M
2
= {M
m
; p + 1 024 m 024 3p 000 1}, and M
3 =
{M
3p ; M
3p+1 ; M
3p+2
}. Letting L
1 ; L
2
; and L
3
be the number of legal possible matches for
M 1
; M
2 ; M
3
respectively. Since the literals in M
1
are variable-distinct from the literals in
M
2
, we have that the number of legal possible matches, L, for the t-3-module
M satisfies L
1
L
2
024 L 024 27L
1
L
2 :
We now estimate the order of L
1
. To this end, we consider the probability space {012; P },
where 012 is the set of sequences {g 1
; 001 001 001 ; g
p
} of local fitness functions that possiblymatch M
1
and P is the uniform probability distribution. Then, the number of legal
possible matches
is
L
1 =j012j 001 P rfa random sample from 012 is legalg {4.5}
Let g = {g
1
; 001 001 001 ; g p
} be a random sample from 012 and x
g
m denote the main variable of the local fitness function g
m
, then we have P rfx
g m
= ju
m
jg = P rfx
g
m
= ju
m+1
jg = P rfx
g m
= jz
m
jg =
13
;
where juj denotes the variable corresponding to the literal u.
Let B
m ; 0 < m 024 p be the event that the first m local fitness functions g
1
; 001 001 001 ; g
m
in the possible match g = {g
1
; 001 001 001 ; g
p
} are mutually distinct. Since in M
1
only consecutive
3-modules share variables, we have
B
m
= f{g
1
; 001 001 001 ; g
m
} : g
i 6= g
i+1 ; 1 024 i 024 m 000 1g:
Let b
m
= P rfg
m
6= g
m0001
j B
m0001
g; m 025 2, and b 1
=1. Notice that B
1
= 012. Then, we have
P rfg = {g
1 ; 001 001 001 ; g p
}is legalg = P rfB
p
g
= P rfg 1
6= g
2
; g
2 6= g
3
; 001 001 001 ; g p0001
6= g
p
g
= P rfB
1
gP rfg
2
6= g
1
j B
1
g 001 P rfg
3
6= g
2
j B
2
g 001 001 001 Prfg
p
6= g
p0001 j B
p0001 g
= b
1 b
2
001 001 001 b
p
{4.6}
Recalling that x
g
m
denotes the main variable of the local fitness function g m
, we have b
p
= P rfg
p0001 6= g
p
; x
g
p0001
= ju
p j j B
p0001 g + P rfg p0001
6= g
p
; x g
p0001
6= ju
p j j B
p0001 g
= P rfg
p0001
6= g
p
j B p0001
; x g
p0001
= ju
p
jg 001 P rfx
g
p0001
= ju
p
j j B
p0001
g+
P rfg
p0001
6= g p
j B
p0001
; x
g p0001
6= ju
p
jg001 P rfx
g
p0001
6= ju
p
j j B p0001
g
=
23
a
p
+ {1 000 a
p
}
= 1 000
13
a
p
; {4.7}
320Phase Transition in NK Landscapes
where a
p
= P rfx
g
p0001
= ju p
j j B
p0001
g. For a
p
, we have a
p
=
P rfB
p0001 ; x
g
p0001
= ju
p jgP rfB
p0001
g
=
1P rfB
p0001
g
{P rfB
p0001
; x g
p0001
= ju
p
j; x
g
p0002
= ju
p0001
jg
+P rfB
p0001 ; x
g
p0001
= ju
p j; x
g
p0002
6= ju p0001
jg} =
1P rfB
p0001 g
{P rfx g
p0001
= ju
p
j j B
p0001 ; x
g
p0002
= ju p0001
jg001 P rfB
p0001
; x
g
p0002
= ju
p0001
jg + P rfx
g
p0001
= ju
p
j j B
p0001
; x
g
p0002 6= ju
p0001
jg001 P rfB
p0001
; x
g
p0002
6= ju
p0001
jg} =
1P rfB
p0001 g
022
12
P rfB
p0001
; x g
p0002
= ju
p0001
jg +
13
P rfB p0001
; x g
p0002
6= ju
p0001
jg
{4.8}
The last equation in the above formula is because that given B
p0001
and x
g
p0002 = ju
p0001
j
{or x g
p0002
6= ju
p0001
j}, we have two {three, respectively} choices in selecting the local fitness
function g p0001
. Consider the two terms P rfB p0001
; x g
p0002
= ju
p0001
jg and P rfB
p0001
; x g
p0002
6=
ju
p0001
jg in {4.8}, we have
P rfB
p0001
; x
g p0002
= ju
p0001
jg = P rfg
p0002
6= g p0001
j B p0002
; x g
p0002
= ju
p0001
jg001 P rfB
p0002
; x g
p0002
= ju
p0001
jg
=
23
Prfx g
p0002
= ju
p0001 j j B
p0002 g 001 P rfB p0002
g
=
23
a
p0001
001 P rfB
p0002
g
{4.9}
and
P rfB
p0001
; x
g p0002
6= ju
p0001 jg
= P rfg p0002
6= g
p0001
j B
p0002
; x
g
p0002
6= ju
p0001
jg001 P rfB
p0002 ; x
g
p0002
6= ju
p0001
jg
= P rfx
g p0002
6= ju
p0001 j j B
p0002 g 001 P rfB p0002
g
= {1000 a
p0001
} 001 P rfB
p0002
g {4.10}
By plugging {4.9} and{4.10} into {4.8}, we get a
p
=
P rfB
p0002
gP rfB
p0001 g
022
13
a
p0001
+
13
{1 000 a
p0001
}
=
13b
p0001
:
This, together with {4.7}, gives us
b
p
= 1000 19b
p0001
: {4.11} It is not difficult to show that the sequence fb
p
g is decreasing and lower bounded by 0.
Letting lim
p
b
p
= b and taking the limit on both sides, we get
b = 1000
19b
; {4.12}
321Gao & Culberson and thus, b =
3006
p56
. In our case, b =
3+
p56 since b
1
= 1. It follows that b
p
025 b =
3+ p56
and
thus, b
1
001 001 001 b
p
025
3 +
p56
! p
:
From {4.5}, we know that the number of legal possible matches is greater
than 3
p
3 +
p56
! p
=
3 +
p52
! p
: {4.13}
To prove that the expected number of legal possible matches L 1
for M
1 is in 002
020020 3+
p52
021 p
021
, let ff
p
= b
p
000
3+
p56
= b p
000 b. From {4.11} and{4.12}, we have ff
p
= b
p
000 b =
b
p0001
000 b9bb
p0001
024 dff
p0001
; 0 < d < 1;
which means that the series p
P
m=1 ff
m
is convergent. It follows that
{1 +
ff
1b
} 001 001 001 {1 +
ff
pb
}
converges to a finite positive constant c. Therefore,
b
1
001 001 001 b p
= {b + ff
1
} 001 001 001 {b + ff
p
}
= b
p 020
1+
ff 1b
021
001 001 001
020
1 +
ff pb
021
024 c
3+
p56
!
p
{4.14}
for sufficient large p and some constant c.
Similarly, we can show that the number of legal possible matches L
2
for M 2
is in
002 022
020
3+
p52
021
2p+2
. Recalling that the number of legal possible matches L for the t-3-module
satisfies L
1
L
2 024 L 024 27L 1
L
2
, the second conclusion follows.The following Lemma calculates the probability
that a matching local fitness function
implies the matched 3-module. Lemma 4.3. Given a 3-module x _ y _w; x _
y _ 026w, and a loc al fitness function g such
that the main variable x
g
of g is one of the thre e Boole an variables jxj; jyj; jwj, let z = 2 +
ff; 0 024 ff 024 1 be the par ameter in the fixed ratio model N {n; 2;
z}. Then the prob ability that g contains the 3-module is
p
0 =
1000
n0001
2
001
! 022
128
{1 000 ff} + 656
ff
{4.15}
322Phase Transition in NK Landscapes
Proof: Since x g
is already one of the variables in the 3-module, the probabilitythat
the other two variables are also in the 3-module is
1{
n0001
2
}
.
Now, assume that the variables of the local fitness function g are the same
as the
variables in the 3-module. From the definition of the fixed ratio model,
g has two zeros
inits fitness value assignment with probability {1 000 ff}, and has three
zeros in its fitness
assignment with probability ff. Note that the local fitness function g implies
the 3-module x _ y _ w; x _ y _ 026w ifand only if
g{ 026x; 026y; 026w} = 0 and g{ 026x; 026y; w} = 0:
From the definition of the fixed ratio model, this happens with the probability
1000
8
2
001
{1 000 ff} +
6000 8
3
001
ff
The Lemma follows.With the above preparation, we can now prove that the
average number of t-3-modules
contained in N {n; 2; z} tends to infinity.
Theorem 4.2. Let A
t be the number of t-3-modules containe d in N {n; 2; z} and t = 002{ln
2
n}.
Then, if z = 2 + ff > 2:837, lim
n!1
E fA
t
g = 1: {4.16}
Proof: From Lemma 4.2, there are more than {
3+
p52 }
t
legal possible matches for a fixed
t-3-module. From Lemma 4.3, we know that each possible legal match g = fg
1
; 001 001 001 ; g
t
g
implies the t-3-module with probability p
t
0
. From the proof of Theorem 10.1 in {Franco &
Gelder, 1998}, there are 2
t0002
n
t0001{n 000 t + 1}
t
{4.17} possible t-3-modules, where n t0001=
n!{n000t+1}!
. Let r =
000 128
{1 000 ff} +
656
ff 001
, and write p 0
=
1{
n0001
2
}
r. We have
E fA
t
g =
3+
p52 p
0
!
t
001 2
t0002
n
t0001{n 000 t + 1}
t
=
3 +
p52 r
!
t
001 2
t0002 n
t0001{n 000 t + 1}
t
001 1000 n0001
2
001
t
=
14{n 000 t + 1}
3 + p52
r
! t
001
2 t
n
t{n 000 t + 1}
t000
n
2 001
t
000
n
2
001000
n0001
2
001 !
t
=
14{n 000 t + 1}
3 + p52
r
! t
001
4 t
n
t{n 000 t + 1}
t{n{n 000 1}) t
022
nn 000 2
t
=
14n
{2{3 +
p5}r}
t
{1 000 O
022
t
2n
}; {4.18}
323Gao & Culberson where the fourth equation in {4.18} is due to the fact
that for any positive integer n and q
such that q < n2
, we have n
q e
000q
2 =2n
024 n q024 n q
. It follows that lim
n!1
E fA
t
g = 1 if
2{3 +
p5}r > 1: {4.19}
Solving the inequality {4.19} gives us ff > 0:837, that is, z = 2 + ff >
2:837. This proves Theorem 4.2.Based on the Chebychev's inequality, to prove
that N {n; 2; z} contains t-3-modules with
probability 1, we need to show that the variance of A
t
, the number of contained t-3-modules, is o{E fA
t
g}. For this purpose, we follow Franco and Gelder's approach {Lemma 4.1,
Franco
& Gelder, 1998} to apply the second moment method {Alon & Spencer, 1992}:
Lemma 4.4. {Alon & Spenc er, 1992, Ch. 4.3 Cor 3.5} Given a random structure{e.g.,
a
random CNF formula}, let W be the set of substructures under consider ation,
A{w} be the
set of substructures sharing some clauses with w 2 W . Let I
w
=1 when w is in the random
structur e and 0 otherwise. If
{1} elements of W are symmetric;
{2} 026 = E f
P w2W
I
w
g ! 1; and {3}
P
026w2A{w}
P r{ 026w j w} = o{026}, for each w 2 W ,
then as n ! 1, the prob ability that the random structure contains a substructure
tends to
1.
To use the above Lemma to study the 2-SAT sub-problem in NK landscapes,
we view the random structure to be a random instance of N {n; 2; z}, andW
to be the set of all
t-3-modules which are symmetric by their definitions{Sections 5 and 10,
Franco & Gelder,
1998}.
Theorem 4.3. If z=2+ ff > 2:837, then N {n; 2; z} is asymptotical ly insoluble
with prob-
ability 1.
Proof: Let A
t
be the number of t-3-modules implied by N {n; 2; z} and t = O{ln
2 n}.
Theorem 4.2 shows that lim
n!1
E fA
t
g = 1. By Lemma 4.4, it is enough to show that for
eachw 2 W ,
X
026w2A{w}
P r{ 026w j w} = o{E fA
t g}; {4.20}
where P r{ 026w j w} is the conditional probability that N {n; 2; z} implies
the t-3-module 026w given that it implies w, and A{w} is the set of all
t-3-modules sharing some clauses with w.
Suppose that 026w shares Q; 1 024 Q 024 2t clauses with w, and that these
Q clauses are distributed among q 3-modules. Further, let q
1 be the number of 3-modules whose two
clauses are both shared and q
2
= q 000 q
1
the number of 3-modules that only has one clause shared.
Let T
1 be a 3-module in 026w that shares exactly one clause with a 3-module
T
2
in w.
We claim that the conditional probability that T
1
is implied by N {n; 2; z} given that w is
324Phase Transition in NK Landscapes
impliedby N {n; 2; z}, is
16
ff + O{
1n
}: {4.21} Without loss of generality, assume that T
2
= fx_y _u; x_y _ 026ug and T
1
= fx_y _u; 026x_y _ 026ug. Since w is implied by N {n; 2; z}, there is
a local fitness function g =g{jxj; jyj; juj} that implies T
2
. The conditional probability that T
1
is implied, is less than or equal to P
1
+ P 2
where
P
1
is the conditional probability that g alsoimplies the clause 026x _ y _
026u given that g implies
T 2
, and P
2 is the conditional probability that the clause 026x _ y _ 026u is implied
by other local fitness functions. By the definition of N {n; 2; z}, wehave
that P
1 =
16
ff. Since a local fitness
function implies 026x _ y _ 026u only if it has the same variables with
g =g{jxj; jyj; juj}, we have
that P 2
= O{
1n
}. The claim is proved. It follows that, for sufficiently large n,
P rf 026w j wg 024 c
3+
p52
p 0
!
t000q
001 1
q 1
001
022 16
ff
q
2
{4.22}
where p
0
is defined in Lemma 4.3 and c is a fixed constant. Let A
Q;q;q
2 {w} be the set of t-3-modules that share Q clauses with w such that these
Q
clauses are distributed over q different 3-modules. As before, q
1
is the number of 3-modules
whose two clauses are both shared and q
2
= q 000q
1
the number of 3-modules that only has one clause shared. We claim that jA
Q;q;q
2
{w}j = jA
2q;q;0 {w}j6
q
2
: {4.23} where A
2q;q;0
{w} is the set of t-3-modules that share all the 2q clauses in the q 3-modules
with w. LetM = fM
1
; 001 001 001 ;M
t
g be a t-3-module in which all the clausesM
i
; 1 024 i 024 q
are shared with w. Let M = fM
1 ; 001 001 001 ; M t
g be a t-3-module in which all the clauses in
M i
; 1 024 i 024 q
1
are shared and each of the 3-modules M
i
; q
1 + 1 024 q
1 + q
2
has only one
clause shared. Since for each of the q
2 3-modules, we have 6 ways to choose the non-shared
clauses, there are 6 q
2
such t-3-modules M in A
Q;q;q 2
{w} that correspond to one t-3-moduleM in A
2q;q;0
. The claims follow. From formula {55} and {56} in {Franco & Gelder, 1998}
and {4.23}, it follows that
jA
Q;q;q
2
{w}j <
{
O{t}n
2
2
t000q
n
2{t000q}
6
q
2
; q 024 p + 1;
O{1}n
2
t000q
n
2{t000q} 6
q
2
; q > p + 1:
{4.24}
Let r =
000
128
{1 000 ff} + 656
ff
001
, and write p
0
=
1{
n0001
2
}
r. Then, we have jA
Q;q;q
2
{w}jP rf 026w j wg
024
O{t}n
2
2
t000q
n
2{t000q}
6
q
2
{
3 +
p52
p
0 }
t000q
{
16 ff}
q
2 024
O{t}n
2 2
t000q
n
2{t000q}
3 +
p52 r
!
t000q
1000
n0001
2 001
t000q 024
O{t}n
14n
{2{3 + p5}r}) t000q
024 O{t}n
E fA t
g{2{3 + p5}r}) 000q
; q 024 p + 3
{4.25}
325Gao & Culberson and
jA
Q;q;q 2
{w}jP rf 026w j wg 024 O{1}E fA t
g{2{3 +
p5}r}) 000q
; q > p + 3: {4.26}
Therefore, X
026w2A{w}
P r{ 026w j w} =
X
Q;q;q
2
jA
Q;q;q
2 {w}jP rf 026w j wg
=
t
X
Q=1
X
q024p+3
X
q
2 O{t}n
E fA
t
g{2{3 +
p5}r}) 000q
+
t X
Q=1
X q>p+3
X
q
2
O{1}E fA
t
g{2{3 +
p5}r})
000q
: {4.27}
Since 2{3 + p5}r} > 1 for z > 2:837, we have
X
026w2A{w}
P r{ 026w j w} 024
O{t 4
}n
E fA
t
g + t
3 E fA
t
g{4r}
000{p+3}
= o{E fA
t
g}:
{4.28}
This completes the proof of Theorem 4.3.5. Experiments
Our study of the threshold phenomena in NK landscapes started with an experimental
investigation. Many of the theoretical results in the previous section are
motivated by
the observations made in our experiments. In this section, we describe the
approach and
methods we used in the experimental study, and report the results and observations
we
have made. In our experiments, an instance of the NK landscape decision
problem is converted to an
equivalent 3-SAT problem, and then the 3-SAT problem is solved using Roberto's
relsat|an
enhanced version of the famous Davis-Putnam algorithm for SAT problems implemented
in
C ++
. The source code of relsat can be found at http://www.cs.ubc.ca/ hoos/SATLIB/.
In the experiments, we generated random instances of the NK landscape decision
prob-
lem from the random model N {n; 2; z}. As a result, the equivalent SAT problem
for each
random NK landscape instance is a 3-SAT problem with n variables and {on
average} zn
clauses. By definition, the parameter z is between 0 and 8. For z 024 1,
the 3-SAT instance
can be solved easily by setting the literals that correspond to the main
variables of the
local fitness function to true. As z increases,we get more and more clauses
and the 3-SAT
problem becomes more and more constrained. The aims of the experiments are
three-
fold:{1}Investigating if there exists a threshold phenomenon in the random
NK landscape
model;{2} Locating the threshold of the parameter z; and {3}Determining
if there are any
hard instances around the threshold.
5.1 Experiments on the Fixed Ratio Model
In this part of the experiments, we generate 100 random instances of N {n;
2; z} for each of the parameters n = 2
9
001 001 001 2 16
and z =2:71; 2:72; 001 001 001 ; 3:00. These instances are then
326Phase Transition in NK Landscapes
2.72.752.82.852.92.95300.10.20.30.40.50.60.70.80.91n=512 n=1024 n=2048 n=4096
n=8192 n=16384n=32768n=65536z=2.84 Figure 1: Fractions of insolubleinstances{Y-axis}
as a function of z {X-axis}.
convertedto 3-SAT instances and solved by relsat. Figure 1 shows the fraction
of insoluble
instances as a function of the parameter z. It can be seen that there exists
a threshold
phenomenon and the threshold is around 2.83. This shows that our upper bound
z = 2:837
is very tight.
In Figure 2, we plot the square root of the average search cost as a function
of the
parametern. The figure indicates that the average search is in O{n 2
} for any parameter z. We have also observed that more than 99 percent of
the insoluble instances are solved
quickly in the preprocessing stage of relsat. This indicates that there
must be some \small"
structures that make the instances insoluble. More detailed experimental
results can be
found in Gao's thesis {Gao, 2001}.
5.2 Experiments on the 2-SAT sub-Problem
This is the part of the experiments that motivated our theoretical analyses
in Section 4.2.
The idea can be explained as follows. Let
f {x} =
n
X
i=1
f
i
{x
i
; 005{x
i
})
be an instance of the decision problem of NK landscape and
' = C
1 ^
C
2
001 001 001
^
C
n
the equivalent 3-SAT problem where C i
is the set of 3-clauses equivalent to the local fitness
function f
i
. For each i, there is a set of 2-clauses D
i
{possibly empty} implied by C
i . For
327Gao & Culberson01234567x 10405101520253035z=2.71z=2.80z=2.84z=2.86z=2.90Figure
2: Square root of the average search cost {Y-axis, in seconds} as a function
of n {X-axis}.
2.72.752.82.852.92.95300.10.20.30.40.50.60.70.80.91n=512 n=1024 n=2048 n=4096
n=8192 n=16384n=32768n=65536z=2.84 Figure 3: Fractions of insoluble instances{Y-axis}
as a function of z {X-axis}for 2-SAT
sub-problems. example, if C
i
has three 3-clauses {(x; y; z}; {x; 026y ; z}; {x; y; 026z }), then
the set of 2-clauses D
i
would be {(x; z}; {x; y}). The conjunction of D
i
, denoted by 026', is a 2-SAT problem. It is
328Phase Transition in NK Landscapes
01234567x 1040510152025303540z=2.71z=2.80z=2.84z=2.86z=2.90Figure 4: Square
root of the average search cost {Y-axis, in seconds} as a function of n {X-axis}
for 2-SAT sub-problems. obviousthat the original 3-SAT problem ' is satisfiable
only if the 2-SAT sub-problem 026' is satisfiable. In the experiment, we
generate instances of the NK landscape N {n; 2; z}, convert
them to the equivalent 3-SAT problems, and extract the 2-SAT sub-problems.
These 2-SAT
problems are then solved by the relsat solver. If the 2-SAT problem is unsatisfiable,
then
the original NK landscape instance is also insoluble. The experimental settings
are the same as those in the experiment on the original problem. The results
are shown in Figures 3-4, inparallel to the Figures 1-2 of the results
on the original 3-SAT problems in Section 5.1. We see that the patterns
of insoluble fractions
and search cost are similar to those we found in the original 3-SAT problems.
There is a
soluble-insoluble phase transition occurring around 2.83, but the fraction
of unsatisfiable instances is lower than the fraction in the original 3-SAT
problems.
We also observed that the average search cost for the 2-SAT sub-problemsremains
the
same as that for the original 3-SAT problems. This tells us that the difficulty
of solving a
soluble instance of NK landscape is almost the same as that of solving a
2-SAT problem, and hence is easy. Therefore, on average the NK landscape
N {n; 2; z} is also easy at parameters
below the threshold where almost all of the instances are soluble.
6. Implications and Conclusions One of the questions that arises about this
work is its implications to the design and anal-
ysis of genetic algorithms. NK landscapes were initiallyconceived as simplified
models
ofevolutionary landscapes which could be tuned with respect to ruggedness
and epistatic
interactions {Kauffman, 1989}. In the study of genetic algorithms, NK landscape
models
329Gao & Culberson have been used as a prototype and benchmark in the analysis
of the performance of dif-
ferent genetic operators and the effects of different encoding methods on
the algorithm's
performance {Altenberg, 1997; Hordijk, 1997; Jones, 1995}. Kauffman {1993}
points out
that the parameters that primarilyaffect a number of ruggedness measures
are n and k.
Nevertheless, the fact that for k 025 2 the discrete NK landscape is NP-complete
{Wright
et al., 2000} when the neighbors are arbitrarily chosen could be construed
as implyingthat random landscapes with fixed k are in practice hard.
The results in this paper should serve as a cautionary note that this may
not be the
case. Our analyses show that for fixed k theuniform probability model is
trivially solvable
as the problem size tends to infinity. For the fixed ratio model, we have
derived two upper bounds for the threshold of the solubility phase transition,
and proved that the problem with the control parameter above the upper bounds
can be solved in polynomial time with
probability asymptotic to 1 due to the existence of easy sub-problems such
as 2-SAT. A
series of experiments has also been conducted to investigate the hardness
of the problem
with the control parameters around and below the threshold. From the experiments,
we
have observed that the problem is also easy around and below the threshold.
Ourproofs hold only for the decision version of the problem where the component
func-
tions are discrete on f0; 1g. The proofs are obtained by noticing that the
clustering of functions, or clauses, on selected subsets of variables implies
that the overall problem is decomposable into independent subproblems, or
that the problem contains small substruc- tures that identify the solution.
The subproblems are the components of the connection
graph defined in Section 3 and the 2-SAT sub-problems studied in Section
4.2. It is cur-
rently unclear to us to what extent our analysis can be extended to the
optimization version
of the NK model, and we would like to study this problem further in the
future.
In response to the question `what are the implicationsfor GAs?' we suggest
the following
speculative line of enquiry. For the discrete model we use, the soluble
instances are readily
solved by a standard algorithmic approach based on recognizing the components
of the connection graph. {This should not be a surprise for us as it has
been pointed out by
Heckendorn, Rana, and Whitley {1999} that `Even relatively old algorithms
such as Davis-
Putnam which are deterministic and exact are orders of magnitude faster
than GAs'.}
1
A
similar connectivity can be developed for real valued distributions, for
example by capping the minimum value which we allow a sub-functionto take.
We can speculate that the
clustering imposed by fixed values of k wouldalso generate localized structures
when real
values are applied and when considering optimization instead of decision,
but perhaps with
fuzzy boundaries. In fact, this observation is just the flip side of limited
epistasis. Genetic algorithms, or their variants such as the probabilisticmodel-building
algorithms {Larranaga
& Lozano, 2001}, designed to mimic natural evolution, are supposed to take
advantage of
this situation. So, to the extent that NK landscapes are an accurate reflection
of the
features exploited by evolutionary algorithms, we pose the following question.
Is it possible to identify these fuzzy components if they exist, and in doing
so design an algorithm that
exploits the same landscape features that the evolutionary algorithms do,
but far more efficiently, as we have done for the uniform discrete decision
problem?1. We thanks an anonymous referee for pointing out to us the work
of Heckendorn, et al. {Heckendorn
et al., 1999}
330Phase Transition in NK Landscapes
These landscapes were designed with the intent of studying limited interactions,
and our
results can also be seen as a confirmation that indeed limited epistasis
leads to easier prob-
lems. In another domain, that of the more traditional research into search
and optimization,
there is a need for test bed problems with real world connections which
are tunable with
respect to difficulty. NK landscapes might have been such a domain for generating
3-SAT instances. It is disappointing that for restricted k theinstances generated
are easy with
high probability. Acknowledgments
This research supported in part by Natural Sciences and Engineering Research
Council
Grant No. OGP8053. We thank the anonymous reviewers for their comments.
References
Achlioptas, D. {1999}. Threshold Phenomena in Random Graph Colouring and
Satisfiabil- ity. Ph.D. thesis, Department of Computer Science, University
of Toronto, Toronton,
Canada.
Achlioptas, D. {2001}. A survey of lower bounds for random 3-sat via differential
equations. Theor etic al Computer Science, 265, 159{185.
Achlioptas, D., Kirousis, L., Kranakis, E., Krizanc, D., & Molloy, M. {1997}.
Random
constraint satisfaction: A more accurate picture. In Pro c e e dings of
CP97, pp. 107{
120.Springer.
Alon, N., & Spencer, J. {1992}. The Prob abilistic Method. Wiley, New York.
Altenberg, L. {1997}. Nk fitness landscapes. In Back, T., Fogel, D., & Michalewicz,
Z.
{Eds.}, Handbo ok of Evolutionary Computation. Oxford University Press,
New York.
Cheeseman, P., Kanefsky, B., & Taylor, W. {1991}. Where the re al ly hard
problems are. In
Pro c e e dings of the 12th International Joint Conferenc e on Artificial
Intel ligence, pp.
331{337. Morgan Kaufmann. Cook, S., & Mitchell, D. {1997}. Finding hard
instances of the satisfiability problem: A
survey. In Du, Gu, & Pardalos {Eds.}, Satisfiability Problem: Theory and
Applications,
V ol. 35 of DIMACS Series in Discrete Mathematics and Theor etic al Computer
Science. American Mathematical Society.
Culberson, J., & Gent, I. {2001}. Frozen development in graph coloring.
Theor etic al Com-
puter Science, 265 {1-2}, 227{264. Dubois, O. {2001}. Upper bounds on the
satisfiability threshold. Theor etic al Computer
Science, 265 {1-2}, 187{197. Franco, J., & Gelder, A. {1998}. A perspective
on certain polynomial time solvable classes
of satisfiability. Discrete Applied Mathematics, to appe ar.
F ranco, J., & Paul, M. {1983}. Probabilistic analysis of the davis-putnam
procedure for solving satisfiability. Discrete Applied Mathematics, 5, 77{87.
331Gao & Culberson Freeman, J. {1996}. Hard random 3-sat problems and the
davis-putman procedure. Artificial
Intel ligence, 81, 183{198.
Friedgut, E. {1999}. Sharp thresholds of graph properties and the k-sat
problem. J. Amer.
Math. Soc., 1017{1054. Friedgut, E., & Kalai, G. {1996}. Every monotone
graph property has a sharp threshold.
Pro c. Amer. Math. Soc., 2993{3002.
Frieze, A., & Suen, S. {1996}. Analysis of two simple heuristics on a random
instance of k-sat. J. of Algorithm, 20 {2}, 312{355.
Gao, Y. {2001}. Threshold phenomena in NK landscapes. Master's thesis, Department
of
Computing Science, University of Alberta, Edmonton, Alberta, Canada.
Gent, I., MacIntyre, I., Prosser, P.and Smith, B., & Walsh, T. {1998}. Random
constraint satisfaction: Flaws and structure. Tech. rep. APES-08-1998, APES
Research Group.
Heckendorn, R., Rana, S., & Whitley, D. {1999}. Polynomial time summary
statistics for a
generalization of maxsat. In GECCO99: Pro c e e dings of the Genetic and
Evolutinary
Computation Conferenc e, pp. 281{288. Morgan Kaufmann.
Hordijk, W. {1997}. A measure of landscapes. Evolutionary Computation, 4
{4}, 335{360. Jones, T. {1995}. Evolutionary Algorithms, Fitness Landsc ap
es and Sear ch. Ph.D. thesis, University of New Mexico, Albuquerque, NM.
Kauffman, S. {1989}. Adaptation on rugged fitness landscapes. In Stein,
D. {Ed.}, Le ctur es in the Sciences of Complexity, Santa Fe Institute Studies
in the Sciences of Complexity,
pp. 527{618. Addison Wesley .
Kauffman, S. {1993}. The Origins of Order: Self-organization and Selection
in Evolution.
Oxford University Press, Inc.
Kirkpatrick, S., & Selman, B. {1994}. Critical behavior in the satisfiabilityof
random boolean expressions. Science, 264, 1297{1301.
Kirousis, L., P.Kranakis, D.Krizanc, & Y.Stamation {1994}. Approximating
the unsatis-
fiability threshold of random formulas. Random Structures and Algorithms,
12 {3},
253{269.
Larranaga, P., & Lozano, J. {2001}. Estimation of Distribution Algorithms:
A New To ol for
Evolutinary Computation. Kluwer Academic Publishers,New York.
Mitchell, D., Selman, B., & Levesque, H. {1992}. Hard and easy distributions
of sat problems.
In Pro c e e dings of the 10th Natl. Conf on Artificial Intel ligence, pp.
459{465. AAAI
Press.
Vandegriend, B., & Culberson, J. {1998}. The G
n;m
phase transition is not hard for the
Hamiltonian Cycle problem. Journal of Artificial Intel ligence Rese ar ch,
9, 219{245. Weinberger, E. D. {1996}. Np completeness of kauffman's NK model,
a tunable rugged
fitness landscape. Tech. rep. 96-02-003, Santa Fe Institute, Santa Fe.
W right, A. H., Thompson, R. K., & Zhang, J. {2000}. The computational complexity
of NK fitness functions. IEEE Tr ansactions on Evolutionary Computation,
4 {4}, 373{379.
332