An Analysis of Phase Transition in NK Landscapes

Y. Gao and J. Culberson

Volume 17, 2002

Links to Full Text:

Abstract:
In this paper, we analyze the decision version of the NK landscape model from the perspective of threshold phenomena and phase transitions under two random distributions, the uniform probability model and the fixed ratio model. For the uniform probability model, we prove that the phase transition is easy in the sense that there is a polynomial algorithm that can solve a random instance of the problem with the probability asymptotic to 1 as the problem size tends to infinity. For the fixed ratio model, we establish several upper bounds for the solubility threshold, and prove that random instances with parameters above these upper bounds can be solved polynomially. This, together with our empirical study for random instances generated below and in the phase transition region, suggests that the phase transition of the fixed ratio model is also easy.

Extracted Text

Journal of Artificial Intelligence Research 17 {2002} 309-332 Submitted 06/02;
published 10/02
 An Analysis of Phase Transition in NK Landscapes
 Yong Gao ygao@cs.ualbert a.ca
 Joseph Culberson joe@cs.ualbert a.ca
 Department of Computing Science
 University of Alberta
 Edmonton, Alberta, Canada, T6G 2H1
 Abstract In this paper, we analyze the decision version of the NK landscape
model from the perspective of threshold phenomena and phase transitions under
two random distributions, the uniform probability model and the fixed ratio
model. For the uniform probability
 model, we prove that the phase transition is easy in the sense that there
is a polynomial
 algorithm that can solve a random instance of the problem with the probability
asymptotic
 to 1 as the problem size tends to infinity. For the fixed ratio model, we
establish several
 upper bounds for the solubility threshold, and prove that random instances
with parameters
 above these upper bounds can be solved polynomially. This, together with
our empirical
 study for random instances generated below and in the phase transition region,
suggests thatthe phase transition of the fixed ratio model is also easy.
 1. Introduction
 The NK landscape is a fitness landscape model devised by Kauffman {1989}.
An appealing property of the NK landscape is that the \ruggedness" of the
landscape can be tuned
 bychanging some parameters. Over the years, theNK landscape model itself
has been
 studied from the perspectives of statistics and computational complexity
{Weinberger, 1996;
 Wright, Thompson,& Zhang, 2000}. In the study of genetic algorithms, NK
landscape
 models have been used as a prototype and benchmark in the analysis of the
performance of
 different genetic operators and the effects of different encoding methods
on the algorithm's
 performance {Altenberg, 1997; Hordijk,1997; Jones, 1995}. In the field of
combinatorial search and optimization, one of the interesting discoveries
 isthe threshold phenomena and phase transitions. Roughly speaking, a phase
transition in
 combinatorial search refers to the phenomenon that the probability that
a random instance
 of the problem has a solution drops abruptly from 1 to 0 as the order parameter
of the
 random model crosses a critical value called the threshold. Closely related
to this phase
 transition in solubility is the hardness of solving the problems. There
has been strong empir- ical evidence and theoretical arguments showing that
the hardest instances of the problems usually occur around the threshold
and instances generated with parameters far away from the threshold are relatively
easy. Since the seminal work of Cheeseman et al. {Cheese-
 man, Kanefsky, & Taylor, 1991}, many NP-complete combinatorial search problems
have beenshown to have the phase transition and the associated easy-hard-easy
pattern {Cook & Mitchell, 1997; Culberson & Gent, 2001; Freeman, 1996; Gent,
MacIntyre, Prosser, &
 c
 fl2002 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.Gao
& Culberson Walsh, 1998; Kirkpatrick & Selman, 1994; Mitchell, Selman, &
Levesque, 1992; Vandegriend
 & Culberson, 1998}. In this paper, we analyze the NK landscape model from
the perspective of threshold phenomena and phase transitions. We establish
two random models for the decision problem
 of NK landscapes and study the threshold phenomena and the associated hardness
of the
 phasetransitions in these two models.
 The rest of the paper is organized as follows. In Section 2, we introduce
the NK fitness
 landscape and our probabilistic models, the uniform probability model and
the fixed ratio
 model. In Section 3 and Section 4, the threshold phenomena and phase transitions
in
 NK landscapes are analyzed. For the uniform probability model, we prove
that the phase
 transition of the uniform probability model is easy in the sense that there
is a polynomial
 algorithm that can solve a random instance of the problem with the probability
asymptotic
 to 1 as the problem size tends to infinity. For the fixed ratio model, we
establish two upper
 bounds for the solubility threshold, and prove that random instances with
parameters above
 these upper bounds can be solved polynomially. This, together withour empirical
study for random instances generated below and in the phase transition region,
suggests that the phase transition of the fixed ratio model is also easy.
In Section 5, we report our
 experimental results on typical hardness of the fixed ratio model. In Section
6, we conclude
 our investigation and discuss implications of our results. 2. NK Landscapes
and their Probabilistic Models
 An NK landscape f {x} =
 n P
 i=1
 f i
 {x
 i ; 005{x
 i }); is a real-valued function defined on binary strings
 of fixed length, where n > 0is a positive integer and x = {x
 1
 ; 001 001 001 ; x
 n
 } 2 f0; 1g
 n . It is the sum
 of n local fitness functions f i
 ; 1 024 i 024 n. Each local fitness function f
 i
 {x
 i
 ; 005{x
 i
 }) depends on the main variable x i
 and its neighborhood 005{x
 i
 } 032 P k
 {fx
 1
 ; 001 001 001 ; x
 n
 gnfx i
 g} where P k
 {X }
 denotes the set of all subsets of size k from X . The most important parameters
of an NK
 landscape are the number of variables n, and the size of the neighborhood
k = j005{x
 i }j.
 In an NK landscape, the neighborhood 005{x i
 } can be chosen in two ways: the random
 neighborhood,where the k variables are randomly chosen from the set fx
 1
 ; 001 001 001 ; x
 n
 gnfx i
 g;
 and the adjacent neighborhood, where k variables with indices nearest to
i {modulo n} are
 chosen. For example, for any even integer k, the k variables in 005{x
 i
 } can be defined as
 x
 {(n+i000
 k2
 } mod n} ; 001 001 001 ; x {(n+i+
 k2
 } mod n}
 : Once the variables in the neighborhood are deter- mined, the local fitness
function f
 i
 is determined by a fitness lookup table which specifies the function value
f i
 for each of the 2 k+1
 possible assignments to the variables x
 i and 005{x
 i
 }.
 Throughout this paper, we consider NK landscapes with random neighborhoods.
To
 simplify the discussion, we further assume that the local fitness functions
take on binary
 values. Given an NK landscape f , the corresponding decision problem is
stated as follows: Is the maximum of f {x} equal to n? An NK landscape decision
problem is insoluble if there
 is no solution for it.
 It has been proved that the NK landscape model is NP complete for k 025
2{e.g.,
 Weinberger, 1996; Wright et al., 2000}. The proofs were based on a reduction
from SAT to
 the decision problem of NK landscapes. To study the typical hardness of
the NK landscape
 decision problems in the framework of thresholds and phase transitions,
we introduce two 310Phase Transition in NK Landscapes
 random models. In both of the models defined below, the neighborhood set
005{x
 i
 } of a variable x
 i
 is selected by randomly choosing without replacement k = j005{x
 i
 }j variables
 from xnfx i
 g.
 Definition 2.1. The Uniform Prob ability ModelN {n; k; p}: In this model,
the fitness value
 of the loc al fitness function f
 i
 {x
 i
 ; 005{x
 i
 }) is determined as fol lows: For each assignment y 2Dom{f
 i
 } = f0; 1g
 k+1
 , let f
 i
 {y} = 0 with the prob ability p and f
 i
 {y} = 1 with the
 prob ability 1 000 p, where this is done for each possible assignment and
each loc al fitness
 function independently.
 Definition 2.2. The Fixed Ratio Model N {n; k; z}: In this model, the par
ameter z takes
 on values from [0; 2
 k+1 ]. If z is an integer, we spe cify the loc al fitness function f
 i
 {x
 i
 ; 005{x
 i
 }) by randomly choosing without replac ement z tuplesof possible assignments
Y = {y
 1
 ; 001 001 001 ; y
 z
 } from Dom{f i
 } = f0; 1g
 k+1
 , and defining the loc al fitness function as fol lows:
 f
 i {y} =
 032 0; if y 2 Y ;
 1; else:
 For a non-integer z = {1 000 ff}[z] + ff[z +1] where [z] is the integer
part of z, we choose r andomly without replac ement [{1 000 ff}n] loc al
fitness functions and determine their fitness
 values acc or ding to N {n; k; [z]}. The rest of the loc al fitness functions
are determined ac-
 cor ding to N {n; k; [z] + 1}.
 In the theory of random graphs, there are two related random models G{n;
p} where
 each of the
 n{n0001}2 possible edges is included in the graph independently with probability
 p,and G{n; m} where exactly m edges are chosen randomly and without replacement
from
 the set of n{n0001}2
 possible edges. It is well known that for most of the monotone graph
 properties, results proved in G{n; p} {or G{n; m}) also hold asymptotically
for G{n; N p}
 {correspondingly, G{n;
 mN
 }) where N =
 n{n0001}2
 . However, wecannot expect that similar relations exist between the two
random models of NK landscapes defined above unless the
 parameter k tends to infinity. As a result, the asymptotic behaviors of
the two NK landscape
 models are significantly different for fixed k.
 We conclude this section by establishing a relation between the decision
problem of NK landscapes and the SAT problem. A decision problem of the NK
landscape f {x} =
 n
 X
 i=1 f
 i
 {x i
 ; 005{x i
 });
 \is the maximum of f {x} equal or greater than n?", can be reduced to a
{k+1}-SAT problem
 as follows:
 {1} For each local fitness function f
 i
 {x
 i
 ; 005{x
 i
 }), construct a conjunction C
 i =
 z
 V
 j=1
 C
 j i
 of clauses with exactly k + 1 variable-distinct literals from the set of
variables fx i
 ; 005{x i
 }g,
 where z is the number of zero values that f
 i takes and C
 j i
 is such that for any assignment
 y
 j
 2 f0; 1g
 k+1
 that falsifies C j
 i
 , we have f
 i
 {y
 j
 } = 0.
 {2} The {k+1}-SAT is the conjunction ' =
 n
 V
 i=1
 C
 i .
 311Gao & Culbersonxyzf
 iClauses0000x _ y _z001101010110x _ 026y _ 026z10011010026 x _ y _ 026z1100026
x _ 026y _z1111Table 1: A local fitness function and its equivalent 3-clauses.
 Table 1 shows an example of the fitness assignment of a local fitness function
f
 i
 = f
 i
 {x; y; z}
 and its associated equivalent 3-SAT clauses. It is easy to see that for
any assignment s to
 the variables x; y; z, f
 i
 {s} = 1 if and only if the assignment satisfies the formula x _ y _ z; x
_ 026y _ 026z; 026x _ y _ 026z; 026x _ 026y _ z: 3. Analysis of The
Uniform Probability Model
 In the uniform probability modelN {n; k; p}, the parameter p determines
how many zero
 values a local fitness function can take. We are interested in how the solubility
and hardness
 of the NK landscape decision problem change as the parameter p increases
from 0 to 1. It
 turns out that for fixed p > 0, the decision problem is asymptotically trivially
insoluble.
 This is quite similar to the phenomena in the random models of the constraint
satisfaction
 problem observed by Achlioptas et al. {1997}.
 To gain some more insight into the problem, we consider the case where p
= p{n} is a
 function of the problem size n with lim
 n p{n}= 0. Our analysis shows that the solubility of the problem depends
on how fast p{n} decreases:
 {1} If lim
 n
 p{n}n
 12
 k+1
 = +1; {3.1}
 the problem is still asymptotically trivially insoluble because with the
probability asymp- totic to 1, there is at least one local fitness function
that always has a fitness value 0;
 {2} On the other hand if p{n} decreases fast enough, i.e., lim
 n
 p{n}n
 12
 k+1
 < +1; {3.2}
 the problem can be decomposed into a set of independent sub-problems. In
either case the problem can be solved in polynomial time. The case of {3.1}
is not difficult to prove, but
 to prove the case of {3.2}, we need to make use of the following concepts
and results.
 Definition 3.1. The connection graph of an NK landscap e instance f {x}
=
 n
 P
 i=1
 f i
 {x
 i ; 005{x
 i })
 is a graph G = G{V ; E } satisfying
 312Phase Transition in NK Landscapes
 {1} Each vertex v2V corr esp onds to a loc al fitness function; and
 {2}There is an edge betwe en v
 i
 ; v
 j
 if and only if the corr esp onding loc al fitness functions f
 i
 ; f
 j
 share variables, i.e., the neighborho o ds 005{x
 i } and 005{x
 j } of x
 i
 and x
 j
 have a non-empty intersection, and both of them have at least one zero value.
Definition 3.2. Let f {x} =
 n
 P
 i=1
 f
 i
 {x
 i
 ; 005{x
 i
 }) be an NK landscap e instance with the con-
 ne ction graph G = G{V ; E }. Let G
 1
 ; 001 001 001 ; G
 l
 be the conne cte d comp onents of G. Since the
 vertices of G corr esp ond toloc al fitness functions, we can re gar d G
 i as a set of loc al fitness functions. For each 1 024 i 024 l, let U
 i
 032 x = {x
 1
 ; 001 001 001 ; x
 n
 } be the set of variables that appe ar
 in the definition of the loc al fitness functions in G
 i
 . It is easy to see that {U
 1
 ; 001 001 001 U
 l
 } excluding independent vertices forms a disjoint partition of {a subset
of } the variables x = {x
 1 ; 001 001 001 ; x n
 }, and that the local fitness functions in G
 i only depend on the variables in U
 i
 . Furthermore, the NK decision problem is soluble if and
 only if for each 1 024 i 024 l, there is an assignment s
 i
 2f0; 1g
 jU i
 j
 to the variables in U
 i
 such
 that for each local fitness function g 2G
 i , g{s}= 1. Theorem 3.1 summarizes the result on the uniform probability
model.
 Theorem 3.1. For any p{n} such that lim
 n p{n}n
 12
 k+1
 exists, k fixed,ther e is a polynomial
 time algorithm that succ essful ly solves a random instance ofN {n; k; p}
with prob ability
 asymptotic to 1 as n tends to infinity.
 Proof: We consider two cases: lim
 n
 p{n}n 12
 k+1
 = +1 and lim n
 p{n}n
 12 k+1
 <+1.
 {1} The case of lim n
 p{n}n
 12 k+1
 = +1.
 Let A
 i
 be the event that f
 i
 {y} = 0 for each possibleassignment y 2f0; 1g
 k+1
 and let
 A =
 n
 S
 i=1
 A
 i
 be the event that at least one of the A
 i
 's occurs. We have
 lim
 n!1
 P rfAg = 1 000 lim
 n!1
 P rf
 n
   i=1 A
 c
 i
 g
 = 1 000 lim n!1
 {1 000 p{n}
 2
 k+1
 }
 n :
 It can be shown that if k is fixed and lim
 n
 p{n}n 12
 k+1
 = +1, then lim n!1
 P rfAg = 1. It follows that with probability asymptotic to one, there is
at least one local fitness function which
 takes on values 0 for any possibleassignments. We can therefore show that
in this case, the
 NK decision problem is insoluble by checking the local fitness functions
one by one. And
 this only takes lineartime.
 {2} The case of lim
 n p{n}n
 12
 k+1 < +1.
 Consider an algorithm that first finds the connected components G
 i
 ; 1 024 i 024 l of the connection graph G of the NK model, and then uses
brute force to find an assignment
 s
 i
 2 f0; 1g
 jU i
 j
 to the variables in U
 i
 such that for each local fitness function g 2G
 i
 , g{s} = 1. The time complexity of this algorithm is O{n
 2
 + n 003 2
 M{n;k;p}
 } where M{n; k; p} =
 max{jU
 i
 j; 1 024 i 024 l} is the maximum size of the subsets {U
 i ; 1 024 i 024 l} associated with the
 313Gao & Culberson connected components of the connection graph. To prove
the theorem, we only need to show
 that M{n; k; p} 2 O{log n}. In the following, we willshow that for lim
 n
 p{n}n
 12
 k+1
 < +1,
 we have lim
 n!1
 P rfM{n; k; p} 024 2
 k
 + 2g = 1
 Consider the connection graph G = G{V ; E } of the NK model. It is a random
graph and
 there is an edge between two nodes if and only if the two correspondinglocal
fitness functions
 share variables and both of the local fitness functions take at least one
zero as their fitness
 value. However, underthis definition the edge probabilitiesare not independent.
If vx 2 E then we know that f
 x
 has at least one zero and so the probability that xw is in E is greater
 than if there were no other edge on x.
 To deal with this we resort to the following proof construction. Let C
 m = fv
 1
 ; : : : ; v
 m
 g
 be a subset of V of size m. Let 031 be an ordering {permutation} of v
 1
 : : : v m
 . We say that C
 m
 is variable conne cte d with resp e ct to the ordering 031, denoted as
C {C
 m
 ; 031}, if for each
 i; 2 024 i 024 m there is either
 1. a j < i such that f
 031{j} and f
 031{i}
 share a variable; or 2. a j; 1 024 j 024 i such that the variable x
 j
 is one of the k random variables in f i
 .
 Lemma If the induce d subgr aph G[C
 m
 ] is conne cte d then there exists at least one ordering 031 ofv
 1 : : : v
 m
 such that C {C
 m
 ; 031}.
 As proof, consider the ordering of vertices of any depth first search of
a connected subgraph. In this case, the connections are all by case 1.
 The expected number of permutations 031 for which C {C
 m
 ; 031} is
 E
 c = E[jf031 : C {C
 m
 ; 031}gj] = m!PrfC {C
 m
 ; 031}g
 We then observe that the expected number of connected induced graphs on
m vertices is
 less than p
 m
 0
 000 n
 m
 001
 E
 c
 , where p 0
 is the probability that f
 i
 takes at least one value zero. We
 show this value goes to zero in the limit if m 025 2
 k
 + 2. Finally, since if there is a connected
 subgraph on m vertices then there must be one for each i< m, it follows
that the largest
 connected component has size at most 2
 k
 + 1.
 For a randomly generated permutation 031 of C
 m , let C
 i
 be the set of the first i vertices of the permutation. For i025 2 define
P
 i
 to be the probability that f
 031{i} shares at least one
 variable with f
 031{j} for some j < i given that C {C
 i0001 ; 031=1; 001 001 001 ; i 000 1}. Let P 1
 =1. {A one vertex subgraphis always connected.} For i > 1 we have P
 i
 =Prf9j < i; f
 031{i}
 and f
 031{j}
 share variables, given C {C i0001
 ; 031} or
 one of the k random variables in f
 031{i}
 is in fx 1
 : : : x
 m
 g 000 fx
 i
 gg.
 PrfC {C
 m
 ; 031}g =
 m
 Y i=2
 P
 i :
 Finally, for i > 1 we note that C
 i0001
 has at most {i 000 1}k distinct other variables. If C
 i0001
 is connected then the number of variables may be less than this. Thus,
 P
 i 024 1000
 000 n000k{i0001}000m
 k
 001000
 n0001
 k
 001 :
 314Phase Transition in NK Landscapes
 The combinatorial part reduces to {n 000 k{i 000 1} 000 m} : : : {n 000
k{i 000 1} 000 m 000 k + 1}{n 000 1} : : : {n 000 k} 025
 022
 n 000 ki 000 m + 1n 000 1
  k
 :
 So, PrfC {C
 m
 ; 031}g is
 024 m
 Y
 i=2  
 1 000
 022
 n 000 ki 000 m + 1n 000 1
 
 k
 !
 024
  
 1 000
 022
 1 000
 km + m 000 2n 000 1
 
 k
 !
 m0001
 2 O
  
 022
 1n
 
 m0001
 !
 ; m; k fixed. Noting that p
 m
 0
 2 O
 020 n
 000m2
 k
 +1 021
 , we see that the expected number of connected subgraphs of size m is bounded
by p
 m
 0
 022
 n
 m
 
 E
 c
 2 O
  
 n
 m
 n
 000m2
 k
 +1
 022
 1n
 
 m0001
 !
 whichgoes to zero if m = 2
 k +2. It follows that M{n; k; p} is less than 2 k
 +2 with probability asymptotic to 1. This completes the proof.
 4. Analysis of The Fixed Ratio Model
 Ashas been discussed in the previous section, the uniform probability modelN
{n; k; p} of
 NK landscapes is asymptotically trivial. Part of the cause of this asymptotic
triviality lies in
 the fact that if the parameter p does not decrease very quickly with n,
then asymptotically
 there willbe at least one local fitness function that takes the value 0
for all the possible
 assignments, making the whole decision problem insoluble. In this section,
we study the
 fixedratio model N {n; k; z}. In this model, we require that each local
fitness function has
 fixed number of zero values so that the trivially insoluble situation in
the uniform probability model is avoided. We note that the same idea has
been used in the study of the flawless
 CSP {Gent et al., 1998}.
 Recall that in the fixed ratio model, we choose the neighborhood structure
for each local
 fitness in the same way as in the uniform probability modelN {n; k; p}.
To determine the
 fitness value for a local fitness function f
 i
 , we randomly without replacement select exactly
 z tuples fs
 1
 ; 001 001 001 ; s z
 g from f0; 1g
 k+1
 , and let f
 i
 {s
 j
 }= 0 for each 1024 j 024 z and f
 i
 {s} = 1 for
 every other s 2 f0; 1g
 k+1
 .
 For the fixed ratio model, we are interested in how the probability of an
instance of
 N {n; k; z} being soluble changes as the parameter z increases from 0 to
2 k+1
 . It is easy to see that the property \There exists an assignment x such
that f {x} =
 n
 P
 i=1
 f i
 {x
 i ; 005{x
 i }) = n"
 315Gao & Culberson is monotone in the parameter z  -  the number of tuples
at which a local fitness function takes
 zero. Actually, we have the following Lemma on the property of the solubilityprobability
 of the fixed ratio model: Lemma 4.1. For the fixed ratio model, if z 1
 > z
 2 , then
 P rfN {n; k; z
 1
 }is solubleg 024 P rfN {n; k; z 2
 }is solubleg:
 Furthermor e, we have P rfN {n; k; z}is solubleg =
 032
 1; if z0241;
 0; if z=2
 k+1
 :
 Based on the above Lemma and in parallel to the study of the threshold phenomena
in
 other random combinatorial structures such as 3-Coloring of random graphs
and random 3-SAT, we suggest the following conjecture:
 Conjecture 4.1. There exists a threshold z
 c
 such that lim
 n!1
 P rfN {n; k; z}is solubleg =
 032 1; if z z
 c
 :
 Conjectures like this are the starting point of the study of phase transition
in many
 random combinatorial structures such as 3-coloring of random graphs and
random SAT,
 but the existence of the thresholds is still an open question {Achlioptas,
1999; Cook &
 Mitchell, 1997}. However, bounding the thresholds has been an important
topic in the
 studyof phase transition {Achlioptas, 1999, 2001; Dubois, 2001; Franco &
Gelder, 1998; Franco & Paul, 1983; Frieze & Suen, 1996; Kirousis, P.Kranakis,
D.Krizanc, & Y.Stamation,
 1994}. In this section, we will establish two upper bounds on the threshold
of the parameter
 z
 c
 , and theoretically prove that random instances generated with the parameter
z above
 theseupper bounds can be solved with probability asymptotic to 1 by polynomial
{even
 linear} algorithms. Characterizing the sharpness of the thresholds is also
of great interest in the study of the phase transition. After proving that
every monotone graph property has a threshold behavior {Friedgut & Kalai,
1996}, Friedgut {1999} established a necessary and sufficient
 condition for a monotone graph property to have sharp threshold, which has
been used to
 provethe sharpness of the thresholds of 3-colorability and 3-SAT problems
{Friedgut, 1999; Achlioptas, 1999}. For the fixed ratio model discussed in
this paper, we suspect that it will
 exhibit a coarse threshold behavior, and would like to leave a detailed
investigation into
 this problem as a future research direction. 4.1 The Upper Bound of z =3:0
 The derivation of this upper bound is based on the concept of a conflicting
pair of local fitness functions. We say that two local fitness functions
f i
 and f
 j conflict with each other
 if
 1. f
 i
 and f
 j
 share at least one variable x; and
 316Phase Transition in NK Landscapes
 2. For any assignment s 2 f0; 1g n
 , we have f
 i
 {s}f
 j
 {s} = 0.
 It is obvious that an instance of the NK decision problem is insoluble if
there exists a pair
 of conflicting local fitness functions.
 Based on the second moment method in the theory of probability {Alon & Spencer,
 1992}, wecan prove the following upper bound result. As it takes linear
time to check if
 there is a pair of conflicting local fitness functions, we can see that
the fixed ratio model
 N {n; 2; z} is linearly solvable when z > 3:0.
 Theorem 4.1. Define A to be the event that there is a conflicting pair of
loc al fitness
 functions in N {n; 2; z}. For the fixed ratio model N {n; 2; z} with z =
3:0+ ", we have lim
 n
 P rfAg = 1
 and thus the problem is insoluble with prob ability asymptotic to 1.
 Proof: Without loss of generality, we may write f as where f
 i
 has 4 zeroes in its fitness
 value assignment for 1024 i 024 "n, and 3 zeroes for "n + 1 024 i 024
n. Let I
 ij
 be the indicator
 function of the event that f
 i
 and f
 j
 conflicts with each other, i.e.,
 I
 ij =
 032
 1; if f
 i
 and f
 j
 conflicts with each other;
 0; else: and S =
 P
 1024i;j024"n
 I
 ij
 . We claim that lim
 n!1 P rfS = 0g = 0.
 By Chebyschev's inequality, we have
 P rfS = 0g 024 P rfjS 000 E {S }j 025 E {S }g
 024
 V ar{S }{E {S }
 2 }
 :
 {4.3} Since for each 1 024 i 024 "n; f
 i
 has exactly 4 zeros in its fitness value assignment, we know
 that two local fitness function f
 i
 ; f
 j
 ; 1 024 i; j 024 "n, conflict with each other if and only if
 they have exactly one common variable x such that one of the following is
true: {1}f
 i
 {s} =
 0{or 1}; f j
 {s} = 1{or 0} for all the assignments s such that x = 1{respectively x =
0}; and
 {2}f
 i
 {s} = 1{or 0}; f
 j
 {s} = 0{or 1} for all the assignments s such that x = 1{respectively x =
 0};
 Since the probability that two local fitness functions share at least one
variable is equal
 to
 1 000
 000 n0002
 2
 001000
 n0004 2
 001000
 n0001
 2
 001000
 n0001
 2
 001
 ; we have
 P rfI
 ij
 = 1g =
  
 1 000 000
 n0002 2
 001000
 n0004
 2
 001000
 n0001 2
 001000
 n0001
 2
 001
 ! 001 2
  
 1000
 8 4
 001
 !
 2
 = 012{
 1n
 }; " > 0; 1 024 i; j 024 "n;
 {4.4} 317Gao & Culberson and hence,
 E {S } =
 X
 1024i;j024"n
 E {I
 ij
 } =
 X
 1024i;j024"n
 P rfI
 ij
 = 1g 2 012{n}:
 We now consider the variance of S . Since S =
 P
 1024i;j024"n
 I
 ij , we have
 V ar{S } =
 P
 i;j
 V ar{I ij
 } + 2
 P {i;j}6={l;m}
 [E fI
 ij
 I
 lm
 g 000 E fI
 ij gE fI
 lm g]{E {S })
 2
 :
 Let
 A
 1
 =
 P
 i;j V ar{I
 ij }{E {S })
 2
 and
 A
 2
 =
 2
 P
 {i;j}6={l;m} [E fI
 ij I
 lm
 g 000 E fI
 ij gE fI
 lm g]{E {S })
 2
 :
 It is easy to see that lim
 n!1
 A
 1
 = 0. To prove lim
 n!1
 A 2
 = 0, we consider two cases:
 Case 1: i 6= j 6= m 6= l. In this case, the two random variables I
 ij
 and I
 lm
 are actually independent. It follows that E fI
 ij
 I lm
 g 000 E fI
 ij
 gE fI
 lm
 g = 0.
 Case 2: {i; j } 6= {l; m}, but they have one in common, say j = l. In this
case, we have
 E fI
 ij I
 lm
 g 000 E fI
 ij
 gE fI
 lm
 g = P rfI
 ij = 1gP rfI jm
 = 1jI ij
 = 1g 000 012  
 022
 1n
  2
 !
 = 012 022
 1n
 
 P rfI
 jm
 = 1jI
 ij
 = 1g000 012
  
 022 1n
 
 2
 !
 Given that f
 i
 and f
 j
 conflict with each other, the conditional probability that f
 j
 and f m
 conflictwith each other is still in 012{
 1n
 }.
 Since there are only C
 3
 "n pairs of I
 ij
 and I
 jm
 satisfying the condition in Case 2, we know
 that
 P
 {i;j}6={l;m}
 [E fI
 ij
 I
 lm
 g 000 E fI
 ij
 gE fI
 lm
 g] is in 012{n}. And therefore, lim
 n!1
 A 2
 = 0: It follows that
 lim
 n!1 P rfS = 0g 024 lim
 n!1
 V ar{S }{E {S }
 2 }
 = 0:
 Since the event fS > 0g implies that there exists a conflicting pair of
local fitness functions,
 the theorem follows.4.2 2-SAT Sub-problems in N {n; 2; z} and a Tighter
Upper Bound
 In this subsection, we establish a tighter upper bound z >2:837 for the
threshold of the fixed
 ratio model N {n; 2; z} by showing that asymptotically N {n; 2; z} contains
an unsatisfiable
 2-SAT sub-problem with probability 1 for any value of z greater than 2.873.
This also gives us a polynomialtime algorithm which determines that N {n;
2; z} is insoluble with
 probability asymptotic to 1 for z >2:837.
 318Phase Transition in NK Landscapes
 Recallfrom Section 2 that each instance of N {n; 2; z} has an equivalent
3-SAT instance.
 The idea is to show that with probability asymptotic to 1, an instance of
N {n; 2; z} will
 contain a set of specially structured 3-clauses, called a t-3-module {Definition
10.3, Franco
 & Gelder, 1998}: M = fM
 1 ; : : : ; M
 3p+2
 g; t = 3p+ 2;
 where
 M 1
 = { 026u 1
 _ u
 2 _ z
 1
 ; 026u
 1
 _ u
 2
 _ 026z
 1
 };
 001 001 001
 M
 p0001
 = { 026u
 p0001
 _ u
 p
 _ z p0001
 ; 026u
 p0001
 _ u
 p
 _ 026z
 p0001 };
 M
 p = { 026u
 p _ 026u
 0 _ z
 p
 ; 026u
 p
 _ 026u
 0
 _ 026z
 p
 }; M
 p+1
 = { 026u
 p+1 _ u
 p+2
 _ z
 p+1
 ; 026u
 p+1
 _ u
 p+2
 _ 026z
 p+1 };
 001 001 001 M
 3p0001 = { 026u
 3p0001
 _ u 3p
 _ z
 3p0001
 ; 026u
 3p0001 _ u
 3p
 _ 026z
 3p0001
 };
 M 3p
 = { 026u
 3p
 _ u
 0
 _ z
 3p
 ; 026u 3p
 _ u
 0
 _ 026z 3p
 }
 M 3p+1
 = { 026u
 0
 _ u
 1
 _ z 3p+1
 ; 026u
 0
 _ u 1
 _ 026z 3p+1
 };
 M
 3p+2
 = {u
 0
 _ u p+1
 _ z
 3p+2
 ; u
 0
 _ u
 p+1
 _ 026z
 3p+2
 };
 and u
 1
 ; 001 001 001 ; u
 3p+1
 ; z
 1
 ; 001 001 001 ; z
 3p+1
 are binary variables. Notice that a t-3-module can be re-
 ducedto a 2-SAT problem containing two contradictory cycles and hence is
unsatisfiable.
 The result is proved in two steps. In the first step, it is shown that for
z > 2:837 the average number of t-3-modules contained in N {n; 2; z} tends
to infinityas n increases. In
 the second step, weuse a result established by Alon and Spencer {1992} on
the second
 moment method to prove that for z > 2:837 the probability that N {n; 2;
z} contains at least
 one t-3-module tends to 1.
 Let us start with the first step to show that the average number of t-3-modules
contained
 in N {n; 2; z} tends to infinity as n increases. Definition 4.1. Given a
t-3-module M and an NK landscap e instance f =
 n P
 i=1
 f i
 ; k =2, a
 sequenc e of loc al fitness functions
 g = {g
 1
 ; 001 001 001 ; g
 t
 } 032 {f
 1
 ; 001 001 001 ; f
 n
 }
 is said to be a possiblematch{PM} if for each 1 024 m 024 t, the main
variable of g
 m
 is one of the thre e variables that oc cur in the 3-module M m
 . A subsequenc e {h
 1
 ; 001 001 001 ; h
 l } of a possible
 match g is legal if for any 1 024 m < j 024 l; h
 m
 6= h j
 .
 Lemma 4.2. Let f {x} =
 n
 P
 i=1
 f
 i
 {x
 i
 ; 005{x
 i
 }) be an instance of N {n; 2; z} and M be a t-3-
 module. Then the number of possible matches for the t-3-module M is 3
 t
 . Further, the
 number of legal possible matches is 002 020
 {
 3+ p52
 }
 t 021
 .
 319Gao & Culberson Proof: For each 1 024 m 024 t, there are exactly 3
possiblechoices for g
 m
 :
 f
 i 1
 {x
 i 1
 ; 005{x i
 1
 }); f i
 2
 {x i
 2
 ; 005{x
 i
 2
 }); f
 i
 3
 {x
 i
 3
 ; 005{x
 i
 3 });
 where x i
 1
 ; x
 i
 2
 ; and x
 i
 3
 correspond to the three variables that occur in the 3-module M
 m
 . Therefore, there are 3
 t possible matches for the t-3-module. To prove the second conclusion, we
divide the t-3-module into 3 parts M = {M
 1
 ; M
 2
 ; M 3
 },
 where M 1
 = {M
 m ; 1 024 m 024 p}, M
 2
 = {M
 m
 ; p + 1 024 m 024 3p 000 1}, and M
 3 =
 {M
 3p ; M
 3p+1 ; M
 3p+2
 }. Letting L
 1 ; L
 2
 ; and L
 3
 be the number of legal possible matches for
 M 1
 ; M
 2 ; M
 3
 respectively. Since the literals in M
 1
 are variable-distinct from the literals in
 M
 2
 , we have that the number of legal possible matches, L, for the t-3-module
M satisfies L
 1
 L
 2
 024 L 024 27L
 1
 L
 2 :
 We now estimate the order of L
 1
 . To this end, we consider the probability space {012; P },
 where 012 is the set of sequences {g 1
 ; 001 001 001 ; g
 p
 } of local fitness functions that possiblymatch M
 1
 and P is the uniform probability distribution. Then, the number of legal
possible matches
 is
 L
 1 =j012j 001 P rfa random sample from 012 is legalg {4.5}
 Let g = {g
 1
 ; 001 001 001 ; g p
 } be a random sample from 012 and x
 g
 m denote the main variable of the local fitness function g
 m
 , then we have P rfx
 g m
 = ju
 m
 jg = P rfx
 g
 m
 = ju
 m+1
 jg = P rfx
 g m
 = jz
 m
 jg =
 13
 ;
 where juj denotes the variable corresponding to the literal u.
 Let B
 m ; 0 < m 024 p be the event that the first m local fitness functions g
1
 ; 001 001 001 ; g
 m
 in the possible match g = {g
 1
 ; 001 001 001 ; g
 p
 } are mutually distinct. Since in M
 1
 only consecutive
 3-modules share variables, we have
 B
 m
 = f{g
 1
 ; 001 001 001 ; g
 m
 } : g
 i 6= g
 i+1 ; 1 024 i 024 m 000 1g:
 Let b
 m
 = P rfg
 m
 6= g
 m0001
 j B
 m0001
 g; m 025 2, and b 1
 =1. Notice that B
 1
 = 012. Then, we have
 P rfg = {g
 1 ; 001 001 001 ; g p
 }is legalg = P rfB
 p
 g
 = P rfg 1
 6= g
 2
 ; g
 2 6= g
 3
 ; 001 001 001 ; g p0001
 6= g
 p
 g
 = P rfB
 1
 gP rfg
 2
 6= g
 1
 j B
 1
 g 001 P rfg
 3
 6= g
 2
 j B
 2
 g 001 001 001 Prfg
 p
 6= g
 p0001 j B
 p0001 g
 = b
 1 b
 2
 001 001 001 b
 p
 {4.6}
 Recalling that x
 g
 m
 denotes the main variable of the local fitness function g m
 , we have b
 p
 = P rfg
 p0001 6= g
 p
 ; x
 g
 p0001
 = ju
 p j j B
 p0001 g + P rfg p0001
 6= g
 p
 ; x g
 p0001
 6= ju
 p j j B
 p0001 g
 = P rfg
 p0001
 6= g
 p
 j B p0001
 ; x g
 p0001
 = ju
 p
 jg 001 P rfx
 g
 p0001
 = ju
 p
 j j B
 p0001
 g+
 P rfg
 p0001
 6= g p
 j B
 p0001
 ; x
 g p0001
 6= ju
 p
 jg001 P rfx
 g
 p0001
 6= ju
 p
 j j B p0001
 g
 =
 23
 a
 p
 + {1 000 a
 p
 }
 = 1 000
 13
 a
 p
 ; {4.7}
 320Phase Transition in NK Landscapes
 where a
 p
 = P rfx
 g
 p0001
 = ju p
 j j B
 p0001
 g. For a
 p
 , we have a
 p
 =
 P rfB
 p0001 ; x
 g
 p0001
 = ju
 p jgP rfB
 p0001
 g
 =
 1P rfB
 p0001
 g
 {P rfB
 p0001
 ; x g
 p0001
 = ju
 p
 j; x
 g
 p0002
 = ju
 p0001
 jg
 +P rfB
 p0001 ; x
 g
 p0001
 = ju
 p j; x
 g
 p0002
 6= ju p0001
 jg} =
 1P rfB
 p0001 g
 {P rfx g
 p0001
 = ju
 p
 j j B
 p0001 ; x
 g
 p0002
 = ju p0001
 jg001 P rfB
 p0001
 ; x
 g
 p0002
 = ju
 p0001
 jg + P rfx
 g
 p0001
 = ju
 p
 j j B
 p0001
 ; x
 g
 p0002 6= ju
 p0001
 jg001 P rfB
 p0001
 ; x
 g
 p0002
 6= ju
 p0001
 jg} =
 1P rfB
 p0001 g
 022
 12
 P rfB
 p0001
 ; x g
 p0002
 = ju
 p0001
 jg +
 13
 P rfB p0001
 ; x g
 p0002
 6= ju
 p0001
 jg
 
 {4.8}
 The last equation in the above formula is because that given B
 p0001
 and x
 g
 p0002 = ju
 p0001
 j
 {or x g
 p0002
 6= ju
 p0001
 j}, we have two {three, respectively} choices in selecting the local fitness
 function g p0001
 . Consider the two terms P rfB p0001
 ; x g
 p0002
 = ju
 p0001
 jg and P rfB
 p0001
 ; x g
 p0002
 6=
 ju
 p0001
 jg in {4.8}, we have
 P rfB
 p0001
 ; x
 g p0002
 = ju
 p0001
 jg = P rfg
 p0002
 6= g p0001
 j B p0002
 ; x g
 p0002
 = ju
 p0001
 jg001 P rfB
 p0002
 ; x g
 p0002
 = ju
 p0001
 jg
 =
 23
 Prfx g
 p0002
 = ju
 p0001 j j B
 p0002 g 001 P rfB p0002
 g
 =
 23
 a
 p0001
 001 P rfB
 p0002
 g
 {4.9}
 and
 P rfB
 p0001
 ; x
 g p0002
 6= ju
 p0001 jg
 = P rfg p0002
 6= g
 p0001
 j B
 p0002
 ; x
 g
 p0002
 6= ju
 p0001
 jg001 P rfB
 p0002 ; x
 g
 p0002
 6= ju
 p0001
 jg
 = P rfx
 g p0002
 6= ju
 p0001 j j B
 p0002 g 001 P rfB p0002
 g
 = {1000 a
 p0001
 } 001 P rfB
 p0002
 g {4.10}
 By plugging {4.9} and{4.10} into {4.8}, we get a
 p
 =
 P rfB
 p0002
 gP rfB
 p0001 g
 022
 13
 a
 p0001
 +
 13
 {1 000 a
 p0001
 } 
 =
 13b
 p0001
 :
 This, together with {4.7}, gives us
 b
 p
 = 1000 19b
 p0001
 : {4.11} It is not difficult to show that the sequence fb
 p
 g is decreasing and lower bounded by 0.
 Letting lim
 p
 b
 p
 = b and taking the limit on both sides, we get
 b = 1000
 19b
 ; {4.12}
 321Gao & Culberson and thus, b =
 3006
 p56
 . In our case, b =
 3+
 p56 since b
 1
 = 1. It follows that b
 p
 025 b =
 3+ p56
 and
 thus, b
 1
 001 001 001 b
 p
 025
   3 +
 p56
 ! p
 :
 From {4.5}, we know that the number of legal possible matches is greater
than 3
 p
  
 3 +
 p56
 ! p
 =
  
 3 +
 p52
 ! p
 : {4.13}
 To prove that the expected number of legal possible matches L 1
 for M
 1 is in 002
 020020 3+
 p52
 021 p
 021
 , let ff
 p
 = b
 p
 000
 3+
 p56
 = b p
 000 b. From {4.11} and{4.12}, we have ff
 p
 = b
 p
 000 b =
 b
 p0001
 000 b9bb
 p0001
 024 dff
 p0001
 ; 0 < d < 1;
 which means that the series p
 P
 m=1 ff
 m
 is convergent. It follows that
 {1 +
 ff
 1b
 } 001 001 001 {1 +
 ff
 pb
 }
 converges to a finite positive constant c. Therefore,
 b
 1
 001 001 001 b p
 = {b + ff
 1
 } 001 001 001 {b + ff
 p
 }
 = b
 p 020
 1+
 ff 1b
 021
 001 001 001
 020
 1 +
 ff pb
 021
 024 c
   3+
 p56
 !
 p
 {4.14}
 for sufficient large p and some constant c.
 Similarly, we can show that the number of legal possible matches L
 2
 for M 2
 is in
 002 022
 020
 3+
 p52
 021
 2p+2
 
 . Recalling that the number of legal possible matches L for the t-3-module
 satisfies L
 1
 L
 2 024 L 024 27L 1
 L
 2
 , the second conclusion follows.The following Lemma calculates the probability
that a matching local fitness function
 implies the matched 3-module. Lemma 4.3. Given a 3-module x _ y _w; x _
y _ 026w, and a loc al fitness function g such
 that the main variable x
 g
 of g is one of the thre e Boole an variables jxj; jyj; jwj, let z = 2 +
ff; 0 024 ff 024 1 be the par ameter in the fixed ratio model N {n; 2;
z}. Then the prob ability that g contains the 3-module is
 p
 0 =
  
 1000
 n0001
 2
 001
 ! 022
 128
 {1 000 ff} + 656
 ff 
 {4.15}
 322Phase Transition in NK Landscapes
 Proof: Since x g
 is already one of the variables in the 3-module, the probabilitythat
 the other two variables are also in the 3-module is
 1{
 n0001
 2
 }
 .
 Now, assume that the variables of the local fitness function g are the same
as the
 variables in the 3-module. From the definition of the fixed ratio model,
g has two zeros
 inits fitness value assignment with probability {1 000 ff}, and has three
zeros in its fitness
 assignment with probability ff. Note that the local fitness function g implies
the 3-module x _ y _ w; x _ y _ 026w ifand only if
 g{ 026x; 026y; 026w} = 0 and g{ 026x; 026y; w} = 0:
 From the definition of the fixed ratio model, this happens with the probability
 1000
 8
 2
 001
 {1 000 ff} +
 6000 8
 3
 001
 ff
 The Lemma follows.With the above preparation, we can now prove that the
average number of t-3-modules
 contained in N {n; 2; z} tends to infinity.
 Theorem 4.2. Let A
 t be the number of t-3-modules containe d in N {n; 2; z} and t = 002{ln
 2
 n}.
 Then, if z = 2 + ff > 2:837, lim
 n!1
 E fA
 t
 g = 1: {4.16}
 Proof: From Lemma 4.2, there are more than {
 3+
 p52 }
 t
 legal possible matches for a fixed
 t-3-module. From Lemma 4.3, we know that each possible legal match g = fg
 1
 ; 001 001 001 ; g
 t
 g
 implies the t-3-module with probability p
 t
 0
 . From the proof of Theorem 10.1 in {Franco &
 Gelder, 1998}, there are 2
 t0002
 n
 t0001{n 000 t + 1}
 t
 {4.17} possible t-3-modules, where n t0001=
 n!{n000t+1}!
 . Let r =
 000 128
 {1 000 ff} +
 656
 ff 001
 , and write p 0
 =
 1{
 n0001
 2
 }
 r. We have
 E fA
 t
 g =  
 3+
 p52 p
 0
 !
 t
 001 2
 t0002
 n
 t0001{n 000 t + 1}
 t
 =  
 3 +
 p52 r
 !
 t
 001 2
 t0002 n
 t0001{n 000 t + 1}
 t
 001 1000 n0001
 2
 001
 t
 =
 14{n 000 t + 1}
  
 3 + p52
 r
 ! t
 001
 2 t
 n
 t{n 000 t + 1}
 t000
 n
 2 001
 t
   000
 n
 2
 001000
 n0001
 2
 001 !
 t
 =
 14{n 000 t + 1}
  
 3 + p52
 r
 ! t
 001
 4 t
 n
 t{n 000 t + 1}
 t{n{n 000 1}) t
 022
 nn 000 2
 
 t
 =
 14n
 {2{3 +
 p5}r}
 t
 {1 000 O
 022
 t
 2n
 
 }; {4.18}
 323Gao & Culberson where the fourth equation in {4.18} is due to the fact
that for any positive integer n and q
 such that q < n2
 , we have n
 q e
 000q
 2 =2n
 024 n q024 n q
 . It follows that lim
 n!1
 E fA
 t
 g = 1 if
 2{3 +
 p5}r > 1: {4.19}
 Solving the inequality {4.19} gives us ff > 0:837, that is, z = 2 + ff >
2:837. This proves Theorem 4.2.Based on the Chebychev's inequality, to prove
that N {n; 2; z} contains t-3-modules with
 probability 1, we need to show that the variance of A
 t
 , the number of contained t-3-modules, is o{E fA
 t
 g}. For this purpose, we follow Franco and Gelder's approach {Lemma 4.1,
Franco
 & Gelder, 1998} to apply the second moment method {Alon & Spencer, 1992}:
 Lemma 4.4. {Alon & Spenc er, 1992, Ch. 4.3 Cor 3.5} Given a random structure{e.g.,
a
 random CNF formula}, let W be the set of substructures under consider ation,
A{w} be the
 set of substructures sharing some clauses with w 2 W . Let I
 w
 =1 when w is in the random
 structur e and 0 otherwise. If
 {1} elements of W are symmetric;
 {2} 026 = E f
 P w2W
 I
 w
 g ! 1; and {3}
 P
 026w2A{w}
 P r{ 026w j w} = o{026}, for each w 2 W ,
 then as n ! 1, the prob ability that the random structure contains a substructure
tends to
 1.
 To use the above Lemma to study the 2-SAT sub-problem in NK landscapes,
we view the random structure to be a random instance of N {n; 2; z}, andW
to be the set of all
 t-3-modules which are symmetric by their definitions{Sections 5 and 10,
Franco & Gelder,
 1998}.
 Theorem 4.3. If z=2+ ff > 2:837, then N {n; 2; z} is asymptotical ly insoluble
with prob-
 ability 1.
 Proof: Let A
 t
 be the number of t-3-modules implied by N {n; 2; z} and t = O{ln
 2 n}.
 Theorem 4.2 shows that lim
 n!1
 E fA
 t
 g = 1. By Lemma 4.4, it is enough to show that for
 eachw 2 W ,
 X
 026w2A{w}
 P r{ 026w j w} = o{E fA
 t g}; {4.20}
 where P r{ 026w j w} is the conditional probability that N {n; 2; z} implies
the t-3-module 026w given that it implies w, and A{w} is the set of all
t-3-modules sharing some clauses with w.
 Suppose that 026w shares Q; 1 024 Q 024 2t clauses with w, and that these
Q clauses are distributed among q 3-modules. Further, let q
 1 be the number of 3-modules whose two
 clauses are both shared and q
 2
 = q 000 q
 1
 the number of 3-modules that only has one clause shared.
 Let T
 1 be a 3-module in 026w that shares exactly one clause with a 3-module
T
 2
 in w.
 We claim that the conditional probability that T
 1
 is implied by N {n; 2; z} given that w is
 324Phase Transition in NK Landscapes
 impliedby N {n; 2; z}, is
 16
 ff + O{
 1n
 }: {4.21} Without loss of generality, assume that T
 2
 = fx_y _u; x_y _ 026ug and T
 1
 = fx_y _u; 026x_y _ 026ug. Since w is implied by N {n; 2; z}, there is
a local fitness function g =g{jxj; jyj; juj} that implies T
 2
 . The conditional probability that T
 1
 is implied, is less than or equal to P
 1
 + P 2
 where
 P
 1
 is the conditional probability that g alsoimplies the clause 026x _ y _
026u given that g implies
 T 2
 , and P
 2 is the conditional probability that the clause 026x _ y _ 026u is implied
by other local fitness functions. By the definition of N {n; 2; z}, wehave
that P
 1 =
 16
 ff. Since a local fitness
 function implies 026x _ y _ 026u only if it has the same variables with
g =g{jxj; jyj; juj}, we have
 that P 2
 = O{
 1n
 }. The claim is proved. It follows that, for sufficiently large n,
 P rf 026w j wg 024 c
  
 3+
 p52
 p 0
 !
 t000q
 001 1
 q 1
 001
 022 16
 ff
 
 q
 2
 {4.22}
 where p
 0
 is defined in Lemma 4.3 and c is a fixed constant. Let A
 Q;q;q
 2 {w} be the set of t-3-modules that share Q clauses with w such that these
Q
 clauses are distributed over q different 3-modules. As before, q
 1
 is the number of 3-modules
 whose two clauses are both shared and q
 2
 = q 000q
 1
 the number of 3-modules that only has one clause shared. We claim that jA
 Q;q;q
 2
 {w}j = jA
 2q;q;0 {w}j6
 q
 2
 : {4.23} where A
 2q;q;0
 {w} is the set of t-3-modules that share all the 2q clauses in the q 3-modules
 with w. LetM = fM
 1
 ; 001 001 001 ;M
 t
 g be a t-3-module in which all the clausesM
 i
 ; 1 024 i 024 q
 are shared with w. Let M = fM
 1 ; 001 001 001 ; M t
 g be a t-3-module in which all the clauses in
 M i
 ; 1 024 i 024 q
 1
 are shared and each of the 3-modules M
 i
 ; q
 1 + 1 024 q
 1 + q
 2
 has only one
 clause shared. Since for each of the q
 2 3-modules, we have 6 ways to choose the non-shared
 clauses, there are 6 q
 2
 such t-3-modules M in A
 Q;q;q 2
 {w} that correspond to one t-3-moduleM in A
 2q;q;0
 . The claims follow. From formula {55} and {56} in {Franco & Gelder, 1998}
 and {4.23}, it follows that
 jA
 Q;q;q
 2
 {w}j <
 {
 O{t}n
 2
 2
 t000q
 n
 2{t000q}
 6
 q
 2
 ; q 024 p + 1;
 O{1}n
 2
 t000q
 n
 2{t000q} 6
 q
 2
 ; q > p + 1:
 {4.24}
 Let r =
 000
 128
 {1 000 ff} + 656
 ff
 001
 , and write p
 0
 =
 1{
 n0001
 2
 }
 r. Then, we have jA
 Q;q;q
 2
 {w}jP rf 026w j wg
 024
 O{t}n
 2
 2
 t000q
 n
 2{t000q}
 6
 q
 2
 {
 3 +
 p52
 p
 0 }
 t000q
 {
 16 ff}
 q
 2 024
 O{t}n
 2 2
 t000q
 n
 2{t000q}  
 3 +
 p52 r
 !
 t000q
 1000
 n0001
 2 001
 t000q 024
 O{t}n
 14n
 {2{3 + p5}r}) t000q
 024 O{t}n
 E fA t
 g{2{3 + p5}r}) 000q
 ; q 024 p + 3
 {4.25}
 325Gao & Culberson and
 jA
 Q;q;q 2
 {w}jP rf 026w j wg 024 O{1}E fA t
 g{2{3 +
 p5}r}) 000q
 ; q > p + 3: {4.26}
 Therefore, X
 026w2A{w}
 P r{ 026w j w} =
 X
 Q;q;q
 2
 jA
 Q;q;q
 2 {w}jP rf 026w j wg
 =
 t
 X
 Q=1
 X
 q024p+3
 X
 q
 2 O{t}n
 E fA
 t
 g{2{3 +
 p5}r}) 000q
 +
 t X
 Q=1
 X q>p+3
 X
 q
 2
 O{1}E fA
 t
 g{2{3 +
 p5}r})
 000q
 : {4.27}
 Since 2{3 + p5}r} > 1 for z > 2:837, we have
 X
 026w2A{w}
 P r{ 026w j w} 024
 O{t 4
 }n
 E fA
 t
 g + t
 3 E fA
 t
 g{4r}
 000{p+3}
 = o{E fA
 t
 g}:
 {4.28}
 This completes the proof of Theorem 4.3.5. Experiments
 Our study of the threshold phenomena in NK landscapes started with an experimental
investigation. Many of the theoretical results in the previous section are
motivated by
 the observations made in our experiments. In this section, we describe the
approach and
 methods we used in the experimental study, and report the results and observations
we
 have made. In our experiments, an instance of the NK landscape decision
problem is converted to an
 equivalent 3-SAT problem, and then the 3-SAT problem is solved using Roberto's
relsat|an
 enhanced version of the famous Davis-Putnam algorithm for SAT problems implemented
in
 C ++
 . The source code of relsat can be found at http://www.cs.ubc.ca/ hoos/SATLIB/.
 In the experiments, we generated random instances of the NK landscape decision
prob-
 lem from the random model N {n; 2; z}. As a result, the equivalent SAT problem
for each
 random NK landscape instance is a 3-SAT problem with n variables and {on
average} zn
 clauses. By definition, the parameter z is between 0 and 8. For z 024 1,
the 3-SAT instance
 can be solved easily by setting the literals that correspond to the main
variables of the
 local fitness function to true. As z increases,we get more and more clauses
and the 3-SAT
 problem becomes more and more constrained. The aims of the experiments are
three-
 fold:{1}Investigating if there exists a threshold phenomenon in the random
NK landscape
 model;{2} Locating the threshold of the parameter z; and {3}Determining
if there are any
 hard instances around the threshold.
 5.1 Experiments on the Fixed Ratio Model
 In this part of the experiments, we generate 100 random instances of N {n;
2; z} for each of the parameters n = 2
 9
 001 001 001 2 16
 and z =2:71; 2:72; 001 001 001 ; 3:00. These instances are then
 326Phase Transition in NK Landscapes
2.72.752.82.852.92.95300.10.20.30.40.50.60.70.80.91n=512  n=1024 n=2048 n=4096
n=8192 n=16384n=32768n=65536z=2.84 Figure 1: Fractions of insolubleinstances{Y-axis}
as a function of z {X-axis}.
 convertedto 3-SAT instances and solved by relsat. Figure 1 shows the fraction
of insoluble
 instances as a function of the parameter z. It can be seen that there exists
a threshold
 phenomenon and the threshold is around 2.83. This shows that our upper bound
z = 2:837
 is very tight.
 In Figure 2, we plot the square root of the average search cost as a function
of the
 parametern. The figure indicates that the average search is in O{n 2
 } for any parameter z. We have also observed that more than 99 percent of
the insoluble instances are solved
 quickly in the preprocessing stage of relsat. This indicates that there
must be some \small"
 structures that make the instances insoluble. More detailed experimental
results can be
 found in Gao's thesis {Gao, 2001}.
 5.2 Experiments on the 2-SAT sub-Problem
 This is the part of the experiments that motivated our theoretical analyses
in Section 4.2.
 The idea can be explained as follows. Let
 f {x} =
 n
 X
 i=1
 f
 i
 {x
 i
 ; 005{x
 i
 })
 be an instance of the decision problem of NK landscape and
 ' = C
 1 ^
 C
 2
 001 001 001
 ^
 C
 n
 the equivalent 3-SAT problem where C i
 is the set of 3-clauses equivalent to the local fitness
 function f
 i
 . For each i, there is a set of 2-clauses D
 i
 {possibly empty} implied by C
 i . For
 327Gao & Culberson01234567x 10405101520253035z=2.71z=2.80z=2.84z=2.86z=2.90Figure
2: Square root of the average search cost {Y-axis, in seconds} as a function
of n {X-axis}.
2.72.752.82.852.92.95300.10.20.30.40.50.60.70.80.91n=512  n=1024 n=2048 n=4096
n=8192 n=16384n=32768n=65536z=2.84 Figure 3: Fractions of insoluble instances{Y-axis}
as a function of z {X-axis}for 2-SAT
 sub-problems. example, if C
 i
 has three 3-clauses {(x; y; z}; {x; 026y ; z}; {x; y; 026z }), then
the set of 2-clauses D
 i
 would be {(x; z}; {x; y}). The conjunction of D
 i
 , denoted by 026', is a 2-SAT problem. It is
 328Phase Transition in NK Landscapes
01234567x 1040510152025303540z=2.71z=2.80z=2.84z=2.86z=2.90Figure 4: Square
root of the average search cost {Y-axis, in seconds} as a function of n {X-axis}
for 2-SAT sub-problems. obviousthat the original 3-SAT problem ' is satisfiable
only if the 2-SAT sub-problem 026' is satisfiable. In the experiment, we
generate instances of the NK landscape N {n; 2; z}, convert
 them to the equivalent 3-SAT problems, and extract the 2-SAT sub-problems.
These 2-SAT
 problems are then solved by the relsat solver. If the 2-SAT problem is unsatisfiable,
then
 the original NK landscape instance is also insoluble. The experimental settings
are the same as those in the experiment on the original problem. The results
are shown in Figures 3-4, inparallel to the Figures 1-2 of the results
 on the original 3-SAT problems in Section 5.1. We see that the patterns
of insoluble fractions
 and search cost are similar to those we found in the original 3-SAT problems.
There is a
 soluble-insoluble phase transition occurring around 2.83, but the fraction
of unsatisfiable instances is lower than the fraction in the original 3-SAT
problems.
 We also observed that the average search cost for the 2-SAT sub-problemsremains
the
 same as that for the original 3-SAT problems. This tells us that the difficulty
of solving a
 soluble instance of NK landscape is almost the same as that of solving a
2-SAT problem, and hence is easy. Therefore, on average the NK landscape
N {n; 2; z} is also easy at parameters
 below the threshold where almost all of the instances are soluble.
 6. Implications and Conclusions One of the questions that arises about this
work is its implications to the design and anal-
 ysis of genetic algorithms. NK landscapes were initiallyconceived as simplified
models
 ofevolutionary landscapes which could be tuned with respect to ruggedness
and epistatic
 interactions {Kauffman, 1989}. In the study of genetic algorithms, NK landscape
models
 329Gao & Culberson have been used as a prototype and benchmark in the analysis
of the performance of dif-
 ferent genetic operators and the effects of different encoding methods on
the algorithm's
 performance {Altenberg, 1997; Hordijk, 1997; Jones, 1995}. Kauffman {1993}
points out
 that the parameters that primarilyaffect a number of ruggedness measures
are n and k.
 Nevertheless, the fact that for k 025 2 the discrete NK landscape is NP-complete
{Wright
 et al., 2000} when the neighbors are arbitrarily chosen could be construed
as implyingthat random landscapes with fixed k are in practice hard.
 The results in this paper should serve as a cautionary note that this may
not be the
 case. Our analyses show that for fixed k theuniform probability model is
trivially solvable
 as the problem size tends to infinity. For the fixed ratio model, we have
derived two upper bounds for the threshold of the solubility phase transition,
and proved that the problem with the control parameter above the upper bounds
can be solved in polynomial time with
 probability asymptotic to 1 due to the existence of easy sub-problems such
as 2-SAT. A
 series of experiments has also been conducted to investigate the hardness
of the problem
 with the control parameters around and below the threshold. From the experiments,
we
 have observed that the problem is also easy around and below the threshold.
Ourproofs hold only for the decision version of the problem where the component
func-
 tions are discrete on f0; 1g. The proofs are obtained by noticing that the
clustering of functions, or clauses, on selected subsets of variables implies
that the overall problem is decomposable into independent subproblems, or
that the problem contains small substruc- tures that identify the solution.
The subproblems are the components of the connection
 graph defined in Section 3 and the 2-SAT sub-problems studied in Section
4.2. It is cur-
 rently unclear to us to what extent our analysis can be extended to the
optimization version
 of the NK model, and we would like to study this problem further in the
future.
 In response to the question `what are the implicationsfor GAs?' we suggest
the following
 speculative line of enquiry. For the discrete model we use, the soluble
instances are readily
 solved by a standard algorithmic approach based on recognizing the components
of the connection graph. {This should not be a surprise for us as it has
been pointed out by
 Heckendorn, Rana, and Whitley {1999} that `Even relatively old algorithms
such as Davis-
 Putnam which are deterministic and exact are orders of magnitude faster
than GAs'.}
 1
 A
 similar connectivity can be developed for real valued distributions, for
example by capping the minimum value which we allow a sub-functionto take.
We can speculate that the
 clustering imposed by fixed values of k wouldalso generate localized structures
when real
 values are applied and when considering optimization instead of decision,
but perhaps with
 fuzzy boundaries. In fact, this observation is just the flip side of limited
epistasis. Genetic algorithms, or their variants such as the probabilisticmodel-building
algorithms {Larranaga
 & Lozano, 2001}, designed to mimic natural evolution, are supposed to take
advantage of
 this situation. So, to the extent that NK landscapes are an accurate reflection
of the
 features exploited by evolutionary algorithms, we pose the following question.
Is it possible to identify these fuzzy components if they exist, and in doing
so design an algorithm that
 exploits the same landscape features that the evolutionary algorithms do,
but far more efficiently, as we have done for the uniform discrete decision
problem?1. We thanks an anonymous referee for pointing out to us the work
of Heckendorn, et al. {Heckendorn
 et al., 1999}
 330Phase Transition in NK Landscapes
 These landscapes were designed with the intent of studying limited interactions,
and our
 results can also be seen as a confirmation that indeed limited epistasis
leads to easier prob-
 lems. In another domain, that of the more traditional research into search
and optimization,
 there is a need for test bed problems with real world connections which
are tunable with
 respect to difficulty. NK landscapes might have been such a domain for generating
3-SAT instances. It is disappointing that for restricted k theinstances generated
are easy with
 high probability. Acknowledgments
 This research supported in part by Natural Sciences and Engineering Research
Council
 Grant No. OGP8053. We thank the anonymous reviewers for their comments.
 References
 Achlioptas, D. {1999}. Threshold Phenomena in Random Graph Colouring and
Satisfiabil- ity. Ph.D. thesis, Department of Computer Science, University
of Toronto, Toronton,
 Canada.
 Achlioptas, D. {2001}. A survey of lower bounds for random 3-sat via differential
equations. Theor etic al Computer Science, 265, 159{185.
 Achlioptas, D., Kirousis, L., Kranakis, E., Krizanc, D., & Molloy, M. {1997}.
Random
 constraint satisfaction: A more accurate picture. In Pro c e e dings of
CP97, pp. 107{
 120.Springer.
 Alon, N., & Spencer, J. {1992}. The Prob abilistic Method. Wiley, New York.
 Altenberg, L. {1997}. Nk fitness landscapes. In Back, T., Fogel, D., & Michalewicz,
Z.
 {Eds.}, Handbo ok of Evolutionary Computation. Oxford University Press,
New York.
 Cheeseman, P., Kanefsky, B., & Taylor, W. {1991}. Where the re al ly hard
problems are. In
 Pro c e e dings of the 12th International Joint Conferenc e on Artificial
Intel ligence, pp.
 331{337. Morgan Kaufmann. Cook, S., & Mitchell, D. {1997}. Finding hard
instances of the satisfiability problem: A
 survey. In Du, Gu, & Pardalos {Eds.}, Satisfiability Problem: Theory and
Applications,
 V ol. 35 of DIMACS Series in Discrete Mathematics and Theor etic al Computer
Science. American Mathematical Society.
 Culberson, J., & Gent, I. {2001}. Frozen development in graph coloring.
Theor etic al Com-
 puter Science, 265 {1-2}, 227{264. Dubois, O. {2001}. Upper bounds on the
satisfiability threshold. Theor etic al Computer
 Science, 265 {1-2}, 187{197. Franco, J., & Gelder, A. {1998}. A perspective
on certain polynomial time solvable classes
 of satisfiability. Discrete Applied Mathematics, to appe ar.
 F ranco, J., & Paul, M. {1983}. Probabilistic analysis of the davis-putnam
procedure for solving satisfiability. Discrete Applied Mathematics, 5, 77{87.
 331Gao & Culberson Freeman, J. {1996}. Hard random 3-sat problems and the
davis-putman procedure. Artificial
 Intel ligence, 81, 183{198.
 Friedgut, E. {1999}. Sharp thresholds of graph properties and the k-sat
problem. J. Amer.
 Math. Soc., 1017{1054. Friedgut, E., & Kalai, G. {1996}. Every monotone
graph property has a sharp threshold.
 Pro c. Amer. Math. Soc., 2993{3002.
 Frieze, A., & Suen, S. {1996}. Analysis of two simple heuristics on a random
instance of k-sat. J. of Algorithm, 20 {2}, 312{355.
 Gao, Y. {2001}. Threshold phenomena in NK landscapes. Master's thesis, Department
of
 Computing Science, University of Alberta, Edmonton, Alberta, Canada.
 Gent, I., MacIntyre, I., Prosser, P.and Smith, B., & Walsh, T. {1998}. Random
constraint satisfaction: Flaws and structure. Tech. rep. APES-08-1998, APES
Research Group.
 Heckendorn, R., Rana, S., & Whitley, D. {1999}. Polynomial time summary
statistics for a
 generalization of maxsat. In GECCO99: Pro c e e dings of the Genetic and
Evolutinary
 Computation Conferenc e, pp. 281{288. Morgan Kaufmann.
 Hordijk, W. {1997}. A measure of landscapes. Evolutionary Computation, 4
{4}, 335{360. Jones, T. {1995}. Evolutionary Algorithms, Fitness Landsc ap
es and Sear ch. Ph.D. thesis, University of New Mexico, Albuquerque, NM.
 Kauffman, S. {1989}. Adaptation on rugged fitness landscapes. In Stein,
D. {Ed.}, Le ctur es in the Sciences of Complexity, Santa Fe Institute Studies
in the Sciences of Complexity,
 pp. 527{618. Addison Wesley .
 Kauffman, S. {1993}. The Origins of Order: Self-organization and Selection
in Evolution.
 Oxford University Press, Inc.
 Kirkpatrick, S., & Selman, B. {1994}. Critical behavior in the satisfiabilityof
random boolean expressions. Science, 264, 1297{1301.
 Kirousis, L., P.Kranakis, D.Krizanc, & Y.Stamation {1994}. Approximating
the unsatis-
 fiability threshold of random formulas. Random Structures and Algorithms,
12 {3},
 253{269.
 Larranaga, P., & Lozano, J. {2001}. Estimation of Distribution Algorithms:
A New To ol for
 Evolutinary Computation. Kluwer Academic Publishers,New York.
 Mitchell, D., Selman, B., & Levesque, H. {1992}. Hard and easy distributions
of sat problems.
 In Pro c e e dings of the 10th Natl. Conf on Artificial Intel ligence, pp.
459{465. AAAI
 Press.
 Vandegriend, B., & Culberson, J. {1998}. The G
 n;m
 phase transition is not hard for the
 Hamiltonian Cycle problem. Journal of Artificial Intel ligence Rese ar ch,
9, 219{245. Weinberger, E. D. {1996}. Np completeness of kauffman's NK model,
a tunable rugged
 fitness landscape. Tech. rep. 96-02-003, Santa Fe Institute, Santa Fe.
 W right, A. H., Thompson, R. K., & Zhang, J. {2000}. The computational complexity
of NK fitness functions. IEEE Tr ansactions on Evolutionary Computation,
4 {4}, 373{379.
 332