Journal of Artificial Intelligence Research 9 (1998) 1-36. Submitted 1/98; published 8/98.

The Computational Complexity of Probabilistic Planning

Michael L. Littman, Department of Computer Science, Duke University, Durham, NC 27708-0129 USA
Judy Goldsmith, Department of Computer Science, University of Kentucky, Lexington, KY 40506-0046 USA
Martin Mundhenk, FB4 - Theoretische Informatik, Universität Trier, D-54286 Trier, GERMANY

Abstract

We examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations. The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, looping plans, and partially ordered plans under three natural definitions of plan value. We show that problems of interest are complete for a variety of complexity classes: PL, P, NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. In the process of proving that certain planning problems are complete for NP^PP, we introduce a new basic NP^PP-complete problem, E-Majsat, which generalizes the standard Boolean satisfiability problem to computations involving probabilistic quantities; our results suggest that the development of good heuristics for E-Majsat could be important for the creation of efficient algorithms for a wide variety of problems.

1. Introduction

Recent work in artificial-intelligence planning has addressed the problem of finding effective plans in domains in which operators have probabilistic effects (Drummond & Bresina, 1990; Mansell, 1993; Draper, Hanks, & Weld, 1994; Koenig & Simmons, 1994; Goldman & Boddy, 1994; Kushmerick, Hanks, & Weld, 1995; Boutilier, Dearden, & Goldszmidt, 1995; Dearden & Boutilier, 1997; Kaelbling, Littman, & Cassandra, 1998; Boutilier, Dean, & Hanks, 1998).
Here, an "effective" or "successful" plan is one that reaches a goal state with sufficient probability. In probabilistic propositional planning, operators are specified in a Bayes network or an extended STRIPS-like notation, and the planner seeks a recipe for choosing operators to achieve a goal configuration with some user-specified probability. This problem is closely related to that of solving a Markov decision process (Puterman, 1994) when it is expressed in a compact representation. In previous work (Goldsmith, Lusena, & Mundhenk, 1996; Littman, 1997a), we examined the complexity of determining whether an effective plan exists for completely observable domains; the problem is EXP-complete in its general form and PSPACE-complete when limited to polynomial-depth plans. (A polynomial-depth, or polynomial-horizon, plan is one that takes at most a polynomial number of actions before terminating.) For these results, plans are permitted to be arbitrarily large objects; there is no restriction that a valid plan need have any sort of compact (polynomial-size) representation.

©1998 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Because they place no restrictions on the size of valid plans, these earlier results are not directly applicable to the problem of finding valid plans. It is possible, for example, that for a given planning domain, the only valid plans require exponential space (and exponential time) to write down. Knowing whether or not such plans exist is simply not very important, because such plans are intractable to express. In the present paper, we consider the complexity of a more practical and realistic problem: that of determining whether or not a plan exists in a given restricted form and of a given restricted size.
The plans we consider take several possible forms that have been used in previous planning work: totally ordered plans, partially ordered plans, (totally ordered) conditional plans, and (totally ordered) looping plans. In all cases, we limit our attention to plans that can be expressed in size bounded by a polynomial in the size of the specification of the problem. This way, once we determine that a plan exists, we can use this information to try to write it down in a reasonable amount of time and space.

In the deterministic planning literature, several authors have addressed the computational complexity of determining whether a valid plan exists, of determining whether a plan exists of a given cost, and of finding the valid plans themselves under a variety of assumptions (Chapman, 1987; Bylander, 1994; Erol, Nau, & Subrahmanian, 1995; Bäckström, 1995; Bäckström & Nebel, 1995). These results provide lower bounds (hardness results) for analogous probabilistic planning problems, since deterministic planning is a special case. In deterministic planning, optimal plans can be represented by a simple sequence of operators (a totally ordered plan). In probabilistic planning, a good conditional plan will often perform better than any totally ordered (unconditional) plan; therefore, we need to consider the complexity of the planning process for a richer set of plan structures.

For ease of discussion, we only explicitly describe the case of planning in completely observable domains. This means that the state of the world is known at all times during plan execution, in spite of the uncertainty of state transitions. We know that the state of the system is sufficient information for choosing actions optimally (Puterman, 1994); however, representing such a universal plan is often impractical in propositional domains, in which the size of the state space is exponential in the size of the domain representation.
For this reason, we consider other types of plan structures based on simple finite-state machines. Because the types of plans we consider do not necessarily use the full state of the system to make every decision, our results carry over to partially observable domains, although we do not explore this fact in detail in the present work.

The computational problems we look at are complete for a variety of complexity classes ranging from PL (probabilistic logspace) to PSPACE. Two results deserve special mention because they concern problems closely related to ones being actively addressed by artificial-intelligence researchers. First, the problem of evaluating a totally ordered plan in a compactly represented planning domain is PP-complete.¹ A compactly represented planning domain is one that is described by a two-stage temporal Bayes network (Boutilier et al., 1998) or similar notation. Second, the problem of determining whether a valid totally ordered plan exists for a compactly represented planning domain is NP^PP-complete. Whereas the class NP can be thought of as the set of problems solvable by guessing the answer and checking it in polynomial time, the class NP^PP can be thought of as the set of problems solvable by guessing the answer and checking it using a probabilistic polynomial-time (PP) computation.

1. The class PP is closely related to the somewhat more familiar #P; Toda (1991) showed that P^#P = P^PP. Roughly speaking, this means that #P and PP are equally powerful when used as oracles. The counting class #P has already been recognized by the artificial-intelligence community as an important complexity class in computations involving probabilistic quantities, such as belief-network inference (Roth, 1996).
It is likely that NP^PP characterizes many problems of interest in the area of uncertainty in artificial intelligence; this paper and earlier work (Goldsmith et al., 1996; Mundhenk, Goldsmith, & Allender, 1997a; Mundhenk, Goldsmith, Lusena, & Allender, 1997b) give initial evidence of this.

1.1 Planning-Domain Representations

A probabilistic planning domain M = ⟨S, s₀, A, t, G⟩ is characterized by a finite set of states S, an initial state s₀ ∈ S, a finite set of operators or actions A, and a set of goal states G ⊆ S. The application of an action a in a state s results in a probabilistic transition to a new state s′ according to the probability transition function t, where t(s, a, s′) is the probability that state s′ is reached from state s when action a is taken. The objective is to choose actions, one after another, to move from the initial state s₀ to one of the goal states with probability above some threshold θ.² The state of the system is known at all times (fully observable) and so can be used to choose the action to apply.

We are concerned with two main representations for planning domains: flat representations, which enumerate states explicitly, and propositional representations (sometimes called compact, structured, or factored representations), which view states as assignments to a set of Boolean state variables or propositions. Propositional representations can represent many domains exponentially more compactly than can flat representations. In the flat representation, the transition function t is represented by a collection of |S| × |S| matrices,³ one for each action. In the propositional representation, this type of |S| × |S| matrix would be huge, so the transition function must be expressed another way.
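Before turning to propositional representations, the flat representation just described can be sketched in a few lines of code. The data layout below (dictionary field names, the action name "a1") is illustrative only, not notation from the paper: each action maps to an |S| × |S| row-stochastic matrix, and a single step samples a successor according to t(s, a, s′).

```python
import random

# Minimal flat-representation sketch; states are integers 0..|S|-1,
# and t[a][s][s'] is the probability of reaching s' from s under action a.
flat_domain = {
    "states": [0, 1, 2],
    "initial": 0,
    "goals": {2},
    "t": {
        "a1": [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0]],   # state 2 is absorbing
    },
}

def step(domain, s, a, rng=random):
    """Sample a successor state s' with probability t(s, a, s')."""
    row = domain["t"][a][s]
    return rng.choices(range(len(row)), weights=row)[0]
```

Each row of each matrix must sum to one, which is exactly the well-definedness condition on t.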
In the probabilistic planning literature, two popular representations for propositional planning domains are probabilistic state-space operators (PSOs) (Kushmerick et al., 1995) and two-stage temporal Bayes networks (2TBNs) (Boutilier et al., 1995). Although these representations differ in the type of planning domains they can express naturally (Boutilier et al., 1998), they are computationally equivalent; a planning domain expressed in one representation can be converted in polynomial time to an equivalent planning domain expressed in the other, with at most a polynomial increase in representation size (Littman, 1997a). In this work, we focus on a propositional representation called the sequential-effects-tree representation (ST) (Littman, 1997a), which is a syntactic variant of 2TBNs with conditional probability tables represented as trees (Boutilier et al., 1995, 1998). This representation is equivalent to 2TBNs and PSOs and simplifies the presentation of our results.

2. It is also possible to formulate the objective as one of maximizing expected total discounted reward (Boutilier et al., 1995), but the two formulations are essentially polynomially equivalent (Condon, 1992). The only difficulty is that compactly represented domains may require discount factors exponentially close to one for this equivalence to hold. This is discussed further in Section 5.

3. We assume that the number of bits used to represent the individual probability values isn't too large.

In ST, the effect of each action on each proposition is represented as a separate decision tree. For a given action a, the set of decision trees for the different propositions is ordered, so the decision tree for one proposition can refer to both the new and old values of previous propositions; this allows ST to represent any probability distribution.
The leaves of a decision tree describe how the associated proposition changes as a function of the state and action, perhaps probabilistically. Section 1.2 gives a simple example of this representation. As in other propositional representations, the states in the set of goal states G are not explicitly enumerated in ST. Instead, we define a goal set, which is a set of propositions such that any state in which all the goal-set propositions are true is considered a goal state. The set of actions A is explicitly enumerated in ST, just as it is in the flat representation.

The ST representation of a planning domain M = ⟨S, s₀, A, t, G⟩ can be defined more formally as 𝕄 = ⟨ℙ, 𝕀, 𝔸, 𝕋, 𝔾⟩ (we use blackboard-bold font to stand for an ST representation of a domain). Here, ℙ is a finite set of distinct propositions. The set of states S is the power set of ℙ; the propositions in s ∈ S are said to be "true" in s. The set 𝕀 ⊆ ℙ is the initial state. The set 𝔾 is the goal set, so the set of goal states G is the set of states s such that 𝔾 ⊆ s. The transition function t is represented by a function 𝕋, which maps each action in 𝔸 to an ordered sequence of |ℙ| binary decision trees. Each of these decision trees has a distinct label proposition, decision propositions at the nodes (optionally labeled with the suffix ":new"), and probabilities at the leaves.

The ith decision tree 𝕋(a)ᵢ for action a defines the transition probabilities t(s, a, s′) as follows. For the ith decision tree, let pᵢ be its label proposition. Define ρᵢ to be the value of the leaf node found by traversing decision tree 𝕋(a)ᵢ, taking the left branch if the decision proposition is in s (or in s′ if the decision proposition has the ":new" suffix) and the right branch otherwise. Finally, we let

    t(s, a, s′) = ∏ᵢ { ρᵢ, if pᵢ ∈ s′;  1 − ρᵢ, otherwise }.    (1)

This definition of t constitutes a well-defined probability distribution over s′ for each a and s.
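Equation 1 is direct to implement. In the sketch below, the tree encoding is my own (the paper does not fix one): a tree is either a leaf probability or a triple (proposition, left, right), where the left branch is taken when the proposition holds. Because t(s, a, s′) is evaluated for a complete candidate successor s′, the ":new" decisions can simply be read off s′, exactly as the traversal rule above states.

```python
# A tree is a float leaf or ("p", left, right) / ("p:new", left, right):
# take `left` if the proposition holds (in s, or in s' for ":new"), else `right`.

def leaf_prob(tree, s, s_new):
    """Traverse one decision tree to its leaf value rho_i."""
    while not isinstance(tree, float):
        prop, left, right = tree
        if prop.endswith(":new"):
            holds = prop[:-4] in s_new   # ":new" decisions consult s'
        else:
            holds = prop in s
        tree = left if holds else right
    return tree

def transition_prob(trees, s, s_new):
    """Equation 1: t(s, a, s') = prod_i [rho_i if p_i in s' else 1 - rho_i].

    `trees` is the ordered list of (label proposition p_i, tree T(a)_i)."""
    prob = 1.0
    for label, tree in trees:
        rho = leaf_prob(tree, s, s_new)
        prob *= rho if label in s_new else 1.0 - rho
    return prob
```

States are represented as sets of true propositions, matching the power-set view of S above.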
To ensure the validity of the representation, we only allow "p:new" to appear as a decision proposition in 𝕋(a)ᵢ if p is the label proposition for some decision tree 𝕋(a)ⱼ with j < i. For this reason, the order of the decision trees in 𝕋(a) is significant. To put this another way, a proposition only has a new value after this new value has been defined by some decision tree.

The complexity results we derive for ST apply also to PSOs, 2TBNs, and all other computationally equivalent representations. They also hold for the "succinct representation," a propositional representation popular in the complexity-theory literature, which captures the set of transition matrices as a function, most commonly represented by a Boolean circuit that computes that function. ST can straightforwardly be represented as a Boolean circuit, and, in the proof of Theorem 6, we show how to represent particular Boolean circuits in ST. Thus, although we have not shown that the succinct representation is formally equivalent to ST, the two representations are closely related; the proofs we give for ST need to be changed only slightly to work for the succinct representation (Goldsmith, Littman, & Mundhenk, 1997a, 1997b; Mundhenk et al., 1997b). Our results require that we restrict the succinct representation to generate transition probabilities with at most a polynomial number of bits; the results may be different for other circuit-based representations that can represent probabilities with an exponential number of bits (Mundhenk et al., 1997a).

1.2 Example Domain

To help make these domain-representation ideas more concrete, we present the following simple probabilistic planning domain based on the problem of building a sand castle at the beach. There are a total of four states in the domain, described by combinations of two Boolean propositions, moat and castle (propositions appear in boldface).
The proposition moat signifies that a moat has been dug in the sand, and the proposition castle signifies that the castle has been built. In the initial state, both moat and castle are false, and the goal set is {castle}. There are two actions: dig-moat and erect-castle (actions appear in sans serif). Figure 1 illustrates these actions in ST.

Executing dig-moat when moat is false causes moat to become true with probability 1/2; if moat is already true, dig-moat leaves it unchanged. The castle proposition is not affected. The dig-moat action is depicted in the left half of Figure 1.

The second action is erect-castle, which appears in the right half of Figure 1. The decision trees are numbered to allow sequential dependencies between their effects to be expressed. The first decision tree is for castle, which does not change value if it is already true when erect-castle is executed. Otherwise, the probability that it becomes true depends on whether moat is true; the castle is built with probability 1/2 if moat is true and only probability 1/4 if it is not. The idea here is that building a moat first protects the castle from being destroyed prematurely by the ocean waves. The second decision tree is for the proposition moat. Because erect-castle cannot make moat become true, there is no effect when moat is false. On the other hand, if the moat exists, it may collapse as a result of trying to erect the castle. The label castle:new in the diagram refers to the value of the castle proposition after the first decision tree is evaluated. If the castle was already built when erect-castle was selected, the moat remains built with probability 3/4. If the castle had not been built, but erect-castle successfully builds it, moat remains true. Finally, if erect-castle fails to make castle true, moat becomes false with probability 1/2 and everything is destroyed.

Note that, given an ST representation of a domain, we can perform a number of useful operations efficiently.
First, given a state s and action a, we can generate a next state s′ with the proper probabilities. This is accomplished by calculating the values of the propositions of s′ one at a time in the order given in the representation of a, flipping coins with the probabilities given in the leaves of the decision trees. Second, given a state s, action a, and state s′, we can compute t(s, a, s′), the probability that state s′ is reached from state s when action a is taken, via Equation 1.

1.3 Plan Types and Representations

We consider four classes of plans for probabilistic domains. Totally ordered plans are the most basic type, being a finite sequence of actions that must be executed in order; this type of plan ignores the state of the system. Acyclic plans generalize totally ordered plans to include conditional execution of actions. Partially ordered plans are a different way of generalizing totally ordered plans in which the precise sequence is left flexible (McAllester & Rosenblitt, 1991). Looping plans generalize acyclic plans to the case in which plan steps can be repeated (Smith & Williamson, 1995; Lin & Dean, 1995). This type of plan is also referred to as a plan graph or policy graph (Kaelbling et al., 1998). In the following sections, we prove computational complexity results concerning each of these plan types. The remainder of this section provides formal definitions of the plan types, illustrated in Figure 2 with examples for the sand-castle domain.

[Figure 1: Sequential-effects-tree (ST) representation for the sand-castle domain.]

In its most general form, a plan (or policy, controller, or transducer) is a program that outputs actions and takes as input information about the outcome of these actions.
In this work, we consider only a particularly restricted finite-state-controller-based plan representation. A plan P for a planning domain M = ⟨S, s₀, A, t, G⟩ can be represented by a structure (V, v₀, E, π, δ) consisting of a directed (multi)graph (V, E) with initial node v₀ ∈ V, a labeling π : V → A of plan nodes (called plan steps) with domain actions, and a labeling of edges with state sets δ : E → 𝒫(S) such that, for every v ∈ V with outgoing edges, ⋃_{v′ ∈ V : (v,v′) ∈ E} δ(v, v′) = S and δ(v, v₁) ∩ δ(v, v₂) = ∅ for all v₁, v₂ ∈ V with v₁ ≠ v₂. Some plan steps have no outgoing edges at all; these are the terminal steps. Actions for terminal steps are not executed. Note that the function δ can be represented in a direct manner for flat domains, but for propositional domains, a more compact representation is needed. We assume that for propositional domains, edge labels are given as conjunctions of literals.

The behavior of plan P in domain M is as follows. The initial time step is t = 0. At time step t ≥ 0, the domain is in state s_t and the plan is at step v_t (s₀ is defined by the planning domain, v₀ by the plan). Action π(v_t) is executed, resulting in a transition to domain state s_{t+1} with probability t(s_t, π(v_t), s_{t+1}). Plan step v_{t+1} is chosen so that s_{t+1} ∈ δ(v_t, v_{t+1}); the function δ tells the plan where to "go" next. At this point, the time-step index t is incremented and the process repeats. This continues until a terminal step is reached in the plan.

One can understand the behavior of domain M under plan P in several different ways. The possible sequences of states of M can be viewed as a tree: each node of the tree at depth t is a state reachable from the initial state at time step t. Alternatively, one can view the state of M at time step t under plan P as a probability distribution over S. At time step 0, with probability 1 the process is in state s₀.
The probability that M is in state s′ at time step t + 1, Pr(s′, t + 1), is the sum of the probabilities of all length-(t + 1) paths from s₀ to s′, i.e.,

    Pr(s′, t + 1) = Σ_{s₁, s₂, …, s_t, s_{t+1} = s′} ∏_{j=0}^{t} t(s_j, a_j, s_{j+1}),

where a_j is the action selected by plan P at time j given the observed sequence of state transitions s₀, …, s_j. This view is useful in some of the later proofs.

Next, we formalize the probability that domain M reaches a goal state under plan P. We need to introduce several notions. A "legal" sequence of states and steps applied is called a trajectory; for M and P, this is a sequence α = ⟨(sᵢ, vᵢ)⟩ᵢ₌₀ᵏ of pairs with

  • t(sᵢ, π(vᵢ), sᵢ₊₁) > 0 for 0 ≤ i ≤ k − 1,
  • sᵢ₊₁ ∈ δ(vᵢ, vᵢ₊₁) for 0 ≤ i ≤ k − 1, and
  • v₀, …, v_{k−1} are not terminal steps.

A goal trajectory is a trajectory that ends in a goal state of M, i.e., s_k ∈ G. Note that each goal trajectory is finite. Thus, we can calculate the probability of a goal trajectory α = ⟨(sᵢ, vᵢ)⟩ᵢ₌₀ᵏ as Pr(α) = ∏_{i=0}^{k−1} t(sᵢ, π(vᵢ), sᵢ₊₁), given that s_k ∈ G. The probability that M reaches a goal state under plan P is the sum of the probabilities of goal trajectories for M,

    Pr(M reaches a goal state under P) := Σ_{α a goal trajectory} Pr(α);

we call this the value of the plan.

We characterize a plan P = (V, v₀, E, π, δ) on the basis of the size and structure of its underlying graph (V, E). If the graph (V, E) contains no cycles, we call P an acyclic plan; otherwise it is a looping plan. It follows that an acyclic plan has a terminal step, and that a terminal step will be reached after no more than |V| actions are taken; such plans can only be used for finite-horizon control. A totally ordered plan (sometimes called a "linear plan" or a "straight-line" plan) is an acyclic plan with no more than one outgoing edge for each node in V. Such a plan is a simple path.
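The execution semantics just described, repeatedly apply π, sample a transition, and follow δ, can be sketched as a small simulator. All field names here are illustrative (not the paper's notation), and terminal steps are encoded as steps with no action, matching the rule that actions at terminal steps are not executed.

```python
import random

def run_plan(domain, plan, rng, max_steps=1000):
    """Execute a finite-state-controller plan once; return True on a goal state.

    domain: dict with 'initial', 'goals', and 'step'(s, a, rng) sampling t;
    plan:   dict with 'v0', 'pi' (step -> action, None for terminal steps),
            and 'delta'(step, new_state) -> next step.
    """
    s, v = domain["initial"], plan["v0"]
    for _ in range(max_steps):
        if plan["pi"].get(v) is None:      # terminal step reached: stop
            break
        a = plan["pi"][v]                  # execute pi(v_t)
        s = domain["step"](s, a, rng)      # sample s_{t+1} ~ t(s_t, a, .)
        v = plan["delta"](v, s)            # follow delta to v_{t+1}
    return s in domain["goals"]
```

Averaging the outcome over many runs gives a Monte Carlo estimate of the plan's value, i.e., of the sum over goal trajectories above.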
In this work, we also consider partially ordered plans (sometimes called "nonlinear" plans), which express an entire family of totally ordered plans. In this representation, the steps of the plan are given as a partial order (specified, for example, as a directed acyclic graph). This partial order represents a set of totally ordered plans: all totally ordered sequences of plan steps consistent with the partial order that consist of all steps of the partially ordered plan. Each of these totally ordered plans has a value, and these values need not all be the same. As such, we have a choice in defining the value of a partially ordered plan. In this work, we consider the optimistic, pessimistic, and average interpretations. Let Ω(P) be the set of totally ordered sequences consistent with partially ordered plan P. Under the optimistic interpretation,

    the value of P := max_{p ∈ Ω(P)} Pr(M reaches a goal state under p).

Under the pessimistic interpretation,

    the value of P := min_{p ∈ Ω(P)} Pr(M reaches a goal state under p).

Under the average interpretation,

    the value of P := (1/|Ω(P)|) Σ_{p ∈ Ω(P)} Pr(M reaches a goal state under p).

To illustrate these notions, Figure 2 gives plans of each type for the sand-castle domain described earlier. Initial nodes are marked with an incoming arrow, and terminal steps are represented as filled circles. The 3-step totally ordered plan in Figure 2(a) successfully builds a sand castle with probability 0.4375. An acyclic plan is given in Figure 2(b), which succeeds with probability 0.46875 and executes dig-moat an average of 1.75 times. Note that it succeeds more often and with fewer actions on average than the totally ordered plan in Figure 2(a). Figure 2(c) illustrates a partially ordered plan for the sand-castle domain. While this plan bears a superficial resemblance to the acyclic plan in Figure 2(b), it has a different interpretation.
In particular, the plan in Figure 2(c) represents a set of totally ordered plans with five (non-terminal) plan steps (three dig-moat steps and two erect-castle steps). In contrast to the solid arrows in Figure 2(b), which indicate flow of control, the dashed arrows in Figure 2(c) represent ordering constraints: each erect-castle step must be preceded by at least two dig-moat steps, for example. Although there are (5 choose 2) = 10 distinct ways of arranging the five plan steps in Figure 2(c) into a totally ordered plan, only two distinct totally ordered plans are consistent with the ordering constraints: dig-moat → dig-moat → dig-moat → erect-castle → erect-castle → ● (success probability 0.65625) and dig-moat → dig-moat → erect-castle → dig-moat → erect-castle → ● (success probability 0.671875). Thus, the optimistic success probability of this partially ordered plan is 0.671875, and the pessimistic one is 0.65625. Note that the pessimistic interpretation is closely related to the standard interpretation in deterministic partial-order planning (McAllester & Rosenblitt, 1991), in which a partially ordered plan is considered successful only if all its consistent totally ordered plans are successful. The average success probability here is 0.6614583, because four orderings of the labeled plan steps yield the poorer plan described above and two yield the better one.

The looping plan in Figure 2(d) does not terminate until it succeeds in building a sand castle, which it will eventually do with probability 1.0. Of course, not all looping plans succeed with probability 1; the totally ordered plan in Figure 2(a) and the acyclic plan in Figure 2(b) are special cases of such looping plans, for instance.

We define the size |P| of a plan P to be the number of steps it contains. We define the size |M| of a domain M to be the sum of the number of actions and states for a flat domain, and the sum of the sizes of the ST decision trees for a propositional domain.
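The success probabilities quoted in this section can be checked by propagating the exact distribution over the four sand-castle states. The sketch below transcribes the action semantics described in Section 1.2 into code (the encoding is mine, and I take Figure 2(a) to be dig-moat → dig-moat → erect-castle, which reproduces the stated 0.4375):

```python
# States are (moat, castle) pairs; a distribution maps state -> probability.

def apply_action(dist, action):
    """Push the exact state distribution through one action."""
    new = {}
    for (m, c), p in dist.items():
        for (m2, c2), q in action(m, c).items():
            new[(m2, c2)] = new.get((m2, c2), 0.0) + p * q
    return new

def dig_moat(m, c):
    if m:
        return {(True, c): 1.0}
    return {(True, c): 0.5, (False, c): 0.5}

def erect_castle(m, c):
    if c:  # castle stays built; an existing moat survives w.p. 3/4
        return {(m, True): 0.75, (False, True): 0.25} if m else {(False, True): 1.0}
    p_build = 0.5 if m else 0.25
    if m:  # success keeps the moat; failure collapses it w.p. 1/2
        return {(True, True): p_build,
                (True, False): (1 - p_build) * 0.5,
                (False, False): (1 - p_build) * 0.5}
    return {(False, True): p_build, (False, False): 1 - p_build}

ACTIONS = {"dig-moat": dig_moat, "erect-castle": erect_castle}

def plan_value(plan):
    """Exact success probability of a totally ordered plan (action list)."""
    dist = {(False, False): 1.0}
    for a in plan:
        dist = apply_action(dist, ACTIONS[a])
    return sum(p for (m, c), p in dist.items() if c)   # goal: castle true
```

For example, `plan_value(["dig-moat", "dig-moat", "erect-castle"])` returns 0.4375, and the two consistent linearizations of Figure 2(c) evaluate to 0.65625 and 0.671875 as claimed.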
[Figure 2: Example plans for the sand-castle domain: (a) a totally ordered plan; (b) an acyclic (conditional) plan; (c) a partially ordered plan; (d) a looping plan.]

We consider the following decision problems. The plan-evaluation problem asks, given a domain M, a plan P of size |P| ≤ |M|, and threshold θ, whether the plan's value is greater than θ, i.e., whether

    Pr(M reaches a goal state under P) > θ.

Note that the condition that |P| ≤ |M| is just a technical one; we simply want to use |M| to represent the size of the problem. Given an instance in which |P| is larger than |M|, we simply imagine "padding out" |M| to make it larger. The important thing is that we are considering plans that are roughly the size of the description of the domain, and not the size of the number of states (which might be considerably larger). The plan-existence problem asks, given domain M, threshold θ, and size bound z ≤ |M|, whether there exists a plan P of size z with value greater than θ. Note that because we bound the size of the target plan, the complexity of plan generation is no more than that of plan existence; the technique of self-reduction can be used to construct a valid plan using polynomially many calls to an oracle for the decision problem.

Each of these decision problems has a different version for each type of domain (flat and propositional) and each type of plan category (looping, acyclic, totally ordered, and partially ordered under each of the three interpretations). We address all of these problems in the succeeding sections.

1.4 Complexity Classes

For definitions of complexity classes, reductions, and standard results from complexity theory, we refer the reader to Papadimitriou (1994).
Briefly, we are looking only at the complexity of decision problems (those with yes/no answers). The class P consists of problems that can be decided in polynomial time; that is, given an instance of the problem, there is a program for deciding whether the answer is yes or no that runs in polynomial time. The class NP contains the problems with polynomial-time-checkable polynomial-size certificates: for any given instance and certificate, it can be checked in time polynomial in the size of the instance whether the certificate proves that the instance is in the NP set. This means that, if the answer to the instance is "yes," this can be shown in polynomial time given the right key. The class co-NP is the opposite: if the answer is "no," this can be shown in polynomial time given the right key. A problem X is C-hard for some complexity class C if every problem in C can be reduced to it; to put it another way, a fast algorithm for X could be used as a subroutine to solve any problem in C quickly. A problem is C-complete if it is both C-hard and in C; these are the hardest problems in the class.

In the interest of being complete, we next give more detailed descriptions of the less familiar probabilistic and counting complexity classes we use in this work. The class #L (Àlvarez & Jenner, 1993) is the class of functions f such that, for some nondeterministic logarithmically space-bounded machine N, the number of accepting paths of N on x equals f(x). The class #P is defined analogously as the class of functions f such that, for some nondeterministic polynomial-time-bounded machine N, the number of accepting paths of N on x equals f(x). Typical complete problems are computing the determinant for #L and computing the permanent for #P. A function f is defined to be in GapL if it is the difference f = g − h of #L functions g and h.
While #L functions have nonnegative integer values by definition, GapL functions may have negative integer values (for example, if g always returns zero). Probabilistic logspace (Gill, 1977), PL, is the class of sets A for which there exists a nondeterministic logarithmically space-bounded machine N such that x ∈ A if and only if the number of accepting paths of N on x is greater than its number of rejecting paths. In the original definition of PL, there is no time bound on computations; Borodin, Cook, and Pippenger (1983) later showed PL ⊆ P. Jung (1985) proved that any set computable in probabilistic logspace is computable in probabilistic logspace where the PL machine has a simultaneous polynomial-time bound. In apparent contrast to P-complete sets, sets in PL are decidable using very fast parallel computations (Borodin et al., 1983). Probabilistic polynomial time, PP, is defined analogously. A classic PP-complete problem is Majsat: given a Boolean formula in conjunctive normal form (CNF), does the majority of assignments satisfy it? According to Balcázar, Díaz, and Gabarró (1990), the PP-completeness of Majsat was shown by a combination of results from Gill (1977) and Simon (1975). For polynomial-space-bounded computations, PSPACE equals probabilistic PSPACE, and #PSPACE is the same as the class of polynomial-space-computable functions (Ladner, 1989).

Note that L, NL, #L, PL, and GapL are to logarithmic space what P, NP, #P, PP, and GapP are to polynomial time. Also, the notion of completeness we use in this paper relies on many-one reductions. In the case of PL, the reduction functions are logarithmic-space computable; in the case of NP and above, they are polynomial-time computable. For any complexity classes C and C′, the class C^C′ consists of those sets that are C-Turing reducible to sets in C′, i.e., sets that can be accepted with resource bounds specified by C, using some problem in C′ as a subroutine (oracle) with instantaneous output.
For any class C ⊆ PSPACE, it is the case that NP^C ⊆ PSPACE, and therefore NP^PSPACE = PSPACE. The primary oracle-defined class we consider is NP^PP. It equals the closure of PP under NP many-one reductions (Torán, 1991), which can be seen as the closure of PP under polynomial-time disjunctive reducibility with an exponential number of queries (each of the queries computable in polynomial time from its index in the list of queries). To simplify our completeness results for this class, we introduce a decision problem we call E-Majsat ("exists" Majsat), which generalizes both the standard NP-complete satisfiability problem and the PP-complete Majsat. An E-Majsat instance is defined by a CNF Boolean formula φ on n Boolean variables x1, …, xn and a number k between 1 and n. The task is to decide whether there is an initial partial assignment to the variables x1, …, xk such that the majority of the assignments extending it satisfy φ. We prove that this problem is NP^PP-complete in the Appendix.

The complexity classes we consider satisfy the following containment properties and relations to other well-known classes:

  L ⊆ NL ⊆ PL ⊆ P ⊆ NP, co-NP ⊆ PP ⊆ NP^PP, co-NP^PP ⊆ PSPACE ⊆ EXP.

Because P is properly contained in EXP, EXP-complete problems are provably intractable; the other classes may equal P, although that is not generally believed to be the case. Several other observations are worth making here. It is also known that PH ⊆ NP^PP, where PH represents the polynomial hierarchy. In a crude sense, PH is close to PSPACE, and, thus, our NP^PP-completeness results place important problems close to PSPACE. However, some early empirical results (Littman, 1997b) show that random problem instances from PP have properties similar to those of random problem instances from NP, suggesting that PP might be close enough to NP for NP-type heuristics to be effective.
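To make the definitions concrete, here is a brute-force checker for Majsat and E-Majsat written directly from their definitions (the function names and the DIMACS-style clause encoding are our own; the enumeration is exponential in n, so this is an illustration for tiny instances, not a practical algorithm):

```python
from itertools import product

def satisfies(cnf, assignment):
    """cnf: list of clauses; each clause is a list of nonzero ints,
    where literal i means x_i and -i means (not x_i), DIMACS-style.
    assignment: dict mapping variable index to True/False."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in cnf)

def majsat(cnf, n, fixed=None):
    """Does the (strict) majority of assignments extending `fixed` satisfy cnf?"""
    fixed = fixed or {}
    free = [i for i in range(1, n + 1) if i not in fixed]
    count = 0
    for bits in product([False, True], repeat=len(free)):
        a = dict(fixed)
        a.update(zip(free, bits))
        if satisfies(cnf, a):
            count += 1
    return 2 * count > 2 ** len(free)

def e_majsat(cnf, n, k):
    """Is there an assignment to x_1..x_k under which the majority of
    assignments to x_{k+1}..x_n satisfies cnf?"""
    return any(majsat(cnf, n, dict(zip(range(1, k + 1), bits)))
               for bits in product([False, True], repeat=k))
```

For example, for φ = (x1 ∨ x2) with k = 1, fixing x1 to true makes every extension satisfying, so the instance is in E-Majsat.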
1.5 Results Summary

Tables 1 and 2 summarize our results, which are explained in more detail in later sections. The general flavor of our main results and techniques can be conveyed as follows. To show that a plan-evaluation problem is in a particular complexity class C, we take the cross product of the steps of the plan and the states of the domain and then look at the complexity of evaluating the absorption probability of the resulting Markov chain (i.e., the directed graph with probability-labeled edges). The complexity of the corresponding plan-existence problem is then bounded by NP^C, because the problem can be solved by guessing the correct plan nondeterministically and then evaluating it; in many cases, it is NP^C-complete. The appropriate complexity class C depends primarily on the representation of the cross-product Markov chain.

Exceptions to this basic pattern are the results for partially ordered plans in Section 4. These appear to require a distinct set of techniques. It is also worth noting that, although propositional domains can be exponentially more compact than flat domains, the computational complexity of solving problems in propositional domains is not always exponentially greater; in one instance, evaluating partially ordered plans under the average interpretation, the complexity is actually the same for flat and propositional domains! We also prove results concerning plan evaluation and existence for compactly represented plans (PP-complete and NP^PP-complete, Corollary 5), plan existence of "large enough" looping plans in flat domains (P-complete, Theorem 7), plan evaluation and existence for looping plans in deterministic propositional domains (PSPACE-complete, Theorems 8 and 9), and plan existence for polynomial-size looping plans in partially observable domains (NP-complete, Section 5.1).
Plan Type                        Plan Evaluation     Plan Existence    Reference
unrestricted                     --                  P-complete        P & T (1987)
polynomial-depth                 --                  P-complete        P & T (1987)
looping                          PL-complete         NP-complete       Section 3
acyclic                          PL-complete         NP-complete       Section 2
totally ordered                  PL-complete         NP-complete       Section 2
partially ordered, optimistic    NP-complete         NP-complete       Section 4
partially ordered, average       PP-complete         NP-complete       Section 4
partially ordered, pessimistic   co-NP-complete      NP-complete       Section 4

Table 1: Complexity results for flat representations (P & T (1987) is Papadimitriou and Tsitsiklis (1987))

Plan Type                        Plan Evaluation     Plan Existence      Reference
unrestricted                     --                  EXP-complete        Littman (1997a)
polynomial-depth                 --                  PSPACE-complete     Littman (1997a)
looping                          PSPACE-complete     PSPACE-complete     Section 3
acyclic                          PP-complete         NP^PP-complete      Section 2
totally ordered                  PP-complete         NP^PP-complete      Section 2
partially ordered, optimistic    NP^PP-complete      NP^PP-complete      Section 4
partially ordered, average       PP-complete         NP^PP-complete      Section 4
partially ordered, pessimistic   co-NP^PP-complete   NP^PP-complete      Section 4

Table 2: Complexity results for propositional representations

2. Acyclic Plans

In this section, we treat the complexity of generating and evaluating acyclic and totally ordered plans.

Theorem 1 The plan-evaluation problem for acyclic and totally ordered plans in flat domains is PL-complete.

Proof: First, we show PL-hardness for totally ordered plans. Jung (1985) proved that a set A is in PL if and only if there exists a logarithmically space-bounded and polynomially time-bounded nondeterministic Turing machine N with the following property: for every input x, at least half of the computations of N on input x are accepting if and only if x is in A.
The machine N can be transformed into a probabilistic Turing machine R such that, for each input x, the probability that R accepts x equals the fraction of computations of N(x) that accept. Given R, a planning domain M can be described as follows. The state set of M is the set of configurations of R on input x. Note that a configuration consists of the contents of the logarithmically space-bounded tape, the state, the locations of the read/write heads, and one symbol each from the input and output tapes. Thus, a configuration can be represented with logarithmically many bits, and there are only polynomially many such configurations. The state-transition probabilities of M under the unique action a are the configuration-transition probabilities of R. All states obtained from accepting configurations are goal states. The totally ordered plan consists of a "step counter" for R on input x, and each of its plan steps takes the only action a. The probability that the planning domain under this plan reaches a goal state is exactly the probability that R(x) reaches an accepting configuration. Thus, evaluating this totally ordered plan is PL-hard. Since totally ordered plans are acyclic plans, this also proves PL-hardness of the plan-evaluation problem for acyclic plans.

Next, we show that the plan-evaluation problem is in PL for acyclic plans. Let M = ⟨S, s0, A, t, G⟩ be a planning domain, let P = ⟨V, v0, E, π, δ⟩ be an acyclic plan, and let a threshold θ be given. We show how our question, whether M under P reaches a goal state with probability greater than θ, can be equivalently transformed into the question of whether a GapL function is greater than 0. The transformation can be done in logarithmic space. As shown by Allender and Ogihara (1996), it follows that our question is in PL. First, we construct a Markov chain C from M and P, which simulates the execution or "evaluation" of M under P.
Note that a Markov chain can be seen as a probabilistic domain with only one action in its set of actions. Since there is no choice of actions, we do not mention actions in this construction. The state space of C is S × V, the initial state is (s0, v0), the set of goal states is G × V, and the transition probabilities tC of C are

  tC((s, v), (s′, v′)) = t(s, π(v), s′)   if s′ ∈ δ(v, v′);
  tC((s, v), (s′, v′)) = 1                if v is a terminal step node and (s, v) = (s′, v′);
  tC((s, v), (s′, v′)) = 0                otherwise.

Let m be the number of plan steps of P (i.e., |V|, the number of nodes in the graph representing P). Since states of C that contain a terminal step of P are sinks in C, it follows that

  Pr(M reaches a goal state under P) = Pr(C reaches a goal state in exactly m steps).

Let

  pC(s, m) := Pr(C reaches a goal state in exactly m steps from initial state s).

Then pC((s0, v0), m) is the probability we want to calculate. The standard inductive definition of pC, used to evaluate plans by dynamic programming, is

  pC(s, 0) = 1 if s is a goal state of C, and 0 otherwise;
  pC(s, k + 1) = Σ_{s′ ∈ S×V} tC(s, s′) · pC(s′, k),   0 ≤ k ≤ m − 1.

Let h be the maximum length of the representation of a state-transition probability tC. Then, for

  ph(s, 0) = 1 if s is a goal state of C, and 0 otherwise;
  ph(s, k + 1) = Σ_{s′ ∈ S×V} 2^h · tC(s, s′) · ph(s′, k),   0 ≤ k ≤ m − 1,

it follows that pC((s0, v0), m) = ph((s0, v0), m) · 2^(−hm). Note that ph((s0, v0), m) is an integer value. Therefore, pC((s0, v0), m) > θ if and only if ph((s0, v0), m) − ⌊2^(hm) · θ⌋ > 0. In order to show that pC((s0, v0), m) > θ is decidable in PL, it suffices to show that ph((s0, v0), m) is in GapL. To that end, we "unwind" the inductive definition of ph. Let T be the integer matrix obtained from tC with T(s,s′) = tC(s, s′) · 2^h.
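Numerically, the dynamic-programming recurrence for pC can be sketched as follows (a toy floating-point illustration with names of our own; the proof itself requires exact integer GapL arithmetic in logarithmic space, which floating point does not model):

```python
def evaluate_plan(chain, goals, start, m):
    """Evaluate an acyclic plan by dynamic programming on the cross-product
    Markov chain: `chain` maps each state to a list of (next_state, prob)
    pairs; returns Pr(a goal state is reached in exactly m steps from start)."""
    # p_C(s, 0): 1 on goal states, 0 elsewhere.
    p = {s: (1.0 if s in goals else 0.0) for s in chain}
    for _ in range(m):
        # p_C(s, k+1) = sum over s' of t_C(s, s') * p_C(s', k)
        p = {s: sum(pr * p[s2] for s2, pr in chain[s]) for s in chain}
    return p[start]
```

Because goal states containing a terminal plan step are self-looping sinks, running the recurrence for exactly m steps yields the absorption probability.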
We introduce the integer-valued matrix T to show that ph can be composed from GapL functions using operations under which GapL is closed; since tC is not integer valued, it cannot be used directly for this purpose. We can write

  ph(s, m) = Σ_{s′ ∈ S×V} (T^m)(s,s′) · ph(s′, 0).

We argue that ph is in GapL. Each entry T(s,s′) is logspace computable from the domain M and plan P. Therefore, the powers of the matrix are in GapL, as shown by Vinay (1991). Because GapL is closed under multiplication and under summation of polynomially many summands, it follows that ph ∈ GapL. Finally, we use closure properties of GapL from Allender and Ogihara (1996); since GapL is closed under subtraction, it follows that plan evaluation for acyclic plans is in PL. Because totally ordered plans are acyclic plans, the plan-evaluation problem for totally ordered plans is also in PL.

The technique of forming a Markov chain by taking the cross product of a domain and a plan will be useful later. Plan-existence problems require a different set of techniques.

Theorem 2 The plan-existence problem for acyclic and totally ordered plans in flat domains is NP-complete.

Proof: First, we show containment in NP. Given a planning domain M, a threshold θ, and a size bound z ≤ |M|, guess a plan of the correct form of size at most z and accept if and only if M reaches a goal state with probability greater than θ under this plan. Note that checking whether a plan has the correct form can be done in polynomial time. Because the plan-evaluation problem is in PL (Theorem 1), it follows that the plan-existence problem is in NP (i.e., it is in NP^PL = NP).

To show the NP-hardness of the plan-existence problem, we give a reduction from the NP-complete satisfiability problem for Boolean formulae in conjunctive normal form. We construct a planning domain that evaluates a Boolean formula with n variables, where an (n + 2)-step plan describes an assignment of values to the variables.
In the first step, a clause is chosen uniformly at random. At step i + 1, the planning domain "checks" whether the plan satisfies the appearance of variable i in that clause. If so, the clause is marked as satisfied. After n + 1 steps, if no literal in that clause was satisfied, no goal state is reached through this clause; otherwise, a transition is made to the goal state. Therefore, the goal state is reached with probability 1 (in particular, greater than 1 − 1/m) if and only if all clauses are satisfied, that is, if and only if the plan describes a satisfying assignment.

We formally define the reduction, which is similar to one presented by Papadimitriou and Tsitsiklis (1987). Let φ be a CNF formula with n variables x1, …, xn and m clauses C1, …, Cm. Let the sign of an appearance of a variable in a clause be −1 if the variable is negated, and 1 otherwise. Define the planning domain M(φ) = ⟨S, s0, A, t, G⟩ where

  S = {sat(i, j), unsat(i, j) | 1 ≤ i ≤ n + 1, 1 ≤ j ≤ m} ∪ {s0, sacc, srej},
  A = {assign(i, b) | 1 ≤ i ≤ n, b ∈ {−1, 1}} ∪ {start, end},
  G = {sacc},

and t(s, a, s′) is given by the following cases (and is 0 otherwise):

  t(s0, start, unsat(1, j)) = 1/m, for 1 ≤ j ≤ m;
  t(s0, a, srej) = 1, if a ≠ start;
  t(unsat(i, j), assign(i, b), sat(i + 1, j)) = 1, if i ≤ n and xi appears in Cj with sign b;
  t(unsat(i, j), assign(i, b), unsat(i + 1, j)) = 1, if i ≤ n and xi does not appear in Cj with sign b;
  t(unsat(i, j), a, srej) = 1, if i ≤ n and a = start, a = end, or a = assign(i′, b) with i′ ≠ i, b ∈ {−1, 1};
  t(unsat(n + 1, j), a, srej) = 1, for every action a;
  t(sat(i, j), assign(i, b), sat(i + 1, j)) = 1, if i ≤ n;
  t(sat(i, j), a, srej) = 1, if i ≤ n and a = start, a = end, or a = assign(i′, b) with i′ ≠ i;
  t(sat(n + 1, j), end, sacc) = 1;
  t(sat(n + 1, j), a, srej) = 1, if a ≠ end;
  t(srej, a, srej) = t(sacc, a, sacc) = 1, for every action a.

The meaning of the states in this domain is as follows.
When the domain is in state sat(i, j) for 1 ≤ i ≤ n, 1 ≤ j ≤ m, it means that clause j has been satisfied and we are currently checking variable i in clause j. State sat(n + 1, j), for 1 ≤ j ≤ m, means that we have finished verifying clause j and it was satisfied. The meanings are similar for the unsat states. Of course, s0 is the initial state, and sacc and srej are the accepting and rejecting states, respectively. The actions in this domain are start and end, which mark the beginning and end of the assignment, and assign(i, b) for 1 ≤ i ≤ n, b ∈ {−1, 1}, which assigns the truth value b to variable i. Figure 3 gives the domain generated by this reduction from a simple Boolean formula.

[Figure 3: A domain generated from the Boolean formula (x1 ∨ ¬x2) ∧ (¬x1 ∨ x3)]

By the description of the reduction, M(φ) can be computed from φ in time polynomial in |φ|. By construction, M(φ) under a z = (n + 2)-step plan P can only reach goal state sacc if P has the form

  start → assign(1, b1) → assign(2, b2) → ⋯ → assign(n, bn) → end.

P reaches sacc with probability 1 if and only if b1, …, bn is a satisfying assignment for the n variables of φ. This shows that Boolean satisfiability polynomial-time reduces to the plan-existence problem for totally ordered and acyclic plans, showing that it is NP-hard.

Note that if we bound the plan depth (horizon) instead of the plan size, the plan-existence problem for acyclic plans in flat domains is P-complete (Goldsmith et al., 1997a; Papadimitriou & Tsitsiklis, 1987).
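The correspondence in this reduction can be checked by direct simulation (a sketch with a helper name of our own; clauses use DIMACS-style signed literals). Under the plan start → assign(1, b1) → ⋯ → assign(n, bn) → end, the probability of reaching sacc is exactly the fraction of uniformly chosen clauses that the encoded assignment satisfies:

```python
from fractions import Fraction

def goal_probability(cnf, signs):
    """Simulate M(phi) under the plan start, assign(1,b1), ..., assign(n,bn), end.
    cnf: list of clauses, each a list of DIMACS-style literals (i or -i).
    signs: tuple (b1, ..., bn) with each bi in {-1, 1}.
    Returns Pr(reach s_acc): the chance a uniformly chosen clause ends in
    state sat(n+1, j) and transitions to s_acc under the action `end`."""
    m = len(cnf)
    satisfied = 0
    for clause in cnf:                          # clause j chosen with prob 1/m
        sat = False
        for i, b in enumerate(signs, start=1):  # step assign(i, b)
            if (i if b == 1 else -i) in clause: # x_i appears in C_j with sign b
                sat = True
        satisfied += sat                        # sat(n+1, j) --end--> s_acc
    return Fraction(satisfied, m)
```

For the formula of Figure 3, (x1 ∨ ¬x2) ∧ (¬x1 ∨ x3), the plan encoding x1 = 1, x2 = 1, x3 = 1 reaches the goal with probability 1, while the plan encoding x1 = −1, x2 = 1, x3 = −1 reaches it with probability 1/2.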
Limiting the plan size makes the problem more difficult because it can force the planner to take the same action from different states; figuring out how to do this without sacrificing plan quality is very challenging. In propositional domains, plan evaluation is harder because of the large number of states.

Theorem 3 The plan-evaluation problem for acyclic and totally ordered plans in propositional domains is PP-complete.

Proof: To show PP-hardness for totally ordered plans, we give a reduction from the PP-complete problem Majsat: given a CNF Boolean formula φ, does the majority of assignments satisfy it?

[Figure 4: Sequential-effects-tree representation for evaluate]

Given φ, we construct a planning domain M(φ) and a 1-step plan such that the plan achieves the goal with probability greater than θ = 1/2 if and only if the majority of assignments satisfies φ. The planning domain M(φ) consists of a single action evaluate, which is also the 1-step plan to be evaluated. There are n + m + 2 propositions in M(φ): x1 through xn, which correspond to the n variables of φ; clause1 through clausem, which correspond to the m clauses of φ; satisfied, which is also the sole element of the goal set; and done, which ensures that evaluate is executed only once (this is important when this domain is used later in Theorem 4 to show the complexity of plan existence). In the initial state, all propositions are false.
The evaluate action generates a random assignment to the variables of φ, evaluates the clauses (clausei is true if any of the literals in the ith clause is true), and evaluates the entire formula (satisfied is true if all the clauses are true). Figure 4 gives an ST representation of evaluate, in which xai, xbi, … represent the variables in clause i. By construction, φ is in Majsat if and only if M(φ) reaches a goal state with probability greater than θ = 1/2 under the plan consisting of the single action evaluate.

We next show membership in PP for acyclic plans. We do this by showing that a planning domain M and an acyclic plan P induce a computation tree consisting of all paths through M under P. Evaluating this computation tree can be accomplished by a PP machine. Let b be a bound on the number of bits used to specify probabilities in the leaves of the decision trees representing M.[4] Consider a computation tree defined as follows. It has root labeled ⟨s0, v0⟩. If, in the planning domain M, the probability of reaching state s′ from s given action π(v) is equal to ρ, then ⟨s, v⟩ will have ρ · 2^b children labeled ⟨s′, δ(v, s′)⟩. Each of the identically labeled child nodes is independent but is defined identically to the others. Thus, the number of paths with a given set of labels corresponds to the probability of that trajectory through the domain and plan multiplied by (2^b)^h, where h is the depth of the plan. The number of accepting computations is, therefore, more than θ · (2^b)^h if and only if the probability of achieving the goal is more than θ. Note that b is inherent in the planning domain, rather than in h.

[4] We represent numbers in polynomial-precision binary representation. In principle, this could introduce round-off errors if planning problems are specified in some other form.
A PP machine accepts if more than half of the final states are accepting, so if θ ≠ 1/2, it is necessary to pad the computation tree by introducing "dummy" branches that accept or reject in the right proportions.

The plan-existence problem is essentially equivalent to guessing and evaluating a valid plan.

Theorem 4 The plan-existence problem for acyclic and totally ordered plans in propositional domains is NP^PP-complete.

Proof: Containment in NP^PP for both totally ordered and acyclic plans follows from the fact that a polynomial-size plan can be guessed in polynomial time and checked in PP (Theorem 3). Hardness for NP^PP for both totally ordered and acyclic plans can be shown using a reduction from E-Majsat, shown NP^PP-hard in the Appendix. The reduction echoes the one used in the PP-hardness argument in the proof of Theorem 3.

Given a CNF Boolean formula φ with variables x1, …, xn and a number k, we construct a planning domain M(φ, k) such that a plan exists that reaches the goal with probability greater than θ = 1/2 if and only if there is an assignment to the variables x1, …, xk such that the majority of assignments to the remaining variables satisfies φ. The planning domain M(φ, k) consists of the action evaluate from Theorem 3 and one action, set-xi, for each of the first k variables. Just as in the proof of Theorem 3, there are n + m + 2 propositions in M(φ, k), all initially false: x1 through xn, which correspond to the n variables of φ; clause1 through clausem, which correspond to the m clauses of φ; satisfied; and done, which ensures that evaluate is executed only once. The goal set contains satisfied and done. For 1 ≤ i ≤ k, action set-xi makes proposition xi true.
Analogously to Theorem 3, the evaluate action generates a random assignment to the remaining variables of φ, evaluates the clauses (clausei is true if any of the literals in the clause is true), evaluates the entire formula (satisfied is true if all the clauses are true), and sets done to true. If done is true, no further action can make satisfied true.

If the pair (φ, k) is in E-Majsat, then there exists an assignment b1 … bk to the first k variables of φ such that the majority of assignments to the rest of the variables satisfies φ. Therefore, the plan applying steps set-xi for all i with bi = 1, followed by an evaluate action, reaches a goal state with probability greater than θ = 1/2. Conversely, assume M(φ, k) under totally ordered plan P reaches a goal state with probability greater than 1/2. Since the evaluate action is the only action setting done to true, and since no action reaches the goal once done is set to true, we can assume without loss of generality that P consists of a sequence of set-xi steps that ends with evaluate. By construction, the assignment to x1, …, xk assigning 1 exactly to those variables set by P is an assignment under which the majority of the assignments to the rest of the variables satisfies φ, and therefore (φ, k) is in E-Majsat. Since every totally ordered plan is acyclic, the same hardness holds for acyclic plans.

In the above results, we consider both flat and compactly represented (propositional) planning domains, but only flat plans. Compactly represented plans are also quite useful. A compact acyclic plan is an acyclic plan in which the names of the plan steps are encoded by a set of propositional variables and the step-transition function δ between plan steps is represented by a set of decision trees, just as in ST.
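The equivalence claimed in this proof can again be checked by brute force (hypothetical names; the exponential enumeration is for illustration only): the goal probability of a plan that applies set-xi for each i in a chosen set and then evaluate is the fraction of uniform random draws of x_{k+1}, …, xn that satisfy φ.

```python
from itertools import product
from fractions import Fraction

def evaluate_success(cnf, n, k, set_vars):
    """Exact goal probability in M(phi, k) of the plan that applies set-x_i
    for each i in set_vars (subset of 1..k) and then evaluate: x_1..x_k hold
    the values chosen by the plan (true iff set), and evaluate draws
    x_{k+1}..x_n uniformly at random."""
    fixed = {i: (i in set_vars) for i in range(1, k + 1)}
    free = list(range(k + 1, n + 1))
    sat = 0
    for bits in product([False, True], repeat=len(free)):
        a = dict(fixed)
        a.update(zip(free, bits))
        # satisfied is true iff every clause contains a true literal
        if all(any(a[abs(l)] == (l > 0) for l in clause) for clause in cnf):
            sat += 1
    return Fraction(sat, 2 ** len(free))
```

The instance (φ, k) is in E-Majsat exactly when some choice of set_vars makes this probability exceed 1/2.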
We require that the plan have depth polynomial in the size of its representation, even though the total number of steps in the plan might be exponential due to the logarithmic succinctness of the encoding. Because the plan-domain cross-product technique used in the proof of Theorem 3 generalizes to compact acyclic plans, the same complexity results apply. This also holds true for a probabilistic acyclic plan, which is an acyclic plan that can make random transitions between plan steps (i.e., the step-transition function δ is stochastic). These insights can be combined to yield the following corollary of Theorems 3 and 4.

Corollary 5 The plan-evaluation problem for compact probabilistic acyclic plans in propositional domains is PP-complete, and the plan-existence problem for compact probabilistic acyclic plans in propositional domains is NP^PP-complete.

We mention probabilistic plans here for two reasons. First, the behavior of some planning structures (such as partially ordered plans evaluated under the average interpretation, discussed in Section 4) can be thought of as generating probabilistic plans. Second, there are many instances in which simple probabilistic plans perform nearly as well as much larger and more complicated deterministic plans; this notion is often exploited in the field of randomized algorithms. Work by Platzman (1981) (described by Lovejoy, 1991) shows how the idea of randomized plans can come in handy for planning in partially observable domains.

3. Looping Plans

Looping plans can be applied to infinite-horizon control. The complexity of plan existence and plan evaluation in flat domains (Theorems 1 and 2) does not depend on the presence or absence of loops in the plan.

Theorem 6 The plan-evaluation problem for looping plans in flat domains is PL-complete.

Proof: Given a domain M and a looping plan P, we can construct a product Markov chain C as in the proof of Theorem 1.
As in the proof of Theorem 6 of Allender and Ogihara (1996), this chain can be constructed so that it has exactly one accepting and exactly one rejecting state; both of these states are absorbing. The probability that M reaches a goal state under P equals the probability that C reaches its accepting state when started in its initial state, which is the product of the initial states of M and P. In the proof of Theorem 6 of Allender and Ogihara (1996), it is shown that the construction of the Markov chain and the computation of whether it reaches its final state with probability greater than θ can be performed in PL. PL-hardness is implied by Theorem 1, since acyclic plans are a special case of looping plans.

Theorem 7 The plan-existence problem for looping plans in flat domains is NP-complete in general, but P-complete if the size of the desired plan is at least the size of the state or action space (i.e., z ≥ min(|S|, |A|)).

Proof sketch: NP-completeness follows from the proof of Theorem 2; containment and hardness still hold if plans are permitted to be looping. However, this is only true if we are forced to specify a plan whose size is small with respect to the size of the domain. If the looping plan is allowed to have at least as many nodes as there are states or actions in the domain, the problem can be solved in polynomial time. It is known that, for Markov decision processes such as these, the maximum probability of reaching a goal state equals the maximum probability of reaching a goal state under some infinite-horizon stationary policy, where a stationary policy is a mapping from states to actions that is used repeatedly to choose an action at each time step. It is also known that such an optimal stationary policy can be computed in polynomial time via linear programming (Condon, 1992).
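The linear-programming computation cited above is not reproduced here; as a simpler illustration of the quantity being optimized, the maximum goal-reachability probabilities satisfy a Bellman-style fixed point that value iteration approaches from below (a sketch with names of our own):

```python
def max_goal_probability(states, actions, t, goals, iters=1000):
    """Iteratively approximate, for each state, the maximum probability of
    eventually reaching a goal state under some stationary policy.
    t(s, a) returns a list of (next_state, probability) pairs."""
    v = {s: (1.0 if s in goals else 0.0) for s in states}
    for _ in range(iters):
        # Bellman backup: best action maximizes expected continuation value.
        v = {s: 1.0 if s in goals else
                max(sum(p * v[s2] for s2, p in t(s, a)) for a in actions)
             for s in states}
    return v
```

Value iteration is not guaranteed to reach the exact optimum in polynomially many iterations, which is one reason the proof relies on linear programming instead.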
Any stationary policy for a domain M = ⟨S, s0, A, G, t⟩ can be written as a looping plan, although, of course, not all looping plans correspond to stationary policies. We show that, for any fixed stationary policy p: S → A, there are two simple ways it can be represented as a looping plan P = ⟨V, v0, E, π, δ⟩. First, let V = A, v0 = p(s0), π(v) = v, and δ(v, v′) = {s ∈ S | p(s) = v′}. It follows that whenever M reaches state s, the action applied according to the looping plan is the same as the action p(s) prescribed by the policy. Second, let V = S, v0 = s0, π(v) = p(v), and δ(v, v′) = {v′}. It follows that whenever M reaches state s, the plan will be at the node corresponding to that state, and therefore the appropriate action for that state will be applied by the looping plan. Therefore, the maximum probability of reaching a goal state can be obtained by either of these looping plans. Since the best stationary policy can be computed in polynomial time, the best looping plan can be computed in polynomial time, too. P-hardness follows from a theorem of Papadimitriou and Tsitsiklis (1987).

In propositional domains, the complexity of plan existence and plan evaluation for looping plans is quite different from the acyclic case. Looping plan evaluation is very hard.

Theorem 8 The plan-evaluation problem for looping plans in both deterministic and stochastic propositional domains is PSPACE-complete.

Proof: Recall that the plan-evaluation problem for flat domains is in PL (Theorem 1). For a planning domain with c^n states and a representation of size n, a looping plan can be evaluated in probabilistic space O(log(c^n)) (Theorem 6), which is to say probabilistic space polynomial in the size of the input. This follows because the ST representation of the domain can be used to compute entries of the transition function t in polynomial space.
Since probabilistic PSPACE equals PSPACE, this shows that the plan-evaluation problem for looping plans in stochastic propositional domains is in PSPACE.

It remains to show PSPACE-hardness for deterministic propositional domains. Let N be a deterministic polynomial-space-bounded Turing machine. The moment-to-moment computation state (configuration) of N can be expressed as a polynomial-length bit string that encodes the contents of the Turing machine's tape, the location of the read/write head, the state of N's finite-state controller, and whether or not the machine is in an accepting state. For any input x, we describe how to construct in polynomial time a deterministic planning domain M(x) and a single-action looping plan that reaches a goal state of M(x) if and only if x is accepted by Turing machine N.

Given a description of N and x, one can, in time polynomial in the size of the descriptions of N and x, produce a description of a Turing machine T that computes the transition function of N. In other words, T on input c, a configuration of N, outputs the next configuration of N. (In fact, T can even check whether c is a valid configuration in the computation of N(x) by simulating that computation.) By an argument similar to that used in Cook's theorem, T can be modeled by a polynomial-size circuit. This circuit takes as input the bit string describing the current configuration of N and outputs the next configuration.

Next, we argue that the computation of this circuit can be expressed by an action compute in ST representation. There is one proposition in M(x) for each bit in the configuration, plus one for each gate of the circuit. The three standard gates, "and," "or," and "not," are all easily represented as decision trees. By ordering the decision trees in compute according to a topological sort of the gates of the circuit, a single compute action computes precisely the same output as the circuit.
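The gate-by-gate evaluation performed by the compute action can be mimicked directly (a sketch with our own encoding of gates; the actual construction represents each gate as a decision tree in ST form):

```python
def compute_step(config, gates):
    """One application of a compute-style action: `gates` is a list of
    (name, op, inputs) triples in topological order; ops are 'not', 'and',
    'or', and 'copy' (wiring an output bit back into the configuration).
    config maps proposition names to booleans; returns the next configuration."""
    new = dict(config)
    for name, op, inputs in gates:   # topological order: inputs already updated
        vals = [new[i] for i in inputs]
        if op == 'not':
            new[name] = not vals[0]
        elif op == 'and':
            new[name] = all(vals)
        elif op == 'or':
            new[name] = any(vals)
        else:  # 'copy'
            new[name] = vals[0]
    return new
```

Iterating compute_step from the initial configuration corresponds to repeatedly choosing the compute action; with 'copy' gates wiring circuit outputs back to configuration bits, each call advances the simulated machine by one step.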
Figure 5 illustrates this conversion for a simple circuit, which gives the form of the "not" (i1), "and" (i2), and "or" (i3) gates.

[Figure 5: A circuit and its representation as a sequential-effects tree]

We can now describe the complete reduction. The planning domain M(x) consists of the single action compute and the set of propositions described in the previous paragraph. The initial state is the initial configuration of the Turing machine N, and the goal set contains the proposition corresponding to whether or not the configuration is an accepting state of N. Because all transitions are deterministic and only one action can be chosen, the goal state is reached with probability 1 (greater than 1/2, for example) under the plan that repeatedly chooses compute until an accepting state is reached if and only if the polynomial-space machine N accepts input x.

A similar argument shows that looping plan existence is not actually any harder than looping plan evaluation.

Theorem 9 The plan-existence problem for looping plans in both deterministic and stochastic propositional domains is PSPACE-complete.

Proof: Hardness for PSPACE follows from the same construction as in the proof of Theorem 8: either the one-step looping plan is successful, or it is not; no other plan yields a better result. Recall that we are only interested in determining whether there is a plan of size z, where z is bounded by the size of the domain, that reaches the goal with a given probability. The problem is in PSPACE because the plan can be guessed in polynomial time and checked in PSPACE (Theorem 8). Because NP^PSPACE = PSPACE, the result follows.
As we mentioned earlier, the unrestricted infinite-horizon plan-existence problem is EXP-complete (Littman, 1997a); thus, determining unrestricted plan existence is EXP-hard only because some domains require plans that are larger than polynomial-size looping plans. Because Theorem 9 shows PSPACE-completeness for determining plan existence in deterministic domains, it is closely related to the PSPACE-completeness result of Bylander (1994). The main difference between the two results is that our theorem applies to more compact plans (polynomial instead of exponential) with more complex operator descriptions (conditional effects instead of preconditions with add and delete lists) that can include loops. Also, as the proofs above show, PSPACE-hardness is retained even in planning domains with only one action, so it is the looping that makes looping plans hard to work with.

4. Partially Ordered Plans

Partially ordered plans are a popular representation because they allow planning algorithms to defer a precise commitment to the ordering of plan steps until it becomes necessary in the planning process. A k-step partially ordered plan corresponds to a set of k-step totally ordered plans--all those that are consistent with the given partial order. The evaluation of a partially ordered plan can be defined to be the evaluation of the best, worst, or average member of the set of consistent totally ordered plans; these are the optimistic, pessimistic, and average interpretations, respectively.

The plan-evaluation problem for partially ordered plans is different from that for totally ordered plans, because a single partial order can encode all totally ordered plans. Hence, evaluating a partially ordered plan involves finding the best (under the optimistic interpretation) or the worst (under the pessimistic interpretation) member of this combinatorial set, or its average (under the average interpretation).
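For intuition, the three interpretations can be spelled out by brute force on tiny plans: enumerate the total orders consistent with the partial order and take the best, worst, and average of their values. The `value` function below is a stand-in for totally-ordered-plan evaluation; the whole sketch is exponential and purely illustrative.

```python
from itertools import permutations

def linear_extensions(steps, precedes):
    """Yield every total order of `steps` consistent with the partial
    order `precedes`, a set of (a, b) pairs meaning a must precede b."""
    for perm in permutations(steps):
        pos = {s: i for i, s in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in precedes):
            yield perm

def interpretations(steps, precedes, value):
    """Return the (optimistic, pessimistic, average) value of a
    partially ordered plan, given an evaluation of total orders."""
    vals = [value(p) for p in linear_extensions(steps, precedes)]
    return max(vals), min(vals), sum(vals) / len(vals)
```

The combinatorics are visible immediately: a plan with no ordering constraints on k steps has k! consistent total orders, which is why the three interpretations lead to the different complexity classes analyzed below.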
Theorem 10 The plan-evaluation problem for partially ordered plans in flat domains is NP-complete under the optimistic interpretation.

Proof sketch: Membership in NP follows from the fact that we can guess a totally ordered plan consistent with the given partial order and accept if and only if the domain reaches a goal state with probability more than θ. Recall that this evaluation can be performed in PL (Theorem 1), and therefore deterministically in polynomial time. The hardness proof is a variation of the construction used in Theorem 2. The partially ordered plan to evaluate has the form given in Figure 6; the consistent total orders are of the form

start → assign(1, b1) → assign(1, −b1) → assign(2, b2) → assign(2, −b2) → · · · → assign(n, bn) → assign(n, −bn) → end,

where each bi is either 1 or −1. Each of the possible plans can be interpreted as an assignment to n Boolean variables by ignoring every second assignment action. The construction in Theorem 2 shows how to turn a CNF formula φ into a planning domain M(φ), and it can easily be modified to ignore every second action. Thus, the best totally ordered plan consistent with the given partially ordered plan reaches the goal with probability 1 if and only if it reaches the goal with probability greater than 1 − 2^−m, if and only if it satisfies all clauses of φ, if and only if φ is satisfiable.

Theorem 11 The plan-evaluation problem for partially ordered plans in flat domains is co-NP-complete under the pessimistic interpretation.

Proof sketch: Both the proof of membership in co-NP and the proof of hardness are very similar to the proof of Theorem 10. We show a reduction from the co-NP-complete set of unsatisfiable formulae in CNF (the complement of Sat). The plan to evaluate has the form given in Figure 6 and is interpreted as above.
As in the proof of Theorem 2, we construct a planning domain M′(φ), but we take G = {s_rej} as the set of goal states, where the state s_rej is reached with probability greater than 0 if and only if the assignment fails to satisfy one of the clauses of formula φ. A formula is unsatisfiable if and only if under every assignment at least one of the clauses is not satisfied. Therefore, the probability that M′(φ) reaches a goal state under a given totally ordered plan is greater than 0 if and only if the plan corresponds to an unsatisfying assignment. Finally, the minimum of that probability over all consistent totally ordered plans is greater than 0 if and only if φ is unsatisfiable.

Figure 6: A partially ordered plan that can be hard to evaluate

Theorem 12 The plan-evaluation problem for partially ordered plans in flat domains is PP-complete under the average interpretation.

Proof: Under the average interpretation, we must decide whether the average evaluation over all consistent totally ordered plans is greater than a threshold θ. This can be decided in PP by guessing uniformly a totally ordered plan and checking its consistency with the given partially ordered plan in polynomial time. If the guessed totally ordered plan is consistent, it can be evaluated in polynomial time (Theorem 1) and accepted or rejected as appropriate. If the guessed plan is inconsistent, the computation accepts with probability θ and rejects with probability 1 − θ, leaving the average over the consistent orderings unchanged with respect to the threshold θ. PP-hardness is shown by a reduction from the PP-complete problem Majsat. Let φ be a formula in CNF.
We show how to construct a domain M(φ) and a partially ordered plan P(φ) such that φ ∈ Majsat if and only if the average performance of M(φ) under a totally ordered plan consistent with P(φ) is greater than 1/2. Let φ consist of the m clauses C1, ..., Cm over the n variables x1, ..., xn. Domain M(φ) = ⟨S, s0, A, t, G⟩ has actions

A = {assign(i, b) | i ∈ {1, ..., n}, b ∈ {0, 1}} ∪ {start, check, end}.

Action assign(i, b) will be interpreted as "assign value b to xi." The partially ordered plan P(φ) has plan steps

V = {σ(i, b, h) | i ∈ {1, ..., n}, b ∈ {0, 1}, h ∈ {1, ..., m}} ∪ {start, check, end}

and labeling π : V → A with π(σ) = σ for σ ∈ {start, check, end}, and π(σ(i, b, h)) = assign(i, b). The order E requires that a consistent plan has start as the first step and end as the last step; the steps in between are arbitrarily ordered. More formally,

E = {(start, q) | q ∈ V − {start, end}} ∪ {(q, end) | q ∈ V − {start, end}}.

Now, we define how the domain M(φ) acts on a given totally ordered plan P consistent with P(φ). Domain M(φ) consists of the cross product of the following polynomial-size deterministic domains Ms and Mφ, to which a final probabilistic transition will be added. Before we describe Ms and Mφ precisely, here are their intuitive definitions. The domain Ms is satisfied by plans that have the form of an assignment to the n Boolean variables, with the restriction that the assignment is repeated m times (for easy checking). The domain Mφ is satisfied by plans that correspond to satisfying assignments. The composite of these two domains is satisfied only by plans that correspond to satisfying assignments. We will now define these domains formally.
First, Ms checks whether the totally ordered plan matches the regular expression

start (assign(1, 0)^m | assign(1, 1)^m) · · · (assign(n, 0)^m | assign(n, 1)^m) check ((assign(1, 0) | assign(1, 1)) · · · (assign(n, 0) | assign(n, 1)))^m.

Note that the exponent m here is a fixed number, so the expression has polynomial size. Let "good" be the state reached by Ms if the plan matches that expression; otherwise, the state reached is "bad". To clarify, the actions before check are there simply to "use up" the extra steps not used in specifying the assignment in the partially ordered plan.

Next, Mφ checks whether the sequence of actions following the check action satisfies the clauses of φ in the following sense. Let a1 · · · ak be this sequence. Mφ interprets each subsequence a_{1+(j−1)n} · · · a_{n+(j−1)n} with a_{l+(j−1)n} = assign(l, b_l) as the assignment b1, ..., bn to the variables x1, ..., xn, and checks whether this assignment satisfies clause Cj. If all the clauses are satisfied in this way, then Mφ reaches the state "satisfied". Note that Ms and Mφ are defined so that they do not deal with the final end action.

M(φ) consists of the product domain of Ms and Mφ, with transitions for the action end as follows. If M is in state (bad, q) for any state q of Mφ, then action end sends M probabilistically to state "accept" or to state "reject", with probability 1/2 each; if M is in state (good, satisfied), then M under action end goes to state "accept" (with probability 1); otherwise, M under action end goes to state "reject" (with probability 1). The set of goal states of M consists of the single state "accept".

We analyze the behavior of M(φ) under any plan P consistent with P(φ). If Ms under P reaches state "bad", then M(φ) under P reaches a goal state with probability 1/2. Now, consider a plan P under which Ms reaches the state "good"--called a good plan. Then P matches the above regular expression.
Therefore, for every j ∈ {1, ..., n} there exists a bit i_j ∈ {0, 1} such that all steps σ(j, 1 − i_j, h) lie between start and check. Thus, the steps between check and end are

σ(1, i_1, 1) · · · σ(n, i_n, 1) σ(1, i_1, 2) · · · σ(n, i_n, m).

Consequently, the sequence of actions defined by the labeling of these plan steps is

(assign(1, i_1) assign(2, i_2) · · · assign(n, i_n))^m.

This means that Mφ checks whether all clauses of φ are satisfied by the assignment i_1 · · · i_n, i.e., Mφ checks whether i_1 · · · i_n satisfies φ. Therefore, M(φ) accepts under plan P with probability 1 if the plan represents a satisfying assignment, and with probability 0 otherwise. Note that each assignment corresponds to exactly one good plan. Therefore, the average acceptance probability of M(φ) over all good plans equals the fraction of satisfying assignments of φ. Since M(φ) accepts under "bad" plans with probability 1/2, it follows that the average, over all plans consistent with P(φ), of the acceptance probabilities of M(φ) is greater than 1/2 if and only if φ ∈ Majsat.

The complexity of the plan-existence problem for partially ordered plans is identical to that for totally ordered plans.

Theorem 13 The plan-existence problem for partially ordered plans in flat domains is NP-complete under the pessimistic, optimistic, and average interpretations. The plan-existence problem for partially ordered plans in propositional domains is NP^PP-complete under the pessimistic, optimistic, and average interpretations.

Proof: First, note that a totally ordered plan is a special type of partially ordered plan and its evaluation is unchanged under the pessimistic, optimistic, or average interpretation. In particular, because there is only one ordering consistent with a given totally ordered plan, the best, worst, and average orderings are all the same.
Therefore, if there exists a totally ordered plan with value greater than θ, then there is a partially ordered plan with value greater than θ (the same plan), under all three interpretations. Conversely, if there is a partially ordered plan with value greater than θ under any of the three interpretations, then there is a totally ordered plan with value greater than θ. This is because the value of the best, worst, and average ordering of a partially ordered plan is always a lower bound on the value of the best consistent totally ordered plan. Given this strong equivalence, the complexity of plan existence for partially ordered plans is a direct corollary of Theorems 2 and 4.

The pattern for partially ordered plan evaluation in flat domains is that the average interpretation is no easier to decide than either the optimistic or pessimistic interpretation. In propositional domains, the pattern is the opposite: the average interpretation is no harder to decide than either the optimistic or pessimistic interpretation.

Theorem 14 The plan-evaluation problem for partially ordered plans in propositional domains is NP^PP-complete under the optimistic interpretation, co-NP^PP-complete under the pessimistic interpretation, and PP-complete under the average interpretation.

Proof sketch: For the optimistic interpretation, membership in NP^PP follows from the fact that we can guess a single sufficiently good consistent total order and evaluate it in PP (Theorem 3). Hardness for NP^PP can be shown using a straightforward reduction from E-Majsat (as in the proof of Theorem 4). For the pessimistic interpretation, membership in co-NP^PP follows from the fact that we can guess the worst consistent total order and evaluate it in PP (Theorem 3). Hardness for co-NP^PP can be shown by reducing to it the complement of E-Majsat, which is co-NP^PP-complete; the proof is a simple adaptation of the techniques used, for example, in Theorem 4 above.
For the average interpretation, the problem can be shown to be in PP by combining the argument in the proof of Theorem 12, showing how to average over consistent totally ordered plans, with the argument in the proof of Theorem 3, showing how to evaluate a plan in a propositional domain in PP. Alternatively, we could express the evaluation of a partially ordered plan under the average interpretation as a compact probabilistic acyclic plan; Corollary 5 states that such plans can be evaluated in PP. PP-hardness follows directly from Theorem 3, because totally ordered plans are a special case of partially ordered plans and evaluating totally ordered plans is PP-hard.

5. Applications

To help illustrate the utility of our results, this section cites several planners from the literature and analyzes the computational complexity of the problems they attack. We do not give detailed explanations of the planners themselves; for this, we refer the reader to the original papers. We focus on three planning systems: witness (Brown University), buridan (University of Washington), and treeplan (University of British Columbia). In the process of making connections to these planners, we also describe how our work relates to the discounted-reward criterion, partial observability, other domain representations, partial-order conditional planning, policy-based planning, and approximate planning.

5.1 Witness

The witness algorithm (Cassandra, Kaelbling, & Littman, 1994; Kaelbling et al., 1998) solves flat partially observable Markov decision processes using a dynamic-programming approach. The basic algorithm finds optimal unrestricted solutions to finite-horizon problems. Papadimitriou and Tsitsiklis (1987) showed that the plan-existence problem for polynomial-horizon partially observable Markov decision processes is PSPACE-complete. As an extension to their finite-horizon algorithm, Kaelbling et al. (1998) sketch a method for finding optimal looping plans for some domains.
Although this is not presented as a formal algorithm, it is not unreasonable to say that the pure form of the problem that this extended version of witness attacks is one of finding a valid polynomial-size looping plan for a partially observable domain. The similarities between this problem and that described in Section 3 are that the domains are flat and that the plans are identical in form. The apparent differences are that witness optimizes a reward function instead of probability of goal satisfaction and that witness works in partially observable domains, whereas our results are defined in terms of completely observable domains. Both of these apparent differences are insignificant, however, from a computational-complexity point of view.

First, witness attempts to maximize the expected total discounted reward over an infinite horizon (sometimes called optimizing a time-separable value function). As argued by Condon (1992), any problem defined in terms of a sum of discounted rewards can be recast as one of goal satisfaction. The argument proceeds roughly as follows. Let 0 < γ < 1 be the discount factor and R(s, a) be the immediate reward received for taking action a in state s. Define

R′(s, a) = (R(s, a) − min_{s′,a′} R(s′, a′)) / (max_{s′,a′} R(s′, a′) − min_{s′,a′} R(s′, a′)).

From this, we have that 0 ≤ R′(s, a) ≤ 1 for all s and a, and that the value of any plan with respect to the revised reward function is a simple linear transformation of its true value. Now, we introduce an auxiliary state g to be the goal state and create a new transition function t′ such that t′(s, a, g) = (1 − γ)R′(s, a) and t′(s, a, s′) = (1 − (1 − γ)R′(s, a)) t(s, a, s′) for s′ ≠ g; t′ is a well-defined transition function, and the probability of goal satisfaction for any plan under transition function t′ is precisely the same as the expected total discounted reward under reward function R′ and transition function t.
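Numerically, this style of reduction is easy to check on a toy example. The sketch below uses a common variant of the construction (not necessarily the exact transition function in the text): at each step the process stops with probability 1 − γ, moving to the goal with probability R′(s, a) and to an absorbing reject state otherwise, and continues via t with probability γ. Under this variant, the goal-reaching probability of a fixed policy equals (1 − γ) times its expected total discounted reward. All numbers below are invented for illustration.

```python
GAMMA = 0.9
# A two-state chain under a fixed policy: T[s][s2] is the original
# transition probability; R[s] is the reward, already scaled into [0, 1].
T = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}
R = {0: 1.0, 1: 0.25}

def discounted_value(iters=2000):
    """Value iteration for V(s) = R(s) + gamma * sum_s2 T(s,s2) V(s2)."""
    V = {s: 0.0 for s in T}
    for _ in range(iters):
        V = {s: R[s] + GAMMA * sum(p * V[s2] for s2, p in T[s].items())
             for s in T}
    return V

def goal_probability(iters=2000):
    """Goal-reaching probability in the transformed domain:
    P(s) = (1-gamma)*R(s) + gamma * sum_s2 T(s,s2) P(s2)."""
    P = {s: 0.0 for s in T}
    for _ in range(iters):
        P = {s: (1 - GAMMA) * R[s]
                + GAMMA * sum(p * P[s2] for s2, p in T[s].items())
             for s in T}
    return P
```

Running both fixed-point iterations and comparing P(s) against (1 − γ)V(s) makes the equivalence between the two planning objectives concrete for this variant.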
Thus, any problem stated as one of optimizing the expected total of discounted immediate rewards can be turned into an equivalent problem of optimizing goal satisfaction with only a slight change to the transition function and one additional state. This means there is no fundamental computational-complexity difference between these two different types of planning objectives.

The second apparent difference between the problem solved by the extended witness algorithm and that described in Section 3 is that of partial versus complete observability. In fact, our results do address partial observability, albeit indirectly. In our formulation of the plan-existence problem, plans are constrained to make no conditional branches (in the totally ordered and partially ordered cases), or to branch only on distinctions made by the step-transition function δ (in the acyclic and looping cases); these two choices correspond to unobservable and partially observable domains, respectively. In a partially observable domain, the plan-existence problem becomes one of finding a valid polynomial-size finite-state controller subject to the given observability constraints. Nothing in our complexity proofs depends on the presence or absence of additional observability constraints. Therefore, it is a direct corollary of Theorem 2 that the plan-existence problem for polynomial-horizon plans in unobservable domains is NP-complete (Papadimitriou & Tsitsiklis, 1987) and of Theorem 7 that the plan-existence problem for polynomial-size looping plans in partially observable domains is NP-complete (this is a new result).

It is interesting to note that the computational complexity of searching for size-bounded plans in partially observable domains is generally substantially less than that of solving the corresponding unconstrained partially observable Markov decision process. For example, we found that the plan-existence problem for acyclic plans in propositional domains is NP^PP-complete (Theorem 4).
The corresponding unconstrained problem is that of determining the existence of a history-dependent policy for a polynomial-horizon, compactly represented partially observable Markov decision process, which is EXPSPACE-complete (Theorem 4.15 of Goldsmith et al., 1996, or Theorem 6.8 of Mundhenk et al., 1997b). The gap here is enormous: EXPSPACE is to EXP what PSPACE is to P, and EXP is already provably intractable in the worst case. In contrast to EXPSPACE-complete problems, it is conceivable that good heuristics for NP^PP-complete problems can be created by extensions of recent advances in heuristics for NP-complete problems. Therefore, there is some hope of devising effective planning algorithms by building on the observations in this paper and searching for optimal size-bounded plans instead of optimal unrestricted plans; in fact, recent planners for both propositional domains (Majercik & Littman, 1998a, 1998b) and flat domains (Hansen, 1998) are motivated by these results.

Domain Type     Horizon Type   Size-Bounded Plan    Unrestricted Plan
flat            polynomial     NP-complete          PSPACE-complete
propositional   polynomial     NP^PP-complete       EXPSPACE-complete
flat            infinite       NP-complete          undecidable
propositional   infinite       PSPACE-complete      undecidable

Table 3: Complexity results for plan existence in partially observable domains

Table 3 summarizes complexity results for planning in partially observable domains. The results for size-bounded plans are corollaries of Theorems 2, 4, 7, and 9 of this paper. The results for unrestricted plans are due to Papadimitriou and Tsitsiklis (1987) (flat, polynomial), Goldsmith et al. (1996) (propositional, polynomial), and Hanks (1996) (infinite-horizon). This last result is derived by noting the isomorphism of the infinite-horizon problem to the emptiness problem for probabilistic finite-state automata, which is undecidable (Rabin, 1963).
5.2 Buridan

The buridan planner (Kushmerick et al., 1995) finds partially ordered plans for propositional domains in the PSO representation. There are two identifiable differences between the problem solved by buridan and the problem analyzed in Section 4: the representation of planning problems, and the fact that buridan is not restricted to finding polynomial-size plans. We address each of these differences below.

Although, on the surface, PSO is different from ST, either can be converted into the other in polynomial time with at most a polynomial increase in domain size. In particular, the effect of an action in PSO is represented by a single decision tree consisting of proposition nodes (as in ST) and random nodes (easily simulated in ST using auxiliary propositions). At the leaves are a list of propositions that become true and another list of propositions that become false should that leaf be reached. This type of correlated effect is also easily represented in ST using the chain rule of probability theory to decompose the probability distribution into separate probabilities for each proposition, along with careful use of the ":new" suffix. Thus, any PSO domain can be converted quickly to an ST domain of similar size. Similarly, a domain in ST can be converted to PSO with at most a polynomial expansion. This conversion is too complex to sketch here, but it follows from the proof of equivalence between ST and a simplified representation called IF (Littman, 1997a). Given the polynomial equivalence between ST and PSO, any complexity results for ST carry over to PSO.5

The results described in this paper concern planning problems in which a bound is given on the size of the plan sought. Although Kushmerick et al. (1995) do not explicitly describe their planner as one that prefers small plans to large plans, the design of the planner as one that searches through the space of plans makes the notion of plan size central to the algorithm.
Indeed, the public-domain buridan implementation uses plan size as part of a best-first search procedure for identifying a sufficiently successful plan. This means that, all other things being equal, shorter plans will be found before larger plans. Furthermore, to ensure termination, the planner considers only a fixed number of plans before halting, thus putting an indirect limit on the maximum allowable plan size. So, although buridan does not attempt to solve precisely the same problem that we considered, it is fair to say that the problem we consider is an idealization of the problem attacked by buridan. Regardless, our lower bounds on complexity apply to buridan. Kushmerick et al. (1995) looked at generating sufficiently successful plans under both the optimistic interpretation and the pessimistic interpretation. They also explicitly examined the plan-evaluation problem for partially ordered plans under both interpretations. Therefore, Theorems 13 and 14 apply to buridan.

The more sophisticated c-buridan planner (Draper et al., 1994) extends buridan to plan in partially observable domains and to produce plans with conditional execution. The results of our work also shed light on the computational complexity of the problem addressed by c-buridan. Draper et al. (1994) devised a representation for partially ordered acyclic (conditional) plans. In this representation, each plan step generates an observation label as a function of the probabilistic outcome of the step. Each step also has an associated set of context labels dictating the circumstances under which that step must be executed. A plan step is executed only if its context labels are consistent with the observation labels produced in earlier steps.

5. To be more precise, this is true for complexity classes closed under log-space reductions.
In its totally ordered form, this type of plan can be expressed as a compact acyclic plan; Corollary 5 can be used to show that the plan-evaluation and plan-existence problems for a totally ordered version of c-buridan's conditional plan representation in propositional domains are PP-complete and NP^PP-complete, respectively. In our results above, we consider evaluating and searching for plans that are partially ordered and plans that have conditional execution, but not both at once. Nonetheless, the same sorts of techniques presented in this paper can be applied to analyzing the problems attacked by c-buridan. For example, consider the plan-existence problem for c-buridan's partially ordered conditional plans under the optimistic interpretation. This problem asks whether there is a partially ordered conditional plan that has some total order that reaches the goal with sufficient probability. This is equivalent to asking whether there is a totally ordered conditional plan that reaches the goal with sufficient probability. Therefore, the problem is NP^PP-complete, by the argument in the previous paragraph. In spite of many superficial differences between the problems analyzed in this paper and those studied by the creators of the buridan planners, our results are quite relevant to understanding their work.

5.3 Treeplan

A family of planners has been designed that generates a decision-tree-based representation of stationary policies (mappings from states to actions) in probabilistic propositional domains (Boutilier et al., 1995; Boutilier & Poole, 1996; Boutilier & Dearden, 1996); we refer to these planners collectively as the treeplan planners. Once again, these planners solve problems that are not identical to the problems addressed in this paper but are closely related. The planner described by Boutilier et al.
(1995) finds solutions that maximize expected total discounted reward in compactly represented Markov decision processes (the domain representation used is expressively equivalent to ST). As mentioned earlier, the difference between maximizing goal satisfaction and maximizing expected total discounted reward is a superficial one, so the problem addressed by this planner is EXP-complete (Littman, 1997a). Although the policies used by Boutilier et al. (1995) appear quite dissimilar to the finite-state controllers described in our work, policies can be converted to a type of similarly sized compact looping plan (an extension of the type of plan described in Corollary 5). The conversion from stationary policies to looping plans is as described in the proof of Theorem 7, except that the resulting plans are represented compactly.

In later work, Boutilier and Dearden (1996) show how it is possible to limit the size of the representation of the policy in treeplan and still obtain approximately optimal performance. This is necessary because, in general, the decision trees needed to represent the optimal policies can be exponentially large. By keeping the decision trees from getting too large, the resulting planner becomes subject to an extension of Theorem 9 and, therefore, attacks a PSPACE-complete problem.

One emphasis of Boutilier and Dearden (1996) is on finding approximately optimal solutions, in the hope that doing so is easier than finding optimal solutions. We do not explore the worst-case complexity of approximation in this paper, although Lusena, Goldsmith, and Mundhenk (1998) have produced some strong negative results in this area. A related issue is that of using simulation (random sampling) to find approximately optimal solutions to probabilistic planning problems.
Some empirical successes have been obtained with the related approach of reinforcement learning (Tesauro, 1994; Crites & Barto, 1996), but, once again, the worst-case complexity of probabilistic planning is not known to be any lower for approximation by simulation.

6. Conclusions

In this paper, we explored the computational complexity of plan evaluation and plan existence in probabilistic domains. We found that, in compactly represented propositional domains, restricting the size and form of the policies under consideration reduces the computational complexity of plan existence from EXP-complete for unrestricted plans to PSPACE-complete for polynomial-size looping plans and NP^PP-complete for polynomial-size acyclic plans. In contrast, in flat domains, restricting the form of the policies under consideration increases the computational complexity of plan existence from P-complete for unrestricted plans to NP-complete for totally ordered plans; this is because a plan that is smaller than the domain in which it operates is often unable to exploit important Markov properties of the domain. We were able to characterize precisely the complexity of all the problems we examined, with regard to the current state of knowledge in complexity theory.

Several problems we studied turned out to be NP^PP-complete. The class NP^PP promises to be very useful to researchers in uncertainty in artificial intelligence because it captures the type of problem that results from choosing ("guessing") a solution and then evaluating its probabilistic behavior. This is precisely the type of problem faced by planning algorithms in probabilistic domains, and it captures important problems in other areas as well, such as constructing explanations in belief networks and designing robust communication networks. We provide a new, conceptually simple NP^PP-complete problem, E-Majsat, that may be useful in further explorations in this direction.
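For concreteness, E-Majsat (defined formally in Appendix A) asks, given (φ, k), whether some assignment to the first k variables makes a majority of the assignments to the remaining variables satisfy φ. A brute-force sketch for tiny instances follows; it is exponential in n, φ is simply a Python predicate, and the example formula is invented.

```python
from itertools import product

def e_majsat(phi, n, k):
    """Does some assignment to x1..xk make a strict majority of the
    2^(n-k) assignments to x(k+1)..xn satisfy phi?"""
    for head in product([False, True], repeat=k):
        satisfying = sum(phi(head + tail)
                         for tail in product([False, True], repeat=n - k))
        if 2 * satisfying > 2 ** (n - k):
            return True
    return False

# Example: phi = x1 AND (x2 OR x3).  With k = 1, setting x1 = True
# makes 3 of the 4 tails satisfy phi, so E-Majsat holds; with k = 0
# the problem is plain Majsat, and only 3 of 8 assignments satisfy phi.
phi = lambda a: a[0] and (a[1] or a[2])
```

The two boundary cases mentioned in the appendix fall out directly: k = n reduces to Boolean satisfiability, and k = 0 reduces to Majsat.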
The basic structure of our results is that if plan evaluation is complete for some class C, then plan existence is typically NP^C-complete. This same basic structure holds in deterministic domains: evaluating a totally ordered plan in a propositional domain is P-complete (for sufficiently powerful domain representations), and determining the existence of a polynomial-size totally ordered plan is NP^P = NP-complete.

From a pragmatic standpoint, the intuition that searching for small plans is more efficient than searching for plans of arbitrary size suggests that exact dynamic-programming algorithms, which are so successful in flat domains, may not be as effective in propositional domains; they do not focus their efforts on the set of small plans. Algorithm-development energy, therefore, might fruitfully be spent devising heuristics for problems in the class NP^PP, as this class captures the essence of searching for small plans in probabilistic domains--some early results in this direction are appearing (Majercik & Littman, 1998a, 1998b). Complexity theorists have only recently begun to explore classes such as NP^PP that lie between the polynomial hierarchy and PSPACE, and algorithm designers have come to these classes even more recently. As this paper marks the beginning of our exploration of this class of problems, much work remains to be done in probing the algorithmic implications, but it is our hope that heuristics for NP^PP could lead to powerful methods for solving a range of important uncertainty-sensitive combinatorial problems.

Acknowledgements

This work was supported in part by grants NSF-IRI-97-02576-CAREER (Littman) and NSF CCR-9315354 (Goldsmith). We gratefully acknowledge Andrew Klapper, Anne Condon, Matthew Levy, Steve Majercik, Chris Lusena, Mark Peot, and our reviewers for helpful feedback and conversations on this topic.

Appendix A.
Complexity of E-Majsat

The E-Majsat problem is: given a pair (φ, k) consisting of a Boolean formula φ over n variables x_1, ..., x_n and a number 1 ≤ k ≤ n, is there an assignment to the first k variables x_1, ..., x_k such that the majority of assignments to the remaining n − k variables x_{k+1}, ..., x_n satisfies φ? For k = n, this is precisely Boolean satisfiability, a classic NP-complete problem, because we are asking whether there exists an assignment to all the variables that makes φ true. For k = 0, E-Majsat is precisely Majsat, a well-known PP-complete problem, because we are asking whether the majority of all total assignments makes φ true.

Deciding an instance of E-Majsat for intermediate values of k has a different character. It involves both an NP-type calculation, to pick a good setting for the first k variables, and a PP-type calculation, to check whether the majority of assignments to the remaining variables makes φ true. This is akin to searching for a good answer (plan, schedule, coloring, belief-network explanation, etc.) in a combinatorial space when "good" is determined by a computation over probabilistic quantities. This is just the type of computation described by the class NP^PP, and we show next that E-Majsat is NP^PP-complete.

Theorem 15: E-Majsat is NP^PP-complete.

Proof: Membership in NP^PP follows directly from the definitions. To show completeness of E-Majsat, we first observe (Torán, 1991) that NP^PP is the ≤^NP_m-closure of the PP-complete set Majsat. Thus, any NP^PP computation can be modeled by a nondeterministic machine N that, on each possible computation, first guesses a sequence s of bits that controls its nondeterministic moves, deterministically performs some computation on input x and s, and then writes down a formula q_{x,s} with variables among z_1, ..., z_l as a query to Majsat. Finally, N(x) with oracle Majsat accepts if and only if for some s, q_{x,s} ∈ Majsat.
Given any input x, as in Cook's theorem, we can construct a formula φ_x with variables y_1, ..., y_k and z_1, ..., z_l such that for every assignment a_1, ..., a_k, b_1, ..., b_l it holds that φ_x(a_1, ..., a_k, b_1, ..., b_l) = q_{x,a_1···a_k}(b_1, ..., b_l). Thus, (φ_x, k) ∈ E-Majsat if and only if for some assignment s to y_1, ..., y_k, q_{x,s} ∈ Majsat, if and only if N(x) accepts.

References

Allender, E., & Ogihara, M. (1996). Relationships among PL, #L, and the determinant. Theoretical Informatics and Applications, 30 (1), 1-21.

Álvarez, C., & Jenner, B. (1993). A very hard log-space counting class. Theoretical Computer Science, 107, 3-30.

Bäckström, C. (1995). Expressive equivalence of planning formalisms. Artificial Intelligence, 76 (1-2), 17-34.

Bäckström, C., & Nebel, B. (1995). Complexity results for SAS+ planning. Computational Intelligence, 11 (4), 625-655.

Balcázar, J., Díaz, J., & Gabarró, J. (1988/1990). Structural Complexity I/II. EATCS Monographs on Theoretical Computer Science. Springer-Verlag.

Borodin, A., Cook, S., & Pippenger, N. (1983). Parallel computation for well-endowed rings and space-bounded probabilistic machines. Information and Control, 58 (1-3), 113-136.

Boutilier, C., Dean, T., & Hanks, S. (1998). Decision theoretic planning: Structural assumptions and computational leverage. In preparation.

Boutilier, C., & Dearden, R. (1996). Approximating value trees in structured dynamic programming. In Saitta, L. (Ed.), Proceedings of the Thirteenth International Conference on Machine Learning.

Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pp. 1104-1113.

Boutilier, C., & Poole, D. (1996). Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp.
1168-1175. AAAI Press/The MIT Press.

Bylander, T. (1994). The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69, 161-204.

Cassandra, A. R., Kaelbling, L. P., & Littman, M. L. (1994). Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pp. 1023-1028, Seattle, WA.

Chapman, D. (1987). Planning for conjunctive goals. Artificial Intelligence, 32, 333-379.

Condon, A. (1992). The complexity of stochastic games. Information and Computation, 96 (2), 203-224.

Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Touretzky, D. S., Mozer, M. C., & Hasselmo, M. E. (Eds.), Advances in Neural Information Processing Systems 8, Cambridge, MA. The MIT Press.

Dearden, R., & Boutilier, C. (1997). Abstraction and approximate decision-theoretic planning. Artificial Intelligence, 89 (1-2), 219-283.

Draper, D., Hanks, S., & Weld, D. (1994). Probabilistic planning with information gathering and contingent execution. In Proceedings of the AAAI Spring Symposium on Decision Theoretic Planning, pp. 76-82.

Drummond, M., & Bresina, J. (1990). Anytime synthetic projection: Maximizing the probability of goal satisfaction. In Proceedings of the Eighth National Conference on Artificial Intelligence, pp. 138-144. Morgan Kaufmann.

Erol, K., Nau, D. S., & Subrahmanian, V. S. (1995). Complexity, decidability and undecidability results for domain-independent planning. Artificial Intelligence, 76, 75-88.

Gill, J. (1977). Computational complexity of probabilistic Turing machines. SIAM Journal on Computing, 6 (4), 675-695.

Goldman, R. P., & Boddy, M. S. (1994). Epsilon-safe planning. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence (UAI-94), pp. 253-261, Seattle, WA.

Goldsmith, J., Littman, M., & Mundhenk, M. (1997a).
The complexity of plan existence and evaluation in probabilistic domains. Tech. rep. CS-1997-07, Department of Computer Science, Duke University.

Goldsmith, J., Littman, M. L., & Mundhenk, M. (1997b). The complexity of plan existence and evaluation in probabilistic domains. In Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), pp. 182-189, San Francisco, CA. Morgan Kaufmann Publishers.

Goldsmith, J., Lusena, C., & Mundhenk, M. (1996). The complexity of deterministically observable finite-horizon Markov decision processes. Tech. rep. 268-96, Department of Computer Science, University of Kentucky.

Hanks, S. (1996). Decision-theoretic planning in unobservable domains is undecidable. Personal communication.

Hansen, E. A. (1998). Finite-Memory Control of Partially Observable Systems. Ph.D. thesis, University of Massachusetts.

Jung, H. (1985). On probabilistic time and space. In Proceedings 12th ICALP, pp. 281-291. Lecture Notes in Computer Science, Springer-Verlag.

Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101 (1-2), 99-134.

Koenig, S., & Simmons, R. G. (1994). Risk-sensitive planning with probabilistic decision graphs. In Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning, pp. 363-373.

Kushmerick, N., Hanks, S., & Weld, D. S. (1995). An algorithm for probabilistic planning. Artificial Intelligence, 76 (1-2), 239-286.

Ladner, R. (1989). Polynomial space counting problems. SIAM Journal on Computing, 18, 1087-1097.

Lin, S.-H., & Dean, T. (1995). Generating optimal policies for high-level plans with conditional branches and loops. In Proceedings of the Third European Workshop on Planning, pp. 205-218.

Littman, M. L. (1997a). Probabilistic propositional planning: Representations and complexity.
In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pp. 748-754. AAAI Press/The MIT Press.

Littman, M. L. (1997b). Solving large POMDPs: Lessons from complexity theory. Talk presented at the DARPA AI Workshop in Providence, RI. Slides available at URL http://www.cs.duke.edu/~mlittman/talks/darpa97-pomdp.ps.

Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observable Markov decision processes. Annals of Operations Research, 28 (1), 47-65.

Lusena, C., Goldsmith, J., & Mundhenk, M. (1998). Nonapproximability results for Markov decision processes. Tech. rep. UK CS Dept TR 275-98, University of Kentucky.

Majercik, S. M., & Littman, M. L. (1998a). MAXPLAN: A new approach to probabilistic planning. In Simmons, R., Veloso, M., & Smith, S. (Eds.), Proceedings of the Fourth International Conference on Artificial Intelligence Planning, pp. 86-93. AAAI Press.

Majercik, S. M., & Littman, M. L. (1998b). Using caching to solve larger probabilistic planning problems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 954-959. AAAI Press/The MIT Press.

Mansell, T. M. (1993). A method for planning given uncertain and incomplete information. In Proceedings of the 9th Conference on Uncertainty in Artificial Intelligence, pp. 350-358. Morgan Kaufmann Publishers.

McAllester, D., & Rosenblitt, D. (1991). Systematic nonlinear planning. In Proceedings of the 9th National Conference on Artificial Intelligence, pp. 634-639.

Mundhenk, M., Goldsmith, J., & Allender, E. (1997a). The complexity of policy-evaluation for finite-horizon partially-observable Markov decision processes. In Proceedings of the 22nd Symposium on Mathematical Foundations of Computer Science (published in Lecture Notes in Computer Science). Springer-Verlag.

Mundhenk, M., Goldsmith, J., Lusena, C., & Allender, E. (1997b). Encyclopaedia of complexity results for finite-horizon Markov decision process problems.
Tech. rep. UK CS Dept TR 273-97, University of Kentucky.

Papadimitriou, C. H. (1994). Computational Complexity. Addison-Wesley, Reading, MA.

Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12 (3), 441-450.

Platzman, L. K. (1981). A feasible computational approach to infinite-horizon partially-observed Markov decision problems. Tech. rep. J-81-2, Georgia Institute of Technology, Atlanta, GA.

Puterman, M. L. (1994). Markov Decision Processes--Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY.

Rabin, M. O. (1963). Probabilistic automata. Information and Control, 6 (3), 230-245.

Roth, D. (1996). On the hardness of approximate reasoning. Artificial Intelligence, 82 (1-2), 273-302.

Simon, J. (1975). On some central problems in computational complexity. Ph.D. thesis, Cornell University. Also Cornell Department of Computer Science Technical Report TR75-224.

Smith, D. E., & Williamson, M. (1995). Representation and evaluation of plans with loops. Working notes for the 1995 Stanford Spring Symposium on Extended Theories of Action.

Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6 (2), 215-219.

Toda, S. (1991). PP is as hard as the polynomial-time hierarchy. SIAM Journal on Computing, 20, 865-877.

Torán, J. (1991). Complexity classes defined by counting quantifiers. Journal of the ACM, 38 (3), 753-774.

Vinay, V. (1991). Counting auxiliary pushdown automata and semi-unbounded arithmetic circuits. In Proceedings of the 6th Structure in Complexity Theory Conference, pp. 270-284. IEEE.