Journal of Artificial Intelligence Research 5 (1996) 95-137. Submitted 3/96; published 9/96.

Accelerating Partial-Order Planners: Some Techniques for

Effective Search Control and Pruning

Alfonso Gerevini gerevini@ing.unibs.it
Dipartimento di Elettronica per l'Automazione, Università di Brescia, Via Branze 38, I-25123 Brescia, Italy

Lenhart Schubert schubert@cs.rochester.edu
Department of Computer Science, University of Rochester, Rochester, NY 14627-0226, USA

Abstract

We propose some domain-independent techniques for bringing well-founded partial-order planners closer to practicality. The first two techniques are
aimed at improving search control while keeping overhead costs low. One is
based on a simple adjustment to the default A* heuristic used by ucpop to
select plans for refinement. The other is based on preferring "zero commitment"
(forced) plan refinements whenever possible, and using LIFO prioritization
otherwise. A more radical technique is the use of operator parameter domains
to prune search. These domains are initially computed from the definitions
of the operators and the initial and goal conditions, using a polynomial-time
algorithm that propagates sets of constants through the operator graph, starting
in the initial conditions. During planning, parameter domains can be used
to prune nonviable operator instances and to remove spurious clobbering threats.
In experiments based on modifications of ucpop, our improved plan and goal
selection strategies gave speedups by factors ranging from 5 to more than
1000 for a variety of problems that are nontrivial for the unmodified version.
Crucially, the hardest problems gave the greatest improvements. The pruning
technique based on parameter domains often gave speedups by an order of magnitude
or more for difficult problems, both with the default ucpop search strategy
and with our improved strategy. The Lisp code for our techniques and for
the test problems is provided in on-line appendices.

1. Introduction

We are concerned here with improving the performance of "well-founded" domain-independent planners - planners that permit proofs of soundness, completeness,
or other desirable theoretical properties. A state-of-the-art example of
such a planner is ucpop (Barrett et al., 1994; Penberthy & Weld, 1992), whose
intellectual ancestry includes strips (Fikes & Nilsson, 1971), tweak (Chapman,
1987), and snlp (McAllester & Rosenblitt, 1991). Such planners unfortunately
do not perform well at present, in comparison with more practically oriented
planners such as sipe (Wilkins, 1988), prs (Georgeff & Lansky, 1987), or
O-Plan (Currie & Tate, 1991).

However, there appear to be ample opportunities for bringing well-founded
planners closer to practicality. In the following, we begin by suggesting
some improvements to search control in planning, based on more carefully
formulated strategies for selecting partial plans for refinement, and for
choosing open conditions in a selected partial plan. Our plan selection strategy uses S+OC - the number of steps in
a plan plus the number of open conditions still to be established - as a
heuristic measure for ucpop's A* search of the plan space. (Addition of an
attenuated term reflecting the number of threats or "unsafe conditions" UC,
such as 0.1UC, is sometimes advantageous.)1 Our flaw-selection strategy,
which we term ZLIFO, prefers "zero commitment" plan refinements to others,
and otherwise uses a LIFO (stack) discipline. Zero commitment refinements
are logically necessary ones: they either eliminate a plan altogether because
it contains an irremediable flaw, or they add a unique step or unique causal
link (from the initial state) to establish an open condition that cannot
be established in any other way. The strategy is closely related to ones
proposed by Peot & Smith (1993) and Joslin & Pollack (1994) but generally
appears to perform better than either.

We describe these two classes of techniques in Section 2 below, and in Section
3 we report our experimental results based on slightly modified versions
of ucpop.2 For the more difficult problems taken from the available ucpop
test suite and elsewhere, we obtain improvements by factors ranging from
5 to more than 1000, with the hardest problems giving the greatest improvements.

We then turn to our proposal for using computed operator parameter domains
during planning. In particular, in Section 4 we motivate and describe a method
of precomputing parameter domains based on propagating sets of constants
forward from the initial conditions.3 The process is iterative, but the algorithm
runs within a time bound that is polynomial in the size of the problem specification.
We provide details of the algorithm, along with theorems about its correctness
and tractability, in Sections 4.2-4.3 and Online Appendix 1.

In Section 5 we show how to use parameter domain information in a ucpop-style
planner. During planning, parameter domains can be used to prune operator
instances whose parameter domains are inconsistent with binding constraints,
and to eliminate spurious threats that cannot, in fact, be realized without
violating domain constraints. We illustrate the effectiveness of this technique
with examples drawn from the ucpop test suite as well as from the trains
transportation planning world developed at Rochester (Allen & Schubert, 1991;
Allen et al., 1995). In some of these tests, we apply the parameter domain
information in the context of the default ucpop search strategy. We demonstrate
significant gains on most problems, particularly the more challenging ones
(e.g., speedups of more than an order of magnitude for several problems in
the strips world, and a more than 900-fold speedup for a trains problem).

In another set of tests in the trains world, we use our own improved search
strategies as baseline, i.e., we ask whether additional speedups are obtainable
by use of parameter

1. The search strategy is described as "A* or IDA*" search in (Penberthy & Weld, 1992); in the code for ucpop 2.0 it is described more generally as best-first, since arbitrary ranking functions, not necessarily corresponding to A* heuristics, may be plugged in. But with choices like S+OC or S+OC+UC as plan-ranking heuristic (as discussed in Section 2.2), it is natural to view the strategy as an A* strategy.

2. While the techniques we describe are applicable to other planners, our focus is on ucpop because it is well-known and the Lisp code is readily available. The system can be obtained via anonymous ftp from cs.washington.edu.

3. We hope that the notion of a parameter domain, as a set of admissible bindings (constants), will cause no confusion with the notion of a planning domain, as a specified set of operators, along with constraints on admissible initial conditions and goal conditions.

domains, above those obtainable with
the S+OC and ZLIFO search strategies. Our experimental results again show
speedups by about a factor of 10 through use of parameter domains, on top
of those obtained by the improved search strategies (the combined speedup
is over 2000).

As evidence that the effectiveness of using parameter domains in combination
with our search strategy is not dependent on some peculiarity of the latter,
we also include some results for ucpop's default strategy, Joslin and Pollack's
"least cost flaw repair" (LCFR) strategy (Joslin & Pollack, 1994) and for
Peot and Smith's "least commitment" (LC) open condition selection strategy
(Peot & Smith, 1993) in Section 5.

In Section 6, we state our conclusions, comment on some related work and
mention possible extensions of our techniques.

2. Plan Selection and Goal Selection

We will be basing our discussion and experiments on ucpop, an algorithm exemplifying the state of the art in well-founded
partial-order planning. Thus we begin with a sketch of this algorithm, referring
the reader to (Barrett et al., 1994; Penberthy & Weld, 1992) for details.
In the next two subsections we then motivate and describe our improved plan-selection
and goal-selection strategies.

2.1 UCPOP

ucpop uses strips-like operators, with positive or negative preconditions
and positive or negative effects. The initial state consists of positive
predications with constant arguments (if any), and all other ground predications
are false by default. Unlike strips, ucpop also allows conditional effects,
expressed by 2-part when-clauses specifying a (possibly complex) extra condition
needed by that effect and the (possibly complex) effect itself. For instance,
an action PUTON(?x ?y ?z) ("put ?x on ?y from ?z") might have conditional
effects stating that when ?y is not the table, it will not be clear at the
end of the action, and when ?z is not the table, it will be clear at the end
of the action. The "U" in ucpop indicates that universally quantified conditions
and effects are permitted as well. For instance, it is permissible to have
a precondition for a PICKUP(?x) action that says that for all ?y, (not (on
?y ?x)) holds. Universal statements are handled by explicit substitution
of domain constants and need not concern us at this point.
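For concreteness, the PUTON action just described might be encoded roughly as follows in ucpop's operator syntax (a sketch rather than the exact test-suite encoding; the precondition list is merely illustrative):

    (define (operator puton)
      :parameters (?x ?y ?z)
      :precondition (and (on ?x ?z) (clear ?x) (clear ?y)
                         (neq ?x ?y) (neq ?x ?z) (neq ?y ?z))
      :effect (and (on ?x ?y) (not (on ?x ?z))
                   ;; 2-part when-clauses: extra condition plus conditional effect
                   (when (neq ?y Table) (not (clear ?y)))
                   (when (neq ?z Table) (clear ?z))))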

In essence, ucpop explores a space of partially specified plans, each paired
with an agenda of goals still to be satisfied and threats still to be averted.
The initial plan contains a dummy *start* action whose effects are the given
initial conditions, and a dummy *end* action whose preconditions are the
given goals. Thus goals are uniformly viewed as action preconditions, and
are uniformly achieved through the effects of actions, including the *start*
action.

The plans themselves consist of a collection of steps (i.e., actions obtained
by instantiating the available operators), along with a set of causal links,
a set of binding constraints, and a set of ordering constraints. When an
open goal (precondition) is selected from the agenda, it is established (if
possible) either by adding a step with an effect that unifies with the goal,
or by using an existing step with an effect that unifies with the goal. (In
the latter case, it must be consistent with current ordering constraints
to place the existing

97

Gerevini & Schubert step before the goal, i.e., before the step whose preconditions
generated the goal.) When a new or existing step is used to establish a goal
in this way, there are several side effects:

- A causal link (Sp, Q, Sc) is also added, where Sp indicates the step "producing" the goal condition Q and Sc indicates the step "consuming" Q. This causal link serves to protect the intended effect of the added (or reused) step from interference by other steps.

- Binding constraints are added, corresponding to the unifier for the action effect in question and the goal (precondition) it achieves.

- An ordering constraint is added, placing the step in question before the step whose precondition it achieves.

- If the action in question is new, its preconditions are added to the agenda as new goals (except that EQ/NEQ conditions are integrated into the binding constraints - see below).

- New threats (unsafe conditions) are determined. For a new step and its causal link, other steps threaten the causal link if they have effects unifiable with the condition protected by the causal link (and these effects can occur temporally during the causal link); and the effects of the new step may similarly threaten other causal links. In either case, new threats are placed on the agenda. It is useful to distinguish definite threats from potential threats: the former are those in which the unification that confirmed the threat involved no new binding of variables.

Binding constraints assert the identity (EQ) or nonidentity (NEQ) of two
variables or a variable and a constant. EQ-constraints arise from unifying
open goals with action effects, and NEQ-constraints arise (i) from NEQ-preconditions
of newly instantiated actions, (ii) from matching negative goals containing
variables to the initial state, and (iii) from averting threats by "separation",
i.e., forcing non-equality of two variables or a variable and a constant
that were unified in threat detection. NEQ-constraints may be disjunctive,
but are handled simply by generating separate plans for each disjunct.
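To make the bookkeeping above concrete, a causal link and the test for a definite threat might be represented along the following lines (a minimal Common Lisp sketch; unify - here assumed to return a success flag and any new bindings - together with negate, possibly-between-p, and step-effects are assumed helpers, not ucpop's actual internals):

    (defstruct causal-link
      producer    ; step Sp that establishes the protected condition
      condition   ; the protected condition Q
      consumer)   ; step Sc whose precondition Q satisfies

    (defun definite-threat-p (step link bindings orderings)
      ;; A step definitely threatens (Sp, Q, Sc) if one of its effects
      ;; unifies with the negation of Q without binding any new variables,
      ;; and the step can fall between Sp and Sc in the current ordering.
      (let ((q (causal-link-condition link)))
        (some #'(lambda (effect)
                  (multiple-value-bind (match new-bindings)
                      (unify (negate effect) q bindings)
                    (and match
                         (null new-bindings)  ; no new commitments: definite
                         (possibly-between-p step
                                             (causal-link-producer link)
                                             (causal-link-consumer link)
                                             orderings))))
              (step-effects step))))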

The overall control loop of ucpop consists of selecting a plan from the current
list of plans (initially the single plan based on *start* and *end*), selecting
a goal or threat from its agenda, and replacing the plan by the corresponding
refined plans. If the agenda item is a goal, the refined plans are those
corresponding to all ways of establishing the goal using a new or existing
step. If the agenda item is a definite threat to a causal link (Sp, Q, Sc), then there are at most three refined plans. Two of these constrain the threatening step to be before step Sp (demotion) or after step Sc (promotion), thus averting the threat. A third possibility arises if the effect threatening (Sp, Q, Sc) is a conditional effect of the threatening action. Such a conditional
threat can be averted by creating a goal denying some precondition needed
by the conditional effect.
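Schematically, this control loop can be pictured as follows (an illustrative Common Lisp sketch, not ucpop's actual code; select-plan, select-flaw, plan-agenda, and refine-plan stand in for the machinery just described):

    (defun pop-control-loop (initial-plan)
      (let ((plans (list initial-plan)))  ; initially the *start*/*end* plan
        (loop
          (when (null plans) (return :failure))  ; no plans left: planning fails
          (let ((plan (select-plan plans)))      ; e.g., best-first on a plan ranking
            (setq plans (remove plan plans))
            (if (null (plan-agenda plan))
                (return plan)  ; consistent plan with an empty agenda: success
                ;; refine-plan returns all refinements: every way of establishing
                ;; an open condition, or the promotion/demotion/confrontation
                ;; resolutions of a threat; refinements whose binding or ordering
                ;; constraints are inconsistent are discarded.
                (setq plans (append (refine-plan plan (select-flaw plan))
                                    plans)))))))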

ucpop has a "delay separation" switch, *d-sep*, and when this is turned on,
only definite threats are dealt with. Note that potential threats may become
definite as a result of added binding constraints. (They may also "expire"
as a result of added binding and ordering constraints, i.e., the threatening
effect may no longer unify with the threatened condition or it may be forced
to occur before or after the threatened causal link. Expired

threats are removed from the agenda when
selected.) When *d-sep* is off, potential threats as well as definite ones
are averted, with separation as an additional method of doing so besides
the three methods above.

Inconsistencies in binding constraints and ordering constraints are detected
when they first occur (as a result of adding a new constraint) and the corresponding
plans are eliminated. Planning fails if no plans remain. The success condition
is the creation of a plan with consistent binding and ordering constraints
and an empty agenda.

The allowance for conditional effects and universal conditions and effects
causes only minor perturbations in the operation of ucpop. For instance,
conditional effects can lead to multiple matches against operators for a
given goal, each match generating different preconditions. (Of course, there
can be multiple matches even without conditional effects, if some predicates
occur more than once in the effects.)

The key issues for us right now are the strategic ones: how plans are selected
from the current set of plans (discussed in Section 2.2), and how goals are
selected for a given plan (discussed in Section 2.3).

2.2 The Trouble with Counting Unsafe Conditions

The choice of the next plan to refine in the ucpop system is based on an A* best-first search. Recall
that A* uses a heuristic estimate f(p) of overall solution cost consisting
of a part g(p) = cost of the current partial solution (plan) p and a part
h(p) = estimate of the additional cost of the best complete solution that
extends p. In the current context it is helpful to think of f(p) as a measure
of plan complexity, i.e., "good" plans are simple (low-complexity) plans.

There are two points of which the reader should be reminded. First, in order
for A* to guarantee discovery of an optimal plan (i.e., the "admissibility"
condition), h(p) should not overestimate the remaining solution cost (Nilsson,
1980). Second, if the aim is not necessarily to find an optimal solution
but to find a satisfactory solution quickly, then f(p) can be augmented
to include a term that estimates the remaining cost of finding a solution.
One common way of doing that is to use a term proportional to h(p) for this
as well, i.e., we emphasize the h-component of f relative to the g-component.
This is reasonable to the extent that the plans that are most nearly complete
(indicated by a low h-value) are likely to take the least effort to complete.
Thus we will prefer to pursue a plan p′ that seems closer to being complete over a plan p further from completion, even though the overall complexity estimate for p′ may be greater than for p (Nilsson, 1980, pages 87-88). Alternatively,
we could add a heuristic estimate of the remaining cost of finding a solution
to f(p) that is more or less independent of the estimate h(p).

With these considerations in mind, we now evaluate the advisability of including
the various terms in ucpop's function for guiding its A* search, namely S, OC, CL, and UC, where S is the number of steps in the partial plan, OC
is the number of open conditions (unsatisfied goals and preconditions), CL
is the number of causal links, and UC is the number of unsafe conditions
(the number of pairs of steps and causal links where the step

threatens the causal link). The default combination used
by ucpop is S+OC+UC.4 This becomes S+OC+UC+F if special open conditions called
"facts" are present. These are conditions that are not state-dependent (e.g.,
a numerical relation like (add-one ?x ?y), or a geometrical one like (loc-in-room
?x ?y ?room)) and are established by Lisp functions (Barrett et al., 1994).
Since few of our test problems involved facts, we will not discuss the F
term further except to say that we followed the ucpop default strategy of
including this term where it is relevant (see the TileWorld problems in Section
3.2 and also some remarks in Section 5.2 in connection with the parameter-domain
experiments).

2.2.1 S: the number of steps currently in the plan

This can naturally be
viewed as comprising g(p), the plan complexity so far. Intuitively, a plan
is complex to the extent that it contains many steps. While in some domains
we might want to make distinctions among the costs of different kinds of
steps, a simple step count seems like a reasonable generic complexity measure.

2.2.2 OC: the number of open conditions

This can be viewed as playing the
role of h(p), since each remaining open condition must be established by
some step. The catch is that it may be possible to use existing steps in
the plan (including *start*, i.e., the initial conditions) to establish remaining
open conditions. Thus OC can overestimate the number of steps still to be
added, forfeiting admissibility.

Despite this criticism, several considerations favor retention of the OC
term. First, a better estimator of residual plan complexity seems hard to
come by. Perhaps one could modify OC by discounting open conditions that
are matched by existing actions, but this presumes that all such open conditions
can actually be achieved by action re-use, which is improbable if there are
remaining threats, or remaining goals requiring new steps.5 Second, the possibility
that OC will overestimate the residual plan complexity will rarely be actualized,
since typically further steps still need to be added to achieve some of the
goals, and those steps will typically introduce further open conditions again
requiring new steps. Finally, to the extent that OC does at times overestimate
the residual plan complexity, it can be viewed as emphasizing the h(p) term of f(p), thus promoting faster problem-solving as explained above.

2.2.3 CL: the number of causal links

One might motivate the inclusion of
this term by arguing that numerous causal links are indicative of a complex
plan. As such, CL appears to be an alternative to step-counting.

4. This is in no way the "recommended" strategy. The ucpop implementation makes available various options for controlling search, to be used at the discretion of experimenters. Our present work has prompted the incorporation of our particular strategies as an option in ucpop 4.0.

5. Note that threats and remaining goals impose constraints that may not be consistent with seemingly possible instances of action re-use. This is clear enough for threats, which often imply temporal ordering constraints inconsistent with re-use of an action. It is also fairly clear for remaining goals. For instance, in Towers of Hanoi the small disk D1 is initially on the medium disk D2, which in turn is on the big disk D3, and D3 is on peg P1. The goal is to move the tower to the third peg P3, so it seems to ucpop initially as if (on D1 D2) and (on D2 D3) could be achieved by "re-use" of *start*. However, the third goal (on D3 P3) implies that various actions must be added to the plan which are inconsistent with those two seemingly possible instances of action re-use.

However, note that CL is in general larger
than S, since every step of a plan establishes at least one open condition
and thus introduces at least one causal link. The larger CL is relative to
S, the more subgoals are achieved by action re-use. Hence, if we use CL instead
of (or in addition to) S in the g(p) term, we would in effect be saying that
achieving multiple subgoals with a single step is undesirable; we would tend
to search for ways of achieving multiple goals with multiple steps, even
when they can be achieved with a single step. This is clearly not a good
idea, and justifies the exclusion of CL from f(p).

2.2.4 UC: the number of unsafe conditions

We note first of all that this
is clearly not a g-measure. While the number of threats will tend to increase
if we establish more and more subgoals without curtailing threats, threats
as such are not elements of the plan being constructed and so do not contribute
to its complexity. In fact, when the plan is done all threats will be gone.

Can UC then be viewed as an h-measure? One argument of sorts for the affirmative
is the following. Not all partial plans are expandable into complete plans,
and a high value of UC makes it more likely that the partial plan contains
irresolvable conflicts. If we regard impossible plans as having infinite
cost, then inclusion of a term increasing with UC as part of the h-measure
is reasonable. This carries a serious risk, though, since in the case where
the partial plan does have a consistent completion (despite a high UC-count),
inclusion of such a term can greatly overestimate the residual plan complexity.

Another possible affirmative argument is that conditional threats are sometimes
resolved by "confrontation", which introduces a new goal denying a condition
required for the threatening conditional effect. This new goal may in turn
require new steps for its achievement, adding to the plan complexity. However,
this link to complexity is very tenuous. In the first place, many of the
ucpop test domains involve no conditional effects, and threat removal by
promotion, demotion or separation adds no steps. Even when conditional effects
are present, many unconditional as well as conditional threats are averted
by these methods.

Furthermore, UC could swamp all other terms since threats may appear and
expire in groups of size O(n), where n is the number of steps in the plan.
For instance, consider a partial plan that involves moves by a robot R to
locations L1, ..., Ln, so that there are n causal links labeled (at R L1),
..., (at R Ln). If a new move to location L is now added, initially with
an indefinite point of departure ?x, this produces effects (at R L) and (not
(at R ?x)). The latter can threaten all of the above n causal links, at least
if the new move is at first temporally unordered relative to the n existing
moves. If this new action subsequently happens to be demoted so as to precede
the first move (or promoted so as to follow the last), or if ?x becomes bound
to a constant distinct from L1, ..., Ln, all n threats expire. Keeping in
mind that different steps in a plan may have similar effects, we can see
that half of the steps could threaten the causal links of the others. In
such a case we could have O(n^2) unsafe conditions, destined to expire as a result of O(n) promotions/demotions. In fact even a single new binding constraint may cause O(n^2) threats to expire. For instance, if there are n/2 effects (not (P ?x)) threatening n/2 causal links labeled (P ?y), then if the binding constraint (NEQ ?x ?y) is added, all n^2/4 threats expire. Recall
that when expired threats are selected from the agenda by ucpop, they are
recognized as such and discarded without further action.

Our conclusion is that it would be a mistake to include
UC in full in a general h-measure, though some increasing function of UC
that remains small enough not to mask OC may be worth including in h.

Finally, can UC be regarded as a measure of the remaining cost of finding
a solution? Here, similar arguments to those above apply. On the affirmative
side, we can argue that a high value of UC indicates that we may be facing
a combinatorially explosive, time-consuming search for a set of promotions
and demotions that produce a conflict-free step ordering. In other words,
a high value of UC may indicate a high residual problem-solving cost. (And
at the end of such a search, we may still lack a solution, if no viable step
ordering exists.) On the other hand, we have already noted that unsafe conditions
include many possible conflicts which may expire as a result of subsequent
partial ordering choices and variable binding choices not specifically aimed
at removing these conflicts. So counting unsafe conditions can arbitrarily
overestimate the number of genuine refinement steps, and hence the problem-solving
effort, still needed to complete the plan.

So UC is scarcely more trustworthy as a measure of residual planning cost
than as a measure of residual plan cost.

Thus we conclude that the most promising general heuristic measure for plan
selection is S+OC, possibly augmented with an attenuated form of the UC term
that will not dominate the S+OC component. (For instance, one might add a
small fraction of the term, such as UC/10, or more subtly - to avoid swamping
by a quadratic component - a term proportional to UC^0.5.)
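As an illustration, the plan-ranking functions we advocate could be plugged into ucpop's best-first search along the following lines (a sketch under the assumption that plan-steps, plan-open-conditions, and plan-unsafe-conditions are accessors yielding S, OC, and UC):

    (defun rank-s+oc (plan)
      ;; f(p) = g(p) + h(p), with S as the g-term and OC as the h-term
      (+ (length (plan-steps plan))
         (length (plan-open-conditions plan))))

    (defun rank-s+oc+uc/10 (plan)
      ;; variant with an attenuated threat term that cannot swamp S+OC;
      ;; (sqrt uc) would be another choice, per the discussion above
      (let ((uc (length (plan-unsafe-conditions plan))))
        (+ (rank-s+oc plan) (* 0.1 uc))))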

2.3 The Goal Selection Strategy

An important opportunity for improving planning performance independently of the domain lies in identifying forced refinements, i.e., refinements that can be made deterministically. Specifically, in considering possible refinements of a given partial plan, it makes sense to give top priority to open conditions that cannot be achieved, and then to prefer open conditions that can be achieved in only one way - either through addition of an action not yet in the plan, or through a unique match against the initial conditions.

The argument for giving top priority to unachievable goals is just that a
plan containing such goals can be eliminated at once. Thus we prevent allocation
of effort to the refinement of doomed plans, and to the generation and refinement
of their doomed successor plans.

The argument for preferring open conditions that can be achieved in only
one way is equally apparent. Since every open condition must eventually be
established by some action, it follows that if this action is unique, it
must be part of every possible completion of the partial plan under consideration.
So, adding the action is a "zero-commitment" refinement, involving no choices
or guesswork. At the same time, adding any refinement in general narrows
down the search space by adding binding constraints, ordering constraints
and threats, which constrain both existing steps and subsequently added steps.
For unique refinements this narrowing-down is monotonic, never needing revocation.
For example, suppose some refinement happens to add constraints that eliminate
a certain action instance A as a possible way of achieving a certain open
condition C. If the refinement is unique, then we are assured that no completion
of the plan contains A as a way of establishing C. If it is not unique, we
have no such assurance, since some alternative refinement may be

compatible with the use of A to achieve
C. In short, the zero-commitment strategy cuts down the search space without
loss of access to viable solutions.

Peot and Smith (1993) studied the strategy of preferring forced threats to
unforced threats, and also used a "least commitment" (LC) strategy for handling
open conditions. Least commitment always selects an open condition which
generates the fewest refined plans. Thus it entails the priorities for unachievable
and uniquely achievable goals above (while also entailing a certain prioritization
of nonuniquely achievable goals). Joslin and Pollack (1994) studied the uniform
application of such a strategy to both threats and open conditions in ucpop,
terming this strategy "least cost flaw repair" (LCFR). Combining this with
ucpop's default plan selection strategy, they obtained significant search
reductions (though less significant running time reductions, mainly for implementation
reasons, but also because of the intrinsic overhead of computing the "repair
costs") for a majority of the problems in the ucpop test suite.

Joslin & Pollack (1994) and subsequently Srinivasan & Howe (1995) proposed
some variants of LCFR designed to reduce the overhead incurred by LCFR for
flaw selection. These strategies employ various assumptions about the flaw
repair costs, allowing the more arduous forms of cost estimation (requiring
look-ahead generation of plans) to be confined to a subset of the flaws in
the plan, while for the rest an approximation is used that does not significantly
increase the overhead. Both teams obtained quite significant reductions in
overhead costs in many cases, e.g., by factors ranging from about 3 to about
20 for the more difficult problems. However, overall performance was sometimes
adversely affected. Joslin and Pollack found that their variant (QLCFR) solved
fewer problems than LCFR, because of an increase in the number of plans generated
in some cases. Each of Srinivasan & Howe's four strategies did slightly better
than LCFR in some of their 10 problem domains but significantly worse in
others. In terms of plans examined during the search, their best overall
strategy, which uses similar action instances for similar flaws, did slightly
better on 4 of the domains, slightly worse on 4, and significantly worse
on 2 (and in those cases the number of plans examined was also more than
a factor of 20 above that of default ucpop).

In the unmodified form of ucpop, goals are selected from the agenda according
to a LIFO (last-in first-out, i.e., stack) discipline. Based on experience
with search processes in AI in general, such a strategy has much to recommend
it, as a simple default. In the first place, its overhead cost is low compared
to strategies that use heuristic evaluation or lookahead to prioritize goals.
As well, it will tend to maintain focus on the achievement of a particular
higher-level goal by regression - very much as in Prolog goal chaining -
rather than attempting to achieve multiple goals in breadth-first fashion.

Maintaining focus on a single goal should be advantageous at least when some
of the goals to be achieved are independent. For instance, suppose that two
goals G1 and G2 can both be achieved in various ways, but choosing a particular
method of achieving G1 does not rule out any of the methods of achieving
G2. Then if we maintain focus on G1 until it is solved, before attempting
G2, the total cost of solving both goals will just be the sum of the costs
of solving them individually. But if we switch back and forth, and solutions
of both goals involve searches that encounter many dead ends, the combined
cost can be much larger. This is because we will tend to search any unsolvable
subtree of the G1 search tree repeatedly, in combination with various alternatives
in the G2 search tree (and vice versa). This argument should still have some
validity even if G1 and G2 are not entirely

independent; i.e., as long as G1 gives rise to subproblems
that tend to fail in the same way regardless of choices made in the attempt
to solve G2 (or vice versa), then shifting attention between G1 and G2 will
tend to generate a set of partial plans that unnecessarily "cross-multiplies"
alternatives.

We have therefore chosen to stay with ucpop's LIFO strategy whenever there
are no zero commitment choices. This has led to very substantial improvements
over LCFR in our experiments.

Thus our strategy, which we term ZLIFO ("zero-commitment last-in first-out"),
chooses the next flaw according to the following preferences:

1. a definite threat (*d-sep* is turned on), using LIFO to pick among these;

2. an open condition that cannot be established in any way;

3. an open condition that can be resolved in only one way, preferring open conditions that can be established by introducing a new action to those that can be established by using *start*;6

4. an open condition, using LIFO to pick among these.

Hence the overhead incurred by ZLIFO for flaw selection is limited to the
open conditions, and is lower for these than the overhead incurred by LCFR.
Furthermore, it can also be significantly lower in practice than the overhead
incurred by LC, because testing whether an OC is not a zero-commitment choice
(i.e., whether it can be established in more than one way) is less expensive
than computing the total number of ways to achieve it.

In Online Appendix 1 we give the pseudocode of ZLIFO for the selection of
the open condition (preferences 2-4). Very recently this implementation has
also been packaged into ucpop 4.0, a new version of ucpop which is available
via anonymous ftp from cs.washington.edu.
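In outline, the open-condition part of ZLIFO (preferences 2-4) behaves as in the following sketch (illustrative Common Lisp, not the packaged implementation; count-establishers is an assumed helper that counts the ways an open condition can be established and can stop counting at two, and the sub-preference in 3 for new actions over *start* is omitted for brevity):

    (defun zlifo-select-open-condition (open-conditions plan)
      ;; open-conditions is assumed to be kept in LIFO order (newest first)
      (let ((forced nil))
        (dolist (oc open-conditions)
          (case (count-establishers oc plan :limit 2)
            (0 (return-from zlifo-select-open-condition oc)) ; unachievable: plan doomed
            (1 (unless forced (setq forced oc)))))           ; zero-commitment candidate
        (or forced
            (first open-conditions))))  ; otherwise plain LIFO

Note that the helper only needs to distinguish zero, one, or more than one establisher, which is why the overhead stays below that of a full count of the ways to achieve each open condition.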

3. Experiments Using UCPOP

In order to test our ideas we modified version 2.0 of ucpop (Barrett et al., 1994), replacing its default plan-selection strategy (S+OC+UC) and goal-selection strategy (LIFO) to incorporate strategies discussed in the previous sections.

We tested the modified planner on several problems in the ucpop suite, emphasizing
those that had proved most challenging for previous strategies, on some artificial
problems due to Kambhampati et al. (1995), in the trains transportation domain
developed in Rochester (Allen & Schubert, 1991; Allen et al., 1995), and
in Joslin & Pollack's TileWorld domain (Joslin & Pollack, 1994). We briefly
describe the test problems and the platforms and parameter settings we used,
and then present the experimental results for our improved search strategies.

6. Preferences 2 and 3 are zero-commitment choices. In our experiments, which are described in the next section, the sub-preference in 3 gave improvements in the context of Russell's tire changing domain (in particular with Fix3), without significant deterioration of performance in the other domains.

3.1 Test Problems and Experimental Settings

The ucpop problems include Towers of Hanoi (T of H), Fixa, Fix3, Fixit, Tower-Invert4,
Test-Ferry, and Sussman-Anomaly. In the case of T of H, we added a 3-operator
version to the ucpop single-operator version, since T of H is a particularly
hard problem for ucpop and its difficulty has long been known to be sensitive
to the formalization (e.g., (Green, 1969)). Fixa is a problem from Dan Weld's
"fridge domain", in which the compressor in the fridge is to be exchanged,
requiring unscrewing several screws, stopping the fridge, removing the backplane,
and making the exchange. Fix3 is from Stuart Russell's "flat tire domain",
where a new wheel is to be mounted and lowered to the ground (the old wheel
has been jacked up already and the nuts loosened); this requires unscrewing
the nuts holding the old wheel, removing the wheel, putting on the new wheel,
screwing on the nuts, jacking down the hub, and tightening the nuts. Fixit
is more complicated, as the wheel is not yet jacked up initially and the
nuts not yet loosened, the spare tire needs to be inflated, and the jack,
wrench and pump all need to be taken out of the trunk and stowed again at
the end. Tower-Invert4 is a problem in the blocks world, requiring the topmost
block in a stack of four blocks to be made bottom-most. Test-Ferry is a simple
problem requiring two cars to be moved from A to B using a one-car ferry,
by boarding, sailing, and unboarding for each car.

The artificial problems correspond to two parameter settings for ART-#est-#clob,
one of the two artificial domains that served as a testbed for Kambhampati
et al.'s extensive study of the behavior of various planning strategies as
a function of problem parameters (Kambhampati et al., 1995). ART-#est-#clob
provides two layers of 10 operators each, where those in layer 1 achieve
the preconditions of those in layer 2, and each operator in layer 2 achieves
one of the 10 goals. However, some operators in each layer can establish
or clobber the preconditions of their neighbors, and this can force operators
to be used in a certain order.

The version of the trains domain that we encoded involves four cities (Avon,
Bath, Corning, Dansville) connected by four tracks in a diamond pattern,
with a fifth city (Elmira) connected to Corning by a fifth track. The available
resources, which are located at various cities, consist of a banana warehouse,
an orange warehouse, an orange juice factory, three train engines (not coupled
to any cars), four boxcars (suitable for transporting oranges or bananas), and
a tanker car (suitable for transporting orange juice). Goals are typically
to deliver oranges, bananas, or orange juice to some city, requiring engine-car
coupling, car loading and unloading, engine driving, and possibly OJ-manufacture.

The TileWorld domain consists of a grid on which holes and tiles are scattered.
A given tile may or may not fit into a particular hole. The goals are to
fill one or more holes by using three possible actions: picking up a tile,
going to an x-y location on the grid, and dropping a tile into a hole. The
agent can carry at most four tiles at a time.

Formalizations of these domains in terms of ucpop's language are provided
in Online Appendix 2. The experiments for all problems except Fixit, the
trains problems and the TileWorld problems were conducted on a sun 10 using
Lucid Common Lisp 4.0.0, while the rest (Tables X-XI in the next subsection)
were conducted on a sun 20 using Allegro Common Lisp 4.2. Judging from some
repeated experiments, we do not think that the

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          204.51    160,911/107,649
LIFO             S+OC             0.97      751/511
ZLIFO            S+OC+UC          6.90      1816/1291
ZLIFO            S+OC             0.54      253/184

Table I: Performance of plan/goal selection strategies on T-of-H1

differences in the platforms significantly impact performance improvements.7
Among the search control functions provided by ucpop, we used the default
bestf-search when the problem was solvable within the search limit of 40,000
plans generated, while we used the function id-bf-search (an implementation
of the linear-space best-first search algorithm given by Korf, 1992), when
this limit was exceeded.8 In all of the experiments the delayseparation switch,
*d-sep*, was on, except for those using the LCFR strategy.

3.2 Experimental Results for ZLIFO and S+OC

Tables I-XI show the CPU time
(seconds) and the number of plans created/explored by ucpop on twelve problems
in the domains described above: Towers of Hanoi with three disks and either
one operator (T-of-H1) or three operators (T-of-H3), the fridge domain (Fixa),
the tire changing domain (Fix3 and Fixit), the blocks world (Tower-Invert4
and Sussman-anomaly), the ferry domain (Test-Ferry), the artificial domain
ART-#est-#clob (specifically, ART-3-6 and ART-6-3), the trains domain (Trains1,
Trains2 and Trains3) and the TileWorld domain (tw-1, ..., tw-6). Both the
number of plans created/explored and the CPU time are important performance
measures. The number of plans, which indicates search space size, is a more
stable measure in the sense that it depends only on the search algorithm,
not the implementation.9 But the time is still of interest since an improvement
in search may have been purchased at the price of a more time-consuming evaluation
of alternatives. It turns out that we do pay some price in overhead when
we substitute our strategies for the defaults (factors ranging from about
1.2 to 1.9, and rarely higher, per plan created). This may be due to slightly
greater inherent complexity of ZLIFO versus LIFO, but we think the differences
could be reduced by substituting modified data structures for those of ucpop
- we were committed to not altering these.

Tables I and II show that for the T of H the plan selection strategy S+OC gives dramatic improvements over the default S+OC+UC strategy. (In these tests the default LIFO goal selection strategy was used.) In fact, ucpop solved T-of-H1 in 0.97 seconds using S+OC versus 204.5 seconds using S+OC+UC. T-of-H3 proved harder to solve than T-of-H1, requiring 8.5 seconds using S+OC and an unknown time in excess of 600 CPU seconds using S+OC+UC.

7. The differences were the result of what was available at different times and locales over the course of nearly two years of experimentation.

8. This choice was motivated by the observation that when the problem is relatively easy to solve, bestf-search appears to be more efficient than id-bf-search, while for hard problems it can be very inefficient because of the considerable amount of space used at run time and the CPU time spent on garbage collection, which in some cases made Lisp crash, reporting an internal error.

9. It is also worth noting that the number of plans created implicitly takes into account plan size, since addition of a step to a plan is counted as creation of a new plan in ucpop.

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          > 600     > 500,000
LIFO             S+OC             8.54      5506/3415
ZLIFO            S+OC+UC          > 600     > 500,000
ZLIFO            S+OC             1.24      641/420

Table II: Performance of plan/goal selection strategies on T-of-H3

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          2.45      2131/1903
LIFO             S+OC             2.48      2131/1903
ZLIFO            S+OC+UC          0.33      96/74
ZLIFO            S+OC             0.33      96/74

Table III: Performance of plan/goal selection strategies on Fixa


Our ZLIFO goal-selection strategy can significantly accelerate planning compared with the simple LIFO strategy. In particular, when ZLIFO was combined with the S+OC plan-selection strategy in solving T of H, it further reduced the number of plans generated by a factor of 3 in T-of-H1 and by a factor of 8 in T-of-H3. The overall performance improvement for T-of-H1 was thus a factor of 636 in terms of plans created and a factor of 379 in terms of CPU time (from 204.5 to 0.54 seconds).

Tables III-VIII provide data for problems that are easier than T of H, but
still challenging to ucpop operating with its default strategy, namely Fixa
(Table III), Fix3 (Table IV), Tower-Invert4 (Table V), Test-Ferry (Table
VI) and the artificial domain ART-#est-#clob with #est = 3 and #clob = 6
(Table VII) and with #est = 6 and #clob = 3 (Table VIII). The results show
that the combination of S+OC and ZLIFO substantially improves the performance
of ucpop in comparison with its performance using S+OC+UC and LIFO. The number
of plans generated dropped by a factor of 22 for Fixa, by a factor of 5.9
for

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          6.50      3396/2071
LIFO             S+OC             0.43      351/215
ZLIFO            S+OC+UC          1.12      357/221
ZLIFO            S+OC             1.53      574/373

Table IV: Performance of plan/goal selection strategies on Fix3

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          1.35      808/540
LIFO             S+OC             0.19      148/105
ZLIFO            S+OC+UC          2.81      571/378
ZLIFO            S+OC             0.36      142/96

Table V: Performance of plan/goal selection strategies on Tower-Invert4

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          0.63      718/457
LIFO             S+OC             0.32      441/301
ZLIFO            S+OC+UC          0.24      136/91
ZLIFO            S+OC             0.22      140/93

Table VI: Performance of plan/goal selection strategies on Test-Ferry

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          0.67      568/392
LIFO             S+OC             1.36      1299/840
ZLIFO            S+OC+UC          0.16      72/49
ZLIFO            S+OC             0.18      79/54

Table VII: Performance of plan/goal selection strategies on ART-#est-#clob with #est = 3 and #clob = 6 (averaged over 100 problems)

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          1.32      985/653
LIFO             S+OC             2.08      1743/1043
ZLIFO            S+OC+UC          0.14      57/37
ZLIFO            S+OC             0.14      57/37

Table VIII: Performance of plan/goal selection strategies on ART-#est-#clob with #est = 6 and #clob = 3 (averaged over 100 problems)

Goal-selection   Plan-selection   CPU sec   Plans
LIFO             S+OC+UC          0.06      44/26
LIFO             S+OC             0.04      36/21
ZLIFO            S+OC+UC          0.12      67/43
ZLIFO            S+OC             0.07      41/25

Table IX: Performance of plan/goal selection strategies on Sussman-anomaly

[Figure 1: log-log plot of performance improvement (two series: search space reduction and speedup) against problem size, with labeled points for Sussman-anomaly, ART-3-6, Test-ferry, Tower-invert4, Fixa, Fix3, T-of-H1, Trains1, ART-6-3, and Fixit.]

Figure 1: Performance improvement due to ZLIFO and S+OC, relative to the number of plans generated by LIFO and S+OC+UC (log-log scale). The improvements for the problems that ucpop was unable to solve even with a very high search limit (Trains2, Trains3, and T-of-H3) are not included.

Fix3, by a factor of 5.7 for Tower-Invert4, by a factor of 5.1 for Test-Ferry,
by a factor of 7 for ART-3-6, and by a factor of 17 for ART-6-3.

Concerning ART-#est-#clob, note that the performance we obtained with unenhanced
ucpop (568 plans generated for ART-3-6 and 985 for ART-6-3) was much the same as (just marginally better than) that reported by Kambhampati et al. (1995) for the best planners considered there (700-1500 plans generated for ART-3-6,
and 1000-2000 for ART-6-3). This is to be expected, since ucpop is a generalization
of the earlier partial-order planners. Relative to standard ucpop and its
predecessors, our "accelerated" planner is thus an order of magnitude faster.
Interestingly, the entire improvement here can be ascribed to ZLIFO (rather
than S+OC plan selection, which is actually a little worse than S+OC+UC).
This is probably due to the unusual arrangement of operators in ART-#est-#clob into a "clobbering chain" (A_{n,1} clobbers A_{n-1,1}'s preconditions, ..., A_{1,1} clobbers A_{0,1}'s preconditions; similarly for A_{i,2}), which makes immediate attention to new unsafe conditions an unusually good strategy.

In experimenting with various combinatorially trivial problems that unmodified
ucpop handles with ease, we found that the S+OC and ZLIFO strategy is neither
beneficial nor harmful in general; there may be a slight improvement or a
slight degradation in performance. Results for the Sussman anomaly in Table
IX provide an illustrative example.

We summarize the results of Tables I-X in Figure 1, showing the performance
improvements obtained with the combined ZLIFO goal selection strategy and
S+OC plan selection

                   Trains1            Trains2        Trains3          Fixit
ZLIFO &   Plans    4097/2019          17,482/10,907  31,957/19,282    5885/3685
S+OC      Time     13.7               80.6           189.8            32.5
LC &      Plans    438/242            34,805/24,000  253,861/168,852  71,154/46,791
S+OC      Time     2.6                368.9          1879.9           547.8
LCFR &    Plans    1093/597           >1,000,000     >1,000,000       190,095/117,914
S+OC      Time     10.65              >10,905        >9918            4412.36
LIFO &    Plans    1,071,479/432,881  >10,000,000    >1,000,000       8,090,014/4,436,204
S+OC+UC   Time     3050.15            >37,879        >2539            27,584.9

Table X: Performance of the plan selection strategy S+OC in combination with the goal selection strategies ZLIFO, LCFR and LC in solving problems which are very hard for the default strategies of ucpop (S+OC+UC/LIFO). (The CPU seconds do not include Lisp garbage collection. The number of plans generated for LCFR does not include those created in order to estimate the repair cost of the flaws.)

Problem       ZLIFO*              LCFR
name      CPU time   Plans    CPU time   Plans
tw-1      0.09       26/15    0.10       26/15
tw-2      0.61       72/39    0.66       72/39
tw-3      2.55       138/71   3.17       139/72
tw-4      7.80       224/111  10.97      227/114
tw-5      19.41      330/159  30.17      336/165
tw-6      42.57      456/215  71.10      466/225

Table XI: Performance of UCPOP in the TileWorld domain using ZLIFO* and LCFR for goal selection, and S+OC+F+0.1UC for plan selection

strategy as a function of problem difficulty (as indicated by the number
of plans generated by the default LIFO plus S+OC+UC strategy). The trend
toward greater speedups for more complex problems (though somewhat dependent
on problem type) is quite apparent from the log-log plot.

For direct comparison with Joslin and Pollack's LCFR strategy and Peot and
Smith's LC strategy, we implemented their strategies and applied them to
several problems. They did very well (sometimes better than ZLIFO) for problems
on the lower end of the difficulty spectrum, but poorly for harder problems.
(For all the problems we ran, LC with the *d-sep* switch on performed better
than LCFR in terms of plans explored and CPU time required.) For T-of-H1
LCFR, in combination both with the default S+OC+UC plan selection strategy and with our S+OC plan strategy, did not find a solution within a search limit of 200,000 plans generated (cf. 253 for ZLIFO with S+OC, and 751 for LIFO with S+OC), requiring an unknown CPU time in excess of 4254 seconds with S+OC+UC,

and in excess of 4834 seconds with S+OC
(cf. 0.54 seconds for ZLIFO with S+OC).10 LC performed much better than LCFR
but still considerably worse than ZLIFO, solving T-of-H1 by generating/exploring
8313/6874 plans with S+OC and 8699/6441 plans with S+OC+UC, and requiring
44.4 CPU secs. and 48.95 CPU secs. respectively. For T-of-H3, LC found a solution
by generating/exploring 21,429/15,199 plans with S+OC+UC and 17,539/14,419
plans with S+OC, requiring 145.18 CPU secs. and 77.84 CPU secs. respectively.

Table X shows the results for the plan strategy S+OC, with the goal strategies ZLIFO, LCFR and LC, applied to four problems (Trains1, Trains2, Trains3 and Fixit).
As shown by the data in the table these are very hard for the default strategies
of ucpop (LIFO & S+OC+UC), but become relatively easy when S+OC is used in
combination either with ZLIFO, LCFR or LC. While LCFR and LC did slightly
better than ZLIFO for Trains1 (the easiest of these problems), they performed
quite poorly for Fixit, Trains2 and Trains3 (the hardest problems) compared
to ZLIFO.

Joslin and Pollack (1994) tested their LCFR strategy on six problems in the
TileWorld (tw-1, ..., tw-6), five of which are very hard for default ucpop,
but easy for ucpop using LCFR.11 We tested our ZLIFO strategy in the TileWorld
using the same six problems. ZLIFO did well for tw-1-4, but for tw-5 and
tw-6 its performance dropped well below that of LCFR. This raised the question
whether for these particular problems it is crucial to minimize "repair cost"
in flaw selection uniformly, rather than just in certain special cases (ZLIFO
does minimize the repair cost when no threat is on the flaw list, and at
least one zerocommitment open condition is present). However, further experiments
aimed at answering this question suggested that the poor choices made by
ZLIFO for some TileWorld problems were not due to selection of "high cost"
over "low cost" flaws. Instead two factors appear be crucial for improving
ZLIFO: (a) emphasizing zero-commitment open conditions by giving them higher
priority than threats; (b) when there are no zero-commitment open conditions,
resolving threats as soon as they enter the agenda. (We realized the relevance
of (b) by observing that the performance of a modified versions of LCFR,
where the *d-sep* switch is implicitly forced on, dramatically degraded for
tw-6 in a slightly different formulation of the TileWorld.)

We extended our ZLIFO strategy to include (a) and (b), and we briefly tested
the resulting variant of ZLIFO (ZLIFO*). Table XI shows the results for ZLIFO*
together with the plan selection strategy S+OC+0.1UC+F, where as discussed
in Section 2.3 we included an attenuated form of the UC term (UC/10), and
an F term equal to the number of facts since TileWorld uses facts to track
the number of tiles carried by the agent.12 ZLIFO*

10. This was with *d-sep* turned off, which is the implicit setting in LCFR (Joslin, 1995). In our experiments we also tested a variant of LCFR, where the switch is forced to be on. The resulting goal strategy in combination with our plan strategy S+OC performed significantly better for T-of-H1, solving the problem generating/exploring 7423/6065 plans, and using 110.45 CPU seconds. Note also that a comparison of our implementation of LCFR and Joslin & Pollack's implementation used for the experiments discussed in (Joslin & Pollack, 1994) showed that our implementation is considerably faster (Joslin, 1995).

11. In their experiments tw-2, the easiest among tw-2-6, was not solved by ucpop even when allowed to run for over eight hours. On the other hand, ucpop using LCFR solves tw-6, the hardest problem, without ever reaching a dead-end node in the search tree.

12. In the ZLIFO* experiments the refined plans generated by resolving a threat were added to the flaw list in the following order: first the plan generated by promotion, then the plan generated by demotion, and finally the plan generated by confrontation or separation.

performed very efficiently for all six TileWorld problems,
in fact a little better than LCFR. Note that for these problems ZLIFO* is
more efficient than LCFR in terms of the CPU time, even though the number
of plans generated/explored by the two strategies is approximately the same.
This is because the overhead of selecting the next flaw to be handled is
higher in LCFR than in ZLIFO* (and ZLIFO). In fact, while LCFR needs to compute
the "repair cost" of each flaw (including the threats) in the current plan,
ZLIFO* (ZLIFO) only needs to check for the presence of zero-commitment open
conditions, without processing the threats.

Additional experiments indicated that the average performance of ZLIFO* is
comparable to that of ZLIFO for most of the other problems we used in our
experiments, in terms of plans created/explored. However, the CPU time tends
to increase since the overhead of computing the goal selection function is
higher for ZLIFO* than for ZLIFO, because of the extra agenda-management
costs. Because of this overhead, we do not regard ZLIFO* as generally preferable
to ZLIFO. However, the TileWorld experiments underscored for us that in some
worlds refinements of ZLIFO are advantageous.

Finally, another possible variant of ZLIFO, which was suggested to us by
David Smith, is based on the following preferences for the next flaw to be
handled: (i) a threat that cannot be resolved; (ii) an open condition that
cannot be established; (iii) a threat that has only one possible resolution;
(iv) an open condition that can only be established in one way; (v) other
threats; (vi) other open conditions (using LIFO to pick among these). We
observe that while this strategy could give further savings in terms of plans
created/explored, it also imposes an additional overhead with respect to
both ZLIFO and ZLIFO* which could degrade performance in terms of CPU time.
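
For concreteness, the ranking implicit in this variant might be sketched as
follows (threat-p, open-cond-p and the repair count are hypothetical
interfaces assumed to be supplied by the planner):

;; N-REPAIRS is the number of possible resolutions (for a threat) or
;; establishers (for an open condition) of FLAW in the current plan.
;; Lower ranks are selected first; rank-5 flaws are picked in LIFO order.
(defun flaw-rank (flaw n-repairs)
  (cond ((and (threat-p flaw) (zerop n-repairs)) 0)     ; (i)
        ((and (open-cond-p flaw) (zerop n-repairs)) 1)  ; (ii)
        ((and (threat-p flaw) (= n-repairs 1)) 2)       ; (iii)
        ((and (open-cond-p flaw) (= n-repairs 1)) 3)    ; (iv)
        ((threat-p flaw) 4)                             ; (v)
        (t 5)))                                         ; (vi)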

4. Precomputing Parameter Domains

Even with the speedups obtained through
improved search, a ucpop-like algorithm remains severely limited in the complexity
of problems it can solve. We believe that significant further progress requires
fuller use of global properties of the search space, as determined by the
structure of the operators, initial conditions, and goals. One way to do
that would be through a more in-depth analysis of alternatives during the
search, but this can lead to high overhead costs. Another is to precompute
constraints on the search space, and to use these during planning to prune
the search. The parameter domain method we now motivate and describe is of
the latter type.

4.1 How Can Parameter Domains Help?

In our previous experimentation with
ucpop strategies, we found that ucpop goal regression often hypothesized
steps that were doomed to be abandoned eventually, because they stipulated
impossible parameter bindings. A clear example of this occurred in the Molgen
domain, as encoded in the ucpop test suite. The goal of the "Rat-insulin"
test problem is

(and (bacterium ?b) (molecule ?m)
     (contains IG ?m) (contains ?m ?b) (pure ?b)),

where ?b and ?m are existentially quantified variables. What this means is
that we wish to create a purified bacterial culture ?b, where ?b contains
a molecule ?m (necessarily an

exosome, it turns out), and this molecule
in turn contains the insulin gene, IG. We are using the abbreviations IG,
EE, JE, L for insulin-gene, e-coli-exosome, junk-exosome, and linker; and
E, J, A1 for e-coli, junk, and antibiotic-1. Roughly speaking, the solution
involves processing the initially given mRNA form of the insulin gene so
as to produce a form of insulin DNA that can be spliced into the e-coli-exosome,
using a ligate operator. In turn, the exosome is inserted into the e-coli
bacterium using a transform operator, and the bacterial culture is then purified
using a screen operator, with antibiotic-1. (The junk bacterium and exosome
merely serve to complicate the task - they are nearly, but not quite, substitutable
for the e-coli bacterium and exosome; the junk exosome, unlike e-coli-exosome,
is not resistant to antibiotic-1, violating a precondition of screen.)

Now, in the initial regression the goals (bacterium ?b) and (molecule ?m)
can be established only with the *start* operator, i.e., with the initial
conditions, and thus will not be instantiated to bizarre values. (The initial
conditions supply E and J as the only instances of bacterium, and IG, EE,
JE, and L as the only instances of molecule.) On the other hand, the remaining
goals turn out to match the effects of various instances of the ligate, transform,
and screen operators of Molgen, as follows:

(contains IG ?m): (ligate IG ?m), (transform IG ?m)
(contains ?m ?b): (ligate ?m ?b), (transform ?m ?b)
(pure ?b): (screen ?b ?y ?z)

ucpop will happily regress on these actions. Yet two of them, (transform
IG ?m) and (ligate ?m ?b), are doomed to fail, perhaps after a great deal
of effort has been expended on trying to satisfy their preconditions. In
particular, examination of the constants that can "flow into" the transform
operator from the initial conditions and other Molgen operators shows that
its first argument is restricted to the domain {EE, JE}, i.e., it must be one
of the given exosomes, and the second is restricted to {E, J}, i.e., it must
be one of the given bacteria. Consequently the instance (transform IG ?m)
is unrealizable, as its first argument IG is not in {EE, JE}. (Note that
distinct constants denote distinct entities according to the unique-names
assumption made by ucpop.) The (ligate ?m ?b) action is doomed for slightly
more subtle reasons. It is the result of a match between (contains ?m ?b)
and a "when-clause" (conditional effect) of the ligate operator, whose preconditions
can be reached only if the second parameter ?b lies in the set of molecules
{IG, JE, EE}; yet ?b is also restricted to the set of bacteria {E, J}, as
a result of the goal condition (bacterium ?b). The fact that these sets are
disjoint should allow us to eliminate the (ligate ?m ?b) action.

Note that elimination of action candidates as above increases the number
of zero commitment plan refinements that can be made. In the example, we
are left with exactly one action for each of the three goals, and so the
ZLIFO and LCFR strategies will prefer to regress on these goals rather than
regressing on (bacterium ?b) and (molecule ?m) - which would prematurely
make arbitrary choices of ?b and ?m from the initial state.

4.2 Description of the Algorithm

In any completed plan, each precondition
of each action must be instantiated by an effect of some earlier action.
So the values of the parameters of the action can only be values that

can be "produced" by earlier actions, starting with the
initial action, *start*. Moreover, suppose that a parameter x of a certain
action occurs in each of preconditions P1, ..., Pk. Then a constant c is
a possible value of x only if earlier actions can instantiate x to c in each
of P1, ..., Pk.

Our algorithm find-parameter-domains is based on these observations. Beginning
in the initial state, it propagates positive atomic predications to all possible
operator preconditions. For a propagated ground atom, if the atom matches
an operator precondition, the algorithm adds the constants in that ground
atom to the individual domains of the parameters they were unified with.
These individual domains are particular to specific preconditions. For instance,
the individual domain of ?x for an operator with preconditions (on ?x ?y),
(clear ?x) will in general be distinct for these two preconditions.

As soon as we have nonempty individual domains for all parameters in all
preconditions of an operator, we form the intersection of the individual
domains of each parameter of the operator. For example, if (on ?x ?y) has
(so far) been matched by (on A B) and (on B C), and (clear ?x) has (so far)
been matched by (clear A) and (clear Table), then the individual domain of
?x will be {A,B} in the first precondition and {A,Table} in the second. Thus
(assuming there are no other preconditions) the intersected domain of ?x
will be {A} at this point. If later (clear B) is also matched against (clear
?x), the intersected domain of ?x will grow to {A,B}. When both ?x and ?y
have nonempty intersected domains, the effects (postconditions) of the operator
can in turn be propagated, with ?x and ?y "bound" to their intersected domains.

The propagated effects are again matched against all possible operator preconditions,
and when a variable "bound" to an intersected domain is successfully unified
with a variable in a precondition, it passes its intersected domain to the
individual domain of that precondition-variable (via a union operation).
This can again lead to growth of the intersected domains of the operator
whose precondition was matched, the effects of that operator may then be
propagated, and so on. The individual domains and intersected domains grow
monotonically during the propagation process, and in the end represent the
desired parameter domains of the operators.

We illustrate this process through an example. Consider the simple planning
problem depicted in Figure 2 where an "operator graph" (Smith & Peot, 1993)
is used to describe the logical dependencies among the operators, while the
iterative computation of the parameter domains is graphically illustrated
with a "domain-propagation graph" below the operator graph.

The initial conditions (P A) and (P B) unify with the precondition (P ?x)
of op1. So, the individual domain of ?x relative to the precondition P of
op1 is fA,Bg. On the other hand, the precondition (Q ?x) of op1 cannot be
satisfied by the initial state, and so the individual domain of ?x relative
to Q is initially the empty set. Hence the intersected domain of ?x for op1
is also the empty set.

For op2 we have a different situation, since here we have only one precondition
and it can be established by the initial state. Therefore, the individual
domain of ?y relative to precondition R of op2 is the set of constants fB,Cg,
and the intersected domain of ?y for op2 is the same set (because R is the
only precondition of op2 involving ?y). Since the intersected domain of ?y
has been enlarged (initially it was empty), it is propagated to the individual
domains of the other operators through the effect (Q ?y) of op2. In particular,

[Figure 2: Operator and domain-propagation graphs for a simple planning
problem. The operator graph links *start* (initial conditions (P A), (P B),
(R B), (R C), (S A)) through op1 (preconditions (P ?x), (Q ?x); effect (S ?x)),
op2 (precondition (R ?y); effect (Q ?y)), and op3 (precondition (S ?z); effect
(T ?z)) to *end* (goal (T B)). The domain-propagation graph below it traces
the iterative computation of the parameter domains; ID(P,?x) indicates the
individual domain of the parameter ?x relative to precondition P, id(?x)
indicates the intersected domain of the parameter ?x, and final intersected
domains are indicated using bold fonts.]

(Q ?y) matches the precondition (Q ?x) of op1. So, the individual domain
of ?x relative to precondition Q of op1 is updated by adding the constants
of the intersected domain of ?y to it. Thus the intersected domain of ?x
is enlarged to {B}, and can be propagated through the effect (S ?x) of op1.

Similarly, the propagation of (S ?x) will enlarge the individual domain of
?z for op3, and also the intersected domain, to the set {A,B}. Therefore,
the final intersected domains are: {B} for ?x in op1; {B,C} for ?y in op2;
{A,B} for ?z in op3.

Before presenting the algorithm a little more formally, we note that the
parameter domains will sometimes be "too large", including values that would
be found to be impossible if a more detailed state space exploration were conducted.
However, all that is required for soundness in our use of the domains is
that they not be "too small" (i.e., that they contain all parameter values
that can actually occur in the problem under consideration). Of course, to
be of practical use the parameter domains of an operator should exclude some
of the constants occurring in the problem specification, particularly those
for which it is intuitively obvious that they are of the wrong sort to fill
particular argument slots of the operator. This has turned out to be the
case for all problem domains we have so far experimented with.

The preceding sketch of our method is an oversimplification since preconditions
and effects of ucpop operators may be particular to a when-clause. In this
case we compute individual domains and intersected domains separately for
each when-clause. For example, consider the following schematic representation
of an operator:

(define (operator op1)
  :parameters (?x ?y)
  :precondition (and P1 P2)
  :effect (and E1 E2
               (when P' E')
               (when P'' E''))),

where all conditions starting with P or E denote atomic formulas that may
involve ?x and ?y. We can think of this operator as consisting of a primary
when-clause whose preconditions P1 and P2 must always be satisfied and whose
effects E1 and E2 are always asserted, and two secondary when-clauses whose
respective preconditions P' and P'' may or may not be satisfied, and when
they are, the corresponding effects E' and E'' are asserted. Here our algorithm
would maintain individual domains for ?x and ?y for each of preconditions
P1, P2, P', and P'', and it would maintain intersected domains for ?x and
?y for the primary when-clause and each of the two secondary clauses. The
intersected domains for the secondary clauses would be based on the individual
domains of ?x and ?y not only relative to P' and P'', but also on those
relative to P1 and P2, since (as noted) the primary preconditions must hold
for the operator to have any of its effects, including conditional effects.

Some further complications arise when ucpop operators contain universally
quantified preconditions or effects, disjunctive preconditions, or facts
(mentioned in Section 2.2). Rather than dealing with these complications
directly, we will assume that no such operators occur in the input to the
algorithm. Later we describe a semi-automated way of handling operators containing
the additional constructs.

The algorithm is outlined below (a more detailed description is given in
Online Appendix 1). W is a list of (names of) when-clauses whose effects
are to be propagated. Individual parameter domains are initially nil, and
intersected parameter domains are initially either nil or T (where T is the
universal domain). The intersected domain of a parameter, relative to a given
when-clause, is T just in case the parameter occurs neither in the preconditions
of the when-clause nor in the primary preconditions. (In such a case the
successful instantiation of the when-clause is clearly independent of the
choice of value for the parameter in question.) Unification in step 2(a)
is as usual, except that when an effect variable v is unified with a constant
c in a precondition, the unification succeeds,

116

Accelerating Partial-Order Planners with unifier v = c, just in case c is
an element of the intersected domain of v (for the relevant when-clause).
The given inits (initial conditions) and goals (which may be omitted, i.e.,
nil) are treated as an operator *start* with no preconditions and an operator
*end* with no effects. Variables in goals are treated like operator parameters.
We use the terms "parameters" and "variables" interchangeably here.

Algorithm: find-parameter-domains(operators,inits,goals)

1. Initialize W to the initial conditions, so that it contains just the
(primary) when-clause of *start*.

2. Repeat steps (a-c) until W = nil:

   (a) Unify the positive effects of all when-clauses in W with all possible
   operator preconditions, and mark the preconditions successfully matched
   in this way as "matched". (This marking is permanent.) Augment the
   individual domain of each matched precondition variable with a certain
   set C of constants, defined as follows. If the precondition variable was
   unified with a constant c, then C = {c}; if it was unified with an effect
   variable, then C is the intersected domain of that effect variable
   (relative to the when-clause to which the effect belongs).

   (b) Mark those when-clauses as "propagation candidates" that have all
   their preconditions (including corresponding primary preconditions)
   marked as "matched" and that involve at least one variable for which
   some relevant individual domain was augmented in step (a).

   (c) Reset W to nil. For all when-clauses that are propagation candidates,
   compute new intersected domains for their variables. If an intersected
   domain of a when-clause is thereby enlarged, and all intersected domains
   for the when-clause are now nonempty, then add the when-clause to W.

3. Further restrict intersected domains using equative preconditions of form
(EQ u v), i.e., form a common intersected domain if both u and v are variables.
If u is a constant and v is a variable, reduce the intersected domain of v by
intersecting it with {u}; similarly if u is a variable and v is a constant.
If the equation belongs to a primary when-clause, use it to reduce the
intersected domains of u and v (whichever are variables) in the secondary
clauses as well.

4. Return the intersected domains as the parameter domains, producing a
sequence of lists with each list of form

   (op (x1 a1 b1 c1 ...) (x2 a2 b2 c2 ...) ...),

where each operator op appears at least once. If op has k conditional effects,
there will be k + 1 successive lists headed by op, where the first provides
the parameter domains for the primary effects of op and the rest provide the
parameter domains for the conditional effects (in the order of appearance in
the ucpop definition of op).
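
To make the control structure above concrete, here is a minimal Common Lisp
sketch of the propagation loop for the simplified case of operators with only
positive, unconditional preconditions and effects (no when-clauses, equations,
quantifiers, facts, or goals), assuming also that every parameter occurs in at
least one precondition. The representation and all names are our own
illustrative choices, not those of the code in Online Appendix 1.

(defstruct op (name nil) (params nil) (preconds nil) (effects nil))
;; atoms are lists, e.g. (on ?x ?y); variables are symbols starting with "?"

(defun var-p (x)
  (and (symbolp x) (char= (char (symbol-name x) 0) #\?)))

;; Unify effect atom E of operator SRC with the I-th precondition P of DST.
;; Returns (values matched-p grew-p); on a match, the set C passed to each
;; precondition variable (a single constant, or SRC's intersected domain of
;; the matched effect variable) augments that variable's individual domain.
(defun augment-individual (e p dst i src ind int)
  (unless (and (eq (car e) (car p)) (= (length e) (length p)))
    (return-from augment-individual (values nil nil)))
  (loop for a in (cdr e) for b in (cdr p)          ; first check unifiability
        do (cond ((var-p b))                       ; precondition variable: ok
                 ((var-p a)                        ; constant B must lie in id(A)
                  (unless (member b (gethash (list (op-name src) a) int))
                    (return-from augment-individual (values nil nil))))
                 ((not (eql a b))
                  (return-from augment-individual (values nil nil)))))
  (let ((grew nil))                                ; then augment domains
    (loop for a in (cdr e) for b in (cdr p)
          when (var-p b)
          do (dolist (c (if (var-p a)
                            (gethash (list (op-name src) a) int)
                            (list a)))
               (unless (member c (gethash (list (op-name dst) i b) ind))
                 (push c (gethash (list (op-name dst) i b) ind))
                 (setq grew t))))
    (values t grew)))

;; Intersect X's individual domains over all DST preconditions mentioning X.
(defun intersected-domain (dst x ind)
  (let ((d nil) (firstp t))
    (loop for p in (op-preconds dst) for i from 0
          when (member x (cdr p))
          do (let ((di (gethash (list (op-name dst) i x) ind)))
               (setq d (if firstp di (intersection d di)) firstp nil)))
    d))

(defun find-parameter-domains (operators inits)
  (let ((ind (make-hash-table :test #'equal))   ; (op i var) -> individual domain
        (int (make-hash-table :test #'equal))   ; (op var)   -> intersected domain
        (matched (make-hash-table :test #'equal))
        (w (list (make-op :name '*start* :effects inits))))
    (loop while w
          do (let ((grown nil))
               (dolist (src w)                   ; step (a): propagate effects
                 (dolist (e (op-effects src))
                   (dolist (dst operators)
                     (loop for p in (op-preconds dst) for i from 0
                           do (multiple-value-bind (ok grew)
                                  (augment-individual e p dst i src ind int)
                                (when ok
                                  (setf (gethash (list (op-name dst) i) matched) t))
                                (when grew (pushnew dst grown)))))))
               (setq w nil)
               (dolist (dst grown)               ; steps (b,c): candidates
                 (when (loop for i from 0 below (length (op-preconds dst))
                             always (gethash (list (op-name dst) i) matched))
                   (let ((enlarged nil) (nonempty t))
                     (dolist (x (op-params dst))
                       (let ((key (list (op-name dst) x))
                             (d (intersected-domain dst x ind)))
                         (unless (subsetp d (gethash key int)) (setq enlarged t))
                         (setf (gethash key int) d)
                         (unless d (setq nonempty nil))))
                     (when (and enlarged nonempty) (push dst w)))))))
    int))

On the problem of Figure 2, for instance,

(find-parameter-domains
 (list (make-op :name 'op1 :params '(?x) :preconds '((p ?x) (q ?x)) :effects '((s ?x)))
       (make-op :name 'op2 :params '(?y) :preconds '((r ?y)) :effects '((q ?y)))
       (make-op :name 'op3 :params '(?z) :preconds '((s ?z)) :effects '((t ?z))))
 '((p a) (p b) (r b) (r c) (s a)))

returns a table holding the intersected domains {B} for ?x, {B,C} for ?y, and
{A,B} for ?z, agreeing with the hand derivation above.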

Note that we do not match or propagate negative conditions. The problem with
negative conditions is that a very large number of them may be implicit in
the initial conditions, given the use of the Closed World Assumption in ucpop.
For instance, in a world of n blocks, with at most O(n) on-relations (assuming
that a block can be on only one other block), we necessarily have O(n^2) implicit
(not (on ...)) relations. In fact, the individual variable domains of negative
preconditions or goals can really be infinitely large. For instance, given
an empty initial state and a (paint-red ?x) operation with precondition (not
(red ?x)) and effect (red ?x), we can achieve (red c) for infinitely many
constants c. Perhaps negative conditions could be effectively dealt with
by maintaining anti-domains for them, but we have not explored this since
in practice ignoring negative conditions seems to cause only minimal "domain
bloating". (We have proved that no actual domain elements can be lost through
neglect of some preconditions.)

Our use of EQ-conditions could be refined by making use of them during the
propagation process, and NEQ-conditions could also be used. However, doing
so would probably have marginal impact.

As a final comment, we note that the output format specified in step 4 of
the algorithm is actually generalized in our implementation so as to report
inaccessible preconditions and goals. These inaccessible conditions are simply
appended to the list of parameter domains for the appropriate when-clause
of the appropriate operator. For instance, if the preconditions (oj ?oj)
and (at ?oj ?city) in the ld-oj ("load orange juice") operator of the trains
world (see Online Appendix 2) are unreachable (say, because no oranges for
producing orange juice have been provided), the parameter domain list for
the (unique) when-clause of ld-oj will have the appearance

(ld-oj (?oj ...) (?car ...) (?city ...) (oj ?oj) (at ?oj ?city)).

This feature turns out to be very useful for debugging operator specifications
and detecting unreachable goals.

4.3 Correctness and Tractability

In keeping with the remarks in the previous
section, we will call an algorithm for computing parameter domains correct
if the domains it computes subsume all possible parameter values that can
actually occur (in a given primary or secondary when-clause) if we consider
all possible sequences of operator applications starting at the given initial
state.

The point is that this property will maintain the soundness of a planning
algorithm that uses the precomputed parameter domains to prune impossible
actions (as well as spurious threats) from a partially constructed plan.
We assert the following:

Theorem 1 The find-parameter-domains algorithm is correct for computing parameter
domains of ucpop-style sets of operators (without quantification, disjunction,
or facts), initial conditions, and (possibly) goal conditions.

The proof is given in Appendix A. A preliminary step is to establish termination,
using the monotonic growth of domains and the finiteness of the set of constants
involved. Correctness is then established by showing that if there exists
a valid sequence A0, A1, ..., An of actions (operator instances) starting with
A0 = *start*, and if An is an instance of the operator Op, then the bindings
that the parameters of Op received in instance An are eventually added to
the relevant intersected domains of Op (where "relevant" refers to the when-clauses
of Op whose preconditions are satisfied at the beginning of An). This is
proved by induction on n.

We now indicate how we can deal with
universally quantified preconditions and effects, disjunctive preconditions,
and facts. We make some simple changes to operator definitions by hand in
preparation for parameter domain precomputation, and then use the domains
computed by find-parameter-domains, together with the original operators,
in running the planner. The steps for preparing an operator for parameter
domain precomputation are as follows:

• Delete disjunctive preconditions, fact-preconditions,13 and universally
quantified preconditions (this includes universally quantified goals; it
would also include universally quantified sentences embedded within the
antecedents of when-clauses, e.g., in the manner (:when (:forall (?x) Phi) Psi),
though these do not occur in any problem domains we have seen).

• Drop universal quantifiers occurring positively in operator effects, i.e.,
occurring at the top level or embedded by one or more :and's. For example,
an effect

(:and (at robot ?to)
      (:not (at robot ?from))
      (:forall (?x)
        (:when (:and (grasping ?x) (object ?x))
          (:and (at ?x ?to) (:not (at ?x ?from))))))

would become

(:and (at robot ?to)
      (:not (at robot ?from))
      (:when (:and (grasping ?x) (object ?x))
        (:and (at ?x ?to) (:not (at ?x ?from)))))

Note that the universally quantified variable should be renamed, if necessary,
to be distinct from all other such variables and from all operator parameters.

In the example above the universally quantified variable is unrestricted.
When the quantified variable includes a type restriction, as in (:forall
(object ?x) Phi), then this type restriction needs to become an antecedent
of the matrix sentence Phi. In the example at hand, Phi should be rewritten
as the equivalent of (:when (object ?x) Phi). Since Phi is often a when-clause,
this can be done by adding (object ?x) as a conjunct to the antecedent of
the when-clause. In some cases Phi is a conjunction of when-clauses, and
in such a case the quantifier restriction can be added into each when-clause
antecedent.

• Drop existential quantifiers in preconditions and goals, adding any
restrictions on the quantified variables as conjuncts to the matrix sentence.
For example, the goal

(:exists (bacterium ?y)
  (:exists (molecule ?x)
    (:and (contains IG ?x)
          (contains ?x ?y) (pure ?y))))

13. E.g., in the strips-world we would drop (fact (loc-in-room ?x ?y ?room)),
which checks whether the given coordinates lie in the given room.

becomes

(:and (bacterium ?y) (molecule ?x) (contains IG ?x)
      (contains ?x ?y) (pure ?y))

(Actually, the :and is dropped as well, when supplying goals to find-parameter-domains.)

With these reductions, find-parameter-domains will then compute correct parameter
domains for the operators and goals. To see this, note first of all that
dropping preconditions (in the initial step above) will not forfeit correctness,
since doing so can only weaken the constraints on admissible parameter values,
and thus can only add constants to the domains. The effect of dropping a
universal quantifier, from the perspective of find-parameter-domains, is
to introduce a new parameter in place of the universal variable. (The operator
normalization subroutine detects variables in operator preconditions and
effects that are not listed as parameters, and treats them as additional
parameters.) While this is of course a drastic change in the meaning of the
operator, it preserves correctness of the parameter domain calculation. This
is because the domain of the new parameter will certainly contain all constants
(and hence, under the Closed World Assumption, all objects) over which the
quantified variable ranges. For example, if ?x is treated as a parameter
rather than a universally quantified variable in the conditional effect

(:forall (?x) (:when (object ?x) (in ?x box))),

then the domain of ?x for the when-clause will consist of everything that can
be an object, in any state where the operator can be applied. Thus the effect
(in ?x box) will also be propagated for all such objects, as is required.
Finally, the elimination
of existential quantifiers from preconditions and goals can be seen to preserve
the meaning of those preconditions and goals, and hence preserves the correctness
of the parameter domain calculation.

Next we formally state our tractability claim for the algorithm, as follows
(with some tacit assumptions, mentioned in the proof).

Theorem 2 Algorithm find-parameter-domains can be implemented to run in
O(m n_p n_e (n_p + n_e)) time and O(m n_p) space in the worst case, where m is
the number of constants in the problem specification, n_p is the combined
number of preconditions for all operators (and goals, if included), and n_e
is the combined number of operator effects (including those of *start*).

Again the proof is in Appendix A. The time complexity of find-parameter-domains
is determined as the sum of (1) the cost of all the unifications performed,
(2) the costs of all the individual domain updates attempted, and (3) the
cost of all the intersected domain updates attempted. The space complexity
bound is easily derived by assuming that there is a fixed upper bound on
the number of arguments that a predicate (in a precondition or effect) can
have, and from the fact that for each when-clause at most O(m) constants
are stored.

By adding some additional data structures to find-parameter-domains we can
obtain a version of the algorithm whose worst-case time complexity is slightly
improved. In fact, in step 2(c), instead of propagating all of the effects
of a when-clause with an enlarged intersected domain (i.e., adding such
a when-clause to the list W), it is sufficient to propagate just those effects
of the when-clause that involve an enlarged intersected domain. This could
be done by setting up for each when-clause a table that maps each parameter
to a list of effects (of that when-clause) involving that parameter (a sketch
of such a table follows the revised steps below). In the improved algorithm
we use W to store the list of effects (instead of the list of when-clauses)
that will be propagated in the next cycle of the algorithm, and steps 1 and 2
of find-parameter-domains are modified in the following way:

1'. Initialize W to the list of the effects of *start*.

2'. Repeat steps (a'-c') until W = nil:

   (a') Unify the positive effects in W with all possible operator
   preconditions, and mark the preconditions successfully matched in this
   way as "matched" ...

   (b') same as 2(b).

   (c') Reset W to nil. For all when-clauses that are propagation candidates,
   compute new intersected domains for their variables. If an intersected
   domain of a when-clause is thereby enlarged, and all intersected domains
   for the when-clause are now nonempty, then add to W the subset of the
   effects of the when-clause involving at least one parameter whose
   intersected domain is enlarged.
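
The parameter-to-effects table mentioned above might be built as follows
(a sketch using the representation of the earlier find-parameter-domains
sketch, with var-p as defined there):

;; Map each parameter of a when-clause to the effects that mention it, so
;; that step (c') can re-propagate only effects with an enlarged parameter.
(defun effects-by-param (effects)
  (let ((tbl (make-hash-table :test #'eq)))
    (dolist (e effects tbl)
      (dolist (x (remove-if-not #'var-p (cdr e)))
        (pushnew e (gethash x tbl) :test #'equal)))))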

Note that the worst-case time complexity of the revised algorithm is improved,
because now each effect of each when-clause is propagated at most O(m) times.
This decreases the upper bound on the number of unifications performed, reducing
the complexity estimated in step (1) of the proof of Theorem 2 to O(m n_e n_p).
Hence we have proved the following corollary.

Corollary 1 There exists an improved version of find-parameter-domains that
can be implemented to run in O(m n_p^2 n_e) time in the worst case.

5. Using Parameter Domains for Accelerating a Planner

We have already used
the example of Molgen to motivate the use of precomputed parameter domains
in planning, showing how such domains may allow us to prune non-viable actions
from a partial plan.

More fundamentally, they can be used each time the planner needs to unify
two predications involving a parameter, either during goal regression or
during threat detection. (In either case, one predication is a (sub)goal
and the other is an effect of an action or an initial condition.) If the
unifier is inconsistent with a parameter domain, it should count as a failure
even if it is consistent with other binding constraints in the current (partial)
plan. And if there is no inconsistency, we can use the unifier to intersect
and thus refine the domains of parameters equated by the unifier.

For example, suppose that G = (at ?x ?y) is a precondition of a step in the
current plan, and that E = (at ?w ?z) is an effect of another (possibly new)
step, where ?x, ?y, ?w and ?z are parameters (or, in the case of ?w and ?z,
existentially quantified variables) which have no binding constraints associated
with them in the current plan. Assume also that the domains of the parameters
are:

?x : {Agent1, Agent2, Agent3}    ?y : {City1, City2}
?w : {Agent1, Agent2}    ?z : {City3, City4}

The unification of G and E gives the binding constraints {?x = ?w, ?y = ?z},
which are not viable because the parameter domains of ?y and of ?z have an
empty intersection. On the other hand, if the domain of ?z had been {City2,
City3, City4}, then the unification of G and E would have been judged viable,
and the domains of the parameters would have been refined to:

?x : {Agent1, Agent2}    ?y : {City2}
?w : {Agent1, Agent2}    ?z : {City2}

Thus parameter domains can be incrementally refined as the planning search
progresses; and the narrower they become, the more often they lead to pruning.
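
Schematically, this viability test is a position-by-position check on the two
predications. A minimal sketch, with the domains kept as an alist from
variables to lists of constants (a representation of our own choosing, not
ucpop's):

(defun var-p (x)
  (and (symbolp x) (char= (char (symbol-name x) 0) #\?)))

;; Can goal atom G and effect atom E co-designate under the current
;; parameter domains?  Two equated variables must have intersecting
;; domains; a variable equated to a constant must contain it.
(defun domains-compatible-p (g e domains)
  (and (eq (car g) (car e))
       (= (length g) (length e))
       (loop for a in (cdr g) for b in (cdr e)
             always (cond ((and (var-p a) (var-p b))
                           (intersection (cdr (assoc a domains))
                                         (cdr (assoc b domains))))
                          ((var-p a) (member b (cdr (assoc a domains))))
                          ((var-p b) (member a (cdr (assoc b domains))))
                          (t (eql a b))))))

;; For the example above:
;; (domains-compatible-p '(at ?x ?y) '(at ?w ?z)
;;   '((?x Agent1 Agent2 Agent3) (?y City1 City2)
;;     (?w Agent1 Agent2) (?z City3 City4)))
;; => NIL, since the domains of ?y and ?z are disjoint.  On success, the
;; planner would go on to replace the domains of each equated pair by
;; their intersection.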

5.1 Incorporating Parameter Domains into UCPOP

The preceding consistency
checks and domain refinements can be used in a partial-order, causal-link
planner like ucpop as follows. Given a goal (open condition) G selected by
ucpop as the next flaw to be repaired, we can

(1) restrict the set of new operator instances that ucpop would use for
establishing G; an instance of an operator with effect E (matching G) is
disallowed if the precomputed parameter domains relevant to E are incompatible
with the current parameter domains or binding constraints relevant to G
(note that the current parameter domains associated with G may be refinements
of the initial domains);

(2) restrict the set of existing steps that ucpop would reuse for establishing
G; reusing a step with effect E (matching G) is disallowed if the current
parameter domains relevant to E are incompatible with the current parameter
domains or binding constraints relevant to G.

Moreover, given a potential threat by an effect Q against a protected condition
P, inspection of the relevant parameter domains may reveal that the threat
is actually spurious. This happens if the unifier of P and Q violates the
(possibly refined) domain constraints of a parameter in P or Q. Thus we can
often

(3) reduce the number of threats that are generated by the planner when a
new causal link is introduced into the plan (this happens when an open
condition is established either by reusing a step or by introducing a new
one);

(4) recognize that a threat on the list of the flaws to be processed is
redundant, allowing its elimination. (Note that since parameter domains are
incrementally refined during planning, even if we use (3) during the
generation of the threats, it is still possible for a threat to become
spurious after it has been added to the flaw list.)

These four uses of parameter domains cut down the search space without loss
of viable solutions, since the options that are eliminated cannot lead to
a correct, complete plan.
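
In this scheme, uses (3) and (4) reduce to the same compatibility test applied
to the protected condition and the threatening effect; in terms of the
domains-compatible-p sketch above:

;; A threat by an effect (not Q) against a protected condition P is
;; spurious when P and Q cannot co-designate under the current
;; (possibly refined) parameter domains.
(defun spurious-threat-p (p q domains)
  (not (domains-compatible-p p q domains)))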

Note that (3) and (4) can be useful even
when the planner only deals with definite threats (i.e., the *d-sep* switch is
turned on), for at least three reasons. First, determining that a threat is
not a definite threat when *d-sep* is on incurs an overhead cost. So, earlier
elimination of a spurious threat could lead to considerable savings if the
threat is delayed many times during the search. The second reason relates
to the plan-selection strategy adopted. If one uses a function that includes
an (attenuated) term corresponding to the number of threats currently on
the flaw list, then eliminating spurious threats in advance can give a more
accurate measure of the "badness" of a plan. Finally, parameter domains could
be used in threat processing so as to prune the search even when *d-sep* is
on. In particular, suppose that when we have parameter domains we modify the
notion of a definite threat so that, e.g., (P ?x) and (not (P ?y)) comprise
a definite threat if the parameter domains associated with ?x and ?y are
both the same singleton {c}. In that case, even with *d-sep* on, we may discover
early that a threat has become definite - in which case it might also be a
forced threat, i.e., the choice between promotion and demotion may be dictated
by ordering constraints; and that can prune the search space. However, in our
current implementation we do not exploit this third point.

We have incorporated these techniques into ucpop (version 2.0), along with
our earlier improvements to the plan and goal selection strategies. Parameter
domains are handled through an extension of the "varset" data structure (Weld,
1994) to include the domains of the variables (parameters), and by extending
the unification process to implement the filtering discussed above.14
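
Schematically, the extension amounts to one extra slot (a sketch only; ucpop's
actual varset structure has different slots and names):

;; A binding record extended with a domain slot.
(defstruct varset
  (vars nil)      ; a set of co-designating variables
  (constant nil)  ; the constant they are bound to, if any
  (domain nil))   ; NEW: the current, incrementally refined parameter domain

We now describe our experiments with this enhanced system.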

5.2 Experimental Results Using Parameter Domains

Our main goal here is to
show that while the overhead of computing the parameter domains is
insignificant (both at preprocessing time and at planning time), exploitation
of the parameter domains during planning can significantly prune the search.
In the experiments we used the version of find-parameter-domains which is
described in Section 4.2 and in Online Appendix 1. Note that for domains
more complex than the ones we have considered it might be worthwhile to use
the improved version of the algorithm discussed in Section 4.3. (However,
it remains to be seen whether problems significantly more complex than those
we consider here can be solved by any ucpop-style planner.)

The CPU times needed by our implementation of find-parameter-domains are
negligible for the problems we have looked at. They were 10 msec or less
for many problems in the ucpop test suite (when running compiled Allegro
CL 4.2 on a sun 20), 20 msec for two problems (Fixa from the fridge repair
domain and Fixit from the flat tire domain), and 30 msec for the trains world
problems described below.

In our first set of tests, we relied on the search strategy used as default
in ucpop. The function used for A* plan selection was thus S+OC+UC+F (allowing
for problems that involve "facts"), and the goals were selected from the
agenda according to a pure LIFO discipline.15

14. In the current implementation new threats are filtered only when the
protected condition is established by a step already in the plan.

15. In all experiments the *d-sep* switch was on. The default delay-separation
strategy for selecting unsafe conditions was slightly modified in the version
of ucpop using parameter domains. In particular, the threats that can be
resolved by separation and which are recognized to be redundant through the
use of parameter domains were selected to be eliminated.

We began by experimenting with a variety of problems
from ucpop's test suite, comparing performance with and without the use of
parameter domains. While relatively easy problems such as Sussman-anomaly,
Fixa, Test-ferry, and Tower-invert4 showed no improvement through the use
of parameter domains, most problems - particularly the harder ones - were
solved more easily with parameter domains. For example, the Rat-insulin problem
from the Molgen domain was solved nearly twice as fast, and some strips-world
problems (Move-boxes and variants)16 and Towers of Hanoi (T-of-H1) were solved
about 10 times as fast. Note that the strips-world problems involve both
facts and universally quantified conditional effects. Two problems from the
office world, Office5 and Office6, which we knew to be readily solvable with
our improved search strategy, remained difficult (in the case of Office6,
unsolvable) with the default ucpop strategy, despite the use of parameter
domains.17 Further experiments revealed that the source of this inefficiency
was the default plan-selection strategy of ucpop. In fact, using our S+OC+F
strategy instead of S+OC+UC+F, Office5 and Office6 were solved without
parameter domains generating/exploring 3058/2175 and 8770/6940 plans
respectively, while with parameter domains the plans numbered 1531/1055 and
2954/2204 respectively.

These initial experiments suggested to us that the most promising application
of computed parameter domains would be for nontrivial problems that involved
a variety of types of entities and relationships, and significant amounts
of goal chaining (i.e., with each successive action establishing preconditions
for the next). From this perspective, the trains world struck us as a natural
choice for further experimentation, with the additional advantage that its
design was independently motivated by research at Rochester into mixed-initiative
problem solving through natural-language interaction. (Refer again to the
formalization in Online Appendix 2.) Recall from Table X that the Trains1
problem was extremely hard for unmodified ucpop, requiring about 50 minutes
and generating over a million plans.

Running the same problem with parameter domains produced a solution in 3.3
seconds (with 1207 plans generated), i.e., 927 times faster.

Intuitively, the use of parameter domains to constrain planning is analogous
to using type constraints on the parameters (although parameter domains also
take account of initial conditions). So it is of interest to see whether
adding type constraints can provide efficiency gains similar to those of
parameter domains. Our first set of experiments therefore included T-Trains1,
a "typed" version of Trains1; the operators have been slightly changed by
adding new preconditions stating the types of the parameters involved. For
example, the operator uncouple has been augmented with the preconditions
(engine ?eng) and (car ?car). This problem was also extremely hard for the
unmodified ucpop, exceeding the search limit of 1,000,000 plans generated
and requiring more than 2600 seconds. With parameter domains, the solution
was obtained in one second.

16. Move-boxes-2 differs slightly from the Move-boxes problem in the ucpop
suite, in that its goal is (in-room box2 rclk); Move-boxes-a differs slightly
from Move-boxes-2, in that its initial state contains two boxes.

17. Office5 is directly from ucpop's test suite and Office6 is a minor variant
of Office5. In Office5, all persons are to be furnished with checks made out
to them, using a check printer at the office and a briefcase for picking up
the checks and bringing them home. "Sam" and "Sue" are the given persons, and
in Office6 we have added (person Alan) and (person Smith) to the initial
conditions.

Problems           without domains              with domains             Domain
                   Plans              CPU sec   Plans            CPU sec  ratio
Trains1            1,071,479/432,881  3050.15   1207/824         3.29     0.425
T-Trains1          > 1,000,000        > 2335    404/296          0.98     0.425
Move-boxes         608,231/167,418    1024.04   5746/3253        18.8     0.705
Move-boxes-1       > 1,000,000        > 6165    1264/645         3.59     0.705
Move-boxes-2       13,816/3927        45.05     1175/587         2.66     0.705
Move-boxes-a       13,805/3918        46.11     1175/587         2.54     0.702
T-of-H1*           160,911/107,649    204.51    17,603/12,250    37.5     0.722
Rat-insulin        364/262            0.36      196/129          0.19     0.714
Monkey-test1       96/62              0.12      75/46            0.11     0.733
Monkey-test2       415/262            0.61      247/149          0.50     0.529
Fix3               3395/2070          5.77      3103/1983        6.02     0.532
Office5            809,345/500,578    1927.4    575,224/358,523  1556.8   0.625
Office6            > 1,000,000        > 2730    > 1,000,000      > 2640   0.667
Tower-invert4      806/538            1.55      806/538          1.59     0.733
Sussman-anomaly    44/26              0.05      44/26            0.06     0.917
Fixa               2131/1903          2.2       2131/1903        2.34     1
Test-ferry         718/457            0.65      718/457          0.71     1

Table XII: Plans generated/visited and CPU time (secs) for standard ucpop
with and without parameter domains. (* This result was obtained on a sun 10
with Lucid Common Lisp; the others on a sun 20 with Allegro Common Lisp.)

These results indicate that adding type constraints to operator specifications
is not nearly as effective as the use of parameter domains in boosting planning
efficiency. We discuss this point further in the context of the second set
of tests (below).

Table XII summarizes the experimental results for all of the experiments
that used the default ucpop search strategy. The table gives the number of
plans generated/visited by the planner and the CPU time (seconds) required
to solve the problems.18 Note that the use of the parameter domains gave
very dramatic improvements not only in the trains domain, but also in the
strips-world domain. The rightmost column supplies "domain ratio" data, as
a metric that we hoped would predict the likely effectiveness of using parameter
domains. The idea is that parameter domains should be effective to the extent
that they filter out many parameter bindings that can be reached by chaining
back from individual preconditions of an operator to the initial state. These
bindings can be found by using a variant of the algorithm for propagating
intersected domains that instead propagates unions of individual domains,
and comparing these union domains to the intersected domains.19

18. The systems were compiled under Allegro CL 4.2, with settings (space 0)
(speed 3) (safety 1) (debug 0), and run on a sun 20. The CPU time includes
the Lisp garbage collection (it is the time given in the output by ucpop).

19. Actually, we do not need to explicitly propagate union domains, but can
propagate (partial) bindings for one predication at a time, starting with the
initial conditions. We match the predication to all possible preconditions,
adding the constant arguments it contains to the union domains of the matched
operator (or when-clause). We then find corresponding (partially bound)
effects, and add any new effects to the list of predications still to be
propagated. A partially bound effect such as (P A ?x ?y) is new if there is
no identical or similar predication such as (P A ?u ?v) among the previously
propagated predications or among those still to be propagated.
trains             without domains            with domains          Domain
problems           Plans           CPU sec    Plans        CPU sec  ratio
Trains1            4097/2019       13.7       297/238      1.4      0.425
Trains2            17,482/10,907   80.6       1312/1065    7.16     0.425
Trains3            31,957/19,282   189.8      3885/3175    25.1     0.411

Table XIII: Plans generated/visited and CPU time (secs) for ucpop with and
without parameter domains in the trains domain using the ZLIFO strategy.

trains             without domains            with domains          Domain
problems           Plans           CPU sec    Plans        CPU sec  ratio
Trains1            1093/597        8.1        265/194      2.3      0.425
Trains2            > 50,000        > 607      > 50,000     > 534    0.425
Trains3            > 50,000        > 655      > 50,000     > 564    0.411

Table XIV: Plans generated/visited and CPU time (secs) for ucpop with and
without parameter domains in the trains domain using the LCFR strategy.

The "domain ratio" provides this comparison, dividing the average union domain
size by the average intersected domain size, with averages taken over all
parameters of all when-clauses of all operators.

The largest speedups (e.g., for the trains problems) do tend to correlate
with the smallest domain ratios, and the smallest speedups with the largest
domain ratio (unity - see the last few rows). However, it can be seen from
the table that the problem difficulty (as measured by plans or CPU time)
is much more useful than the domain ratio as a predictor of the speedups to
be expected when using parameter domains. Problems that generate on the order
of a million plans or more with standard ucpop tend to produce speedups by
3 orders of magnitude, whereas the domain ratio for some of these problems
(e.g., Move-boxes-1) is no better (or even worse) than for problems with
much smaller speedups (e.g., Move-boxes-a, Rat-insulin, Monkey-test1,
Monkey-test2). The much lower difficulty of these problems predicts their
reduced speedup. But to complicate matters, not all difficult problems give
high speedups (see T-of-H1 and especially Office5); we do not know what
subtleties of problem structure account for these unusual cases.

In our second round of experiments, we tested the effectiveness of the parameter
domain technique in combination with our improved search strategy, i.e.,
S+OC/ZLIFO. In addition, we combined S+OC with LCFR (least cost flaw selection)
(Joslin & Pollack, 1994), so as to test for possible sensitivity of the
parameter-domains technique to the precise strategy used. For the present set
of tests we used a search limit of 50,000 plans generated.


t-trains           without domains            with domains          Domain
problems           Plans           CPU sec    Plans        CPU sec  ratio
T-Trains1          3134/2183       17.2       505/416      3.4      0.425
T-Trains2          5739/4325       37.3       3482/2749    27.3     0.425
T-Trains3          17,931/13,134   130.4      11,962/9401  105.1    0.425

Table XV: Plans generated/visited and CPU time (secs) for ucpop with and
without parameter domains in the "typed" trains domain using the ZLIFO strategy.

t-trains           without domains            with domains          Domain
problems           Plans           CPU sec    Plans        CPU sec  ratio
T-Trains1          3138/2412       31.5       1429/1157    14.5     0.425
T-Trains2          > 50,000        > 1035     > 50,000     > 1136   0.425
T-Trains3          > 50,000        > 976      > 50,000     > 962    0.425

Table XVI: Plans generated/visited and CPU time (secs) for ucpop with and
without parameter domains in the "typed" trains domain using the LCFR strategy.


Once again we began by sampling some problems from the ucpop test suite,
and these initial trials yielded results quite analogous to those for the
default ucpop strategy. We obtained no improvements for several easier problems
and significant improvements for harder ones (e.g., again close to a factor
of 2 for Rat-insulin). Noteworthy members of the latter category were Office5
and Office6 - recall that Office5 had shown little speedup with standard
ucpop and Office6 had been unsolvable. However, in view of the computational
expense of testing both ZLIFO and LCFR, we then decided to narrow our focus
to the trains world. As mentioned, the advantages of this world are its inherent
interest and relative complexity.

Tables XIII-XVI provide experimental results for the trains domain with the
S+OC/ZLIFO strategy and the S+OC/LCFR strategy, in each case with and without
parameter domains.

The results in Tables XIII and XIV show that using parameter domains can
still give very significant improvements in performance, over and above those
obtained through the use of better search strategies. For example, the use
of parameter domains provided an 11-fold speedup for Trains2 with the
S+OC/ZLIFO strategy. In this particular problem the speedup (on all metrics)
was the result of pruning 1482 plans (more than half of those generated)
during the search, and recognizing 305 unsafe conditions as redundant.
Evidently, the effect of this pruning is amplified by an order of magnitude
in the overall performance, because of the futile searches that are cut short.
Note that the speedups for Trains1-3 are

roughly comparable (within a factor of 2) to those obtained
for problems in the previous set with comparable initial difficulty (e.g.,
see Move-boxes-2 and Move-boxes-a in Table XII). This again points to a rather
consistent correlation between problem difficulty and the speedups obtainable
using parameter domains. The constant domain ratios are also compatible with
the more or less invariant speedups here, though this is of little import,
given the earlier results. For S+OC/LCFR the gains appear to be less, though
the single result showing a 3.5-fold speedup provides only anecdotal evidence
for such a conclusion. Trains2 and Trains3 remained too difficult for solution
by LCFR. Similar gains were observed for the S+OC/LC strategies, where the
best observed gain in the trains domain was a 1.7-fold speedup for Trains2.
In any case, all results confirm the effectiveness of the parameter-domains
technique.

Tables XV and XVI are again for the "typed" version of trains. In this case
parameter typing gave modest improvements in the absence of parameter domains,
and (in contrast with the results for Trains1 under the default search strategy)
significant deterioration in their presence. While we do not know how to
account for these results in detail, it seems clear that contrary effects
are involved. On the one hand, typing does tend to help in that it tends
to limit choices of parameter values to "sensible" ones. For example, a
precondition (engine ?eng) will be satisfiable only through use of *start*,
and the initial state will thus constrain ?eng to assume sensible values.
On the other hand, adding type-preconditions will tend to broaden the search
space, by adding further open conditions to the flaw list.

The lesson from the "typed" experiments appears to be that it is best not
to supply explicit type constraints on operator parameters, instead using
our automated method of calculating and updating domains to constrain parameter
bindings.

6. Conclusions and Further Work

We began by exploring some simple, domain-independent
improvements to search strategies in partial order planning, and then described
a method of using precomputed parameter domains to prune the search space.
We now summarize our conclusions about these techniques and then point to
promising directions for further work.

6.1 Improving Search

Our proposed improvements to search strategies were
based on the one hand on a carefully considered choice of terms in the A*
heuristic for plan selection, and on the other on a preference for choosing
open conditions that cannot be achieved at all or can be achieved in only
one way (with a default LIFO prioritization of other open conditions). Since
the plan refinements corresponding to uniquely achievable goals are logically
necessary, we have termed the latter strategy a zero-commitment strategy.
One advantage of this technique over other similar strategies is that it
incurs a lower computational overhead.

Our experiments based on modifications of ucpop indicate that our strategies
can give large improvements in planning performance, especially for problems
that are hard for ucpop (and its "relatives") to begin with. The best performance
was achieved when our strategies for plan selection and goal selection were
used in combination. In practical terms, we were able to solve nearly every
problem we tried from the ucpop test suite in a fraction of a second (except
for Fixit, which required 38.2 seconds), where some of these problems

previously required minutes or were unsolvable
on the same machine. This included a sufficient variety of problems to indicate
that our techniques are of broad potential utility.

Further, our results suggest that zero-commitment is best supplemented with
a LIFO strategy for open conditions achievable in multiple ways, rather than
a generalization of zero-commitment favoring goals with the fewest children.
This somewhat surprising result might be thought to be due to the way in
which the designer of a domain orders the preconditions of operators; i.e.,
the "natural" ordering of preconditions may correlate with the best planning
order, giving a fortuitous advantage to a LIFO strategy relative to a strategy
like LC.20

However, some preliminary experiments we performed with randomized preconditions
for T-of-H1 and Trains1 indicate otherwise. In 5 randomizations of the preconditions
of T-of-H1, both LC and ZLIFO were slowed down somewhat, by average factors
of 2.2 (2) and 3.3 (4.2) in terms of plans expanded (CPU time used) respectively.
(In both cases, S+OC was used for plan search.) This still left ZLIFO with
a performance advantage of a factor of 22 in terms of plans created and 39
in terms of CPU time. For Trains1 the performance of LC greatly deteriorated
in 2 out of 5 cases (by a factor close to 70 in terms of both plans and time),
while that of ZLIFO actually improved marginally. This now left ZLIFO with
an average performance advantage over LC (whereas it had been slightly slower
in the unrandomized case) - a factor of 3.3 in terms of plans and 6.7 in
terms of CPU time (though these values are very unreliable, in view of the
fact that the standard deviations are of the same order as the means).

Despite these results we believe that a satisfactory understanding of the
dependence of flaw-selection strategies on the order of operator preconditions
will require a more extensive experimental investigation. We are currently
undertaking this work.

6.2 Using Parameter Domains

We described an implemented, tractable algorithm
for precomputing parameter domains of planning operators, relative to given
initial conditions. We showed how to use the precomputed domains during the
planning process to prune non-viable actions and bogus threats, and how to
update them dynamically for maximum effect.

The idea of using precomputed parameter domains to constrain planning was
apparently first proposed in a technical report by Goldszmidt et al. (1994).
This contains the essential idea of accumulating domains by forward propagation
from the initial conditions. Though the report only sketches a single-sweep
propagation process from the initial conditions to the goals, the implemented
Rockwell Planner (RNLP) handles cyclic operator graphs, repeatedly propagating
bindings until quiescence, much as in our algorithm. Our algorithm deals
with the additional complexities of conditional effects and equalities (and
in semiautomated fashion with quantification) and appears to be more efficient
(Smith, 1996). Other distinctive features of our work are the method of incrementally
refining domains

20. This was suggested to us by David Smith as well as Mike Williamson.
Williamson tried ZLIFO with 5 randomized versions of T-of-H1, and reported a
large performance degradation (Williamson & Hanks, 1996). We recently ran
these versions using our implementation, obtaining far more favorable results
(three of the five versions were easier to solve than the original version of
T-of-H1, while the other two slowed down ZLIFO by factors of 1.84 and 4.86 in
terms of plans explored).

during planning, the theoretical analysis of our algorithm,
and the systematic experimental tests.

Another closely related study is that of Yang and Chan (1994), who used hand-supplied
parameter domains in planning much as we use precomputed domains. An interesting
aspect of their work is the direct use of sets of constants as variable bindings.
For instance, in establishing a precondition (P ?x) using an initial state
containing (P a), (P b) and (P c), they would bind ?x to {a, b, c} rather
than to a specific constant. They refine these "noncommittal" bindings during
planning much as we refine variable domains, and periodically use constraint
satisfaction methods to check their consistency with current EQ/NEQ constraints.
They conclude that delaying variable bindings works best for problems with
low solution densities (while degrading performance for some problems with
high solution densities), and that the optimal frequency of making consistency
checks depends on whether dead ends tend to occur high or low in the search
tree. Our work is distinguished from theirs by our method of precomputing
parameter domains, our use of specific bindings when matching initial conditions
to OCs, our use of parameter domains in threat detection and resolution,
and our handling of the enriched syntax of ucpop operators as compared to
snlp operators.

Judging from the examples we have experimented with, our techniques are well-suited
to nontrivial problems that involve diverse types of objects, relations and
actions, and significant logical interdependencies among the steps needed
to solve a problem. When used in conjunction with the default search strategy
of ucpop, our method gave significant speedups for nontrivial problems, reaching
a speedup factor of 927 in the trains transportation planning domain, and
more than 1717 for the hardest strips-world problem we tried. When combined
with our S+OC and ZLIFO search strategies, the parameter domain technique
still gave speedups by a factor of around 10 for some trains problems. Though
our implementation is aimed at a ucpop-style planner, essentially the same
techniques would be applicable to many other planners.

We also found the parameter domain precomputations to be a very useful debugging
aid. In fact, the domain precomputation for our initial formulation of the
trains world immediately revealed several errors. For instance, the domain
of the ?eng parameter of mv-engine turned out to contain oranges, bananas,
and an OJ factory, indicating the need for a type constraint on ?eng. (Without
this, transportation problems would have been solvable without the benefit
of engines and trains!) Another immediately apparent problem was revealed
by the parameter domains for ?city1 and ?city2 in mv-engine: the domain for
?city1 excluded Elmira, and that for ?city2 excluded Avon. The obvious diagnosis
was that we had neglected to assert both (connected c1 c2) and (connected
c2 c1) for each track connecting two cities. Furthermore, the parameter domains
can quickly identify unreachable operators and goals in some cases. For instance,
without the make-oj operator, the computed domains show that the ld-oj operator
is unreachable, and that a goal like (and (oj ?oj) (at ?oj Bath)) (getting
some orange juice to Bath) is unattainable (the parameter domain for ?oj
will be empty).
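
For concreteness, the repaired operator might look roughly as follows in ucpop
syntax; this is a sketch reconstructing the gist of the fix, not our exact
trains encoding.

;; Sketch of the repaired mv-engine operator in ucpop syntax; the
;; (engine ?eng) precondition is the type constraint whose absence the
;; precomputed parameter domains exposed.  Details are illustrative.
(define (operator mv-engine)
  :parameters (?eng ?city1 ?city2)
  :precondition (:and (engine ?eng)             ; the missing type constraint
                      (at ?eng ?city1)
                      (connected ?city1 ?city2))
  :effect (:and (at ?eng ?city2)
                (:not (at ?eng ?city1))))

;; The initial conditions must also assert connectivity in both
;; directions for each track, e.g. (connected avon bath) and
;; (connected bath avon); otherwise Elmira and Avon drop out of the
;; domains of ?city1 and ?city2.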

Of course, running the planner itself can also be used for debugging a formalization,
but planning is in general far more time-consuming than our form of preprocessing
(especially if the goal we pose happens to be unachievable in the formalization!),
and the trace of an anomalous planning attempt can be quite hard to interpret,
compared to a listing of parameter domains, obtained in a fraction of a second.

6.3 Further work

First of all, some additional experimentation would be of interest, to further
assess and perhaps refine our search strategies. Some of this experimentation
might focus on threat-handling strategies, including
the best general form of an attenuated UC-term in plan selection, and the
best way to combine threat selection with open condition selection. The preference
for definite threats over open conditions used by ZLIFO does appear to be
a good default according to our experience, but the TileWorld experiments
indicated that a re-ordering of priorities between threats and open conditions
is sometimes desirable. Concerning the choice of a UC-related term for inclusion
in the heuristic for plan selection, we should mention that we have briefly
tried using S+OC+UCd, where UCd is the number of definite threats, but did
not obtain significant uniform improvements.
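
As a point of reference, the heuristic variants just mentioned amount to
ranking plans by a simple sum. The sketch below is ours; plan-step-count,
plan-open-conditions, plan-threats, and definite-threat-p are placeholders
for the corresponding ucpop plan fields, not actual ucpop functions.

;; Sketch (ours) of the plan-ranking terms discussed in the text:
;; S+OC sums the number of steps and open conditions; S+OC+UCd adds
;; the number of definite threats.
(defun plan-rank (plan &key include-ucd)
  (+ (plan-step-count plan)
     (length (plan-open-conditions plan))
     (if include-ucd
         (count-if #'definite-threat-p (plan-threats plan))
         0)))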

One promising direction for further development of our search strategy is
to make the zero-commitment strategy apply more often by finding ways of
identifying false options as early as possible. That is, if a possible action
instance (obtained by matching an open condition against available operators
as well as against existing actions) is easily recognizable as inconsistent
with the current plan, then its elimination may leave us with a single remaining
match and hence an opportunity to apply the zero-commitment strategy.
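
A minimal sketch of this filtering step, under the assumption that some cheap
consistency test is available (the placeholder viable-p below, standing in
for, say, a parameter-domain or ordering check; it is not a ucpop function):

;; Sketch (ours) of forcing zero-commitment refinements: discard
;; establishers that a cheap test shows to be inconsistent with the
;; current plan; if exactly one candidate survives, the refinement is
;; forced and can be applied immediately.
(defun select-establishers (open-cond candidates plan)
  "Return the viable candidates and whether the choice is forced."
  (let ((viable (remove-if-not (lambda (c) (viable-p c open-cond plan))
                               candidates)))
    (values viable (= (length viable) 1))))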

One way of implementing this strategy would be to check at once, before accepting
a matched action as a possible way to attain an open condition, whether the
temporal constraints on that action force it to violate a causal link, or
alternatively, force its causal link to be violated. In that case the action
could immediately be eliminated, perhaps leaving only one (or even no) alternative.
This could perhaps be made even more effective by broadening the definition
of threats so that preconditions as well as effects of actions can threaten
causal links, and hence bring to light inconsistencies sooner. Note that
if a precondition of an action is inconsistent with a causal link, it will
have to be established with another action whose effects violate the causal
link; so the precondition really poses a threat from the outset.
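
The immediate check described above might be sketched as follows;
must-precede-p, necessarily-deletes-p, and the link accessors are placeholders
for the corresponding plan-structure operations, not ucpop functions, and the
symmetric test (the candidate's own causal link being violated) would be
analogous.

;; Sketch (ours) of the immediate check: reject a candidate action if
;; the ordering constraints force it between the producer and consumer
;; of some causal link whose protected condition it necessarily deletes.
(defun forced-violation-p (action plan)
  (some (lambda (link)
          (and (must-precede-p (link-producer link) action plan)
               (must-precede-p action (link-consumer link) plan)
               (necessarily-deletes-p action (link-condition link))))
        (plan-causal-links plan)))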

Two possible extensions to our parameter domain techniques are (i) fully
automated handling of universally quantified preconditions and effects, disjunctions
and facts in the preprocessing algorithm; and (ii) more "intelligent" calculation
of domains, by applying a constraint propagation process to the sets of ground
predications that have been matched to the preconditions of an operator;
this can be shown to yield tighter domains, though at some computational
expense. Blum and Furst (1995) recently explored a similar idea, but rather
than computing parameter domains, they directly stored sets of ground atoms
that could be generated by one operator application (starting in the initial
state), two successive operator applications, and so on, and then used these
sets of atoms (and exclusivity relations among the atoms and the actions
connecting them) to guide the regressive search for a plan. The algorithm
they describe does not allow for conditional effects, though this generalization
appears entirely possible. For the examples used in their tests, they obtained
dramatic speedups.


Finally, we are also working on another preprocessing
technique, namely the inference of state constraints from operator specifications.
One useful form of constraint is implicational (e.g., (implies (on ?x ?y)
(not (clear ?y)))), and another is single-valuedness conditions (e.g., (on
?x ?y) may be single-valued in both ?x and ?y). We conjecture that such constraints
can be tractably inferred and used for further large speedups in domain-independent,
well-founded planning.
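
As an illustration of how such a constraint could prune, the following sketch
(ours; the encoding of facts as simple lists is an assumption) tests a
single-valuedness condition over atoms taken to hold in the same state.

;; Sketch (ours) of pruning with a single-valuedness constraint: if
;; (on ?x ?y) is single-valued in ?y, two co-occurring facts (on a b)
;; and (on a c) with b /= c are contradictory, so any plan state
;; containing both can be discarded.
(defun sv-conflict-p (on-facts)
  "ON-FACTS: a list of (on x y) facts assumed to hold in one state."
  (loop for (fact . rest) on on-facts
        thereis (some (lambda (other)
                        (and (eq (second fact) (second other))      ; same ?x
                             (not (eq (third fact) (third other))))) ; different ?y
                      rest)))

;; (sv-conflict-p '((on a b) (on a c)))  =>  T    ; prune this state
;; (sv-conflict-p '((on a b) (on c d)))  =>  NIL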

In view of the results we have presented and the possibilities for further
speedups we have mentioned, we think it plausible that well-founded, domain-independent
planners may yet become competitive with more pragmatically designed planners.

Acknowledgements

This work amalgamates and extends two conference papers
on improving search (Schubert & Gerevini, 1995) and using computed parameter
domains (Gerevini & Schubert, 1996) to accelerate partial-order planners.
The research was supported in part by Rome Lab contract F30602-91-C-0010
and NATO Collaborative Research Grant CRG951285. Some of the work by AG was
carried out at IRST, 38050 Povo (TN), Italy, and at the CS Department of
the University of Rochester, Rochester NY USA. The helpful comments and perceptive
questions of Marc Friedman, David Joslin, Rao Kambhampati, Colm O'Riain,
Martha Pollack, David Smith, Dan Weld, Mike Williamson, and of Associate
Editor Michael Wellman and the anonymous reviewers are gratefully acknowledged.

Appendix A (Proofs of the Theorems)

Theorem 1 The find-parameter-domains algorithm is correct for computing parameter
domains of ucpop-style sets of operators (without quantification, disjunction,
or facts), initial conditions, and (possibly) goal conditions.

Proof. As a preliminary observation, the intersected parameter domains computed
iteratively by the algorithm eventually stabilize, since they grow monotonically
and there are only finitely many constants that occur in the initial conditions
and in operator effects. Thus the algorithm terminates.

In order to prove correctness we need to show that if there exists a valid
sequence A0 A1 ... An of actions (operator instances) starting with A0 = *start*,
and if An is an instance of the operator Op, then the bindings that the parameters
of Op received in instance An are eventually added to the relevant intersected
domains of Op (where "relevant" refers to the when-clauses of Op whose preconditions
are satisfied at the beginning of An). We prove this by induction on n.

If n = 0, then An = A0 = *start*, so there are no parameters and the claim
is trivially true.

Now assume that the claim holds for n = 1, 2, ..., k. Then consider any operator
instance Ak+1 that can validly follow A0 A1 ... Ak, i.e., such that Ak+1 is
an instance of an operator Op whose primary preconditions, possibly along
with some secondary ones, are satisfied at the end of A0 A1 ... Ak. Let p be
such a precondition, and write its instance in Ak+1 as (P c1 c2 ..). Then
(P c1 c2 ..) must be an effect of some Ai, where 0 <= i <= k. If i = 0, then
(P c1 c2 ..) holds in the initial state, and hence this predication is propagated
and successfully matched to p in the initial propagation phase of
find-parameter-domains. If i > 0, then Ai is an instance of some operator Op'
and (P c1 c2 ..) is the corresponding instance of some effect (P t1 t2 ..) of
Op', where each tj is either a parameter of Op' or is equal to cj. Diagrammatically,

     A0  . . .  Ai  . . .  Ak   Ak+1
                |               |
               Op'              Op

     effect (P t1 t2 ..)  --------->  precond p
            (P c1 c2 ..)              (P c1 c2 ..)

By the induction assumption, the bindings of the parameters in Ai are eventually
added to the relevant intersected domains of Op'. This also implies that
the intersected domains of Op' become nonempty, and so the effect (P t1 t2
..) is eventually propagated, where any variables among the tj have the corresponding
constant cj in the relevant intersected domain. Consequently, much as in
the case i = 0, effect (P t1 t2 ..) is successfully matched to precondition
p of Op at some stage of the propagation. Given these observations, it is
clear that for both i = 0 and i > 0, p will be marked "matched" in Op eventually,
and furthermore any parameters of Op that occur in p will have the bindings
resulting from the unification with (P c1 c2 ..) added to the appropriate
individual domains associated with p.

This argument applies to all preconditions of Op satisfied in its instance
Ak+1, in particular to all the primary preconditions. Since these are all
marked "matched", the algorithm will compute intersected domains for all
Op-parameters that occur in them. In view of the individual domain updates
just confirmed, and since individual domains grow monotonically, these intersected
domains will eventually contain the parameter bindings of Ak+1. For instance,
if a parameter ?x of Op occurs in a primary precondition and is bound to
c in Ak+1, we have shown that c will eventually be added to the intersected
domain of ?x associated with the primary when-clause of Op. If a parameter
does not occur in the primary preconditions of Op, then its intersected domain
is set to T at the outset, and this implicitly contains whatever binding
the parameter has in Ak+1.

A very similar argument can be made for any secondary when-clause of Op whose
preconditions are also satisfied for Ak+1. Again, all preconditions of the
secondary clause, as well as the primary preconditions, will be marked "matched",
and so for any parameter occurring in these combined preconditions, its intersected
domain (relative to the secondary clause) will be updated to include its
binding in Ak+1. For parameters of Op not occurring in any of these preconditions,
the intersected domains will again be set to T initially, and this implicitly
contains any possible binding. Finally, we note that since the intersected
domains relative to both primary and secondary when-clauses grow monotonically,
the augmentations of intersected domains that we have just confirmed are permanent.
(In the case of T-domains, these remain T.)

We leave some additional details concerned with the ultimate use of EQ-preconditions
in find-parameter-domains to the reader. □


Theorem 2 Algorithm find-parameter-domains can be implemented to run in
O(m n_p n_e (n_p + n_e)) time and O(m n_p) space in the worst case, where
m is the number of constants in the problem specification, n_p is the combined
number of preconditions for all operators (and goals, if included), and n_e
is the combined number of operator effects (including those of *start*).

Proof. The time complexity of find-parameter-domains can be determined as
the sum of (1) the cost of all the unifications performed, (2) the costs
of all the individual domain updates attempted, and (3) the cost of all the
intersected domain updates attempted. We estimate an upper bound for each
of these terms under the following assumptions:

(a) the unification of any operator effect with any operator precondition
requires constant time;

(b) there is a fixed upper bound on the number of arguments that a predicate
(in a precondition or effect) can have. It follows that O(n_e) is an upper
bound on the total number of intersected domains;21

(c) individual domains and intersected domains are stored in hash tables (indexed
by the constants in the domain). So, we can check whether an element belongs
to a particular (individual or intersected) domain, and possibly add it to
that domain, essentially in constant time. Furthermore, for each individual
and intersected domain, appropriate data structures are used to keep track
of the (possibly empty) set of new elements that have been added to the domain
in the last update attempt.

(1) For any particular intersected domain of any particular operator, there
can be at most m updates of this domain. Each such update causes all of the
effects of the when-clause to which the intersected domain belongs to be propagated.
An upper bound on this number is n_e. Each propagated effect may then be unified
with O(n_p) preconditions. Thus the O(m) updates of an intersected domain
may cause O(m n_e n_p) unifications. Hence from (b), the overall number of
unifications caused by the propagation of intersected domains to individual
domains is O(m n_e^2 n_p). To these unifications we have to add those which
are initially performed between the effects of *start* and the preconditions
of the operators. There are O(m n_p) such unifications, and so they do not
increase the previous upper bound on the number of unifications. Thus, from
(a), the cost of all of the unifications performed by the algorithm is
O(m n_e^2 n_p).

(2) Each unification is potentially followed by an attempt to update the
individual domain(s) of the relevant parameter(s). However, with assumption
(c) the number of such attempts is limited to those where the set of new
elements in the intersected domain(s) of the unifying effect is not empty.
Furthermore, when we attempt to update an individual domain D_I by performing
the union of a relevant intersected domain D_i and D_I, only the new elements
of D_i need to be added to D_I (if they are not already there). Thus, since
any intersected domain grows monotonically, from (b) and (c) we have that
the overall cost of all the update attempts for one particular individual
domain caused by one particular effect is O(m). But in the worst case one
effect can unify with all the O(n_p) preconditions of all the operators,
yielding an overall bound on all of the attempts to update the individual
domains of O(m n_e n_p).

21. Note that if a parameter appears in a precondition of a when-clause, but
in none of its effects, then the intersected domain of the parameter will not
be propagated by the algorithm. Hence in implementing the algorithm we can
ignore such parameters.



(3) There can be an attempt to update a particular intersected domain for
each relevant individual domain update, and each relevant individual domain
can be updated O(m) times (because the domains grow monotonically). Therefore
from (b) there are at most O(m n_p) attempts to update one intersected domain.
By (c) the total cost of these attempts is O(m n_p^2), because checking whether
a new element of an individual domain belongs to all the other O(n_p) relevant
individual domains takes O(n_p) time. So, since from (b) there are no more
than O(n_e) intersected domains, the total cost incurred by the algorithm
for updating all of the intersected domains is O(m n_e n_p^2).

It follows that the time complexity of find-parameter-domains is:

O(m n_e^2 n_p) + O(m n_e n_p) + O(m n_e n_p^2) = O(m n_p n_e (n_p + n_e)).

The space complexity bound is easily derived from (b), and from the fact that
for each when-clause at most O(m) constants are stored. □

References

Allen, J., & Schubert, L. (1991). The TRAINS project. Tech. rep. 382, Dept.
of Computer Science, Univ. of Rochester, Rochester, NY. Also slightly revised
as Language and discourse in the TRAINS project, in A. Ortony, J. Slack, and
O. Stock (Eds.), Communication from an Artificial Intelligence Perspective:
Theoretical and Applied Issues, Springer-Verlag, Heidelberg, pp. 91-120.

Allen, J., Schubert, L., Ferguson, G., Heeman, P., Hwang, C., Kato, T., Light,
M., Martin, N., Miller, B., Poesio, M., & Traum, D. (1995). The TRAINS project:
A case study in building a conversational planning agent. Journal of Experimental
and Theoretical Artificial Intelligence, 7, 7-48.

Barrett, A., Golden, K., Penberthy, S., & Weld, D. (1994). UCPOP user's manual.
Tech. rep. 93-09-06, Dept. of Computer Science and Engineering, University
of Washington, Seattle, WA 98105.

Blum, A., & Furst, M. (1995). Fast planning through planning graph analysis.
In Proceedings of the Fourteenth International Joint Conference on Artificial
Intelligence (IJCAI-95), pp. 1636-1642, Montreal, Canada. Morgan Kaufmann.

Chapman, D. (1987). Planning for conjunctive goals. Artificial Intelligence,
32 (3), 333-377.

Currie, K., & Tate, A. (1991). O-Plan: The open planning architecture. Artificial
Intelligence, 51 (1).

Fikes, R., & Nilsson, N. (1971). STRIPS: A new approach to the application
of theorem proving to problem solving. Artificial Intelligence, 2, 189-208.

Georgeff, M., & Lansky, A. (1987). Reactive reasoning and planning. In Proceedings
of the Sixth National Conference of the American Association for Artificial
Intelligence (AAAI-87), pp. 677-682, Seattle, WA. Morgan Kaufmann.

Gerevini, A., & Schubert, L. K. (1996). Computing parameter domains as an
aid to planning. In Proc. of the 3rd Int. Conf. on Artificial Intelligence
Planning Systems (AIPS-96), pp. 94-101, Menlo Park, CA. The AAAI Press.

Goldszmidt, M., Darwiche, A., Chavez, T., Smith, D., & White, J. (1994).
Decision-theory for crisis management. Tech. rep. RL-TR-94-235, Rome Laboratory.

Green, C. (1969). Application of theorem proving to problem solving. In Proceedings
of the First International Joint Conference on Artificial Intelligence (IJCAI-69),
pp. 219-239.

Joslin, D. (1995). Personal communication.

Joslin, D., & Pollack, M. (1994). Least-cost flaw repair: A plan refinement
strategy for partial-order planning. In Proceedings of the Twelfth National
Conference of the American Association for Artificial Intelligence (AAAI-94),
pp. 1004-1009, Seattle, WA. Morgan Kaufmann.

Kambhampati, S., Knoblock, C. A., & Yang, Q. (1995). Planning as refinement
search: A unified framework for evaluating design tradeoffs in partial-order
planning. Artificial Intelligence, Special Issue on Planning and Scheduling,
76 (1-2).

Korf, R. (1992). Linear-space best-first search: Summary of results. In Proceedings
of the Tenth National Conference of the American Association for Artificial
Intelligence (AAAI-92), pp. 533-538.

McAllester, D., & Rosenblitt, D. (1991). Systematic nonlinear planning. In
Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91),
pp. 634-639, Anaheim, CA. Morgan Kaufmann.

Nilsson, N. (1980). Principles of Artificial Intelligence. Tioga Pub. Co.,
Palo Alto, CA.

Penberthy, J., & Weld, D. (1992). UCPOP: A sound, complete, partial order
planner for ADL. In Nebel, B., Rich, C., & Swartout, W. (Eds.), Proceedings
of the Third International Conference on Principles of Knowledge Representation
and Reasoning (KR92), pp. 103-114, Boston, MA. Morgan Kaufmann.

Peot, M. A., & Smith, D. E. (1993). Threat-removal strategies for partial-order
planning. In Proceedings of the Eleventh National Conference of the American
Association for Artificial Intelligence (AAAI-93), pp. 492-499, Washington,
D.C. Morgan Kaufmann.

Schubert, L., & Gerevini, A. (1995). Accelerating partial order planners by
improving plan and goal choices. In Proc. of the 7th IEEE Int. Conf. on Tools
with Artificial Intelligence, pp. 442-450, Herndon, Virginia. IEEE Computer
Society Press.

Smith, D. E., & Peot, M. A. (1993). Postponing threats in partial-order planning.
In Proceedings of the Eleventh National Conference of the American Association
for Artificial Intelligence (AAAI-93), pp. 500-506, Washington, D.C. Morgan
Kaufmann.

Smith, D. E. (1996). Personal communication.

Srinivasan, R., & Howe, A. (1995). Comparison of methods for improving search
efficiency in a partial-order planner. In Proceedings of the Fourteenth
International Joint Conference on Artificial Intelligence (IJCAI-95), pp.
1620-1626.

Weld, D. (1994). An introduction to least commitment planning. AI Magazine,
15 (4), 27-62.

Wilkins, D. (1988). Practical Planning: Extending the Classical AI Planning
Paradigm. Morgan Kaufmann, San Mateo, CA.

Williamson, M., & Hanks, S. (1996). Flaw selection strategies for value-directed
planning. In Proceedings of the Third International Conference on Artificial
Intelligence Planning Systems (AIPS-96), pp. 237-244.

Yang, Q., & Chan, A. Y. M. (1994). Delaying variable binding commitments in
planning. In Proceedings of the Second International Conference on Artificial
Intelligence Planning Systems (AIPS-94), pp. 182-187.