### An Approach to Temporal Planning and Scheduling in Domains with Predictable Exogenous Events

A. Gerevini, A. Saetti and I. Serina

Volume 25, 2006

Links to Full Text:

Abstract:
The treatment of exogenous events in planning is practically important in many real-world domains where the preconditions of certain plan actions are affected by such events. In this paper we focus on planning in temporal domains with exogenous events that happen at known times, imposing the constraint that certain actions in the plan must be executed during some predefined time windows. When actions have durations, handling such temporal constraints adds an extra difficulty to planning. We propose an approach to planning in these domains which integrates constraint-based temporal reasoning into a graph-based planning framework using local search. Our techniques are implemented in a planner that took part in the 4th International Planning Competition (IPC-4). A statistical analysis of the results of IPC-4 demonstrates the effectiveness of our approach in terms of both CPU-time and plan quality. Additional experiments show the good performance of the temporal reasoning techniques integrated into our planner.

Extracted Text

Journal of Artificial Intelligence Research 25 {2006} 187-231 Submitted 03/05;
published 02/06An Approach to Temporal Planning and Scheduling in
Domains with Predictable Exogenous Events
Alfonso Gerevini gerevini@ing.unibs.it
Alessandro Saetti saetti@ing.unibs.it Ivan Serina serina@ing.unibs.it Dipartimento
di Elettronica per l'Automazione
Universit022a degli Studi di Brescia
Via Branze 38, I-25123 Brescia, Italy
Abstract
The treatment of exogenous events in planning is practically important in
many real-
worlddomains where the preconditions of certain plan actions are affected
by such events.
Inthis paper we focus on planning in temporal domains with exogenous events
that happen
at known times, imposing the constraint that certain actions in the plan
must be executed
during some predefined time windows. When actions have durations, handling
such tem-
poral constraints adds an extra difficulty to planning. We propose an approach
to planning
in these domains which integrates constraint-based temporal reasoning into
a graph-based
planning framework using local search. Our techniques are implemented in
a planner that
took part in the 4th International Planning Competition {IPC-4}. A statistical
analysis
of the results of IPC-4 demonstrates the effectiveness of our approach in
terms of both CPU-time and plan quality. Additional experiments show the
good performance of the
temporal reasoning techniques integrated into our planner.
1. Introduction
In many real-world planning domains, the execution of certain actions can
only occur during
some predefined time windows where one or more necessary conditions hold.
For instance,
a car can be refueled at a gas station only when the gas station is open,
or a space telescope
can take a picture of a certain planet region only when this region is observable.
The truth
of these conditions is determined by some exogenous events that happen at
known times,
and that cannot be influenced by the actions available to the planning agent
{e.g., the closing of the gas station or the planet movement}.
Several frameworks supporting action durations and time windows have been
proposed
{e.g., Vere, 1983; Muscettola, 1994; Laborie & Ghallab, 1995; Schwartz &
Pollack, 2004; Kavuluri & U, 2004; Sanchez, Tang, & Mali, 2004}. However,
most of them are domain-
dependent systems or are not fast enough on large-scale problems. In this
paper, we propose a new approach to planning with these temporal features,
integratingconstraint-based temporal reasoning into a graph-based planning
framework.
The last two versions of the domain definition language of the International
plan-
ning competition {IPC} support action durations and predictable {deterministic}
exogenous
events {Fox & Long, 2003; Edelkamp & Hoffmann, 2004}. In PDDL2.1, predictable
exoge-
nous events can be implicitly represented {Fox, Long, & Halsey, 2004}, while
in PDDL2.2
they can be explicitly represented through timed initial literals, one of
the two new PDDLc
fl2006 AI Access Foundation. All rights reserved.Gerevini, Saetti & Serinafeatures
on which the 2004 competition {IPC-4} focused. Timed initial literals are
specified
in the description of the initial state of the planning problem through
assertions of the form
\{at t L}", where t is a real number, and L is a ground literal whose predicate
does not
appear in the effects of any domain action. The obvious meaning of {at t
L} is that L is
true from time t. A set of these assertions involving the same ground predicate
defines a
sequence of disjoint time windows over which the timed predicate holds.
An example in the
well-known \ZenoTravel" domain {Penberthy, 1993; Long & Fox, 2003a} is
{at 8 {open-fuelstation city1}) {at 12 {not {open-fuelstation city1})}
{at 15 {open-fuelstation city1}) {at 20 {not {open-fuelstation city1})}.
These assertions define two time windows over which {open-fuelstation city1}
is true,
i.e., from 8 to 12 {excluded} and from 15 to 20 {excluded}. A timed initial
literal is relevant to the planning process when it is a precondition of
a domain action, which we call a timed
pr e c ondition of the action. Each timed precondition of an action can
be seen as a temporal scheduling constraint for the action, defining the
feasible time window{s} when the action
can be executed. When actions in a plan have durations and timed preconditions,
computing
a valid plan requires planning and reasoning about time to be integrated,
in order to check whether the execution of the planned actions can satisfy
their scheduling constraints. If an
action in the plan cannot be scheduled, then the plan is not valid and it
must be revised.
The main contributions of this work are: {i} a new representation of temporal
plans with action durations and timed preconditions, called Temp or al ly-Disjunctive
Action Graph,
{TDA-graph} integrating disjunctive constraint-based temporal reasoning
into a recent
graph-based approach to planning; {ii} a polynomial method for solving the
disjunctive tem- poral reasoning problems that arise in this context; {iii}some
new local search techniques
to guide the planningprocess using our representation; and{iv} an experimental
analysis
evaluating the performance of our methods implemented in a planner called
lpg-td, which
took part in IPC-4 showing very good performance in many benchmark problems.
The \td" extension in the name of our planner is an abbreviation of \timed
initial literals and derived predicates", the two main new features of PDDL2.2.
1 In lpg-td, the techniques for handling timed initial literals are quite
different from the techniques for handling derived
predicates. The first ones concern representing temporal plans with predictable
exogenous
events and fast temporal reasoning for action scheduling during planning;
the second ones
concern incorporating a rule-based inference system for efficient reasoning
predicates during planning. Both timed initial literals and derived predicates
require to
change the heuristics guiding the search of the planner, but in a radically
different way. In
this paper, we focus on timed initial literals, which are by themselves
a significant and useful
extension to PDDL2.1. Moreover, ananalysis of the results of IPC-4 shows
that lpg-td was
top performer in the benchmark problems involving this feature. The treatment
of derived
predicates in lpg-td is presented in another recent paper {Gerevini et al.,
2005b}.1. Derived predicates allow us to express in a concise and natural
way some indirect action effects. Infor- mally, they are predicates which
do not appear in the effect of any action, and their truth is determined
by some domain rules specified as part of the domain description.188An Approach
to Temporal Planning and SchedulingThe paper is organized as follows. In
Section 2, after some necessary background, we
introduce the TDA-graph representation and a method for solving the disjunctive
temporal
reasoning problems that arise in our context. In Section 3, we describe
some new local
search heuristics for planning in the space of TDA-graphs. In Section 4,
we present the experimental analysis illustrating the efficiency of our approach.
In Section 5, we discuss
some related work. Finally, in Section 6 we give the conclusions.
2. Temporally Disjunctive Action Graph
Like in partial-order causal-link planning, {e.g., Penberthy & Weld, 1992;
McAllester & Rosenblitt, 1991; Nguyen& Kambhampati, 2001}, inour framework
we search in a space
of partial plans. Each search state is a partial temporal plan that we represent
by a
Temporally-Disjunctive Action Graph {TDA-graph}. A TDA-graph is an extension
of the linear action graph representation {Gerevini, Saetti, & Serina, 2003}
which integrates dis- junctive temporal constraints for handling timed initial
literals. A linear action graph is
a variant of the well-known planning graph {Blum & Furst, 1997}. In this
section, after
some necessary background on linear action graphs and disjunctive temporal
constraints,
we introduce TDA-graphs, and we propose some techniques for temporal reasoning
in the
context of this representation that will be used in the next section.
2.1 Background: Linear Action Graph and Disjunctive Temporal Constraints
A linear action graph {LA-graph} A for a planningproblem 005 is a directed
acyclic leveled
graph alternating a fact level, and an action level. Fact levels contain
fact nodes, each of
which is labeled by a ground predicate of 005. Each fact node f at a level
l is associated
with a no-op action node at level l representing a dummy action having the
predicate of f
as its only precondition and effect. Each action level contains one action
node labeled by
the name of a domain action that it represents, and the no-op nodes corresponding
to that level.
An action node labeled a at a level l is connected by incoming edges from
the fact nodes at level l representing the preconditions of a {pre c ondition
nodes}, andby outgoing edges to the fact nodes at level l + 1 representing
the effects of a {effect nodes}. The initial level
contains the special action node a
start , and the last level the special action node a
end
. The
effect nodes of a start
represent the positive facts of the initialstate of 005, and the precondition
nodes of a end
the goals of 005. A pair of action nodes {possibly no-op nodes} can be
constrained by a persistent mutex
relation {Fox & Long, 2003}, i.e., a mutually exclusive relation holding
at every level of the
graph, imposing that the involved actions can never occur in parallel in
a valid plan. Such
relations can be efficiently precomputed using an algorithm that we proposed
in a previous work {Gerevini et al., 2003}. An LA-graph A also contains a
set of ordering constr aints between actions in the {par-
tial} plan represented by the graph. These constraints are {i} constraints
imposed during
search to deal with mutually exclusive actions: if an action a at level
l of A is mutex with
an action node b at a level after l, then a is constrained to finish before
the start of b; {ii}
constraints between actions implied by the causal structure of the plan:
if an action a is189Gerevini, Saetti & Serinaused to achieve a precondition
of an action b, then a is constrained to finish before the start
of b.
The effects of an action node can be automatically propagated to the next
levels of
the graph through the corresponding no-ops, until there is an interfering
{mutex} action
\blocking" the propagation, or the last level of the graph has been reached
{Gerevini et al.,
2003}. In the rest of the paper, we assume that the LA-graph incorporates
this propagation. A Disjunctive Temp or al Problem {DTP} {Stergiou & Koubarakis,
2000; Tsamardinos
& Pollack, 2003} is a pair hP ; C i, where P is a set of time point variables,
C is a set of disjunctive constraints c 1
_ 001 001 001 _ c
n
, c i
is of form y i
000 x
i
024 k
i , x
i
and y
i
are in P , and k
i
is a real number {i = 1:::n}. When C contains only unary constraints, the
DTP is called Simple
Temp or al Problem {STP} {Dechter, Meiri,& Pearl, 1991}.
A DTP is consistent if and only if the DTP has a solution. A solution of
a DTP is an assignment of real values to the variables of the DTP that is
consistent with every constraint in the DTP. Computing a solution for a DTP
is an NP-hard problem {Dechter et al., 1991},
while computing a solution of an STP can be accomplished in polynomial time.
Given
an STP with a special \start time" variable s preceding all the others,
we can compute a
solution of the STP where each variable has the shortest possible distance
from s in O{n 001 c}
time, for n variables and c constraints in the STP {Dechter et al., 1991;
Gerevini & Cristani,
1997}. We call such a solution an optimal solution of the STP. Clearly,
a DTP is consistent if and only if we can choose from each constraint in
the DTP a disjunct obtaining a consistent
STP, and any solution of such an STP is also a solution of the original
DTP.
Finally , an STP is consistent if and only if the distance graph of the
STP does not
contain negative cycles {Dechter et al., 1991}. The distance graph of an
STP hP ; C i is a
directed labeled graph with a vertex labeled p for each p 2 P , and with
an edge from v 2 P to w 2 P labeled k foreach constraint w 000 v 024 k
2C . 2.2 Augmenting the LA-graph with Disjunctive Temporal Constraints Let
p be a timed precondition over a set W {p} of time windows. In the following,
x
000
and x
+ indicate the start time and end time of x, respectively, where x is either
a time window or an action. Moreover, a
l
indicates an action node at level l of the LA-graph under consideration.
For clarity of presentation, we will describe our techniques focusing on
action preconditions
that must hold during the whole execution of the action {except at the end
point of the
action}, and on operator effects that hold at the end of the action execution,
i.e., on PDDL
conditions of type \over all", and PDDL effects of type \at end" {Fox
& Long, 2003}.
2 In order to represent plans where actions have durationsand time windows
for their execution, we augment the ordering constraints of an LA-graph with
{i} action duration
c onstr aints and {ii} action scheduling constr aints. Duration constraints
have form a
+
000 a
000
= Dur{a};
where Dur{a} denotes the duration of an action a {for the special actions
a
start
and a end
,
we have Dur{a
start
} = Dur{a
end } = 0, sincea
000 start
= a
+
start
and a 000
end
= a
+
end
}. Duration
constraints are supported by the representation proposed in a previous work
{Gerevini2. Our methods and planner support all the types of operator condition
and effect that can be specified in
PDDL 2.1 and 2.2.190An Approach to Temporal Planning and Schedulinga
endp 6{000}{0}p
1p 1p 1p 1p 1{0}p
2a
1p 5p 5p 5a 3{50}{50}{50}[15]{90}Goal levelLevel 1Level 2Level 3{0}a startmutexpp
7p
8p
9p
9p
9a
2p
4p
4p
4{0}{0}{70}{70}{70}{70}p
8p
8{70}{70}{70}{0}[70]{0}p
9[50]{0}{75}{70}p 3mutexp
3p
3p
3{0}{0}{90}{000}p
10a
30a
1a
225507590a
starta
endp125pFigure 1:An example of LA-graph with nodes labeled by T -values
{in round brackets},
and the Gantt chart of the actions labeling the nodes of the LA-graph. Square
nodes are action nodes; circle nodes are fact nodes. Action nodes are also
marked
by the duration of the represented actions {in square brackets}. Unsupported
precondition nodes are labeled \({}". Dashed edges form chains of no-ops
blocked
by mutex actions. Grey areas in the Gantt chart represent the time windows
for
the timed precondition p of a
3
.et al., 2003}, while the representation and treatment of scheduling constraints
are a ma jor
contribution of this work.
Let 031 bethe plan represented by an LA-graph A. It is easy to see that
the set C formed
by the ordering constraints in A and the duration constraints of the actions
in 031 can be
encoded into an STP. For instance, if a
i
2031 is used to support a precondition node of a
j
,
then a
+
i
000 a
000
j
024 0 is in C ; ifa
i
and a j
are two mutex actions in 031, and a
i
is ordered before a
j ,
then a
+ i
000 a
000 j
024 0 is in C . Moreover, for every action a 2 031, the following STP-constraints
are in C : a
+
000 a
000
024 Dur{a}; a
000 000 a
+
024 000Dur{a};
which are equivalent to a
+
000 a
000
= Dur{a}. A scheduling constraint imposes the constraint
thatthe execution of an action must occur during the time windows associated
with a timed
precondition of the action. Syntactically, it is a disjunctive constraint
c 1
_ 001 001 001 _ c
n
, where c
i
isof the form {y
006
i 000 x
006
i
024 h
i }^ {v
006 i
000 u
006 i
024 k
i
};
u
006
i
; v
006
i
; x
006
i
; y
006
i
are action start times or action end times, and h i
; k
i 2 R . For every action
a 2 031 with a timed precondition p, the following disjunctive constraint
is added to C :191Gerevini, Saetti & Serina_
w2W {p}
000000
a
+
start
000 a
000
024 000w
000
001 ^
000
a
+
000 a
+ start
024 w
+
001001
:
3Definition 1A temporally disjunctive action graph {TDA-graph} is a 4-tuple
hA; T ; P ; C i where
* A is a linear action graph;
* T is an assignment of re al values to the nodes of A;
* P is the set of time point variables corr esp onding to the start times
and the end times of the actions labeling the action nodes of A;
* C is a set of ordering constr aints, duration constr aints and scheduling
constr aints
involving variables in P .
A TDA-graph hA; T ; P ; C i represents the {partial} plan formed by the
actions labeling
the action nodes of A with start times assigned by T . Figure 1 gives the
LA-graph and T -values of a simpleTDA-graph containing five action nodes
{a
start
; a 1
; a
2 ; a
3
; a end
} and
several fact nodes representing ten facts. The ordering constraints and
duration constraints
in C are:
4 a
+
1
000 a
000
3
024 0; a
+
2
000 a
000 3
024 0,
a +
1
000a 000
1
= 50; a
+
2
000 a
000
2 = 70; a
+ 3
000 a
000 3
= 15.
Assuming that p is a timed precondition of a 3
with windows [25; 50} and [75; 125}, theonly
scheduling constraint in C is:
{(a
+ start
000 a
000
3
024 00025} ^ {a
+ 3
000 a
+ start
024 50}) _ {(a
+
start
000 a
000
3
024 00075} ^ {a
+ 3
000 a
+ start
024 125}):
The pair hP ; C i defines a DTP D. 5
Let D
s be the set of scheduling constraints in D.
We have that D represents a set 002 of STPs, each of which consists of
the constraints in D 000 D
s
and one disjunct {pair of STP-constraints} for each disjunction in a subset
D 0
s
of
D
s
{D
0
s
022 D
s
}. We call a consistent STP in 002 an induce d STP of D. When an induced
STP contains a disjunct for every disjunction in D
s
{i.e., D
0
s
= D
s
}, we say that such a
{consistent} STP is a complete induce d STPof D.
The values assigned by T to the action nodes of A are the action start times
correspond- ing to an optimal solution of an induced STP. We call these start
times a schedule of the
actionsin A. The T value labeling a fact node f of A is the earliest time
t = T
a
+ Dur{a}3. Note that, if p is an over all timed condition of an action a,
then the end of a can be the time when an
exogenous event making p false happens, because in PDDL p is not required
to be true at the end of a
{Fox & Long, 2003}. 4. For brevity, in our examples we omit the constraints
a +
start
000 a
000
i
024 0 and a
+ i
000 a
000
end
024 0, for each action
a
i , as well as the duration constraints of a
start
and a end
, which have duration zero.
5. The disjunctive constraints in C are not exactly in DTP-form. However,
it is easy to see that every disjunctive constraint in C can be translated
into an equivalent conjunction of constraints in exact DTP- form. We use
our more compact notation for clarity and efficiency reasons.192An Approach
to Temporal Planning and Schedulingsuchthat a supports f in A, and a starts
at T
a
. If the induced STP from which we derive a
schedule is incomplete, then T may violate the scheduling constraint of
some action nodes, that we say are unschedule d in the current TDA-graph.
The following definitions present the notions of optimality for a complete
induced STP
and of optimal schedule, which will be used in the next section.Definition
2Givena DTP D with a point variable p, a complete induce d STP of D is an
optimal induced STP of D for p iff it has a solution assigning to p a value
that is less
than or equal to the value assigned to p by every solution of every other
complete induce d
STP of D.Definition 3Given a DTP D of a TDA-graph G , an optimal schedule
for the actions
inG is an optimal solution of an optimal induce d STP of D for a
000
end
.
Note that an optimal solution minimizes the makespan of the represented
{possibly partial} plan. The DTP D of the previous example {Figure 1} has
two inducedSTPs: one
with no time window for p {S
1
}, and one including the pair of STP-constraints imposing the
time window [75; 125} to p {S 2
}. The STP obtained by imposing the time window [25; 50}
to p is not an induced STP of the DTP, because it is not consistent. S
1
is a partial induced
STP of D, while S
2
is complete and optimal for the start time of a
end . The temporal values derived from the optimal solution of S
2
that are assigned by T to the action nodes of the TDA-graph are: a
000 start
= a
+
start
= 0, a
000
1
= 0, a
000
2
= 0, a
000
3
= 75, a
000
end
= a
+
end = 90.
2.3 Solving the DTP of a TDA-graph
In general, computing a complete induced STP of a DTP {if it exists} is
an NP-hard problem
that can be solved by a backtracking algorithm {Stergiou & Koubarakis, 2000;
Tsamardinos
& Pollack, 2003}. However, given the particular structure of the temporal
constraints forming a TDA-graph, we show that this task can be accomplished
in polynomial time with
a backtrack-free algorithm. Moreover, the algorithm computes an optimal
induced STP for
a
000
end .
In the following, we assume that each time window for a timed precondition
is no shorter
than the duration of its action {otherwise, the time window should be removed
from those
available for this precondition and, if no time window remains, then the
action cannot be used in any valid plan}. Moreover, withoutloss of generality,
we can assume that each
action has at most one timed precondition. It is easy to see that we can
always replace a
set of over all timed conditions of an action a with a single equivalent
timed precondition, whose time windows are obtained by intersecting the windows
forming the different original
timed conditions of a. Also a set of at start timed conditions and a set
of at end timed
conditions can be compiled into single equivalent timed preconditions. This
can be achieved
by translating these conditions into conditions of type over all. The idea
is similar to the
one presented by Edelkamp {2004}, with the difference that we can have more
than one
time window associated with a timed condition, while Edelkamp assumes that
each timed condition is associated with a unique time window. Specifically,
every at start timed
condition p of an action a can be translated into an equivalent timed condition
p
0
of type over all by replacing the scheduling constraint of p,193Gerevini,
Saetti & Serina04050100120150356080180pprxrDur{a}Dur{a}Dur{a}qFigure 2:An
example of a set of timed conditions compiled into a single timed precondi-
tion {x}. The solid boxes represent the time windows associated with the
timed
conditions p {of type at start}, q {of type at end}, and r {of type over
all} of
an action a. A solid box extended by a dashed box indicates the extension
of the time window in the translation of the corresponding timed condition
into an over all timed condition for a._
w2W {p}
000000 a
+
start 000 a
000
< 000w
000
001
^
000
a
000
000 a +
start
< w +
001001
; forcing a
000
to occur during one or more time windows, with
_
w2W {p}
000000
a
+
start
000 a
000
< 000w
000
001 ^
000
a +
000 a
+
start
< w +
+ Dur{a}
001001
: 6
Similarly, every at end timed condition p can be translated into an equivalent
over all
timed condition by replacing the scheduling constraint _
w2W {p}
000000
a +
start
000 a
+
< 000w
000
001
^
000
a
+
000 a
+ start
< w
+ 001001
;
forcing a
+
to occur during one or more time windows, with
_ w2W {p} 000000
a
+ start
000 a
000
< 000w
000
+ Dur{a}
001
^
000
a
+
000 a
+
start < w
+
001001 :
Clearly, this translation of the timed conditions of each domain action
into a single timed
precondition for the action can be accomplished by a preprocessing step
in polynomial time.
Figure 2 shows an example. Assume that action a has duration 20 and timed
conditions
p of type at start, q of type at end and r of type over all. Let [0; 50}
and [100; 150} be the time windows of p, [35; 80} the time window of q, and
finally [40; 60} and [120; 180} the
time windows of r. We can compile these timed conditions into a new timed
condition x
withthe time window [40; 60}.6. Note that for timed conditions of type at
start and at end we need to use \<" instead of \024". However,
the properties and algorithms for STPs can be easily generalized to STPs
extended with <-constraints
{e.g., Gerevini & Cristani, 1997}.194An Approach to Temporal Planning and
SchedulingSolve-DTP{X; S }Input : The set X of meta-variables in the meta
CSP of a DTP, a partial solution S of the meta CSP;Output: Either a solution
of the meta CSP or fail.1. if X = ; then stop and return S ;
2. x   SelectVariable{X }; X
0
X 000 fxg;
3. while D{x} 6= ; do
4. d   SelectValue{D{x});
5. S
0
S [fx   dg; D{x}   D{x}000 fdg;
6. D
0
{x}   D{x}; /* Saving the domain values */
7. if ForwardCheck-DTP{X 0
; S
0 } then
8. Solve-DTP{X
0
; S 0
};
9. D{x}   D
0 {x}; /* Restoring the domain values */
10. return fail; /* backtracking */ ForwardCheck-DTP{X; S }Input : The set
X of meta-variables, a {partial} solution S ;Output: Either true or false.1.
forall x 2 X do
2. forall d 2 D{x} do
3. if not Consistency-STP{S [ fx   dg} then 4. D{x}   D{x} 000 fdg;
5. if D{x}= ; then return false; /* dead-end */
6. return true.Figure 3:Basic algorithm for solving a DTP. D{x} is a global
variable whose value is the current domain of the meta-variable x. Consistency-STP{S
} returns true, if the STP formed by the variable values in the {partial}
solution S has a solution, false
otherwise.As observed by Stergiou and Kourbarakis {2000} and Tsamardinos
and Pollack {2003},
a DTP can be seen as a \meta CSP": the variables of the meta CSP are the
constraints of the original CSP, and the values of these {meta} variables
are the disjuncts forming
the constraints of the original CSP. The constraints of the meta CSP are
not explicitly
stated. Instead, they are implicitlydefined as follows: an assignment 022
of values to the
meta-variables satisfies the constraints of the meta CSP iff 022 forms
a consistent STP {an
inducedSTP of the DTP}. A solution of the meta CSP is a complete induced
STP of the
DTP.
Figure 3 shows an algorithm for solving the meta CSP of a DTP {Tsamardinos
& Pollack, 2003}, whichis a variant of the forward-checking backtracking
algorithm for solving
general CSPs. By appropriately choosing the next meta-variable to instantiate
{function SelectVariable} and its value {function SelectValue}, we can show
that the algorithm finds a solution with no backtr acking {if one exists}.
Moreover, by a simple modification of Solve-195Gerevini, Saetti & SerinaDTP,
we can derive an algorithm that is backtrack free even when the input meta
CSP has
no solution. This can be achieved by exploiting the information in the LA-graph
A of the TDA-graph to decompose its DTP D into a sequence of \growing DTPs"
D
1
032 D 2
032 ::: 032 D last
= D
where {i} last is the number of the levels in A, {ii} the variables V
i
of D
i
{i = 1::last } are
all the variables of D correspondingto the action nodes in A up to level
i, and {iii} the constraints of D
i are all the constraints of D involvingonly the variables in V
i
. E.g., for the
DTP of Figure 1, the point variables of D
3 are a
+
start , a
000
1 , a
+
1 , a
000
2 , a
+
2 , a
000
3 , a
+
3 , and the set
of constraints D
3
is
f a
+
1
000 a
000
3
024 0, a
+
2
000 a
000 3
024 0, a +
1
000 a 000
1
= 50, a
+
2
000 a
000
2
= 70, a
+
3
000 a
000 3
= 15,
{(a
+
start
000a
000
3 024 00025}^ {a +
3
000a +
start
024 50}) _ {(a
+ start
000a
000
3
02400075} ^ {a
+ 3
000a
+ start
024 125})g.
From the decomposed DTP, we can derive an ordered partition of the set of
meta-
variables in the meta CSP of the original DTP
X = X
1
[X 2
[ ::: [ X last
;
where X
i
is the set of the meta-variables corresponding to the constraints in D
i
000 D
i0001
, if i > 1, and in D 1
otherwise. This ordered partition is used to define the order in which
SelectVariable chooses the next variable to instantiate, which is crucial
to avoid backtrack-
ing. Specifically, every variable with a single domain value {i.e., an ordering
constraint, a duration constraint, or a scheduling constraint with only one
time window} is selected
beforeevery variable with more than one possible value {i.e., a scheduling
constraint with more than one time window}; moreover, if x
i
2 X
i
, x
j 2 X
j
and i < j , then x
i is selected
before x j
.
In order to avoid backtracking, the order in which SelectValue chooses the
value for a
meta-variable is very important as well: given a meta-variable with more
than one value {time window} in its current domain, we choose the value corresponding
to the earliest
available time window. E.g., ifthe current domain of the selected meta-variable
with m
possible values is
[
i=1::m
010000
a +
start
000 a
000
024 000w
000
i
001
^
000
a
+
000 a +
start
024 w
+
i
001011
;
then SelectValue chooses the j -th value such that jw
000
j
j < jw
000
h
j, for every h 2 f1; :::; mg, j 2 f1; :::; mg, h 6= j . In the following
we give a simple example illustrating the order in which SelectVariable
and SelectValue select the meta-variables and their meta-values, respectively.
Consider the
TDA-graph in Figure 1 with the additional time window [150; 200} forthe
timed precondi-
tion p of a
3
. The DTP of the extended TDA-graph has six meta-variables {x
1
; x
2
; : : : ; x
6
},
whose domains {the disjuncts of the corresponding constraints of the original
CSP} are: x
1
: fa
+
1
000 a
000
3
024 0g
x
2 : fa
+
2
000 a
000 3
024 0g196An Approach to Temporal Planning and Schedulingx
3
: fa +
1
000 a 000
1
= 50g
x
4
: fa
+
2
000 a
000
2
= 70g
x
5
: fa
+
3
000 a
000
3
= 15g
x
6
: f{a
+
start 000 a
000
3 024 00025} ^ {a
+
3
000 a
+
start
024 50}, {a
+ start
000 a
000
3
024 00075} ^ {a
+ 3
000 a
+ start
024 125}, {a
+
start 000 a
000
3 024 000150} ^ {a +
3
000 a +
start
024 200}g:
By exploiting the level structure of the TDA-graph, we derive an ordered
partition of the
meta-variables formed by the following sets: X
1
=fx
3
g, X 2
= fx
4
g, X
3
= fx
1 ; x
2
; x 5
; x
6 g.
Since x
3
belongs to X
1
while x
4 belongs to X
2 , SelectVariable selects x
3
before selecting x
4
. Similarly, the function selects x
4
before the meta-variables in X
3
. When the algorithm
instantiates x 6
, the first meta-value of x
6
{i.e., the first time window of the timed precondi- tion of a
3
} has been removed from its domain by forward checking, and SelectValue
selects {a
+
start 000 a
000
3
024 00075} ^ {a
+
3
000 a
+
start 024 125} before {a +
start
000 a
000
3
024 000150} ^ {a +
3
000 a +
start
024 200},
because the first meta-value corresponds to a time window starting at time
75, while the
second one corresponds to a time window starting at time 150. By using these
techniques for selecting the next meta-variable to instantiate and its value,
we can prove the following theorem.Theorem 1Given a DTP D for a TDA-graph,
if the meta CSP X of D is solvable, then
Solve-DTP finds a solution of X with no backtr acking. More over, this solution
is an optimal
induce d STP of D for a
000
end
.
Proof. The proof has two key points: the way meta-variables are selected
and instantiated
by SelectVariable and SelectValue, respectively; the particular type of
constraints of D, in
which all disjunctive constraints have a specific form encoding a set of
disjoint time windows,
and, by construction of D, we have
8j :9i such that i < j and 012 j= a 006
j
< a 006
i
; {1}
where 012 is the set of ordering constraints and duration constraints in
D, and a
006
i
{a
006 j
} is
an endpoint of a
i
{a
j
}. Because of property {1}, 012 cannot imply any restriction on the
maximum distance between an endpoint of a
i
and endpoint of a
j {while, of course, there
can be a lower bound on this distance}. I.e., for any positive quantity
u we have
8j :9i such that i < j and012 j= {a
006
j
000 a
006
i 024 u}: {2} Let assume that SelectVariable chooses a meta-variable x
that cannot be consistently
instantiated to a value in D{x} {and this means that we have reached a backtracking
point}.
We show that this cannot be the case. SelectVariable chooses the meta-variables
of the STP-constraints of D beforeany meta-
variable of a scheduling constraint with more than one value {time window}.
Let X
s
be the set of the meta-variables associated with the scheduling constraints
in D. We have
that x must be a meta-variable in X
s
, because we are assuming that the meta CSP X is
solvable. The use of the forward checking subroutine guarantees that at
least one value of x
is consistent with respect to the meta-variables that are instantiated in
the current partial197Gerevini, Saetti & Serinasolution S . Hence, itshould
be the case that at step 7 of Solve-DTP ForwardCheck-DTP
returnsfalse for every value d {time window} in D{x}, i.e., that for every
d 2 D{x} there
exists another uninstantiated meta-variable x
0
2 X
s
such that, forevery d
0
2 D{x
0
}, the
check Consistency-STP{S [ fx
0
d 0
g} executed by the forward checking subroutine returns
false. However, ifX has a solution {D is consistent}, this cannot be the
case because{i}the value chosen by SelectValue to instantiate x and the previously
instantiated meta- variables {step 4} is the earliest available time window
in the current domain of the
meta-variable under consideration, which is a \least commitment assignment",
and{ii}we have at most one scheduling constraint {meta-variable in X
s
} for each level of the
TDA-graph.
Let a
0
be the action constrained by the scheduling constraint associated with x
0
. Since SelectVariable selects x before x
0
, by {ii} we have that a
0 is at a level following the level of the
action constrained by the scheduling constraint associated with x. Thus,
by property {2},
we have that if x
0
could not be instantiated, then this would be because every time window
of a
0
constrains a
0
to start \too early": the current partial solution of X augmented with
any of the possiblevalues of x implies that the start time of a
0
should be after the end of
the last time window of a
0
. But then, {i} and the assumption that X is solvable guarantee
that this cannot be the case.
Moreover, sincethe value of every instantiated meta-variable is propagated
by forward checking to the unassigned variables, we have that the first value
assigned to any meta- variable is the same value assigned to that variable
in the solution found for the CSP {if
any} { it is easy to see that if the first value chosen by SelectValue{D{x})
is not feasible
{ForwardCheck-DTP{X 0
; S
0 } returns false}, then every other next value chosen for x is not feasible.
Finally, since the value chosen by SelectValue for a meta-variable corresponds
to the earliest available window in the current domain of the meta-variable,
itfollows that the
solution computed by the algorithm is a complete optimal induced STP of
D for a
000
end
. 2 As a consequence of the previous theorem, ifSolve-DTP performs backtracking
{step 10},
then the input meta CSP has no solution. Thus, we can obtain a general backtrack-free
algorithm for the DTP of any TDA-graph by simply replacing step 10 with
10. stop and return fail.
The correctness of the modified algorithm, which we called Solve-DTP +
, follows from Theorem 1. The next theorem states that the runtime complexity
of Solve-DTP +
is poly-
nomial.Theorem 2Given a TDA-graph G withDTP D, Solve-DTP
+
pro c esses the meta CSP corr esp onding to D in polynomial time with resp
e ct to the number of action nodes in G and
themaximum number of time windows in a scheduling constr aint of D.
77. It should be noted that here our main goal is to give a complexity bound
that is polynomial. The use of
improved forward checking techniques {e.g., Tsamardinos & Pollack, 2003}
could lead to a complexity
bound that is lower than the one given in the proof of the theorem.198An
Approach to Temporal Planning and SchedulingProof. The time complexity depends
on the number of times ForwardCheck-DTP isexe-
cuted, and on its time complexity. D contains a linear number of variables
with respect
to the number n of domain action nodes in the LA-graph of the TDA-graph,
O{n
2
} or- dering constraints, and O{n} duration constraints and scheduling constraints.
Hence, the
meta CSP of D has O{n
2 } meta-variables {one variable for each constraint of the original CSP}.
Let ! be the maximum number of time windows in a scheduling constraint of
D.
ForwardCheck-DTP isexecuted at most ! timesfor each meta-variable x, i.e.,
O{! 001 n
2 } times
intotal. Consistency-STP decides the satisfiability of an STP involving
O{n} variables, which
can be accomplished in O{n
3
} time {Dechter et al., 1991; Gerevini & Cristani, 1997}. {Note
that the variables of the STP that is processed by Consistency-STP are the
variables of the
original CSP, i.e., they are the starting time and the end time of the actions
in the plan.}
Finally, Consistency-STP is run O{! 001 n 2
} times during each run of ForwardCheck-DTP. It follows that the runtime
complexity of Solve-DTP
+
is O{!
2
001 n
7
}. 2 By exploiting the structure of the temporal constraints forming the
DTP of a TDA- graph, we can make the following additional changes to Solve-DTP
+
improving the efficiency of the algorithm.
* Instead of starting from an empty assignment S {no meta-variable is instantiated},
initially every meta-variable associated with an ordering constraint or
with a duration
constraint is instantiated with its value, and X contains only meta-variables
associated
with the scheduling constraints. As observed in the proof of Theorem 1,
if the meta
CSP is solvable, the values assigned to the meta-variables by the initial
S form a
consistent STP.
* F orward checking is performed only once for each meta-variable. This is
because in
the proof of Theorem 1 we have shown that, if the meta CSP is solvable,
then the
first value chosen by SelectValue should be feasible {i.e., ForwardCheck-DTP
returns true}. Thus, if the first value is not feasible, we can stop the
algorithm and return fail
because the meta CSP is not solvable. Moreover, we can omit steps 6 and
9 which saveand restore the domain values of the meta-variables.
* Finally , the improved algorithm can be made incremental by exploiting
the particular
way in which we update the DTP of the TDA-graph during planning {i.e., during
the
search of a solution TDA-graph described in the next section}. As described
in the next section, each search step is either an addition of a new action
node to a certain level l, or the removal of an action node from l. In both
cases, itsuffices to recompute
the sub-solution for the meta-variables in the subsets X
l
; X
l+1 ; :::; X
last
. The values
assigned to the other meta-variables is the same as the assignment in the
last solution computed before updating the DTP, and it is part of the input
of the algorithm.
Moreover, inorder to use the local search techniques described in the next
section, we
need another change to the basic algorithm: when the algorithm detects that
X has no
solution, instead of returning failure, {i} it keeps processing the remaining
meta-variables, and {ii} when it terminates, it returns the {partial} induced
STP S
i formed by the values
assigned to the meta-variables. The optimal solution of S
i defines the T -assignment of the
TDA-graph.199Gerevini, Saetti & SerinaIn the next section, S G
denotes the induced STP for the DTP of a TDA-graph G com- puted by our method.
3. Local Search Techniques for TDA-Graphs
A TDA-graph hA; T ; P ; C i can contain two types of flaw: unsupported precondition
nodes of A, called prop ositional flaws, and action nodes of A that are not
scheduled by T , called
tempor al flaws. If a level of A contains a flaw, we say that this level
is flawed. For example, if the only time window for p in the TDA-graph of
Figure 1 were [25; 50}, then level 3 would be flawed, because the start time
of a
3
would be 70, whichviolates the scheduling constraint for a
3
imposing that this action must be executed during [25; 50}.
A TDA-graph with no flawed level represents a valid plan and is called a
solution graph.
In this section, we present new heuristics for finding a solution graph
in a search space of
TDA-graphs. These heuristics are used to guide a local search procedure,
called Walkplan,
that was originallyproposed by Gerevini and Serina {1999} and that is the
heart of the
search engine of our planner. The initial TDA-graph contains only a
start
and a end
. Each search step identifies the
neighborhood N {G } {successor states} of the current TDA-graph G {searchstate},
which is
a set of TDA-graphs obtained from G by adding a helpful action node to A
or removing a harmful action node from A in an attempt to repair the earliest
flawed level of G .
8
In the following, for the sake of brevity when we refer to an action node
of a TDA-graph,
we are implicitly referring to an action node of the LA-graph of a TDA-graph.
Similarly for the level of a TDA-graph. Moreover, weremind the reader that
a
l
denotes the action atlevel l, while l a
denotes the level of action a.Definition 4Given a flawed level l of a TDA-graph
G , an action node is helpful for l iff
its insertion into G ata level i 024 l would remove a prop ositional flaw
at l.Definition 5Given a flawed level l of a TDA-graph G , an action node
at a level i 024 l
is harmful for l iff its removal from G would remove a prop ositional flaw
at l, or would decr e ase the T -value of a
l
, if a
l
is unschedule d. Examples of helpful action node and harmful action node
An action node representing an action with effect p 1
is helpful for level 3 of the TDA-graph
of Figure 1 if it is added at level 2 or 3 {bear in mind that the insertion
of an action node at
level 3 determines an expansion of the TDA-graph postponing a
3
to level 4; more details are
given at the end of the examples}. Action node a
3
of Figure 1 is harmful for level 3, because
its precondition node p 1
is unsupported; action node a
1
is harmful for level 3, because it
blocks the no-op propagation of p
1
at level 1, which would support the precondition node p
1
at level 3. Moreover, assumingW {p} = f[25; 50}g, a
3
is unscheduled in the plan represented
by the LA-graph. Action node a
2
is harmful for level 3, because the removal of a
2
from8. We have designed several flaw selection strategies that are described
and experimentally evaluated in a
recent paper {Gerevini, Saetti, & Serina, 2004}. The strategy preferring
flaws at the earliest level of the graph tends to perform better than the
others, and so it is used as the default strategy of our planner.
More details and a discussion about this strategy are given in the aforementioned
paper.200An Approach to Temporal Planning and SchedulingA would decrease
the temporal value of a
3
. On the contrary, a
1
is not harmful for level 3, because its removal would not affect the possible
scheduling of a
3
. Notice that an action
node can be both helpful and harmful: a
3
is harmful for level 3, and it is helpful for the
goal level {because it supports the precondition node p
10 of a
end
}. When we add an action node to a level l that is not empty, the LA-graph
is extended
by one level, all action nodes from l are shifted forward by one level {i.e.,
they are moved
to their next level}, and the new action is inserted at level l . Similarly,
when we remove an action node from level l, the graph is \shrunk" by removing
level l. Some additional
details about this process are given in another paper {Gerevini et al.,
2003}. Moreover, as
pointed out in the previous section, the addition {removal} of an action
node a requires us to update the DTP of G by adding {removing} the appropriate
ordering constraints between
a and other actions in the LA-graph of G , the duration constraint of a,
and the scheduling
constraint of a {if any}. From the updated DTP, we can use the method described
in the
previous section to revise T , and to compute a possiblynew schedule of
the actions in G
{i.e., an optimal solution of S
G
}. The elements in N {G } are evaluated using a heuristic evaluation function
E consisting
of two weighted terms, estimating their additional sear ch cost and tempor
al cost, i.e., the
number of search steps required to repair the new flaws introduced, and
their contribution
to the makespan of the represented plan, respectively. An element of N {G
} with the lowest
combined cost is then selected using a \noise parameter" randomizing the
search to escape from local minima {Gerevini et al., 2003}. In addition,
in order to escape local minima, the
new version of our planner uses a short tabu list {Glover & Laguna, 1997}.
In the rest of
this section, we will focus on the search cost term of E . The techniques
that we use for the
evaluation of the temporal cost and the {automatic} setting of the term
weights of E are
similar to those that we introduced in a previous work {Gerevini et al.,
2003}.
The search cost of adding a helpful action node a to repair a flawed level
l of G is
estimated by constructing a relaxe d tempor al plan 031 achieving{1}the
unsupportedprecondition nodes of a, denoted by Pre{a}{2}the propositional
flaws remaining at l after adding a, denoted by Unsup{l}, and{3}the supported
precondition nodes of other action nodes in G that would become
unsupported by adding a, denoted by Thre ats{a}.
Moreover, we estimate the number of additional temporal flaws that the addition
of a and
031 to G would determine, i.e., we count the number of{I}action nodes of
G that would become unscheduled by adding a and 031 to G ,{II}the unsatisfied
timed preconditions of a, if a is unscheduled in the TDA-graph ex-
tended with a and 031,{III}the action nodes of 031 witha scheduling constraint
that we estimate cannot be satisfied
in the context of 031 and of G .
The search cost of adding a to G is the number of actions in 031 plus {I},
{II} and {III}, which are new terms of the heuristic evaluation. Note that
the action nodes of {I} are201Gerevini, Saetti & Serinaa 3[15]{75}pqp 6{35}p
5{50}p
1p
1p
1a
startp 1p 1p 5p 5p 2{0}{0}{000}{50}Goal levelLevel 1Level 2Level 3{0}p
5{50}p
4p
9p
9p
9p
9{70}{70}{70}{70}{000}a
end{90}{50}p
3{0}{0}p
4{0}{0}{0}p
3p
3{0}p
6p
3p
4[50]{0}a
1{70}{70}{70}p
8p
8{70}p
8p
7{0}mutex{90}p
10Actionb 1b 2b 3b 4mutex{0}[70]a
2q 1q 2[10]{20}b3q 3p
4qq 4p
5[100]b4{50}{50}{0}{30}{20}{20}Relaxed Plan 0310015b
1b
2b
3b
4Action001550a
new{30}[5]Est lower boundp
4{0}p
3b2p
5[20]{0}Ib1[15]{0}{0}N umactsFigure 4:An example of relaxed temporal plan
031. Square nodes represent action nodes,
while the other nodes represent fact nodes; solid nodes correspond to nodes
in A [ fa
new
g; dotted nodes correspond to the precondition nodes and action nodes that
are considered during the construction of 031; the gray dotted nodes are
those selected for their inclusion in 031. Action nodes are marked by the
duration of
the represented actions {in square brackets} and by their estimated start
time {in
round brackets}. The meaning of Numacts is described in the text; the lower
bounds on the earliest action start times {E st lower bound} are computed
by the algorithm in Appendix A.those that would have to be ordered after
a {because a is used to achieve one of their
preconditions, or these action nodes are mutex with a} and that, given the
estimated end
time of 031 and the duration of a, wouldexcessively increase their start
time. In {II} we
consider the original formulation of the timed preconditions of a {i.e.,
the formulation before their possible compilation into one \merged" new
precondition, as discussed in Section 2.3}. Finally, to check the scheduling
constraint of an action in 031, we consider the estimated end
time of the relaxed subplan of 031 usedto achieve the preconditions of
this action.
Example of relaxed temporal plan and additional temporal flaws {I{III}
Figure 4 gives an example of 031 forevaluating the addition of a
new at level 2 of the LA-
graph on the left side of the figure {the same graph as the one used in
Figure 1}, which is a202An Approach to Temporal Planning and SchedulingRelaxedTimePlan{G;
I ; A}Input: A set of goal facts {G}, an initial state for the relaxed plan
{I }, a set of reusable actions {A};Output: The set of actions Acts forming
a relaxed plan for G from I and the earliest time when all
facts in G can be achieved.1. Acts   A; F
S a2Acts
2. t   M AX fT {g} j g 2 G \ F or g 2 G \ I g; 3. G   G000 I ;
4. while G 000 F 6= ;
5. g   a fact in G 000 F ;
6. b   BestAction {g};
7. hA; t
0
i   RelaxedTimePlan{Pre {b}; I ; Acts};
8. T {b}   ComputeEFT {b; t
0
}; 9. t   M AX ft; T {b}g; 10. forall f 2 Add{b} do
11. T {f }   M I N 010
T {f }; T {b} + Dur{b}
011
;
12. Acts   A[ fbg; F   S
a2Acts
13. return hActs; ti.Figure 5:Algorithm for computing a relaxed temporal
plan. ComputeEFT {b; t
0 } returns
the estimated earliest finishing time 034 of b that is consistent with
the scheduling constraint of b {if any}, and such that t
0 + Dur{b} 024 034 {for an example see Appendix A}. Add {a} denotes the
set of the positive effects of a.helpful action node for the unsupported
precondition p
6
. The goals of 031 are the unsupported
preconditions q1 and q2 of a
new
; whilethe initial state I of 031 is formed by the fact nodes that
are supported at level 2. The actions of 031 are a new
, b2 and b3. The numbers in the name of the actions and facts of the relaxed
plan indicate the order in which RelaxedTimePlan considers them. The estimated
start time and end time of b3 are 20 and 30, respectively.
Assume that the timed precondition q of a
new has associated with it the time window [0; 20}.
Concerning point {I}, there is no action node of G that would become unscheduled
a
new
and 031 to G . Concerning point {II}, a
new
is unscheduled and has one timed precondition
that is unsatisfied {q}. Concerning point {III}, we have that b3 cannot
be scheduled in the context of 031 andthe current TDA-graph G . Finally,
since 031 contains three actions, and the sum of {I}, {II} and {III} is
2, we have that the search cost of adding a
new
to G at level 2
is 5.
The evaluation of a TDA-graph derived by removing a harmful action node
a for a
flawed level l is similar, with 031 achieving
* the precondition nodes supported by a that would become unsupported by
removing
a and
* when l
a
< l, the unsupported precondition nodes at level l that do not become sup-
ported by removing a.203Gerevini, Saetti & SerinaRegarding the second point,
note that if l = l
a , then all flaws at l are eliminated because,
when we remove an action, we also {automatically} remove all its precondition
nodes. While, when l
a
S lack{a; a
k }
holds, where 031 b
is the portion of the relaxed plan computed so far, and 001{031 b
; a} is the estimated delay that adding the actions in 031
b
to G would cause to the start time of a.
Examples of time slack and \TimeThreat"
The slack between a
new
and a 3
in the TDA-graph of Figure 4 extended with a
new
is 35,
because even if a
new
started at 35, a
3
could still be executed during the time window
[75; 125} {imposedby the timed precondition p}; while if a new
started at 35 +
* , then a
3 would finish at 125 +
*  {determined by summing the start time of a
new
, Dur{a
new
}, Dur{a
2
},205Gerevini, Saetti & Serinaand Dur{a 3
}), and so the scheduling constraint of a
3
would be violated. Assume that we areevaluating the inclusionof b4 in the
relaxed plan of Figure 4 for achieving q2. We have 001{031
b4
; a
new
} = 150;
i.e. the estimated delay that the portion of the plan formed by b4 would
add to the end
time of a
new
is 150. Since the slack between a new
and a
3 is 35,
S lack{a
new
; a 3
} < 001{031
b4
; a new
};
and so a
3
2 TimeThre ats{b4}. On the contrary, since
S lack{a
new
; a 3
} > 001{031
b3
; a new
} = 30
we have that a
3 62 TimeThre ats{b3}.
T o conclude this section, we observe that the way we consider scheduling
constraints
during the evaluation of the search neighborhoodhas some similarity with
a well-known technique used in scheduling. For example, supposethat we are
evaluating the TDA-graphs
obtained by adding a helpfulaction node a to one among some alternative
possible levels of the graph, and that the current TDA-graph contains another
action node c which is mutex
with a. If the search neighborhood contains two TDA-graphs corresponding
a to a level before l
c
" and {2} \adding a to a level after l
c ", and {1} violates less scheduling constraints than {2}, then, according
to points {I}{{III}, {1} is preferred to {2}. A similar
heuristic method, called constr aint-b ase d analysis, has been proposed
by Erschler, Roubellat
and Vernhes {1976} to decide whether an action should be scheduled before
or after another
conflicting action, and it has been also used in other scheduling work for
guiding the search
toward a consistent scheduling of the tasks involved in the problem {e.g.,
Smith& Cheng, 1993}.
4. Experimental Results
We implemented our approach in a planner called lpg-td, which obtained the
2nd prize in
the metric-temporal track {\satisficing planners"} of the 4th International
Planning Compe- tition {IPC-4}. lpg-td is an incremental planner, in the
sense that it produces a sequence
of valid plans each of which improves the quality of the previous ones.
Plan quality is measured using the metric expression that is specified in
the planning problem description. The incremental process of lpg-td is described
in another paper {Gerevini et al., 2003}.
Essentially, the process iterates the search of a solution graph with an
on the lower bound of the plan quality, which is determined by the quality
of the previously
generated plans. lpg-td is written in C and is available from http://lpg.ing.unibs.it.
In this section, we present the results of an experimental study with two
main goals:
* testing the efficiency of our approach to temporal planning with predictable
exogenous events by comparing the performance of lpg-td and other recent
planners that at IPC-4 attempted the benchmark problems involving timed initial
literals {Edelkamp,
Hoffmann, Littman, & Younes, 2004};206An Approach to Temporal Planning and
SchedulingPlannerSolvedAttemptedSuccess ratioPlanning capabilities at IPC-4lpg-td845107479045Propositional
+ DP, Metric-Temporal +TILsgplan1090141577045Propositional + DP, Metric-Temporal
+TILp-mep9858817045Propositional, Metric-Temporal +TILcrikey36459461045Propositional,
Metric-Temporallpg-ipc330659452045Propositional, Metric-Temporaldownw ard
{diag}38043288045Propositional + DPdownward36043283045Propositional + DPmarvin22443252045Propositional
a6316638045TILoptop4508045TILT able 1:Number of problems attempted/solved
and success ratio of the {satisficing} plan-
ners that took part in IPC-4. \DP" means derived predicates; \TIL" means
timed
initial literals; \Propositional" means STRIPS or ADL. The planning capabili-
ties are the PDDL2.2 features in the test problems attempted by each planner
at IPC-4.
* testing the effectiveness of the proposed temporal reasoning techniques
integrated into the planningprocess to understand, in particular, their impact
on the overall performance of the system, and to compare them with other
existing techniques. For the first analysis, we consider the test problems
of the variant of the IPC-4 metric-
temporal domains involving timed initial literals. A comparison of lpg-td
and other IPC-4
planners considering all the variants of the IPC-4 metric-temporal domains
is given in Appendix B. Additional results are available from the web site
of our planner. For the second experiments, we use new domains and problems
obtained by extending two well-known benchmark domains {and the relative
problems} from IPC-3 with timed
initial literals {Long & Fox, 2003a}.
9
All tests were conducted on an Intel Xeon{tm} 3 GHz, 1 Gbytes of RAM. We
ran lpg-td
with the same default settings for every problemattempted.
4.1 LPG-td and Other IPC-4 Planners
In this section, we use the official results of IPC-4 to compare the performance
of lpg-td
with those of other planners that took part in the competition. The performance
of lpg-td corresponds to a single run. The CPU-time limit for the run was
30 minutes, after which
termination was forced. lpg-td.s indicates the CPU-time required by our
planner to derive
the first plan; lpg-td.bq indicates the best quality plan found within the
CPU-time limit.9. For a description of the IPC-4 domains and of the relative
variants, the reader can visit the
official web site of IPC-4 {http://ls5-www.cs.uni-dortmund.de/030edelkamp/ipc-4/index.html}.
The extended versions of the IPC-3 domains used in our experiments are available
from http://zeus.ing.unibs.it/lpg/TestsIPC3-TIL.tgz.207Gerevini, Saetti &
SerinaBefore focusing our analysis on the IPC-4 domains involving timed initial
literals, in
Table 1 we give a very brief overview of all the results of the IPC-4 {satisficing}
planners, in
terms of planning capabilities and problems attempted/solved by each planner.
The table
summarizes the results for all the domain variants of IPC-4. lpg-td and
sgplan {Chen, Hsu,
& Wah B., 2004} are the only planners supporting all the ma jor features
of PDDL2.1 and PDDL2.2. Both planners have a good success ratio {close to
80045}. downward {Helmert,
2004} and yahsp {Vidal, 2004} have a success ratio better than lpg-td and
sgplan, but
they handle only propositional domains {downward supports derived predicates,
while
yahsp does not}. sgplan attempted more problems than lpg-td because it was
also tested on the \compiled version" of the variants with derived predicates
and timed initial literals.
10
Moreover, lpg-td did not attempt the numerical variant of the two versions
of the Promela
domain and the ADL variant of PSR-large, because they use equality in some
numerical
preconditions or conditional effects, which currently our planner does not
support.
Figure 6 shows the performance of lpg-td in the variants of three domains
involving
predictable exogenous events with respect to the other {satisficing} planners
of IPC-4 sup-
porting timed initial literals: sgplan, p-mep {Sanchez et al., 2004} and
tilsapa {Kavuluri & U, 2004}. In Airport {upper plots of the figure}, lpg-td
solves 45 problems over 50, sgplan 43, p-mep 12, and tilsapa 7. In terms
of CPU-time, lpg-td performs much better than p-mep and tilsapa. lpg-td is
faster than sgplan in nearly all problems {except
problems 1 and 43}. In particular, the gap of performance in problems 21{31
is nearly one order of magnitude. Regarding plan quality, the performance
of lpg-td is similar to
the performance of p-mep and tilsapa, while, overall, sgplan finds plan
of worse quality
{with the exception of problems 41 and 43, where sgplan performs slightly
better, and the
easiest problems where lpg-td and sgplan perform similarly}.
lpg-td and tilsapa are the only planners of IPC-4 that attempted the variant
of
PipesWorld with timed initial literals {central plots of Figure 6}. lpg-td
solves 23 prob-
lems over 30, while tilsapa solves only 3 problems. In this domain variant
lpg-td performs
much better than tilsapa.
In the \flaw version" of Umts {bottom plots of Figure 6}, lpg-td solves
all 50 problems, while sgplan solves 27 problems {p-mep and tilsapa did not
attempt this domain variant}.
Moreover, lpg-td is about one order of magnitude faster than sgplan in every
problem solved. Compared to the other IPC-4 benchmark problems, the Umts
problems are generally easier to solve. In these test problems, the main
challenge is finding plans of good quality.
Overall, the best quality plans of lpg-td are much better than sgplan plans,
except for
the simplest problems where the two planners generate plans of similar quality.
In the basic
version of Umts without flawed actions, sgplan solves all problems as lpg-td,
but in terms
of plan quality lpg-td performs much better. Figure 7 shows the results
of the Wilcoxon sign-rank test, also known as the \Wilcoxon matched pairs
test" {Wilcoxon & Wilcox, 1964}, comparingthe performance of lpg-td and
the planners that attempted the benchmark problems of IPC-4 involving timed
initial liter-
als. The same test has been used by Long and Fox {2003a} forcomparing the
performance10. Such versions were generated for planners that do not support
these features of PDDL2.2. During the
competition we did not test lpg-td with the problems of the compiled domains
because the planner
supports the original version of these domains. lpg-td attempted every problem
of the {uncompiled} IPC-4 domains that it could attempt in terms of the planning
language it supports.208An Approach to Temporal Planning and Schedulingatendatend
Helvetica 10 100 1000 10000 100000 1e+06 1e+07 0 5 10 15 20 25 30 35 40 45Airport-WindowsMillisecondsLPG-td.s
{45  solved}P-MEP {12  solved}SGPlan {43  solved}TilSapa {7  solved}atendatend
Helvetica 0 200 400 600 800 1000 1200 1400 0 5 10 15 20 25 30 35 40 45Airport-WindowsMakespanLPG-td.bq
{45  solved}P-MEP {12  solved}SGPlan {43  solved}TilSapa {7  solved}atendatend
Helvetica 10 100 1000 10000 100000 1e+06 1e+07 0 5 10 15 20 25 30PipesWorldNoTankage-DeadlinesMillisecondsLPG-td.s
{23  solved}TilSapa {3  solved}atendatend Helvetica 0 5 10 15 20 25 30 0
5 10 15 20 25 30PipesWorldNoTankage-DeadlinesMakespanLPG-td.bq {23  solved}TilSapa
{3  solved}atendatend Helvetica 10 100 1000 10000 0 5 10 15 20 25 30 35 40
45 50UmtsFlaw-WindowsMillisecondsLPG-td.s {50  solved}SGPlan {27  solved}atendatend
Helvetica 1450 1500 1550 1600 1650 1700 1750 1800 1850 1900 0 5 10 15 20
25 30 35 40 45 50UmtsFlaw-WindowsMakespanLPG-td.bq {50  solved}SGPlan {27
solved}Figure 6:CPU-time and plan quality of lpg-td, p-mep, sgplan, and tilsapa
for three
IPC-4 domains with timed initial literals. On the x-axis we have the problem
names simplified by numbers. In the plots on the left, on the y-axis we
have
CPU-milliseconds {logarithmic scale}; in the plots on the right, on the
y-axis we
have the plan makespan {the lower the better}.209Gerevini, Saetti & SerinaCPU-Time
Analysislpg-td.svs p-meplpg-td.s vs sgplanlpg-td.s vs tilsapa5:8413:16210:118<
0:001{0:0016}< 0:00145197136Plan Quality Analysislpg-td.bqvs p-meplpg-td.bq
vs sgplanlpg-td.bq vs tilsapa9:8376:901< 0:001< 0:001< 0:0011215463Figure
7:Results of the Wilcoxon test for the performance of lpg-td compared with
other
IPC-4 {satisficing} planners in terms of CPU-times and plan quality for
the bench-
mark problems with timed initial literals.sgplanlpg-td.sCPU-Timep-meptilsapatilsap
aB :AA is consistently better than Bp-meplpg-td.bqPlan QualitysgplanB :AA
is better than B asignificant number of times{confidence level 99.84045}Figure
8:Partial order of the performance of the IPC-4 {satisficing} planners according
tothe Wilcoxon test for the benchmark problems with timed initial literals.
A
dashed arrow indicates that the performance relationship holds with a confidence
level slightly less than 99.9045.of the IPC-3 planners. For the CPU-time
analysis, we consider all the problems attempted
by both the compared planners and solved by at least one of them {when a
planner does notsolve a problem, the corresponding CPU-time is the IEEE arithmetic
representation of positive infinity}. For the plan quality {makespan} analysis,
we consider all the problems
solved by both the compared planners.210An Approach to Temporal Planning
and SchedulingIn order to carry out the Wilcoxon test, for each planning
problem we computed the difference between the CPU-times of the two plannersbeing
compared, defining the samples
of the test for the CPU-time analysis. Similarly, for the test concerning
the plan quality
analysis we computed the differences between the makespan of the plans generated
by the
two planners. The absolute values of these differences are ranked by increasing
numbers
starting from the lowest value. {The lowest value is ranked 1, the next
lowest value is
ranked 2, and so on.} Then we sum the ranks of the positive differences,
and we sum the
ranks of the negative differences. If the performance of the two planners
is not significantly
different, then the number of the positive differences will be approximately
equal to the
number of the negative differences, and the sum of the ranks in the set
of the positive differences will be approximately equal to the sum of the
ranks in the other set. Intuitively,
the test considers a weighted sum of the number of times one planner performs
better than
the other. The sum is weighted because the test uses the performance gap
to assign a rank to each performance difference. Each cell in Figure 7 gives
the result of a comparison between the performance of lpg-td and another
IPC-4 planner. When the number of the samples is sufficiently large,
the T-distributionused by the Wilcoxon test is approximatively a normal
distribution.
Therefore, the cells of the figure contain the z-value and the p-value characterizing
the normal distribution. The higher the z-value, the more significant the
difference of the
performance is. The p-value represents the level of significance in the
performance gap. We use a confidence level of 99.9045; hence, if the p-value
is lower than 0.001, then the
performance of the compared planners is statistically different. When this
information appears on the left {right} side of the cell, the first {second}
planner named in the title of
the cell performs better than the other planner.
11
For the analysis comparing the CPU- time, the value under each cell is the
number of the problems solved by at least one planner;
while for the analysis comparing the plan quality, it is the number of problems
solved by both the planners.
Figure 8 shows a graphical description of the relative performance of the
IPC-4 satisficing planners according to the Wilcoxon test for the benchmark
problems with timed initial literals. A solid arrow from a planner A to a
planner B {or a cluster of planners B} indicatesthat the performance of A
is statistically different from the performance of B,
and that A performs better than B {every plannerin B}. A dashed arrow from
A to B
indicates that A is better than B a significant number of times, but there
is no significant
Wilcoxon relationshipbetween A and B with a confidence level of 99.9045
{on the other hand, the relationshipholds with a confidence level slightly
less than 99.9045}. The results
of this analysis say that lpg-td is consistently faster than tilsapa and
p-mep, whileit is faster than sgplan a significant number of times. In terms
of plan quality, lpg-td performs
consistently better than p-mep, sgplan and tilsapa.
Although lpg-td does not guarantee optimal plans, it is interesting to compare
its
performance with the optimal planners that took part in IPC-4, especially
to see how good lpg-td's plans are. Figure 9 shows the performance of lpg-td
and the best results over the results of al l the other optimal IPC-4 planners
{\AllOthers-Opt"} in the temporal variants
of Airport and Umts {without flawed actions}. The plots for the plan quality
{makespan}11. The p-value in the cell comparing lpg-td and p-mep is omitted
because the number of problems solved by
both lpg-td and p-mep is not high enough to approximate the T-distribution
to a normal distribution.211Gerevini, Saetti & Serinaatendatend Helvetica
10 100 1000 10000 100000 1e+06 1e+07 0 5 10 15 20 25 30 35 40 45Airport-TimeMillisecondsLPG-td.s
{44  solved}LPG-td.bq {44  solved}AllOthers-Opt {21  solved}atendatend Helvetica
0 100 200 300 400 500 600 700 800 900 1000 0 5 10 15 20 25 30 35 40 45Airport-TimeMakespanLPG-td.s
{44  solved}LPG-td.bq {44  solved}AllOthers-Opt {21  solved}atendatend Helvetica
10 100 1000 10000 100000 1e+06 1e+07 0 5 10 15 20 25 30 35 40 45 50UMTS-TimeMillisecondsLPG-td.s
{50  solved}LPG-td.bq {50  solved}AllOthers-Opt {38  solved}atendatend Helvetica
500 550 600 650 700 750 800 850 900 0 5 10 15 20 25 30 35 40 45 50UMTS-TimeMakespanLPG-td.s
{50  solved}LPG-td.bq {50  solved}AllOthers-Opt {38  solved}Figure 9:Performance
of lpg-td and the best over all the optimal planners of IPC-4
{AllOthers-Opt} in Airport-Time and Umts-Time: CPU-time in logarithmic scale
{left plots} and plan makespan {right plots}. On the x-axis we have the problem
names simplified by numbers.showthat, innearly every problem of these domains,
the best quality plan found by lpg-td
is an optimal solution, and that the first plan found by lpg-td is generally
a good solution. The plots for the CPU-time show that lpg-td finds a plan
much more quickly than any
optimal planner, and that the CPU-time required by lpg-td to find the best
plan is often
lower than the CPU-time required by AllOthers-Opt {except for problems 12,
16, 18 and
20 of Airport}. It should be noted that lpg-td.bq is the last plan over
a sequence of
computed plans with increasing quality {and CPU-time}. The intermediate
plans in this
sequencecould already have good quality. In particular, as shown by the
plan quality plot
for Airport, the first plan {lpg-td.s} solving problem 12 has near-optimal
quality, but it is
computed much more quickly than the lpg-td.bq plan and the AllOthers-Opt
plan.212An Approach to Temporal Planning and SchedulingepswriteCepswriteCFigure
10:Plan quality distance between the solutions found by lpg-td and the correspond-
ing optimal solutions. On the x-axis, we have some classes of quality distance
{e.g., \10{25045" means that the plan generated by lpg-td is worse than
the
optimal plan by a factor between 0.1 and 0.25}. On the y-axis, we have the
percentage of solved problems for each of these classes.Finally, Figure
10 gives the results of a more general analysis on the plan quality distance,
considering all metric-temporal and STRIPS variants of the IPC-4 domains.
12
The analysis
uses only the problems solved by at least one IPC-4 optimal planner. It
is also important to note that we consider only the plans generated by the
incremental process of lpg-td using
no more CPU-time than the CPU-time required by the fastest optimal planner
{AllOthers-
Opt}. Overall, the results in Figure 10 provide significant empirical evidence
supporting
the claim that often an incremental local search approach allows us to compute
plans with
very good quality using less or no more CPU-time than an optimal approach.
In particular, the bars for the \0045{1045" class in the plot of the metric-temporal
problems show that the
percentage of the test problems in which the best quality plan of lpg-td
{lpg-td.bq} is optimal or nearly optimal {i.e., plan quality is worse than
optimal by a factor between
0 and 0.01, with 0 meaning no difference} is about 90045. Moreover, often
the first plan computed by lpg-td {lpg-td.s} has good quality: 60045 of
all these plans have quality that isoptimal or nearly optimal, and only about
25045 of them have a quality that is worse than
the optimal by a factor greater than 0.5.
Interestingly, the plot on the right of Figure 10 shows similar results
concerning the good
quality of lpg-td's plans also for the STRIPS problems of IPC-4 {with a
lower percentage
of the lpg-td.s' plans that are in the 0045{1045 class, and a slightly
higher percentage of the
lpg-td.bq's plans that are in the \> 50045" class}.
4.2 Temporal Reasoning in LPG-td
We conducted two main experiments. The first was aimed at testing the performance
of lpg-td when the number of windows for the timed initial literals varies
in problems12. For the STRIPS problems, the plan quality metric is the number
of the actions in the plan.213Gerevini, Saetti & Serinahaving the same initialstate
and goals. The second experiment focused on our temporal
reasoningtechniques with the main goals of empiricallyevaluating their performance,
and
understandingtheir impact on the overall performance of lpg-td.
For these experiments we used two well-known IPC-3 domains, which were modified
to
include timed initial literals: Rovers and ZenoTravel. The version of Rovers
with timed initial literals was obtained from the IPC-3 temporal version
as follows. In the problem
specification, for each \waypoint", we added a collection of pairs of timed
initial literals of
the type {at t
1
{insun waypoint0})
{at t
2
{not {insun waypoint0})}
where t
1
< t
2
. Each of these pairs defines a time window for the involved literal. In
the
operatorspecification file, the \recharge" operator has the precondition
{over all {insun ?w})
which imposes the constraint that the recharging actions are applied only
when the rover is in the sun {?w is the operator parameter representing the
waypoint of the recharging action.}
The modified version of ZenoTravelwas obtained similarly. In the problem
specification, for each city we added a collection of pairs of timed initial
literals of the type {at t
1
{open-stationcity0})
{at t
2 {not {open-stationcity0})}
and in the operator specification file, we added the timed precondition
{over all {open-station?c})
to the \refuel" operator, where ?c is the operator parameter representing
the city where
the refuel action is executed. Given a planning problem 005 and a collection
of time windows W
036
for a timed literal 036, it
should be noted that, in general, the difficulty of solving 005 is affected
by three parameters:
the number of windows in W 036
, their size, and the way they are distributedon the time
line.
13
We considered two methods for generating test problems taking account of
these
parameters {005 indicates an original IPC-3 problem in either the Rovers
or ZenoTravel
domain, and n indicates the number of windows in W
036
}:{I}Let 031 be the best {shortest makespan} plan among those generated
by lpg-td for
solving 005 within a certain CPU-time limit, and t the makespan of 031.
The time
interval [0; t] is divided into 2n 000 1 sub-intervals of equal size. The
time windows for each timed literal 036 of the extended problem 005
0
are the odd \sub-intervals" of [0; t],
i.e.,W 036
=
nh 0;
t2n0001
021
;
h
2t2n0001
;
3t2n0001 021
; : : : ; h
{2n0002}t2n0001
; t
021o :{II}Let d be the maximum duration of an action in 005 with a timed
precondition 036. The
time interval 022 = [0; d 002 {2n 000 1}] is divided into 2n 000 1 sub-intervals
of duration d.13. In general, these parameters influence not only the hardness
of temporal reasoning during planning, but
also the logical partof the planning process {i.e., the selection of the
actions forming the plan, that in lpg-tdis done using heuristics taking exogenous
events into account}.214An Approach to Temporal Planning and Schedulingatendatend
Helvetica 10 100 1000 10000 0 2 4 6 8 10 12 14 16 18 20Rovers-WindowsMilliseconds1
time window per waypoint 10 time windows per waypoint atendatend Helvetica
10 100 1000 10000 100000 1e+06 0 2 4 6 8 10 12 14 16 18 20ZenoTravel-WindowsMilliseconds1
time window per city 10 time windows per city Figure 11:Performance of lpg-td
in the Rovers and ZenoTravel domains extended with
timed initial literals {1 and 10 time windows for each timed literal}. The
test problems were generated using method I. On the x-axis we have the problem
names simplified by numbers; on the y-axis we have CPU-milliseconds{logarith-
mic scale}.Similarly to method {I}, the time windows for 036 in the extended
problem 005 0
are the
odd sub-intervals of 022.
Notice that we can use the first method only when the number of windows
is relatively
small because, if there are too many time windows of small size, the extended
problem can become unsolvable {no window is large enough to schedule into
it a necessary action with
a timed precondition}. The second method was designed to avoid this problem,
and it can
be used to test our techniques on planning problems involving many time
windows.
Figures11 and 12 give the results of the first experiment. The CPU-times
in these plots
are median values over five runs for each problem. For the results of Figure
11, we use
the IPC-3 test problems modified by method I, while for the results of Figure
12 we use
theIPC-3 test problemsmodified by method II. In both cases lpg-td solves
all problems.
The plots of Figure 11 indicate that the performance degradation when the
number of windows increases from 1 to 10 is generally moderate, except in
two cases. The plots of
Figure 12 indicate that, when the number of windows increases exponentially
from 1 to
10,000,theapproach scales up well for the benchmark problems considered.
For instance,
consider the first ZenoTravel problem. With 1 window lpg-td solves this
problem in 10 milliseconds, with 10 windows in 20 milliseconds, with 100
windows in 30 milliseconds, with 1000 windows in about 100 milliseconds,
and with 10,000 windowsin about 1 second.
Moreover, weobserved that the performance degradation is mainly determined
by a heavier
pre-processing phase {parsing and instantiation of the operators}. Tables
2 and 3 give some results concerning the experiment about our temporal reasoning
techniques implemented in lpg-td. We consider some of the problems with
10 time windows
{for each timed fluent} used for the tests of Figure 11, and we examine
the computational215Gerevini, Saetti & Serinaatendatend Helvetica 10 100
1000 10000 100000 0 2 4 6 8 10 12 14 16 18 20Performance of LPG-td in Rovers-TimeWindowsMilliseconds1
time window per waypoint 10 time windows per waypoint 100 time windows per
waypoint 1000 time windows per waypoint 10,000 time windows per waypoint
atendatend Helvetica 10 100 1000 10000 100000 1e+06 0 2 4 6 8 10 12 14 16
18 20Performance of LPG-td in ZenoTravel-TimeWindowsMilliseconds1 time window
per city 10 time windows per city 100 time windows per city 1000 time windows
per city 10,000 time windows per city Figure 12:Performance of lpg-td in
the Rovers and ZenoTravel domains extended with
timed initial literals {1{10,000 time windows for each timed literal}. The
test problems were generated using method II. On the x-axis we have the problem
names simplified by numbers; on the y-axis we have CPU-milliseconds{logarith-
mic scale}.cost of temporal reasoning during planning for these problems.
In our approach to temporal
planning, each search step defines a set of temporal constraints formed
by the ordering and
scheduling constraints in the current TDA-graph. Table 2 gives statistical
information about such DTPs using both the compact constraint representation
of lpg-td and the \classical"
DTP representation. For each action in the TDA-graph, we have two temporal
variables {the start/end times of the action}, except a
start
and a end
{for which, as we have pointed out,
we can use only one variable}. The number of the scheduling constraints
and the number of the ordering constraints depend on which actions are in
the current TDA-graph, and on
how these actions are {causally or exclusively} related to each other, respectively
{we have
one scheduling constraint for each action with a timed precondition in the
TDA-graph}.
Notice that our representation of the scheduling constraints is much more
compact than
the classical DTP formulation.
14
The table also gives information about the average number of DTPs {i.e.,
search steps}
generated during planning, indicating how many of them are satisfiable {indicated
with \Sat. DTPs"}.
Table 3 gives the CPU-time required by our temporal reasoning techniques
implemented
in lpg-td {Solve-DTP
+
} and by tsat++ {Armando, Castellini,Giunchiglia, & Maratea,
2004},a state-of-the-art generalDTP solver. The DTPs considered here are
the same as those of Table 2, i.e., the sets of the temporal constraints
in the TDA-graph at each search14. The classical DTP-translation of a scheduling
constraint contains an exponential number of disjuncts
with respect to the number of time windows in the scheduling constraint.
For example, let q be a
timed precondition of a and W
q
= f[25; 50}; [75; 125}g. The scheduling constraint of a determined by q
is translatedinto four classical DTP constraints {a
s
abbreviates a
start
}: {a
+
s
000 a
000
024 00025 _ a
+
s 000 a
000
024 00075},
{a +
s
000 a 000
024 00025 _ a
+
000 a
+
s
024 125}, {a
+ 000 a
+
s
024 50 _ a +
s
000 a
000
024 00075}, {a
+
000 a
+
s
024 50 _ a
+ 000 a
+
s
024 125}.216An Approach to Temporal Planning and SchedulingProblemsVariablesSC
with 10 windows {DC}DTPsmaxmeanmaxmean{Sat. DTPs}RoversProblem 12818.41 {1024}0.13
{136.5}15 {15}Problem 55630.02 {2048}0.33 {341.3}27 {27}Problem 109465.82
{2048}1.41 {1447}104 {47}Problem 159858.83 {3072}1.01 {1037}77 {55}Problem
20206105.04 {4096}1.45 {1489}108 {108}ZenoTravelProblem 1861 {1024}0.33 {341.3}3
{3}Problem 536203 {3072}0.88 {910.2}18 {18}Problem 1011483.416 {16,384}10.5
{10,769}1162 {175}Problem 15172122.424 {24,576}16.3 {16,673}291 {128}Problem
20282194.642 {43,008}24.9 {25,536}750 {637}Table 2:Characteristics of the
DTPs generated duringplanning by lpg-td when solving
some problems in the Rovers and ZenoTravel domains: maximum/mean num-
ber of variables {2nd/3rd columns}; maximum/mean number of scheduling con-
straints {\SC"} and of non-unary disjunctions {\DC"} in their DTP-form
transla-
tion{4th/5th columns}; number of DTPs and of satisfiable DTPs solved by
lpg-td
{6th column}.step of the planning process. It should be noted that the comparison
of Solve-DTP
+
and tsat++ is by no means intended to determine which one is better than
the other. Indeed
tsat++ was developed to manage a much larger class of DTPs. However, tothe
best of
our knowledge there exists no other more specialized DTP-solver handling
scheduling con-
straints that we could have used. The goal of this comparison is to experimentally
show
that existing general DTP solvers, although designed to work efficiently
in the general case,
are not adequate for managing the class of DTPs that arise in our planning
framework.
Hence, it is important to develop more specialized techniques which, as
empirically demon-
strated by the results of Table 3, can be much more efficient. For instance,
consider problem 15 in the Rovers domain. As indicated by the last column
of Table 2, lpg-td solves this problem with 77 search steps, whichdefines
77 DTPs. The data in Table 3 show that
the total CPU-time spent by lpg-td for solving all these temporal reasoning
problems is
negligible {< 10
0006
seconds}, whiletsat++ requires 16.8 CPU-seconds in total {note that
the whole temporal planning problem is solved by lpg-td in only 0.25 seconds}.
15 Overall,
our specialized temporal reasoning technique is several orders of magnitude
faster than an
efficient general DTP, in terms of both CPU-time for solving a single DTP,
and CPU-time
for solving all the DTPs that are generated during planning.15. The CPU-time
of tsat++ includes neither the generation of the explicit {classical} DTPs
from the TDA-
graph, nor the parsing time. Moreover, while tsat++ only decides satisfiability
of the input DTPs,
Solve-DTP
+
also findsa schedule that is optimal, if the DTP is satisfiable.217Gerevini,
Saetti & SerinaProblemsCPU-seconds for Temporal ReasoningTotalSolve-DTP
+tsa t++CPU-Timemaxmeantotalmaxmeantotalof lpg-tdRoversProblem 1<10 0006<
10 0006< 10 00060.0050.0020.090.02Problem 5<10
0006< 10
0006< 10
00060.0450.0020.140.03Problem 10<10
0006< 10
0006< 10
00060.540.03912.70.30Problem 15<10
0006< 10
0006< 10
00060.540.02816.80.25Problem 200.010.00080.033.170.10107.13.03ZenoTravelProblem
1<10
0006< 10
0006< 10
00060.0010.00030.010.02Problem 5<10
0006< 10 0006< 10
00060.040.0040.210.05Problem 100.010.000170.22.79.8601822.0Problem 150.010.000140.0444.63.918,87713.9Problem
200.010.000650.5323.924.2177,595376.2Table 3:Performance of Solve-DTP
+
and tsat++ for the DTPs generated during planning by lpg-td when solving
some problems in the Rovers and ZenoTravel domains: maximum, mean and total
CPU-seconds. The last column gives the total CPU- time of lpg-td for solving
the planning problem. tsat++ was run using its default
settings.Finally, we experimentally tested the effectiveness of the improvements
to Solve-DTP
+
formaking the algorithm incremental that we have describedat the end of
Section 2 {such
improvements are included in the implementation of Solve-DTP
+
of Table 3}. In particular we observed that, for the problems of Table 3,
the average CPU-timeof the basic {non-
incremental} version of Solve-DTP
+
is from one to three orders of magnitude higher than the incremental version.
However, the basic version is still always significantly faster than
tsat++ {from one to four orders of magnitude}. 5. Related Work
Several researchers have addressed temporal reasoning in the context of
the DTP frame- work. Some general techniques aimed at efficiently solving
a DTP have been proposed
{e.g., Armando et al., 2004; Tsamardinos & Pollack, 2003}, but their worst-case
complexity
remains exponential. In Section 4, we presented some experimental results
indicating that the simple use of a state-of-the-art DTP solver is not adequate
for solving the subclass of DTPs that arise in our context. Various planning
approaches supporting the temporal features considered in this paper have
been proposed. One of the first planners that was capable of handlingpredictable
exogenous events is deviser {Vere, 1983}, which was developed from nonlin
{Tate, 1977}.
deviser is a temporal partial order planner using a network of activities
called a \plan
network". Before starting plan generation, the plan network contains the
exogenous events218An Approach to Temporal Planning and Schedulingas explicit
nodes of the network. During plan generation, the activities added to the
network
are ordered with respect to these scheduled events, depending on the relevance
of the events for the activities. A similar explicit treatment of the exogenous
events could be also adopted
in the context of the action-graph representation: the initial action graph
contains special
action nodes representing the predicted exogenous events. However, this
simple method has some disadvantages with respect to our method, that treats
exogenous events at the
tempor al level of the representation rather than at the logical {causal}
level. In particular,
when there is a high number of timed initial literals, the explicit representation
of the exogenous events in the action graph could lead to very large graphs,
causingmemory consumption problems and a possibly heavy CPU-time cost for
the heuristic evaluation of
the {possiblyvery large} search neighborhood.
In the late 80s and early 90s some other temporal planners handling exogenous
events
were developed. In general, these systems use input descriptions of the
planning prob- lem/domain that are significantly different from the PDDL
descriptions accepted by modern
fully-automated planners. One of the most successful among them is hsts
{Frederking &
Muscettola,1992; Muscettola, 1994}, a representation and problem solving
framework that
provides an integrated view of planningand scheduling. hsts represents predictable
ex- ogenousevents through \non-controllable state variables". Both lpg-td
and hsts manage
temporal constraints, but the two systems use considerably different approaches
to temporal
planning {lpg-td adopts the classical \state-transition view" of change,
whilehsts adopts the \histories view" of change, Ghallab, Nau, & Traverso,
2003}, andthey are based on
different plan representations and search techniques.
zeno {Penberthy, 1993; Penberthy& Weld, 1994} is one of the first domain-independent
planners which supports a rich class of metric-temporal features, including
exogenous events.
zeno is a powerful extension of the causal-link partial-order planner ucpop
{Penberthy &
Weld, 1992}. However, interms of computational performance, this planner
is not compet- itive with more recent temporal planners. IxTeT {Ghallab &
Laruelle, 1994; Laborie& Ghallab, 1995} is another causal-link plan- ner
which uses some techniques and ideas from scheduling, temporal constraint
reasoning, and graph algorithms. IxTeT supports a very expressive language
for the temporal de-
scription of the actions, including timed preconditions and some features
that cannot be
expressed in PDDL2.2. The expressive power of the language is obtained at
the cost of in-
creased semantic complexity {Fox & Long, 2005}. As observed by Ghallab,
Nau and Traverso
{2003}, IxTeT embodies a compromise between the expressiveness of complex
temporal do- mains, and the planning efficiency; however, this planner still
remains noncompetitive with
the more recent temporal planners. Smith and Weld {1999} studied an extension
of the Graphplan-style of planning for
managing temporal domains. They proposed an extension of their tgp planner
that makes
it possible to represent predictable exogenous events. tgp supports only
a subclass of the durative actions expressible in PDDL2.1, which prevents
some cases of concurrency that
in PDDL2.1 are admitted. tgp is an optimal planner {under the assumed conservative
model of action concurrency}, whilelpg-td is a near-optimal {satisficing}
planner. A main
drawback of tgp is that it does not scale up adequately.
More recently, Edelkamp {2004} proposed a method for planning with timed
initial
literals that is based on compiling the action timed preconditions into
a time window as-219Gerevini, Saetti & Serinasociated with the action, definingthe
interval during which the action can be scheduled.
He gives an efficient, polynomialalgorithm based on critical path analysis
for computing
an optimal action schedule from sequential plans generated using the compiled
represen-
tation. The techniques presented by Edelkamp assume a unique time window
for each
timed precondition. The techniques that we propose are more general, in
the sense that our
action representation treats multipletime windows associated with a timed
precondition,
and our temporal reasoning method computes optimal schedules for partially
ordered plans
preserving polynomiality.
Cresswell and Coddington {2004} proposedan extension of the lpgp planner
{Long
& Fox, 2003b} to handle timed initial literals, which are represented by
actions". A literal that is asserted to hold at time t is represented by
starting at the time of the initial state, and having duration t. The deadline
actions in the plan under construction are translated into particular linear
inequalities that, together with other equalities and inequalities generated
from the plan representation, are managed by a general linear programming
solver. lpg-td uses a different representation that does not
encode timed initial literals as special actions, and in which the temporal
and scheduling
constraints associated with the actions in the plan are managed by an efficient
algorithm
derived by specializing a general DTP solver. In order to handle problems
with timed initial literals in the sapa planner {Do & Kamb-
hampati, 2003}, Do, Kambhampati and Zimmerman {2004} proposed a forward
search
heuristic based on relaxed plans, which are constructed by exploiting a
technique similar to
the time slack analysis used in scheduling {Smith & Cheng, 1993}. Given
a set of candidate
actions for choosing an action to add to the relaxed plan under construction,
this technique
computes the minimum slack between each candidate action and the actions
currently in the relaxed plan. The candidate action with the highest minimum
slackis preferred. lpg-td
usesa different time slack analysis, which is exploited in a different way.
Our method for
selecting the actions forming the relaxed plan uses the time slacks for
counting the number
of scheduling constraints that would be violated when adding a candidate
action: we prefer
the candidate actions which cause the lowest number of violations. Moreover,
insapa the
slack analysis is limited to the actions of the relaxed plan, while our
method also considers
the actions in the re al plan under construction.
dt-pop is a recent planner {Schwartz & Pollack, 2004} extendingthe POP-style
of
planning with an action model involving disjunctive temporal constraints.
The language of
dt-pop is elegant and can express a rich class of temporal features, most
of which can be
only indirectly {and less elegantly} expressed in PDDL2.2 {Fox et al., 2004}.
The treatment
of the temporal constraints required to manage predictable exogenous events
in dt-pop
appears to be less efficient than in our planner, since dt-pop uses a general
DTP solver enhanced with some efficiency techniques, whilelpg-td uses a polynomialsolver
specialized
for the subclass of DTPs that arise in our representation. dt-pop handles
mutex actions
{\threats"} by posting explicit temporal disjunctive constraints imposingdisjointness
of the mutex actions, while lpg-td implicitly decides these disjunctions
at search time by
choosingthe level of the graph where actions are inserted, and asserting
the appropriate
precedence constraints. Moreover, the search procedure and heuristics in
dt-pop and lpg- td are significantly different.220An Approach to Temporal
Planning and SchedulingAt IPC-4, the planners that reasoned with timed initial
literals are tilsapa {Kavuluri &
U, 2004}, sgplan {Chen et al., 2004}, p-mep{Sanchez et al., 2004} and lpg-td.
For the first
two planners, at the time of writing, to the best of our knowledge in the
available literature
there is no sufficiently detailed description to clearly understand their
possible similarities and differences with lpg-td about the treatment of
predictable exogenous events. Regarding
p-mep, this planner uses forward state-space search guided by a relaxed
plan heuristic which,
differently from the relaxed plans of lpg-td, is constructed without taking
account of the temporal aspects of the relaxed plan and real plan under construction
{the makespan of the
constructed relaxed plans is considered only for their comparative evaluation}.
6. Conclusions
We have presented some techniques for temporal planning in domains where
certain fluents are made true or false at known times by predictable exogenous
events that cannot be influenced by the actions available to the planner.
Such external events are present in many
realistic domains, and a planner has to take them into account to guarantee
the correctness of the synthesized plans, to generate plansof good or optimal
quality {makespan}, and to
use effective search heuristics for fast planning.
In our approach, the causal structure of the plan is represented by a graph-based
rep-
resentation called TDA-graph, action ordering and scheduling constraints
are managed by
efficient constraint-based reasoning, and the plan search is based on a
stochastic local search
procedure. We have proposed an algorithm for managing the temporal constraints
in a TDA-graph which is a specialization of a general CSP-based method for
solving DTPs. The algorithm has a polynomialworst-case complexity and, when
combined with our plan representation, in practice it is very efficient.
We have also presented some local search
techniques for temporal planning using the new TDA-graph representation.
These tech-
niques improve the accuracy of the heuristic methods adopted in the previous
version of
lpg, and they extend them to consider action scheduling constraints in the
evaluation of the
search neighborhood, which is based on relaxed temporal plans exploiting
some {dynamic}
reachability information. All our techniques are implemented in the planner
lpg-td. We have experimentally
investigated the performance of our planner by a statistical analysis of
the IPC-4 results
using Wilcoxon's test. The results of this analysis show that our planner
performs very well compared to other recent temporal planners supporting
predictable exogenous events, both in terms of CPU-time to find a valid plan
and quality of the best plan generated. Moreover,
a comparison of the plans computed by lpg-td and those generated by the
optimal planners
of IPC-4 shows that very often lpg-td generates plans with very good or
optimal quality.
Finally , additional experiments indicate that our temporal reasoning techniques
manage the
class of DTPs that arise in our context very efficiently.
Some directions for future work on temporal planning within our framework
are: an
extension of the local search heuristics and temporal reasoning techniques
to explicitlyhan-
dle action effects with limited persistence or delays; the treatment of
predictable exogenous
events affecting numerical fluents in a discrete or continuous way; the
development of tech-221Gerevini, Saetti & Serinaniques supporting controllable
exogenous events;
16
and the management of actions with \variable" durations {Fox &Long, 2003},
i.e., actions whose durations are specified only by
inequalities constraining their lower or upper bounds, and whose actual
duration is decided
by the planner. Moreover, we intend to study the integration into our framework
of the techniques for
goal partitioning and subplan composition that have been successfully used
by sgplan
{Chen et al., 2004} inIPC-4, and the application of our approach to plan
revision. The
latter has already been partially explored, but only for simple strips domains
and using
lesspowerful search techniques {Gerevini & Serina, 2000}.
Acknowledgments
This paper is a revised and extended version of a paper appearing in the
Proceedings of
the Nineteenth International Joint Conference on Artificial Intelligence
{Gerevini, Saetti, &
Serina, 2005a}. The research was supported in part by MIUR Grant anemone.
The work of Ivan Serina was in part carried out at the Department of Computer
and Information Sciences of the University of Strathclyde {Glasgow, UK},
and was supported by Marie Curie
Fellowship N HPMF-CT-2002-02149. We would like to thank the anonymous reviewers
for
their helpful comments, and Paolo Toninelli who extended the parser of lpg-td
to handle
the new language features of PDDL2.2.
Appendix A: Reachability Information
The techniques described in the paper for computing the action evaluation
function use
heuristic reachability information about the minimum number of actions required
to reach
the preconditions of each domain action {N umacts} and a lower bound on
the earliest finishing time {E f t} of the reachable actions {the actions
whose preconditions are reachable}.
In the following, S {l} denotes the state defined by the facts corresponding
to the fact nodes
supported at level l of the current TDA-graph. When l = 1, S {l} represents
the initial state of the planning problem {I }.
For each action a, lpg-td pre-computes N umacts{a; I }, i.e., the estimated
minimum number of actions required to reach the preconditions of a from I
, and E f t{a; I }, i.e., the
estimated earliest finishing time of a {if a is reachable from I }. Similarly,
for each fact f that
is reachable from I , lpg-td computes the estimated minimum number of actions
required
to reach f from I {N umacts{f ; I }) and the estimated earliest time when
f can be made
true by a plan starting from I {E t{f ; I }). For l > 1, N umacts{a; S
{l}) and E f t{a; S {l})
can be computed only during search, because they depend on which action
nodes are in the
current TDA-graph at the levels preceding l. Since during search many action
nodes can be
added and removed, and after each of these operations N umacts{a; S {l})
and E f t{a; S {l})
could change {if the operation concerns a level preceding l}, it is important
that they are computed efficiently.16. Consider for instance a transportation
domain in which a shuttle bus is at the train station for an extra run to
the airport at midnight only if booked in advance. If the shuttle booking
is a domain action
available to the planner, then the event \night stop of the shuttle" can
be controlled by the planner.222An Approach to Temporal Planning and SchedulingReachabilityInformation{I
; O}Input: The initial state of the planning problem under consideration
{I } and all ground instances
{actions} of the operators {O};Output: For each action a, an estimate of
the number of actions {N umacts{a; I }) required to reach the preconditions
of a from I , an estimate of the earliest finishing time of a from I {E f
t{a; I }).1. forall facts f do /* the set of all facts is precomputed by
the operator instantiation phase */ 2. if f 2 I then
3. N umacts{f; I }   E t{f; I }   0; Action{f; I }   a
start ;
4. else N umacts{f; I }   E t{f; I }   1;
5. forall actions a do N umacts{a; I }   E f t{a; I }   Lf t{a}   1;
6. F   I ; F
new
I ; A   O; A rev
;;
7. while { F
new 6= ; or A
rev
6= ; } 8. F   F [ F
new
; F new
;; A   A[ A
rev ; A
rev
;;
9. while A
0
= fa 2 A j P re{a} 022 F g is not empty 10. a   an action in A
0
;
11. t   ComputeEFT{a; M AX
f 2P re{a}
Et{f; I });
12. if t < E f t{a; I } then E f t{a; I }   t;
13. Lf t{a}   ComputeLFT{a};
14. if E f t{a; I } 024 Lf t{a} then /* a can be scheduled */ 15. ra
RequiredActions{I ; P re{a}); 16. if N umacts{a; I } > ra then N umacts{a;
I }   ra; 17. forall f 2 Add{a} do
18. if E t{f; I } > t then
19. E t{f; I }   t;
20. A
rev
A
rev [fa
0
2 O 000 A j f 2 P re{a
0
}g; 21. if N umacts{f; I } > {ra+ 1} then
22. N umacts{f; I }   ra + 1; Action{f; I }   a;
23. F
new
F
new [Add{a} 000 F ;
24. A   A 000 fag;
RequiredActions{I ; G}Input: A set of facts I and a set of action preconditions
G;Output: An estimate of the min number of actions required to achieve all
facts in G from I {ACTS}.1. AC T S   ;;
2. G   G000 I ;
3. while G 6= ;
4. g   an element of G; 5. a   Action{g; I };
6. AC T S   AC T S [ fag;
7. G   G [ P re{a} 000 I 000
S b2ACT S
8. return{jAC T S j}.Figure 13:Algorithms for computing heuristic information
about the search cost and the
time for reaching a set of facts G from I .223Gerevini, Saetti & SerinaFigure
13 gives ReachabilityInformation, the algorithm used by lpg-td for computing
N umacts{a; I }, E f t{a; I }, N umacts{f ; I } and E t{f ; I }. ReachabilityInformation
is similar
to the reachability algorithm used by the version of lpg that took part
in 2002 planning
competition {lpg-ipc3}, but with some significant differences. The main
differences are:{i}in order to estimate the earliest finishingtime of the
domain actions, ReachabilityIn-
formationtakes into account the scheduling constraints, which were not considered
in
the previous version of the algorithm;{ii}the algorithm used by lpg-ipc3
applies each domain action at most once, whileReach-
abilityInformation can apply them more than once.
Notice that {i} improves the accuracy of the estimated finishing time of
the actions
{E f t}, which is an important piece of information used during the search
neighborhood
evaluation for selecting the actions forming the temporal relaxed plans
{see Section 3}.
Moreover,{i} allows us to identify some domain actions that cannot be scheduled
during
the time windows associated with their timed preconditions, and so these
can be pruned
away.
Regarding {ii}, during the forward process of computing the reachability
information, an
action is re-applied whenever the estimated earliest time of one of its
preconditions has been
decreased. This is important for two reasons. On one hand, reconsidering
applied is useful because it can lead to a better estimate of the action
finishing times; on the other hand, this is also necessary to guarantee the
correctness of the reachability algorithm. The latter is because, if we overestimate
the earliest finishing time of an action
with a scheduling constraint, then we could incorrectly conclude that the
action cannot be
scheduled {and so we would consider the action inapplicable}. But if this
action is necessary
in any valid plan, then the incorrect estimate of its earliest finishing
time could lead to the incorrectconclusion that the planningproblem is unsolvable.
In other words, the estimated finishing time of an action with a scheduling
constraint should be a lower bound of its actual
earliest finishing time.
ReachabilityInformation could be used to update N umacts{a; S {l}) and
E f t{a; S {l}) after each action insertion/removal, for any l > 1 {when
l >1, instead of I , in input the algorithm
has S {l}). However, in order to make the updating process more efficient,
the revision is done
in a more selective focused way. Instead of revising the reachability information
after each
graph modification {search step}, we do so befor e evaluating the search
neighborhoodand
choosingthe estimated best modification. Specifically, if we are repairing
the flawed level l,
we update only the reachability information for the actions and facts at
the levels pre c e ding
l that have not been updated yet. {For instance, suppose that at the ith
search step we add
an action to level 5, and that at the {i + 1}th step we add another action
at level 10. At
the {i + 1}th step we need to consider updating only the reachability information
at levels
6{10, sincethis information at levels 1{5 has already been updated by the
ith step.} This is
sufficient because the search neighborhood for repairing the flawed level
under consideration
{l} can contain only the graph modifications concerning the levels preceding
l.
Before describing the steps of ReachabilityInformation, we need to introduce
some no-
tation. Add {a} denotes the set of the positive effects of a; Pre {a} denotes
the set of the {non-timed}preconditions of a; A
rev
denotes the set of the actions already applied whose224An Approach to Temporal
Planning and Schedulingreachability could be revised because the estimated
earliest time of some of their precon- ditions has been revised after their
application. Given an action node a and its \current"
earliest start time t computed as the maximum over the earliest times at
which its pre-
conditions are reachable, ComputeEFT {a; t} is a function computing the
earliest finishing
time 034 of a that is consistent with the scheduling constraint of a {if
any} and such that
t + Dur{a} 024 034 .
17
ComputeLFT {a} is a function computing the latest finishing time of the
action a, i.e., it returns the upper bound of the last time window during
which a can be
scheduled {if one exists}, while it returns 1 if a has no timed precondition.
For example, let a be an action such that all its preconditions are true
in the initial
state I {i.e., t = 0}, the duration of a is 50, and a has a scheduling constraint
imposing that
the action is executed duringthe interval [25; 100}. ComputeEFT {a; t} returns
75, while ComputeLFT {a; t} returns 100. Thus, the scheduling constraint
of a can be satisfied. On the contrary, if the earliest start time of a is
500, then ComputeEFT {a; t} returns 550 and a cannot be scheduled during
[25; 100}.
For the sake of clarity, first we describe the steps of ReachabilityInformation
used to derive N umacts, and then we comment on those for the computation
of E f t. In steps 1{4, for
every fact f , the algorithm initializes N umacts{f ; I } to 0, iff 2 I
, and to 1 otherwise
{indicating that f is not reachable}; while,in step 5, N umacts{a; I } is
initialized to 1 {indicating that a is not reachable from I }. Then, in steps
7{24 the algorithm iteratively
constructs the set F of the facts that are reachable from I , startingwith
F = I , and
terminating when F cannot be further extended and the set A
rev
of the actions to reconsider
is empty. The set A of the available actions is initialized to the set of
all possible actions
{step 6}; A is reduced by a after its application {step 24}, and it is augmented
by the set of
actions A
rev {step 8} after each action application. When we modify the estimated
time at which a precondition of an action a becomes reachable, a is added
to A
rev
{step 20}. The
internal while-loop {steps 9{24} appliesthe actions in A to the current
F , possibly deriving
a new set of facts F
new in step 23. If F
new
or A
rev are not empty, then F is extended with
F
new
, A is extended with A
rev
, and the internal loop is repeated. When an action a in A
0
{the subset of actions currently in A that are applicable to F } is applied,
the reachability information for its effects are revised as follows. First
we estimate the minimum number
ra of actions required to achieve P re{a} from I using the subroutine RequiredActions
{step
15}. Then we use ra to possiblyupdate N umacts{a; I } and N umacts{f ; I
} for any effect f of a {steps 15{16, 21{22}. If the number of actions required
to achieve the preconditions
of a is lower than the current value of N umacts{a; I }, then N umacts{a;
I } is set to ra.
Moreover, ifthe application of a leads to a lower estimate of f , i.e.,
if ra + 1 is less than the
current value of N umacts{f ; I }, then N umacts{f ; I } is set to ra +
1. In addition, a data
structure indicating the current \best" action to achieve f from I {Action{f
; I }) is set to a
{step 22}. This information is used by the subroutine RequiredActions.
For any fact f in the initial state, thevalue of Action{f ; I } is a
start
{step 3}. The
subroutine RequiredActions is the same as the one in the reachability algorithm
of lpg-ipc3.
The subroutine uses Action to derive ra through a backward process starting
from the input set of action preconditions {G}, andending when G 022 I .
The subroutine incrementally constructs a set of actions {ACTS} achieving
the facts in G and the preconditions of the17. If there is no scheduling
constraint associated with a, or the existing scheduling constraints cannot
be
satisfied by starting the action at t, then ComputeEFT {a; t} returns t
+ Dur{a}.225Gerevini, Saetti & Serinaactions already selected {using Action}.
At each iteration the set G is revised by adding the
preconditions of the last action selected, and removing the facts belongingto
I or to the
effects of actions already selected {step 7}. Termination of RequiredActions
is guaranteed
because every element of G is reachable from I .
We now briefly describe the computation of the temporal information. Eft
{a; I }, iscom- puted in a way similar to N umacts{a; I }. In steps 1{4,
ReachabilityInformation initializes
the estimated earliest time {E t{f ; I }) when a fact f becomes reachable
to 0, if f 2 I , and to 1 otherwise; moreover, the algorithm sets E f t{a;
I } and Lf t{a; I } to 1. Then, at every application of an action a in the
forward process described above, we estimate the earliest
finishing time E f t by adding the duration of a to the {current} maximum
estimated earliest
time of the preconditions of a, and by taking into account the scheduling
constraints of a
using ComputeEFT {a} {step 11}. In addition, we compute the latest finishing
time Lf t
of a using ComputeLFT {a} {step 13}. When the earliest finishing time of
an action a is
greater than its latest finishingtime, the timed preconditions of a cannot
be satisfied from
I , and so steps 15{23 are not executed {see the if-statement of step 14}.
For any effect f of
a with a current temporal value higher than the earliest finishing time
t of a, steps 18{19 set E t{f ; I } to t, and step 20 adds a in A
rev
{because we have decreased the estimated earliestx time of f , and this
revision could decrease the estimated start time of an action
with precondition f }.
Appendix B: Wilcoxon Test for the Metric-Temporal Domains of IPC-4
In this appendix, we present the results of the Wilcoxon sign-rank test
on the performance
of lpg-td and the other satisficing IPC-4 planners that attempted the metric-temporal
domains. The performance is evaluated both in terms of CPU-times and plan
quality.
Each cell in the first two tables gives the result of a comparison between
the performance of lpg-td and another IPC-4 planner. When the number of samples
is sufficiently large, the
T-distribution used by the Wilcoxon test is approximatively a normal distribution.
Hence,
in each cell of the Figure we give the z-value and the p-value characterizing
the normal
distribution. The higher the z-value, the more significant the difference
of the performance
is. The p-value represents the level of significance in the difference of
the performance.
We use a confidence level of 99.9045; therefore, ifthe p-value is lower
than 0.001, then the performance of the two planners is statistically different.
When this information appears
on the left {right} side of the cell, the first {second} planner named in
the title of the cell
performs better than the other. For the analysis comparing the CPU-time,
the value under
each cell is the number of the problems solved by at least one planner;
while for the analysis comparing the plan quality, it is the number of problems
solved by both the planners.
The pictures under the tables show the partial order of the performance
of the compared
planners in terms of CPU-time and plan quality. A solid edge from a planner
A to another
planner B {or a cluster of planners B} indicates that the performance of
A is statistically
different from the performance of B, and that A performs better than B {every
plannerin B}. A dashed edge from A to B indicates that A is better than B
a significant number of
times, but there is not significant Wilcoxon relationship between them at
a confidence level
of 99.9045.226An Approach to Temporal Planning and SchedulingAnalysis of
CPU-Timelpg-td.svs crikeylpg-td.s vs p-meplpg-td.s vs sgplanlpg-td.s vs tilsapa11:27511:1320:38712:324<
0:001< 0:001{0:699}< 0:001169215513136Analysis of Plan Qualitylpg-td.bq vs
crikeylpg-td.bq vs p-meplpg-td.bq vs sgplanlpg-td.bq vs tilsapa10:5004:01616:8796:901<
0:001< 0:001< 0:001< 0:0011732145263p-mepCPU-Timetilsapacrikeylpg-td.ssgplanB
:A is better than B a significant number of times{confidence level 94.78045}Ap-mepsgplancrikeytilsapalpg-td.bqPlan
QualityB :A is consistently better than BA227Gerevini, Saetti & SerinaReferencesArmando,
A., Castellini,C., Giunchiglia, E., & Maratea, M. {2004}. A SAT-based decision
procedure for the boolean combination of difference constraints. In Pro
c e e dings of the Seventh International Conferenc e on Theory and Applications
of Satisfiability Testing
{SA T-04}, Berlin, Heidelberg, New York. Springer-Verlag. SAT 2004 LNCS
Volume.Blum, A., & Furst, M. {1997}. Fast planningthrough planning graph
analysis. Artificial Intel ligence, 90, pp. 281{300.Chen, Y., Hsu, C., &
Wah B., W. {2004}. SGPlan: Subgoal partitioning and resolution
in planning. In Edelkamp, S., Hoffmann, J., Littman, M., & Younes, H. {Eds.},
In Abstract Booklet of the Competing Planners of ICAPS-04, pp. 30{32.Cresswell,
S., & Coddington, A. {2004}. Adapting LPGP to plan with deadlines. In Pro-
c e e dings of the Sixteenth Europ e an Conferenc e on Artificial Intel
ligence {ECAI-04},
pp. 983{984, Amsterdam,The Netherlands. IOS Press.Dechter, R., Meiri, I.,
& Pearl, J. {1991}. Temporal constraint networks. Artificial Intel li-
gence, 49, pp. 61{95.Do, M., B., Kambhampati, S., & Zimmerman, T. {2004}.
Planning - scheduling connections
through exogenous events. In Pro c e e dings of the ICAPS-04 Workshop on
Integr ating
Planning into Scheduling, pp. 32{37.Do, M., & Kambhampati, S. {2003}. SAPA:
A multi-ob jective metric temporal planner.
Journalof Artificial Intel ligence Rese ar ch {JAIR}, 20, pp. 155{194.Edelkamp,
S. {2004}. Extended critical paths in temporal planning. In Pro c e e dings
of the ICAPS-04 Workshop on Integr ating Planning into Scheduling, pp. 38{45.Edelkamp,
S., & Hoffmann, J. {2004}. PDDL2.2: The language for the classic part of
the
4th international planning competition. Technical report 195, Institut fªur
Informatik,
Freiburg, Germany.Edelk amp, S., Hoffmann, J., Littman, M., & Younes, H.
{2004} In Abstract Booklet of the
comp eting planners of ICAPS-04.Erschler, J., Roubellat, F., & Vernhes,
J. P. {1976}. Finding some essential characteristics of the feasible solutions
for a scheduling problem. Oper ations Rese ar ch {OR}, 24, pp.
772{782.Fox, M., & Long, D. {2003}. PDDL2.1: An extension to PDDL for expressing
temporal
planning domains. Journalof Artificial Intel ligence Rese ar ch {JAIR},
20, pp. 61{124.Fox, M., & Long, D. {2005}. Planning in time. In Fisher, M.,
Gabbay, D., & Vila, L. {Eds.},
Handbo ok of Temp or al Re asoning in Artificial Intel ligence, pp. 497{536.
Elsevier Sci-
ence Publishers,New York, NY, USA.Fox, M., Long, D., & Halsey, K. {2004}.
An investigation into the expressive power of
PDDL2.1. In Pro c e e dings of the Sixteenth Europ e an Conferenc e on Artificial
Intel li-
gence {ECAI-04}, pp. 338{342, Amsterdam,The Netherlands. IOS Press.Frederking,
R., E., & Muscettola, N. {1992}. Temporal planning for transportation plan-
ning and scheduling. In IEEE International Conferenc e on Rob otics and Automation
{ICRA-92}, pp. 1125{1230. IEEE Computer Society Press.228An Approach to
Temporal Planning and SchedulingGerevini, A., & Cristani, M. {1997}. On finding
a solution in temporal constraint satisfaction
problems. In Pro c e e dings of the Fifteenth International Joint Conferenc
e on Artificial
Intel ligence {IJCAI-97}, Vol. 2, pp. 1460{1465, SanFrancisco, CA, USA.
Morgan
Kaufmann Publishers.Gerevini, A., Saetti, A., & Serina, I. {2003}. Planning
through stochastic local search and
temporal action graphs. Journal of Artificial Intel ligence Rese ar ch {JAIR},20,
pp.
239{290.Gerevini, A., Saetti, A., & Serina, I. {2004}. An empirical analysis
of some heuristic features
for local search in LPG. In Pro c e e dings of the Fourte enth International
Conferenc e on Automated Planning and Scheduling {ICAPS-04}, pp. 171{180,
MenloPark, CA,
USA. AAAI Press.Gerevini, A., Saetti, A., & Serina, I. {2005a}. Integrating
planning and temporal reason-
ing for domains with durations and time windows. In Pro c e e dings of the
Nineteenth International Joint Conferenc e on Artificial Intel ligence {IJCAI-05},
pp. 1226{1235,
Menlo Park, CA, USA. International Joint Conference on Artificial Intelligence
Inc.Gerevini, A., Saetti, A., Serina, I., & Toninelli, P. {2005b}. Fast planning
in domains with
derived predicates: An approach based on rule-action graphs and local search.
In
Pro c e e dings of the Twentieth National Conferenc e on Artificial Intel
ligence {AAAI- 05}, pp. 1157{1162, Menlo Park, CA, USA. AAAI Press.Gerevini,
A., & Serina, I. {1999}. Fast planningthrough greedy action graphs. InPro
c e e dings
of the Sixteenth National Conferenc e on Artificial Intel ligence {AAAI-99},
pp. 503{ 510, Menlo Park, CA, USA. AAAI Press/MIT Press.Gerevini,A., & Serina,
I. {2000}. Fast plan adaptation through planning graphs: Local and
systematic search techniques. In Pro c e e dings of the Fifth International
Conferenc e on
Artificial Intel ligence Planning and Scheduling {AIPS-00}, pp. 112{121,
MenloPark,
CA, USA. AAAI Press/MIT Press.Ghallab, M., & Laruelle, H. {1994}. Representation
and control in IxTeT, a temporal plan- ner. In Pro c e e dings of the Sec
ond International Conferenc e on Artificial Intel ligence
Planning Systems {AIPS-94},pp. 61{67, Menlo Park, CA, USA. AAAI press.Ghallab,
M., Nau, D., & Traverso, P. {2003}. Automated Planning: Theory and Practic
e.
Morgan Kaufmann Publishers, San Francisco, CA, USA.Glover, F., & Laguna,
M. {1997}. Tabu Sear ch. Kluwer Academic Publishers,Boston, USA.Helmert,
M. {2004}. A planning heuristic based on causal graph analysis. In Pro c
e e dings
of the Fourte enth International Conferenc e on Automated Planning and Scheduling
{ICAPS-04}, pp. 161{170, Menlo Park, CA, USA. AAAI Press.Kavuluri, B. R.,
& U, S. {2004}. Tilsapa - timed initial literals using SAPA. In Edelkamp,
S.,
Hoffmann, J., Littman, M., & Younes, H. {Eds.}, In Abstract Booklet of the
Competing
Planners of ICAPS-04, pp. 46{47.Laborie, P., & Ghallab, M. {1995}. Planning
with sharable resource constraints. In Pro- c e e dings of the Fourte enth
International Joint Conferenc e on Artificial Intel ligence
{IJCAI-95}, Vol. 2, pp. 1643{1651, San Francisco, CA, USA. Morgan Kaufmann
Pub- lishers.229Gerevini, Saetti & SerinaLong, D., & Fox, M. {2003a}. The
3rd international planning competition: Results and
analysis. Journal of Artificial Intel ligence Rese ar ch {JAIR}, 20, pp.
1{59.Long, D., & Fox, M. {2003b}. Exploiting a graphplan framework in temporal
planning. In
Pro c e e dings of the Thirteenth International Conferenc e on Automated
Planning and
Scheduling {ICAPS-03},pp. 52{61, Menlo Park, CA, USA. AAAI Press.McAllester,
D., & Rosenblitt, D. {1991}. Systematic nonlinear planning. In Pro c e e
dings
of the Ninth National Conferenc e on Artificial Intel ligence {AAAI-91},
pp. 634{639, Menlo Park, CA, USA. AAAI Press.Muscettola, N. {1994}. HSTS:
Integrating planning and scheduling. In Zweben, & Fox
{Eds.}, Intel ligent Scheduling, pp. 169{212, SanFrancisco, CA, USA. Morgan
Kauf-
mann Publishers.Nguyen, X., & Kambhampati, S. {2001}. Reviving partial order
planning. In Pro c e e dings of
the Seventeenth International Joint Conferenc e on Artificial Intel ligence
{IJCAI-01},
Vol. 1, pp. 459{464, San Francisco, CA, USA. Morgan Kaufmann Publishers.Penberthy,
J., & Weld, D. {1992}. UCPOP: A sound, complete, partial order planner for
ADL. In Pro c e e dings of the Third International Conferenc e on Principles
of Know ledge
R epr esentation and Re asoning {KR'92}, pp. 103{114, SanMateo, CA, USA.
Morgan
Kaufmann Publishers.Penberthy, J., & Weld, D. {1994}. Temporal planning
with continuous change. In Pro c e e dings
of the Twelfth National Conferenc e on Artificial Intel ligence {AAAI-94},
pp. 1010{
1015,Menlo Park, CA, USA. AAAI Press/MIT Press.Penberthy, J., S. {1993}.
Planning with Continuous Change. Ph.D. thesis, University of
Washington, Seattle, WA, USA. Available as technical report UW-CSE-93-12-01.Sanchez,
J., Tang, M., & Mali, A., D. {2004}. P-MEP: Parallel more expressive planner.
In
Edelkamp, S., Hoffmann, J., Littman, M., & Younes, H. {Eds.}, In Abstract
Booklet
of the Competing Planners of ICAPS-04, pp. 53{55.Schwartz, P., J., & Pollack,
M., E. {2004}. Planning with disjunctive temporal constraints.
In Pro c e e dings of the ICAPS-04 Workshop on Integr ating Planning into
Scheduling,
pp. 67{74.Smith, D., & Weld, D. {1999}. Temporal planning with mutual exclusive
reasoning. In
Pro c e e dings of the Sixteenth International Joint Conferenc e on Artificial
Intel ligence
{IJCAI-99}, pp. 326{337, San Francisco, CA, USA. Morgan Kaufmann Publishers.Smith,S.,
& Cheng, C. {1993}. Slack-based heuristics for constraint satisfaction scheduling.
In Pro c e e dings of the Eleventh National Conferenc e on Artificial Intel
ligence {AAAI- 93}, pp. 139{144, MenloPark, CA, USA. AAAI Press/The MIT press.Stergiou,
K., & Koubarakis, M. {2000}. Backtracking algorithms for disjunctionsof temporal
constraints. Artificial Intel ligence, 120 {1}, pp. 81{117.Tate, A. {1977}.
Generating pro ject networks. In Pro c e e dings of the Fifth International
Joint Conferenc e on Artificial Intel ligence {IJCAI-77}, pp. 888{889, Cambridge,MA,
USA. MIT, William Kaufmann.230An Approach to Temporal Planning and SchedulingTsamardinos,
I., & Pollack, M. E. {2003}. Efficient solution techniques for disjunctive
temporal reasoning problems. Artificial Intel ligence, 151 {1-2}, pp. 43{89.Vere,
S. A. {1983}. Planning in time: Windows and durations for activities and
goals. IEEE
Tr ansactions on Pattern Analysis and Machine Intel ligence, 5 {3}, pp.246{267.Vidal,
V. {2004}. A lookahead strategy for heuristic search planning. In Pro c e
e dings of the
Fourte enth International Conferenc e on Automated Planning and Scheduling
{ICAPS-
04}, pp. 150{159, MenloPark, CA, USA. AAAI Press.Wilcoxon, F., & Wilcox,
R. A. {1964}. Some Rapid Approximate Statistical Pro c e dur es.
American Cyanamid Co., Pearl River, NY, USA.231