The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models

D. V. Pynadath and M. Tambe

Volume 16, 2002

Links to Full Text:

Abstract:
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demonstrate its use by encoding and evaluating the two joint intentions strategies within an example domain.

Extracted Text

Journal of Artificial Intelligence Research 16 {2002} 389-423Submitted 2/02;
published 6/02
 TheCommunicative Multiagent TeamDecision Problem:
 Analyzing TeamworkTheories and Models
 DavidV. Pynadath pynadath@isi.edu Milind Tambe tambe@usc.edu
 Information Sciences Instituteand Computer Science Department
 University of Southern California
 4676 Admiralty Way, Marina del Rey, CA 90292 USA Abstract
 Despite the significantprogress in multiagent teamwork, existingresearch
does not ad-
 dress theoptimality of its prescriptions nor thecomplexity of the teamwork
problem.With-
 out a characterization of theoptimality-complexity tradeoffs, it is impossibleto
determine
 whether the assumptions andapproximations made by a particular theory gainenough
 efficiency to justify the losses in overallperformance. To provide a tool
for use by mul-
 tiagentresearchers in evaluating this tradeoff, we present a unified framework,
the COM- municative Multiagent TeamDecision Problem {COM-MTDP}. The COM-MTDP
model
 combines and extends existing multiagent theories, such as decentralizedpartially
observ-
 able Markov decision processes and economic team theory. Inaddition to their
generality
 of representation, COM-MTDPs also support the analysis of both the optimality
of team
 performanceand the computational complexity of the agents'decision problem.
In analyz-
 ingcomplexity, we present a breakdown of the computational complexity of
constructing optimal teams under various classes ofproblem domains, along
the dimensions of observ- ability and communication cost.In analyzing optimality,
we exploit theCOM-MTDP's
 ability to encode existing teamwork theories and models to encode twoinstantiations
of
 joint intentionstheory taken from the literature. Furthermore, the COM-MTDP
model
 provides a basis for the development of novel team coordination algorithms.
We derive a
 domain-independent criterion for optimal communication and provide a comparative
anal-
 ysis of the two joint intentions instantiations with respect to thisoptimal
policy. We have implemented a reusable, domain-independentsoftware package
based on COM-MTDPs to analyze teamwork coordination strategies,and we demonstrate
its use by encoding and evaluating the two joint intentions strategies within
an example domain. 1. Introduction
 Acentral challenge in the control and coordination of distributedagents
is enabling them to work together, as a team, toward a common goal. Such
teamwork is criticalin a vast
 range of domains|for futureteams of orbiting spacecraft, sensors for trackingtargets,
un-
 manned vehicles for urbanbattlefields, software agents for assistingorganizations
in rapid
 crisis response, etc.Research in teamwork theory has built thefoundations
for successful
 practicalagent team implementations in such domains.On the forefront are
theories based on belief-desire-intentions {BDI} frameworks, such as joint
intentions {Cohen &Levesque,
 1991b, 1991a; Levesque, Cohen,& Nunes, 1990}, SharedPlans {Grosz, 1996;
Grosz&Kraus,
 1996; Grosz& Sidner, 1990}, andothers {Sonenberg, Tidhar, Werner,Kinny,
Ljungberg,
 &Rao, 1994;Dunin-Keplicz & Verbrugge, 1996}, that have provided prescriptions
for co-
 c
 fl2002 AI Access Foundation and Morgan Kaufmann Publishers.All rights reserved.Pynadath
& Tambe
 ordination in practical systems.These theories have inspired the constructionof
practi-
 cal, domain-independentteamwork models and architectures {Jennings, 1995;Pynadath,
 Tambe, Chauvat, & Cavedon, 1999; Rich & Sidner, 1997;Tambe, 1997; Yen, Yin,
Ioerger, Miller, Xu, & Volz, 2001}, successfullyapplied in a range of complex
domains. Yet, two key shortcomings limit thescalability of these BDI-based
theories and imple- mentations. First, there are no techniques for the quantitative
evaluationof the degree of
 optimality of their coordination behavior. While optimal teamwork may be
impractical in
 real-worlddomains, such analysis would aid us in comparison of different
theories/models
 and in identifying feasible improvements. Onekey reason for the difficulty
in quantitative
 evaluation of most existing teamwork theories is that they ignore the various
uncertain-
 ties and costs inreal-world environments. Forinstance, joint intentions
theory {Cohen & Levesque, 1991b} prescribes that team members attain mutual
beliefs in keycircumstances,
 but it ignores the cost ofattaining mutual belief {e.g., viacommunication}.
Implementa-
 tionsthat blindly follow such prescriptions couldengage in highly suboptimal
coordination. On the other hand, practical systems haveaddressed costs and
uncertainties of real-world environments. For instance, STEAM {Tambe, 1997;
Tambe & Zhang,1998} extends joint
 intentionswith decision-theoretic communication selectivity. Unfortunately,
the very prag- matism of such approaches often necessarilyleads to a lack
of theoretical rigor, so itremains
 unanswered whether STEAM's selectivity is the best an agent can do, or whether
itis even
 necessary at all.The second key shortcoming of existing teamwork research
is the lack
 ofa characterization of the computational complexityof various aspects of
teamwork deci- sions. Understanding the computational advantages of a practical
coordination prescription could potentially justify the use ofthat prescriptionas
an approximation to optimality in
 particular domains.
 To address these shortcomings, we propose anew complementary framework,
the COM- municative Multiagent Te amDecision Problem {COM-MTDP},inspired
by work in ec o- nomic team theory {Marschak& Radner, 1971; Yoshik awa,1978;
Ho,1980}. While our
 COM-MTDP model borrows from a theory developed in another field, we make
several contributions in applying and extending theoriginal theory, most
notably adding explicit models of communication and system dynamics.With
these extensions, the COM-MTDP generalizes other recently developed multiagent
decision frameworks, such as decentralized
 POMDPs {Bernstein, Zilberstein,&Immerman, 2000}.
 Our definition of a team{like that in economic team theory} assumes onlythat
team
 members have a common goal and that they work selflessly towards thatgoal
{i.e., they
 have no other private goals of their own}. Interms of our decision-theoretic
framework, we assume that all of the team membersshare the same joint utility
function|that is,each
 team member's individualpreferences are exactly the preferences of the othermembers
and,
 thus, of the team asa whole. Our definition may appear to be a \bare-bones"
definition of
 a team, since it does not include common concepts and assumptions from the
literature on what constitutes a team {e.g., the teammatesform a joint commitment
{Cohen & Levesque, 1991b}, attain mutual belief upontermination of a joint
goal, intend thatteammates suc-
 ceed in their tasks {Grosz &Kraus, 1996}, etc.}. From our COM-MTDP perspective,
we
 view these concepts asmore intermediate concepts, as the means by which
agents improve their team's overall performance, ratherthan ends in themselves.
Our hypothesis in this
 investigation is that ourCOM-MTDP-based analysis can provide concretejustifications
for
 390The CommunicativeMultiagent Team Decision Problem these concepts. For
example, while mutual belief has no inherent value,our COM-MTDP
 model can quantify theimproved performance that we would expect from a team
that
 attains mutual beliefabout important aspects of its execution. More generally,
thispaper demonstratesthree new types of teamwork analyses made possible
by the COM-MTDP model.First, we analyze the computational complexityof
 teamwork within subclasses ofproblem domains. For instance, some researchers
have ad-
 vocated teamworkwithout communication {Goldberg & Mataric, 1997}.We use
the COM-
 MTDP model to show that, in general, the problem of constructingoptimal
teams without
 communicationis NEXP-complete, but allowing free communicationreduces the
problem
 to be PSPACE-complete. This paper presents a breakdown of the complexity
of optimal
 teamwork over problem domains classifiedalong the dimensions of observabilityand
com-
 munication cost.
 Second, the COM-MTDP model provides a powerful tool for comparing the optimality
of different coordination prescriptionsacross classes of domains. Indeed,
we illustrate that
 we can encode existing team coordination strategies within a COM-MTDP for
evaluation.
 F or our analysis, we selectedtwo joint intentions-based approaches fromthe
literature: one
 using the approachrealized within GRATE* and the joint responsibility model
{Jennings,
 1995}, andanother based on STEAM {Tambe,1997}. Through this encoding, we
derivethe
 conditions under which these team coordination strategies generate optimal
team behavior, and the complexity of the decisionproblems addressed by them.
Furthermore,we also
 derivea novel team coordination algorithm that outperforms these existingstrategies
in
 optimality, though notin efficiency. The end result is awel l-grounde d
characterizationof
 the complexity-optimalitytrade off among variousmeans of team coordination.
 Third, we can use the COM-MTDP model to empiricallyanalyze aspecific domain
of
 interest.We have implementedreusable, domain-independent algorithms that
allow one to evaluate the optimality of the behavior generated by different
prescriptive policies within a
 problem domain representedas a COM-MTDP. We apply thesealgorithms in an
example
 domainto empirically evaluate the aforementionedteam coordination strategies,
charac- terizing the optimality of each strategyas a function of the properties
of the underlying domain. For instance, Jenningsreports experimental results
{Jennings, 1995}indicating
 that his joint responsibilityteamwork model leads to lower wasteof community
effort than
 competingmethods of accomplishing teamwork.With our COM-MTDP model, we were
able to demonstrate the benefits ofJennings' approach under many configurations
ofour ex-
 ample domain. However,inprecisely characterizing the types ofdomains that
showed such
 benefits, we also identified domains where these competingmethods may actually
perform
 better. In addition, we can use ourCOM-MTDP model to re-create and explain
previous work that noted an instance of suboptimality in a STEAM-based, real-world
implementa- tion {Tambe, 1997}.While this previous work treated that suboptimality
as anomalous, our
 COM-MTDPre-evaluation of the domain demonstrated thatthe observed suboptimality
 was asymptom of STEAM's general propensity towardsextraneous communication
in a
 significantrange of domain types. Both the algorithmsand the example domain
model are
 available for public use in an Online Appendix 1.
 Section 2 presents the COM-MTDP model's representation and places it in
the context
 of related multiagent models fromthe literature. Section 3 uses the COM-MTDP
model to
 define and characterize thecomplexity of designing optimal agent teams.Section
4 analyzes
 391Pynadath & Tambe
 the optimality of existing team coordination algorithms and derives a novel
coordination
 algorithm. Section 5 presentsempirical results from applying our COM-MTDP
algorithms to an example domain. Section6 summarizes our results, and Section
7 identifies some
 promising future directions. 2. The COM-MTDP Model
 This section defines and describes theCOM-MTDP model itself and its ability
to represent
 the important aspects of multiagent teamwork. We begin inSection 2.1 by
defining the
 underlyingmultiagent team decision problem with no explicitcommunication.
Section 2.2
 defines thecomplete COM-MTDP model with its extension toexplicitly represent
commu-
 nication.Section 2.3 provides an illustration of howthe COM-MTDP model represents
the execution of a team of agents.Finally, Section 2.4 describes related
models of multiagent
 coordinationand shows how the COM-MTDP model generalizesthem.
 2.1 Multiagent Team Decision Problems
 Givena team of selfless agents, ff, who intend to perform some joint task,
we wish to evaluate
 possiblepolicies of behavior. Werepresent a multiagent team decision problem
{MTDP}
 model as a tuple, hS; A ff
 ; P; 012 ff
 ; O
 ff ; B
 ff
 ; Ri. We have taken theunderlyingcomponents of
 this model fromthe initial team decision model {Ho, 1980}, butwe have extended
them to
 handledynamic decisions over time and to more easilyrepresent multiagent
domains {in particular, agent beliefs}. We assume that the model is common
knowledgeto all of the
 team members.In other words, all of the agents believe the same model, and
they believe that they all believe the same model,etc.
 2.1.1 World Sta tes:S
 
*  S = 004 1
 002 001 001 001002 004
 m
 :a set of world states, expressed as a factored representation {a
 crossproduct ofseparate features}.
 Thestate of the worldhere is the state of the team's environment{e.g., terrain,
location of
 enemy}.Thus, each 004
 i represents the domain of an individualfeature of that environment,
 while S represents the domain of allpossible combinations of valuesover
the individual
 features. 2.1.2 Domain-Level Actions:A
 ff
 fA i
 g
 i2ff
 is a set of actions for eachagent to perform to change its environment,
implicitly
 defining a set of combined actions, A
 ff
 021 Q
 i2ff
 A
 i
 {corresponding to teamtheory's decision
 variables}.
 Extension to Dynamic Problem:P The original team decision problem focused
on
 a one-shot, static problem.We extend the original concept so that each component
is a
 time series of randomvariables. The effects of domain-levelactions {e.g.,
aflying action
 changes ahelicopter's position} obey a probabilisticdistribution, given
by a function P:
 S 002 A
 ff
 002 S ! [0; 1]. In other words, for eachinitialstate s at time t, combinedaction
a
 392The CommunicativeMultiagent Team Decision Problem taken at time t, and
final states
 0
 at time t+ 1, Pr{S
 t+1 = s
 0
 jS
 t
 = s; A t
 ff
 = a} = P {s; a; s 0
 }.
 The givendefinition of P assumes that the worlddynamics obey the Markov
assumption. 2.1.3 Agent Observ a tions:012
 ff
 f012 i
 g
 i2ff is a set of observationsthat each agent, i, can experience ofits world,
implicitly
 defining a combinedobservation, 012
 ff 021
 Q
 i2ff
 012
 i
. 012
 i
 mayinclude elements corresponding
 to indirectevidence of the state {e.g., sensor readings} andactions of other
agents {e.g.,
 movement of other helicopters}.In the original team-theoretic framework,
theinformation
 structure that represented the observation process of the agentswas a set
of deterministic
 functions,O
 i
 :S !012
 i
 .
 Extension of Allowable Information Structures:O
 ff
 Weextend the information
 structurerepresentationto allow for uncertain observations.We use a general
stochastic model, borrowed from thepartial ly observable Markov decision
pro c ess model {Smallwood &
 Sondik, 1973}, with a joint observation function: O
 ff {s; a; !}= Pr{012
 t
 ff = !jS
 t = s; A
 t0001
 ff
 =
a}. This function models the sensors,representing any errors, noise, etc.In
some cases, we
 can separate thisjoint distribution into individual observation functions:
O
 ff 021
 Q
 i2ff
 O
 i
,
 where O
 i
 {s; a; !} = Pr{012 t
 i
 = !jS
 t
 = s; A t0001
 ff = a}. Thus, the probability distribution
 specified byO
 ff
 forms the richerinformation structure used in our model. We can make
 usefuldistinctions between different classes ofinformation structures:
 CollectivePartial Observability Thisis the general case, where we make no
assump- tions on the observations. Collective ObservabilityThere is a unique
world state for thecombine d observations of the team: 8! 2012
 ff
 , 9s2 S such that 8s
 0
 6= s, Pr{012
 t
 ff = !jS
 t = s
 0
 } = 0.The set
 of domains that are collectivelyobservable is a strict subset of the domainsthat
are
 collectivelypartially observable.
 Individual ObservabilityThere is a unique world state for eachindividual
agent's ob-
 servations: 8! 2 012
 i
 , 9s 2 Ssuchthat 8s
 0 6= s, Pr{012
 t
 i
 = !jS
 t
 = s
 0
 } = 0. The set of domains that are individuallyobservable is a strict subset
of the domainsthat are
 collectively observable. Non-Observ ability The agents receive no feedback
from the world: 9! 2012
 i
, such that
 8s 2S and 8a 2 A ff
 , Pr{012
 t
 i
 = !jS
 t
 = s; A t0001
 ff = a} = 1. This assumptionholds
 in open-loop systems, whichcome under frequent consideration in classical
plan- ning {Boutilier, Dean, & Hanks, 1999}. 2.1.4 Policy {Strategy}Space
 031
 iA is a domain-level policy{or strate gy, inthe originalteam theory specification}
to map an agent's belief state to an action.In the original formalism, the
agent's beliefscorrespond
 directly to its observations {i.e., 031
 iA : 012
 i
 !A
 i
 }.
 Extensionto Richer Belief State Space: B ff
 We generalize the setof possible strate-
 gies to capture themore complex mental states of the agents.Each agent,
i 2 ff, forms a
 belief state,b
 t
 i
 2B
 i
 , based on its observations seen through time t, whereB
 i
 circumscribes 393Pynadath & Tambe
 the set of possible beliefstates for the agent. Thus, we definethe set of
possible domain-
 level policies as mappings from belief states to actions,031
 iA
 : B i
 ! A
 i . We define the set of possible combined belief states over all agents
to be B
 ff 021
 Q
 i2ff
 B
 i
 .The corresponding
 random variable,b
 t
 ff
 , representsthe agents' combined belief state at timet. We elaborate
 ondifferent types of belief states and themapping of observations to belief
states{i.e., the
 state estimator function}in Section 2.2.1.
 2.1.5Reward Function: R
 Acommon reward function is central to the notionof teamwork in a MTDP: R
:S 002 A
 ff !
 R. This function represents the team's joint preferences over statesand
the cost of domain-
 levelactions {e.g., destroyingenemy is good,returningto home base with only
10045 of original force is bad}. Weassume that, as selfless team members,
eachagent shares these
 preferencesat the individuallevel as well.Therefore, each team member wantsexactly
 what is best for the team as a whole.
 2.2 Extension for Explicit Communication: 006
 ff
We make an explicit separation between domain-level actions {A
 ff
 } and communicative actions. As defined in this section, communicative actions
affect the receiving agents'indi-
 vidual belief states, but, unlikedomain-level actions, they do not directly
changethe world
 state. Although thisdistinction is sometimes blurry in real-world domains,
we make this
 explicit separation so asto isolate, as much as possible, the effects of
the two types of
 actions.The leverage gained from this separation provides the basis for
the informative, analytical results presented in the rest ofthis paper. To
capture this separation, we extend
 our initial MTDP model to bea communic ative multiagent team decision problem
{COM- MTDP}, that we define as a tuple,hS; A
 ff
 ; 006
 ff
 ; P;012
 ff
 ; O ff
 ; B
 ff ; Ri, with a new component, 006
 ff
 , and anextended reward function, R.
 2.2.1 Communication: 006 ff
 f006
 i g
 i2ff
 is a set of possible messages for eachagent, implicitly defining a set of
combined communications, 006
ff
 021
 Q
 i2ff
 006
 i
 . An agent, i, may communicate message x 2006
 i
 to its teammates, whointerpret the communication by updating their belief
states in response.As
 a first step in this work, weassume that all of the agents receive themessages
instantaneously
 and correctly {i.e., there is no lag or noise in the communication channels}.
This model is
 common knowledge among all of the team members, so oncean agent has sent
a message,
 it knows that its team members havereceived the message, andits team members
know
 that it knows that they haveall received the message, and so on. Withcommunication,
we divide each decisionepoch into two phases: thepre-c ommuni-
 c ationand post-c ommunic ation phases,denotedby the subscripts 
* 006 and 006
* , respectively.
 Inparticular, the agents update their beliefstates at two distinct points
within eachde-
 cision epoch: onceupon receiving observation 012 t
 i
 {producing thepre-communication be-
 lief stateb
 t
 i
* 006 }, and again upon receiving the other agents' messages {producing
the post- communication belief state b t
 i006
* 
 }. The distinction allows us to differentiate between the belief
 state used by the agents in selecting theircommunication actions and the
more \up-to-date" belief state used in selecting theirdomain-level actions.
We also distinguish between the
 394The CommunicativeMultiagent Team Decision Problem separate state-estimator
functionsused ineach update phase:
 b
 0
 i
 =S E 0
 i
 {} {1} b
 t
 i
* 006 =S E
 i
* 006 {b
 t0001 i006
* 
 ; 012 t
 i
 } {2} b
 t
 i006
*  =S E
 i006
*  {b
 t
 i
* 006
 ; 006
 t ff
 } {3}
 whereS E
 i
* 006
 : B
 i
 002012
 i
 ! B i
 is the pre-communication stateestimator for agent i, and
 S E
 i006
*  : B
 i
 002006
 ff
 ! B i
 is the post-communication state estimator for agent i. Theinitial
 state estimator, S E 0
 i
 : ; !B
 i
 , specifies the agent's priorbeliefs, before any observations
 are made. Foreach of these, we also make the obviousdefinitions for the
corresponding
 estimators for the combined belief states:S E
 ff
* 006 , S E
 ff006
* 
 , and SE
 0
 ff
 .
 In this paper, as a first step, we assume that the agents haveperfe ct re
c al l.In other
 words, the agents recallall of their observations, as well as allcommunication
of the other
 agents.Thus, their belief states can representtheir entire histories as
sequences of obser- vations and received messages:B
 i
 = 012 003
 i
 002006
 003
 ff , where X
 003 denotes the set of all possible sequences {of any length} of elements
ofX . The agents realize perfect recallthrough the
 following state estimatorfunctions:
 S E
 0 i
 {} = hi{4}
 S E
 i
* 006
 {
 012012 012
 0
 i ; 006
 0
ff
 ff
 ; : : :;
 012
 012
 t0001
 i
; 006
 t0001 ff
 ffff
 ;012
 t
 i
 } =
 012012
 012 0
 i
 ; 006 0
 ff
 ff ; : : : ;
 012 012
 t0001 i
 ; 006
t0001
 ff
ff
 ;
 012
 012
 t
 i
; 001
 ffff
{5}
 S E
 i006
* 
 {
 012012 012
 0
 i ; 006
 0
ff
 ff
 ; : : :;
 012
 012
 t0001
 i
; 006
 t0001 ff
 ff
 ; 012
 012
 t i
 ; 001
 ffff ; 006
 t
 ff }
 =
 012012 012
 0
 i ; 006
 0
 ff ff
 ; : : : ; 012
 012
 t i
 ; 006
 t ff
 ffff
 {6} In other words, S E
 0
 i
 initializes agenti's belief state to be an emptyhistory, S E
 i
* 006
 appends a
 new observation to agent i's belief state,and S E
 i006
*  appends new messages to agenti's belief
 state. Underthis paper's assumptions of perfect recall, allthree state-estimator
functions
 takeonly constant time. However,we can potentially allow more complexfunctions
{though
 the complexity resultspresented hold only if the state-estimator functionstake
polynomial
 time}. For instance, although we assume perfect, synchronous, instantaneous
communica-
 tion here, we could potentially use the post-communication state estimator
to model any noise, temporal delays, asynchrony, cognitive burden, etc. present
in the communication
 channel.
 Weextend our definition of a policy of behavior to includea communic ationpolicy,
 031
 i006
 : B
 i ! 006
 i
 , analogous to Section 2.1.4's domain-level policy. We define the joint
policies, 031
 ff006
 and 031
 ffA
, as the combined policies across all agents in ff.
 2.2.2 ExtendedReward Function: R
 We extend the team's reward function to alsorepresent the cost of communicative
acts {e.g., communication channels may have associated cost}: R : S 002A
 ff
 002 006 ff
 ! R. We assume that
 the cost of communicationand of domain-level actions are independent ofeach
other, so we
 can decompose thereward function into two components:a communication-level
reward,
 R
 006
 : S002 006
 ff
 !R, and a domain-level reward,R
 A
 : S 002A
 ff
 ! R. The total reward is
 thesum ofthe two component values:R{s; a; 033}= R
 A
 {s;a} + R
 006 {s; 033}. We assume that
 395Pynadath & Tambe
 communication has no inherent benefit and may instead have some cost, sothat
for all
 states, s 2S , and messages, 033 2006
 ff
 , the reward isnever positive: R
 006 {s; 033} 0240. However,
 although we assigncommunication no explicit value, it can have significant
implicit value through its effect on the agents' beliefstates and, subsequently,
on their futureactions.
 As with the observabilityfunction, we parameterize the communication costsassociated
 with message transmissions: General Communication: Wemake no assumptions
about communication. Free Communication: R 006
 {s; 033} = 0for any 033 2006
 ff
 , ands 2 S . In other words, communication actions have no effect on the
agents' reward.
 No communication:006
 ff
 = ;, i.e., no explicit communication.Alternatively, communica-
 tion may be prohibitively expensive, sothat 8033 2006
 ff , and s 2 S ,R
 006
 {s;033 } = 0001.
 The fre e-c ommunic ationcase appears in the literature, when researcherswish
to focus
 on issues other than communication cost. Although, real-worlddomains rarely
exhibit
 such idealconditions, we may be able to model somedomains as having approximately
free communicationto a sufficient degree.In addition, analyzing this extreme
case givesus some
 understanding of the benefit ofcommunication, even if the results do not
applyacross all
 domains. We also identify the no-communic ation case becausesuch decision
problems have
 been of interest to researchers as well{Goldberg & Mataric, 1997}. Of course,
even if 006
 ff
 =;,
 it is possible that there are domain-level actions in A
 ff that have implicit communicative value by acting as signals that convey
information to the other agents.However, we still
 labelsuch agent teams as having no communic ation for the purposes of the
work here, since
 many of our results exploit an explicit separation between domain- andcommunication-level
 actions.
 2.3 Model Illustration
 We can view the evolving state as a Markov chain with separate stages for
domain-level and communication-level actions.In other words, each agent team
member, i 2 ff begins
 in some initial state, S
 0
 , with initial belief states,b
 0
 i
= S E
 0
 i {}. Each agent receives an observation 012
 0 i
 drawn according to the probability distribution O
 ff
 {S
 0
 ;null; 012
 0
 ff
 } {there are
 no actions yet}. Then, each agent updates its belief state, b
0
 i
* 006
= S E
 i
* 006 {b
 0
 i ; 012
 0
i
 }.
 Next, each agenti2 ff selects a message according toits communication policy,
006 0
 i
 =
 031
 i006
 {b
 0
 i
* 006 }, defininga combined communication,006
 0
 ff
 . Each agent interprets the commu- nications of all of the others by updating
its belief state, b
 0
 i006
* 
 =S E
 i006
*  {b
 0
 i
* 006
 ; 006 0
 ff
 }.Each
 then selects an action according toits domain-level policy, A 0
 i
 = 031 iA
 {b
 0 i006
* 
 }, defininga
 combined action A
0
 ff
 . By our centralassumption of teamwork, each agent receivesthe
 same joint reward, R 0
 = R{S 0
 ; A
 0 ff
 ; 006
0
 ff
 }. Theworld then moves into a new state,S
 1
 ,
 accordingto the distribution, P {S 0
 ; A
 0 ff
 }. Again, each agenti receives an observation 012 1
 i
 drawnfrom 012
 i
 according to thedistributionO
 ff
{S
 1
 ; A 0
 ff
 ; 012 1
 ff
 },and it updates its belief state, b
 1
 i
* 006
 = S E
 i
* 006
 {b
0
 i006
* 
 ;012
 1
 i
}.
 The process continues, with agents choosing communication- and domain-levelactions,
 observing the effects,and updating their beliefs. Thus,in addition to the
time series of world states, S
 0
 ;S
 1
 ; : : : ;S
 t
 , the agents themselves determine a time series of communication-level 396The
CommunicativeMultiagent Team Decision Problem and domain-level actions, 006
0
 ff
 ; 006 1
 ff
 ; : : :; 006
 t
 ff and A
 1
 ff ; A
 1
 ff ; : : : ; A
 t ff
 , respectively. We also have
 atime series of observations for each agent i, 012
 0
 i ; 012
 1
 i ; : : : ; 012
t
 i
 . Likewise,we can treat the
 combinedobservations, 012
 0 ff
 ; 012
1
 ff
 ; : : :; 012
 t
 ff , as a similar time series of random variables.
 Finally , the agents receivea series of rewards, R
 0 ; R
 1
 ; : : :; R
 t
 . We can define the value,
 V , of the policies, 031 ffA
 and 031
ff006
 , as the expected reward received when executing those
 policies. Over a finite horizon, T ,this value is equivalentto the following:
 V
 T
 {031
 ffA ; 031
 ff006 } = E
 "
T
 X
 t=0
 R
 t
 fi
 fi fi
 fi
 fi
 031
 ffA
 ;031
 ff006
 # {7}
 2.4 Related Work
 The COM-MTDP model subsumes manyexisting multiagent models, as presented
in Ta-
 ble 1 {i.e., we can map anyinstance of these models into a correspondingCOM-MTDP}.
 This generality enables us to perform novel analyses of real-world teamwork
domains, as
 demonstrated by Section 4'suse of the COM-MTDP model for analyzing theoptimality
of
 communication decisions. 2.4.1 Decentralized POMDPs
 With its model of observabilityand world dynamics, our COM-MTDP model closelypar-
 allels the structure of thedec entr alize d partially observable Markov
decision pro c ess {DEC-
 POMDP} {Bernstein etal., 2000}. Following our notational conventions, a
DEC-POMDP
 is a tuple,hS; A
 ff
; P; 012
 ff
 ; O
 ff
 ;Ri. There is no set of possiblemessages, 006
 ff
 , so the DEC-
 POMDP falls into theclass of domains with no communic ation. The DEC-POMDP
obser-
 vational model, O, is general enough tocapture col lectively partial ly
observabledomains.
 2.4.2 Par tiall y Observ ableIdentical Pa yoff Stochastic Games Stochasticgames
provide a rich framework for multiagent decision making when the agents
 may have their own individualgoals and preferences. The identical payoff
stochastic game {IPSG} restricts the agents to share asingle payoff function,
appropriate for modeling the single, global reward function of theteam context.
The partially observable IPSG
 {POIPSG}{Peshkin, Kim, Meuleau, & Kaelbling, 2000} is atuple, hS; A
 ff ; P; 012
 ff ; O
 ff
 ; Ri,
 very similar to the DEC-POMDPmodel. In other words, the observation function,
O
 ff , is
 general enough to supportcol lectively partially observabledomains, and
there isno commu-
 nic ation. 2.4.3 Multiagent MDPs Another relevant model is themultiagent
Markov decision pro c ess{MMDP} {Boutilier,
 1996}, which is a tuple, hS; A
 ff ; P; Ri, in our notation.Like the DEC-POMDP, the MMDP hasno communic
ation.In addition, the MMDP is a multiagentextension to the completely
 observable MDP model, so it assumes an environment that is individual ly
observable.
 397Pynadath & TambeModel006
 ffO
 ffDEC-POMDPnocommunicationcollective partial observabilityPOIPSGno communicationcollective
partialobservabilityMMDPno communicationindividualobservabilityXuan-Lessergeneral
communicationcollective observabilityT able 1:Existing models as COM-MTDP
subsets. 2.4.4 Xuan-Lesser Framework The COM-MTDP's separation of communication
from other actions is similar to previous work on multiagent decision models{Xuan,
Lesser, & Zilberstein, 2001},whichsupported
 generalcommunic ation. However,while the Xuan-Lesser model generalizes beyond
indi-
 vidually observableenvironments, it supports only a subset ofcol lectively
observable envi- ronments. In particular, the Xuan-Lesserframework cannot
represent agents who receive local observations of a common world state,
where the observationsof different agents could
 potentially be interdependent.
 3.COM-MTDP Complexity Analysis
 We can use the COM-MTDP model to provesome results about the complexity
of con- structing optimal agent teams {i.e., teamsthat coordinate to produce
optimal behaviorin
 a problem domain}. Theproblem facing these agents {or the designer ofthese
agents} is
 how to construct thejoint policies, 031
 ff006
 and 031
 ffA , so as to maximize their joint reward, as represented by the expected
value, V
 T
 {031
 ffA
 ; 031 ff006
 }. Inall of the results presented, we assume that all of the valuesin a
model instance {e.g., transitionprobabilities, rewards} are
 rational numbers, sothat we can express the particular instance as afinite-sized
input.
 Theorem 1The decision problem of whetherthere exist policies, 031 ff006
 and 031 ffA
 , for a given COM-MTDP, under general communicationand collective partial
observability, that yield
 a total rewar d at least K oversome finite horizon T is NEXP-complete if
jffj 025 2{i.e.,
 more than one agent}. Proof: To prove that theCOM-MTDP decision problem
is NEXP-hard, we reduce aDEC-
 POMDP {Bernstein et al., 2000} to aCOM-MTDP with no communication by copying
all of the other model features from the given DEC-POMDP. In other words,
ifwe are given a DEC-POMDP,
 012
 S; fA
 i
 g
 m
 i=1
 ; P; f012 i
 g
 m
 i=1
 ; O; R
 ff
 , we can construct a COM-MTDP,
 hS
 0
 ; fA
 0
 i g
 m
 i=1 ; 006
 0
 ff ; P
 0
 ; f012
 0
 i
g
 m
 i=1
 ; O
 0
 ff ; B
 0
 ff ; R
 0
 i,as follows:
 S
 0 = S
 A
 0 i
 = A
 i 006
 0
 = ; P
 0
 {s;ha
 1
 ; : : :; a
 m
 i ; s 0
 } = P {s
 0
 js; a 1
 ; : : : ; a m
 }
 398The CommunicativeMultiagent Team Decision Problem 012
 0
 i
= 012
 i
 O
 0
 ff
 {s;ha
 1
 ; : : :; a
 m
 i ; h!
 1
 ; : : : ;!
 m
 i} =O{!
 1
 ;: : : ; !
 m
ja
 1
 ; : : :; a
 m
 ; s} B
 0
 i
 =[
 T
 j=1
 {012 i
 }
 j
 {i.e.,observation sequences of length no more thanthe finite horizon}
 R
0
 {s; ha
 1
 ; : : : ; a
 m
 i ; 033} =R{s; a
 1
 ; : : : ; a
 m }
 The DEC-POMDP assumes perfectrecall, so we use the state estimator functionsfrom
 Equations 5 and 6. Sincethere is no communication for this COM-MTDP, we
have a fixed
 silentpolicy, 031
 ff006
 . We can translate anydomain-level policy, 031 ffA
 , into a DEC-POMDP joint policy, ffi, as follows:
 ffi
 i
 {o
 i 1
 ; : : : ;o
 i
 t
 }021 031
 iA
 { 012
 o
 i 1
 ; : : : ;o
 i
 t
 ff } {8}
 The expected utility of following this joint policy, ffi, withinthe DEC-POMDP
is identical to that of following 031 ff006
 and 031 ffA
 within the constructed COM-MTDP. Thus, there exists
 a policy withexpected utility greater than Kfor the COM-MTDP if and only
if there exists one for the DEC-POMDP.The decision problem for a DEC-POMDP
is known tobe
 NEXP-complete, so the COM-MTDP problem must be NEXP-hard.
 To show thatthe COM-MTDP is in NEXP, our proof proceeds similarly to that
of
 the DEC-POMDP. In other words, we guess the joint policy, 031
 ff
 , andwrite it down in
 exponentialtime {we assume that T 024jS j}. We can take theCOM-MTDP plus
the policy
 andgenerate {in exponential time} a correspondingMDP where the state space
is the space of all possible combined belief states ofthe agents. We can
then use dynamicprogramming
 to determine {in exponentialtime} whether 031
 ff generates an expected reward of at leastK .
 2
 In the remainderof this section, we examine the effect of communication
on the com-
 plexityof constructing team policies that generate optimalbehavior. We start
by examining the case underthe condition offre e communic ation,where we
would expect the benefit of communication to be the greatest.To begin with,
suppose that each agent is capable of
 communicatingits entireobservation {i.e., 006
 i 012
 i
}. Before we analyze the complexity of the team decision problem, we first
prove that the agents shouldexploit this capability and
 communicate their true observation, as long as they incur no cost in doingso:
 Theorem 2 Underfree communication, considera team of agents using a communic
ation
 p olicy: 031 i006
 {b
 t
 i
* 006
 }021 012
 t
 i . If the domain-level policy031
 ffA
 maximizesV
 T
 {031 ffA
 ; 031
 ff006
 }, then this combine d policy is dominant overany other policies. In other
words, for al l policies, 031 0
 ffA
 and031
 0
 ff006 , V
 T
 {031
 ffA
 ; 031 ff006
 } 025V
 T
 {031 0
 ffA
 ;031
 0
 ff006 }.
 Proof: Supposewe have some other communication policy, 031
 0
 ff006
 , that specifies something otherthan complete communication {e.g., keeping
quiet, lying}. Suppose that there issome
 domain-level policy,031
 0
 ffA , that allows the team to attain some expected reward, K , when
 used in combination with 031
 0
 ff006
 .Then, we can construct a domain-level policy, 031
 ffA
, such
 that the team attains the sameexpected reward, K , when used inconjunction
with the
 complete-communicationpolicy, 031
 ff006
 , as defined in the statementof Theorem 2.
 The communication policy, 031
 0
 ff006
 , produces a different set of belief states {denoted b
 0 t
 i
* 006
 and b
 0
 t i006
* 
 } than those for031
 ff006
 {denotedb
 t
 i
* 006 and b
 t
 i006
* 
 }. In particular, weuse state estimator
 399Pynadath & Tambe
 functions, S E
 0
 i
* 006
 andS E
 0
 i006
* 
 as defined in Equations 5 and 6 to generate b
 0
 t i
* 006
 and b 0
 t
 i006
*  .
 Each belief state is a completehistory of observation and communication
pairsfor each
 agent. On the other hand,under the complete communication of 031 ff006
 , the state estimator functionsof Equations 5 and 6 reduce to: S E
 i
* 006 {
 012
 012 0
 ff
 ; : : :; 012
 t0001 ff
 ff
 ; 012 t
 i
 } = 012
 012
 0 ff
 ; : : : ;012
 t0001
ff
 ; 012
 t i
 ff
 {9}
 S E
 i006
*  {
 012
 012 0
 ff
 ; : : :; 012
 t0001 ff
 ; 012
 t i
 ff
 ; 006 t
 ff
 } = 012
 012
 0 ff
 ; : : : ;012
 t0001
ff
 ; 006
 t ff
 ff
 =
 012
 012
 0
 ff
 ; : : : ; 012 t0001
 ff ; 012
 t
 ff ff
 {10}
 Thus,031
 ffA
 is defined over a different set of belief states than031
 0
 ffA
 . In order to determine
 an equivalent 031
 ffA
 , we must first define a recursivemapping, m, that translates the belief
states defined by 031
 ff006
 into those defined by 031
 0
 ff006 :
 m
 i
 {b
 t
 i006
* 
 } =m
 i 000012
 b
 t0001
 i006
*  ; 012
 t
 ff ff001
 = m
 i
 000012
 b t0001
 i006
* 
 ;
 012
 012
 t
 i
; 012
 t
 ff ffff001
 =
 D
 m
 i
{b
 t0001
 i006
* 
 }; D
 012
 t i
 ; 006
 0 t
 ff
 EE
 =
 *
 m
i
 {b
 t0001
 i006
* 
};
 *
 012 t
 i
 ;
 Y
 j2ff
006
 0
 t
 j ++
 =
 *
 m
 i
 {b t0001
 i006
* 
 };
 * 012
 t
 i
 ;
 Y
 j2ff 031
 0
 j006 {S E
 0
 j
* 006
 {m j
 {b
 t0001
 j006
*  }; 012
 t j
 })
 ++ {11}
 Given this mapping, wethen specify: 031
 iA {b
 t
 i006
* 
 } = 031 0
 iA
 {m i
 {b
 t i006
* 
 }).Executing this domain-
 levelpolicy, in conjunction with the communication policy, 031
 ff006
 , resultsin the identical
 behavior as execution of thealternate policies, 031
 0 ffA
 and 031
 0
 ff006
 .Therefore, the team following
 the policies, 031
 ffA and 031
 ff006 will achieve the same expected value of K , as under 031 0
 ffA
 and 031
 0
 ff006 . 2
 Given this dominance ofthe complete-communication policy,we can prove that
the
 problemof constructing teams that coordinate optimally issimpler when communication
is
 free. Theorem 3 The decisionproblem of determining whether there exist policies,
031
 ff006
 and
 031 ffA
 , for a given COM-MTDP with freecommunication under collective partialobservabil-
 ity, that yield atotal rewar d at least Kover some finite horizon Tis PSPACE-c
omplete.
 Proof:To prove that the problem is PSPACE-hard, we reduce the single-agent
POMDPto
 a COM-MTDP. In particular, if weare given a POMDP, hS;A; P; 012; O; Ri,we
can construct
 a COM-MTDP, hS
 0
 ;A
 0
 1
 ;006
 0
 1
; P
 0
 ; 012 0
 1
 ; O 0
 1
 ; B 0
 1
 ; R 0
 i, for a single-agentteam {i.e., ff = f1g}:
 S
 0
 =S
 A
 0
 1 = A
 006
 0 1
 = ;
 P 0
 {s; ha 1
 i ; s
 0 } = P {s; a 1
 ; s
 0 }
 012
 0
 1
 = 012
 400The CommunicativeMultiagent Team Decision Problem O
 0
 1
 {s; ha
 1
 i; h!
 1
 i}= O{s; a
 1 ; !
 1
 }
 B
 0
 1
 =[
 T
 j=1
 {012} j
 {i.e., observationsequences of length no more than the finitehorizon}
 R
 0
 A {s; ha
 1 i} = R{s;a
 1
 }
 R 0
 006
 {s;033} = 0
 This COM-MTDPsatisfies our assumption of free communication.The POMDP assumes
 perfect recall, so we use the state estimator functions from Equations 5and
6. Just as in
 the proof ofTheorem 1, we can show that there exists a policy with expected
utility greater than K for this COM-MTDP if andonly if there exists one for
the POMDP. The decision
 problemfor the POMDP isknown to be PSPACE-hard {Papadimitriou& Tsitsiklis,
1987},
 so the COM-MTDP problem under free communication must be PSPACE-hard.
 T o show that the problemis in PSPACE, we take a COM-MTDPunder free communi-
 cation and reduce itto a single-agent POMDP. In particular, ifwe are given
a COM-MTDP,
 hS; A
 ff
 ; 006
 ff
 ; P;012
 ff
 ; O ff
 ; B
 ff ; Ri, we can construct asingle-agent POMDP, hS 0
 ; A
 0 ; P
 0
 ; 012 0
 ; O
 0 ;
 R
 0
 i, as follows:
 S
0
 = S
 A
 0
 = A
 ff P
 0
 {s;a; s
 0
 }= P {s; a; s 0
 }
 012
 0
 = 012
 ff O
 0
 {s;a; !} = O
 ff
 {s; a;!}
 R
 0
 {s; a} = R
 A
 {s; a} From Theorem 2, we need to consideronly the complete-communication
policy for the COM-MTDP and this policy has a zero reward. Therefore, the
decision problem for the COM-MTDP is simply to find a domain-levelpolicy
that produces an expected reward exceeding K . Givenfull communication, the
state estimator functions for the COM-MTDP
 {as shown in the proofof Theorem 2} reduce to Equation 10.A policy for our
POMDP
 specifies anaction for each and every history of observations: 031
 0
 :[
 T
 j=1
 {012
 0
 }
 j
 ! A
 0 . The
 history of observations for the single-agent POMDP correspondsto the belief
states of our
 COM-MTDP underfull communication. Therefore, we can translate a POMDP-policy,
031
0
 ,
 into an equivalent domain-level policy for the COM-MTDP: 031
 A
 {h!
 0
 ; ! 1
 ; : : : ;!
 t
 i} 021031
 0
 {h! 0
 ; !
 1 ; : : : ; !
t
 i} {12}
 Ateam following 031
 A will perform the exact same domain-levelactions as a single agent
 following031
 0
 . Thus,there exists a policy with expected utilitygreater than K for the
COM- MTDP if and only if there exists one for thePOMDP. The decision problem
for a POMDP is known to be in PSPACE{Papadimitriou & Tsitsiklis, 1987}, so
theCOM-MTDP problem
 {under free communication}must be in PSPACE as well.2
 401Pynadath & Tambe
 Theorem 4 The decision problem of determining whether there exist policies,
031
 ff006
 and
 031 ffA
 , for a given COM-MTDP withfree communication and collective observability,
that
 yielda total rewar d at least K oversome finite horizon T is P-complete.
 Proof: The proof follows that of Theorem 3, but with a reduction to and
from the MDP
 decision problem, rather thanthe POMDP. The MDP decision problem isP-complete
{Pa-
 padimitriou & Tsitsiklis,1987}. 2
 Theorem 5The decision problem of determiningwhether there exist policies,031
 ff006
 and 031
 ffA
 ,for a givenCOM-MTDP with individual observability, that yield a total rewar
dat
 least K oversome finite horizon T {givenintegers K and T} is P-complete.
 Proof:The proof follows that of Theorem 4,except that we can reduce the
problem to and from an MDP regardless of what communication policy the team
uses. 2 Theorem 6 The decisionproblem of determining whether there exist
policies, 031
 ff006
 and
 031 ffA
 , for a given COM-MTDP with non-observability, that yield a totalrewar d
at least K over some finite horizon T{given integers K andT } is NP-complete.
 Proof: The proof follows that ofTheorem 4, except that we can reduce the
problemto and
 from an single-agent non-observable MDP {NOMDP} regardless of what communication
 policy the team uses.In particular, because the agents are allequally ignorant
of the state,
 communicationhas no effect. The NOMDP decision problemis NP-complete {Papadim-
 itriou& Tsitsiklis, 1987}. 2
 Thus,we have used the COM-MTDP framework to characterize the difficulty
of problem domains in agent teamwork along thedimensions of communication
cost and observability .
 T able 2 summarizes ourresults, which we can use in deciding where toconcentrate
our
 energies in attackingteamwork problems. We can use theseresults to draw
some conclusions
 about the challenges to designers of multiagent teams:
 
*  The greatest challenges lie in those domains with eithercol lective observability
or col lective partialobservability and with nonzero communication cost.
*  Under collective observability and col lective partial observability,
teamwork without
 communication ishighly intractable, but, withfre ecommunic ation, the complexity
becomes on par with that of single-agentplanningproblems.
 
*  Agentteam designers have much to gain byincreasing the observational capabilities
of their team {e.g., by adding new sensoragents} because of the reduction
in complexity gained by making the domain col lectively observable.
 
*  Furthermore, the results fromTheorems 3 and 4 hold in any domain where
theresult
 from Theorem 2 holds {i.e., whencomplete communication is the dominant policy}.
Therefore, whileperfectly free communication may be rare, these results show
that investment in communication in teamwork can pay off with a significantsimplification
 of optimal teamwork. 402The CommunicativeMultiagent Team Decision ProblemIndividuallyCollectivelyCollectivelyNon-ObservableObserv
ablePartially ObservableObserv ableNo Comm.P-completeNEXP-completeNEXP-completeNP-CompleteGeneral
Comm.P-completeNEXP-completeNEXP-completeNP-CompleteFree Comm.P-completeP-completePSPACE-completeNP-CompleteT
able 2: Time complexity of COM-MTDPs. 
*  On the other hand, when theworld is individual ly observableor non-observable,
com-
 munication makes no difference in performance.
 
*  It should be noted that evenunder those conditions where the problem is
P-complete, the complexity of optimal teamwork is polynomial in the number
of states of the world, which may still beimpractically high.
 
*  Theabove complexity results pertain tofinding policies that are optimal
subject to
 the domain prop erties. We will find different expected rewards of the optimal
policies under different observabilityand communication properties. Forinstance,
cutting off
 all of the agents' sensors makes the domain non-observableand reduces the
complexity
 of generating an optimal policy from NEXP to NP,but we would expect an associated
drop in the expected reward achievedby the team.
 4. EvaluatingTeam Coordination
 Table 2 shows that providing optimal domain-level and communication policies
for teams is a difficult challenge. Manysystems alleviate this difficulty
by havingdomain experts pro-
 vide the domain-level plans {Tambe, 1997; Tidhar,1993}. Then, the problem
for the agents reduces to generating the appropriate team coordination, 031
 ff006 , to ensure that they prop-
 erly execute the domain-level plans,031
 ffA
 . Inthis section, we demonstrate the COM-MTDP framework's ability to analyze
existing teamwork approaches in the literature.Our method-
 ology for such analysisbegins by encoding such a teamworkmethod as a communication-
 level policy. In other words, we translate the method into an algorithm
that maps agent beliefs {e.g., observationsequences} into communication decisions.To
evaluate the per- formance of this policy, we theninstantiate a COM-MTDP
that represents the states, transition probabilities, and reward function
of a domain of interest. Our methodology provides an evaluation of the policy
in terms of the expected reward earned by the team
 when following the policyin the specified domain.
 We demonstrate this methodology by using ourCOM-MTDP framework to analyze
joint intentions theory {Cohen & Levesque,1991b, 1991a; Levesque et al.,
1990}, which provides
 a common basis for many existingapproaches to team coordination. Section4.1
models two
 key instantiationsof joint intentions taken from the literature {Jennings,
1995; Tambe, 1997} as COM-MTDP communication policies.Section 4.2 analyzes
the conditions under which these policies generate optimal behavior and provides
a third candidate policy that makes communication decisions that are locallyoptimal
within the context of joint intentions. In
 403Pynadath & Tambe
 addition to providingthe resultsfor the particular team coordination strategies
investigated,
 this section also illustrates ageneral methodology by which one can use
ourCOM-MTDP
 framework to encode and evaluate coordination strategies proposed byexisting
multiagent
 research. 4.1 Joint Intentions in a COM-MTDP Joint intention theory provides
aprescriptive framework for multiagent coordination in a
 team setting. Itdoes not make any claims of optimality inits teamwork, butit
provides
 theoretical justifications for its prescriptions,grounded in the attainment
of mutual belief among the team members. We can use the COM-MTDP framework
to identifythe domain
 properties under which attainingmutual belief generates optimal behavior
andto quantify
 precisely how suboptimal theperformance will be otherwise.
 Joint intentions theory requires that team members jointly commit to a joint
persistent goal, G. It also requires thatwhen any team member privatelybelieves
that G is achieved {or unachievable or irrelevant}, it must then attain mutual
beliefthroughout the team
 about this achievement {or unachievability or irrelevance}. To encode this
prescription of jointintentions theory within our COM-MTDPmodel, we first
specify the joint goal,G, as
 a subset of states,G022 S , where the desired goal isachieved {or unachievable
or irrelevant}.
 Presumably , such a prescriptionindicates that joint intentions are not
specifically in-
 tended for individually observable environments. Uponachieving the goal
in an individually
 observable environment, each agent would simultaneously observe thatS
 t
 2 G.Because
 of our assumption that the COM-MTDPmodel components {including O ff
 } are common
 knowledge to the team, each agent wouldalso simultaneously come to believe
that itsteam-
 mates have observed thatS
 t
 2G,and that its teammates believe that it believes that all
 of the team members have observed that S
 t 2 G, and so on. Thus,the team immediately
 attains mutual beliefin the achievement of the goal underindividual observability
without
 anyadditional communication necessary by the team. Instead, the joint intention
frameworkaims at domains with some degree of unobserv- ability. In such domains,
the agents must signal the other agents, eitherthroughcommuni-
 cation or some informativedomain-level action, to attain mutual belief.However,
we can
 alsoassume that joint intention theory does notfocus on domains with fre
ecommunic ation,
 where Theorem 2shows that we can simply have the agents communicate everything,
all
 thetime, without the need for more complex prescriptions. The joint intention
framework doesnot specify a precise communication policy forthe
 attainment of mutual belief.In this paper, we focus on communicationonly
in the case of
 goal achievement, butour methodology extends to handle unachievability and
irrelevance as well. One well-known approach{Jennings, 1995} applied joint
intentions theory by having
 the agents communicatethe achievement of the joint goal,G, as soon as they
believeG to be
 true. Toinstantiate the behavior of Jennings' agentswithin a COM-MTDP, we
construct a communication policy, 031 J
 ff006
 , that specifies that an agent sends the specialmessage, 033
 G
 , when it first believes that Gholds. Following joint intentions'assumption
of sincerity {Smith & Cohen, 1996}, werequire that the agentsnever select
the special 033 G
 message in a belief state unless they believe Gto be true with certainty.With
this requirement and with our assumption of the team's common knowledge ofthe
communication model, we can assume 404The CommunicativeMultiagent Team Decision
Problem that all of the other agents immediatelyaccept the special message,
033 G
 , as true, and that the agents know that all their team members accept the
message as true, and so on.Thus,
 the team attains mutual beliefthat G is true immediately upon receiving
themessage, 033
 G
 . We can construct the communication policy, 031
 J
 ff006
 , in constant time. The STEAM algorithm is another instantiationof joint
intentions that has had success in several real-world domains {Tambe, 1997;
Pynadath et al., 1999; Tambe, Pynadath, Chau-
 vat,Das, & Kaminka, 2000; Pynadath & Tambe, 2002}. Unlike Jennings' instantiation,
the
 STEAM teamwork modelincludes decision-theoretic communication selectivity.
A domain
 specification includes two parameters for each joint commitment,G: 034
, the probability of miscoordinated termination of G; and C
 mt
 , the cost of miscoordinated termination of G.In
 this context, \miscoordinatedtermination" means that some agents immediatelyobserve
 that the team has achievedG while the rest do not. STEAM'sdomain specification
also
 includesa thirdparameter, C
 c
 , to represent the cost of communication of a fact {e.g.,the
 achievement of G}.Using these parameters, the STEAM algorithm evaluates
whether the
 expected cost ofmiscoordination outweighs the cost of communication. STEAM
expresses
 thiscriterion as the following inequality:034 001 C
 mt > C
 c
 .We can define a communication policy, 031
 S ff006
 based on this criterion:if the inequality holds, then an agent thathas observed
 the achievement ofG will send the message, 033 G
 ; otherwise, it will not.We can construct
 031 S
 ff006
 inconstant time.
 4.2 LocallyOptimal Policy
 Although the STEAM policy is more selective than Jennings', itremainsunanswered
 whether it is optimally selective, and researchers continueto struggle with
the question
 of when agents should communicate {Yen et al., 2001}.The few reports of
suboptimal {in particular, excessive} communication inSTEAM characterized
the phenomenon as an exceptional circumstance, but it is also possible that
STEAM's optimal performance is the exception. We use the COM-MTDP modelto
derive an analytical characterization of opti- mal communication here, while
Section 5provides an empirical one by creating analgorithm
 using that characterization. Both policies, 031
 J ff006
 , and 031 S
 ff006
 considersending 033
 G
 only when anagent first believes that
 Ghas been achieved. Oncean agent has the relevant belief, they make different
choices, and
 we consider here what the optimal decision is atthis point. The domain is
not individually observable, so certain agents may be unaware of the achievement
ofG. When not sending
 the033
 G
 message, these unaware agents may unnecessarily continue performing actions
in
 the pursuit of achieving G. The performance of theseextraneous actions could
potentially incur costs and lead to a lower utility than one would expect
when sending the033
 G
 message. The decision to send 033
 G
 or not matters only if the team achieves G and one agent
 comes to know this fact. Wedefine the random variable, T G
 , to be the earliest time at which an agent knows this fact.We denote agent
K
G
 as the agent who knows of the achievement at time T G
 . If K
 G
 = i, for some agent,i, and T
 G
=t
 0
 , then agenti has some
 pre-communicationbelief state, b
 t
 0
 i
* 006
 = fi , that indicates thatG has been achieved. Tomore
 precisely quantify the difference between agent i sending the033
 G
 message at timeT
 G
 vs.
405Pynadath & Tambe
 never sending it, we definethe following value:
 001 T
 {t
 0 ; i; fi } 021E "
 T 000t
 0
 X
 t=0
 R
 t
 0
 +t
 fi
 fi
 fi fi
 fi
 006
 t
 0
 i
 =033
 G
 ; T G
 = t
 0 ; K
 G
 =i; b
 t
 0
 i
* 006
 = fi #
 000 E
 " T 000t
 0
 X
 t=0
 R
 t
 0
 +t
 fi
 fi
 fi
 fi fi
 006
 t
 0
 i
 = null; T
 G
 = t 0
 ; K
 G = i; b
 t
 0
 i
* 006
= fi
 #
 {13} We assume that, for all times other than T
 G
 , the agents follow some communication policy,
 031
 ff006
 ,that never specifies 033
 G
 . Thus, 001 T
 measures the difference in expected reward that
 hinges on agenti's specific decision to send or not send033
 G
 at time t 0
 . Given this definition, it is locally optimal for agenti to send the special
message,033
 G
 , at timet
 0
 , if and only if 001
 T
 0250. We define the communication policy, 031
 ff006+033
 , as the communication policy following 031
 ff006 for all agents at all times, except foragent i under belief state
fi, when
 agent i sends message033. With this definition,031
 ff006+033 G
 , is the policy under which agent i
 communicates the achievement of G, and 031
 ff006+null
 is the policyunder which it does not.
 Therefore,we can alternatively describe agenti's decision criterion as choosing031
 ff006+033 G
 over 031 ff006+null
 if and only if 001
 T
 025 0. Unfortunately, whileEquation 13 identifies an exact criterion for
locally optimal commu-
 nication, this criterion is not yet operational. In other words, we can
notdirectly implement
 it as a communicationpolicy for the agents. Furthermore,Equation 13 hides
the underly-
 ing complexity of the computation involved, which is one of the key goals
of our analysis. Therefore, we use the COM-MTDP model toderive an operational
expression of 001 T
 0250.
 For simplicity, we define notationalshorthand for various sequences and
combinations of
 values. Wedefine a partial sequence of random variables, X
 t
 , X
 025t
 , etc.}. Theexpression, {S }
 T , denotes the cross
 productover states of the world,
 Q T
 t=0
 S, as distinguishedfrom the time-indexed random variable, S
 T , which denotes the valueof the state at time T . Thenotation, s
 025t
 0
 [t], specifies the element in slot t within the vector s
 025t
0
 . We define the function,007, as shorthand within
 our probabilityexpressions. It allows us to compactly represent a particular
subsequence
 ofworld andagent beliefstates occurring, conditioned onthe current situation,
as follows: Pr
 000
 007
 000012
 t; t
 0 ff
 ; s; fi
 
* 006
 001001
 021 Pr{S
 025t;024t
 0
 =s; b
 ff
* 006 025t;024t
 0 = fi
 
* 006 fi
 fi
 T G
 = t
 0 ; K
 G
 =i; b
 t
 0 i
* 006
 =fi }
 {14}
 Informally, 007 {ht; t
 0 i ; s; fi
 
* 006
 } represents the eventthat the world and belief states from time t through
t
 0
 correspond to the specified sequences,s and fi
 
* 006 , respectively, conditioned on agent i being the first to know of
G's achievement at timet
 0
 with a belief state,fi . We define
 the function, fi
 006
*  , to map a pre-communication belief stateinto the post-communication
 beliefstate that arises from a communication policy: fi
 006
* 
 {fi
 
* 006 ; 031
 ff006 } 021 S E
 ff006
* 
 {fi
 
* 006
 ; 031 ff006
 {fi 
* 006
 }) {15} This definition of fi
 006
* 
 is a well-definedfunction because of the deterministic nature of the policy,
031
 ff006
 , and state-estimator function,S E
 ff006
*  .
 406The CommunicativeMultiagent Team Decision Problem Theorem 7 If we assume
that, upon achievement of G, no communic ation other than 033
 G
 is possible, then the condition 001
 T
 {t
 0
 ,i,fi } 025 0 holds if and onlyif:
 X
 s
 024t
 0
 2{S} t
 0
 X
 fi 024t
 0
 
* 006
 2B
 t
 0
 ff
 Pr{007{h0; t
 0
i ; s
 024t
0
 ; fi
 024t 0
 
* 006
 }) 001
 0
 B
 @ X
 s
 025t 0
 2{S} T 000t
 0
 +1
 X
 fi
025t
 0
 
* 006
 2B
 T 000t
 0
 +1
 ff Pr
 020
 007{ht
 0
 ; Ti ; s
 025t
0
 ; fi
 025t
 0
 
* 006 }
 fi
 fi fi
 006
 t 0
 i
 = 033 G
 ; 007{h0; t
 0
 i ; s 024t
 0
 ;fi
 024t
 0 
* 006
 }
021
 001
 T
X
 t=t
 0 R
 A
 020 s
 025t
 0 [t]; 031
 ffA
 020
 fi 006
* 
 020
 fi
 
* 006
 025t
 0
 [t]; 031
 ff006+033 G
 021021021
000
 X
 s
 025t
 0
 2{S}
 T 000t
 0 +1
 X
 fi
 025t
 0
 
* 006
 2B
 T 000t
 0
 +1
 ff Pr
 020
 007{ht
 0
 ; Ti ; s
 025t
 0
 ; fi
 025t
 0
 
* 006 }
 fi
 fi fi
 006
 t 0
 i
 = null; 007{h0; t
 0
 i ; s
 024t
 0
 ; fi 024t
 0
 
* 006
 }
 021
 001
 T
 X
t=t
 0
 R A
 020
 s 025t
 0
 [t]; 031
 ffA 020
 fi
 006
* 
 020
 fi 
* 006
 025t 0
 [t]; 031 ff006+null
 021021021 !
 025 000
 X s2G
 X
fi2B
 ff
 Pr{007{ht
 0
; t
 0
 i ; s;fi}) R
 006
 {s; 033
 G
 }{16}
 Proof: The complete proofof the following theorem appears in Online Appendix
1.
 The definition of 001 T
 in Equation 13 is the difference between two expectations, where each expectation
is a sum over the possibletra jectories of the agent team.Each tra jectory
must
 includes a sequence of possible world states,since the agents' reward at
each pointin time
 depends on the particular state ofthe world at that time. The agents' reward
also depends
 on their actions {bothdomain- and communication-level}. Theseactions are
deterministic,
 giventhe agents' policies, 031
 ffA
 and 031
 006 , and their belief states. Thus,in addition to summing
 over the possible states of the world, we must alsosum over the possiblestates
of the agents' 407Pynadath & Tambe
 beliefs {both pre- and post-communication}:
 001
 T {t
 0
 ; i; fi}
 =
 X
 s 024T
 2{S}
 T
 X
 fi 
* 006
 024T 2{B}
 T X
 fi
 006
*  024T
 2{B} T
 Pr
 000
 S
 024T
 =s
 024T
 ; b 
* 006
 024T = fi
 
* 006 024T
 ; b
 006
* 
 024T
= fi
 006
* 
 024T
 j006
 t 0
 i
 = 033 G
 ; T
 G = t
 0
 ;K
 G
 = i; b t
 0
 i
* 006
 = fi
 001 001
 T
 X t=0
 R{s 024T
 [t]; 031
 A
 {fi
 006
* 
 024T
 [t]}; 031 006
 {fi
 
* 006
 024T
 [t]})
 000
 X
 s
 024T 2{S}
 T X
 fi
 
* 006 024T
 2{B}
 T
 X
 fi 006
* 
 024T 2{B}
 T Pr
 000
 S 024T
 = s 024T
 ; b 
* 006
 024T = fi
 
* 006 024T
 ; b 006
* 
 024T = fi
 006
*  024T
 j006
 t
 0
 i
= null; T
 G
 = t
 0
 ;K
 G
 = i; b t
 0
 i
* 006
 = fi
 001 001
 T
 X t=0
 R{s 024T
 [t]; 031
 A
 {fi
 006
* 
 024T
 [t]}; 031 006
 {fi
 
* 006
 024T
 [t]}) {17}
 We can rewrite these summations more simply usingour various shorthand notations:
=
 X
 s
 024T
 2{S}
 T
 X
 fi

* 006
 024T
 2{B}
 T
 Pr{007{h0; T i; s; fi
 
* 006 024T
 }j006 t
 0
 i
 = 033
 G
 } 001
 T
 X t=0
 R{s 024T
 [t]; 031
 A
 {fi
 006
* 
 {fi
 
* 006
 024T
 [t]; 031 006033
 G
}); 031
 006033 G
 {fi
 
* 006
 024T
 [t]})
 000
 X s
 024T
 2{S}
 T
 X fi
 
* 006
 024T
 2{B} T
 Pr{007{h0; T i ; s; fi
 
* 006
 024T
 }j006
 t
 0 i
 = null}
 001
 T
 X
 t=0
 R{s
 024T
 [t];031
 A
 {fi 006
* 
 {fi 
* 006
 024T [t]; 031
 006null
 }); 031 006null
 {fi 
* 006
 024T [t]}) {18}
 Theremaining derivation exploits our Markovianassumptions to rearrange the
summations and cancel like terms to produce thetheorem's result. 2
 Theorem7 states, informally, that we prefer sending 033
 G
 whenever the thecost of exe-
 cution after achievingG outweighs the cost of communication ofthe fact that
G has been
 achieved. More precisely,the outer summations on the left-hand side of theinequality
 iterate over allpossiblepast histories of world and belief states,producing
a probability
 distributionover the possible states the team can be in at time t
 0
 . For each such state, the expression inside the parentheses computes thedifference
in domain-level reward, over all possiblefuture sequences of world and belief
states, between sending andnot sending 033
 G
.
 By our theorem's assumption that no communication other than 033
 G is possible after G has been achieved, we can ignore anycommunication
costs in the future. However,ifwe relax
 this assumption, we can extend the left-hand side in a straightforwardmanner
into a longer
 408The CommunicativeMultiagent Team Decision ProblemIndividuallyCollectivelyCollectivelyNon-ObservableObserv
ablePartially ObservableObserv ableNo Comm.012{1}012{1}012{1}012{1}General
Comm.012{1}O{(jS j 001 j012
 ff
 j} T
 }O{(jS j 001 j012 ff
 j}
 T }012{1}FreeComm.012{1}012{1}012{1}012{1}Table 3: Time complexity
oflocally optimal decision.
 expressionthataccounts for the difference in future communication costs
as well. Thus, the left-hand side captures our intuition that,when not communicating,
the team will incur a cost if the agents other than iare unaware of G's achievement.The
right-hand side of the
 inequalityis a summation of the cost of sending the033
 G
 message over possible current states
 and belief states. We can use Theorem 7 to derivethe locally optimal communication
decision across various classes of problem domains.Under no communic ation,we
cannot send 033
 G . Under
 fre ecommunic ation, the right-hand side is0, so the inequality is always
true, and we know
 to prefer sending 033 G
 . Under no assumptions about communication, the determination is more complicated.
When the domain isindividual ly observable, the left-hand sidebecomes
 0, because al lof the agents know that G has beenachieved {and thus there
is no difference in execution when sending 033 G
 }. Therefore, the inequality is always false {unless under fre e
 c ommunic ation}, andweprefer not sending 033
 G . When the environment is notindividually
 observable and communicationis available but not free, then, to belocally
optimal at time
 t 0
 , agent i must evaluate Inequality 16 in its full complexity. Since the
inequality sums
 rewards over all possible sequences ofstates and observations, the time
complexityof the
 corresponding algorithm isO{(jS j 001 j012
 ff
 j} T
 }. While this complexityis unacceptable for most
 real-world problems,it still provides an exponential savings over searching
the entire policy space for the globally optimal policy, where any agent
could potentially send033
 G
 at times other than T
 G
 . Table 3 provides a table of thecomplexity required to determine the locally
optimal policy under the variousdomain properties.
 We can now show that although Theorem 7's algorithm for locallyoptimal communica-
 tion provides asignificant computational savings overfindingthe global optimum,
it still outperforms existing teamwork models, as exemplified by our 031
 J ff006
 and 031 S
 ff006
 policies.First,
 we can use the criterion of Theorem 7 to evaluate the optimality of the
policy, 031
 J
 ff006
 . If
 001 T
 {t
 0 ; i; fi } 025 0 for allpossible times t
 0
 , agents i, and belief statesfi that are consistent
 with the achievement of the goal G, then the locally optimal policy will
always specify sending 033
 G
 . In other words, 031 J
 ff006
 will beidentical to the locally optimal policy. However,
 if the inequality of Theorem 7 is ever false, then 031 J
 ff006
 is not even locally, let alone globally,
 optimal.
 Second, we can alsouse Theorem 7 to evaluate STEAM by viewingSTEAM's inequality,
 034001 C
 mt
 >C
 c
 , as a crude approximation of Inequality 16. In fact, there isa clear corre-
 spondencebetween theterms in the two inequalities. Theleft-hand side of
Inequality 16
 computes an exact expected cost of miscoordination. However, unlikeSTEAM's
monolithic
 034 parameter,the optimal criterion evaluates a completeprobability distribution
over all possible states of miscoordination byconsidering all possible past
sequences consistentwith
 409Pynadath & Tambe
 the agent's current beliefs.Likewise, unlike STEAM's monolithicC
 mt
 parameter, the opti- mal criterion looks ahead over all possible future
sequences of states to determine thetrue
 expected cost of miscoordination.Furthermore, we can view STEAM's parameter,C
 c
 , as an
 approximation of the communication cost computed by the right-hand side
of Inequality 16. Again, STEAM uses a single parameter, while the optimal
criterion computes an expected cost over all possible states of the world.
 STEAM does have someflexibilityin its representation, becauseC
 mt
 , 034, andC
 c
 are not necessarily fixed across the entire domain. For instance, C mt
 may vary based on the specific joint plan that the agentsmay have jointly
committed to {i.e., theremay be a
 different C mt
 for each goal G}. Thus, while Theorem 7 suggestssignificant additional flexi-
 bility incomputing C
 mt
 through explicitlookahead, the optimal criterion derivedwith the
 COM-MTDP model also provides ajustification for the overall structure behindSTEAM's
 approximate criterion. Furthermore, STEAM's emphasis on on-line computationmakes
the
 computational complexity ofInequality 16 {as presented in Table 3} unacceptable,
so the
 approximationerror may be acceptable given the gains in efficiency. For
a specific domain, we can use empirical evaluation{as demonstrated in the
next section} to quantify the error
 and efficiency to precisely judgethis tradeoff.
 5. EmpiricalPolicy Evaluation
 In addition toproviding these analytical results over generalclasses of
problem domains, the
 COM-MTDPframework also supports the analysis ofspe cific domains. Givena
particular
 problem domain, we canconstruct an optimal communication policy or, ifthe
complexity of
 computing an optimal policy is prohibitive, we can instead evaluate and
compare candidate
 approximate policies. To provide a reusable tool for such evaluations, we
have implemented
 the COM-MTDP model as a Python classwith domain-independent methods for
the eval-
 uation of arbitrary policies and forthe generation of both locally optimal
policiesusing
 Theorem 7 and globally optimal policies through brute-force search of the
policy space. This software is availablein Online Appendix 1.
 This section presents results of a COM-MTDP analysis of an exampledomain
involving
 agent-piloted helicopters,where we focus on the key communicationdecision
faced by many
 multiagentframeworks {as described in Section 4}, but vary the cost of communication
and degree of observability to generate aspace of distinct domains with differentimplications
 for the agents' performance.By evaluating communication policies over various
configura-
 tions of thisparticular testbed domain, we demonstrate a methodology by
which one can
 use the COM-MTDP framework to model any problem domain andto evaluate candidate
 communication policies for it.
 5.1 ExperimentalSetup
 Consider two helicopters that must fly across enemy territory to theirdestination,
as il-
 lustrated in Figure 1.The first, piloted by agent Tr ansp ort, is a transport
vehiclewith
 limited firepower.The second, piloted by agent Escort, is an escort vehicle
with significant
 firepower. Somewherealong their path is an enemy radar unit, butits location
is unknown
 {apriori} to the agents. Escortis capable of destroying the radar unit uponencountering
 it. However,Tr ansp ort isnot, but it can escape detection by the radar
unit by traveling 410The CommunicativeMultiagent Team Decision Problemon
paper, in 1/72inch coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via xgrabFigure 1: Illustrationof helicopter team scenario.
 at a very low altitude {nap-of-the-earth flight},though at a lower speed
than at its typical,
 higher altitude. In this scenario,Escort will not worry about detection,given
its superior
 firepower;therefore, it will fly at a fast speed at itstypical altitude.
 The two agentsform a top-level joint commitment,G
 D
 , to reach theirdestination.
 There is no incentive forthe agents to communicate the achievementof this
goal, since they
 will both eventually reach their destination with certainty. However, in
the service of their top-level goal, G
 D , the two agents also adopt a joint commitment, G
 R
 , of destroying the
 radarunit. We consider here the problem facingEscort with respect to communicatingthe
 achievement of goal,G
 R
 . If Escort communicates the achievement ofG
 R
 , then Tr ansp ort
 knows that it is safeto fly at its normal altitude {thus reachingthe destination
sooner}.
 If Escort does not communicate the achievement of G
 R
 , thereisstill some chance that
 Tr ansp ortwill observe the event anyway. If Tr ansp ort does notobserve
the achievement
 ofG
 R
 , then it must flynap-of-the-earth the whole distance, and the teamreceives
a lower
 reward because ofthe later arrival. Therefore, Escort must weigh the increase
in expected reward against the cost of communication. In the COM-MTDP model
of this scenario{presented in Figures 2, 3 and 4},the world
 state is the position{along a straight line between originanddestination}
of Tr ansp ort, Esc ort, andthe enemy radar.The enemy is at a randomly selected
positionsomewhere
 in between the agents'initial position and their destination.Tr ansp ort
has no possible communication actions, but it can choosebetween two domain-level
actions:flying nap-of-
 the-earth and flying atits normal speed and altitude. Escort has two domain-levelactions:
flying at its normal speed and destroyingthe radar. Escort also has the optionof
communi-
 cating the special message,033
 G
 R
, indicatingthat the radar has been destroyed. In the tables
 of Figures 2, 3and 4, the \001" symbol representsa wild-card {or \don't
care"} entry.
 If Escort arrives at theradar, then it observes its presence with certainty
and can
 destroy it to achieve G
 R
 . Thelikelihood of Tr ansp ort's observingthe radar's destruction is a function
of its distance from the radar.We can vary this function'sobservability parameter
 411Pynadath & Tambeff= fEscort {E }; Transport {T }gS = 004
 E 002 004
 T
002 004
 RPosition of Escort: 004 E
 = f0;1; : : : ; 8; 9; DestinationgPosition of Transport:004
 T
 = f0; 0:5; : : :; 9; 9:5; Destination;DestroyedgPositionof Radar: 004
 R
 =f1; 2; : : :; 8; DestroyedgA ff
 = A
 E 002 A
 T
 =ffly; destroy;waitg 002 ffly-NOE;fly-normal; waitg006 ff
 = 006
E
 002 006
 T = fclear {033 G
 R
 };nullg 002 fnullgR
 A
 {h030 E
 ; 030
 T ; 030
 R
 i; a} =
 030 E030 TaR
 A0; : : : ;90; : : :; 9:5; Destroyed0010
 0; : : : ; 9Destination001r
 T Destination0; : : :; 9:5; Destroyed001r
 E
 DestinationDestination001r
 E + r
 TR 006
 {s; hnull; nulli} = 0R
 006 {s; h033
 G R
 ; nulli}= 000r
 006
 2 [0; 1]Figure 2: COM-MTDP model ofstates, actions, and rewards for helicopter
scenario. {025 in Figure 4} within the range[0; 1] to generate distinctdomainconfigurations
{0 means
 thatTr ansp ort will never observe the radar's destruction; 1 means Tr ansp
ortwill always
 observe it}.If the observability is 1, then they achieve mutual belief of
the achievementof
 G
 R
 assoon as it occurs {following the argumentpresented in Section 4.1}. However,for
any
 observability less than 1, there is a chance that the agents willnotachieve
mutual belief
 simplybycommon observation. The helicopters receivea fixed reward for each
time step spent at their destination. Thus,for a fixed time horizon, the
earlier thehelicopters reach
 there, the greater theteam's reward. Since flying nap-of-the-earth isslower
than normal
 speed,Tr ansp ort will switch to itsnormal flying as soon as it either observesthat
G
 R
 has been achieved or Escortsends the message, 033
 G R
 . Sending the message isnot free, so we
 impose a variable communication cost {r 006
 in Figure 2}, also withinthe range [0; 1].
 Weconstructed COM-MTDP models of this scenario foreach combination of observabil-
ity and communication cost within therange [0; 1] at 0.1 increments.For each
combination,
 we appliedthe Jennings and STEAM policies, as well as acompletely silent
policy. For this
 domain, the policy, 031
 J
 ff006
 , dictates that Escort always communicate 033 G
 R
 upon destroying the radar. For STEAM, we vary the 034 and C
 c
 parameters with the observability and com-
 munication costparameters, respectively. Weused two different settings {low
and medium}
 forthe cost of miscoordination, C
 mt
 . Following the publishedSTEAM algorithm {Tambe,
 1997},Escort sends message 033 G
 R
 if and only ifSTEAM's inequality 034 001 C mt
 > C
 c , holds.
 Thus, the twodifferent settings, low and medium, for C
 mt generate two distinct communica- tion policies; the high setting isstrictly
dominated by the other two settingsin this domain.
 We also constructed and evaluated locally and globally optimal policies.
In applying each
 ofthese policies, we used our COM-MTDP model to computethe expected reward
received
 by the team when following the selected policy. We can uniquely determine
thisexpected
 reward given the candidatecommunication policy and the particular observability
and com-
 municationcost parameters, as well as the COM-MTDP modelspecified in Figures
2, 3,
 and4.
 412The CommunicativeMultiagent Team Decision Problem 
*  P {h030 E0
 ; 030
 T0
 ; 030
 R0 i ; ha
 E ; a
 T
 i ;h030
 E1
 ; 030 T 1
 ; 030
 R1
 i} =
 P
 E
 {030
 E0
 ; a
 E ; 030
 E1
 }001 P
 T
 {h030
 T 0
 ;030
 R0
 i ; a T
 ; 030
 T1
 } 001 P
 R
 {h030
 E0
 ; 030
 R0 i ; a
 E
 ; 030 R1
 }
 Escort:Initial distribution, Pr{004
 0
 E
 = 0} = 1030 E0a E030
 E1P EDestination001Destination10; : : : ; 8fly030
 E0
 + 110; : : : ; 8destroy030
E0
 + 119flyDestination19destroyDestination1001wait030 E01Transport: Initial
distribution, Pr{004 0
 T
 = 0} = 1030 T 0030 R0a T030
 T 1P TDestination001001Destination1Destroyed001001Destroyed10; : : :
; 9001fly-NOE030
 T 0
 + 0:519:5001fly-NOEDestination10; : : : ; 8:5Destroyedfly-normal030
 T 0 + 119; 9:5Destroyedfly-normalDestination10016=Destroyedfly-normalDestroyed1001001wait030
 T 01Radar: Initialdistribution, 8030 2f1; 2; : : : ; 8g, P r{004
0
 R
 = 030} = 0:125030
 E0030
 R0a
 E030
 R1P
R001030 E0destroyDestroyed10010016= destroy030
 R010016= 030
 E0001030 R01Figure3: COM-MTDP model of transition probabilitiesfor helicopter
scenario {excludes
 zeroprobability rows}.
 413Pynadath & Tambe
 
*  012
 ff = 012
 E
 002012
 T
 { 012 E
 = 004
 E 002 004
 T
 002 012
 RE
 ,where agent Escort's possible observations of the radar
 consistof 012 RE
 = fpresent; destroyed; nullg { 012
 T
 =004
 E
 002 004 T
 002 012
RT
 , where agent Tr ansp ort's possible observations of the radar
 consist of 012 RT
 = fdestroyed; nullg
 
*  O ff
 {s; ha E
 ; a
 T
 i ; h!
 E
 ; !
 T
 i}= O
 E
 {s;ha
 E
 ; a T
 i ; !
 E } 001 O
 T {s; ha
 E ; a
 T
 i ; ! T
 }
 { O E
 {h030
 E
 ; 030
 T ; 030
 R
 i; ha
 E
 ; a T
 i ; h030 E
 ; 030
 T ; !
 RE
 i} =030
 E030
 Ra
 E! REO E001destroyeddestroydestroyed1001destroyed6= destroynull1030
 R1; : : :; 9001present16=030
 R1; : : : ; 9001null1{ O
 T
 {h030
 E
 ;030
 T
 ; 030
 R
 i ; ha
 E
 ; a
 T
 i ; h030
 E ; 030
 T
 ; ! RT
 i} =030
 T030
Ra
 E!
 RTO
 T0; : : : ; 9:5001destroydestroyed025e
 000{030
 R
 000030
 T
 }{1000025}0; : : : ; 9:5001destroynull1 000 025e
 000{030
 R
 000030 T
 }{1000025}0; : : : ; 9:50016=destroynull1destroyed001001null1025 2
[0; 1] Figure 4: COM-MTDP model of observability for helicopter scenario.
Thesetables exclude
 both zero probability rows and input feature columns from whichO is indepen-
 dent. For example, bothagents' observation functions are independent of
the transport's selected action, so neither tableincludes a a
 T
 column. 414The CommunicativeMultiagent Team Decision Problemon paper, in
1/72inch coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via
xgrabFigure 5: Suboptimalityof silent and Jennings policies.on paper, in
1/72inch coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via
xgrabon paper, in 1/72inch coords'colortogray' and 'mergeprocs' come from
xwd2ps
%     via xgrabFigure 6: Suboptimalityof STEAM policy under both lowand
medium costs of miscoordi- nation.
 5.2 ExperimentalResults
 Figures 5 and 6 plot how much utility the team can expect to lose by following
the Jennings,
 silent,and the two STEAM policies instead of thelocally optimal communication
policy {thus, higher values meanworse performance}. We can immediatelysee
that the Jennings
 and silent policies are significantly suboptimal for manypossible domain
configurations. For example, not surprisingly, the surfacefor the policy,
031
 J
 ff006
 , peaks{i.e., it does most poorly}
 when the communication cost is high and when theobservability is high, while
the silent policy does poorly under exactly theopposite conditions.
 Previously publishedresults {Jennings, 1995} demonstrated that the Jennings
policy
 led to better team performance by reducing waste of effort produced byalternate
policies
 like our silentone. These earlier results focused on a single domain, and
Figure 5 partially
 confirmstheir conclusion and shows that the superiority of the Jennings
policy over the silent policy extends over a broadrange of possible domain
configurations.On the other
 hand, our COM-MTDP resultsalso show that there is a significant subclass
of domains {e.g.,
 when communicationcost and observability are high} where theJennings policy
is actually
 inferiorto the silent policy. Thus,with our COM-MTDP model, we can characterizethe
 types of domains where the Jennings policy outperforms the silent policy
andvice versa.
 415Pynadath & Tambe
 Figure 6 shows the expected value lost by following the two STEAM policies.
We can
 view STEAM astrying to intelligently interpolate between the Jennings and
silent policies based on the particular domain properties.In fact, under
a low setting forC
 mt
 , we see two thresholds, one along each dimension,at which STEAM switches
between following the
 Jennings and silent policies, andits suboptimality is highest at these thresholds.Under
 a medium setting forC
 mt
 , STEAM does notexhibit a threshold along the dimension of communication
cost, dueto the increased costof miscoordination. Under both settings, STEAM's
performance generally follows the better of those two fixed policies, so
itsmaxi-
 mum suboptimality {0.587underboth settings} is significantly lower than
that of the silent
 {0.700}andJennings' {1.000} policies. Furthermore, STEAM outperforms the
two policies
 on average, across the space ofdomain configurations, as evidenced by its
meansubopti-
 mality of 0.063 underlow C
 mt
 and 0.083 undermedium C
 mt
 . Both values are significantly lower than the silent policy's meanof 0.160
and the Jennings' policy's mean of 0.161.Thus,
 we have been able toquantify the savings provided by STEAM over less selective
policies
 withinthis example domain.
 However,within a given domain configuration, STEAMmust either always or
never
 communicate, andthis inflexibility leads tosignificant suboptimality across
a wide range of domain configurations. On the otherhand, Figure 6 also shows
that there are domain configurations where STEAM is locallyoptimal. In this
relatively small-scale experimental
 testbed, there is no need toincurSTEAM's suboptimality, because theagents
can compute
 the superior locallyoptimal policy in under 5 seconds. Inlarger-scale domains,
on the other
 hand, the increased complexity of the locallyoptimal policies may render
its execution infeasible. In such domains, STEAM's constant-time execution
would potentially make it a preferable alternative. Thisanalysis suggests
a possible spectrum ofalgorithms that make
 different optimality-efficiency tradeoffs.
 Tounderstand the cause of STEAM's suboptimality, we can examine its performance
more deeply in Figures 7 and 8, which plotthe expected number of messages
sent using STEAM {with both low andmedium C
 mt
 } vs. the locally optimal policy, at observability
 v aluesof 0.3 and 0.7. STEAM's expected number of messages is either 0 or
1, so STEAM can make at most two {instantaneous}transitions between them:
one threshold value each
 along the observability and communication cost dimensions. From Figures
7 and 8, we see thatthe optimal policy can be more flexible thanSTEAM
 by specifying communication contingent on Escort's beliefs beyond simply
the achievement
 ofG
 R
 . Forexample, consider the messages sent underlow C
 mt
 in Figure 7,where STEAM
 matches the locally optimalpolicy at the extremes of the communication costdimension.
 Even if the communication cost is high, it is still worth sending message033
 G
 R
in states where
 Tr ansp ortis still very far from the destination.Thus, the surface for
the optimal policy,
 makes a more gradual transition fromalways communicating to never communicating.We
 can thus view STEAM's surface as a crude approximation to the optimal surface,
subject
 to STEAM's fewer degrees of freedom. We can also use Figures 7 and 8 to
identify the domain conditions under which joint
 intentions theory's prescriptionofattaining mutual belief is or is not optimal.In
particular,
 for any domain where theobservability is less than 1, the agentswillnot
attain mutual belief
 without communication. In both Figures 7and 8, there are many domain configurations
where the locally optimal policy is expected to send fewer than 1 033 G
 R
 message.Each of
 416The CommunicativeMultiagent Team Decision Problemon paper, in 1/72inch
coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via xgrabon paper,
in 1/72inch coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via
xgrabFigure 7: Expectednumber of messages sent by STEAM and locally optimal
policies when
 theobservability is 0.3.
on paper, in 1/72inch coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via xgrabon paper, in 1/72inch coords'colortogray' and 'mergeprocs'
come from xwd2ps
%     via xgrabFigure 8: Expectednumber of messages sent
by STEAM and locally optimal policies when
 theobservability is 0.7. Underboth settings, STEAM sends 0 messages. 417Pynadath
& Tambe
on paper, in 1/72inch coords'colortogray' and 'mergeprocs' come from xwd2ps
%     via xgrabFigure 9: Suboptimalityof locally optimal policy.
 these configurations represents a domain wherethe locally optimal policy
will not attain mutual belief in at least one case.Therefore, attaining mutual
belief is suboptimal in those
 configurations! These experiments illustrate that STEAM,despite its decision-theoretic
communication selectivity, may communicate suboptimally under a significant
class of domainconfigura-
 tions. Previous work onSTEAM-based, real-world, agent-team implementationsinformally
 noted suboptimality in anisolated configuration within a more realistichelicopter
trans-
 port domain {Tambe, 1997}. Unfortunately,this previous work treated that
suboptimality {where the agents communicated more thannecessary} as an isolated
aberration, so there wasno investigation of the degree ofsuch suboptimality,
nor of the conditions under which
 such suboptimality mayoccur in practice. We re-created theseconditions within
the experi-
 mental testbed of this section by using a medium C
 mt
 .The resulting experiments {as shown in Figure 7} illustrated that the observedsuboptimality
was not an isolated phenomenon, but, in fact, that STEAM has a general propensity
towards extraneous communication in situations involving low observability
{i.e., low likelihood of mutual belief } and high com- munication costs.
This result matchesthe situation where the \aberration" occurred inthe
 more realistic domain.
 The locally optimal policy is itself suboptimal with respect to the globally
optimal policy, as we can see from Figure9. Under domain configurations with
high observability ,
 the globally optimal policyhas the escort wait an additional time step afterdestroying
 the radar and then communicateonly if the transport continues flyingnap-of-the-earth.
 The escort cannot directlyobserve which method of flight the transport has
chosen, but
 it can measure the change in the transport's position {since it maintains
a history of its
 past observations} and thus infer the method offlight with complete accuracy.
Ina sense,
 the escort following theglobally optimal policy is performingplan re c o
gnition to analyze the transport's possible beliefs.It is particularly noteworthy
that our domainspecification
 does not explicitly encodethis recognition capability. Infact, our algorithm
for finding the globally optimal policy does not evenmake any of the assumptions
made by our locally
 observable policy {i.e., singleagent is deciding whether to communicate
or not,regarding
 a single message, at a single point in time}; rather, our general-purpose
search algorithm
 traverses the policy spaceand \discovers" this possible means ofinference
on its own. We
 418The CommunicativeMultiagent Team Decision Problem expect that such COM-MTDP
analysis can provide an automatic method for discovering novel communication
policies of this type in other domains, even those modelingreal-world
 problems.
 Indeed, byexploiting this discovery capability within ourexample domain,
the globally
 optimal policygains a slight advantage in expectedutility over the locally
optimal policy,
 with a mean difference of 0.011,standarddeviation of 0.027, and maximum
of0.120. On the
 other hand, ourdomain-independent code never requires morethan 5 seconds
to compute
 the locallyoptimal policy in this testbed, whileourdomain-independentsearch
algorithm always required more than 150minutes to find the globally optimal
policy. Thus, through
 Theorem7, we have used the COM-MTDP model toconstruct a communication policy
 that, for this testbed domain, performs almostoptimally and outperforms
existing team- work theories, witha substantial computational savings over
finding the globally optimal policy. Although these results holdfor an isolated
communication decision, we expect the
 relative performance of the policies to stay the same even with multipledecisions,
where the
 inflexibilityof the suboptimalpolicies will only exacerbatetheir losses
{i.e., the shapes of the graphs would stay roughly the same,but the suboptimality
magnitudes would increase}. 6. Summary
 The COM-MTDP model is a novel framework that complements existing teamwork
research by providing the previouslylackingcapability to analyze the optimality
and complexity of
 team decisions. Whilegrounded within economic team theory,the COM-MTDP's
exten-
 sions to include communication and dynamism allow it to subsume manyexisting
multiagent
 models.We were able to exploit the COM-MTDP'sability to represent broad
classes of multiagent team domains to derive complexity results for optimal
agent teamwork under arbitrary problem domains. Wealso used the model to
identify domain properties that can
 simplify that complexity.
 The COM-MTDP framework provides ageneral methodology for analysis across
both general domain subclasses and specificdomain instantiations. As demonstrated
in Section4,
 we can express important existingteamwork theories within a COM-MTDP frameworkand
 derive broadly applicable theoreticalresults about their optimality.Section
5 demonstrates
 our methodology forthe analysis of a specific domain.By encoding a teamwork
problem as a COM-MTDP, we can use the leverage of our general-purpose software
tools{available in
 OnlineAppendix 1}to evaluate the optimality of teamworkbased on potentially
any other existing theory, as demonstrated in thispaper using two leading
instantiations ofjoint
 intentions theory. In combining both theory and practice, we can use the
theoretical results
 derived using the COM-MTDP framework as thebasis for new algorithms to extend
our software tools, just as we did intranslating Theorem 7 from Section 4
into animplemented
 algorithmfor locally optimalcommunication in Section 5. Weexpect that the
COM-MTDP
 framework, thetheorems and complexity results, and the reusablesoftware
willform a basis
 forfurther analysis of teamwork, both byourselves and others in the field.
419Pynadath & Tambe
 7. Future Workfor COM-MTDP Team Analysis
While our initial COM-MTDP results are promising,there remain at least three
key areas where future progress in COM-MTDPs is critical.First, analysis
using COM-MTDPs {such as the one presented in Section 5} requires knowledge
of the rewards, transition probabil- ities, and observation probabilities,
as well as of the competing policies governingagent
 behavior. It may not always be possible to have such a model of the domain
and agents'
 policiesreadily available. Indeed, other proposedteam-analysis techniques
{Nair, Tambe, Marsella, & Raines, 2002b; Raines, Tambe, & Marsella, 2000},
do not require apriori hand-
 coding of such models,but rather acquire them automatically through machinelearning
 overlarge numbers ofruns. Also, in the interests of combatingcomputational
complexity
 and improvedunderstandability, some researchers emphasizethe need for multiple
models
 at multiple levels of abstraction, rather thanfocusing on a single model
{Nair et al.,2002b}.
 For instance, one level ofthe model may focus on the analysis of theindividual
agents' ac-
 tions in support of a team, whileanother level may focus on interactions
among subteams
 of a team.We can potentially extend the COM-MTDP model in both of these
directions
 {i.e., machine learning of model parameters,and hierarchical representations
of the team to provide multiple levels of abstraction}. Second, it is important
to extend COM-MTDPanalysis to other aspects of teamwork beyondcommunication.
Forinstance, team formation {where agents may beassigned spe-
 cific roles within the team} and reformation {where failure of individualagentsleads
to role
 reassignmentwithin in the team} are key problems in teamwork that appear
suitable for
 COM-MTDPanalysis. Such analysis may require extensionsto the COM-MTDP frame-
 work {e.g.,explicit modeling of roles}.Ongoing research {Nair, Tambe,& Marsella,
2002a}
 has begun investigating the impact of such extensions and theirapplications
in domains
 such as RoboCupRescue {Kitano, Tadokoro, Noda, Matsubara,Tak ahashi, Shinjoh,
& Shi-
 mada, 1999}. Analysis of more complex team behaviors may require further
extensions to the COM-MTDP model to explicitly accountfor additional aspects
of teamwork {e.g., notions of authority structure within teams}. Third, extending
COM-MTDP analysis beyondteamwork to model other types of co- ordination may
require relaxation of COM-MTDP's assumption of selfless agents receiving
the same joint reward. Morecomplex organizations may require modeling othernon-joint
 rewards. Indeed,enrichingthe COM-MTDP model in this manner may enable analy-
 sis of some of theseminal work in multiagent coordination inthe tradition
of PGP and
 GPGP {Decker & Lesser, 1995; Durfee & Lesser, 1991}.Such enriched models
may first require new advances in the mathematicalfoundations of our COM-MTDP
framework, and ultimately contribute towards the emergingsciences of agents
and multiagent systems. Acknowledgments
 Thisarticle is a significantly extended version ofa paper, \Multiagent
Teamwork:Analyzing
 the Optimality and Complexityof Key Theories and Models", by the sameauthors,
in the
 Pro c e e dingsof the International Joint Conferenc eon Autonomous Agents
and Multi-Agent
 Systems, 2002. Thisarticle extends the initialcontentby providing proofs
missing in the original paper, as well as new theoretical results, a detailed
description of our experimental
 420The CommunicativeMultiagent Team Decision Problem setup, new experimental
results, andadditional discussion and explanations of key points.
 This research was supported by DARPA award No. F30602-98-2-0108,under the
Control
 of Agent BasedSystems program, and managed by AFRL/Rome ResearchSite. We
would
 liketo thank Daniel Bernstein, Ashish Goel, DanielMarcu, Stacy Marsella,
Ranjit Nair,
 and Paul Rosenbloom for valuablediscussion and feedback. We also thankthe
anonymous
 reviewers for their helpfulcomments and suggestions.
 References Bernstein, D. S., Zilberstein, S., &Immerman, N. {2000}. The
complexity of decentralized
 control of Markov decisionprocesses. In Pro c e e dingsof the Conferenc
e on Uncertainty in Artificial Intel ligence, pp. 32{37.
 Boutilier, C. {1996}.Planning, learning and coordination in multiagent decision
processes.
 In Pro c e e dings of the Conferenc eon Theor etic al Aspe ctsof Rationality
and Know ledge,
 pp. 195{210.
 Boutilier,C., Dean, T., & Hanks, S. {1999}.Decision-theoretic planning:
Structural as- sumptions and computational leverage.Journal of Artificial
Intel ligence Rese ar ch,
 11, 1{93.
 Cohen, P. R., & Levesque, H. J. {1991a}. Confirmationand joint action. In
Pro c e edings of
 the International Joint Conferenc e on Artificial Intel ligence.
 Cohen, P. R., & Levesque, H. J. {1991b}. Teamwork.Nous, 25 {4}, 487{512.
Decker,K., & Lesser, V. {1995}.Designing a family of coordination algorithms.In
Pro c e e d-
 ingsof the International Conferenc e on Multi-Agent Systems.
 Dunin-K eplicz,B., & Verbrugge, R. {1996}. Collectivecommitments. In International
 Conferenc e on Multi-AgentSystems, pp. 56{63.
 Durfee,E., & Lesser, V. {1991}. Partialglobal planning: a coordination framework
for distributed planning. IEEE transactions on Systems, Man and Cybernetics,
21 {5}.
 Goldberg, D., & Mataric,M. J. {1997}. Interference as a tool for designing
and evaluat-
 ingmulti-robot controllers. In Pro c e e dings of the National Conferenc
e on Artificial
 Intelligence, pp. 637{642.
 Grosz, B.{1996}. Collaborating systems. ArtificialIntel ligence Magazine,
17{2}, 67{85.
 Grosz, B., & Kraus, S.{1996}. Collaborative plans for complex group actions.
Artificial
 Intelligence, 86, 269{358. Grosz, B. J., & Sidner, C. L. {1990}.Plans for
discourse. In Cohen, P. R., Morgan,
 J., & Pollack, M.E. {Eds.}, Intentions in Communication, pp.417{444. MITPress,
 Cambridge, MA. Ho, Y.-C. {1980}. Teamdecision theory and information structures.Pro
c e e dings of the IEEE, 68 {6}, 644{654. Jennings, N. {1995}. Controllingcooperative
problem solving in industrial multi-agent
 systems using joint intentions. Artificial Intel ligence, 75, 195{240.
 421Pynadath & Tambe
 Kitano, H., Tadokoro,S., Noda, I., Matsubara, H., Tak ahashi,T., Shinjoh,
A., & Shimada,
 S. {1999}.Robocuprescue: Search and rescue forlarge-scale disasters as a
domain for multiagent research. InPro c e e dings of the IEEE InternationalConferenc
e on Systems,
 Manand Cybernetics.
 Levesque,H. J., Cohen, P. R., & Nunes, J. {1990}.On acting together. In
Pro c e e dings of
 the National Conferenc e on Artificial Intel ligence.
 Marschak, J., & Radner, R.{1971}. The Economic Theory of Te ams. YaleUniversity
Press,
 New Haven, CT. Nair, R., Tambe, M., & Marsella, S.{2002a}. Team formation
for reformation formultia-
 gent domains like robocuprescue. In Pro c e e dingsof the International
Symposium on Rob oCup.
 Nair, R., Tambe, M., Marsella, S., & Raines, T.{2002b}. Automated assistants
for analyzing team behaviors. Journalof Autonomous Agents and Multiagent
Systems, to appear.
 Papadimitriou, C. H., &Tsitsiklis, J. N. {1987}. The complexity ofMarkov
decision pro-
 cesses.Mathematics of Oper ation Rese ar ch, 12 {3}, 441{450. Peshkin, L.,
Kim, K.-E., Meuleau, N., &Kaelbling, L. P. {2000}. Learningto cooperate via
 policysearch. In Pro c e e dingsof the Conferenc e on Uncertaintyin Artificial
Intel li-
 gence, pp. 489{496.
 Pynadath,D. V., & Tambe, M. {2002}.An automated teamwork infrastructure
for het- erogeneous software agents and humans.Journal of Autonomous Agentsand
Multi-
 Agent Systems: Spe cial Issue on Infrastructur eand Re quir ements for Building
Re-
 se ar ch Grade Multi-Agent Systems, to appear.
 Pynadath, D. V., Tambe, M., Chauvat, N., & Cavedon, L. {1999}.Toward team-oriented
 programming.In Jennings, N. R., & Lesperance, Y. {Eds.}, Intel ligentAgents
VI:
 Agent Theories, Archite ctur es and Languages, pp. 233{247. Springer-Verlag.
 Raines, T., Tambe,M., & Marsella, S. {2000}. Automatedagents that help humans
under-
 stand team behaviors. In Pro c e e dings of the International Conferenc
e on Autonomous
 Agents.
 Rich, C., & Sidner, C. {1997}.COLLAGEN: When agents collaborate with people.
In
 Pro c e e dingsof the International Conferenc e on Autonomous Agents.
 Smallwood,R. D., & Sondik, E. J. {1973}. Theoptimal control of partially
observable Markov processes over a finitehorizon. Oper ations Rese ar ch,
21, 1071{1088.
 Smith,I. A., & Cohen, P. R. {1996}.Toward a semantics for an agentcommunications
 language based on speech-acts. In Pro c e e dingsof the National Conferenc
e on Artificial Intel ligence, pp. 24{31. Sonenberg, E., Tidhar, G., Werner,
E., Kinny, D., Ljungberg, M., & Rao, A. {1994}. Planned
 team activity. Tech. rep. 26, Australian AI Institute. Tambe, M. {1997}.Towards
flexible teamwork.Journal of Artificial Intel ligence Rese ar ch,
 7, 83{124.
 422The CommunicativeMultiagent Team Decision Problem Tambe, M., Pynadath,
D. V., Chauvat, N., Das, A., & Kaminka, G. A.{2000}. Adaptive
 agent integrationarchitectures for heterogeneous team members.In Pro c e
e dings of the International Conferenc eon Multi-Agent Systems, pp. 301{308.
Tambe, M., & Zhang, W. {1998}.Towards flexible teamwork in persistent teams.
In Pro- c e e dings of the International Conferenc e on Multi-Agent Systems,
pp. 277{284.
 Tidhar, G. {1993}.Team-oriented programming: Preliminary report.Tech. rep.
41, Aus-
 tralianArtificial Intelligence Institute.
 Xuan, P., Lesser, V., & Zilberstein, S.{2001}. Communication decisions in
multi-agent
 cooperation: Model and experiments. In Pro c e e dingsof the International
Conferenc e on Autonomous Agents,pp. 616{623.
 Yen, J., Yin, J., Ioerger, T. R., Miller, M. S., Xu, D., & Volz, R. A. {2001}.
CAST:
 Collaborative agents for simulating teamwork.In Pro c e e dings of theInternational
 Joint Conferenc eon Artificial Intel ligence,pp. 1135{1142.
 Yoshik awa,T. {1978}. Decomposition of dynamic teamdecision problems. IEEE
Tr ansac- tions on Automatic Control, AC-23 {4}, 627{632. 423