Abstract:Causal models defined in terms of a collection of equations, as defined by Pearl, are axiomatized here. Axiomatizations are provided for three successively more general classes of causal models: (1) the class of recursive theories (those without feedback), (2) the class of theories where the solutions to the equations are unique, (3) arbitrary theories (where the equations may not have solutions and, if they do, they are not necessarily unique). It is shown that to reason about causality in the most general third class, we must extend the language used by Galles and Pearl. In addition, the complexity of the decision procedures is characterized for all the languages and classes of models considered.
Extracted Text
Journal of Artificial Intelligence Research 12 (2000) 317-337 Submitted 6/99;
published 5/00
Axiomatizing Causal Reasoning Joseph Y. Halpern halpern@cs.cornell.edu Cornell
University, Computer Science Department Ithaca, NY 14853 http://www.cs.cornell.edu/home/halpern
Abstract Causal models defined in terms of a collection of equations, as
defined by Pearl, are axiomatized here. Axiomatizations are provided for
three successively more general classes of causal models: (1) the class of
recursive theories (those without feedback), (2) the class of theories where
the solutions to the equations are unique, (3) arbitrary theories (where
the equations may not have solutions and, if they do, they are not necessarily
unique). It is shown that to reason about causality in the most general third
class, we must extend the language used by Galles and Pearl (1997, 1998).
In addition, the complexity of the decision procedures is characterized for
all the languages and classes of models considered.
1. Introduction The important role of causal reasoning--in prediction, explanation,
and counterfactual reasoning--has been argued eloquently in a number of recent
papers and books (Chajewska & Halpern, 1997; Heckerman & Shachter, 1995;
Henrion & Druzdzel, 1990; Druzdzel & Simon, 1993; Pearl, 1995; Pearl & Verma,
1991; Spirtes, Glymour, & Scheines, 1993). If we are to reason about causality,
then it is certainly useful to find axioms that characterize such reasoning.
The way we go about axiomatizing causal reasoning depends on two critical
factors:
ffl how we model causality, and ffl the language that we use to reason about
it.
In this paper, I consider one approach to modeling causality, using structural
equations. The use of structural equations as a model for causality is standard
in the social sciences, and seems to go back to the work of Sewall Wright
in the 1920s (see (Goldberger, 1972) for a discussion); the particular framework
that I use here is due to Pearl (1995). Galles and Pearl (1997) introduce
some axioms for causal reasoning in this framework; they also provide a complete
axiomatic characterization of reasoning about causality in this framework,
under the strong assumption that there is a fixed, given causal ordering
OE of the equations (Galles & Pearl, 1998). Roughly speaking, this means
there is a way of ordering the variables that appear in the equations and
we have explicit axioms that say Xj has no influence of Xi if Xi OE Xj in
this causal ordering.
In this paper, I extend the results of Galles and Pearl by providing a complete
axiomatic characterization for three increasingly general classes of causal
models (defined by structural equations):
cfl2000 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.
Halpern 1. the class of recursive theories (those without feedback--this
generalizes the situation
considered by Galles and Pearl (1998), since every fixed causal ordering
of the variables gives rise to a recursive theory),
2. the class of theories where the solutions to the equations are unique,
3. arbitrary theories (where the equations may not have solutions and, if
they do, they
are not necessarily unique).
In the process, I clarify some problems in the Galles-Pearl completeness
proof that arise from the lack of propositional connectives (particularly
disjunction) in the language they consider and, more generally, highlight
the role of the language in reasoning about causality. I also characterize
the complexity of the decision problem for all these languages and classes
of models.
The rest of the paper is organized as follows. In Section 2, I give the syntax
and semantics of the languages I will be considering and review the definition
of modifiable causal models. In Section 3, I present the complete axiomatizations.
In Section 4 I consider the complexity of the decision procedure. I conclude
in Section 5.
2. Syntax and Semantics An axiomatization is given with respect to a particular
language and a class of models, so we need to make both precise. Both the
language and models I use are based on those considered by Galles and Pearl
(1997, 1998). To make comparisons easier, I use their notation as much as
possible. I start with the semantic model, since it motivates some of the
choices in the syntax, then give the syntax, and finally define the semantics
of formulas.
2.1 Causal Models The basic picture here is that we are interested in the
values of random variables, some of which have a causal effect on others.
This effect is modeled by a set of structural equations.
In practice, it seems useful to split the random variables into two sets,
the exogenous variables, whose values are determined by factors outside the
model, and the endogenous variables. It is these endogenous variables whose
values are described by the structural equations.
More formally, a signature S is a tuple (U ; V; Rg, where U is a finite set
of exogenous variables, V is a finite set of endogenous variables, and R
associates with every variable Y 2 U [ V a nonempty set R(Y ) of possible
values for Y (the range of possible values of Y ). Unless explicitly noted
otherwise, I assume that R(Y ) is a finite set for each Y 2 U [ V and jR(Y
)j * 2. The assumption that U and V are finite is relatively innocuous; as
we shall see, the assumption that R(Y ) is finite has more of an impact on
the axioms. The assumption that jR(Y )j * 2 allows us to ignore the trivial
situation where jR(Y )j = 1. If jR(Y )j = 1, we can just remove the variable
Y from the signature without loss of expressiveness.
A causal model over signature S is a tuple T = (S; F ) where F associates
with each variable X 2 V a function denoted FX such that FX : (Theta U2U
R(U )) Theta (Theta Y 2VGamma fXgR(Y )) ! R(X). FX tells us the value
of X given the values of all the other variables in U [ V. We think of the
functions FX as defining a set of (modifiable) structural equations, relating
the
318
Axiomatizing Causal Structures values of the variables. Because FX is a function,
there is a unique value of X once we have set all the other variables. Notice
we have such functions only for the endogenous variables. The exogenous variables
are taken as given; it is their effect on the endogenous variables (and the
effect of the endogenous variables on each other) that we are modeling with
the structural equations.
Given a causal model T = (S; F ) over signature S, a (possibly empty) vector
~X of variables in V, and vectors ~x and ~u of values for the variables in
~X and U , respectively, we can define a new causal model denoted T ~X~x(~u)
over the signature S ~X = (;; VGamma ~X ; RjVGamma ~X).1
Intuitively, this is the causal model that results when the variables in
~X are set to ~x and the variables in U are set to ~u. Formally, T ~X~x(~u)
= (S ~X ; F ~X~x;~ug), where F ~X~x;~uY is obtained from FY by setting the
values of the variables in ~X to ~x and the values of the variables in U
to ~u. The causal model T ~X~x(~u) is called a submodel of T by Pearl (1999).
It can describe a possible counterfactual situation; that is, even though,
under normal circumstances, setting the exogenous variables to ~u may result
in the variables ~X having values ~x0 6= ~x, this submodel describes what
happens if they are set to ~x due to some "external action", the cause of
which is not modeled explicitly. For example, to determine if the manufacturer
is at fault in an accident that involved a poorly maintained car, we may
want to consider what would have happened had the car been well maintained.
If there is a random variable in the signature that describes how well maintained
the car is, then this means examining the submodel where that random variable
is set to 1 (the car is well maintained). It is this ability to examine counterfactual
situations that makes causal structures a useful tool for reasoning about
causality.
Notice that, in general, there may not be a unique vector of values that
simultaneously satisfies the equations in T ~X~x(~u); indeed, there may not
be a solution at all. One special case where there is guaranteed to be such
a unique solution is if there is some total ordering OE of the variables
in V such that if X OE Y , then FX is independent of the value of Y ; i.e.,
FX(: : : ; y; : : :) = FX (: : : ; y0; : : :) for all y; y0 2 R(Y ). In this
case, the causal model is said to be recursive or acyclic. Intuitively, if
the theory is recursive, there is no feedback. If X OE Y , then the value
of X may affect the value of Y , but the value of Y has no effect on the
value of X.
It should be clear that if T is a recursive theory, then there is always
a unique solution to the equations in T ~X~x(~u), for all ~X, ~x, and ~u.
(We simply solve for the variables in the order given by OE.) On the other
hand, as the following example shows, it is not hard to construct nonrecursive
theories for which there is always a unique solution to the equations that
arise.
Example 2.1: Let S = (;; fX; Y g; Rg), where R(X) = R(Y ) = fGamma 1; 0;
1g, and let T = (S; F ), where FX is characterized by the equation X = Y
and FY is characterized by the equation Y = Gamma X (that is, FX (y) = y
and FY (x) = Gamma x). Clearly T is not recursive; the value of X depends
on the value of Y and the value of Y depends on that of X. Nevertheless,
it is easy to see that T has the unique solution X = 0; Y = 0, TXx has the
unique solution Y = Gamma x, and TY y has the unique solution X = y.
1. I am implicitly identifying the vector ~X with the subset of V consisting
of the variables in ~X. I do this
throughout the paper. RjVGamma ~X is the restriction of R to the variables
in V Gamma ~X.
319
Halpern In this paper, I consider three successively more general classes
of causal models for a given signature S = (U ; V; R).
1. Trec(S): the class of recursive causal models over signature S, 2. Tuniq(S):
the class of causal models T over S where for all ~X ` V, ~x, and ~u, the
equations in T ~X~x(~u) have a unique solution,
3. T (S): the class of all causal models over S. I often omit the signature
S when it is clear from context or irrelevant, but the reader should bear
in mind its important role.
Why should we be interested in causal models that do not possess unique solutions?
Are there real causal systems that do not possess unique solutions? The issue
of whether nonrecursive system can be given a causal interpretation is discussed
at some length by Strotz and Wold (1960). They argue that there are reasonable
ways of interpreting causal interpretations where the answer is yes. Under
these interpretations, there may well be more than one solution to the equations.
Perhaps the best way to view such equations is to think of the variables
in V as being mutually interdependent; changing any one of them may cause
a change in the others. (Think of demand and supply in economics or populations
of rabbits and wolves.) The solutions to the equations then represent equilibrium
situations. If there is more than one equilibrium, there will be more than
one solution to the equations. Of course, if there are no equilibria, then
there will be no solutions to the equations.
A related way of thinking about these equations is that they represent atemporal
versions of temporal causal equations. That is, suppose that we replace every
variable Y 2 U [ V by a family of variables Y0; Y1; Y2; : : :, where, intuitively,
Yt represents the value of Y at time t. Each equation fX 2 F is then replaced
by a family of equations fXt, where fXt depends only on exogenous variables
Ut0 with t0 ^ t and endogenous variables Yt0 with t0 ! t. This gives us a
recursive system. The values of Xt under some setting of the variables with
subscript 0 represents the evolution of X under that setting of the variables.
If Xt eventually stabilizes, then we might expect the equilibrium value to
be the value of X in some solution to the original set of equations. If Xt
stabilizes, then there would not be a solution to the original set of equations.
2.2 Syntax I focus here on two languages. Both languages are parameterized
by a signature S. The first language, L+(S), borrows ideas from dynamic logic
(Harel, 1979). Again, I often write L+ rather than L+(S) (and similarly for
the other languages defined below) to simplify the notation. A basic causal
formula is one of the form [Y1 y1; : : : ; Yk yk]', where ' is a Boolean
combination of formulas of the form X(~u) = x, Y1; : : : ; Yk; X are variables
in V, Y1; : : : ; Yk are distinct, x 2 R(X), and ~u is a vector of values
for all the variables in U . I typically abbreviate such a formula as [~Y
~y]'. The special case where k = 0 (which is allowed) is abbreviated as [true]'.
[~Y ~y]X(~u) = x can be interpreted as "in all possible solutions to the
structural equations obtained after setting Yi to yi, i = 1; : : : ; k, and
the exogenous variables to ~u, random variable X has value x". As we shall
see, this formula is true in a causal model T if in all solutions to the
equations in T~Y ~y(~u), the random
320
Axiomatizing Causal Structures variable X has value x. Note that this formula
is trivially true if there are no solutions to the structural equations.
A causal formula is a Boolean combination of basic causal formulas.
Just as with dynamic logic, we can also define the formula h~Y ~yi(X(~u)
= x) to be an abbreviation of :[~Y ~y]:(X(~u) = x). h~Y ~yi(X(~u) = x)
is the dual of [~Y ~y](X(~u) = x); it is true if, in some solution to the
structural equations obtained after setting Yi to yi, i = 1; : : : ; k, and
the exogenous variables to ~u, random variable X has value x. Taking true(~u)
to be an abbreviation for X(~u) = x . X(~u) 6= x for some variable X and
x 2 R(X), and taking false(~u) to be an abbreviation for :true(~u), we have
that h~Y ~yitrue(~u) is true if there is some solution to the equations
obtained by setting Yi to yi, i = 1; : : : ; k, and the variables in U to
~u (since [~Y ~y]false(~u) says that in every solution to the equations
obtained by setting Yi to yi and U to ~u, the formula false(~u) is true,
and thus holds exactly if the equations have no solution).
Let Luniq(S) be the sublanguage of L+(S) which consists of Boolean combinations
of formulas of the form [~Y ~y]X(~u) = x. Thus, the difference between Luniq
and L+ is that in Luniq, only X(~u) = x is allowed after [~Y ~y], while
in L+, arbitrary Boolean combinations of formulas of the form X(~u) = x are
allowed. As we shall see, for reasoning about causality in Tuniq, the language
Luniq is adequate, since it is equivalent in expressive power to L+. However,
this is no longer the case when reasoning about causality in T .
Following Galles and Pearl's notation, I often write [~Y ~y]X(~u) = x as
X~Y ~y(~u) = x.
If ~Y is clear from context or irrelevant, I further abbreviate this as X~y(~u)
= x. (This is actually the notation used by Galles and Pearl.) Let LGP(S)
be the sublanguage of Luniq(S) consisting of just conjunctions of formulas
of the form X~y(~u) = x. In particular, it does not contain disjunctions
or negations of such formulas. Although Galles and Pearl (1998) are not explicit
about the language they are using, it seems to be LGP.2
2.3 Semantics A formula in L+(S) is true or false in a causal model in T
(S). As usual, we write T j= ' if the causal formula ' is true in causal
model T . For a basic causal formula, we have T j= [~Y ~y](X(~u) = x) if
in all solutions to T~Y ~y(~u) (i.e., in all vectors of values for the
variables in V Gamma ~Y that simultaneously satisfy all the equations F
~Y ~yZ , for Z 2 V Gamma ~Y ), the variable X has value x. We define the
truth value of arbitrary causal formulas, which are just Boolean combinations
of basic causal formulas, in the obvious way:
ffl T j= '1 ^ '2 if T j= '1 and T j= '2 ffl T j= :' if T 6j= '.
As usual, we say that a formula ' is valid with respect to a class T 0 of
causal models if T j= ' for all T 2 T 0.
I can now make precise the earlier claim that in Tuniq (and hence Trec),
the language Luniq is just as expressive as the full language L+.
Lemma 2.2: The following formulas are valid in Tuniq:
2. This was confirmed by Judea Pearl [private communication, 1997].
321
Halpern (a) Tuniq j= [~Y ~y](' . ) , [~Y ~y]' . [~Y ~y],
(b) Tuniq j= [~Y ~y](' ^ ) , [~Y ~y]' ^ [~Y ~y], (c) Tuniq j= [~Y ~y]:'
, :[~Y ~y]'. Hence, in Tuniq, every formula in L+ is equivalent to a formula
in Luniq. Proof: Straightforward; left to the reader.
Note that it follows from these equivalences that in Tuniq, [~Y ~y]' is
equivalent to h~Y ~yi'. It is also worth noting that Lemma 2.2(b) holds
in arbitrary causal models in T , not just in Tuniq. However, parts (a) and
(c) do not, as the following example shows.
Example 2.3: Let S = (;; fX; Y g; R), where R(X) = R(Y ) = f0; 1g; let T
= (S; F), where FX is characterized by the equation X = Y and FY is characterized
by the equation Y = X. Clearly T =2 Tuniq; both (0; 0) and (1; 1) are solutions
to T . It is easy to see that T j= [true](X = 0 . X = 1) ^ :[true](X = 0)
^ :[true](X = 1), showing that part (a) of Lemma 2.2 does not hold in T ,
and that T j= :[true](X = 1) ^ :[true]:(X = 1), showing that part (c) does
not hold either.
3. Complete Axiomatizations I briefly recall some standard definitions from
logic. An axiom system AX consists of a collection of axioms and inference
rules. An axiom is a formula (in some predetermined language L), and an inference
rule has the form "from '1; : : : ; 'k infer ," where '1; : : : ; 'k; are
formulas in L. A proof in AX consists of a sequence of formulas in L, each
of which is either an axiom in AX or follows by an application of an inference
rule. A proof is said to be a proof of the formula ' if the last formula
in the proof is '. We say ' is provable in AX, and write AX ` ', if there
is a proof of ' in AX; similarly, we say that ' is consistent with AX if
:' is not provable in AX.
An axiom system AX is said to be sound for a language L with respect to a
class T 0 of causal models if every formula in L provable in AX is valid
with respect to T 0. AX is complete for L with respect to T 0 if every formula
in L that is valid with respect to T 0 is provable in AX.
We now want to find axioms that characterize the classes of causal models
in which we are interested, namely Trec, Tuniq, and T . To deal with Trec,
it is helpful to define Y ; Z, read "Y affects Z", as an abbreviation for
the formula
. ~X`V;~x2Theta X
2VR(X);y2R(y);~u2Theta U2UR(U);z6=z02R(Z)(Z~xy(~u) = z
0 ^ Z~x(~u) = z):
Thus, Y affects Z if there is some setting of the exogenous variables and
some other endogenous variables for which changing the value of Y changes
the value of Z. This definition is used in axiom C6 below, which characterizes
recursiveness.
Consider the following axioms:
C0. All instances of propositional tautologies. C1. X~y(~u) = x ) X~y(~u)
6= x0 if x; x0 2 R(X); x 6= x0 (equality)
322
Axiomatizing Causal Structures C2. .x2R(X)X~y(~u) = x (definiteness) C3.
(W~x(~u) = w ^ Y~x(~u) = y) ) Y~xw(~u) = y (composition) C4. Xx ~w(~u) =
x (effectiveness) C5. (Y~xw(~u) = y ^ W~xy(~u) = w) ) Y~x(~u) = y (reversibility)
C6. (X0 ; X1 ^ : : : ^ XkGamma 1 ; Xk) ) :(Xk ; X0) (recursiveness)
We have one rule of inference: MP. From ' and ' ) , infer (modus ponens)
C1 just states an obvious property of equality: if X = x for every solution
of the equations in T~x(~u), then we cannot have X = x0, if x0 6= x.3 In
a richer language, this could have been expressed as (X~y(~u) = x ^ X~y(~u)
= x0) ) (x = x0), but this formula is not in L+ (since L+ does not include
expressions such as x0 = x). C2 states that there is some value x 2 R(X)
which is the value of X in all solutions to the equations in T~x(~u). C2
is not valid in T , but it is valid in Tuniq. Note that in stating C2, I
am making use of the fact that R(X) is finite (otherwise C2 would involve
an infinite disjunction, and would no longer be a formula in Luniq). In fact,
it can be shown that if we allow signatures where the sets R(X) are infinite,
we include C2 only for those random variables X such that R(X) is finite.4
C3-C5 were introduced by Galles and Pearl (1997, 1998), as were their names.
Roughly speaking, C3 says that if the value of W is w in all solutions to
the equations T~x(~u), then all solutions to the equations in T~xw(~u) are
the same as the solutions to the equations in T~x(~u). C3 is valid in T as
well as Tuniq. As we shall see, a variant of C3 (obtained by replacing "all"
by "some") is also valid in T . C4 simply says that in all solutions obtained
after setting X to x, the value of X is x. C5 is perhaps the least obvious
of these axioms; the proof of its soundness is not at all straightforward.
It says that if setting ~X to ~x and W to w results in Y having value y and
setting ~X to ~x and Y to y results in W having value w, then Y must already
have value when we set ~X to x (and W must already have value w).
Finally, it is easy to see that C6 holds in recursive models. For if Y ;
Z, then Y must precede Z in the causal ordering. Thus, if X0 ; X1 ^ : : :
^ XkGamma 1 ; Xk, then X0 must precede Xk in the causal ordering, so Xk
cannot affect X0. Thus, :(Xk ; X0) holds. As we shall see, in a precise sense,
C6 characterizes recursive models.
C6 can be viewed as a collection of axioms (actually, axiom schemes), one
for each k. The case k = 1 already gives us :(Y ; Z) . :(Z ; Y ) for all
variables Y and Z. That
3. In an earlier draft of this paper, where C1 and C2 were introduced, C1
was called "uniqueness". Galles
and Pearl (1998) then adopted this name as well. In retrospect, this axiom
really does not say anything about uniqueness. The axiom which does is D10,
which will be discussed later. 4. The assumption that R(X) and V are finite
is also necessary for the abbreviation X ; Y used in C6 to
be in Luniq; however, we can replace C6 by the axiom scheme
:(^kGamma 1i=0 (Xi+1)~yixi (~ui) = zi ^ (Xi+1)~yi = z0i) ^ (X0)~ykxk (~uk)
= zk ^ (X0)~yk = z0k); where xi 2 R(Xi) for i = 1; : : : ; k. That is, we
essentially replace C6 by all its instances. This axiom is equivalent to
C6 (although not as transparent) and can be expressed even if jVj is infinite
or jR(X)j is infinite for some X 2 V.
323
Halpern is, it tells us that, for any pair of variables, at most one affects
the other. However, just restricting C6 to the case of k = 1 does not suffice
to characterize Trec, as the following example shows.
Example 3.1: Let S = (;; fX0; X1; X2g; R), where R(X0) = R(X1) = R(X2) =
f0; 1; 2g, and let T = (S; F ), where FXi is characterized by the equation
Xi = ( 2 if XiPhi 1 = 10 otherwise and Phi is addition mod 3. It is easy
to see that T 2 Tuniq: If any of the variables are set, the equations completely
determine the values of all the other variables. On the other hand, if none
of the variables are set, it is easy to see that (0; 0; 0) is the only solution
that satisfies all the equations. Moreover, in T ~X~x, the variable Xi is
0 unless it is set to a value other than 0 or XiPhi 1 is set to 1. It easily
follows that Xi is affected only by XiPhi 1. A straightforward verification
(or an appeal to Theorem 3.2 below) shows that T satisfies all the axioms
other than C6. C6 does not hold in T , since T j= X0 ; X1 ^ X1 ; X2 ^ X2
; X0. This also shows that T is not recursive. However, the restricted version
of C6 (where k = 1) does hold in T . A generalization of this example (with
k random variables rather than just 2) can be used to show that we cannot
bound k at all in C6; we need C6 to hold for all finite values of k.
Let AXuniq(S) consist of C0-C5 and MP; let AXrec(S) consist of C0-C4, C6,
and MP. We could include C5 in AXrec(S); I did not do so because, as Galles
and Pearl (1998) point out, it follows from C3 and C6. Note that the signature
S is a parameter of the axiom system, just as it is for the language and
the set of models. This is because, for example, the set R(X) (which is determined
by S) appears explicitly in C1 and C2.
Theorem 3.2: AXuniq(S) (resp., AXrec(S)) is a sound and complete axiomatization
for Luniq(S) with respect to Tuniq(S) (resp., Trec(S)).
Proof: See the appendix.
As I said in the introduction, Galles and Pearl (1998) prove a similar completeness
result for causal models whose variables satisfy a fixed causal ordering.
Given a total ordering OE on the variables in V, consider the following axiom:
Ord. Y~xw(~u) = Y~x(~u) if Y OE W Since ~x, w, and ~u are implicitly universally
quantified in Ord, this axiom says that :(W ; Y ) holds if Y OE W . It follows
that if W ; Y , then W OE Y . From this and the fact that OE is a total order,
it is easy to see that Ord implies C6.
Galles and Pearl show that C1-C4 and Ord is a sound and complete axiomatization
with respect to the class of causal models satisfying Ord for LGP. More precisely,
Galles and Pearl take AC to consist of the axioms C1-C4 and Ord (but not
C0 or MP), and show, in their notation, that S j= oe implies S `AC oe, where
S [ foeg is a set of formulas in LGP. There is an important subtle point
worth stressing about their result: C1 and C2, which
324
Axiomatizing Causal Structures are axioms in AC , are not expressible in
LGP (since their statement involves disjunction and negation).
So what exactly is Galles and Pearl's result saying? They interpret S j=
oe, as usual, as meaning that in all causal models satisfying S, oe is true.5
They interpret S `AC oe as meaning that oe is provable from S and the axioms
in the axioms of AC "together with the rules of logic", which presumably
means C0 and MP. It follows easily from Theorem 3.2 that their result is
correct (see below), but it is unlike typical soundness and completeness
proofs, since the proof of oe from S will in general involve formulas in
Luniq that are not in LGP. (In particular, this will happen whenever C1-C3
are used in the proof.)
To see that Galles and Pearl's result follows from Theorem 3.2, define SLambda
to be the formula in Luniq(S) which is the conjunction of the formulas in
S (there can only be finitely many, since LGP(S) itself has only finitely
many distinct formulas), together with the conjunction of all the instances
of the axiom Ord (again, there are only finitely many). Note that S j= oe
holds iff Tuniq(S) j= SLambda ) oe (since the formulas in Ord guarantee
that the only causal models that satisfy SLambda are recursive, and hence
are in Tuniq(S)). Thus, by Theorem 3.2, S j= oe iff AXuniq(S) ` SLambda
) oe. The latter statement is equivalent to S `AC oe, as defined by Galles
and Pearl. In fact, Theorem 3.2 shows that AXuniq(S) + Ord gives a sound
and complete axiomatization with respect to causal models satisfying Ord
for the language Luniq(S), which allows Boolean connectives. (Of course,
Theorem 3.2 shows more, since it extends Galles and Pearl's result to Trec(S)
and Tuniq(S).) This suggests that Luniq is a more appropriate language for
reasoning about causality than LGP, at least for causal models in Tuniq.
LGP cannot express a number of properties of causal reasoning of interest
(for example, the ones captured by axioms C1-C3). When we use Luniq, not
only is every formula in Luniq valid in Tuniq provable from the axioms in
AXuniq, but the proof involves only formulas in Luniq.
What about T ? I have not been able to find a complete axiomatization for
the language Luniq with respect to T . However, I do not think that finding
a complete axiomatization for Luniq with respect to T is of great interest,
because Luniq is simply not a language appropriate for reasoning about causality
in T . Because there is not necessarily a unique solution to the equations
that arise in a causal model T 2 T , it is useful to be able to say both
that there exists a solution with certain properties and that all solutions
have certain properties. This is precisely what the language L+ lets us do.6
As I now show, there is in fact an elegant sound and complete axiomatization
for L+ with respect to T .
Consider the following axioms:
D0. All instances of propositional tautologies. D1. [~Y ~y](X(~u) = x )
X(~u) 6= x0) if x; x0 2 R(X), x 6= x0 (functionality) D2. [~Y ~y](.x2R(X)X(~u)
= x) (definiteness)
D3. h ~X ~xi(W (~u) = w ^ ~Y (~u) = ~y) ) h ~X ~x; W wi(~Y (~u) = ~y)
(composition) 5. Although they do not say this explicitly, it is clear that
they intend to further restrict to casual models
satisfying S and Ord, for the fixed order OE. Without this restriction, their
result is not true. 6. Note that L+ allows us to say that there is a unique
solution for a random variable X after setting some
other variables. For example, h~Y ~yitrue(~u) ^ [~Y ~y](X(~u) = x) says
that there are solutions to the equations when ~Y is set to ~y and U is set
to ~u and, in all of them, X is uniquely determined to be x.
325
Halpern D4. [ ~W ~w; X x](X(~u) = x) (effectiveness) D5. (h ~X ~x; Y
yi(W (~u) = w ^ ~Z(~u) = ~z) ^ h ~X ~x; W wi(Y (~u) = y ^ ~Z(~u) = ~z))
) h ~X ~xiW (~u) = w ^ Y (~u) = y ^ ~Z(~u) = ~z)), where ~Z = V Gamma
( ~X [ fW; Y g)
(reversibility)
D6. (X0 ; X1 ^ : : : ^ XkGamma 1 ; Xk) ) :(Xk ; X0) (recursiveness) D7.
([ ~X ~x]' ^ [ ~X ~x](' ) )) ) [ ~X ~x] (distribution) D8. [ ~X ~x]'
if ' is a propositional tautology (generalization) D9. h~Y ~yitrue(~u) ^
.x2R(X)[~Y ~y](X(~u) = x) if Y = V Gamma fXg
(unique solutions for V Gamma fXg)
D10. h~Y ~yitrue(~u) ^ .x2R(X)[~Y ~y](X(~u) = x) (unique solutions)
D11. h~Y ~yi('1(~u1) ^ : : : ^ 'k(~uk)) , (h~Y ~yi'1(~u1) ^ : : : ^ h~Y
~yi'k(~uk), if 'i(~ui)
is a Boolean combination of formulas of the form X(~ui) = x and ~ui 6= ~uj
for i 6= j (separability)
D1-D6 are the analogues of C1-C6 in L+. D4 and D6 are just C4 and C6, with
no changes at all. The other axioms are not quite the same though. For example,
C1 is actually [~Y ~y](X(~u) = x) ) :[~Y ~y](X(~u) = x0) if x 6= x0. By
Lemma 2.2, this is equivalent to D1 in Tuniq; however, the two formulas are
not equivalent in general. Similarly, C2 is .x2R(X)[~Y ~y](X(~u) = x), which
is closer to D10 than D2 (since the disjunction is outside the scope of the
[~Y ~y]). Again, D10 and D2 are equivalent in Tuniq (both are equivalent
to C2 in this case) but, in general, D10 is stronger than D2. Only D2 and
D9, both weaker than D10, hold in T . The exact analogue of C3 would use
[ ] instead of h i and say Y (~u) = y instead of ~Y (~u) = ~y. For completeness,
it is necessary to have a vector of variables here. Using [ ] instead of
h i also results in a valid formula (and would not require a vector ~Y ).
While the two variants are equivalent in Tuniq, they are different in general,
and the one given here is the more useful. (More precisely, with it we get
completeness, while the version with [ ] does not suffice for completeness.)
Similarly, in D5, we use h i instead of [ ], and add the extra clause ~Z(~u)
= ~z. Both turn out to be necessary for soundness. In some sense, we can
think of D1-D6 as capturing the "true content" of C1-C6, once we drop the
assumption that the structural equations have a unique solution. D7 and D8
are standard properties of modal operators. D10 is what we need to capture
the fact that the structural equations have unique solutions. D11 essentially
says that the solutions to the equations that arise when the exogenous variables
are set to ~u are independent of the solutions that arise when the exogenous
variables are set to ~u0 6= ~u.
Let AX+ consist of D0-D5, D7-D9, D11, and MP (modus ponens); let AX+uniq
be the
result of adding D10 to AX+; let AX+rec be the result of adding D6 to AX+uniq.
Theorem 3.3: AX+(S) (resp., AX+uniq(S), AX+rec(S)) is a sound and complete
axiomatization for L+(S) with respect to T (S) (resp., Tuniq(S), Trec(S)).
Proof: See the appendix.
326
Axiomatizing Causal Structures 4. Decision Procedures In this section I consider
the complexity of deciding if a formula is satisfiable (or valid). This,
of course, depends on the language (L+, Luniq, or LGP) and the class of models
(Trec, Tuniq, T ) we consider. It also depends on how we formulate the problem.
One version of the problem is to consider a fixed signature S = (U ; V; R),
and ask how hard it is to decide if a formula ' 2 L+(S) (resp., Luniq(S),
LGP(S)) is satisfiable in Trec(S) (resp., Tuniq(S), T (S)). If S is finite
(that is, if V and U are finite and R(Y ) is finite for each Y 2 U 2 V),
this turns out to be quite easy, for trivial reasons.
Theorem 4.1: If S is a fixed finite signature, the problem of deciding if
a formula ' 2 L+(S) (resp., Luniq(S), LGP(S)) is satisfiable in Trec(S) (resp.,
Tuniq(S), T (S)) can be solved in time linear in j'j (the length of ' viewed
as a string of symbols).
Proof: If S is finite, there are only finitely many causal models in T (S),
independent of '. Given ', we can explicitly check if ' is satisfied in any
(or all) of them. This can be done in time linear in j'j. Since S is not
a parameter to the problem, the huge number of possible causal models that
we have to check affects only the constant.
We can do even better than Theorem 4.1 suggests if S is a fixed finite signature.
Suppose that V consists of 100 variables and ' mentions only 3 of them. A
causal model must specify the equations for all 100 variables. Is it really
necessary to consider what happens to the 97 variables not mentioned in '
to decide if ' is satisfiable or valid? As the following result shows, if
we restrict to models in Tuniq, then we need to check only the variables
that appear in S. Given a signature S = (U ; V; R), let S' = (fU Lambda
g; V'; R'), where V' consists of the variables in V that appear in ', U Lambda
is a fresh exogenous variable, not mentioned in V or U , R'(X) = R(X) for
X 2 V', and R'(U Lambda ) consists of all those tuples in Theta U2U R(U
) that are mentioned in '.
Theorem 4.2: A formula ' 2 L+(S) is satisfiable in Trec(S) (resp., Tuniq(S))
iff it is satisfiable in Trec(S') (resp., Tuniq(S')).
Proof: See the appendix.
The analogue to Theorem 4.2 does not hold for T . For example, suppose that
S = (;; fX; Y; Zg; R), where R(X) = R(Y ) = R(Z) = f0; 1g, and ' is the formula
hX 0i(Y = 0) ^ hX 0i(Y = 1). It is easy to see that there is a causal model
in T (S) satisfying '. For example, if T = (S; F), where FX (y; z) = y Phi
z, FY (x; z) = x Phi z and FZ(x; y) = x Phi y, and Phi represents addition
mod 2, then it is easy to check that T j= '. On the other hand, there is
no causal model T 0 2 T (S') such that T 0 j= '. For suppose T 0 j= ' and
T 0 = (S'; F 0). Since T 0 j= hX 0i(Y = 0), we must have F 0Y (0) = 0; since
T 0 j= hX 0i(Y = 1), we must have F 0Y (0) = 1. But we cannot have both
F 0Y (0) = 0 and F 0Y (1) = 1, since F 0Y is a function.
There is a variant of Theorem 4.2 that does hold for T that does give us
a bound on the number of variables we need to consider. Given a signature
S = (U ; V; R), define jjSjj = Theta X2V jR(X)j (where we take jjSjj = 1
if either V is infinite or jR(X)j = 1 for some X 2 V). If jjSjj ? jjS'jj2
+ jjS'jj, let S+' = (fU Lambda g; V+' ; R+' ), where V+' is V' as
327
Halpern defined above together with one fresh endogenous variable XLambda
, R+' (XLambda ) = Theta X2V' R(X), and R+' (U Lambda ) = R'(U Lambda
). If jjSjj ^ jjS'jj2 + jjS'jj, let S+' = (fU Lambda g; V; R0), where R0(X)
= R(X) for X 2 V and R0(U Lambda ) = R'(U Lambda ).
Theorem 4.3: A formula ' 2 L+(S) is satisfiable in T (S) iff ' is satisfiable
in T (S+' ). Proof: See the appendix.
Note that if jjSjj ^ jjS'jj2 + jjS'jj, then, since we have assumed (without
loss of generality) that jR(X)j * 2 for each variable X, it must be the case
that there are at most 2 log2(jjS'jj) + 1 variables in signature S.
Since Theorems 4.2 and 4.3 apply to all formulas in L+(S), they apply a fortiori
to formulas in Luniq(S) and LGP(S). Although stated only in terms of satisfiability,
it is immediate that they also hold for validity. Thus, they tell us that,
without loss of generality, when considering satisfiability or validity,
we need to consider only finitely many variables (essentially, the ones that
appear in ', and perhaps a few more). In this sense, we can restrict to signatures
with only finitely many variables without loss of generality. Note that these
results do not tell us that we can restrict to finite sets of values for
these variables without loss of generality.
Returning to the complexity of the decision problem, note that Theorem 4.1
is the analogue of the observation that for propositional logic, the satisfiability
problem is in linear time if we restrict to a fixed set of primitive propositions.
The proof that the satisfiability problem for propositional logic is NP-complete
implicitly assumes that we have an unbounded number of primitive propositions
at our disposal.
There are two ways to get an analogous result here. The first is to allow
the signature S to be infinite and the second is to make the signature part
of the input to the problem. The results in both cases are similar, so I
just consider the case where the signature is part of the input here.
Theorem 4.4: Given as input a pair ('; S), where ' 2 L+(S) (resp., Luniq(S))
and S is a finite signature, the problem of deciding if ' is satisfiable
in Trec(S) (resp., Tuniq(S), T (S)) is NP-complete (resp., NP-hard) in j'j;
if ' 2 LGP(S), then the problem of deciding if ' is satisfiable in Trec(S)
(resp., Tuniq(S)) is NP-complete (resp., NP-hard).
Proof: See the appendix.
I believe that the problem of deciding if a formula ' in Luniq(S) or L+(S)
is satisfiable in Tuniq(S) and T (S) is NP-complete, as is the case of deciding
if ' 2 LGP(S) is satisfiable in Tuniq(S). However, I have not been able to
show this. What about the satisfiability problem for formulas in LGP in T
(S)? This may well be in constant time! Indeed, if S is an infinite signature
(that is, if S = (U ; V; R) and jVj = 1), then it is provably in constant
time. The point is that a formula in LGP(S) is trivially satisfiable in a
structure T 2 LGP(S) where for all settings ~X ~x, the equations in T ~X~x
have no solutions, and there always is such model structure if S has infinitely
many variables. If S has only finitely many variables, we do not have such
trivial models, but it may still be possible to show that a "trivial enough"
model exists that satisfies the formula. This just emphasizes that LGP(S)
is simply too weak a language to reason about models in T (S).
328
Axiomatizing Causal Structures 5. Conclusion I have provided complete axiomatizations
and decision procedures for propositional languages for reasoning about causality.
I have tried to stress the important role of the choice of language (and
the signature) in both the axiomatizations and, more generally, in the reasoning
process.
Both the models and the languages considered here are somewhat limited. For
example, a more general approach to modeling causality would allow there
to be more than one value of X once we have set all the other variables.
This would be appropriate if we model things at a somewhat coarser level
of granularity, where the values of all the variables other than X do not
suffice to completely determine the value of X. I believe the results of
this paper can be extended in a straightforward way to deal with this generalization,
although I have not checked the details. For general causal reasoning, I
believe we need a richer language, which includes some first-order features.
I hope to return to the issue of finding appropriate richer languages for
causal reasoning in future work.
Acknowledgments I'd like to thank Judea Pearl for his comments on a previous
version of this paper, as well as his generous help in providing pointers
to the literature. This work was supported in part by NSF under grant IRI-96-25901
and by the Air Force Office of Scientific Research under grant F49620-96-1-0323.
A preliminary version of this paper appears in Proc. Fourteenth Conference
on Uncertainty in AI, pp. 202-210, 1998.
Appendix A. Proofs Theorem 3.2: AXuniq (resp., AXrec) is a sound and complete
axiomatization for Luniq(S) with respect to Tuniq(S) (resp., Trec(S)).
Proof: Soundness is proved by Galles and Pearl. To make the paper self-contained,
I reprove the only non-obvious case--the validity of C5 in Tuniq.
Let T 2 Tuniq and suppose that T j= Y~xw(~u) = y ^ W~xy(~u) = w. We want
to show that T j= Y~x(~u) = y. Since we are in Tuniq, there is a unique vector
~v1 that satisfies the equations in T~xw(~u) and a unique vector ~v2 that
satisfies the equations in T~xy(~u). I claim
that ~v1 = ~v2. By assumption, the ~X, Y , and W components of these vectors
are the same (~x, y, and w, respectively). Now consider the T~xyw(~u). I
claim that ~v1 and ~v2 are both solutions to the equations in that causal
theory. Note that for any variable Z other than those in ~X [ fW; Y g, the
equation F ~xw;~uZ for Z in T~xw(~u) is the same as the equations F ~xy;~uZ
and F ~xyw;~uZ for Z in T~xy(~u) and T~xyw(~u), respectively, except that
in the first case, w has been plugged in as the value of W , in the second
case y has been plugged in as the value of Y , and in the third case, both
w and y have been plugged in. However, since w and y are the values of W
and Y , respectively, in both ~v1 and ~v2, and since these vectors satisfy
both equation F ~xwZ and F ~xyZ , they must also satisfy F ~xwyZ . Since
the equations for T~xyw(~u) have a unique solution, we have that ~v1 = ~v2,
as desired.
329
Halpern Next, I claim that ~v1 satisfies the equations in T~x(~u). Again,
as above, it is clear that it satisfies the equation for Z =2 ~X [ fW; Y
g. A similar argument shows that it satisfies the equation for Y in T~x(~u),
since ~v1 satisfies the equation for Y in T~xw(~u). Finally, a similar argument
shows that it satisfies the equation for W in T~x(~u), since ~v2 = ~v1 satisfies
the equation for W in T~xy(~u). Since the Y component of ~v1 is y, it follows
that Y~x(~u) = y.
So much for soundness. For completeness, as usual, it suffices to prove that
if a formula in Luniq is consistent with AXuniq (resp., AXrec), then it is
satisfied in a causal model in Tuniq (resp., Trec). (Here's the argument:
We want to show that every valid formula is provable. Suppose that we have
shown that every consistent formula is satisfiable and that ' is valid. If
' is not provable, then :' is consistent. By assumption, this means that
:' is satisfiable, contradicting the assumption that ' is valid.)
I now give the argument in the case of AXuniq. Suppose that a formula ' 2
Luniq(S), with S = (U ; V; V ), is consistent with AXuniq. Consider a maximal
consistent set C of formulas that includes '. (A maximal consistent set is
a set of formulas whose conjunction is consistent such that any larger set
of formulas would be inconsistent.) It follows easily from standard propositional
reasoning (i.e., using C0 and MP only) that such a maximal consistent set
exists. Moreover, from C1 and C2, it follows that for each random variable
X 2 V and vector ~y of values, there exists exactly one element x 2 R(X)
such that X~y = x 2 C. I now construct a causal model T = (S; F ) 2 Tuniq(S)
that satisfies every formula in C (and, in particular, satisfies ').
A term X~Y ~y(~u) is complete (for X) if ~Y consists of all the variables
in V Gamma fXg. Thus, X~Y ~y(~u) is a complete term if every random variable
other than X is determined. We use the complete terms to define the structural
equations. For each variable in X 2 V, define FX (~u; ~y) = x if X~y(~u)
= x, where X~y(~u) is a complete term. This gives us a causal model T . Now
we have to show that this model is in Tuniq and that all the formulas in
C are satisfied by T .
I show that X~Y ~y(~u) = x is in C iff T j= X~Y ~y(~u) = x by induction on
jVj Gamma j~Y j. The
case where jVj Gamma j~Y j = 0 follows immediately from C4, since then
X is in ~Y . If jVj Gamma j~Y j 6= 0, we can assume without loss of generality
that X is not in ~Y , for otherwise the result again follows from C4. Given
this assumption, if jVj Gamma j~Y j = 1, the result follows by definition
of the equations FX .
For the general case, suppose that jVj Gamma j~Y j = k ? 1. We want to
show that there is a unique solution to the equations in T~Y ~y(~u) and that,
in this solution, X has value x. To see that there is a solution, we define
a vector ~v and show that it is in fact a solution. If W 2 ~Y and W w is
the assignment to W in ~Y ~y, then we set the W component of ~v to w. If
W is not in ~Y , then set the W component of ~v to the unique value wLambda
such that W~Y ~y(~u) = wLambda is in C. (By C1 and C2 there is such a unique
value w.) I claim that ~v is a solution to the equations in T~Y ~y(~u).
To see this, let W be a variable in V not in ~Y . Let ~Y 0 = ~Y W . By C3
and C4, for every variable Z 2 V Gamma ~Y 0, we have Z~ywLambda (~u) =
zLambda . Since jVj Gamma j~Y 0j = k Gamma 1, by the inductive
hypothesis, ~v is in fact the unique solution for T~ywLambda (~u). For every
variable Z in V Gamma ~Y 0, the equation F ~yw
Lambda ;~u
Z for Z in T~ywLambda (~u) is the same as the equation F
~y;~u Z for Z in T~y(~u), except
that W is set to wLambda . Thus, every equation in T~y(~u) except possibly
the equation F ~y;~uW is
satisfied by ~v. To see that F ~y;~uW is also satisfied by ~v, simply repeat
this argument above
330
Axiomatizing Causal Structures starting with another variable W 0 in V Gamma
~Y . (Such a variable must exist since jVj Gamma j~Y j was assumed to be
at least 2.)
It remains to show that ~v is the unique solution to the equations in T~y(~u).
Suppose there were another solution, say ~v0, to the equations. Suppose that
for each variable W in V Gamma ~Y , the W component of ~v0 is wLambda
Lambda . For some variable Z, we must have zLambda Lambda 6= zLambda
. Since Z~y(~u) = zLambda , by assumption, it follows from C1 that Z~y(~u)
6= zLambda Lambda is in C (since C is
a maximal consistent set). It is also easy to see that for each W in V Gamma
~Y , the vector ~v0 is also a solution to the equations in T~ywLambda Lambda
(~u). Let W be a variable other than Z in V Gamma ~Y . By the induction
hypothesis, it follows that W~yzLambda Lambda (~u) = wLambda Lambda
and Z~ywLambda Lambda (~u) = zLambda Lambda are both in C. By C5 (reversibility),
Z~y(~u) = zLambda Lambda is in C. But this contradicts the consistency
of C.
This completes the proof in the case of Tuniq(S). Essentially the same proof
works for Trec. We just need to observe that C6 guarantees that the theory
we construct can be taken to be recursive. To see this, given a formula '
consistent with Trec, consider a maximal set C of formulas consistent with
Trec that contains '. Let TC be the causal model determined by C, as above.
The set C also determines a relation OE on the exogenous variables: define
Y OE Z if Y ; Z 2 C. It easily follows from C6 that the transitive closure
OELambda of OE is a partial order: if X OELambda Y and Y OELambda X,
then X = Y . Any total order on the variables consistent OELambda gives
an ordering for which TC is recursive.
Theorem 3.3: AX+ (resp., AX+uniq, AX+rec) is a sound and complete axiomatization
for L+(S) with respect to T (S) (resp., Tuniq(S), Trec(S)).
Proof: Soundness proceeds much as that of Theorem 3.2; I leave details to
the reader. For completeness, we again proceed much as in the proof of Theorem
3.2. Because the proofs are so similar in spirit, I just sketch the proof
for AX+; the modifications for AX+uniq and AX+rec are left to the reader.
Again, given a formula ' consist with AX+, we consider a maximal consistent
set of formulas containing ' that is consistent with AX+, and use it to construct
a causal model T . Note that D9 suffices for this, because in defining FX
(~u; ~y), we needed to know only the unique x such that [~Y ~y](X(~u) =
x) for ~Y = V Gamma X, and D9 (together with D1) assures us that there
is a unique such x. Again, we want to show that all the formulas in C are
satisfied by T .
To do this, it clearly suffices to show that for every formula of the form
h~Y ~y', we have in C iff T j= . We can reduce to considering even simpler
formulas, namely, ones where ' has the form ~X(~u) = ~x, by applying some
of the axioms. To see this, first observe that standard arguments of modal
logic (using D0, D7, D8, and MP) show that h~Y ~yi('1 . '2) is provably
equivalent to h~Y ~yi'1 . h~Y ~yi'2. That means we can assume without loss
of generality that ' is a conjunction of formulas of the form X(~u) x and
their negations. From D2 it follows that h~Y ~yi(' ^ X(~u) 6= x) is equivalent
to h~Y ~yi(' ^ (.x02R(X)Gamma fxgX(~u) = x0). Thus, we can assume without
loss of generality that ' has no negations. By applying D11, we can assume
without loss of generality that the same setting ~u of the exogenous variables
is used in all the conjuncts. Thus, it suffices to show that h~Y ~yi( ~X(~u)
= ~x) 2 C iff T j= h~Y ~yi( ~X (~u) = ~x) for ~X = V Gamma ~Y .
To do this, we proceed by induction on jVj Gamma j~Y j again. The base
case is dealt with using D4, as before. So assume that k * 1 and jVj Gamma
j~Y j = k + 1. Suppose that h~Y ~yi( ~X(~u) = ~x) 2 C. Let X1; X2 2 ~X.
Suppose that X1 x1 and X2 x2 are the assignments to
331
Halpern X1 and X2 in ~X ~x. Let ~X0 ~x0 and ~X00 ~x00 be the result of
removing X1 x1 and X2 x2, respectively, from ~X ~x. By D3, both h~Y ~y;
X1 x1i( ~X00(~u) = ~x00) and h~Y ~y; X2 x2i( ~X0(~u) = ~x0) are in C.
By the induction hypothesis, both of these formulas are true in T . By the
soundness of D5, it follows that T j= h~Y ~yi( ~X(~u) = ~x0), as desired.
Conversely, suppose that T j= h~Y ~yi( ~X(~u) = ~x0). Then, since D3 is
sound, we have that T j= h~Y ~y; X1 x1i( ~X00(~u) = ~x00) and T j= h~Y
~y; X2 x2i( ~X0(~u) = ~x0). By the induction hypothesis, we have that both
h~Y ~y; X1 x1i( ~X00(~u) = ~x00) and h~Y ~y; X2 x2i( ~X0(~u) = ~x0) are
in C. We now apply D5 to complete the proof.
Theorem 4.2: A formula ' 2 L+(S) is satisfiable in Trec(S) (resp., Tuniq(S))
iff it is satisfiable in Trec(S') (resp., Tuniq(S')).
Proof: Clearly, if a formula is satisfiable in Trec(S') (resp., Tuniq(S')),
then it is satisfiable in Trec(S) (resp., Tuniq(S)). We can easily convert
a causal model T = (S'; F) 2 Trec(S') satisfying ' to a causal model T 0
= (S; F 0) 2 Trec(S) satisfying ' by simply defining F 0X to be a constant,
independent of its arguments, for X 2 V Gamma V'; if X 2 V', define F 0X
(~u; ~x; ~y) = FX (~u; ~x), where ~u 2 R(U Lambda ), ~x 2 Theta Y 2V'Gamma
fXgR(Y ) and ~y 2 Theta Y 2VGamma V'R(Y ); if ~u =2 R(U Lambda ), define
F 0X (~u; ~x; ~y) to be an arbitrary constant. An identical transformation
works for T 2 Tuniq(S').
For the converse, suppose that ' is satisfiable in a causal model T = (S;
F ) 2 Trec(S). Thus, there is an ordering OE on the variables in V such that
if X OE Y , then FX is independent of the value of Y . This means we can
view FX as a function of the exogenous variables in U and the variables Y
2 V such that Y OE X. Let Pre(X) = fY 2 V : Y OE Xg. For convenience, I allow
FX to take as arguments the values of only the variables in U [ Pre(X), rather
than requiring its arguments to include the values of all the variables in
U [ V Gamma fXg. Now define functions F 0X : (Theta U2U R(U )) Theta
(Theta Y 2V'Gamma fXgR(Y )) ! R(X) for all X 2 V by induction on OE (that
is, start with the OE-minimal element, whose value is independent of that
of all the other variables, and work up the OE chains). Suppose X 2 V' and
~x is a vector of values for the variables in V' Gamma fXg. If X is OE-minimal,
then define F 0X (~u; ~x) = FX (~u). In general, define F 0X (~u; ~x) = FX
(~u; ~z), where ~z is a vector of values for the variables in Pre(X) defined
as follows. If Y 2 Pre(X) " V', then the value of the Y component in ~z is
the value of the Y component in ~y; if Y 2 Pre(X) Gamma V', then the value
of the Y component in ~z is F 0Y (~u; ~x). (By the induction hypothesis,
F 0Y (~u; ~x) has already been defined.) Now define a causal model T 0 =
(S'; F 0). It is easy to check that T 0 2 Trec(S') (the ordering of the variables
is just OE restricted to V'). Moreover, the construction guarantees that
if
~X ` V', then the solutions to the equations T 0~
X~x(~u) and T ~X~x(~u) are the same, whenrestricted to the variables in V
'. It follows that T 0 satisfies '.
The argument in the case that T 2 Tuniq(S) is similar in spirit. For X 2
V', ~u 2 (Theta U2U R(U )), and ~x 2 (Theta Y 2V'Gamma fXgR(Y )), define
F 0X (~u; ~x) to be the value of X in the unique solution to the equations
in TV'Gamma fXg~x(~u).7 It is again straightforward to check that now T
0 = (S'; F 0) 2 Tuniq(S') and satisfies '.
7. This definition is easily seen to agree with the earlier definition of
F 0X if T 2 Trec.
332
Axiomatizing Causal Structures Theorem 4.3: A formula ' 2 L+(S) is satisfiable
in T (S) iff ' is satisfiable in T (S+' ). Proof: If jjSjj ^ jjS'jj2+jjS'jj
then the proof is immediate, so suppose that jjSjj ? jjS'jj2 + jjS'jj and
' is satisfied in a causal model T = (S; F) 2 T (S). Before going on with
the proof, it is useful to define some notation. Let V = fX1; : : : ; Xmg,
where V' = fX1; : : : ; Xkg and V Gamma V' = fXk+1; : : : ; Xmg. Given
a vector ~x 2 R(XLambda ) = Theta X2V'R(X) and Xi 2 V', let ~xGamma i
denote the vector excluding the value for Xi. For each Xi 2 V', choose two
values xi0 and xi1 in R(Xi). Define T 0 = (S'; F 0) by defining F 0X (~u;
~xGamma i; ~yGamma i; yi) = x, where
ffl x = yi if ~xGamma i = ~yGamma i and X = yi in some solution to the
equations in TV'Gamma fXig~xGamma i(~u); ffl x = xi0 if yi 6= xi0 and either
~xGamma i 6= ~yGamma i or there is no solution to the equations in
TV'Gamma fXg~xGamma i(~u) in which X = yi;
ffl x = xi1 otherwise. Finally, define FXLambda (~u; ~x) = ~x.
I now show that the construction again guarantees that if ~X ` V', then the
solutions to the equations T 0~X~x(~u) and T ~X~x(~u) are the same, when
restricted to the variables in V'. First suppose that (~y; ~z) is a solution
to the equations in T ~X~x(~u), where ~y 2 R(XLambda )
and ~z 2 Theta Y 2VGamma V'R(Y ). It must be the case that ~x and ~y agree
on the variables in ~X, so (~y; ~z) is also a solution of the equations in
TV'Gamma fXig~yGamma i (~u) if Xi 2 V' Gamma ~X. Thus, F 0Xi (~u; ~yGamma
i; ~y) = yi. It follows that (~y; ~y) is a solution to the equations in T
0~X~x(~u).
Conversely, suppose that (~y; ~y0) is a solution to the equations in T 0~X~x(~u).
Then the
definition of FXLambda guarantees that ~y = ~y0. Moreover, since ~x and
~y agree on the variables in ~X, (~y; ~y) must also be a solution to the
equations in T 0V'Gamma fX1g~yGamma 1(~u). Thus, F 0X1 (~u; ~yGamma 1;
~y) = y1, which means that there must be some vector ~z of values for the
variables in V Gamma V' such that (~y; ~z) is a solution to the equations
in TV'Gamma fX1g~yGamma 1 (~u). But then it is easy to check that (~y;
~z) must in fact be a solution to the equations in TV'Gamma fXig~yGamma
i(~u) for all i = 1; : : : ; k. It follows that (~y; ~z) is a solution to
the equations in T ~X~x(~u), as desired. This suffices to prove this direction
of the theorem.
Now suppose that ' is satisfied in a causal model T = (S+' ; F ) 2 T (S+'
). Since jjSjj ? jjS'jj2 + jjS'jj, there must be an injective function f
: R(XLambda ) ! Theta Y 2VGamma V'R(Y ) and two distinct vectors ~y0 =
(y01; : : : ; y0k); ~y1 = (y11; : : : ; y1k) that are not in the range of
f . Choose two distinct vectors ~x0 = (x10; : : : ; xk0); ~x1 = (x11; : :
: ; xk1) 2 R(XLambda ). Define T 0 = (S; F 0) 2 T (S) as follows. If Xi
2 V', ~xGamma i 2 Theta Y 2V'Gamma fXigR(Y ), ~z 2 R(XLambda ), and ~y
Theta Y 2VGamma V' R(Y ), let
F 0Xi (~xGamma i; ~y) = 8?!?:
FXi (~xGamma i; ~z) if f (~z) = ~y, x0i if ~y is not in the range of f ,
~y 6= ~y1, x1i otherwise.
If Xj 2 V Gamma V', ~x 2 R(XLambda ) and ~yGamma j 2 Theta Y 2VGamma
V'Gamma fXjgR(Y ), then let
F 0Xj (~x; ~yGamma j) = 8?!?:
y if f (FXLambda (~x)) = (~yGamma j; y), y0j if f (FXLambda (~x)) 6=
(~yGamma j; y0) for all y0 2 R(Xj), ~x 6= ~x0, y1j otherwise.
333
Halpern Again, I show that the construction guarantees that if ~X ` V', then
the solutions to the equations T 0~X~x(~u) and T ~X~x(~u) are the same, when
restricted to the variables in V'. First suppose that (~y; ~z) is a solution
to the equations in T ~X~x(~u), where ~y; ~z 2 R(XLambda ). It is easy to
check that (~y; f (~z)) is a solution to the equations in T 0~X~x(~u). Conversely,
suppose that (~y; ~z) is a solution to the equations in T 0~X~x(~u), where
~y 2 R(XLambda ) and ~z 2 Theta Y 2VGamma V'R(Y ). I claim that we must
have ~z = f (FXLambda (~y)). If, in fact, this is the case, then it is easy
to check that (~y; FXLambda (~y) is a solution to the equations in T ~X~x(~u).
On the other hand, if ~z 6= f (FXLambda (~y)), then the definition of F
0Xj for Xj 2 V Gamma V' guarantees that ~z = ~y0 unless
~y = ~x0; if ~y = ~x0, then ~z = ~y1. But the definition of FXi for Xi 2
V' guarantees that if ~z = ~y0, then ~y = ~x0: otherwise, ~y = ~x1. Thus,
(~y; ~z) is a solution iff ~z = f (FXLambda (~y)). This
suffices to prove the result.
Theorem 4.4: Given as input a pair ('; S), where ' 2 L+(S) (resp., Luniq(S))
and S is a finite signature, the problem of deciding if ' is satisfiable
with respect to Trec(S) (resp., Tuniq(S), T (S)) is NP-complete (resp., NP-hard)
in j'j; if ' 2 LGP(S), then the problem of deciding if ' is satisfiable in
Trec(S) (resp., Tuniq(S)) is NP-complete (resp., NP-hard).
Proof: The NP-lower bound is easy for L+(S) and Luniq(S), since there is
an obvious way to encode the satisfiability problem for propositional logic
into the satisfiability problem for L+ and Luniq. Given a propositional formula
' with primitive propositions p1; : : : ; pk, let S = (;; fX1; : : : ; Xkg;
R), where R(Xi) = f0; 1g for i = 1; : : : ; k. Replace each occurrence of
the primitive proposition pi in ' with the formula Xi = 1. This gives us
a formula '0 in Luniq(S). It is easy to see that if '0 is satisfiable in
a causal model T 2 T (S) (and, a fortiori if '0 is satisfiable in a causal
model T in either Trec(S) or Tuniq(S)) then the solution to the equations
in T defines a satisfying assignment for '. Conversely, if ' is satisfiable,
say by some truth assignment v, then we can trivially construct a causal
model T 2 Trec(S) such that FXi = v(pi). (For simplicity, I assume that valuations
assign values 0 and 1 rather than false and true.)
This trivial construction of '0 will not work for LGP(S), since we do not
have disjunctions or negations available. The lack of negations does not
cause a problem. We can assume without loss of generality that the negations
occur only in front of primitive propositions, and we can capture :pi by
the formula Xi = 0. The idea for dealing with disjunctions is that a formula
such as p1 . :p2 . p3 is translated to [X1 0; X2 1; X3 1](Y = 0), where
Y is a fresh variable. Essentially, we are viewing p1 .:p2 .p3 as (:p1 ^p2
^:p3) ) false, which is why we write, for example, X1 0 even though p1 appears
positively in the disjunction.
To make matters simpler, assume that ' is a formula in 3-CNF. This suffices
for NPhardness, since the satisfiability problem for 3-CNF formulas is also
NP-hard (Garey & Johnson, 1979). Suppose ' is of the form c1 ^: : :^cm, where
each cl is a clause consisting of a disjunction of three primitive propositions
and their negations. Suppose that the primitive propositions that appear
in ' are p1; : : : ; pk. Let S = (;; fX1; : : : ; Xk; Y1; : : : ; Ymg; R),
where R(Xi) = R(Yj) = f0; 1g for all i; j. Suppose that cj, the jth clause
of ', is of the form qj1 . qj2 . qj3, where qji is either pji or :pji for
some ji. Let ctj be the LGP formula
[Xj1 xj1; Xj2 xj2; Xj3 xj3](Yj = 0);
334
Axiomatizing Causal Structures where xjh is 0 if qjh is pjh and xjh is 1
if qjh is :pjh for h = 1; 2; 3. Let '0 be
[true](Y1 = 1 ^ : : : ^ Ym = 1) ^ ct1 ^ : : : ^ ctm: I claim that ' is a
satisfiable propositional formula iff the LGP formula '0 is satisfiable in
Trec(S) (resp. Tuniq(S)). First suppose that '0 is satisfiable, say in some
model T 2 Tuniq(S). (If this direction holds for T 2 Tuniq(S), it clearly
holds a fortiori for T 2 Trec(S).) Let ~z be the unique solution to the equations
in T . By construction, the Yj component of ~z is 1 for j = 1; : : : ; m.
Let xLambda i be the value of the Xi component in ~z. Consider the valuation
v such that v(pi) = xLambda i . I claim that v(') = 1. To see this, suppose
that clause cj is qj1 . qj2 . qj3. If v makes qj1, qj2, and qj3 false, then
we must have xjh = xLambda jh for h = 1; 2; 3. Since T j= [Xj1 xj1; Xj2
xj2; Xj3 = xj3)](Yj = 0) and the value of the Xjh component of ~z is xjh
for h = 1; 2; 3, it follows that ~z is a solution to the equations in TXj1xj1;Xj2xj2;Xj3xj3
. But this contradicts the fact that T j= [Xj1 xj1; Xj2 xj2; Xj3 xj3](Yj
= 0) (since the Yj component of ~z is 1). It follows that v(cj ) = v(qj1
. qj2 . qj3) = 1. Since this is true for all clauses cj, we must have that
v(') = 1.
For the converse, suppose that ' is satisfiable, say by valuation v. I show
that '0 is satisfiable in T 2 Trec(S). Order the variables so that Xj1; Xj2
; Xj3 OE Yj. (There are many orderings of the variables that satisfy these
constraints; any one will do.) Define FXi = v(pi) (so that FXi is a constant,
independent of its arguments); define FYj (xj1 ; xj2 ; xj3) = 1 if (xj1 ;
xj2; xj3 ) = (v(pj1 ); v(pj2 ); v(pj3 )) and 0 otherwise. It is easy to check
that T j= '0, as desired.
For the NP upper bound in the case of Trec(S), it clearly suffices to deal
with ' 2 L+. Suppose we are given ('; S) with ' 2 L+. We want to check if
' is satisfiable in Trec(S). The basic idea in to guess a causal model T
and verify that it indeed satisfies '. There is a problem with this though.
To completely describe a model T , we need to describe the functions FX .
However, there may be many variables X in S and they can have many possible
inputs. Just describing these functions may take time much longer than polynomial
in '. Part of the solution to this problem is provided by Theorem 4.2, which
tells us that it suffices to check whether ' is satisfiable in Trec(S').
In light of this, for the remainder of this part of the proof, I assume without
loss of generality that S = S'. This limits the number of variables that
we must consider to O(j'j). But even this does not solve our problem completely.
Since we are not given any bounds on jR(Y )j for variables Y in S', even
describing the functions FY for the variables Y that appear in ' on all their
possible input vectors could take time much more than polynomial in '. The
solution is to give only a short partial description of a model T and show
that this suffices.
Consider all pairs (~Y ~y; ~u) such that there is a subformula of ' of the
form [~Y ~y] and ~u appears in . Let R be the set of all such pairs. Note
that jRj ! j'j2. We say that two causal models T and T 0 in Trec(S) agree
on R if, for all pairs (~Y ~y; ~u) 2 R, the (unique) solutions to the equations
in T~Y ~y(~u) and T 0~Y ~y(~u) are the same. It is easy to see
that if T and T 0 agree on R, then either both T and T 0 satisfy ' or neither
do. That is, all we need to know about a causal model is how it deals with
the relevant equations--those corresponding to pairs in R.
For each pair (~Y ~y; ~u) 2 R, guess a vector ~v(~Y ~y; ~u) of values for
the endogenous variables; intuitively, these are the unique solutions to
the relevant equations in a model satisfying T . Given these guesses, it
is easy to check if ' is satisfied in a model where these
335
Halpern guesses do indeed represent the solutions to the relevant equations.
It remains to show that there exists a causal model in Trec(S) where the
relevant equations have these solutions.
To do this, first guess an ordering OE on the variables. We can then verify,
for each fixed ~u that appears in ', whether the solution vectors ~v(~Y
~y; ~u) guessed for the relevant equations are compatible with OE, in the
sense that it is not the case that there are two solutions (~u; ~x) and (~u;
~x0) such that some variable X takes on different values in ~x and ~x0, but
all variables Y such that Y OE X take on the same values in ~x and ~x0. It
is easy to see that if the solutions are compatible with OE, we can define
the functions FX for X 2 V such that all the equations hold and FX is independent
of the values of Y if X OE Y for all X; Y 2 V. (Note we never actually have
to write out the functions FX , which may take too long; we just have to
know they exist.) To summarize, as long as we can guess some solutions to
the relevant equations such that a causal model that has these solutions
satisfies ', and an ordering OE such that these solutions are compatible
with OE, then ' is satisfiable in Trec(S). Conversely, if ' is satisfiable
in T 2 Trec(S), then there clearly are solutions to the relevant equations
that satisfy ' and an ordering OE such that these solutions are compatible
with OE. (We just take the solutions and the ordering OE from T .) This shows
that the satisfiability problem for Trec is in NP, as desired.
References Chajewska, U., & Halpern, J. Y. (1997). Defining explanation in
probabilistic systems. In
Proc. Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI
'97), pp. 62-71.
Druzdzel, M. J., & Simon, H. A. (1993). Causality in bayesian belief networks.
In Uncertainty in Artificial Intelligence 9, pp. 3-11.
Galles, D., & Pearl, J. (1997). Axioms of causal relevance. Artificial Intelligence,
97 (1-2),
9-43.
Galles, D., & Pearl, J. (1998). An axiomatic characterization of causal counterfactuals.
Foundation of Science, 3 (1), 151-182.
Garey, M., & Johnson, D. S. (1979). Computers and Intractability: A Guide
to the Theory
of NP-completeness. W. Freeman and Co., San Francisco, Calif.
Goldberger, A. S. (1972). Structural equation methods in the social sciences.
Econometrica,
40 (6), 979-1001.
Harel, D. (1979). First-Order Dynamic Logic. Lecture Notes in Computer Science,
Vol. 68.
Springer-Verlag, Berlin/New York.
Heckerman, D., & Shachter, R. (1995). Decision-theoretic foundations for
causal reasoning.
Journal of Artificial Intelligence Research, 3, 405-430.
Henrion, M., & Druzdzel, M. J. (1990). Qualitative propagation and scenario-based
approaches to explanation of probabilistic reasoning. In Uncertainty in Artificial
Intelligence 6, pp. 17-32.
336
Axiomatizing Causal Structures Pearl, J. (1995). Causal diagrams for empirical
research. Biometrika, 82 (4), 669-710. Pearl, J. (1999). Causality. Cambridge
University Press, New York. Forthcoming. Pearl, J., & Verma, T. (1991). A
theory of inferred causation. In Principles of Knowledge
Representation and Reasoning: Proc. Second International Conference (KR '91),
pp. 441-452.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, Prediction, and
Search.
Springer-Verlag, New York.
Strotz, R. H., & Wold, H. O. A. (1960). Recursive vs. nonrecursive systems:
an attempt at
synthesis. Econometrica, 28 (2), 417-427.
337