### Random Worlds and Maximum Entropy

A. J. Grove, J. Y. Halpern and D. Koller

Volume 2, 1994

Abstract:
Given a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula Phi holds given KB. If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, withdomain {1,...,N} that satisfy KB, and compute thefraction of them in which Phi is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying Phi andKB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger,there are many more worlds with higher entropy. Therefore, we can usea maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are the limitations on its scope. Most importantly, the restriction to unary predicates seems necessary. Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case. These observations suggest unexpected limitations to the applicability of maximum-entropy methods.

Extracted Text

```Journal of Artificial Intelligence Research 2 {1994} 33{88 Submitted 4/94;
published 8/94
Random Worlds and Maximum Entropy
NEC Research Institute, 4 Independence Way
Princeton, NJ 08540 Joseph Y. Halpern halpern@almaden.ibm.com
IBM Almaden Research Center, 650 Harry Rd.
San Jose, CA 95120 Daphne Koller daphne@cs.berkeley.edu Computer Science
Division, University of California
Berkeley, CA 94720
Abstract
Given a knowledge base KB containingfirst-order and statistical facts, we
consider a
principled method, called the random-worlds method, for computing a degree
of belief that
some formula ' holds given KB. If we are reasoning about a world or system
consisting of
N individuals, then we can consider all possible worlds, or first-order
models,with domain
f1; : : :; N g that satisfy KB, and compute the fraction of them in which
' is true. We define
the degree of belief to be the asymptotic value of this fraction as N growslarge.
We show that when the vocabulary underlying ' and KB uses constants and unary
predicates only,
we can naturally associate an entropy with each world. AsN grows larger,
there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy
computation
to compute the degree of belief. This result is in a similar spirit to previous
work in physics
and artificial intelligence, but is far more general. Of equal interest
to the result itself are the limitations on its scope. Most importantly,
the restriction to unary predicates seems
necessary. Although the random-worlds method makes sense in general, the
connection to
maximum entropy seems to disappear in the non-unary case. These observations
suggest
unexpected limitations to the applicability of maximum-entropy methods.
1. Introduction
Consider an agent {or expert system} with some information about a particular
subject, such as internal medicine. Some facts, such as \all patients with
hepatitis exhibit jaundice", can
be naturally expressed in a standard first-order logic, while others, such
as \80045 of patients
that exhibit jaundice have hepatitis", are statistical. Suppose the agent
wants to use this
information to make decisions. For example, a doctor might need to decide
whether to
administer antibiotics to a particular patient Eric. To apply standard tools
of decision
theory {see {Luce & Raiffa, 1957} for an introduction}, the agent must assign
probabilities,
or degrees of belief, to various events. For example, the doctor may need
to assign a degree
of belief to an event such as \Eric has hepatitis". We would therefore
like techniques for
computing degrees of belief in a principled manner, using all the data at
hand. In this paper we investigate the properties of one particular formalism
for doing this.
The method we consider, which we call the random-worlds method, has origins
that go back to Bernoulli and Laplace {1820}. It is essentially an application
of what has been
c
Halpern, & Koller called the principle of indifference {Keynes, 1921}. The
basic idea is quite straightforward.
Suppose we are interested in attaching a degree of belief to a formula '
given a knowledge
base KB . One useful way of assigning semantics to degrees of belief formulasis
to use
a probability distribution over a set of possible worlds {Halpern, 1990}.
More concretely,
supposefor now that we are reasoning about N individuals, 1; : : :; N .
A world is a complete
description of which individuals haveeach of the properties of interest.
Formally, a world is just a model, or interpretation, over our first-order
language. For example, if our lan-
guage consists of the unary predicates Hepatitis, Jaundice, Child , and
BlueEyed, the binary predicate Infected-By, and the constant Eric, then a
world describes which subset of the
N individuals satisfies each of the unary predicates, which set of pairs
is in the Infected-By
relation, and which of the N individuals is Eric. Given a prior probability
distribution
overthe set of possible worlds, the agent can obtain a degree of belief
in ' given KB by
conditioning on KB to obtain a posterior distribution, and then computing
the probability
of ' according to this new distribution. The random-worlds method uses the
principle of
indifference to choose a particular prior distribution over the set of worlds:
all the worlds
are taken to be equally likely. It is easy to see that the degree of belief
in ' given KB is
then precisely the fraction of worlds satisfying KB that also satisfy '.
The approach so far described applies whenever we actually know the precise
domain
size N ; unfortunately this is fairly uncommon. In many cases, however,
it is reasonable to
believe thatN is\large". We are thus particularly interested in the asymptotic
behavior of
this fraction; that is, we take our degree of belief to be the asymptotic
value of this fraction
as N grows large.
For example, suppose we want to reason about a domain of hospital patients,
and KB
is the conjunction of the following four formulas:

*  8x{Hepatitis{x} } Jaundice{x}) {\all patients with hepatitis exhibit
jaundice"},
*  kHepatitis{x}|Jaundice{x}k
x
031 0:8 {\approximately 80045 of patients that exhibit jaun-
dice have hepatitis"; we explain this formalism and the reason we say \approximately
80045" rather than \exactly 80045" in Section 2},

*  jjBlueEyed{x}jj x
031 0:25 {\approximately 25045 of patients have blue eyes"},

*  Jaundice{Eric} ^ Child{Eric} {\Eric is a child who exhibits jaundice"}.
Let' be Hepatitis{Eric}; that is, we want to ascribe a degree of belief
to the statement \Eric
has hepatitis". Suppose the domain has size N . Thenwe want to consider
all worlds with
domain{1; : : :; N } such that the set of individuals satisfying Hepatitis
is a subset of those satisfying Jaundice, approximately 80045 of the individuals
satisfying Jaundice also satisfy
Hepatitis, approximately 25045 of the individuals satisfy BlueEyed, and
{the interpretation
of} Eric is an individual satisfying Jaundice and Child. It is straightforward
to show that, as
expected, Hepatitis{Eric} holds in approximately 80045 of these structures.
Moreover, as N gets large, the fraction of structures in which Hepatitis{Eric}
holds converges to exactly 0:8.
Since 80045 of the patients that exhibit jaundice have hepatitis and Eric
exhibits jaundice,
a degree of belief of 0.8 that Eric has hepatitis seems justifiable. Note
that, in this example,
the information that Eric is a child is essentially treated as irrelevant.
We would get the
same answer if we did not have the information Child{Eric}. It can also
be shown that
34Random Worlds and Maximum Entropy
the degree of belief in BlueEyed {Eric} converges to 0:25 as N gets large.
Furthermore,
the degree of belief of BlueEyed{Eric} ^ Jaundice{Eric} converges to 0:2,
the product of
0:8 and 0:25. As we shall see, this is because the random-worlds method
treats BlueEyed
and Jaundice as being independent, which is reasonable because there is
no evidence to
the contrary. {It would surely be strange to postulate that two properties
were correlated unless there were reason to believe they were connected in
some way.} Thus, at least in this example, the random-worlds method gives
AI systems {Pearl, 1989; Pollock, 1984; Spiegelhalter, 1986}. Are such intuitive
results typical? When do we get convergence?
And when we do, is there a practical way to compute degrees of belief?
The answer to the first question is yes, as we discuss in detail in {Bacchus,
Grove,
Halpern,& Koller, 1994}. In that paper, we show that the random-worlds method
is re-
markably successful at satisfying the desiderata of both nonmonotonic {default}
reasoning
{Ginsberg, 1987} and reference class reasoning {Kyburg, 1983}. The results
of {Bacchus
et al., 1994} show that the behavior we saw in the example above holds quite
generally, as do many other properties we would hope to have satisfied. Thus,
in this paper we do
notspend time justifying the random-worlds approach, nor do we discuss its
strengths and
weaknesses; the reader is referred to {Bacchus et al., 1994} for such discussion
and for
an examination of previous work in the spirit of random worlds {most notably
{Carnap, 1950, 1952} and subsequent work}. Rather, we focus on the latter
above. These questions may seem quite familiar to readers aware of the work
on asymp-
totic probabilities for various logics. For example, in the context of first-order
formulas,
it is well-known that a formula with no constant or function symbols has
an asymptotic
probability of either 0 or 1 {Fagin, 1976; Glebski025020, Kogan, Liogon'ki025020,
& Talanov, 1969}. Furthermore, we can decide which {Grandjean, 1983}. However,
the 0-1 law fails if the
language includes constants or if we look at conditional probabilities {Fagin,
1976}, and we
need both these features in order to reason about degrees of belief for
formulas involving
particular individuals, conditioned on what is known.
In two companion papers {Grove, Halpern, & Koller, 1993a,1993b}, we consider
the
question of what happens in the pure first-order case {where there is no
statistical informa-
tion} in greater detail. We show that as long as there is at least one binary
predicate symbol
in the language, then not only do we not get asymptotic conditional probabilities
in general
{as was already shown by Fagin {1976}), but almost all the questions one
might want to ask {such as deciding whether the limiting probability exists}
are highly undecidable. However,
if we restrict to a vocabulary with only unary predicate symbols and constants,
then as
long as the formula on which we are conditioning is satisfiable in arbitrarily
large models
{a question which is decidable in the unary case}, the asymptotic conditional
probability
existsand can be computed effectively.
In this paper, we consider the much more useful case where the knowledge
base has
statistical as well as first-order information. In light of the results
of {Grove et al., 1993a, 1993b}, for most of the paper we restrict attention
to the case when the knowledge base is
expressed in a unary language. Our major result involves showing that asymptotic
condi-
tional probabilities can often be computed using the principle of maximum
entropy {Jaynes,
1957; Shannon & Weaver, 1949}.
35Grove, Halpern, & Koller To understand the use of maximum entropy, suppose
the vocabulary consists of the unary predicate symbols P 1
; : : :; P
k
. We can consider the 2 k
atoms that can be formed from
these predicate symbols, namely, the formulas of the form P
0
1
^: : : ^ P 0
k
, where each P
0
i
is either P
i
or :P
i
. We can view the knowledge base as placing constraints on the proportion
ofdomain elements satisfying each atom. For example, the constraint kP
1
{x}|P
2 {x}k
x = 1=2
says that the proportion of the domain satisfying some atom that contains
P
2
as a conjunct is twice the proportion satisfying atoms that contain both
P
1
and P
2
as conjuncts. Given a
model of KB, we can define the entropy of this model as the entropy of the
vector denoting
the proportions of the different atoms. We show that, as N grows large,
there are many
more models with high entropy than with lower entropy. Therefore, models
with high
entropy dominate. We use this concentration phenomenon to show that our
degree of belief in ' given KB according to the random-worlds method is closely
related to the assignment
of proportions to atoms that has maximum entropy among all assignments consistent
with
the constraints imposed by KB .
The concentration phenomenon relating entropy to the random-worlds method
is well-
known {Jaynes, 1982, 1983}. In physics, the \worlds" are the possible configurations
of
a system typically consisting of many particles or molecules, and the mutually
exclusive
properties {our atoms} can be, for example, quantum states. The corresponding
entropy
measure is at the heart of statistical mechanics and thermodynamics. There
are subtle but importantdifferences between our viewpoint and that of the
physicists. The main one lies in
our choice of language. We want to express some intelligent agent's knowledge,
which is why we take first-order logic as our starting point. The most specific
difference concerns constant
symbols. We need these because the most interesting questions for us arise
when we have
some knowledge about|and wish to assign degrees of belief to statements
concerning|a particular individual. The parallel in physics would address
properties of a single particle,
which is generally considered to be well outside the scope of statistical
mechanics.
Another work that examines the connection between random worlds and entropy
from
our point of view|computing degrees of belief for formulas in a particular
logic|is that of
Paris and Vencovska {1989}. They restrict the knowledge base to consist
of a conjunction of
constraints that {in our notation} have the form kff{x}|fi{x}k
x
031 r and jjff{x}jj x
031 r, where fi
and ff are quantifier-free formulas involving unary predicates only, with
no constant symbols.
Not only is most of the expressive power of first-order logic not available
in their approach, but the statistical information that can be expressed
is quite limited. For example, it is not
possible to make general assertions about statistical independence. Paris
and Vencovska
show that the degree of belief can be computed using maximum entropy for
their language.
Shastri {1989} has also shown such a result, of nearly equivalent scope.
But, as we have already suggested, we believe thatit is important to look
at a far richer language. Our
language allows arbitrary first-order assertions, full Boolean logic, arbitrary
polynomial
combinations of statistical expressions, and more; these are all features
that are actually useful to knowledge-representation practitioners. Furthermore,
the random-worlds method makes perfect sense in this rich setting. The goal
of this paper is to discover whether the
connection to maximum entropy also holds. We show that maximum entropy continues
to be widely useful, covering many problems that are far outside the scope
of {Paris &
Vencovska, 1989; Shastri, 1989}.
36Random Worlds and Maximum Entropy
On the other hand, it turns out that we cannot make this connection for
our entire
language. For one thing, as we hinted earlier, there are problems if we
try to condition on a
knowledge base that includes non-unary predicates; we suspect that maximum
entropy has
no role whatsoever in this case. In addition, we show that there are subtleties
that arise
involving the interaction between statistical information and first-order
quantification. We
feel that an important contribution of this paper lies in pointing out some
limitations of
maximum-entropymethods.
The rest of this paper is organized as follows. In the next section, we
discuss our formal
framework {essentially, that of {Bacchus, 1990; Halpern, 1990}). We discuss
the syntax
and semantics of statistical assertions, issues involving \approximately
equals", and define the random-worlds method formally. In Section 3 we state
the basic results that connect
maximum entropy to random-worlds, and in Section 4 we discuss how to use
these results
as effective computational procedures. In Section 5 we return to the issue
of unary versus
non-unary predicates, and the question of how widely applicable the principle
of maximum
entropy is. We conclude in Section 6 with some discussion. 2. Technical
preliminaries In this section, we give the formal definition of our language
and the random-worlds method.
The material is largely taken from {Bacchus et al., 1994}.
2.1 The language
We are interested in a formal logical language that allows us to express
both statistical
information and first-order information. We therefore define a statistical
language L
031
, which is a variant of a language designed by Bacchus {1990}. For the remainder
of the
paper, let 010 be a finite first-order vocabulary, consisting of predicate
and constant symbols, and let X be a set of variables.
1
Our statistical language augments standard first-order logic with a form
of statistical
quantifier. For a formula  {x}, the term jj {x}jj
x
is a proportion expression. It will be
interpreted as a rational number between 0 and 1, that represents the proportion
of domain elements satisfying  {x}. We actually allow an arbitrary set of
variables in the subscript
and in the formula  . Thus, for example, jjChild{x; y}jj x
describes, for a fixed y, the
proportion of domain elements that are children of y; jjChild{x; y}jj
y
describes, for a fixed
x, the proportion of domain elements whose child is x; and jjChild{x; y}jj
x;y
describes the
proportionof pairs of domain elements that are in the child relation.
2
We also allow proportion expressions of the form k {x}|022{x}k
x
, which we call conditional
proportion expressions. Suchan expression is intended to denote the proportion
of domain
elements satisfying   from among those elements satisfying 022. Finally,
any rational number
is also considered to be a proportion expression, and the set of proportion
expressions is
closed under addition and multiplication.1. For simplicity, weassume that
010 does not contain function symbols, since these can be defined in terms
of predicates. 2. Strictly speaking, these proportion expression should
be written with sets of variables in the subscript,
as in jjChild{x; y}jj fx;yg
. However, when the interpretation is clear, we often abuse notation and
drop
the set delimiters. 37Grove, Halpern, & Koller One important difference
between our syntax and that of {Bacchus, 1990} is the use of
approximate equality to compare proportion expressions. There are both philosophical
and
practical reasons why exact comparisons can be inappropriate. Consider a
statement such
as \80045 of patients with jaundice have hepatitis". If this statement
appears in a knowledge base, it is almost certainly there as a summary of
a large pool of data. So it would be wrong
to interpret the value too literally, to mean that exactly 80045of all
patients with jaundice havehepatitis. Furthermore, this interpretation would
imply {among other things} that the
number of jaundiced patients is a multiple of five! This is unlikely to
be something we intend.
We therefore use the approach described in {Bacchus et al., 1994; Koller
& Halpern, 1992},
and compare proportion expressions using {instead of = and 024} one of
an infinite family of
connectives 031 i
and 026
i , for i = 1; 2; 3 : : : {\i-approximatelyequal" or \i-approximately
less than or equal"}. For example, we can express the statement \80045
of jaundiced patients
have hepatitis" by the proportion formula kHep{x}|Jaun{x}k
x
031
1 0:8. Theintuition behind the semantics of approximate equality is that
each comparison should be interpreted using
some small tolerance factor to account for measurement error, sample variations,
and so on. The appropriate tolerance will differ for various pieces of information,
so our logic allows different subscripts on the \approximately equals" connectives.
A formula such as
kFly{x}|Bird {x}k x
031
1
1 ^ kFly{x}|Bat {x}k
x
031
2
1 says that both kFly{x}|Bird{x}k
x
and
kFly{x}|Bat {x}k
x
are approximately 1, but the notion of \approximately" may be different
in each case. Theactual choice of subscript for 031 is unimportant. However,
it is important to use different subscripts for different approximate comparisons
unless the tolerances for the different measurements are known to be the
same.
We can now give a recursive definition of the language L
031
.
Definition 2.1: The set of terms in L
031
is X [ C where C is the set of constant symbols in
010. The set of proportion expressionsis the least set that
{a} contains the rational numbers,
{b} contains proportion terms of the form jj jj X
and k |022k
X
for formulas  ; 022 2 L
031 and
a finite set of variables X 022 X , and
{c} is closed under addition and multiplication. The set of formulas in
L
031
is the least set that {a} contains atomic formulas of the form R{t
1
; : : : ; t
r
}, where R is a predicate symbol in
010 [ f=} of arity r and t 1
; : : : ; t r
are terms,
{b} contains proportion formulas of the form 020 031
i 020
0
and020 026
i
020 0
, where 020 and 020
0
are
proportion expressions and i is a natural number, and
{c} is closed under conjunction, negation, and first-order quantification.Note
that L 031
allows the use of equality when comparing terms, but not when comparing
proportion expressions.
This definition allows arbitrary nesting of quantifiers and proportion expressions.
As
observed in {Bacchus, 1990}, the subscript x in a proportion expressions
binds the variable
x in the expression; indeed, we can view jj001jj
x
as a new type of quantification. 38Random Worlds and Maximum Entropy
We now need to define the semantics of the logic. As we shall see below,
most of the
definitions are fairly straightforward. The two features that cause problems
are approxi-
mate comparisons and conditional proportion expressions. We interpret the
approximate
connective 020 031
i
020 0
to mean that 020 is very close to 020
0 . Moreprecisely, it is within some very
small tolerance factor. We formalize this using a tolerance vector ~034
=h034
1 ; 034
2
; : : :i, 034
i
> 0.
Intuitively 020 031 i
020
0
if the values of 020 and 020 0
are within 034
i
of each other. Of course, one prob-
lem with this is that we generally will not know the value of 034 i
. Wepostpone discussion of
this issue until the next section. Another difficulty arises when interpreting
conditional proportion expressions. The problem is that k |022k
X
cannot be defined as a conditional probability when there are no assignments
to the variables in X that would satisfy 022, because we cannot divide by
zero. When standard equality is used rather than approximate equality this
problem is
easily overcome, simply by avoiding conditional probabilities in the semantics
altogether.
Following {Halpern, 1990}, we can eliminate conditional proportion expressions
altogether
by viewing a statement such as k |022k
X
= ff as an abbreviation for jj  ^ 022jj
X
= ffjj022jj
X
. Thus, we never actually form quotients of probabilities. This approach
agrees completely
with the standard interpretation of conditionals so long as jj022jj X
6= 0. If jj022jj
X
= 0, it
enforces the convention that formulas such as k |022k
X
= ff or k |022k
X
024 ff are true for any ff. {Note that we do not really care much what
happens in such cases, so long as it is
consistent and well-defined. This convention represents one reasonable choice.}
We used the same approach in an earlier version of this paper {Grove, Halpern,
&
Koller, 1992} in the context of a language that uses approximate equality.
Unfortunately,
as the following example shows, this has problems. Unlike the case for true
equality, if we multiply by jj022jj X
to clear all quotients, we do not obtain an equivalent formula even if
jj022jj
X is nonzero.
Example 2.2: First consider the knowledge base KB ={kFly{x}|Penguin{x}k
x
031
1 0}.
This says that the number of flying penguins forms a tiny proportion of
all penguins.
However, if we interpret conditional proportions as above and multiply out,
we obtain the knowledge base KB
0 = jjFly{x} ^ Penguin{x}jj x
031
1 0 001 jjPenguin{x}jj
x
, which is equivalent
to jjFly{x} ^ Penguin{x}jj
x
031 1
0. KB
0 just says that the number of flying penguins is small,
and has lost the {possibly important} information that the number of flying
penguins is small relative to the number of penguins. It is quite consistent
with KB
0
that all penguins fly
{provided the total number of penguins is small}; this is not consistent
with KB . Clearly, the
process of multiplying out across an approximate connective does not preserve
the intended
interpretation of the formulas.This example demonstrates an undesirable
interaction between the semantics we have
chosenfor approximate equality and the process of multiplying-out to eliminate
conditional
proportions. We expect k |022k
X 031
1
ff to mean that k |022k
X
is within some tolerance 034
1
of ff.
Assuming jj022jj X
> 0, this is the same as saying that jj  ^ 022jj X
is within 034 1
jj022jj X
of ff jj022jj
X
.
On the other hand, the expression that results by multiplying out is jj
^ 022jj
X
031
1
ffjj022jj X
.
This says that jj  ^ 022jj
X is within 034
1 {not 034
1
jj022jj
X
!} of ff jj022jj X
. As we saw above, the
difference between the two interpretations can be significant. Because of
this problem, we cannot treat conditional proportions as abbreviations and
instead have added them as primitive expressions in the language. Of course,
we now have
39Grove, Halpern, & Koller to give them a semantics that avoids the problem
illustrated by Example 2.2. We would
like to maintain the conventions used when we had equality in the language.
Namely,
in worlds where jj022{x}jj
x
6= 0, we want k {x}|022{x}k
x
to denote the fraction of elements satisfying 022{x} that also satisfy
{x}. In worlds where jj022{x}jj x
= 0, we want formulas of the form k {x}|022{x}k x
031
i
ff or k {x}|022{x}k
x
026
i
ff to be true. There are a number of
ways of accomplishing this. The way we take is perhaps not the simplest,
but it introduces
machinery that will be helpful later. The basic idea is to make the interpretation
of 031 more explicit, so that we can eliminate conditional proportions by
multiplication and keep track
of all the consequences of doing so. We give semantics to the language L
031
by providing a translation from formulas in L
031
to formulas in a language L
=
whose semantics is more easily described. The language L =
is
essentially the language of {Halpern, 1990}, that uses true equality rather
than approximate
equality when comparing proportion expressions. More precisely, the definition
of L
=
is
identical to the definition of L
031
given in Definition 2.1, except that:
*  we use = and 024 instead of 031
i
and 026
i
,

*  we allow the set of proportion expressions to include arbitrary real numbers
{not just
rational numbers},
*  we do not allow conditional proportion expressions,

*  we assume that L
= has a special family of variables "
i
, for i = 1; 2; : : :, interpreted
overthe reals. The variable "
i
is used to explicitly interpret the approximate equality connectives 031
i
and
026
i
. Once this is done, we can safely multiply out the conditionals, as described
above. More
precisely, every formula 037 2 L
031
can be associated with a formula 037
003
2 L =
as follows:

*  every proportion formula 020 026
i
020 0
in 037 is {recursively} replaced by 020 000020 0
024 "
i ,

*  every proportion formula 020 031
i
020
0
in 037 is {recursively} replaced by the conjunction {020 000 020
0
024 "
i
} ^ {020
0 000 020 024 "
i
},

*  finally, conditional proportion expressions are eliminated by multiplying
out.
This translation allows us to embed L
031
into L
=
. Thus, for the remainder of the paper, weregard L
031
as a sublanguage of L
= . This embedding avoids the problem encountered in
Example 2.2, because when we multiply to clear conditional proportions the
tolerances are
explicit, and so are also multipled asappropriate.
The semantics for L
=
is quite straightforward, and is similar to that in {Halpern, 1990}.
We give semantics to L
=
in terms of worlds, or finite first-order models. For any natural
number N , let W
N
consist of all worlds with domain {1; : : : ; N}. Thus, in W
N
, we have
one world for each possible interpretation of the symbols in 010 over the
domain{1; : : :; N}.
Let W
003
denote [ N
W
N
.
Now, consider some world W 2 W
003
over the domain D ={1; : : :; N}, some valuation
V : X ! D for the variables in X , and some tolerance vector ~034 . We
simultaneously assign toeach proportion expression 020 a real number [020]
{W;V;~034} and to each formula 030 a truth value
40Random Worlds and Maximum Entropy
with respect to {W; V; ~034}. Most of the clauses of the definition are
completely standard, so we omit them here. In particular, variables are interpreted
using V , the tolerance variables "
i
are interpreted using the tolerances 034 i
, the predicates and constants are interpreted using W ,
the Boolean connectives and the first-order quantifiers are defined in the
standard fashion, and when interpreting proportion expressions, the real
numbers, addition, multiplication, and 024 are given their standard meaning.
It remains to interpret proportion terms. Recall
that we eliminate conditional proportion terms by multiplying out, so that
we need to deal
only with unconditional proportion terms. If 020 is the proportion expression
jj jj
x i
1
;:::;x i
k
{for i 1
< i
2
< : : : < i
k
}, then
[020]
{W;V;~034}
=
1|D
k
|
fi fi
fi
n
{d
1
; : : : ; d k
} 2 D
k : {W; V [x i
1
=d
1
; : : : ; x
i
k
=d
k ]; ~034}|=
o
fi fi
fi
:
Thus, if|D| = N , the proportion expression jj jj
x
i
1
;:::;x i
k
denotes the fraction of the N
k
k-tuplesin D
k
that satisfy  . For example, [jjChild{x; y}jj
x ]
{W;V;~034}
is the fraction of domain elements d that are children of V {y}.
Using our embedding of L
031
into L
=
, we now have semantics for L
031
. For 037 2 L
031
, we
say that {W; V; ~034}|= 037 iff {W; V; ~034}|= 037
003
. It is sometimes useful in our future results to
incorporate particular values for the tolerances into the formula 037
003
. Thus, let 037[~034 ] represent
the formula that results from 037
003
if each variable "
i is replaced with its value according to ~034 , that is, 034 i
.
3
Typically we are interested in closed sentences, that is, formulas with
no free variables.
In that case, it is not hard to show that the valuation plays no role. Thus,
if 037 is closed,
we write {W; ~034}|= 037 rather than {W; V; ~034}|= 037. Finally, if
KB and 037 are closed formulas, we write KB|= 037 if {W; ~034}|= KB implies
{W; ~034}|= 037. 2.2 Degrees of belief
As we explained in the introduction, we give semantics to degrees of belief
by considering all
worlds of size N tobe equally likely, conditioning on KB , and then checking
the probability
of ' over the resulting probability distribution. In the previous section,
we defined what it
means for a sentence 037 to be satisfied in a world of size N using a tolerance
vector ~034 . Given
N and ~034 , we define #worlds
~034
N {037} to be the number of worlds in W
N
such that {W; ~034}|= 037.
Since we are taking all worlds to be equally likely, the degree of belief
in ' given KB with
respectto W
N
and ~034 is
Pr
~034 N
{'|KB} =
#worlds
~034 N
{' ^ KB}#worlds ~034
N
{KB}
:
If #worlds
~034
N {KB} = 0, this degree of belief is not well-defined.
The careful reader may have noticed a potential problem with this definition.
Strictly speaking, we should write W N
{010} rather than W
N
, since the set of worlds under consider-
ation clearly depends on the vocabulary. Hence, the number of worlds in
W
N
also depends
on the vocabulary. Thus, both #worlds
~034
N
{'} and #worlds
~034 N
{' ^ KB} depend on the choice3. Note that some of the tolerances 034
i
may be irrational; it is for this reason that we allowed irrational
numbers in proportion expressions in L
=
. 41Grove, Halpern, & Koller of 010. Fortunately, this dependence \cancels
out": If 010
0 033 010, then there is a constant c
such that for all formulas 037 over the vocabulary 010, #[010
0
]worlds ~034
N
{037} = c#[010]worlds
~034
N {037}.
This result, from which it follows that the degree of belief Pr
~034
N
{'|KB} is independent of
our choice of vocabulary, is proved in {Grove et al., 1993b}. Typically,
we know neither N nor ~034 exactly. All we know is that N is\large" and
that ~034 is \small". Thus, we would like to take our degree of belief
in ' given KB to
be lim ~034!
~
0
lim
N!1
Pr
~ 034
N
{'|KB}. Notice that the order of the two limits over~034 andN
is important. If the limit lim
~034!
~
0
appeared last, then we would gain nothing by using approximate equality,
since the result would be equivalent to treating approximate equality
as exact equality. This definition, however, is not sufficient; the limit
may not exist. We observed above
that Pr ~034
N
{'|KB} is not always well-defined. In particular, it may be the case that
for
certain values of ~034 , Pr
~034
N {'|KB} is not well-defined forarbitrarily large N . In order to
deal with this problem of well-definedness, we define KB to be eventually
consistent if
for all sufficiently small ~034 and sufficiently large N , #worlds
~034
N
{KB} > 0. Among other
things, eventual consistency implies that the KB issatisfiable in finite
domains of arbitrarily
large size. For example, a KB statingthat \there are exactly 7 domain elements"
is not
eventually consistent. For the remainder of the paper, we assume that all
knowledge bases are eventually consistent. In practice, we expect eventual
consistency to be no harder
to check than consistency. We do not expect a knowledge base to place bounds
on the domain size, except when the bound is readily apparent. For those
unsatisfied with this intuition, it is also possible to find formal conditions
ensuring eventual consistency. For
instance, it is possible to show that the following conditions are sufficient
to guarantee
that KB iseventually consistent: {a} KB does not use any non-unary predicates,
including
equality between terms and {b} KB is consistent for some domain size when
all approximate comparisons are replaced by exact comparisons. Since we concentrate
on unary languages in this paper, this result covers most cases of interest.
Even if KB iseventually consistent, the limit may not exist. For example,
it may be the case that Pr
~034 N
{'|KB} oscillates between ff + 034
i
and ff 000 034
i
for some i as N gets large. In
this case, for any particular ~034 , the limit as N grows will not exist.
However, it seems as if
the limit as ~034 grows small should, in this case, be ff, since the oscillations
We avoid such problems by considering the lim sup and lim inf, rather than
the limit. For
any set S 032 IR, the infimum of S, inf S, is the greatest lower bound
of S. Thelim inf of a sequence is the limit of the infimums; that is,
lim inf
N!1 a
N
= lim N!1
inf{a i
: i > N}:
The lim inf exists for any sequence bounded from below, even if the limit
does not. The lim
sup is defined analogously, where sup S denotes the least upper bound of
S. If lim
N!1 a
N
does exist, then lim
N!1
a
N
= lim inf N!1
a
N = lim sup
N!1 a
N
. Since, for any ~034 , the sequence Pr
~034
N
{'|KB} is always bounded from above and below, the lim sup and lim inf
always exist. Thus, we do not have to worry about the problem of nonexistence
for particular
values of ~034 . We can now present the final form of our definition.
Definition 2.3: If
lim ~034!
~
0
lim inf
N!1 Pr
~034
N {'|KB} and lim
~034!
~
0
lim sup N!1
Pr
~034
N
{'|KB}
42Random Worlds and Maximum Entropy
both exist and are equal, then the degree of belief in ' given KB , written
Pr
1
{'|KB}, is
defined as the common limit; otherwise Pr
1 {'|KB} does not exist. We close this section with a few remarks on our
definition. First note that, even using this definition, there are many cases
where the degree of belief does not exist. However,
as some of our later examples show, in many situations the nonexistence
of a degree of
belief can be understood intuitively {for instance, see Example 4.3 and
the subsequent
discussion}. We could, alternatively, have taken the degree of belief to
be the interval defined by lim
~034!
~
0
lim inf
N!1
Pr
~034
N
{'|KB} and lim
~034!
~
0
lim sup
N!1
Pr ~034
N
{'|KB}, provided
each of them exist. This would have been a perfectly reasonable choice;
most of the results
we state would go through with very little change if we had taken this definition.
Our
definition simplifies the exposition slightly. Finally, we remark that it
may seem unreasonable to take limits if we know the domain
size or have a bound on the domain size. Clearly,if we know N and ~034
, then it seems more
reasonable to use Pr
~034
N
rather than Pr 1
as our degree of belief. Indeed, as shown in {Bacchus et al., 1994}, many
of the important properties that hold for the degree of belief defined
by Pr
1
hold for Pr
~034
N
, for all choices of N and ~034 . The connection to maximum entropy
that we make in this paper holds only at the limit, but because {as our
proofs show} the
convergence is rapid, the degree of belief Pr
1
{'|KB} is typically a very good approximation
to Pr
~034
N
{'|KB}, even for moderately large N and moderately small ~034 .
3. Degrees of belief and entropy
3.1 Introduction to maximum entropy
The idea of maximizing entropy has played an important role in many fields,
including the study of probabilistic models for inferring degrees of belief
{Jaynes, 1957; Shannon &
Weaver, 1949}. In the simplest setting, we can view entropy as a real-valued
function on
finite probability spaces. If 012 is a finite set and 026 is a probability
measure on 012, the entropy
H{026} is defined to be 000
P
!2012
026{!} ln 026{!} {we take 0 ln 0 = 0}.
One standard application of entropy is the following. Suppose we know the
space 012, but
have only partial information about 026, expressed in the form of constraints.
For example,
we might have a constraint such as 026{! 1
} + 026{!
2
} 025 1=3. Although there may be many
measures 026 that are consistent with what we know, the principle of maximum
entropy
suggests that we adopt that 026
003
which has the largest entropy among all the consistent possibilities. Using
the appropriate definitions, it can be shown that there is a sense in
which this 026
003
incorporates the \least" additional information {Shannon & Weaver, 1949}.
For example, if we have no constraints on 026, then 026
003
will be the measure that assigns equal
probability to all elements of 012. Roughly speaking, 026 003
assigns probabilities as equally as
possible given the constraints. 3.2 From formulas to constraints Likemaximum
entropy, the random-worlds method is also used to determine degrees of be-
lief {i.e., probabilities} relative to a knowledge base. Aside from this,
is there any connection
between the two ideas? Of course, there is the rather trivial observation
that random-worlds
considers a uniform probability distribution {over the set of worlds satisfying
KB }, and it is 43Grove, Halpern, & Koller well-known that the uniform distribution
over any set has the highest possible entropy. But
in this section we show another, entirely different and much deeper, connection
between
random-worlds and the principle of maximum entropy. This connection holds
provided that we restrict the knowledge base so that it uses only unary predicates
and constants. Inthis
casewe can consider probability distributions, and in particular the maximum-entropy
dis-
tribution, over the set of atoms. Atoms are of course very different from
possible worlds;
for instance, there are only finitely many of them {independent of the domain
size N }.
Furthermore, the maximum-entropy distributions we consider will typically
not be uniform.
Nevertheless, maximum entropy in this new space can tell us a lot about
the degrees of belief defined by random worlds. In particular, this connection
will allow us to use maxi-
mum entropy as a tool for computing degrees of belief. We believe that the
restriction to
unary predicates is necessary for the connection we are about to make. Indeed,
as long as the knowledge base makes use of a binary predicate symbol {or
unary function symbol}, we
suspect that there is no useful connection between the two approaches at
all; see Section 5
for some discussion. Let L
031
1 be the sublanguage of L
031
where only unary predicate symbols and constant
symbols appear in formulas; in particular, we assume that equality between
terms does not
occur in formulas in L
031
1
.
4
{Recall that in L
031
, we allow equality between terms, but disallow equality between proportion
expressions.} Let L
= 1
be the corresponding sublanguage of
L
=
. In this subsection, we show that the expressive power of a knowledge base
KB in the
language L
031 1
is quite limited. In fact, such a KB can essentially only place constraints
on the
proportions of the atoms. If we then think of these as constraints on the
\probabilities of the atoms", then we have the ingredients necessary to
apply maximum entropy. In Section 3.3
weshow that there is a strong connection between the maximum-entropy distribution
found
this way and the degree of belief generated by random-worlds method.
To see what constraints a formula places on the probabilities of atoms,
it is useful to
convert the formula to a certain canonical form. Asa first step to doing
this, we formalize
the definition of atom given in the introduction. Let P ={P
1
; : : : ; P
k
} consist of the unary
predicate symbols in the vocabulary 010.
Definition 3.1: An atom {over P} is conjunction of the form P
0
1
{x} ^ : : : ^ P 0
k
{x}, where
each P
0
i
is either P i
or :P
i
. Since the variable x is irrelevant to our concerns, we typically
suppress it and describe an atom as a conjunction of the form P
0
1
^ : : : ^ P
0
k
.Note that there are 2
jPj
= 2
k atoms over P andthat they are mutually exclusive and
exhaustive. Throughout this paper, we use K todenote 2
k
and A
1
; : : : ; A K
to denote the
atoms over P, listed in some fixed order.
Example 3.2: There are K = 4 atoms over P = {P
1
; P 2
}: A
1
= P
1 ^P
2
, A
2
= P 1
^ :P
2 ,
A
3
= :P
1
^ P
2
, A
4
= :P
1 ^ :P
2
.The atomic proportion terms jjA
1
{x}jj
x
; : : : ; jjA
K
{x}jj
x
will play a significant role in
our technical development. It turns out that L
031 1
is a rather weak language: a formula
KB 2L
031
1
does little more than constrain the proportion of the atoms. In other words,
for4. We remark that many of our results can be extended to the case where
the KB mentions equality, but
the extra complexity obscures many of the essential ideas.
44Random Worlds and Maximum Entropy
any such KB we can find an equivalent formula in which the only proportion
expressions
are these unconditional proportions of atoms. The more complex syntactic
machinery in L
031
1
|proportions over tuples, first-order quantification, nested proportions,
however; knowledge
can often be expressed far more succinctly if the full power of the language
is used.}
Given any KB , the first step towards applying maximum entropy is to use
L 031
1
's lack of expressivity and replace all proportion terms by atomic proportion
terms. It is also useful
to make various other simplifications to KB that will help us in Section
4. We combine
these steps and require that KB be transformed into a special canonical
form which we now describe.
Definition 3.3: An atomic term t over P is a polynomial overterms of the
form jjA{x}jj
x
,
where A is an atom over P. Such an atomic term t is positive if every coefficient
of the
polynomial t is positive.Definition 3.4: A {closed} sentence 037 2 L
=
1
is in canonical form if it is a disjunction of
conjunctions, where each conjunct is one of the following:

*  t 0
= 0, {t
0
> 0 ^ t 024 t
0
"
i }, or {t
0
> 0 ^ :{t 024 t
0
"
i }), where t and t
0
are atomic terms and t
0
is positive,
*  9x A
i {x} or :9x A
i
{x} some atom A
i
, or
*  A
i
{c} for some atom A
i
and some constant c.
Furthermore, a disjunct cannot contain both A
i
{c} and A
j
{c} for i 6= j as conjuncts, nor can
it contain both A i
{c} and :9x A
i
{x}. {Note that these last conditions are simply minimal
consistency requirements.}Theorem 3.5: Every formula in L
=
1 is equivalent to a formula in canonical form. More-
over, there is an effective procedure that, given a formula 030 2 L
=
1 , constructs an equivalent
formula
b
030 in canonical form.
The proof of this theorem, and of all theorems in this paper, can be found
in the appendix.
We remark that the length of the formula
b
030 is typically exponential in the length of 030.
Such a blowup seems inherent in any scheme defined in terms of atoms. Theorem
3.5 is a generalization of Claim 5.7.1 in {Halpern, 1990}. It, in turn, is
a generalization of a well-known result which says that any first-order formula
with only unary
predicates is equivalent to one with only depth-one quantifier nesting.
Roughly speaking,
this is because for a quantified formula such as 9x 030
0
, subformulas talking about a variable
y other than x can be moved outside the scope of the quantifier. This is
possible because
no literal subformula can talk about x and y together. Our proof uses the
same idea and extends it to proportion statements. Inparticular, it shows
that for any 030 2 L
031
1 there is an
equivalent ^
030 which has no nested quantifiers or nested proportions.
Notice, however, that such a result does not hold once we allow even a single
binary predicate in the language. For example, the formula 8y 9x R{x; y}
clearly needs nested quantification because R{x; y} talks about both x and
y and so must remain within the
45Grove, Halpern, & Koller scope of both quantifiers. With binary predicates,
each additional depth of nesting really
does add expressive power. This shows that there can be no \canonical form"
theorem quite
like Theorem 3.5 for richer languages. This issue is one of the main reasons
why we restrict the KB to a unary language in this paper. {SeeSection 5 for
further discussion.} Given any formula in canonical form we can immediately
derive from it, in a syntactic manner, a set of constraints on the possible
proportions of atoms.
Definition 3.6: Let KB bein canonical form. Weconstruct a formula 000{KB
} in the lan-
guageof real closed fields {i.e., over the vocabulary{0;1; +; 002g} as
follows, where u
1 ;: : : ; u
K
are fresh variables {distinct from the tolerance variables "
j }:

*  we replace each occurrence of the formula A
i
{c} by u i
> 0,

*  we replace each occurrence of 9x A
i
{x} by u
i
> 0 and replace each occurrence of :9x A
i
{x} by u
i
= 0,

*  we replace each occurrence of jjA
i
{x}jj
x
by u
i
.Notice that 000{KB} has two types of variables: the new variables u
i that we just introduced, and the tolerance variables " i
. In order to eliminate the dependence on the latter, we often consider
the formula 000{KB [~034 ]} for some tolerance vector ~034 .
Definition 3.7: Given a formula fl over the variables u
1
; : : : ; u
K
, let Sol [fl] be the set of
vectors in 001
K
={~u 2 [0; 1]
K :
P
K
i
u
i
= 1} satisfying fl. Formally, if {a
1
; : : :; a
K
} 2 001
K
, then {a
1
; : : : ; a
K
} 2 Sol [fl] iff {IR; V }|= fl, where V is a valuation such that V{u
i
} = a
i
.Definition 3.8: The solution space of KB given ~034 , denoted S
~034
[KB], is defined to be the
closure of Sol[000{KB[~034 ]}].
5If KB is not in canonical form, we define 000{KB} and S
~034
[KB] to be 000{
d
KB } and S ~034
[
d KB],
respectively, where
d
KB is the formula in canonical form equivalent to KB obtained by the
procedure appearing in the proof of Theorem 3.5.
Example 3.9: Let P be{P
1
; P
2
}, with the atoms ordered as in Example 3.2. Consider
KB =8x P
1
{x} ^ 3kP
1
{x} ^ P
2 {x}k
x 026
i
1: The canonical formula
d KB equivalent to KB is:
6
:9x A 3
{x} ^ :9x A
4
{x} ^ 3jjA
1
{x}jj
x 000 1 024 "
i :
As expected,
d
KB constrains both jjA
3
{x}jj
x
and jjA
4
{x}jj
x
{i.e., u 3
and u
4 } to be 0. We also see that jjA
1 {x}jj
x {i.e., u
1
} is {approximately} at most 1=3. Therefore:
S
~034 [KB] =
n {u
1
;: : : ; u
4
} 2 001 4
: u
1 024 1=3+ 034 i
=3; u
3
= u
4 = 0
o
:5. Recall that the closure of a set X 022 IR
K
consists of all K-tuples that are the limit of a sequence of
K-tuples in X.
6. Notethat here we are viewing KB asa formula in L
= , under the translation defined earlier; we do this
throughout the paper without further comment.
46Random Worlds and Maximum Entropy
3.3 The concentration phenomenon With every world W 2 W
003
, we can associate a particular tuple {u
1 ; : : :; u
K }, where u
i
is
the fraction of the domain satisfying atom A
i
in W :
Definition 3.10: Givena world W 2 W
003
, we define 031{W } 2 001
K
to be
{jjA
1
{x}jj
x
; jjA
2 {x}jj
x ; : : : ; jjA
K
{x}jj
x
}
where the values of the proportions are interpreted over W . We say that
the vector 031{W }
is the point associated with W .We define the entropy ofany model W to be
the entropy of 031{W }; that is, if 031{W } =
{u 1
; : : : ; u
K
}, then the entropy of W is H{u
1 ;: : : ; u
K
}. As we are about to show, the entropy
of ~u turns out to be a very good asymptotic indicator of how many worlds
W there are such that 031{W } =~u. In fact, there are so many more worlds
near points of high entropy that
we can ignore all the other points when computing degrees of belief. This
concentration
phenomenon, as Jaynes {1982} has called it, is essentially the content of
the next lemma
andjustifies our interest in the maximum-entropy point{s} of S
~034
[KB].
For any S 022 001
K
let #worlds ~034
N
[S]{KB} denote the number of worlds W of size N
such that {W; ~034}|= KB andsuch that 031{W } 2 S; for any ~u 2 001
K let #worlds
~034 N
[~u]{KB}
abbreviate #worlds ~034
N
[{~u}]{KB}. Of course #worlds
~034
N
[~u]{KB} is necessarily zero unless all components of ~u are multiples of
1=N . However, if there are any models associated with ~u
at all, we can estimate their number quite accurately using the entropy
function:
Lemma 3.11: There exist some function h : IN ! IN andtwo strictly positive
polynomial
functions f; g : IN !IR such that, for KB 2 L
031
1
and~u 2 001
K
, if #worlds
~034
N [~u]{KB} 6= 0,
then
{h{N }=f {N })e
NH{~u}
024 #worlds ~034
N
[~u]{KB} 024 h{N }g{N }e
NH{~u}
:
Of course, it follows from the lemma that tuples whose entropy is near maximum
have
overwhelmingly more worlds associated with them than tuples whose entropy
is further from maximum. Thisis essentially the concentration phenomenon.
Lemma 3.11 is actually fairly easy to prove. The following simple example
illustrates
themain idea.
Example 3.12: Suppose 010={P} and KB =true: We have 001
K
= 001 2
={{u 1
; 1 000 u 1
} : 0 024 u
1
024 1} ;
where the atoms are A 1
= P and A 2
= :P . For any N , partition the worlds in W
N
according to the point to which they correspond. For example, the graph
in Figure 1 shows us the partition of W
4 . In general, consider some point~u = {r=N; {N 000 r}=N }. The number
of worlds corresponding to ~u is simply the number of ways of choosing the
denotation of P . We need to choose which r elements satisfy P ; hence, the
number of such worlds is 000
N
r
001
=
N!r!{N000r}!
. Figure 2 shows the qualitative behavior of this function for large values
of
N . It is easy to see the asymptotic concentration around ~u = {0:5; 0:5}.
47Grove, Halpern, & Koller00.250.50.751||P{x}||x#worlds4Figure 1: Partitionof
W 4
according to 031{W }.
00.10.20.30.40.50.60.70.80.91Figure 2: Concentrationphenomenon for worlds
of size N .
We can estimate the factorials appearing in this expression using Stirling's
approx-
imation, which asserts that the factorial m! is approximately m
m
= e
m ln m
. So, after
substituting for the three factorials, we can estimate 000
N
r
001
as e
N log N000{r log r+{N000r} log{N000r})
,
which reduces to e
NH{~u}
. Theentropy term in the general case arises from the use of Stir-
ling's approximation in an analogous way. {Amore careful estimate is done
in the proof of
Lemma 3.11 in the appendix.}48Random Worlds and Maximum Entropy
Because of the exponential dependence onN times the entropy, the number
of worlds
associated with points of high entropy swamp all other worlds as N grows
large. This
concentration phenomenon, well-known in the field of statistical physics,
forms the basis
for our main result in this section. It asserts that it is possible to compute
degrees of
belief according to random worlds while ignoring all but those worlds whose
entropy is near
maximum. The next theorem essentially formalizes this phenomenon. Theorem
3.13: For all sufficiently small ~034 , the following is true. Let Q be
the points with greatest entropy in S
~034
[KB] and let O 022 IR
K be any open set containing Q. Then for all
022 2 L
031
and for lim
003
2 flim sup; lim inf} we have lim
N!1
003 Pr
~034
N {022|KB } = lim
N!1
003
#worlds
~034
N
[O]{022 ^ KB }#worlds ~034
N
[O]{KB}
:
We remark that this is quite a difficult theorem. We have discussed why
Lemma 3.11 lets
us look at models of KB whose entropy is {near} maximum. But the theorem
tells us to look at the maximum-entropy points of S
~034
[KB ], which we defined using a {so far unmotivated}
syntactic procedure applied to KB. It seems reasonable to expect that S
~034
[KB] should tell
us something aboutmodels of KB. But making this connection precise, and
in particular showing how the maximum-entropy points of S
~034
[KB ] relate to models of KB withnear-
maximum entropy, is difficult. However, we defer all details of the proof
of that result to
the appendix.
In general, Theorem 3.13 may seem to be of limited usefulness: knowing that
we only
have to look at worlds near the maximum-entropy point does not substantially
reduce the number of worlds we need to consider. {Indeed, the whole point
of the concentration
phenomenon is that almost all worlds have high entropy.} Nevertheless,as
the rest of this
paper shows, this result can be quite useful when combined with the following
two results.
The first of these says that if all the worlds near the maximum-entropy
points have a certain property, then we should have degree of belief 1 that
this property is true.
Corollary 3.14: For all sufficiently small ~034 , the following is true.
Let Q be the points with
greatest entropy in S
~034 [KB], let O 022 IR
K
be an open set containing Q, and let 022[O] 2 L
=
be
an assertion that holds for every world W such that 031{W } 2 O. Then
Pr
~034
1 {022[O]|KB} = 1:
Example 3.15: For the knowledge base true in Example 3.12, it is easy to
see that the
maximum-entropy point is {0:5; 0:5}. Fix some arbitrary
*  > 0. Clearly, there is some open
set O around this point such that the assertion 022 = jjP {x}jj
x 2 [0:5 000
* ;0:5 +
* ] holds for every world in O. Therefore, we can conclude that
Pr
~034
1
{jjP {x}jj
x
2 [0:5 000
* ; 0:5 +
* ] |true} = 1:As we show in {Bacchus et al., 1994}, formulas 022 with degree
of belief 1 can essentially
be treated just like other knowledge in KB . That is, the degrees of belief
relative to KB
and KB ^022 will be identical {even if KB and KB ^022 are not logically
equivalent}. More
formally: 49Grove, Halpern, & Koller Theorem 3.16: {Bacchus et al., 1994}
If Pr
~034
1
{022|KB} = 1 and lim
003
2 flim sup; lim inf},
then for any formula ':
lim
N!1 003
Pr
~034 N
{'|KB} = lim
N!1
003 Pr
~034
N {'|KB ^022}:
Proof: For completeness, we repeat the proof from {Bacchus et al., 1994}
here. Basic
probabilistic reasoning shows that, for any N and ~034 :
Pr
~034
N
{'|KB} = Pr
~034
N
{'|KB ^ 022} Pr ~034
N
{022|KB} + Pr
~034
N
{'|KB ^ :022} Pr
~034
N
{:022|KB }:
By assumption, Pr
~034
N
{022|KB } tends to 1 when we take limits, so the first term tends to
Pr
~034
N
{'|KB ^ 022}. On the other hand, Pr ~034
N
{:022|KB } has limit 0. BecausePr
~034
N
{'|KB ^ :022} is
bounded, we conclude that the second product also tends to 0. The result
follows.As we shall see in the next section, the combination of Corollary
3.14 and Theorem 3.16
is quite powerful. 4. Computingdegrees of belief Although the concentration
phenomenon is interesting, its application to actually computing degrees
of belief may not be obvious. Since we know that almost all worlds will have
high
entropy, a direct application of Theorem 3.13 does not substantially reduce
the number of
worlds we must consider. Yet, as we show in this section, the concentration
theorem can
form the basis of a practical technique for computing degrees of belief
in many cases. We
begin in Section 4.1 by presenting the intuitions underlying this technique.
In Section 4.2
we build on these intuitions by presenting results for a restricted class
of formulas: those queries which are quantifier-free formulas over a unary
language with a single constant symbol. In spite of this restriction, many
of the issues arising in the general case can be
seen here. Moreover, as we show in Section 4.3, this restricted sublanguage
is rich enough
to allow us to embed two well-known propositional approaches that make use
of maximum entropy: Nilsson's probabilistic logic {Nilsson, 1986}and the
maximum-entropy extension
of
* -semantics {Geffner & Pearl, 1990} due to Goldszmidt, Morris, Pearl {1990}
{Goldszmidt, Morris, & Pearl, 1993}). In Section 4.4, we consider whether
the results for
the restricted language can be extended. We show that they can, but several
difficult and
subtle issues arise.
4.1 The general strategy Although the random-worlds method is defined by
counting worlds, we can sometimes find
more direct ways to calculate the degrees of belief it yields. In {Bacchus
et al., 1994} we present a number of such techniques, most of which apply
only in very special cases. One
of the simplest and most intuitive is the following version of what philosophers
have termed
direct inference {Reichenbach, 1949}. Suppose that all we know about an
individual cis
some assertion  {c}; in other words, KB hasthe form  {c} ^ KB
0
, and the constant c does
not appear in KB
0
. Also suppose that KB , together with a particular tolerance ~034 , implies
that k'{x}| {x}k
x
is in some interval [ff; fi]. It seems reasonable to argue that c is should
be treated as a \typical" element satisfying  {x}, because by assumption
KB containsno
50Random Worlds and Maximum Entropy
information suggesting otherwise. Therefore, we might hope to use the statistics
directly,
and conclude that Pr ~034
1
{'{c}|KB} 2 [ff; fi]: This is indeed the case, as the following theorem
shows.
Theorem 4.1: {Bacchus et al., 1994} Let KB be a knowledge base of the form
{~c } ^ KB
0
,
and assume that for all sufficiently small tolerance vectors ~034 ,
KB [~034]|= k'{~x}| {~ x}k
~ x
2 [ff; fi]:
If no constant in ~c appears in KB 0
, in '{~x}, or in  {~x}, then Pr
1 {'{~c }|KB} 2 [ff; fi] {if
the degree of belief exists at all}.
This result, in combination with the results of the previous section, provides
us with a
very powerful tool. Roughlyspeaking, we propose to use the following strategy:
Thebasic
concentration phenomenon says that most worlds are very similar in a certain
sense. As
shown in Corollary 3.14, we can use this to find some assertions that are
\almost certainly"
true {i.e., with degree of belief 1} even if they are not logically implied
by KB . Theorem 3.16
then tells us that we can treat these new assertions as if they are in fact
known with certainty. When these new assertions state statistical \knowledge",
they can vastly increase
our opportunities to apply direct inference. The following example illustrates
this idea.
Example 4.2: Consider a very simple knowledge base over a vocabulary containing
the
single unary predicate {P}:
KB ={jjP {x}jj
x
026
1
0:3}:
There are two atoms A
1
and A 2
over P, with A
1
= P and A
2
= :P . The solution space of
this KB given~034 isclearly S
~034
[KB ] ={{u
1
; u
2
} 2 001
2
: u
1
024 0:3+ 034
1}:
A straightforward computation shows that, for 034 1
< 0:2, this has a unique maximum-entropy
point ~v = {0:3+ 034 1
; 0:7 000 034
1
}.
Now, consider the query P {c}. For all
*  > 0, let 022[
* ] be the formula jjP {x}jj
x 2 [{0:3 +
034
1
} 000
* ; {0:3 + 034
1
} +
* ]. This satisfies the condition of Corollary 3.14, so it follows that
Pr
~034 1
{022[
* ]|KB} = 1: Using Theorem 3.16, we know that for lim
003
2 flim inf; lim sup},
lim
N!1
003
Pr
~034
N
{P {c}|KB} = lim
N!1
003
Pr
~034
N {P {c}|KB ^022[
* ]}:
But now we can use direct inference. {Note that here, our \knowledge" about
c is vacuous,
i.e., \true{c}".} We conclude that, if there is any limit at all, then
necessarily Pr
~034
1 {P {c}|KB ^022[
* ]} 2 [{0:3 + 034
1
} 000
* ; {0:3 + 034
1
} +
* ]:
So, for all
*  > 0,
Pr
~034
1
{P {c}|KB} 2 [{0:3+ 034
1
} 000
* ; {0:3 + 034
1
} +
* ]:
Since this is true for all
* , the only possible value for Pr
~034
1
{P {c}|KB} is 0:3 + 034
1
, which is the value of u
1
{i.e., jjP {x}jj
x
} at the maximum-entropy point. Note that it is also clear what
happens as ~034 tendsto
~
0: Pr 1
{P {c}|KB} is 0:3.51Grove, Halpern, & Koller This example demonstrates the
main steps of one possible strategy for computing degrees of belief. First
the maximum-entropy points of the space S
~034
[KB ] are computed as a function
of ~034 . Then, these are used to compute Pr ~034
1
{'|KB}, assuming the limit exists {if not, the
lim sup and lim inf of Pr
N
{'|KB} are computed instead}. Finally, we compute the limit of
this probability as ~034 goes to zero. Unfortunately, this strategy has
a serious potential problem. We clearly cannot compute Pr
~034
1 {'|KB} separately for each of the infinitely many tolerance vectors ~034
andthen take the limit as ~034 goesto 0. We might hope to compute this probability
as an explicit function of
~034 , and then compute the limit. For instance, in Example 4.2 Pr ~034
1
{P {c}|KB} was found to
be 0:3 + 034
1
, and so it is easy to see what happens as 034 1
! 0. But there is no reason to
believe that Pr
~034
1
{'|KB} is, in general, an easily characterizable function of ~034 . If
it is not,
then computing the limit as~034 goes to 0 can be difficult or impossible.
We would like to
find a way to avoid this explicit limiting process altogether. It turns
out that this is indeed
possible in some circumstances. The main requirement is that the maximum-entropy
points
of S
~034
[KB ] converge to the maximum-entropy points of S
~
0
[KB ]. {For future reference, notice
that S ~
0
[KB] is the closure of the solution space of the constraints obtained from
KB by
replacing all occurrences of 031
i by = and all occurrences of 026
i
by 024.} Inmany such cases,
we can compute Pr
1
{'|KB} directly in terms of the maximum-entropy points of S
~
0
[KB], without taking limits at all.
As the following example shows, this type of continuity does not hold in
general: the
maximum-entropy points of S
~034
[KB] do not necessarily converge to those of S
~
0
[KB].
Example 4.3: Consider the knowledge base
KB = {jjP {x}jj x
031
1
0:3 _ jjP{x}jj
x
031 2
0:4} ^ jjP{x}jj
x 6031
3
0:4 :
It is easy to see that S
~
0
[KB] is just{{0:3; 0:7}}: The point {0:4; 0:6} is disallowed by the
second conjunct. Now, consider S ~034
[KB] for ~034 >
~
0. If 034
2 024 034
3
, then S
~034
[KB] indeed does
not contain points where u
1
is near 0:4; the maximum-entropy point of this space is easily seen to be
0:3 + 034 1
. However, if 034
2
> 034 3
then there will be points in S
~034
[KB] where u
1 is
around 0:4; for instance, those where 0:4 + 034 3
< u
1 024 0:4 + 034 2
. Since these points have
a higher entropy than the points in the vicinity of 0:3, the former will
dominate. Thus,
the set of maximum-entropy points of S
~034
[KB ] does not converge to a single well-defined
set. What it converges to {if anything} depends on how ~034 goes to
~
0. This nonconvergence
has consequences for degrees of belief. It is not hard to show Pr
~034
1 {P {c}|KB} can be either
0:3 + 034
1
or 0:4 + 034
2
, depending on the precise relationship between 034
1
, 034
2 , and 034
3
. Itfollows
that Pr
1
{P {c}|KB} does not exist.We say that a degree of belief Pr
1
{'|KB} is not robust if the behavior of Pr
~034
1
{'|KB} {or of
liminf Pr ~034
N
{'|KB} and lim sup Pr ~034
N
{'|KB}) as ~034 goes to
~
0 depends on how ~034 goes to
~
0. In
other worlds, nonrobustness describes situations when Pr
1
{'|KB} does not exist because of sensitivity to the exact choice of tolerances.
We shall see a number of other examples of
nonrobustness in later sections.
Itmight seem that the notion of robustness is an artifact of our approach.
Inparticular,
it seems to depend on the fact that our language has the expressive power
to say that the two
tolerances represent a different degree of approximation, simply by using
different subscripts 52Random Worlds and Maximum Entropy
{031
2
vs.031
3
in the example}. In an approach to representing approximate equality that
does
not make these distinctions, we are bound to get the answer 0:3 in the example
above, since then jjP {x}jj
x
6031
3 0:4 really would be the negation of jjP {x}jj
x
031
2 0:4. We would argue that the answer 0:3 is not as reasonable as it might
at first seem. Suppose one of the two
different instances of 0:4 in the previous example had been slightly different;
for example, suppose we had used 0:399 rather than 0:4 in the first of them.
In this case, the second
conjunct is essentially vacuous, and can be ignored. The maximum-entropy
point in S
~
0
[KB]
is now 0:399, and we indeed derive a degree of belief of 0:399 in P {c}.
Thus, arbitrarily small
changes to the numbers in the original knowledge base can cause large changes
in our degrees
of belief. But these numbers are almost always the result of approximate
observations; this
is reflected by our decision to use approximate equality rather than equality
when referring
to them. It does not seem reasonable to base actions on a degree of belief
that can change
so drastically in the face of small changes in the measurement of data.
Note that, if we
know that the two instances of 0:4 do, in fact, denote exactly the same
number, we can represent this by using the same approximate equality connective
in both disjuncts. In this
case, it is easy to see that we do get the answer 0:3. A close look at the
example shows that the nonrobustness arises because of the negated proportion
expression jjP {x}jj
x
6031
3
0:4. Indeed, we can show that if we start with a KB
in canonical form that does not contain negated proportion expressions then,
in a precise
sense, the set of maximum-entropy points of S
~034
[KB] necessarily converges to the set of
maximum-entropypoints of S
~
0 [KB]. An argument can be made that we should eliminate
negated proportion expressions from the language altogether. It is one thing
to argue
that sometimes we have statistical values whose accuracy we are unsure about,
so that we want to make logical assertions less stringent than exact numerical
equality. It is harder
to think of cases in which the opposite is true, and all we know is that
some statistic is
\not even approximately" equal to some value. However, we do not eliminate
negated
proportion expressions from the language, since without them we would not
be able to
prove an analogue to Theorem 3.5. {They arise when we try to flatten nested
proportion
expressions, for example.} Instead, we have identified a weaker condition
that is sufficient
to prevent problems such as that seen in Example 4.3. Essential positivity
simply tests that negations are not interacting with the maximum-entropy
computation in a harmful way. Definition 4.4: Let 000 024
{KB[
~
0]} be the result of replacing each strict inequality in 000{KB[
~
0]}
with its weakened version. More formally, we replace each subformula of
the form t < 0
with t 024 0, and each subformula of the form t > 0 with t 025 0. {Recall
that these are the only constraints possible in 000{KB [
~
0]}, since all tolerance variables " i
are assigned 0.} Let S
024
~
0
[KB] beSol [000
024 {KB[
~
0]}], where we useX to denote the closure of X. Wesay that KB
is essentially positive if the sets S
024
~
0 [KB] and S
~ 0
[KB] have the same maximum-entropy
points.Example 4.5: Consider again the knowledge base KB from Example 4.3.
The constraint
formula 000{KB[
~ 0]} is {after simplification}: {u
1
= 0:3_ u
1
= 0:4}^ {u
1
< 0:4_ u
1
> 0:4}:
Its \weakened" version is 000
024
{KB[ ~
0]}:
{u 1
= 0:3_ u
1
= 0:4}^ {u
1
024 0:4_ u
1 025 0:4};
53Grove, Halpern, & Koller which is clearly equivalent tou
1
= 0:3 _ u
1
= 0:4. Thus, S
~ 0
[KB ] = {{u
1
; u
2
} 2 001 2
:
u
1 024 0:3} whereas S
024
~
0 [KB ] = S
~ 0
[KB] [ f{0:4; 0:6}}. Since the two spaces have different
maximum-entropy points, the knowledge base KB isnot essentially positive.As
the following result shows, essential positivity suffices to guarantee that
the maximum-
entropy points of S
~034
[KB] converge to those of S
~
0
[KB].
Proposition 4.6: Assume that KB is essentially positive and let Q be the
set of maximum- entropy points of S
~ 0
[KB] {and thus also of S
024
~
0 [KB]}. Then for all
*  > 0 and all sufficiently small tolerance vectors ~034 {where \sufficiently
small" may depend on
* }, every maximum-
entropypoint of S
~034
[KB] is within
*  of some maximum-entropy point in Q.
4.2 Queries for a single individual
We now show how to compute Pr
1
{'|KB} for a certain restricted class of first-order for-
mulas ' and knowledge bases KB. The most significantly restriction is that
the query '
should be a quantifier-free {first-order} sentence over the vocabulary P
[ fc}; thus, it is a queryabout a single individual, c. While this class
is rather restrictive, it suffices to express
many real-life examples. Moreover, it is significantly richer than the language
considered
by Paris and Vencovska {1989}. The following definition helps define the
class of interest.
Definition 4.7: A formula is essentially propositional ifit is a quantifier-free
and proportion-
free formula in the language L
031
{{P 1
; : : : ; P k
}} {so that, in particular, it has no constant
symbols}and has only one free variable x.We say that '{c} is a simple query
for KB if:

*  '{x} is essentially propositional,

*  KB isof the form  {c} ^ KB
0 , where  {x} is essentially propositional and KB
0
does
not mention c.
Thus, just as in Theorem 4.1, we suppose that  {c} summarizes all that is
c. In this section, we focus on computing the degree of belief Pr
1 {'{c}|KB} for a formula
'{c} which is a simple query for KB .
Note that an essentially propositional formula 030{x} is equivalent to
a disjunction of
atoms.For example, over the vocabulary{P
1
; P
2
}, the formula P
1
{x} _ P
2
{x} is equivalent
toA 1
{x}_A
2
{x}_A
3
{x} {where the atoms are ordered as in Example 3.2}. For an essentially
propositional formula 030, we take A{030} be the {unique} set of atoms
such that 030 is equivalent to
W
A
j
2A{030}
A
j
{x}. If we view a tuple~u 2 001
K
as a probability assignment to the atoms, we can extend~u to
a probability assignment on all essentially propositional formulas using
this identification
of an essentially propositional formula with a set of atoms: Definition
4.8: Let 030 be an essentially propositional formula. We define a function
F
[030]
:
001 K
!IR as follows: F
[030]
{~u} =
X
A
j
2A{030}
u
j
:
54Random Worlds and Maximum Entropy
For essentially propositional formulas '{x} and  {x} we define the {partial}
function F
['j ] :
001
K
! IR to be:
F
['j ]
{~u} =
F
['^ ]
{~u}F [ ]
{~u}
:
Note that this function is undefined when F
[ ]
{~u} = 0.As the following result shows, if ' is a simple query for KB {of
the form  {c}^KB
0
}, then all that matters in computing Pr 1
{'|KB} is F
['j ] {~u} for tuples~u of maximum entropy. Thus, in a sense, we are only
using KB
0
to determine the space over which we maximize
entropy. Having defined this space, we can focus on   and ' in determining
the degree of belief.
Theorem 4.9: Suppose '{c} is a simple query for KB . For all ~034 sufficiently
small, if Q
is the set of maximum-entropy points in S
~034 [KB] and F
[ ]
{~v} > 0 for all ~v 2 Q, then for
lim
003
2 flim sup; lim inf} we have
lim N!1
003
Pr ~034
N
{'{c}|KB} 2
"
inf
~v2Q
F
['j ]
{~v}; sup
~v2Q F
['j ] {~v}
# :
The following is an immediate but important corollary of this theorem. It
asserts that, if
the space S
~034
[KB] has a unique maximum-entropy point, then its value uniquely determines
the probability Pr
~034 1
{'{c}|KB}.
Corollary 4.10: Suppose '{c} is a simple query for KB. For all ~034 sufficiently
small, if ~v
is the unique maximum-entropy point in S
~034
[KB] and F
[ ]
{~v} > 0, then Pr
~034
1 {'{c}|KB} = F
['j ]
{~v}:
We are interested in Pr 1
{'{c}|KB}, which means that we are interested in the limit of
Pr ~034
1
{'{c}|KB} as ~034 !
~ 0. Suppose KB isessentially positive. Then, by the results of the previous
section and the continuity of F
['j ] , it is enough to look directly at the maximum-
entropy points of S
~
0
[KB]. More formally, by combining Theorem 4.9 with Proposition 4.6,
we can show:
Theorem 4.11: Suppose '{c} is a simple query for KB . If the space S
~
0
[KB] has a unique
maximum-entropy point~v, KB is essentially positive, and F
[ ] {~v} > 0, then
Pr
1
{'{c}|KB} = F
['j ]
{~v}: We believe that this theorem will turn out to cover a lot of cases
that occur in practice.
As our examples and the discussion in the next section show, we often do
get simple queries
and knowledge bases that are essentially positive. Concerning the assumption
of a unique
maximum-entropy point, note that the entropy function is convex and so this
assumption
is automatically satisfied if S
~
0
[KB] is a convex space. Recall that a space S is convex if
for all~u;~ u
0
2 S, and all ff 2 [0; 1], it is also the case that ff~u + {1 000 ff}~u
0
2 S. The space
S ~
0
[KB] is surely convex if it is defined using a conjunction of linear constraints.
While it
is clearly possible to create knowledge bases where S
~
0 [KB] has multiple maximum-entropy
55Grove, Halpern, & Koller points{for example, using disjunctions}, we expect
that such knowledge bases arise rarely in
practical applications. Perhaps the most restrictive assumption made by
this theorem is the
seemingly innocuous requirement that F
[ ]
{~v} > 0. This assumption is obviously necessary
forthe theorem to hold; without it, the function F
['j ]
is simply not defined. Unfortunately,
we show in Section 4.4 that this requirement is, in fact, a severe one;
in particular, it prevents
the theorem from being applied to most examples derived from default reasoning,
using our statistical interpretation of defaults {Bacchus et al., 1994}.
We close this subsection with an example of the theorem in action. Example
4.12: Let the language consist of P ={Hepatitis; Jaundice; BlueEyed} and
the
constant Eric. Thereare eight atoms in this language. We use A P
0
1
P
0
2
P 0
3
to denote the atom P
0
1
{x}^P
0
2 {x}^P
0 3
{x}, where P
0
1
is either H {denoting Hepatitis} orH {denoting :Hepatitis},
P
0 2
is J orJ {for Jaundice and :Jaundice, respectively}, and P
0
3
is B orB {for BlueEyed
and :BlueEyed , respectively}.
Consider the knowledge base KB
hep
: 8x {Hepatitis{x} } Jaundice{x}) ^
kHepatitis{x}|Jaundice{x}k
x
031 1
0:8 ^ jjBlueEyed{x}jj x
031
2
0:25 ^
Jaundice{Eric}:
If we order the atoms as A
HJB
,A
HJB ,A
HJ B
,A
HJB
, AH JB ,AH JB
,AHJ B
,AHJB
, then it is not hard to show that 000{KB
hep
} is:
u
3
= 0 ^ u
4
= 0 ^
{u
1
+ u
2
} 024 {0:8 + " 1
}{u
1 + u
2
+ u
5
+ u
6
} ^
{u
1
+ u
2
} 025 {0:8 000 "
1
}{u
1
+ u 2
+ u
5 + u
6
} ^
{u
1
+ u
3
+ u 5
+ u
7 } 024 {0:25 + "
2
} ^ {u
1
+ u
3
+ u
5
+ u
7
} 025 {0:25 000 "
2
} ^ {u
1
+ u
2
+ u
5
+ u
6
} > 0:
To find the space S
~
0
[KB
hep ] we simply set "
1
= "
2
= 0. Then it is quite straightforward to find the maximum-entropy point
in this space, which, taking fl = 2 1:6
, is:
{v
1
; v 2
; v
3
; v
4
; v
5
; v
6
; v
7
; v
8
} =
022 15 + fl ;
35 + fl
; 0; 0; 14{5 + fl}
;
34{5 + fl}
;
fl4{5 + fl}
;
3fl4{5 + fl}
:
Using ~v, we can compute various asymptotic probabilities very easily. For
example,
Pr
1 {Hepatitis{Eric}|KB
hep
} = F
[HepatitisjJaundice]
{~v}
=
v
1
+ v
2v
1
+ v
2
+ v
5 + v
6
= 15+fl +
35+fl15+fl
+ 35+fl +
14{5+fl}
; 34{5+fl}
= 0:8;
as expected. Similarly, wecan show that Pr
1
{BlueEyed{Eric}|KB
hep } = 0:25 and that
Pr
1
{BlueEyed{Eric} ^ Hepatitis{Eric}|KB
hep } = 0:2. Note that the first two answers also
56Random Worlds and Maximum Entropy
follow from the direct inference principle {Theorem 4.1}, which happens
to be applicable
in this case. The third answer shows that BlueEyed andHepatitis are being
treated as
independent. It is a special case of a more general independence phenomenon
that applies
to random worlds; see {Bacchus et al., 1994, Theorem 5.27}.4.3 Probabilistic
propositional logic
In this section we consider two variants of probabilistic propositional
logic. As the following discussion shows, both can easily be captured by
our framework. The embedding we discuss
uses simple queries throughout, allowing us to appeal to the results of
the previous section.
Nilsson {1986} considered the problem of what could be inferred about the
proba-
bility of certain propositions given some constraints. For example, we might
know that
Pr{fly|bird} 025 0:7 and that Pr{yellow} 024 0:2, and be interested in
Pr{fly|bird ^yellow }.
Roughly speaking, Nilsson suggests computing this by considering all probability
distri-
butions consistent with the constraints, and then computing the range of
values given to
Pr{fly|bird ^yellow } by these distributions. Formally, suppose our language
consists of k
primitive proposition, p
1
; : : : ; p k
. Consider the set 012 of K = 2
k
truth assignments these
propositions. We give semantics to probabilistic statements over this language
in terms of
a probability distribution 026 over the set 012 {see {Fagin, Halpern,
& Megiddo, 1990} for de-
tails}. Since each truth assignment ! 2 012 determines the truth value
of every propositional
formula fi, we can determine the probability of every such formula:
Pr
026 {fi} =
X !j=fi
026{!}:
Clearly, we can determine whether a probability distribution 026 satisfies
a set 003 of proba-
bilistic constraints. The standard notion of probabilistic propositional
inference would say
that 003 |= Pr{fi} 2 [025
1
; 025 2
] if Pr
026 {fi} is within the range [025
1
; 025 2
] for every distribution 026 that
satisfies the constraints in 003.
Unfortunately, while this is a very natural definition, the constraints
that one can derive
from it are typically quite weak. For that reason, Nilsson suggested strengthening
this no-
tion of inference by applying the principle ofmaximum entropy: rather than
considering all
distributions 026 satisfying 003, we consider only the distribution{s}
026
003 that have the greatest
entropy among those satisfying the constraints. As we now show, one implication
of our results is that the random-worlds method provides a principled motivation
for this introduc-
tion of maximum entropy to probabilistic propositional reasoning. In fact,
the connection
between probabilistic propositional reasoning and random worlds should now
be fairly clear:
*  The primitive propositions p
1
; : : : ; p k
correspond to the unary predicates P
1
; : : : ; P
k
.
*  A propositional formula fi over p
1
; : : : ; p
k
corresponds uniquely to an essentially propo- sitional formula 030
fi as follows: we replace each occurrence of the propositional symbol p
i
with P i
{x}.

*  The set 003 of probabilistic constraints corresponds to a knowledge base
KB
0
[003]|a
constant-free knowledge base containing only proportion expressions. The
correspon- dence is as follows:
57Grove, Halpern, & Koller { Aprobability expression of the form Pr{fi}
appearing in 003 is replaced by the
proportion expression jj030
fi
{x}jj
x
. Similarly, a conditional probability expression
Pr{fi|fi
0 }is replaced by k030 fi
{x}|030
fi
0
{x}k
x
. { Each comparison connective = is replaced by 031
i
for some i, and each 024 with 026
i
. {The particular choices for the approximate equality connectives do not
matter in this context.}
The other elements that can appear in a proportion formula {such as rational
num-
bers and arithmetical connectives} remain unchanged. For example, the formula
Pr{fly|bird} 025 0:7 would correspond to the proportion formula kFly{x}|Bird
{x}k
x
027
i
0:7.

*  There is a one-to-one correspondence between truth assignments and atoms:
the truth
assignment ! corresponds to the atom A = P
0
1
^ : : : ^ P
0
k
where P
0 i
is P
i if !{p
i } = true
and :P
i
otherwise. Let !
1
; : : :; ! K
be the truth assignments corresponding to the
atoms A 1
; : : :; A
K
, respectively.
*  There is a one-to-one correspondence between probability distributions
over the set
012 of truth assignments and points in 001
K
. For each point ~u 2 001
K
, let 026
~u
denote the
corresponding probability distribution over 012, where 026 ~u
{!
i
} = u
i .
Remark 4.13: Clearly, !
j
|= fi iff A
j
2 A{030
fi
}. Therefore, for all ~u, we have
F
[030
fi
] {~u} = Pr 026
~u
{fi}:The following result demonstrates the tight connection between probabilistic
proposi- tionalreasoning using maximum entropy and random worlds.
Theorem 4.14 : Let 003 be a conjunction of constraints of the form Pr{fi|fi
0
} = 025 or
Pr{fi|fi
0
} 2 [025
1
; 025 2
]. There is a unique probability distribution 026
003
of maximum entropy satisfying 003. Moreover, for all fi and fi
0
, if Pr
026 003
{fi
0
} > 0, then Pr
1
{030 fi
{c}|030
fi
0
{c} ^ KB
0 [003]} = Pr
026 003
{fi|fi 0
}:
Theorem 4.14 is an easy corollary of Theorem 4.11. To check that the preconditions
of the latter theorem apply, note that the constraints in 003 are linear,
and so the space S
~
0
[KB
0
[003]] has a unique maximum-entropy point ~v. In fact, it is easy to show
that 026
~v
is
the {unique} maximum-entropy probability distribution over 012 satisfying
the constraints
003. In addition, because there are no negated proportion expressions in
003, the formula
KB=030 fi
0
{c}^ KB
0
[003] is certainly essentially positive.
Most applications of probabilistic propositional reasoning consider simple
constraints
of the form used in the theorem, and so such applications can be viewed
as very special cases of the random-words approach. In fact, this theorem
is essentially a very old one.
The connection between counting \worlds" and the entropy maximum in a space
defined
as a conjunction of linear constraints is very well-known. It has been extensively
studied inthe field of thermodynamics, starting with the 19th century work
of Maxwell and Gibbs. Recently, this type of reasoning has been applied to
problems in an AI context by Paris and
58Random Worlds and Maximum Entropy
Vencovska {1989} and Shastri {1989}. The work of Paris and Vencovska is
particularly rele-
vant because they also realize the necessity of adopting a formal notion
of \approximation",
although the precise details of their approach differ from ours. To the
best of our knowledge, most of the work on probabilistic propositional reason-
ing and all formal presentations of the entropy/worlds connection {in particular,
those of {Paris & Vencovska, 1989; Shastri, 1989}) have limited themselves
to conjunctions of lin-
ear constraints. Our more general language gives us a great deal of additional
expressive
power. For example, it is quite reasonable to want the ability to express
that properties are {approximately} statistically independent. For example,
we may wish to assert that
Bird {x} and Yellow{x} are independent properties by saying jjBird{x} ^
Yellow{x}jj
x
031 jjBird{x}jj x
001 jjYellow{x}jj
x
. Clearly,such constraints are not linear. Nevertheless, our The-
orem 4.11 covers such cases and much more. A version of probabilistic propositional
reasoning has also been used to provide proba- bilistic semantics for default
reasoning {Pearl, 1989}. Herealso, the connection to random worlds is of
interest. In particular, it follows from Corollary 4.10 that the recent work
of
Goldszmidt, Morris, and Pearl {1990} can be embedded in the random-worlds
framework.
In the rest of this subsection, we explain their approach and the embedding.
Consider a language consisting of propositional formulas over the propositional
variables p
1
; : : :; p k
, and default rules of the form B ! C {read \B's are typically C's"}, where
B
and C arepropositional formulas. A distribution 026 is said to
* -satisfy a default rule B ! C
if 026{C|B} 025 1 000
* . In addition to default rules, the framework also permits the use of materialimplication
in a rule, as in B } C. A distribution 026 is said to satisfy such a rule
if 026{C|B} = 1. A parameterized probability distribution {PPD} is a collection{026
*
}

* >0
of
probabilitydistributions over 012, parameterized by
* . A PPD{026

* }

* >0

* -satisfies a set R of
rules if for every
* , 026

*
* -satisfies every default rule r 2 R and satisfies every non-default rule
r 2 R. A set R of default rules
* -entails B ! C iffor every PPD that
* -satisfies R,
lim
* !0
026

*
{C|B} = 1.
As shown in {Geffner & Pearl, 1990},
* -entailment possesses a number of reasonable
propertiestypically associated with default reasoning, including a preference
for more spe-
cific information. However, there are a number of desirable properties that
it does not have.
Among other things, irrelevant information is not ignored. {See {Bacchus
et al., 1994} for
an extensive discussion of this issue.}
* -semantics is extended in {Goldszmidt et al.,
1990} by an application of the principle of maximum entropy. Instead of
considering all possible PPD's, as above, we consider only the PPD
n
026 003

* ;R
o

* >0
such that, for each
* ,
026 003

* ;R
has the maximum entropy among distributions that
* -satisfy all the rules in R. {See
{Goldszmidt et al., 1990} for precise definitions and technical details.}
Note that, since the
constraints used to define 026
003
* ;R
are all linear, there is indeed a unique such point of maximum entropy.
A rule B ! C is an ME-plausible consequence of R if lim

* !0
026
003

* ;R
{C|B} = 1.
The notion of ME-plausible consequence is analyzed in detail in {Goldszmidt
et al., 1990},
where it is shown to inherit all the nice properties of
* -entailment {such as the preference
for more specific information}, while successfully ignoring irrelevant information.
Equally
importantly, algorithms are provided for computing the ME-plausible consequences
of a set
of rules in certain cases.
59Grove, Halpern, & Koller Our maximum-entropy results can be used to show
that the approach of {Goldszmidt et al., 1990} can be embedded in our framework
in a straightforward manner. We simply
translate a default rule r of the form B ! C intoa first-order default rule
022 r
=
def
k030
C
{x}|030
B
{x}k
x
031 1
1;
as in our earlier translation of Nilsson's approach. Notethat the formulas
that arise under
this translation all use the same approximate equality connective 031
1
. The reason is that
the approach of {Goldszmidt et al., 1990} uses the same
*  for all default rules. We can
similarly translate a {non-default} rule r of the form B } C into a first-order
constraint using universal quantification: 022
r
=
def
8x {030 B
{x} } 030 C
{x}): Under this translation, we can prove the following theorem.
Theorem 4.15: Let c be a constant symbol. Using the translation described
above, for a
set R of defeasible rules, B ! C is an ME-plausible consequence of R iff
Pr 1

030
C
{c}
fi fi
fi
fi
fi
030
B
{c} ^
^ r2R
022
r !
= 1:
In particular, this theorem implies that all the computational techniques
and results
described in {Goldszmidt et al., 1990} carry over to this special case of
the random-worlds
method. It also shows that random-world provides a principled justification
for the approach
{Goldszmidt et al., 1990} present {one which is quite different from the
justification given
in {Goldszmidt et al., 1990} itself}.
4.4 Beyond simple queries
In Section 4.2 we restricted attention to simple queries. Our main result,
Theorem 4.11,
needed other assumptions as well: essentialpositivity, the existence of
a unique maximum-
entropy point ~v, and the requirement that F
[ ]
{~v} > 0. We believe that this theorem is useful
in spite of its limitations, as demonstrated by the discussion in Section
4.3. Nevertheless,
this result allows us to take advantage of only a small fragment of our
rich language. Can
we find a more general theorem? After all, the basic concentration result
{Theorem 3.13}
holds with essentially no restrictions. In this section we show that it
is indeed possible to
extend Theorem 4.11 significantly. However, there are serious limitations
and subtleties.
Weillustrate these problems by means of examples, and then state an extended
result.
Our attempt to address these problems {so far as is possible} leads to a
rather com-
plicated final result. In fact, the problems we discuss are as interesting
and important as
the theorem we actually give: they help us understand more of the limits
of maximum
entropy. Of course, every issue we discuss in this subsection is relatively
minor compared to maximum entropy's main {apparent} restriction, which concerns
the use of non-unary predicates. For the reader who is less concerned about
the other, lesser, issues we remark
that it is possible to skip directly to Section 5.
We first consider the restrictions we placed on the KB, and show the difficulties
that
arise if we drop them. We start with the restriction to a single maximum-entropy
point. As
60Random Worlds and Maximum Entropy
theconcentration theorem {Theorem 3.13} shows, the entropy of almost every
world is near
maximum. But it does not follow that all the maximum-entropy points are
surrounded by
similar numbers of worlds. Thus, in the presence of more than one maximum-entropy
point, we face the problem of finding the relative importance, or weighting,
of each maximum-
entropypoint. As the following example illustrates, this weighting is often
sensitive to the
tolerance values. For this reason, non-unique entropy maxima often lead
to nonrobustness.
Example 4.16: Suppose 010={P; c}, and consider the knowledge base KB =
{kP {x}k
x
026 1
0:3} _ {kP{x}k
x
027
2
0:7}:
Assume we want to compute Pr
1
{P {c}|KB}. In this case, S
~034
[KB] is
{{u
1
; u
2
} 2 001
2
: u 1
024 0:3+ 034
1
or u 1
025 0:7000 034
2
}; and S
~
0 [KB] is
{{u
1
; u
2
} 2 001
2 : u
1
024 0:3 or u
1 025 0:7}:
Note that S
~
0 [KB] has two maximum-entropy points: {0:3; 0:7} and {0:7; 0:3}.
Now consider the maximum-entropy points of S ~034
[KB] for ~034 >
~
0. It is not hard to show that if 034
1
> 034
2
, then this space has a unique maximum-entropy point, {0:3 + 034
1
;0:7 000 034 1
}.
In this case, Pr ~034
1
{P {c}|KB} = 0:3 + 034
1 . On the other hand, if 034 1
< 034
2 , then the unique
maximum-entropypoint of this space is {0:7 + 034
2
; 0:3 000 034
2 }, in which case Pr
~034
1
{P {c}|KB} =
0:7 + 034
2
. If 034
1
= 034
2 , then the space S
~034
[KB] has two maximum-entropy points, and by
symmetry we obtain that Pr
~034
1
{P {c}|KB} = 0:5. So, by appropriately choosing a sequence
of tolerance vectors converging to
~
0, we can make the asymptotic value of this fraction
either 0:3, 0:5, or 0:7. ThusPr
1
{P {c}|KB} does not exist.
It is not disjunctions per se that cause the problem here: if we consider
0
= {kP {x}k x
026
1
0:3} _ {kP {x}k
x
027
2
0:6}, then there is no difficulty. There is
a unique maximum-entropy point of S
~
0 [KB
0
]|{0:6; 0:4}|and the asymptotic probability
Pr
1
{P {c}|KB 0
} = 0:6, as we would want.
7In light of this example {and many similar ones we can construct}, we continue
to assume
that there is a single maximum-entropy point. As we argued earlier, we expect
this to be true in typical practical applications, so the restriction does
not seem very serious. We now turn our attention to the requirement that
F
[ ]
{~v} > 0. As we have already
observed, this seems to be an obvious restriction to make, considering that
the function F
['j ] {~v} is not defined otherwise. However, this difficulty is actually
a manifestation of a
much deeper problem. As the following example shows, any approach that just
uses the
maximum-entropypoint of S ~
0
[KB ] will necessarily fail in some cases where F
[ ]
{~v} = 0.
Example 4.17: Consider the knowledge base KB ={jjPenguin{x}jj
x
031 1
0}^ {kFly{x}|Penguin{x}k
x
031 2
0} ^ Penguin{Tweety}:7. We remark that it is also possible to construct
examples of multiple maximum-entropy points by using quadratic constraints
rather than disjunction. 61Grove, Halpern, & Koller Suppose we want to compute
Pr 1
{Fly{Tweety}|Penguin{Tweety}). We can easily conclude
from Theorem 4.1 that this degree of belief is 0, as we would expect. However,
we cannot reach this conclusion using Theorem 4.11 or anything like it. For
consider the maximum-
entropy point of S ~
0
[KB]. The coordinates v
1 , corresponding to Fly ^ Penguin, and v
2 ,
corresponding to :Fly ^Penguin, are both 0. Hence, F
[Penguin] {~v} = 0, so that Theorem 4.11
does not apply.
But, as we said, the problem is more fundamental. The information we need
{that the proportion of flying penguins is zero} is simply not present if
all we know is the maximum-
entropy point ~v. We can obtain the same space S
~
0
[KB] {and thus the same maximum- entropy point} from quite different knowledge
bases. In particular, consider KB
0
which
simply asserts that {jjPenguin{x}jj
x
031
1 0} ^ Penguin{Tweety}. This new knowledge base
tells us nothing whatsoever about the fraction of flying penguins, and in
fact it is easy to
show that Pr
1
{Fly{Tweety}|KB
0
} = 0:5. But of course it is impossible to distinguish this
case from the previous one just by looking at~v. It follows that no result
in the spirit of Theorem 4.11 {which just uses the value of~v} can be comprehensive.The
example shows that the philosophy behind Theorem 4.11 cannot be extended
very
far, if at all: it is inevitable that there will be problems when F
[ ]
{~v} = 0. But it is natural to
ask whether there is a different approach altogether in which this restriction
can be relaxed.
That is, is it possible to construct a technique for computing degrees of
belief in those cases
where F
[ ]
= 0? As we mentioned in Section 4.1, we might hope to do this by computing
Pr
~034
1 {'|KB} as a function of ~034 and then taking the limit as ~034 goes
to 0. In general, this
seems very hard. But, interestingly, the computational technique of {Goldszmidt
et al.,
1990} does use this type of parametric analysis, demonstrating that things
might not be
so bad for various restricted cases. Another source of hope is to remember
that maximum
entropy is, for us, merely one tool for computing random-worlds degrees
of belief. There
may be other approaches that bypass entropy entirely. In particular, some
of the theorems
we give in {Bacchus et al., 1994} can be seen as doing this; these theorems
will often apply
even if F
[ ]
= 0. Another assumption made throughout Section 4.2 is that the knowledge
base has a spe- cial form, namely  {c} ^ KB
0
, where   is essentially propositional and KB
0
does not contain any occurrences of c. The more general theorem we state
later relaxes this somewhat, as
follows.
Definition 4.18: A knowledge base KB issaid to be separable with respect
to query ' if
it has the form  ^ KB
0
, where   contains neither quantifiers nor proportions, and KB
0 containsnone of the constant symbols appearing in ' or in  . 8It should
be clear that if a query '{c} is simple for KB {as assumed in previous subsection},
then the separability condition is satisfied.
As the following example shows, if we do not assume separability, we can
easily run into
nonrobust behavior:
Example 4.19: Consider the following knowledge base KB over the vocabulary
010 ={P; c}:
{jjP {x}jj x
031
1
0:3 ^ P{c}) _ {jjP {x}jj
x
031
2
0:3 ^ :P {c}):8. Clearly, since our approach is semantic, it also suffices
if the knowledge base is equivalent to one of this
form. 62Random Worlds and Maximum Entropy
KB isnot separable with respect to the query P {c}. The space S
~
0 [KB] consists of a
unique point {0:3; 0:7}, which is also the maximum-entropy point. Both disjuncts
of KB
are consistent with the maximum-entropy point, so we might expect that the
presence
of the conjuncts P {c} and :P {c} in the disjuncts would not affect the
degree of belief.
That is, if it were possible to ignore or discount the role of the tolerances,
we would
expect Pr 1
{P {c}|KB} = 0:3. However, this is not the case. Consider the behavior of
Pr ~034
1
{P {c}|KB} for ~034 >
~
0. If 034
1
> 034 2
, then the maximum-entropy point of S
~034 [KB ] is
{0:3 + 034
1
;0:7 000 034
1 }. Now, consider some
*  > 0 sufficiently small so that 034 2
+
*  < 034 1
. By
Corollary 3.14, we deduce that Pr
~034
1
{(jjP {x}jj
x
> 0:3 + 034 2
}| KB} = 1. Therefore, by The-
orem 3.16, Pr
~034
1
{P {c}|KB} = Pr
~034
1 {P {c}| KB ^ {jjP {x}jj
x
> 0:3 + 034
2
}) {assuming the limit
exists}. But since the newly added expression is inconsistent with the second
disjunct, we
obtain that Pr
~034
1
{P {c}|KB} = Pr
~034 1
{P {c}| P {c} ^ {jjP {x}jj
x
031
1
0:3}) = 1, and not 0:3. On
the other hand, if 034
1
< 034
2
, we get the symmetric behavior, where Pr
~034
1
{P {c}|KB} = 0. Only
if034
1
= 034
2
do we get the expected value of 0:3 for Pr
~034
1
{P {c}|KB}. Clearly, by appropriately
choosing a sequence of tolerance vectors converging to
~ 0, we can make the asymptotic value of this fraction any of 0, 0:3, or
1, or not exist at all. Again, Pr
1
{P {c}|KB} is not robust.We now turn our attention to restrictions on the
query. In Section 4.2, we restricted
to queries of the form '{c}, where '{x} is essentially propositional. Although
we intend to
ease this restriction, we do not intend to allow queries that involve statistical
information. The following example illustrates the difficulties.
Example 4.20: Consider the knowledge base KB =jjP {x}jj x
031
1 0:3 and the query ' =
jjP {x}jj
x
031
2 0:3. It is easy to see that the unique maximum-entropy point of S
~034
[KB] is {0:3+
034 1
;0:7 000 034
1
}. First suppose 034
2
< 034 1
. From Corollary 3.14, it follows that Pr
~034 1
{(jjP {x}jj
x
> 0:3 + 034
2 }| KB} = 1. Therefore, by Theorem 3.16, Pr
~034 1
{'|KB} = Pr
~034
1
{'|KB ^ {jjP {x}jj x
>
0:3 + 034
2
}) {assuming the limit exists}. The latter expression is clearly 0. On
the other hand, if 034
1
< 034 2
, then KB [~034 ]|= '[~034 ], so that Pr ~034
1
{'|KB} = 1. Thus, the limiting behavior of
Pr ~034
1
{'|KB} depends on how ~034 goes to
~ 0, so that Pr
1
{'|KB} is nonrobust.The real problem here is the semantics of proportion
expressions in queries. While the
utilityof the 031 connective in expressing statistical information in the
knowledge base should
be fairly uncontroversial, its role in conclusions we might draw, such as
' in Example 4.20, is
much less clear. The formal semantics we have defined requires that we consider
all possible tolerances for a proportion expression in ', so it is not surprising
that nonrobustness is the
usual result. One might argue that the tolerances in queries should be allowed
to depend
more closely on tolerances of expressions in the knowledge base. It is possible
to formalize
this intuition, as is done in {Koller & Halpern, 1992}, to give an alternative
semantics for
dealing with proportion expressions in queries that often gives more reasonable
behavior. Considerations of this alternative semantics would lead us too
far afield here; rather, we
focus for the rest of the section on first-order queries.
In fact, our goal is to allow arbitrary first-order queries, even those
that involve predi-
cates of arbitrary arity and equality {although we still need to restrict
the knowledge base
to the unary language L
031
1 }. However, as the following example shows, quantifiers too can cause
problems.
Example 4.21: Let 010 ={P; c} and consider KB
1
= 8x :P {x}, KB
2
= jjP {x}jj
x
031
1
0, and
' = 9x P {x}. It is easy to see that S
~
0
[KB
1
] = S ~
0
[KB 2
] ={{0; 1}}, and therefore the unique 63Grove, Halpern, & Koller maximum-entropy
point in both is ~v = {0; 1}. However, Pr
1
{'|KB
1
} is clearly 0, whereas Pr
1
{'|KB
2
} is actually 1. To see the latter fact, observe that the vast majority
of models
of KB
2
around ~v actually satisfy 9x P {x}. Thereis actually only a single world
associated
with {0; 1} at which 9x P {x} is false. This example is related to Example
4.17, because it
illustrates another case in which S
~
0
[KB] cannot suffice to determine degrees of belief.In the case of the knowledge
base KB
1
, the maximum-entropy point {0; 1} is quite
misleadingabout the nature of nearby worlds. We must avoid this sort of
\discontinuity"
when finding the degree of belief of a formula that involves first-order
quantifiers. The
notion of stability defined below is intended to deal with this problem.
To define it, we first
need the following notion of a size description.
Definition 4.22: A size description {over P} is a conjunction of K formulas:
for each
atomA
j
over P, it includes exactly one of 9x A
j
{x} and :9x A
j
{x}. For ~u 2 001
K
, the size
description associated with ~u, written 033{~u}, is that size description
which includes :9x A
i {x}
if u i
= 0 and 9x A i
{x} if u i
>0.The problems that we want to avoid occur when there is a maximum-entropy
point ~v
with size description 033{~v} such that in a neighborhood of ~v, most of
the worlds satisfying
KB are associated with other size descriptions. Intuitively, the problem
with this is that the
of worlds near ~v, and
9
We give a sufficient condition which can be used to avoid this
problem in the context of our theorems. This condition is effective and
uses machinery {in
particular, the ability to find solution spaces} that is needed to use the
maximum-entropy
approach in any case. Definition 4.23: Let ~v be a maximum-entropy point
of S
~034
[KB]. We say that ~v is safe
{withrespect to KB and ~034 } if ~v is not contained in S ~034
[KB ^ :033{~v}]. We say that KB and
~034 are stable for 033 003
if for every maximum-entropy point ~v 2 S ~034
[KB] we have that 033{~v} = 033
003
and that ~v is safe with respect to KB and ~034 .The next result is the
key property of stability that we need. Theorem 4.24: If KB and~034 >
~
0 are stable for 033
003 then Pr
~034 1
{033
003|KB} = 1.
Our theorems will use the assumption that there exists some 033
003
such that, for all suf-
ficiently small ~034 , KB and ~034 are stable for 033 003
. We note that this does not imply that 033
003 is
necessarily the size description associated with the maximum-entropy point{s}
of S
~
0
[KB].
Example 4.25: Consider the knowledge base KB 2
in Example 4.21, and recall that ~v =
{0; 1} is the maximum-entropy point of S
~
0
[KB
2
]. The size description 033{~v} is :9x A
1 {x} ^
9x A 2
{x}. However the maximum-entropy point of S
~034
[KB 2
] for ~034 >0 is actually {034
1
; 1 000 034 1
},
so that the appropriate 033
003
for such a ~034 is9x A
1
{x} ^ 9x A
2
{x}.9. We actually conjecture that problems of this sort cannot arise in
the context of a maximum-entropy point of S
~034
[KB] for ~034 > ~
0. Moreprecisely, for sufficiently small ~034 anda maximum-entropy point
~v of S ~034
[KB] with KB 2 L
031 1
, we conjecture that Pr ~034
1
[O]{033{~ v}jKB} = 1 where O is an open set that contains ~v but no other
maximum-entropy point of S
~034
[KB]. If this is indeed the case, then the machinery of stability
that we are about to introduce is unnecessary, since it holds in all cases
that we need it. However, we
have been unable to prove this.
64Random Worlds and Maximum Entropy
As we now show, the restrictions outlined above and in Section 4.1 suffice
for our next
result on computing degrees of belief. In order to state this result, we
concept. Recall that in Section 4.2 we expressed an essentially propositional
formula '{x}
as a disjunction of atoms. Since we wish to also consider formulas ' using
more than
oneconstant and non-unary predicates, we need a richer concept than atoms.
This is the
motivation behind the definition of complete descriptions.
Definition 4.26: Let Z be some set of variables and constants. A complete
description D over 010 and Z is an unquantified conjunction of formulas
such that:
*  For every predicate R 2 010 [ f=} of arity r and for every z
i 1
; : : : ; z i
r
2 Z, D contains
exactly one of R{z
i
1 ; : : : ; z
i
r
} or :R{z
i
1
; : : : ; z
i
r
} as a conjunct.

*  D is consistent.
10Complete descriptions simply extend the role of atoms in the context of
essentially proposi-
tional formulas to the more general setting. As in the case of atoms, if
we fix some arbitrary
ordering of the conjuncts in a complete description, then complete descriptions
are mutu-
ally exclusive and exhaustive. Clearly, a formula 030 whose free variables
and constants are
contained in Z, and which is is quantifier- and proportion-free, is equivalent
to some dis- junctionof complete descriptions over Z. For such a formula
030, let A{030} be a set of complete
descriptions over Z such that 030 is equivalent tothe disjunction
W
D2A{030}
D, where Z is the
set of constants and free variables in 030.
For the purposes of the remaining discussion {except within proofs}, we
are interested
only in complete descriptions over an empty set of variables. For a set
of constants Z, we
can view a description D over Z as describing the different properties of
the constants in Z.
In our construction, when considering a KB ofthe form  ^ KB 0
which is separable with respect to a query ', we define the set Z tocontain
precisely those constants in ' and in
. In particular, this means that KB
0
will mention no constant in Z.
A complete description D over a set of constants Z canbe decomposed into
three parts:
the unary part D
1
which consists of those conjuncts of D that involve unary predicates
{and thus determines an atom for each of the constant symbols}, the equality
part D
=
which consists of those conjuncts of D involving equality {and thus determines
which of
the constants are equal to each other}, and the non-unary part D
>1
which consists of those conjuncts of D involving non-unary predicates {and
thus determines the non-unary
properties other than equality of the constants}. As we suggested, the unary
part of such
a complete description D extends the notion of \atom" to the case of multiple
constants.
Forthis purpose, we also extend F
[A]
{for an atom A} and define F
[D]
for a description D.
Intuitively, we are treating each of the individuals as independent, so
that the probability
that constant c
1
satisfies atom A
j
1
and that constant c 2
satisfies A
j
2
is just the product of
the probability that c 1
satisfies A
j
1
and the probability that c
2
satisfies A
j
2
. Definition 4.27 : For a complete description D without variables whose
unary part is
equivalent to A
j
1
{c
1
} ^ : : : ^ A
j
m {c
m
} {for distinct constants c
1
; : : :; c
m
} and for a point10. Inconsistency is possible because of the use of equality.
For example, if D includes z 1
= z
2 as well as
both R{z
1
; z 3
} and :R{z
2
; z 3
}, it is inconsistent. 65Grove, Halpern, & Koller ~u 2 001
K , we define
F
[D]
{~u} =
m
Y
`=1
u
j
`
:Note that F
[D]
is depends only on D
1
, the unary part of D.
As we mentioned, we can extend our approach to deal with formulas ' that
also use
non-unary predicate symbols. Our computational procedure for such formulas
uses the
maximum-entropy approach described above combined with the techniques of
{Grove et al., 1993b}. Theselatter were used in {Grove et al., 1993b} to
compute asymptotic conditional probabilities when conditioning on a first-order
knowledge base KB
fo
. The basic idea in
that case is as follows: To compute Pr
1 {'|KB
fo }, we examine the behavior of ' in finite
models of KB
fo
. We partition the models of KB
fo
into a finite collection of classes such that ' behaves uniformly in each
individual class. By this we mean that almost all worlds in the
class satisfy ' or almost none do; i.e., there is a 0-1 law for the asymptotic
probability of '
when we restrict attention to models in a single class. In order to compute
Pr
1
{'|KB
fo
} we
therefore identify the classes, compute the relative weight of each class
{which is required
because the classes are not necessarily of equal relative size}, and then
decide for each class
whether the asymptotic probability of ' is zero or one. It turns out that
much the same ideas continue to work in this framework. In this case,
the classes are defined using complete descriptions and the appropriate
size description 033
003
.
The main difference is that, rather than examining all worlds consistent
with the knowledge base, we now concentrate on those worlds in the vicinity
of the maximum-entropy points, as
outlined in the previous section. It turns out that the restriction to these
worlds affects very
few aspects of this computational procedure. In fact, the only difference
is in computing the relative weight of the different classes. Thislast step
can be done using maximum entropy,
using the tools described in Section 4.2.
Theorem 4.28: Let ' be a formula in L 031
and let KB = ^ KB
0
be an essentially positive
knowledge base in L
031
1 which is separable with respect to '. Let Z be the set of constants
appearing in ' or in   {so that KB
0 contains none of the constants in Z} and let 037
6=
be
the formula V
c;c
0
2Z
c 6= c
0
. Assume that there exists a size description 033
003 such that, for
all ~034 > 0, KB and ~034 are stable for 033 003
, and that the space S
~
0
[KB] has a unique maximum-
entropy point ~v. Then Pr
1
{'|KB} =
P
D2A{ ^037
6=
}
Pr 1
{'|033 003
^ D}F [D]
{~v}P
D2A{ ^037
6=
}
F [D]
{~ v}
if the denominator is positive. Since both ' and 033 003
^ D are first-order formulas and 033
003 ^ D is precisely of the required form
in {Grove et al., 1993b}, then Pr
1
{'|033 003
^ D} is either 0 or 1, and we can use the algorithm of {Grove et al., 1993b}
to compute this limit, in the time bounds outlined there. One corollary of
the above is that the formula 037
6=
holds with probability 1 given any knowledgebase KB of the form we are interested
in. This corresponds to a default assump-
tion of unique names, a property often considered to be desirable in inductive
reasoning
systems. 66Random Worlds and Maximum Entropy
While this theorem does represent a significant generalization of Theorem
4.11, it still
hasnumerous restrictions. There is no question that some of these can be
loosened to some
extent, although we have not been able to find a clean set of conditions
significantly more
general than the ones that we have stated. Weleave it as an open problem
whether such a set of conditions exists. Ofcourse, the most significant restriction
we have made is that of
allowing only unary predicates in the KB . Thisissue is the subject of the
next section.
5. Beyond unary predicates
The random-worlds method makes complete sense for the full language L 031
{and, indeed, for even richer languages}. On the other hand, our application
of maximum entropy is limited to unary knowledge bases. Is this restriction
essential? While we do not have a theorem to
this effect {indeed, it is not even clear what the wording of such a theorem
would be}, we
conjecture that it is.
Certainly none of the techniques we have used in this paper can be generalized
signif-
icantly. One difficulty is that, once we have a binary or higher arity predicate,
we see no
analogue to the notion of atoms and no canonical form theorem. InSection
3.2 and in the
proof of Theorem 3.5, we discuss why it becomes impossible toget rid of
nested quantifiers and proportions when we have non-unary predicates. Even
considering matters on a more intuitive level, the problems seem formidable.
In a unary language, atoms are useful be-
cause they are simple descriptions that summarize everything that might
domain element in a model. But consider a language with a single binary
predicate R{x; y}.
Worlds over this language include all finite graphs {where we think of R{x;
y} as holding if there is an edge from x to y}. In this language, there are
infinitely many properties that
maybe true or false about a domain element. For example, the assertions
\the node x has
m neighbors" are expressible in the language for each m. Thus, in order
to partition the domainelements according to the properties they satisfy,
we would need to define infinitely
many partitions. Furthermore, it can be shown that \typically" {i.e., in
almost all graphs
of sufficiently greatsize} each node satisfies a different set of first-order
properties. Thus, in most graphs, all the nodes are \different", so a partition
of domain elements into a finite
number of \atoms" makes little sense. It is very hard to see how the basic
proof strat-
egy we have used, of summarizing a model by listing the number of elements
with various
properties, can possibly be useful here.
The difficulty of finding an analogue to entropy in the presence of higher-arity
predicates
is supported by results from {Grove et al., 1993a}. In this paper we have
shown that maximum entropy can be a useful tool for computing degrees of
belief in certain cases, if
the KB involves only unary predicates. In{Grove et al., 1993a} we show that
there can be
no general computational technique to compute degrees of belief once we
have non-unary predicate symbols in the KB . The problem of finding degrees
of belief in this case is highly
undecidable. This result was proven without statistical assertions in the
language, and in
fact holds for quite weak sublanguages of first-order logic. {For instance,
in a language
without equality and with only depth-two quantifier nesting.} So even if
there is some
generalized version of maximum entropy, it will either be extremely restricted
in application or will be useless as a computational tool.
67Grove, Halpern, & Koller 6. Conclusion
This paper has had two major thrusts. Thefirst is to establish a connection
between max- imum entropy and the random-worlds approach for a significant
fragment of our language,
one far richer than that considered by Paris and Vencovska {1989} or Shastri
{1989}. The
second is to suggest that such a result is unlikely to obtain for the full
language.
The fact that we have a connection between maximum entropy and random worlds
is
significant. For one thing, it allows us to utilize all the tools that have
been developed for
computing maximum entropy efficiently {see {Goldman, 1987} and the further
references therein}, and may thus lead to efficient algorithms for computing
degrees of belief for a large
class of knowledge bases. In addition, maximum entropy is known to have
many attractive
properties {Jaynes, 1978}. Our result shows these properties are shared
by the random- worlds approach in the domain where these two approaches agree.
Indeed, as shown in {Bacchus et al., 1994}, the random-worlds approach has
many of these properties for the
full {non-unary} language.
On the other hand, a number of properties of maximum entropy, such as its
dependence on the choice of language and its inability to handle causal reasoning
appropriately, have been severely criticized {Pearl, 1988; Goldszmidt et
al., 1990}. Not surprisingly, these
criticisms apply to random worlds as well. A discussion ofthese criticisms,
and whether
theyreally should be viewed as shortcomings of the random-worlds method,
is beyond the
scope of this paper; the interested reader should consult {Bacchus et al.,
1994, Section 7}
for a more thorough discussion of these issues and additional references.
We believe that our observations regarding the limits of the connection between
the
random-worlds method and maximum entropy are also significant. The question
of how
widely maximum entropy applies is quite important. Maximum entropy has been
gaining
prominence as a means of dealing with uncertainty both in AI and other areas.
However, the difficulties of using the method once we move to non-unary predicates
seem not to havebeen fully appreciated. In retrospect, this is not that hard
to explain; in almost all
applications where maximum entropy has been used {and where its application
can be best
justified in terms of the random-worlds method} the knowledge base is described
in terms
of unary predicates {or, equivalently, unary functions with a finite range}.
For example, in
physics applications we are interested in such predicates as quantum state
{see {Denbigh
&Denbigh, 1985}). Similarly, AI applications and expert systems typically
use only unary
predicatessuch as symptoms and diseases {Cheeseman, 1983}. We suspect that
this is not an accident, and that deep problems will arise in more general
cases. This poses a challenge to
proponents of maximum entropy since, even if one accepts the maximum-entropy
principle,
the discussion above suggests that it may simply be inapplicable in a large
class of interesting
examples.
Appendix A. Proofs for Section 3.2 Theorem 3.5: Every formula in L
=
1
is equivalent to a formula in canonical form. More- over, there is an effective
procedure that, given a formula 030 2 L =
1
constructs an equivalent
formula
b
030 in canonical form.
68Random Worlds and Maximum Entropy
Proof: We show how to effectively transform 030 2 L =
1
to an equivalent formula in canonical
form. We first rename variables if necessary, so that all variables used
in 030 are distinct {i.e., no two quantifiers, including proportion expressions,
ever bind the same variable sym-
bol}.
We next transform 030 into an equivalent flat formula 030
f 2 L
031
1 , where a flat formula
is one where no quantifiers {including proportion quantifiers} have within
their scope a constant or variable other than the variable{s} the quantifier
itself binds. {Note that in this
transformation we do not require that 030 be closed. Also, observe that
flatness implies that there are no nested quantifiers.} We define the transformation
by induction on the structure of 030. There are three easy
steps:

*  If 030 is an unquantified formulas, then 030
f
= 030.

*  {030
0
_ 030
00 }
f
= 030 0
f
_ 030 00
f

*  {:030
0
}
f
= :{030 f
}.
All that remains is to consider quantified formulas of the form 9x 030
0
, jj030
0
jj ~x
, or k030
0
|030 00
k
~x . It
turns out that the same transformation works in all three cases. Weillustrate
the transfor-
mation by looking at the case where 030 is of the form jj030
0
jj ~x
. By the inductive hypothesis, we
can assume that 030
0
is flat. For the purposes of this proof, we define a basic formula to be
an atomic formula {i.e., one of the form P {z}), a proportion formula, or
a quantified formula
{i.e., one of the form 9x 037}. Let 037
1
;: : : ; 037
k
be all basic subformulas of 030
0
that do not mention
any variable in ~x. Letz be a variable or constant symbol not in ~x that
is mentioned in 030
0
. Clearly z mustoccur in some basic subformula of 030
0
, say 037
0
. By the inductive hypothesis, it is easy to see that 037 0
cannot mention any variable in ~x and so, by construction, it is in
{037
1
; : : : ; 037
`
}. In other words, not only do {037
1
; : : : ; 037
`
} not mention any variable in ~x, but they also contain all occurrences of
the other variables and constants. {Notice that this
argument fails if the language contains any high-arity predicates, including
equality. For
then 030 0
might include subformulas of the form R{x; y} or x = y, which can mix variables
outside~x with those in ~x.}
Now, let B 1
;: : : ; B
2
`
be all the \atoms" over 037
1
; : : : ; 037
`
. That is, we consider all formulas
037
0
1
^ : : : ^037
0 `
where 037
0
i
is either 037 i
or :037
i
. Now consider the disjunction:
2
`
_
i=1
{B
i ^ jj030
0
jj
~x
}:
This is surely equivalent to jj030
0
jj ~x
, because some B
i
must be true. However, if we assume
that a particular B
i
is true, we can simplify jj030 0
jj
~x by replacing all the 037 i
subformulas by trueor false, according to B
i
. {Note that this is allowed only because the 037 i
do not mention
any variable in ~x}. The result is that we can simplify each disjunct {B
i
^jj030
0
jj
~x
} considerably. In fact, because of our previous observation about{037
1
; : : :; 037
`}, there will be no constants or variables outside ~x left within the proportion
quantifier. This completes this step of
theinduction. Since the other quantifiers can be treated similarly, this
proves the flatness result.
69Grove, Halpern, & Koller It now remains to show how a flat formula can
be transformed to canonical form. Sup-
pose 030 2 L
031
1
is flat. Let 030
003
2 L
=
1
be the formula equivalent to 030 obtained by using the translation of Section
2.1. Every proportion comparison in 030
003
is of the form t 024 t
0
"
i
where t
and t 0
are polynomials over flat unconditional proportions. In fact, t 0
is simply a product of flat unconditional proportions {where the empty product
is taken to be 1}. Note also that
since we cleared away conditional proportions by multiplying by t
0
, if t 0
= 0 then so is t,
and so the formula t 024 t
0
"
i is automatically true. Wecan therefore replace the comparison
by {t 0
= 0} _ {t 024 t
0
"
i
^ t
0
> 0}. Similarly, wecan replace a negated comparison by an
expression of the form :{t 024 t
0
"
i
} ^ t
0
> 0. The next step is to rewrite all the flat unconditional proportions
in terms of atomic proportions. In any such proportion jj030
0
jj
~x
, the formula 030
0
is a Boolean combination of
P {x
i
} for predicates P 2 P and x
i
2 ~x. Thus, the formula 030
0
is equivalent to a disjunction
W
j
{A
j
1
{x
i
1
} ^ : : : ^ A
j m
{x
i
m
}), where each A j
i
is an atom over P and~x ={x i
1
; : : : ; x i
m
}. These disjuncts are mutually exclusive and the semantics treats distinct
variables as being
independent, so
jj030 0
jj
~x =
X
j
m Y
i=1
jjA
j
i
{x}jj
x
: We perform this replacement for each proportion expression. Furthermore,
any term t
0
in an expression of the form t 024 t
0
"
i will be a product of such expressions, and so will be
positive. Next, we must put all pure first-order formulas in the right form.
Wefirst rewrite 030 to
push all negations inwards as far as possible, so that only atomic subformulas
and existential
formulas are negated. Next, note that since 030 is flat, each existential
subformula must have the form 9x 030
0 , where 030
0 is a quantifier-free formula which mentions no constants and only
the variable x. Hence, 030
0 is a Boolean combination of P {x} for predicates P 2 P. Again,
the formula 030
0
is equivalent to a disjunction of atoms of the form W
A2A{030}
A{x}, so 9x 030
0
is equivalent to
W
A2A{030}
9x A{x}. We replace 9x 030
0
by this expression. Finally, we must deal with formulas of the form P {c}
or :P {c} for P 2 P. This is easy: We can again replace a formula 030 of
the form P {c} or :P {c} by the disjunction
W
A2A{030} A{c}.
Thepenultimate step is to convert 030 into disjunctive normal form. This
essentially brings things into canonical form. Note that since we dealt with
formulas of the form :P {c} in
the previous step, we do not have to deal with conjuncts of the form :A
i
{c}.
The final step is to check that we do not have A
i
{c} and either :9x A
i
{x} or A
j
{c} for
some j 6= i as conjuncts of some disjunct. If we do, we simply remove that
disjunct.Appendix B. Proofsfor Section 3.3
Lemma3.11: There exist some function h : IN ! IN andtwo strictly positive
polynomial
functions f; g : IN !IR such that, for KB 2 L
031
1
and~u 2 001
K
, if #worlds
~034
N [~u]{KB} 6= 0,
then
{h{N }=f {N })e
NH{~u}
024 #worlds ~034
N
[~u]{KB} 024 h{N }g{N }e
NH{~u}
:
Proof: Tochoose a world W 2 W
N
satisfying KB such that 031{W } =~u, we must partition the domain among
the atoms according to the proportions in ~u, and then choose an assign-
ment for the constants in the language subject to the constraints imposed
by KB . Finally,
70Random Worlds and Maximum Entropy
even though KB mentions only unary predicates, if there are any non-unary
predicates in
the vocabulary we must choose a denotation for them.
Suppose ~u = {u
1 ; : : : ; u
K
}, and let N
i
= u
i
N for i = 1; : : :; K. The number of parti-
tions of the domain into atoms is
000 N
N
1
;:::;N K
001
; each such partition completely determines the
denotation for the unary predicates. We must also specify the denotations
of the constant
symbols. There are at most N
jCj ways of choosing these. On the other hand, we know there is at least
one model {W; ~034} of KB suchthat 031{W } = ~u, so there there at least
one choice. In
fact, there is at least one world W
0
2 W
N such that {W
0 ; ~034}|= KB for each of the
000
N
N
1 ;:::;N
K
001 ways of partitioning the elements of the domain {and each such world
W 0
is isomorphic to
W }. Finally we must choose the denotation of the non-unary predicates.
However, ~u does not constrain this choice and, by assumption, neither does
KB . Therefore the number of
such choices is some function h{N } which is independent of ~u.
11 We conclude that:
h{N }

N N
1
; : : : ; N K
!
024 #worlds
~034
N
[~u]{KB} 024 h{N }N jCj

N N
1
; : : : ; N K
!
:
It remains to estimate
N
N
1
; : : : ; N
K
! =
N !N
1
!N
2
! : : : N K
!
:
To obtain our result, we use Stirling's approximation for the factorials,
which says that
m! =
p2031m m
m
e
000m
{1 + O{1=m}): It follows that exist constants L; U >0 such that
L m
m
e
000m
024 m! 024 U m m
m
e
000m
for all m. Using these bounds, as well as the fact that N
i
024 N , we get:
LU
K
N
K
N
N Q
K
i=1 e
N
ie
N
Q K
i=1
N N
i
i
024
N !N
1
!N 2
! : : : N
K
!
024
U NL
K
N
N
Q
K
i=1
e
N
ie
N
Q
K
i=1
N
N
i
i
:
Now, consider the expression common to both bounds: N
N
Q
K
i=1
e
N
ie N
Q
K
i=1
N
N
i
i
=
N NQ
K
i=1
N
N
i
i
= K
Y
i=1 022
NN
i

N
i
=
K Y
i=1
e N
i
ln{N=N i
}
= e 000N
P
K i=1
u
i ln{u
i
} = e
NH{~u}
:11. Itis easy to verify that in fact
h{N}= Y
R2010000011
2
N
arity{R}
; where 011 is the unary fragment of 010 and arity{R} denotes the arity
of the predicate symbol R.
71Grove, Halpern, & Koller We obtain that
h{N }LU K
N
K
e NH{~u}
024 #worlds
~034 N
[~u]{KB} 024 N
jCj
h{N }
U NL
K
e
NH{~u} ;
which is the desired result.We next want to prove Theorem 3.13. To do this,
it is useful to have an alternative
representation of the solution space S
~034 [KB]. Towards this end, we have the following
definition. Definition B.1: Let 005 ~034
N
[KB] ={031{W } : W 2 W
N
; {W; ~034} |= KB}. Let 005 ~034
1
[KB] be the
limit of these spaces. Formally,
005
~034 1
[KB ] ={~u : 9N
0 s.t. 8N 025N
0
9~u
N 2 005
~034 N
[KB ] s.t. lim
N!1
~u
N
= ~u}:The following theorem establishes a tight connection between S
~034
[KB] and 005
~034 1
[KB].
Theorem B.2:
{a} For all N and ~034 , we have 005
~034 N
[KB ] 022 S
~034
[KB].
{b} For all sufficiently small ~034 , we have 005
~034
1 [KB ] = S
~034
[KB]. Proof: Part {a} is immediate: If ~u 2 005
~034
N
[KB], then ~u = 031{W } for some W 2 W
N
such
that {W; ~034}|= KB . It is almost immediate from the definitions that
031{W } must satisfy
000{KB [~034]}, so 031{W } 2 Sol[000{KB[~034 ]}]. Theinclusion 005
~034
N
[KB] 022 S
~034
[KB] now follows. One direction of part {b} follows immediately from part
{a}. Recall that 005 ~034
N
[KB ] 022
S
~034
[KB] and that the points in 005
~034 1
[KB ] are limits of a sequence of points in 005
~034
N
[KB ]. Since
S
~034
[KB] is closed, it follows that 005
~034
1
[KB] 022 S
~034
[KB].
For the opposite inclusion, the general strategy of the proof is to show
the following:
{i} If ~034 is sufficiently small, then for all~u 2 S
~034 [KB ], there is some sequence of points
n
~u
N
0
;~ u N
0
+1
;~ u
N
0
+2
;~ u
N
0
+3
; : : : o
032 Sol[000{KB[~034 ]}] such that, for all N 025N
0
, the coor-
dinates of ~u
N
are all integer multiples of 1=N and lim N!1
~u
N
= ~u.
{ii} if ~w 2 Sol[000{KB[~034 ]}] and all its coordinates are integer multiples
of 1=N , then ~w 2
005
~034 N
[KB ].
This clearly suffices to prove that ~u 2 005
~034
1
[KB ]. We begin with the proof of {ii}, which is straightforward. Suppose
the point
~w = {r
1
=N; r 2
=N; : : : ; r K
=N } is in Sol[000{KB [~034]}]. We construct a world W 2 W
N
such that031{W } = ~w as follows. The denotation of atom A
1
is the set of elements{1; : : :; r 1
},
the denotation of atom A
2
is the set{r
1 + 1; : : : ; r
1
+ r
2}, and so on. It remains to choose
the denotations of the constants {since the denotation of the predicates
of arity greater
than 1 is irrelevant}. Without loss of generality we can assume KB isin
canonical form.
{If not, we consider
d
KB .} Thus, KB is a disjunction of conjunctions, say
W
j
030
j
. Since
~w 2 Sol[000{KB [~034]}], we must have ~w 2 Sol[000{030
j [~034 ]}] for some j. We use 030
j
to define the
properties of the constants. If 030
j
contains A
i {c} for some atom A i
, then we make c satisfy
72Random Worlds and Maximum Entropy
A
i
. Note that, by Definition 3.6, if 030 j
has such a conjunct then u
i
> 0. If 030
j
contains no atomic conjunct mentioning the constant c, then we make c satisfy
A
i
for some arbitrary atomwith u
i
> 0. It should now be clear that {W; ~034} satisfies 030
j
, and so satisfies KB . Note
that in this construction it is important that we started with ~w in Sol[000{KB[~034
]}], rather
than just in the closure space S ~034
[KB]; otherwise, the point would not necessarily satisfy
000{KB [~034]}.
We now consider condition {i}. Thisis surprisingly difficult to prove; the
proof involves
techniques from algebraic geometry. Ourjob would be relatively easy if Sol[000{KB[~034
]}] were
an open set. Unfortunately, it is not. On the other hand, it would behave
essentially like
an open set if we could replace the occurrences of 024 in 000{KB [~034]}
by <. It turns out that, for our purposes here, this replacement is possible.
Let 000
< {KB[~034 ]} be the same as 000{KB [~034 ]} except that every {unnegated}
conjunct of
the form {t 024 034
i
t 0
} is replaced by {t < 034
i
t
0
}. {Notice that this is essentially the opposite
transformation to the one used when defining essential positivity in Definition
4.4.} Finally, let S
<~034
[KB] beSol[000
<
{KB [~034]}]. It turns out that, for all sufficiently small ~034 , S
<~034
[KB] =
S
~034
[KB]. This result, which we label as Lemma B.5, will be stated and proved
later. For
now we use the lemma to continue the proof of the main result. Consider
some ~u 2 S
~034
[KB]. It suffices to show that for all ffi > 0 there exists N
0
such
that for all N > N
0
, there exists a point~u
N
2 Sol [000
< {KB[~034 ]}] such that all the coordinates
of ~u
N
are integer multiples of 1=N and such that|~u000~ u
N
| < ffi. {For then we can take smaller
and smaller ffi's to create a sequence~u
N
converging to ~u.} Hence, let ffi > 0. By Lemma B.5,
we can find some ~u
0
2 Sol[000
<
{KB [~034]}] such that|~u000~ u
0
| < ffi=2. By definition, every conjunct
in 000
<
{KB[~034 ]} is of the form q
0
{ ~w} = 0, q 0
{ ~w} > 0, q{ ~w} < 034
i
q 0
{ ~w}, or q{ ~w} > 034 i
q
0
{ ~w}, where
q
0
is a positive polynomial. Ignore for the moment the constraints of the form
q
0
{ ~w} = 0,
and consider the remaining constraints that ~u
0
satisfies. Theseconstraints all involve strict inequalities, and the functions
involved {q and q
0
} are continuous. Thus, there exists some ffi
0
> 0 such that for all ~w for which |~u
0
000 ~w| < ffi
0 , these constraints are also satisfied by ~w.
Now consider a conjunct of the form q
0 {~w} = 0 that is satisfied by ~u
0 . Since q
0 is positive, this
happens if and only if the following condition holds: for every coordinate
w
i
that actually
appears in q
0
, we have u
0
i
= 0. In particular, if ~w and ~u
0
have the same coordinates with
value 0, then q
0
{ ~w} = 0. It follows that for all ~w, if|~u 0
000 ~w| < ffi
0
and ~u
0
and ~w have the
same coordinates with value 0, then ~w also satisfies 000
< {KB[~034 ]}. We now construct ~u N
that satisfies the requirements. Let i
003
be the index of that
component of ~u
0
with the largest value. We define ~u
N
by considering each of its components u
N
i
, for 1 024 i 024 K:
u N
i
=
8 >
<
>
:
0 u
0
i = 0
dN u
0
i
e=N i 6= i
003 and u
0
i > 0
u
N i
000
P
j6=i
003
{u
N
j
000 u
0
j } i = i
003 :
It is easy to verify that the components of ~u
N sum to 1. All the components in ~u
0
, other than the i
003
'th, are increased by at most 1=N . The component u N
i
003
is decreased by at most
K=N . Wewill show that ~u
N
has the right properties for all N >N
0 , where N
0
is such that
1=N
0
< min{u
i
003
; ffi=2; ffi
0
}=2K. The fact that K=N
0
< u i
003
guarantees that ~u
N
is in 001
K
for
all N >N
0
. The fact that 2K=N
0
< ffi=2 guarantees that ~u
N
is within ffi=2 of ~u 0
, and hence
within ffi of ~u. Since 2K=N
0
< ffi
0
, it follows that|~u
0
000 ~u
N
| < ffi
0
. Since ~u
N
is constructed
73Grove, Halpern, & Koller to have exactly the same 0 coordinates as ~u
0
, we conclude that ~u
N
2 Sol [000 <
{KB[~034 ]}], as
required. Condition {i}, and hence the entire theorem, now follows.It now
remains to prove Lemma B.5, which was used in the proof just given. As we
hinted earlier, this requires tools from algebraic geometry. We base our
definitions on the presentation in {Bochnak, Coste, & Roy, 1987}. A subset
A of IR
`
is said to be semi-algebraic
if it is definable in the language of real-closed fields. That is, A is
semi-algebraic if there is a first-order formula '{x
1
; : : :; x `
} whose free variables are x
1
; : : :; x `
and whose only non- logical symbols are 0, 1, +, 002, < and =, such that
IR|= '{u
1
; : : : ; u `
} iff {u 1
; : : : ; u `
} 2 A. 12
A function f : X ! Y , where X 022 IR
h
and Y 022 IR
`
, is said to be semi-algebraic if its graph
{i.e.,{{~u; ~w} : f {~u} = ~w}} is semi-algebraic. The main tool we use
is the following Curve
SelectionLemma {see {Bochnak et al., 1987, p. 34}): Lemma B.3: Suppose
that A is a semi-algebraic set in IR
`
and ~u 2A. Then there exists
a continuous, semi-algebraic function f : [0; 1] ! IR
`
such that f {0} = ~u and f {t} 2 A for
all t 2 {0; 1].
Our first use of the Curve Selection Lemma is in the following, which says
that, in a
certain sense, semi-algebraic functions behave \nicely" near limits. The
type of phenomenon we wish to avoid is illustrated by x sin
1x
which is continuous at 0, but has infinitely many
local maxima and minima near 0.
Proposition B.4: Suppose that g : [0; 1] ! IR is a continuous, semi-algebraic
function
such that g{u} > 0 if u > 0 and g{0} = 0. Then there exists some
*  > 0 such that g is
strictlyincreasing in the interval [0;
* ].
Proof: Suppose,by way of contradiction, that g satisfies the hypotheses
of the proposition
but there is no
*  such that g is increasing in the interval [0;
* ]. Wedefine a point u in [0; 1] to be bad iffor some u
0
2 [0; u} we have g{u 0
} 025 g{u}. Let A be the set of all the bad points.
Since g is semi-algebraic so is A, since u 0
2 A iff
9u
0
{(0 024 u
0
< u} ^ {g{u} 024 g{u
0
})}:
Since, by assumption, g is not increasing in any interval [0;
* ], we can find bad points arbitrarily close to 0 and so 0 2A. By the Curve
Selection Lemma, there is a continuous semi-algebraic curve f : [0; 1] !
IR such that f {0} = 0 and f {t} 2 A for all t 2 {0; 1].
Because of the continuity of f , the range of f , i.e., f {[0; 1]}, is [0;
r] for some r 2 [0; 1]. By
the definition of f , {0; r] 022 A. Since062 A, it follows that f {1} 6=
0; therefore r > 0 and so,
by assumption, g{r} > 0. Since g is a continuous function, it achieves a
maximum v > 0
over the range [0; r]. Consider the minimum point in the interval where
this maximum is
achieved. More precisely, let u be the infimum of the set {u
0
2 [0; r] : g{u
0
} = v}; clearly, g{u} = v. Sincev > 0 we obtain that u > 0 and therefore
u 2 A. Thus, u is bad. Butthat
means that there is a point u 0
< u for which g{u
0
} 025 g{u}, which contradicts the choice of
v and u.We can now prove Lemma B.5. Recall, the result we need is as follows.12.
In {Bochnak et al., 1987}, a set is taken to be semi-algebraic if it is definable
by a quantifier-free formula
in the language of real closed fields. However, as observed in {Bochnak
et al., 1987}, since the theory of
real closed fields admits elimination of quantifiers {Tarski, 1951}, the
two definitions are equivalent.
74Random Worlds and Maximum Entropy
Lemma B.5: Forall sufficiently small ~034 , S
<~034
[KB] = S
~034
[KB ].
Proof: Clearly S
<~034
[KB] 022 S
~034 [KB]. To prove the reverse inclusion weconsider
d KB,
a canonical form equivalent of KB . We consider each disjunct of
d
KB separately. Let
030 be a conjunction that is one of the disjuncts in
d KB. It clearly suffices to show that
Sol[000{030[~034]}] 022 S
<~034
[030] =Sol[000
<
{030[~034]}]. Assume, by way of contradiction, that for arbitrarily
small ~034 , there exists some ~u 2 Sol[000{030[~034 ]}] which is \separated"
from the set Sol [000
< {030[~034]}], i.e., is not in its closure. More formally, we say that
~u is ffi-separated from Sol[000
<
{030[~034]}] if there is no ~u
0 2 Sol[000
< {030[~034]}] such that|~u 000~u
0
| < ffi.
We now consider those ~034 andthose points in Sol[000{030[~034]}] that
are separated from
Sol[000
<
{030[~034]}]:
13
A ={{~034;~u; ffi} : ~034 >
~
0; ffi > 0; ~u 2 Sol[000{030[~034 ]}] is ffi-separated from Sol[000
<
{030[~034 ]}]}:
Clearly A is semi-algebraic. By assumption, there are points in A for arbitrarily
small tolerance vectors ~034 . Since A is a bounded subset of IR
m+K+1 {where m is the number of tolerance values in ~034 }, we can use
the Bolzano{Weierstrass Theorem to conclude that this
set of points has an accumulation point whose first component is
~
0. Thus,there is a point
{
~
0; ~w; ffi 0
} inA. By the Curve Selection Lemma, there is a continuous semi-algebraic
function f : [0;1] ! IR
m+K+1
such that f {0} = { ~
0; ~w; ffi 0
} and f {t} 2 A for t 2 {0; 1].
Since f issemi-algebraic, it is semi-algebraic in each of its coordinates.
By Lemma B.4, there is some v > 0 such that f isstrictly increasing in each
of its first m coordinates over
the domain [0; v]. Suppose that f {v} = {~034 ;~u; ffi}. Now, consider
the constraints in 000{030[~034 ]}
that have the form q{ ~w} > 034
j
q
0
{ ~w}. These constraints are all satisfied by ~u and they all involve strong
inequalities. By the continuity of the polynomials q and q
0
, there exists some

*  > 0 such that, for all ~u
0
such that|~u 000~u
0
| <
* , ~u
0
also satisfies these constraints.
Now, by the continuity of f , there exists a point v
0
2 {0; v} sufficiently close to v
such that if f {v
0
} = {~034
0 ;~u
0
; ffi
0
}, then |~u 000 ~u 0
| < min{ffi;
* }. Since f {v} = {~034;~u; ffi} 2 A and|~u000~ u 0
| < ffi, it follows that ~u
0
62 Sol[000
<
{030[~034]}]. We conclude the proof by showing that this is impossible.
That is, we show that~u
0
2 Sol [000
<
{030[~034]}]. The constraints appearing in 000
<
{030[~034 ]}
can be of the following forms: q 0
{ ~w} = 0, q
0
{ ~w} > 0, q{ ~w} < 034
j q
0
{ ~w}, or q{ ~w} > 034
j
q 0
{ ~w}, where q
0
is a positive polynomial. Since f {v
0
} 2 A, we know that ~u 0
2 Sol[000{030[~034
0 ]}]. The
constraints of the form q
0
{ ~w} = 0 and q
0
{ ~w} > 0 are identical in 000{030[~034
0
]} and in 000
<
{030[~034]},
and are therefore satisfied by ~u
0
. Since|~u
0
000~ u| <
* , our discussion in the previous paragraph
implies that the constraints of the form q{ ~w} > 034
j
q 0
{ ~w} are also satisfied by ~u 0
. Finally,
consider a constraint of the form q{ ~w} < 034 j
q
0
{ ~w}. The corresponding constraint in 000{030[~034
0
]}
is q{ ~w} 024 034
0
j
q 0
{ ~w}. Since ~u
0
satisfies this latter constraint, we know that q{~u
0 } 024 034
0 j
q
0
{~u
0
}.
But now, recall that we proved that f is increasing over [0; v] in the first
m coordinates.
In particular, 034
0 j
< 034
j . By the definition of canonical form, q
0
{~u
0
} > 0, so that we conclude
q{~u
0
} 024 034 0
j
q
0
{~u
0
} < 034
j q
0
{~u
0
}. Hence the constraints of this type are also satisfied by ~u
0
. This concludes the proof that ~u
0
2 Sol[000 <
{KB[~034 ]}], thus deriving a contradiction and proving
theresult.We are finally ready to prove Theorem 3.13.13. Weconsider only
those components in the infinite vector ~034 thatactually appear in Sol[000{030[~034]}].
75Grove, Halpern, & Koller Theorem 3.13: For all sufficiently small ~034
, the following is true. Let Q be the points with greatest entropy in S ~034
[KB] and let O 022 IR
K
be any open set containing Q. Then for
all 022 2 L
031
and for lim 003
2 flim sup; lim inf} we have
lim N!1
003
Pr ~034
N
{022|KB } = lim
N!1
003
#worlds ~034
N
[O]{022 ^ KB }#worlds
~034
N
[O]{KB}
:
Proof: Let ~034 besmall enough so that Theorem B.2 applies and let Q and
O be as in the
statement of the theorem. It clearly suffices to show that the set O contains
almost all of the worlds that satisfy KB . More precisely, the fraction of
such worlds that are in O tends
to 1 as N !1:
Let 032 be the entropy of the points in Q. We begin the proof by showing
the existence
of 032
L < 032
U
{< 032} such that {for sufficiently large N } {a} every point~u 2 005
~034
N
[KB] where
~u 62 O has entropy at most 032
L
and {b} there is at least one point~u 2 005
~034
N
[KB] with~u 2 O
and entropy at least 032
U
.
For part {a}, consider the space S
~034 [KB] 000 O. Since this space is closed, the entropy function takes
on a maximum value in this space; let this be 032
L
. Since this space does not include any point with entropy 032 {these are
all in Q 022 O}, we must have 032 L
< 032.
ByTheorem B.2, 005
~034
N [KB ] 022 S ~034
[KB]. Therefore, for any N , the entropy of any point in
005 ~034
N
[KB] 000 O is at most 032
L
.
For part {b}, let 032
U
be some value in the interval {032 L
; 032} {for example {032
L
+ 032}=2} and
let ~v be any point in Q. By the continuity of the entropy function, there
exists some ffi > 0
such that for all~u with|~u 000 ~v| < ffi, we have H{~u} 025 032
U
. Because O is open we can, by
considering a smaller ffi if necessary, assume that|~u 000 ~v| < ffi implies
~u 2 O. By the second partof Theorem B.2, there is a sequence of points ~u
N
2 005
~034 N
[KB ] such that lim
N!1
~u N
= ~v. In particular, for N large enough we have|~u N
000 ~v| < ffi, so that H{~u
N
} > 032
U
, proving part {b}.
To complete the proof, we use Lemma 3.11 to conclude that for all N ,
#worlds
~034
N
{KB} 025 #worlds
~034 N
[~u
N
]{KB} 025 {h{N }=f {N })e
NH{~u
N
} 025 {h{N }=f {N })e
N032 U
:
On the other hand,
#worlds
~034 N
[001
K
000 O]{KB} 024 X
~u2005 ~ 034
N
[KB]000O
#worlds
~ 034
N
[~u]{KB}
024 jf ~w 2 005
~034
N
[KB] : ~w 62 Ogj h{N }g{N }e
N032
L
024 {N + 1}
K
h{N }g{N }e
N032 L
:
Therefore the fraction of models of KB whichare outside O is at most
{N +1}
K
h{N }f {N }g{N}e
N032 Lh{N }e
N032
U
=
{N +1} K
f {N }g{N }e
N{032
U
000032
L }
:
Since {N + 1}
k
f {N }g{N } is a polynomial in N , this fraction tends to 0 as N grows large.
The result follows.76Random Worlds and Maximum Entropy
Appendix C. Proofs for Section 4
Proposition4.6: Assume that KB isessentially positive and let Q be the set
of maximum-
entropy points of S
~
0 [KB] {and thus also of S
024
~
0 [KB]}. Then for all
*  > 0 and all sufficiently small tolerance vectors ~034 {where \sufficiently
small" may depend on
* }, every maximum-
entropypoint of S
~034
[KB] is within
*  of some maximum entropy-point in Q.
Proof: Fix
*  > 0. By way of contradiction, assume that that there is some sequence
of tolerance vectors ~034
m
, m = 1; 2; : : :, that converges to
~ 0, and for each m a maximum- entropy point ~u m
of S
~034
m
[KB] such that for all m, ~u
m
is at least
*  away from Q. Since the space 001
K
is compact, we can assume without loss of generality that this sequence
convergesto some point ~u. Recall that 000{KB } is a finite combination
{using \and" and
\or"} of constraints, where every such constraint is of the form q
0
{ ~w} = 0, q
0
{ ~w} > 0, q{ ~w} 024 "
j
q
0 { ~w}, or q{ ~w} > "
j
q
0
{ ~w}, such that q 0
is a positive polynomial. Since the overall
number of constraints is finite we can assume, again without loss of generality,
that all the ~u
m
's satisfy precisely the same constraints. We claim that the corresponding
conjuncts in 000
024
{KB[
~
0]}are satisfied by ~u. For a conjunct of the form q
0
{ ~w} = 0 note that, if q
0
{~u
m
} = 0
for all m, then this also holds at the limit, so that q{~u} = 0. A conjunct
of the form q
0
{~w} > 0
translates into q 0
{ ~w} 025 0 in 000
024
{KB [
~
0]}; such conjuncts are trivially satisfied by any point
in 001
K . If a conjunct of the form q{ ~w} 024 "
j
q
0 { ~w} is satisfied for all~u
m
and ~034
m
, then at
the limit we have q{~u} 024 0, which is precisely the corresponding conjunct
in 000
024
{KB [
~
0]}.
Finally, for a conjunct of the form q{ ~w} > " j
q
0
{ ~w}, if q{~u
m
} > 034
m
j
q
0
{~u
m
} for all m, then at
the limit we have q{~u} 025 0, which again is the corresponding conjunct
in 000
024
{KB [
~
0]}. It follows that ~u is in S
024
~
0 [KB ].
By assumption, all points ~u
m
are at least
*  away from Q. Hence, ~u cannot be in Q.
If we let 032 represent the entropy of the points in Q, since Q is the
set of all maximum-
entropy points in S
024
~
0 [KB ], it follows that H{~u} < 032. Choose 032
L and 032
U
such that H{~u} <
032
L
< 032
U
< 032. Since the entropy function is continuous, we know that for sufficiently
large m, H{~u
m
} 024 032 L
. Since ~u
m
is a maximum-entropy point of S
~034 m
[KB], it follows that the entropy achieved in this space for sufficiently
large m is at most 032
L
. We derive a
contradiction by showing that for sufficiently large m, there is some point
in Sol[000{KB[~034
m
]}] with entropy at least 032 U
. The argument is as follows. Let ~v be some point in Q. Since ~v is a maximum-entropy
point of S
~
0
[KB], there are points in Sol[000{KB[
~
0]}] arbitrarily close
to ~v. In particular, there is some point ~u
0
2 Sol[000{KB [
~ 0]}] whose entropy is at least 032
U
.
As we now show, this point is also in Sol[000{KB[~034 ]}] for all sufficiently
small ~034 . Again,
consider all the conjuncts in 000{KB[ ~
0]} satisfied by ~u
0
and the corresponding conjuncts in
000{KB [~034]}. Conjuncts of the form q
0
{ ~w} = 0 and q
0
{ ~w} > 0 in 000{KB[
~
0]} remain unchanged in 000{KB[~034 ]}. Conjuncts of the form q{ ~w} 024
034 j
q
0
{ ~w} in 000{KB [~034]} are certainly satisfied by ~u
0
, since the corresponding conjunct in 000{KB[
~
0]}, namely q{ ~w} 024 0, is satisfied by ~u 0
,
so that q{~u
0
} 024 0 024 034
j
q
0
{~u
0
} {recall that q
0
is a positive polynomial}. Finally, consider a
conjunctin 000{KB[~034 ]} of the form q{ ~w} > 034
j
q
0
{ ~w}. Thecorresponding conjunct in 000{KB[
~
0]} is q{ ~w} > 0. Suppose q{~u
0
} = ffi > 0. Since the value of q
0
is bounded over the compact
space001
K , it follows that for all sufficiently small 034
j
, 034
j
q
0
{~u
0
} < ffi. Thus, q{~u
0 } > 034
j
q
0
{~u
0
} for
all sufficiently small 034
j
, as required. It follows that ~u
0
is in Sol[000{KB [~034]}] for all sufficiently small ~034 and,in particular,
in Sol[000{KB[~034 m
]}] for all sufficiently large m. But H{~u
0
} 025 032
U
, whereas we showed that the maximum entropy achieved in S
~034
m
[KB ] is at most 032
L < 032
U
. 77Grove, Halpern, & Koller This contradiction proves that our assumption
was false, so that the conclusion of the
proposition necessarily holds.Theorem 4.9: Suppose '{c} is a simple query
for KB. For all ~034 sufficiently small, if Q
is the set of maximum-entropy points in S
~034 [KB] and F
[ ]
{~v} > 0 for all ~v 2 Q, then for
lim
003
2 flim sup; lim inf} we have
lim N!1
003
Pr ~034
N
{'{c}|KB} 2 "
inf
~v2Q
F
['j ]
{~v}; sup
~v2Q
F ['j ]
{~v}
#
:
Proof: LetW 2 W
003
, and let ~u = 031{W }. The value of the proportion expression jj {x}jj
x
at W is clearly X
A
j
2A{ }
jjA
j
{x}jj
x
=
X
A j
2A{ } u
j
= F [ ]
{~u}:
If F
[ ]
{~u} > 0, then by the same reasoning we conclude that the value of k'{x}|
{x}k
x
at
W is equal to F
['j ]
{~u}.
Now, let 025
L
and 025
R be inf
~v2Q F
['j ] {~v} and sup
~v2Q
F
['j ]
{~v} respectively; by our as- sumption, F
['j ]
{~v} is well-defined forall ~v 2 Q. Since the denominator is not 0, F
['j ] is
a continuous function at each maximum-entropy point. Thus, since F
['j ]
{~v} 2 [025
L
; 025 R
] for
all maximum-entropy points, the value of F ['j ]
{~u} for ~u \close" to some ~v 2 Q, will either
be in the range [025
L
; 025 U
] or very close to it. More precisely, choose any
*  > 0, and define
022[
* ] to be the formula
k'{x}| {x}k
x
2 [025
L
000
* ; 025 U
+
* ]: Since
*  > 0, it is clear that there is some sufficiently small open set O around
Q such
that this proportion expression is well-defined and within these bounds
at all worlds in O.
Thus, by Corollary 3.14, Pr ~034
1
{022[
* ]|KB} = 1. Using Theorem 3.16, we obtain that for lim
003
as above, lim
N!1
003 Pr
~034
N {'{c}|KB} = lim
N!1
003
Pr
~034 N
{'{c}|KB ^ 022[
* ]}:
But now we can use the direct inference technique outlined earlier. We are
interested in
theprobability of '{c}, where the only information we have about c in the
knowledge base
is  {c} and where we have statistics for k'{x}| {x}k x
. These are precisely the conditions
under which Theorem 4.1 applies. We conclude that
lim
N!1
003
Pr
~034
N
{'{c}|KB} 2 [025
L
000
* ; 025
U
+
* ]:
Since this holds for all
*  > 0, it is necessarily the case that lim
N!1
003 Pr
~034
N {'{c}|KB} 2 [025
L ; 025
U
];
as required.Theorem 4.11: Suppose '{c} is a simple query for KB . If the
space S
~
0
[KB ] has a unique
maximum-entropy point~v, KB is essentially positive, and F
[ ]
{~v} > 0, then Pr
1
{'{c}|KB} = F ['j ]
{~v}:
78Random Worlds and Maximum Entropy
Proof: Note that the fact that S
~
0
[KB] has a unique maximum-entropy point does not
guarantee that this is also the case for S
~034
[KB]. However, Proposition 4.6 implies that
the maximum-entropy points of the latter space are necessarily close to~v.
More precisely,
if we choose some
*  > 0, we conclude that for all sufficiently small ~034 , all the maximum-
entropy points of S
~034
[KB] will be within
*  of ~v. Now, pick some arbitrary ffi > 0. Since
F
[ ] {~v} > 0, it follows that F
['j ] is continuous at ~v. Therefore, there exists some
*  > 0
such that if~u is within
*  of~v, F
['j ] {~u} is within ffi of F
['j ]
{~v}. In particular, this is the
case for all maximum-entropy points of S ~034
[KB] for all sufficiently small ~034 . This allows
us to apply Theorem 4.9 and conclude that for all sufficiently small ~034
andfor lim
003 2
flim sup; lim inf}, lim
003
N!1
Pr
~034
N
{'{c}|KB} is within ffi of F ['j ]
{~v}. Hence, this is also the casefor lim
~034! ~
0
lim
003
N!1
Pr
~ 034
N
{'{c}|KB}. Since this holds for all ffi > 0, it follows that lim
~034!
~
0
lim inf
N!1
Pr
~034 N
{'{c}|KB} = lim
~034!
~
0
lim sup
N!1
Pr ~034
N
{'{c}|KB} = F
['j ] {~v}:
Thus, by definition, Pr
1
{'{c}|KB} = F
['j ]
{~v}.Theorem 4.14: Let 003 be a conjunction of constraints of the form
Pr{fi|fi 0
} = 025 or Pr{fi|fi
0
} 2 [025
1
; 025
2
]. There is a unique probability distribution 026
003
of maximum entropy
satisfying 003. Moreover, for all fi and fi
0
, if Pr
026
003 {fi
0
} > 0, then
Pr
1
{030
fi {c}|030
fi
0
{c} ^ KB
0
[003]} = Pr 026
003
{fi|fi
0
}:
Proof: Clearly, the formulas '{x} = 030
fi
{x} and  {x} = 030 fi
0
{x} are essentially propositional.
The knowledge base KB
0 [003] is in the form of a conjunction of simple proportion formulas,
none of which are negated. As a result, the set of constraints associated
with KB =
{c} ^ KB
0
[003] also has a simple form. KB
0
[003] generates a conjunction of constraints which can be taken as having
the form q{ ~w} 024 "
j
q
0 { ~w}. On the other hand,  {c}generates
someBoolean combination of constraints all of which have the form w
j
> 0. We begin by considering the set S
024 ~
0
[KB] {rather than S
~
0
[KB ]}, so we can ignore the latter constraints
for now. S
024
~
0
[KB] is defined by a conjunction of linear constraints which {as discussed
earlier}
implies that it is convex, and thus has a unique maximum-entropy point,
say ~v. Let 026
003
= 026 ~v
be the distribution over 012 corresponding to ~v. It is clear that the
constraints of 000
024
{KB[ ~
0]}
on the points of 001
K
are precisely the same ones as those of 003. Therefore, 026
003
is the unique maximum-entropy distribution satisfying the constraints of
003. By Remark 4.13, it follows
that F
[030 fi
0
]
{~v} = 026
003
{fi
0 }. Since we have assumed that 026
003
{fi 0
} > 0, we are are almost in a
positionto use Theorem 4.11. Itremains to prove essential positivity.
Recall that the difference between 000
024
{KB [
~
0]} and 000{KB [
~
0]} is that the latter may have
some conjuncts of the form w
j
> 0. Checking definitions 3.4 and 3.6 we see that such terms can appear
only due to 030 fi
0
{c} and, in fact, together they assert that F
[030
fi 0
]
{ ~w} > 0. But we have assumed that F
[030
fi
0 ]
{~v} > 0 and so ~v is a maximum-entropy point of S
~ 0
[KB ] as well. Thus, essential positivity holds and so, by Theorem 4.11,
Pr
1 {'{c}| {c} ^ KB
0 [003]} = F
['j ]
{026
003
} = Pr
026 003
{fi|fi 0
}
as required.79Grove, Halpern, & Koller Theorem 4.15: Let c be a constant
symbol. Using the translation described in Section 4.3,
for a set R of defeasible rules, B ! C is an ME-plausible consequence of
R iff
Pr
1

030
C
{c}
fi
fi
fi
fi
fi
030 B
{c} ^ ^
r2R
022 r
!
= 1: Proof: Let KB
0 denote
V
r2R 022
r
. For all sufficiently small ~034 and for
*  = 034
1
, let 026
003 denote
026
003
* ;R
. It clearly suffices to prove that
Pr
~034 1
{030
C {c}|030
B
{c} ^ KB 0
} = Pr
026 003
{C|B};
where by equality we also mean that one side is defined iff the other is
also defined. It is
easy to verify that a point ~u in 001 K
satisfies 000{KB 0
[~034]} iff the corresponding distribution
026
* -satisfies R. Therefore, the maximum-entropy point ~v of S
~034 [KB
0
] {which is unique,
bylinearity} corresponds precisely to 026
003 . Now, there are two cases: either 026
003
{B} > 0 or
026 003
{B} = 0. In the first case, by Remark 4.13, Pr 026
003
{030 B
{c}) = F [030
B
{c}]
{~v}, so the latter is
also positive. This also implies that~v is consistent with the constraints
000{ {c}) entailed
by  {c} = 030
B
{c}, so that ~v is also the unique maximum-entropy point of S
~034 [KB] {where
KB =030
B
{c} ^ KB
0
}. We can therefore use Corollary 4.10 and Remark 4.13 to conclude that
Pr
~034
1
{030 C
{c}|KB} = F
[030 C
{c}j030
B
{c}] {~v} = Pr
026
003
{C|B} and that all three terms are well-defined.
Assume, on the other hand, that 026
003
{B} = 0, so that Pr
026 003
{C|B} is not well-defined. In this case, we can use a known result {see
{Paris & Vencovska, 1989}) for the maximum-entropy
point over a space defined by linear constraints, and conclude that for
all 026 satisfying R, necessarily026{B} = 0. Using the connection between
distributions 026 satisfying R and points ~u in S
~034
[KB
0
], we conclude that this is also the case for all ~u 2 S
~034
[KB
0
]. By part {a} of Theorem B.2, this means that in any world satisfying KB
0
, the proportion jj030
B
{x}jj
x is
necessarily 0. Thus, KB
0
is inconsistent with 030
B
{c}, and Pr
~034
1
{030
C
{c}|030
B
{c} ^ KB
0 } is also not
well-defined.Appendix D. Proofs for Section 4.4
Theorem 4.24: If KB and ~034 > ~
0 are stable for 033
003
then Pr ~034
1
{033
003
|KB} = 1.
Proof: By Theorem 3.14, it suffices to show that there is some open neighborhood
con- taining Q, the maximum-entropy points of S
~034 [KB], such that every world W of KB inthis
neighborhood has 033{W } = 033
003 . So suppose this is not the case. Then there is some sequence
of worlds W
1
; W
2
; : : : suchthat {W
i
; ~034}|= KB ^ :033
003
and lim i!1
min
~v2Q
|031{W i
} 000 ~v| = 0.
Since 001 K
is compact the sequence 031{W
1
}; 031{W
2
}; : : : must have at least one accumulation
point, say ~u. This point must be in the closure of the set Q. But, in fact,
Q is a closed set {because entropy is a continuous function} and so ~u 2
Q. By part {a} of Theorem B.2, 031{W
i
} 2 S
~034 [KB ^ :033
003
] for every i and so, since this space is closed, ~u 2 S
~034
[KB ^ :033 003
] as
well. But this means that ~u is an unsafe maximum-entropy point, contrary
to the definition
and assumption of stability.In the remainder of this section we prove Theorem
4.28. For this purpose, fix KB =
^ KB
0
, ', and 033
003
to be as in the statement of this theorem, and let ~v be the unique
maximum-entropypoint of S
~
0 [KB ].
80Random Worlds and Maximum Entropy
Let Z ={c 1
; : : : ; c
m
} be the set of constant symbols appearing in   and in '. Due to
the separability assumption, KB
0
contains none of the constant symbols in Z. Let 037
6= be
the formula
V i6=j
c
i
6= c
j . We first prove that 037
6=
has probability 1 given KB
0
. Lemma D.1: For037
6=
and KB
0 as above, Pr
1
{037
6=
|KB
0
}= 1. Proof: We actually show that Pr
1
{:037
6=
|KB
0 } = 0. Let c and c 0
be two constant symbols
in{c
1
; : : : ; c
m
} and consider Pr
1
{c = c
0
|KB
0
}. We again use the direct inference technique. Note that for any world
of size N the proportion expression jjx = x
0
jj x;x
0
denotes exactly
1=N . It is thus easy to see that Pr
1
{jjx = x
0 jj
x;x
0
031
i
0|KB 0
} = 1 {for any choice of i}. Thus,
byTheorem 3.16, Pr 1
{c = c
0
|KB
0
} = Pr
1
{c = c
0
|KB 0
^ jjx = x 0
jj
x;x
0
031
i
0}. Butsince c and
c 0
appear nowhere in KB
0
we can use Theorem 4.1 to conclude that Pr
1
{c = c
0
|KB
0
} = 0. It is straightforward to verify that, since :037
6=
is equivalent to a finite disjunction, each
disjunctof which implies c = c
0
for at least one pair of constants c and c
0
, we must have Pr
1
{:037 6=
|KB
0 } = 0.As we stated in Section 4.4, our general technique for computing
the probability of an
arbitrary formula ' is to partition the worlds into a finite collection
of classes such that '
behaves uniformly over each class and then to compute the relative weights
of the classes.
As we show later, the classes are essentially defined using complete descriptions.
Their
relative weight corresponds to the probabilities of the different complete
descriptions given
KB .
Proposition D.2: Let KB = KB
0
^  and ~v be as above. Assume that Pr
1
{ |KB
0
} > 0.
Let D be a complete description over Z that is consistent with  . {a} If
D is inconsistent with 037
6=
, then Pr
1
{D|KB}= 0.
{b} If D is consistent with 037
6=
, then
Pr 1
{D|KB} =
F
[D]
{~v}P
D
0
2A{ ^037 6=
}
F [D
0
] {~ v}
:
Proof: First, observe that if all limits exist and the denominator is nonzero,
then Pr
1
{:037 6=
| ^ KB 0
} =
Pr
1
{:037
6=
^  |KB 0
}Pr
1
{ |KB
0
} :
By hypothesis, the denominator is indeed nonzero. Furthermore, by Lemma
D.1, Pr
1
{:037 6=
^
|KB
0
} 024 Pr
1
{:037 6=
|KB
0
} = 0. Hence Pr 1
{037
6=
|KB} = Pr
1
{037
6=
|KB
0
^  } = 1. We can therefore use Theorem 3.16 to conclude that Pr
1
{D|KB}= Pr
1
{D|KB ^037
6=
}:
Part {a} of the proposition follows immediately.
To prove part {b}, recall that   is equivalent to the disjunction W
E2A{ }
E. By simple probabilistic reasoning, the assumption that Pr 1
{ |KB 0
} > 0, and part {a}, we conclude
that
Pr
1
{D|  ^ KB
0
} =
Pr 1
{D ^  |KB
0
}Pr
1
{ |KB
0
} =
Pr
1
{D ^  |KB
0
}P
E2A{ ^037
6=
} Pr
1
{E|KB
0
} :
81Grove, Halpern, & Koller By assumption, D is consistent with 037
6=
and is in A{ }. Since D is a complete description,
we must have that D }   is valid. Thus, the numerator on the right-hand
side of this equationis simply Pr
1 {D|KB
0 }. Hence, the problem of computing Pr 1
{D|KB} reduces to
a series of computations of the form Pr 1
{E|KB 0
} for various complete descriptions E.
Fix any such description E. Recall that E can be decomposed into three parts:
the
unary part E
1 , the non-unary part E
>1
, and the equality part E
=
. Since E is in A{037
6=
},
we conclude that037
6=
is equivalent to E
=
. Using Theorem 3.16 twice and some probabilistic
reasoning, we get:
Pr
1
{E >1
^E
1 ^ E
=
|KB 0
} = Pr
1 {E
>1
^ E
1
^ E
=
|KB
0
^ 037
6=
} = Pr
1
{E >1
^ E
1
|KB
0
^ 037
6=
} = Pr
1
{E >1
|KB
0
^ 037
6= ^ E
1
} 001 Pr
1
{E 1
|KB
0 ^ 037
6=
}
= Pr
1
{E
>1
|KB
0
^ 037 6=
^ E
1 } 001 Pr
1
{E
1
|KB 0
}:
In order to simplify the first expression, recall that none of the predicate
symbols in E >1
occur anywhere in KB
0
^ 037 6=
^ E
1 . Therefore, the probability of E
>1
given KB
0
^ 037 6=
is
equal to the probability that the elements denoting the jZj {different}
constants satisfy some particular configuration of non-unary properties.
It should be clear that, by symmetry, all
such configurations are equally likely. Therefore, the probability of any
one of them is a
constant, equal to 1 over the total number of configurations.
14
Let 032 denote the constant
which is equal to Pr
1
{E >1
|KB
0
^ 037
6= ^ E
1
} for all E.
The last step is to show that, if E
1
is equivalent to
V
m j=1
A
i j
{c
j }, then Pr
1
{E
1
|KB 0
} =
F [D]
{~v}:
Pr
1
{
m
^
j=1
A
i
j {c
j
}|KB
0
} = Pr
1
{A
i
1
{c
1
}|
m
^
j=2
A
i
j
{c
j
} ^ KB
0
} 001 Pr
1
{A
i 2
{c
2 }|
m
^ j=3
A
i j
{c
j } ^ KB
0
}
001 : : : 001 Pr
1
{A
i
m0001
{c
m0001
}|A
i
m
{c
m
} ^ KB
0
} 001 Pr
1
{A
i
m
{c
m
}|KB 0
}
= v i
1
001 : : : 001 v
i
m
{using Theorem 4.11; see below}
= F
[D]
{~v}: The first step is simply probabilistic reasoning. The second step
uses m applications of
Theorem 4.11. It is easy to see that A
i j
{c
j } is a simple query for A i
j+1
{c
j+1
} ^ : : : ^
A
i
m
{c
m
} ^ KB
0 . We would like to show that Pr
1
{A i
j
{c j
}|
m ^
`=j+1 A
i
`
{c
`
} ^ KB
0
} = Pr 1
{A
i j
{c
j }|KB
0
} = v
i
j ;
where Theorem 4.11 justifies the last equality. To prove the first equality,
we show that for
all j, the spaces S
~
0
[KB
0
] and S
~ 0
[
V
m
`=j+1
A
i
`
{c
j
} ^ KB 0
] have the same maximum-entropy
point, namely ~v. This is proved by backwards induction; the j = m case
is trivially true.
The difference between the {j000 1}st and jth case is the added conjunct
A
i
j
{c
j
}, which
amounts to adding the new constraint w
i
j > 0. There are two possibilities. First, if v
i j
> 0,14. Although we do not need the value of this constant in our calculations
below, it is in fact easy to verify that its value is
Q R2{010000011} 2
m
arity{R}
, where m = jZj.
82Random Worlds and Maximum Entropy
then ~v satisfies this new constraint anyway and so remains the maximum-entropy
point,
completing this step of the induction. If v
i j
= 0 this is not the case, and indeed, the
property we are trying to prove can be false {for j < m}. Butthis does not
matter, because we then know that Pr
1 {A
i
j {c
j
}| V
m
`=j+1
A
i
` {c
`
}^KB
0
} = Pr 1
{A
i j
{c
j }|KB
0
} = v
i
j = 0. Since
both of the products in question include a 0 factor, it is irrelevant as
to whether the other
terms agree. We can now put everything together to conclude that
Pr
1
{D|KB} =
Pr
1
{D|KB 0
}P
E2A{ ^037
6=
}
Pr
1
{E|KB
0
}
= F
[D]
{~v}P E2A{ ^037 6=
}
F [E]
{~ v}
;
proving part {b}.We now address the issue of computing Pr
1
{'|KB} for an arbitrary formula '. To do
that, we must first investigate the behavior of Pr ~034
1
{'|KB} for small ~034 . Fix some sufficiently small ~034 > 0, and let
Q be the set of maximum-entropy points of S
~034
[KB]. Assume KB and ~034 are stable for 033
003
. By definition, this means that for every ~v 2 Q, we have 033{~v} = 033
003
. Let
I bethe set of i's for which 033
003
contains the conjunct 9xA
i
{x}. Since 033{~v} = 033
003
for all ~v,
we must have that v
i > 0 for all i 2 I. Since Q is a closed set, this implies that there exists
some
*  > 0 such that for all ~v 2 Q and for all i 2 I, we have v
i >
* . Let 022[
* ] be the formula
^
i2I
jjA
i
{x}jj
x
>
* :
The following proposition is now easy to prove:
Proposition D.3: Suppose that KB and ~034 arestable for 033
003 and that Q, i, 022[
* ], and 037
6=
are as above. Then
Pr
~034 1
{'|KB} =
X
D2A{ }
Pr
~034
1
{'|KB
0
^ 022[
* ] ^ 033
003 ^ D} 001 Pr
~034
1
{D|KB}:
Proof: Clearly,022[
* ] satisfies the conditions of Corollary 3.14, allowing us to conclude that
Pr
~034 1
{022[
* ]|KB} = 1. Similarly, byTheorem 4.24 and the assumptions of Theorem 4.28,
we can conclude that Pr ~034
1
{033
003
|KB} = 1. Since the conjunction of two assertions that have
probability 1 also has probability 1, we can use Theorem 3.16 to conclude
that Pr
~034 1
{'|KB} =
Pr
~034 1
{'|KB ^022[
* ] ^ 033 003
}.
Now, recall that   is equivalent to the disjunction
W
D2A{ }
D. By straightforward
probabilistic reasoning, we can therefore conclude that
Pr ~034
1
{'|KB ^022[
* ] ^ 033
003
} = X
D2A{ }
Pr
~034 1
{'|KB ^022[
* ] ^ 033 003
^ D} 001 Pr
~034
1 {D|KB ^022[
* ] ^ 033
003 }:
By Theorem 3.16 again, Pr ~034
1
{D|KB ^022[
* ] ^ 033
003
} = Pr
~034
1 {D|KB}. The desired expression now
follows.We now simplify the expression Pr
~034
1
{'|KB ^022[
* ] ^ 033
003 ^ D}:
83Grove, Halpern, & Koller Proposition D.4: For', KB , 033
003
, D, and 022[
* ] as above, if Pr
~034
1
{D|KB} > 0, then
Pr
~034
1 {'|KB ^ 022[
* ] ^ 033
003 ^ D} = Pr
1 {'|033
003 ^ D};
and its value is either 0 or 1. Note that since the latter probability only
refers to first-order formulas, it is independent of the tolerance values.
Proof: That the right-hand side is either 0 or 1 is proved in {Grove et
al., 1993b}, where it
is shown that the asymptotic probability of any pure first-order sentence
when conditioned on knowledge of the form 033 003
^ D {which is, essentially, what was called a model description
in {Grove et al., 1993b}) is either 0 or 1. Very similar techniques can
be used to show that
the left-hand side is also either 0 or 1, and that the conjuncts KB ^022[
* ] do not affect this
limit {so that the left-hand side and the right-hand side are in fact equal}.
We briefly sketch
the relevant details here, referring the reader to {Grove et al., 1993b}
for full details.
The idea {which actually goes back to Fagin {1976}) is to associate with
a model descrip-
tion such as 033
003 ^ D a theory T which essentially consists of extension axioms. Intuitively,
anextension axiom says that any finite substructure of the model defined
by a complete
description D
0
can be extended in all possible ways definable by another description D
00
. We
say that a description D
00
0
if all conjuncts of D
0
are also conjuncts in D
00
. An extension axiom has the form 8x 1
; : : : ; x j
{D
0 } 9x
j+1 D
00
}, where D
0
is a complete description over X = {x
1
; : : :; x j
} and D
00
is a complete description over X [ fx
j+1}, such
that D 00
extends D
0
, both D
0 and D
00
extend D, and both are consistent with 033
003
. It is
then shown that {a} T is complete {so that for each formula 030, either
T|= 030 or T|= :030}
and {b} if 030 2 T then Pr
1 {030|033
003 ^ D} = 1. From {b} it easily follows that if T |= 030, then
Pr 1
{030|033 003
^ D} is also 1. Using {a}, the desired 0-1 law follows. The only difference
from
the proof in {Grove et al., 1993b} is that we need to show that {b} holds
even when we condition on KB ^ 022[
* ] ^ 033
003 ^ D, instead of just on 033
003
^ D.
So suppose 030 is the extension axiom 8x
1
; : : :; x
j
{D
0
} 9x j+1
D
00 }. We must show that Pr
1
{030|KB ^ 022[
* ] ^ 033
003
^ D} = 1. We first want to show that the right-hand side of the
conditional is consistent. As observed in the previous proof, it follows
from Theorem 3.16
that Pr
1
{D|KB} = Pr
~034
1
{'|KB ^022[
* ] ^ 033
003
}. Sincewe are assuming that Pr
1
{D|KB} > 0, it
follows that Pr
1
{KB ^ 022[
* ] ^ 033
003
^ D} > 0, and hence KB ^ 022[
* ] ^ 033
003 ^ D must be consistent. Fix a domain size N and consider the set of
worlds satisfying KB ^022[
* ] ^ 033
003 ^ D. Now
consider some particular j domain elements, say d
1
; : : : ; d j
, that satisfy D 0
. Observe that, since D
0
extends D, the denotations of the constants are all among d
1
; : : : ; d
j
. For a given d 62 fd
1
; : : : ; d
j
}, let B{d} denote the event that d
1
;: : : ; d
j
; d satisfy D
00
, given that d
1
;: : : ; d j
satisfy D
0
. What is the probability of B{d} given KB ^022[
* ] ^ 033 003
^ D? First, note that since
d does not denote any constant, it cannot be mentioned in any way in the
knowledge base.
Thus, this probability is the same for all d. The description D
00
determines two types of properties for x
j+1
. The unary properties of x
j+1
itself|i.e., the atom A
i
to which x
j+1
mustbelong|and the relations between x j+1
and the remaining variables x
1
; : : : ; x
j
using the non-unary predicate symbols. Since D
00
is consistent with 033
003
, the description 033
003
mustcontain a conjunct 9x A
i
{x} if D
00 implies A
i
{x
j+1
}. By definition, 022[
* ] must therefore
contain the conjunct jjA
i
{x}jj
x
>
* . Hence, the probability of picking d in A
i
is at least
* . For any sufficiently large N , the probability of picking d in A
i
which is different from
d
1 ; : : : ; d
j {as required by the definition of the extension axiom} is at least
* =2 > 0. The
84Random Worlds and Maximum Entropy
probability that d 1
; : : : ; d
j
; d also satisfy the remaining conjuncts of D
00
, given that d is
in atom A
i
and d
1
; : : :; d
j
satisfy D
0
, is very small but bounded away from 0. {For this
to hold, we need the assumption that the non-unary predicates are not mentioned
in the
KB .} This is the case because the total number of possible waysto choose
the properties of d {as they relate to d 1
; : : : ; d
j
} is independent of N . We can therefore conclude that the probability of
B{d} {for sufficiently large N }, given that d
1
; : : : ; d
j
satisfy D, is bounded
away from 0 by some 025 independent of N . Since the properties of an element
d and its
relation to d 1
; : : :; d
j
can be chosen independently of the properties of a different element d
0
, the different events B{d}; B{d
0
}; : : : are all independent. Therefore, the probability that
there is no domain element at all that, together with d 1
; : : :; d j
, satisfies D 00
is at most
{1 000 025}
N000j
. This bounds the probability of the extension axiom being false, relative
to fixed d
1
; : : : ; d
j
. There are
000
N j
001
ways of these choosing j elements, so the probability of the
axiom being false anywhere in a model is at most
000
N j
001
{1 000 025}
N000j . This tends to 0 as N goes
to infinity. Therefore, the extension axiom 8x
1 ; : : :; x
j {D
0
} 9x
j+1
D 00
} has asymptotic
probability 1 given KB ^ 022[
* ] ^ 033
003 ^ D, as desired.Finally, we are in a position to prove Theorem 4.28.
Theorem 4.28: Let ' be a formula in L
031
and let KB = KB
0
^   be an essentially
positive knowledge base in L
031 1
which is separable with respect to '. Let Z bethe set of
constantsappearing in ' or in   {so that KB 0
contains none of the constants in Z} and
let 037 6=
be the formula V
c;c
0
2Z
c 6= c
0
. Assume that there exists a size description 033
003 such
that, for all ~034 > 0, KB and ~034 arestable for 033
003
, and that the space S
~
0
[KB ] has a unique
maximum-entropy point~v. Then
Pr
1
{'|KB} =
P
D2A{ ^037
6=
}
Pr
1
{'|033
003
^ D}F [D]
{~v}P
D2A{ ^037
6=
}
F [D]
{~ v}
if the denominator is positive. Proof: Assume without loss of generality
that   mentions all the constant symbols in ',
so that A{  ^ 037
6=
} 022 A{ }. ByProposition D.3,
Pr ~034
1
{'|KB} =
X D2A{ }
Pr
~034
1 {'|KB ^022[
* ] ^ 033
003 ^ D} 001 Pr
~034
1
{D|KB}:
Note that we cannot easily take limits of Pr ~034
1
{'|KB ^022[
* ] ^ 033
003
^ D} as ~034 goes to
~
0, because this expression depends on 022[
* ] and the value of
*  used depends on the choice of ~034 . However,
applying Proposition D.4, we get
Pr
~034
1
{'|KB} =
X D2A{ }
Pr
1
{'|033
003
^ D} 001 Pr
~034 1
{D|KB}:
We can now take the limit as~034 goes to ~
0. To do this, we use Proposition D.2. The
hypotheses of the theorem imply that Pr
1 { |KB
0 } > 0 {for otherwise, the denominator P
D2A{ ^037
6=
} F
[D]
{~v} would be zero}. Part {a} of the proposition tells us we can ignore
those
complete descriptions that are inconsistent with 037
6=
. We can now apply part {b} to get the
desired result.85Grove, Halpern, & Koller Acknowledgements
We are very grateful to Professor Gregory Brumfiel, of the Department of
Mathematics at
StanfordUniversity, for his invaluable help with the proof of Proposition
B.5. We would like
to thank Fahiem Bacchus, with whom we started working on this general area
of research,
and Moshe Vardi for useful comments on a previous draft of this paper. A
preliminary
version of this paper appeared in Proc. 7th IEEE Symposium on Logic in Computer
Science. Some of this research was performed while Adam Grove and Daphne
Koller were at Stanford
in part
by the Air Force Office of Scientific Research {AFSC}, under Contract F49620-91-C-0080,
by an IBM Graduate Fellowship, and by a University of California President's
Postdoctoral
Fellowship. References
Bacchus, F. {1990}. Representing and Reasoning with Probabilistic Knowledge.
MITPress, Cambridge, Mass.
Bacchus, F., Grove, A. J., Halpern, J. Y., & Koller, D. {1994}. Fromstatistical
knowledge
bases to degrees of belief. Tech. rep. 9855, IBM. Available by anonymous
ftp from logos.uwaterloo.ca/pub/bacchus or via WWW at http://logos.uwaterloo.ca.
A prelim-
inary version of this work appeared in Proc. Thirteenth International Joint
Conference
on Artificial Intelligence {IJCAI '93}, 1993, pages 563{569. Bochnak, J.,
Coste, M., & Roy, M. {1987}. Geom etrie Algebrique Reelle, Vol. 12 of
A Series
of Modern Surveys in Mathematics. Springer-Verlag, Berlin Heidelberg.
Carnap, R. {1950}. Logical Foundations of Probability. University of Chicago
Press, Chicago.
Carnap, R. {1952}. The Continuum of Inductive Methods. University of Chicago
Press, Chicago.
Cheeseman, P. C. {1983}. A method of computing generalized Bayesian probability
val-
uesfor expert systems. In Proc. Eighth International Joint Conference on
Artificial
Intelligence {IJCAI '83}, pp. 198{202.
Denbigh, K. G., & Denbigh, J. S. {1985}. Entropy in Relation to Incomplete
Knowledge.
Cambridge University Press, Cambridge, U.K.
Fagin, R. {1976}. Probabilities on finite models. Journal of Symbolic Logic,
41 {1}, 50{58.
Fagin, R., Halpern, J. Y., & Megiddo, N. {1990}. A logic for reasoning about
probabilities.
Information and Computation, 87 {1/2}, 78{128.
Geffner, H., & Pearl, J. {1990}. A framework for reasoning with defaults.
In Kyburg, Jr., H. E., Loui, R., & Carlson, G. {Eds.}, Knowledge Representation
and Defeasible
Reasoning,pp. 245{26. Kluwer Academic Press, Dordrecht, Netherlands. 86Random
Worlds and Maximum Entropy
Ginsberg, M. L. {Ed.}. {1987}. Readingsin Nonmonotonic Reasoning. Morgan
Kaufmann,
San Mateo, CA, San Francisco.
Glebski025020, Y. V., Kogan, D. I., Liogon'ki025020, M. I., & Talanov,
V. A. {1969}. Range and degree
of realizability offormulas in the restricted predicate calculus. Kibernetika,
2, 17{28. Goldman, S. A. {1987}. Efficient methods for calculating maximum
entropy distributions.
Master's thesis, MIT EECS Department.
Goldszmidt, M., Morris, P., & Pearl, J. {1990}. A maximum entropy approach
to nonmono-
tonic reasoning. In Proc. National Conference on Artificial Intelligence
{AAAI '90},
pp. 646{652. Goldszmidt, M., Morris, P., & Pearl, J. {1993}. A maximum entropy
approach to nonmono-
tonic reasoning. IEEE Transactions of Pattern Analysis and Machine Intelligence,
15 {3}, 220{232.
Grandjean, E. {1983}. Complexityof the first{order theory of almost all
structures. Infor-
mation and Control, 52, 180{204.
Grove, A. J., Halpern, J. Y., & Koller, D. {1992}. Randomworlds and maximum
entropy.
In Proc. 7th IEEE Symp. on Logic in Computer Science, pp. 22{33.
Grove, A. J., Halpern, J. Y., & Koller, D. {1993a}. Asymptotic conditional
probabilities: thenon-unary case. Researchreport RJ 9564, IBM.
Grove, A. J., Halpern, J. Y., & Koller, D. {1993b}. Asymptotic conditional
probabilities:
the unary case. Research report RJ 9563, IBM. To appear in SIAM Journal
on
Computing.
Halpern, J. Y. {1990}. An analysis of first-order logics of probability.
ArtificialIntelligence,
46, 311{350.
Jaynes,E. T. {1957}. Information theory and statistical mechanics. Physical
Review, 106 {4}, 620{630.
Jaynes, E. T. {1978}. Where do we stand on maximum entropy?. In Levine,
R. D., & Tribus,
M. {Eds.}, The Maximum Entropy Formalism, pp. 15{118. MIT Press, Cambridge,
Mass.
Jaynes, E. T. {1982}. On the rationale of maximum-entropy methods. Proc.
IEEE, 70 {9},
939{952. Jaynes, E. T. {1983}. Concentration of distributions at entropy
maxima. In Rosenkrantz,
R. D. {Ed.}, E. T. Jaynes: Papers on Probability, Statistics, and Statistical
Physics, pp. 315{336. Kluwer, Dordrecht, Netherlands.
Keynes, J. M. {1921}. A Treatise on Probability. Macmillan, London.
87Grove, Halpern, & Koller Koller, D., & Halpern, J. Y. {1992}. A logic
for approximate reasoning. InNebel, B., Rich,
C., & Swartout, W. {Eds.}, Proc. Third International Conference on Principles
of
Knowledge Representation and Reasoning {KR '92}, pp. 153{164. Morgan Kaufmann,
San Mateo, CA, San Francisco. Kyburg, Jr., H. E. {1983}. The reference class.
Philosophyof Science,50 {3}, 374{397.
Laplace, P. S. d. {1820}. Essai Philosophique sur les Probabilites. English
translation is
PhilosophicalEssay on Probabilities, Dover Publications, New York, 1951.
Luce, R. D., & Raiffa, H. {1957}. Games and Decisions. Wiley, New York. Nilsson,
N. {1986}. Probabilistic logic. Artificial Intelligence, 28, 71{87.
Paris, J. B., & Vencovska, A. {1989}. On the applicability of maximum entropy
to inexact
reasoning. International Journal of Approximate Reasoning, 3, 1{34.
Pearl, J. {1988}. Probabilistic Reasoning in Intelligent Systems. Morgan
Kaufmann, San Francisco,Calif.
Pearl, J. {1989}. Probabilistic semantics for nonmonotonic reasoning: A
survey. In Brach-
man, R. J., Levesque, H. J., & Reiter, R. {Eds.}, Proc. First International
Conference
on Principles of Knowledge Representation and Reasoning {KR '89}, pp. 505{516.
Reprinted in Readings in Uncertain Reasoning, G. Shafer and J. Pearl {eds.},
Morgan Kaufmann, San Francisco, Calif., 1990, pp. 699{710.
Pollock, J. L. {1984}. Foundations for direct inference. Theory and Decision,
17, 221{256. Reichenbach, H. {1949}. Theory of Probability. University of
California Press, Berkeley.
Shannon, C., & Weaver, W. {1949}. The Mathematical Theory of Communication.
Univer-
sity of Illinois Press.
Shastri, L. {1989}. Defaultreasoning in semantic networks: a formalization
of recognition
and inheritance. Artificial Intelligence, 39 {3}, 285{355.
Spiegelhalter, D. J. {1986}. Probabilistic reasoning in predictive expert
systems. InKamal,
L. N., & Lemmer, J. F. {Eds.}, Uncertainty in Artificial Intelligence, pp.
47{68. North- Holland, Amsterdam.
Tarski, A. {1951}. A Decision Method for Elementary Algebra and Geometry
{2nd edition}.
Univ. of California Press.
88```