Least Generalizations and Greatest Specializations of Sets of Clauses

S. H. Nienhuys-Cheng

Volume 4, 1996


Journal of Artificial Intelligence Research 4 (1996) 341-363 Submitted 11/95;
published 5/96

Least Generalizations and Greatest Specializations

of Sets of Clauses

Shan-Hwei Nienhuys-Cheng (cheng@cs.few.eur.nl)
Ronald de Wolf (bidewolf@cs.few.eur.nl)
Erasmus University of Rotterdam, Department of Computer Science, H4-19
P.O. Box 1738, 3000 DR Rotterdam, the Netherlands

Abstract

The main operations in Inductive Logic Programming (ILP) are generalization
and specialization, which only make sense in a generality order. In ILP,
the three most important generality orders are subsumption, implication and
implication relative to background knowledge. The two languages used most
often are languages of clauses and languages of only Horn clauses. This gives
a total of six different ordered languages. In this paper, we give a systematic
treatment of the existence or non-existence of least generalizations and
greatest specializations of finite sets of clauses in each of these six ordered
sets. We survey results already obtained by others and also contribute some
answers of our own.

Our main new results are, firstly, the existence of a computable least generalization
under implication of every finite set of clauses containing at least one
non-tautologous function-free clause (among other, not necessarily function-free
clauses). Secondly, we show that such a least generalization need not exist
under relative implication, not even if both the set that is to be generalized
and the background knowledge are function-free. Thirdly, we give a complete
discussion of existence and non-existence of greatest specializations in
each of the six ordered languages.

1. Introduction

Inductive Logic Programming (ILP) is a subfield of Logic Programming and Machine Learning that tries to induce clausal theories from given sets of positive and negative examples. An inductively inferred theory should imply all of the positive and none of the negative examples. For instance, suppose we are given P(0), P(s²(0)), P(s⁴(0)), P(s⁶(0)) as positive examples and P(s(0)), P(s³(0)), P(s⁵(0)) as negative examples.¹ Then the set Σ = {P(0), (P(s²(x)) ← P(x))} is a solution: it implies all positive and no negative examples. Note that this set can be seen as a description of the even integers, learned from these examples. Thus induction of clausal theories is a form of learning from examples. For a more extensive introduction to ILP, we refer to (Lavrač & Džeroski, 1994; Muggleton & De Raedt, 1994).
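This induced theory can be checked mechanically. As a small illustration of our own (not from the paper), the following Python sketch forward-chains Σ = {P(0), P(s²(x)) ← P(x)} over the numerals up to s⁶(0), representing the numeral sⁱ(0) simply by the integer i:

```python
def derivable(bound):
    """Facts P(s^i(0)) derivable from {P(0), P(s^2(x)) <- P(x)},
    restricted to numerals s^i(0) with i <= bound."""
    facts = {0}                  # the fact P(0)
    frontier = [0]
    while frontier:
        i = frontier.pop()
        j = i + 2                # the rule: from P(x), derive P(s^2(x))
        if j <= bound and j not in facts:
            facts.add(j)
            frontier.append(j)
    return facts

# The positive examples P(0), P(s^2(0)), P(s^4(0)), P(s^6(0)) are all
# derivable; the negative examples P(s(0)), P(s^3(0)), P(s^5(0)) are not.
print(derivable(6))   # -> {0, 2, 4, 6}
```

The derived facts are exactly the even numerals, matching the reading of Σ as a description of the even integers.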

Learning from examples means modifying a theory to bring it more in accordance
with the examples. The two main operations in ILP for modification of a theory
are generalization and specialization. Generalization strengthens a theory
that is too weak, while specialization weakens a theory that is too strong.
These operations only make sense within a generality order. This is a relation
stating when some clause is more general than some other clause.

1. Here s²(0) abbreviates s(s(0)), s³(0) abbreviates s(s(s(0))), etc.

©1996 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

The three most important generality orders used in ILP are subsumption (also called θ-subsumption), logical implication and implication relative to background knowledge.² In the subsumption order, we say that clause C is more general than D (or, equivalently, D is more specific than C) in case C subsumes D. In the implication order, C is more general than D if C logically implies D. Finally, C is more general than D relative to background knowledge Σ (Σ is a set of clauses), if {C} ∪ Σ logically implies D.

Of these three orders, subsumption is the most tractable. In particular, subsumption is decidable, whereas logical implication is not decidable, not even for Horn clauses, as established by Marcinkowski and Pacholski (1992). In turn, relative implication is harder than implication: both are undecidable, but proof procedures for implication need to take only derivations from {C} into account, whereas a proof procedure for relative implication should check all derivations from {C} ∪ Σ.

Within a generality order, there are two approaches to generalization or
specialization. The first approach generalizes or specializes individual
clauses. We do not discuss this in any detail in this paper, and merely mention
it for completeness' sake. This approach can be traced back to Reynolds'
(1970) concept of a cover. It was implemented for example by Shapiro (1981)
in the subsumption order in the form of refinement operators. However, a
clause C which implies another clause D need not subsume D. For instance, take C = P(f(x)) ← P(x) and D = P(f²(x)) ← P(x). Then C does not subsume D, but C ⊨ D. Thus subsumption is weaker than implication. A further sign of this weakness is the fact that tautologies need not be subsume-equivalent, even though they are logically equivalent.
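The difference between subsumption and implication in this example can be verified mechanically. Below is a brute-force θ-subsumption test, a sketch of our own (the term encoding is an assumption, not from the paper): variables are strings starting with `?`, constants are plain strings, and a compound term f(t) is a tuple `('f', t)`. Since Cθ ⊆ D forces every variable of C onto a subterm of D, it suffices to enumerate those finitely many candidate substitutions.

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def apply_term(t, theta):
    if is_var(t):
        return theta.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply_term(a, theta) for a in t[1:])
    return t  # constant

def apply_clause(clause, theta):
    return {(sign, pred) + tuple(apply_term(t, theta) for t in terms)
            for (sign, pred, *terms) in clause}

def subterms(t):
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from subterms(a)

def clause_vars(clause):
    return {s for (_, _, *terms) in clause
            for t in terms for s in subterms(t) if is_var(s)}

def subsumes(c, d):
    # C subsumes D iff C.theta is a subset of D for some substitution theta;
    # brute force over mappings of C's variables to subterms of D.
    cands = {s for (_, _, *terms) in d for t in terms for s in subterms(t)}
    vs = sorted(clause_vars(c))
    for image in product(cands, repeat=len(vs)):
        if apply_clause(c, dict(zip(vs, image))) <= d:
            return True
    return False

# C = P(f(x)) <- P(x)  and  D = P(f(f(x))) <- P(x), as in the text:
C = {('+', 'P', ('f', '?x')), ('-', 'P', '?x')}
D = {('+', 'P', ('f', ('f', '?x'))), ('-', 'P', '?x')}
print(subsumes(C, D))   # -> False; yet C |= D via self-resolution:
R = {('+', 'P', ('f', ('f', '?y'))), ('-', 'P', '?y')}  # resolvent of C with C
print(subsumes(R, D))   # -> True
```

The resolvent R of C with a variant of itself does subsume D, which is exactly how the implication C ⊨ D arises.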

The second approach generalizes or specializes sets of clauses. This is the
approach we will be concerned with in this paper. Here the concept of a least
generalization³ is important. The use of such least generalizations allows us to generalize cautiously, avoiding over-generalization. Least generalizations of sets of clauses were first discussed by Plotkin (1970, 1971a, 1971b). He proved that any finite set S of clauses has a least generalization under subsumption (LGS). This is a clause which subsumes all clauses in S and which is subsumed by all other clauses that also subsume all clauses in S. Positive examples can be generalized by taking their LGS.⁴ Of course, we need not
take the LGS of all positive examples, which would yield a theory consisting
of only one clause. Instead, we might divide the positive examples into subsets,
and take a separate LGS of each subset. That way we obtain a theory containing
more than one clause.

For this second approach, subsumption is again not fully satisfactory. For example, if S consists of the clauses D1 = P(f²(a)) ← P(a) and D2 = P(f(b)) ← P(b), then the LGS of S is P(f(y)) ← P(x). The clause P(f(x)) ← P(x), which seems more appropriate as a least generalization of S, cannot be found by Plotkin's approach, because it does not subsume D1. As this example also shows, the subsumption order is particularly unsatisfactory when we consider recursive clauses: clauses which can be resolved with themselves.
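Plotkin's LGS is computed by anti-unification. The following sketch of our own (using the same assumed encoding: `?`-prefixed strings for variables, tuples for compound terms) anti-unifies every compatible pair of literals, sharing one variable table across the whole clause so that repeated pairs of terms get the same variable:

```python
def lgg_term(t1, t2, table):
    # identical terms generalize to themselves
    if t1 == t2:
        return t1
    # same functor and arity: generalize argumentwise
    if isinstance(t1, tuple) and isinstance(t2, tuple) \
            and t1[0] == t2[0] and len(t1) == len(t2):
        return (t1[0],) + tuple(lgg_term(a, b, table)
                                for a, b in zip(t1[1:], t2[1:]))
    # otherwise introduce a variable; the same pair of terms always
    # gets the same variable (this is what makes the result least)
    if (t1, t2) not in table:
        table[(t1, t2)] = '?v%d' % len(table)
    return table[(t1, t2)]

def lgs(c1, c2):
    # LGS of two clauses (Plotkin, 1970): anti-unify every pair of
    # literals with the same sign and predicate, one shared table.
    table, out = {}, set()
    for (s1, p1, *a1) in c1:          # clauses given as lists, so the
        for (s2, p2, *a2) in c2:      # variable numbering is deterministic
            if s1 == s2 and p1 == p2 and len(a1) == len(a2):
                out.add((s1, p1) + tuple(lgg_term(x, y, table)
                                         for x, y in zip(a1, a2)))
    return out

# D1 = P(f(f(a))) <- P(a)  and  D2 = P(f(b)) <- P(b), as in the text:
D1 = [('+', 'P', ('f', ('f', 'a'))), ('-', 'P', 'a')]
D2 = [('+', 'P', ('f', 'b')), ('-', 'P', 'b')]
print(lgs(D1, D2))   # == {('+','P',('f','?v0')), ('-','P','?v1')}
                     #    i.e. P(f(y)) <- P(x), the LGS from the text
```

On the paper's example this yields P(f(y)) ← P(x), not the more appropriate P(f(x)) ← P(x), illustrating the limitation just described.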

2. There is also relative subsumption (Plotkin, 1971b), which will be briefly touched upon in Section 4.
3. Least generalizations are also often called least general generalizations, for instance by Plotkin (1971b), Muggleton and Page (1994), Idestam-Almquist (1993, 1995) and Niblett (1988), though not by Plotkin (1970); we feel this `general' is redundant.
4. There is also a relation between least generalization under subsumption and inverse resolution (Muggleton, 1992).


Because of the weakness
of subsumption, it is desirable to make the step from the subsumption order
to the more powerful implication order. Accordingly, it is important to find
out whether Plotkin's positive result on the existence of LGS's holds for
implication as well. So far, the question whether any finite set of clauses
has a least generalization under implication (LGI) has only been partly answered.
For instance, Idestam-Almquist (1993, 1995) studies least generalizations
under T-implication as an approximation to LGI's. Muggleton and Page (1994)
investigate self-saturated clauses. A clause is self-saturated if it is subsumed
by any clause which implies it. A clause D is a self-saturation of C if C
and D are logically equivalent and D is self-saturated. As Muggleton and
Page (1994) state, if two clauses C1 and C2 have self-saturations D1 and
D2, then an LGS of D1 and D2 is also an LGI of C1 and C2. This positively
answers our question concerning the existence of LGI's in the case of clauses
which have a self-saturation. However, Muggleton and Page also show that
there exist clauses which have no self-saturation. Hence the concept of self-saturation
cannot solve our question in general.

Use of the third generality order, relative implication, is even more desirable
than the use of "plain" implication. Relative implication allows us to take
background knowledge into account, which can be used to formalize many useful
properties and relations of the domain of application. For this reason, least
generalizations under implication relative to background knowledge also deserve
attention.

Apart from the least generalization, there is also its dual: the greatest
specialization. Greatest specializations have been accorded much less attention
in ILP than least generalizations, but the concept of a greatest specialization
may nevertheless be useful (see the beginning of Section 6).

In this paper, we give a systematic treatment of the existence and non-existence
of least generalizations and greatest specializations, applied to each of
these three generality orders. Apart from distinguishing between these three
orders, we also distinguish between languages of general clauses and more
restricted languages of Horn clauses. Though most researchers in ILP restrict
attention to Horn clauses, general clauses are also sometimes used (Plotkin,
1970, 1971b; Shapiro, 1981; De Raedt & Bruynooghe, 1993; Idestam-Almquist,
1993, 1995). Moreover, many researchers who do not use general clauses actually
allow negative literals to appear in the body of a clause. That is, they
use clauses of the form A ← L1, …, Ln, where A is an atom and each Li is a literal. These are called program clauses (Lloyd, 1987). Program clauses are in fact logically equivalent to general clauses. For instance, the program clause P(x) ← Q(x), ¬R(x) is equivalent to the non-Horn clause P(x) ∨ ¬Q(x) ∨ R(x). For these two reasons we consider not only languages of Horn clauses, but also pay attention to languages of general clauses.

The combination of three generality orders and two different possible languages
of clauses gives a total of six different ordered languages. For each of
these, we can ask whether least generalizations (LG's) and greatest specializations
(GS's) always exist. We survey results already obtained by others and also
contribute some answers of our own. For the sake of clarity, we will summarize the results of our survey right at the outset. In the following table, `+' signifies a positive answer, and `−' means a negative answer.


                              Horn clauses      General clauses
Quasi-order                   LG      GS        LG                     GS
Subsumption (≥)               +       +         +                      +
Implication (⊨)               −       −         + for function-free    +
Relative implication (⊨_Σ)    −       −         −                      +

Table 1: Existence of LG's and GS's

Our own contributions to this table are threefold. First and foremost, we
prove that if S is a finite set of clauses containing at least one non-tautologous
function-free clause5 (apart from this non-tautologous function-free clause,
S may contain an arbitrary finite number of other clauses, including clauses
which contain functions), then there exists a computable LGI of S. This result
is on the one hand based on the Subsumption Theorem for resolution (Lee,
1967; Kowalski, 1970; Nienhuys-Cheng & de Wolf, 1996), which allows us to
restrict attention to finite sets of ground instances of clauses, and on
the other hand on a modification of some proofs concerning T-implication
which can be found in (Idestam-Almquist, 1993, 1995). An immediate corollary
of this result is the existence and computability of an LGI of any finite
set of function-free clauses. As far as we know, both our general LGI-result
and this particular corollary are new results.

Niblett (1988, p. 135) claims that "it is simple to show that there are lggs
if the language is restricted to a fixed set of constant symbols since all
Herbrand interpretations are finite." Yet even for this special case of our
general result, it appears that no proof has been published. Initially, we
found a direct proof of this case, but this was not really any simpler than
the proof of the more general result that we give in this paper. Niblett's
idea that the proof is simple may be due to some confusion about the relation
between Herbrand models and logical implication (which is defined in terms
of all models, not just Herbrand models). We will describe this at the end
of Subsection 5.1. Or perhaps one might think that the decidability of implication
for function-free clauses immediately implies the existence of an LGI. But
in fact, decidability is not a sufficient condition for the existence of
a least generalization. For example, it is decidable whether one function-free
clause C implies another function-free clause D relative to function-free
background knowledge. Yet least generalizations relative to function-free
background knowledge do not always exist, as we will show in Section 7.

Our LGI-result does not solve the general question of the existence of LGI's,
but it does provide a positive answer for a large class of cases: the presence
of one non-tautologous function-free clause in a finite S already guarantees
the existence and computability of an LGI of S, no matter what other clauses
S contains.⁶ Because of the prominence of function-free clauses in ILP, this case may be of great practical significance. Often, particularly in implementations of ILP-systems, the language is required to be function-free, or function symbols are removed from clauses and put in the background knowledge by techniques such as flattening (Rouveirol, 1992). Well-known ILP-systems such as Foil (Quinlan & Cameron-Jones, 1993), Linus (Lavrač & Džeroski, 1994) and Mobal (Morik, Wrobel, Kietz, & Emde, 1993) all use only function-free clauses. More than half of the ILP-systems surveyed by Aha (1992) are restricted to function-free clauses. Function-free clauses are also sufficient for most applications concerning databases.

5. A clause which only contains constants and variables as terms.
6. Note that even for function-free clauses, the subsumption order is still not enough. Consider D1 = P(x,y,z) ← P(y,z,x) and D2 = P(x,y,z) ← P(z,x,y) (this example is adapted from Idestam-Almquist). D1 is a resolvent of D2 and D2, and D2 is a resolvent of D1 and D1. Hence D1 and D2 are logically equivalent. This means that D1 is an LGI of the set {D1, D2}. However, the LGS of these two clauses is P(x,y,z) ← P(u,v,w), which is clearly an over-generalization.

Our second contribution shows that a set S need not have a least generalization relative to some background knowledge Σ, not even when S and Σ are both function-free.

Thirdly, we contribute a complete discussion of existence and non-existence
of greatest specializations in each of the six ordered languages. In particular,
we show that any finite set of clauses has a greatest specialization under
implication. Combining this with the corollary of our result on LGI's, it
follows that a function-free clausal language is a lattice.

2. Preliminaries

In this section we will define some of the concepts we need. For the definitions of `model', `tautology', `substitution', etc., we refer to standard works such as (Chang & Lee, 1973; Lloyd, 1987). A positive literal is an atom; a negative literal is the negation of an atom. A clause is a finite set of literals, which is treated as the universally quantified disjunction of those literals. A definite program clause is a clause with one positive and zero or more negative literals, and a definite goal is a clause without positive literals. A Horn clause is either a definite program clause or a definite goal. If C is a clause, we use C⁺ to denote the positive literals in C, and C⁻ to denote the negative literals in C. The empty clause, which represents a contradiction, is denoted by □.

Definition 1 Let A be an alphabet of first-order logic. Then the clausal language C by A is the set of all clauses which can be constructed from the symbols in A. The Horn language H by A is the set of all Horn clauses which can be constructed from the symbols in A. □

In this paper, we just presuppose some arbitrary alphabet A, and consider
the clausal language C and Horn language H based on this A. We will now define
three increasingly strong generality orders on clauses: subsumption, implication
and relative implication.

Definition 2 Let C and D be clauses and Σ be a set of clauses. We say that C subsumes D, denoted as C ≥ D, if there exists a substitution θ such that Cθ ⊆ D.⁷ C and D are subsume-equivalent if C ≥ D and D ≥ C.

Σ (logically) implies C, denoted as Σ ⊨ C, if every model of Σ is also a model of C. C (logically) implies D, denoted as C ⊨ D, if {C} ⊨ D. C and D are (logically) equivalent if C ⊨ D and D ⊨ C.

C implies D relative to Σ, denoted as C ⊨_Σ D, if Σ ∪ {C} ⊨ D. C and D are equivalent relative to Σ if C ⊨_Σ D and D ⊨_Σ C. □

7. Right from the very first applications of subsumption in ILP, there has been some controversy about the symbol used for subsumption: Plotkin (1970) used `≤', while Reynolds (1970) used `≥'. We use `≥' here, similar to Reynolds' `≥', because we feel it serves the intuition to view C as somehow "bigger" or "stronger" than D, if C ≥ D holds.

If C does not subsume D, we write C ≱ D. Similarly, we use the notation C ⊭ D and C ⊭_Σ D.

If C ≥ D, then C ⊨ D. The converse does not hold, as the examples in the introduction showed. Similarly, if C ⊨ D, then C ⊨_Σ D, and again the converse need not hold. Consider C = P(a) ∨ ¬P(b), D = P(a), and Σ = {P(b)}: then C ⊨_Σ D, but C ⊭ D.

We now proceed to define a proof procedure for logical implication between
clauses, using resolution and subsumption.

Definition 3 If two clauses have no variables in common, then they are said
to be standardized apart.

Let C1 = L1 ∨ … ∨ Li ∨ … ∨ Lm and C2 = M1 ∨ … ∨ Mj ∨ … ∨ Mn be two clauses which are standardized apart. If the substitution θ is a most general unifier (mgu) of the set {Li, ¬Mj}, then the clause ((C1 − Li) ∪ (C2 − Mj))θ is called a binary resolvent of C1 and C2. The literals Li and Mj are said to be the literals resolved upon. □

If C1 and C2 are not standardized apart, we can take a variant C′2 of C2, such that C1 and C′2 are standardized apart. For simplicity, a binary resolvent of C1 and C′2 is also called a binary resolvent of C1 and C2 itself.
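This definition translates directly into a small program. The sketch below is our own (same assumed term encoding as before: `?`-prefixed strings for variables, tuples for compound terms), with an occurs check in the unifier; it reproduces the self-resolution step behind the earlier example C = P(f(x)) ← P(x) ⊨ P(f²(x)) ← P(x).

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def subst(t, s):
    # apply substitution s (a dict) to term t, chasing bindings
    if is_var(t):
        return subst(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t  # constant

def occurs(v, t):
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a) for a in t[1:])

def mgu(t1, t2, s=None):
    # most general unifier of two terms, or None if not unifiable
    s = {} if s is None else s
    t1, t2 = subst(t1, s), subst(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2) else {**s, t1: t2}
    if is_var(t2):
        return mgu(t2, t1, s)
    if isinstance(t1, tuple) and isinstance(t2, tuple) \
            and t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            s = mgu(a, b, s)
            if s is None:
                return None
        return s
    return None

def binary_resolvent(c1, li, c2, mj):
    # ((C1 - Li) u (C2 - Mj)).theta, where theta is an mgu of {Li, ~Mj};
    # c1 and c2 must already be standardized apart, li and mj complementary
    assert li[0] != mj[0] and li[1] == mj[1]
    s = {}
    for a, b in zip(li[2:], mj[2:]):
        s = mgu(a, b, s)
        if s is None:
            return None
    rest = [l for l in c1 if l != li] + [l for l in c2 if l != mj]
    return {(sg, p) + tuple(subst(t, s) for t in ts) for (sg, p, *ts) in rest}

# C = P(f(x)) <- P(x) and a variant of itself, standardized apart:
C  = [('+', 'P', ('f', '?x')), ('-', 'P', '?x')]
C2 = [('+', 'P', ('f', '?y')), ('-', 'P', '?y')]
R = binary_resolvent(C, ('-', 'P', '?x'), C2, ('+', 'P', ('f', '?y')))
print(R)  # {('+','P',('f',('f','?y'))), ('-','P','?y')} = P(f^2(y)) <- P(y)
```

The resolvent is P(f²(y)) ← P(y), a variant of the clause D from the introduction that C implies but does not subsume.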

Definition 4 Let C be a clause and θ an mgu of {L1, …, Ln} ⊆ C (n ≥ 1). Then the clause Cθ is called a factor of C. □

Note that any non-empty clause C is a factor of itself, using the empty substitution ε as an mgu of a single literal in C.

Definition 5 A resolvent C of clauses C1 and C2 is a binary resolvent of a factor of C1 and a factor of C2, where the literals resolved upon are the literals unified in the respective factors. C1 and C2 are the parent clauses of C. □

Definition 6 Let Σ be a set of clauses and C a clause. A derivation of C from Σ is a finite sequence of clauses R1, …, Rk = C, such that each Ri is either in Σ, or a resolvent of two clauses in {R1, …, Ri−1}. If such a derivation exists, we write Σ ⊢_r C. □

Definition 7 Let Σ be a set of clauses and C a clause. We say there exists a deduction of C from Σ, written as Σ ⊢_d C, if C is a tautology, or if there exists a clause D such that Σ ⊢_r D and D ≥ C. □

The next result, proved by Nienhuys-Cheng and de Wolf (1996), generalizes
Herbrand's Theorem:

Theorem 1 Let Σ be a set of clauses and C be a ground clause. If Σ ⊨ C, then there exists a finite set Σ_g of ground instances of clauses in Σ, such that Σ_g ⊨ C.

The following Subsumption Theorem gives a precise characterization of implication
between clauses in terms of resolution and subsumption. It was proved by
Lee (1967), Kowalski (1970) and reproved by Nienhuys-Cheng and de Wolf (1996).

Theorem 2 (Subsumption Theorem) Let Σ be a set of clauses and C be a clause. Then Σ ⊨ C iff Σ ⊢_d C.

The next lemma was first proved by Gottlob (1987). Actually, it is an immediate
corollary of the subsumption theorem:

Lemma 1 (Gottlob) Let C and D be non-tautologous clauses. If C ⊨ D, then C⁺ ≥ D⁺ and C⁻ ≥ D⁻.

Proof Since C⁺ ≥ C, if C ⊨ D, then we have C⁺ ⊨ D. Since C⁺ cannot be resolved with itself, it follows from the Subsumption Theorem that C⁺ ≥ D. But then C⁺ must subsume the positive literals in D, hence C⁺ ≥ D⁺. Similarly C⁻ ≥ D⁻. □

An important consequence of this lemma concerns the depth of clauses, defined as follows:

Definition 8 Let t be a term. If t is a variable or constant, then the depth of t is 1. If t = f(t1, …, tn), n ≥ 1, then the depth of t is 1 plus the depth of the ti with largest depth. The depth of a clause C is the depth of the term with largest depth in C. □

For example, the term t = f(a,x) has depth 2. C = P(f(x)) ← P(g(f(x),a)) has depth 3, since g(f(x),a) has depth 3. It follows from Gottlob's lemma that if C ⊨ D, then the depth of C is smaller than or equal to the depth of D, for otherwise C⁺ cannot subsume D⁺ or C⁻ cannot subsume D⁻. For instance, take D = P(x, f(x,g(y))) ← P(g(a),b), which has depth 3. Then a clause C containing a term f(x,g²(y)) (depth 4) cannot imply D.
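Definition 8 translates directly into code. A short sketch of our own, with the same assumed encoding as before (variables and constants as strings, compound terms as tuples):

```python
def depth_term(t):
    # variables and constants have depth 1
    if not isinstance(t, tuple):
        return 1
    # f(t1, ..., tn): 1 plus the largest argument depth
    return 1 + max(depth_term(a) for a in t[1:])

def depth_clause(c):
    # depth of the deepest term occurring in the clause
    return max(depth_term(t) for (_, _, *terms) in c for t in terms)

# t = f(a, x) has depth 2:
print(depth_term(('f', 'a', '?x')))                      # -> 2
# C = P(f(x)) <- P(g(f(x), a)) has depth 3:
C = [('+', 'P', ('f', '?x')), ('-', 'P', ('g', ('f', '?x'), 'a'))]
print(depth_clause(C))                                   # -> 3
```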

Definition 9 Let S and S′ be finite sets of clauses, x1, …, xn all distinct variables appearing in S, and a1, …, an distinct constants not appearing in S or S′. Then σ = {x1/a1, …, xn/an} is called a Skolem substitution for S w.r.t. S′. If S′ is empty, we just say that σ is a Skolem substitution for S. □
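A Skolem substitution is easy to construct mechanically. A sketch of our own (same assumed encoding; the constant names sk0, sk1, … are an arbitrary choice, and `avoid` plays the role of the constants occurring in S or S′):

```python
def terms_in(t):
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from terms_in(a)

def skolem_substitution(S, avoid=()):
    """Map each variable of the clause set S to a fresh constant
    not occurring in S or in `avoid`."""
    seen_vars, seen_consts = [], set(avoid)
    for clause in S:
        for (_, _, *terms) in clause:
            for t in terms:
                for s in terms_in(t):
                    if isinstance(s, str) and s.startswith('?'):
                        if s not in seen_vars:
                            seen_vars.append(s)
                    elif isinstance(s, str):
                        seen_consts.add(s)
    sigma, i = {}, 0
    for v in seen_vars:
        while 'sk%d' % i in seen_consts:
            i += 1
        sigma[v] = 'sk%d' % i
        i += 1
    return sigma

# S = {P(x, f(y))}, with the constant sk0 already in use elsewhere:
S = [[('+', 'P', '?x', ('f', '?y'))]]
print(skolem_substitution(S, avoid={'sk0'}))  # -> {'?x': 'sk1', '?y': 'sk2'}
```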

Lemma 2 Let Σ be a set of clauses, C be a clause, and σ be a Skolem substitution for C w.r.t. Σ. Then Σ ⊨ C iff Σ ⊨ Cσ.

Proof
⇒: Obvious.
⇐: Suppose C is not a tautology (otherwise Σ ⊨ C holds trivially) and let σ = {x1/a1, …, xn/an}. If Σ ⊨ Cσ, it follows from the Subsumption Theorem that there is a D such that Σ ⊢_r D and D ≥ Cσ. Thus there is a θ such that Dθ ⊆ Cσ. Note that since Σ ⊢_r D and none of the constants a1, …, an appears in Σ, none of these constants appears in D. Now let θ′ be obtained by replacing in θ all occurrences of ai by xi, for every 1 ≤ i ≤ n. Then Dθ′ ⊆ C, hence D ≥ C. Therefore Σ ⊢_d C and hence Σ ⊨ C. □

3. Least Generalizations and Greatest Specializations

In this section, we will define the concepts we need concerning least generalizations and greatest specializations.

Definition 10 Let Γ be a set and R be a binary relation on Γ.

1. R is reflexive on Γ, if xRx for every x ∈ Γ.
2. R is transitive on Γ, if for every x, y, z ∈ Γ, xRy and yRz implies xRz.
3. R is symmetric on Γ, if for every x, y ∈ Γ, xRy implies yRx.
4. R is anti-symmetric on Γ, if for every x, y ∈ Γ, xRy and yRx implies x = y.

If R is both reflexive and transitive on Γ, we say R is a quasi-order on Γ. If R is reflexive, transitive and anti-symmetric on Γ, we say R is a partial order on Γ. If R is reflexive, transitive and symmetric on Γ, R is an equivalence relation on Γ. □

A quasi-order R on Γ induces an equivalence relation ∼ on Γ, as follows: we say x, y ∈ Γ are equivalent induced by R (denoted x ∼ y) if both xRy and yRx. Using this equivalence relation, a quasi-order R on Γ induces a partial order R′ on the set of equivalence classes in Γ, defined as follows: if [x] denotes the equivalence class of x (i.e., [x] = {y | x ∼ y}), then [x]R′[y] iff xRy.

We first give a general definition of least generalizations and greatest
specializations for sets of clauses ordered by some quasi-order, which we
then instantiate in different ways.

Definition 11 Let Γ be a set of clauses, ≥ be a quasi-order on Γ, S ⊆ Γ be a finite set of clauses and C ∈ Γ. If C ≥ D for every D ∈ S, then we say C is a generalization of S under ≥. Such a C is called a least generalization (LG) of S under ≥ in Γ, if we have C′ ≥ C for every generalization C′ ∈ Γ of S under ≥.

Dually, C is a specialization of S under ≥, if D ≥ C for every D ∈ S. Such a C is called a greatest specialization (GS) of S under ≥ in Γ, if we have C ≥ C′ for every specialization C′ ∈ Γ of S under ≥. □

It is easy to see that if some set S has an LG or GS under ≥ in Γ, then this LG or GS will be unique up to the equivalence induced by ≥ in Γ. That is, if C and D are both LG's or GS's of some set S, then we have C ∼ D.

The concepts defined above are instances of the mathematical concepts of
(least) upper bounds and (greatest) lower bounds. Thus we can speak of lattice-properties
of a quasi- or partially ordered set of clauses:

Definition 12 Let Γ be a set of clauses and ≥ be a quasi-order on Γ. If for every finite subset S of Γ, there exist both a least generalization and a greatest specialization of S under ≥ in Γ, then the set Γ ordered by ≥ is called a lattice. □

It should be noted that usually in mathematics, a lattice is defined for
a partial order instead of a quasi-order. However, since in ILP we usually
have to deal with individual clauses rather than with equivalence classes
of clauses, it is convenient for us to define `lattice' for a quasi-order
here. Anyhow, if the set Γ ordered by a quasi-order ≥ is a lattice, then the set of equivalence classes in Γ, ordered by the induced partial order, is also a lattice.

In ILP, there are two main instantiations for the set of clauses Γ: either we take a clausal language C, or we take a Horn language H. Similarly, there are three interesting choices for the quasi-order ≥: we can use either ≥ (subsumption), ⊨ (implication), or ⊨_Σ (relative implication) for some background knowledge Σ. In the ≥-order, we will sometimes abbreviate the terms `least generalization of S under subsumption' and `greatest specialization of S under subsumption' to `LGS of S' and `GSS of S', respectively. Similarly, in the ⊨-order we will sometimes speak of an LGI (least generalization under implication) and a GSI. In the ⊨_Σ-order, we will use LGR (least generalization under relative implication) and GSR.

These two different languages and three different quasi-orders give a total
of six combinations. For each combination, we can ask whether an LG or GS
of every finite set S exists. In the next section, we will review the answers
for subsumption given by others or by ourselves. Then we devote two sections
to least generalizations and greatest specializations under implication,
respectively. Finally, we discuss least generalizations and greatest specializations
under relative implication. The results of this survey have already been
summarized in Table 1 in the introduction.

4. Subsumption

First we devote some attention to subsumption. Least generalizations under subsumption have been discussed extensively by Plotkin (1970). The main result in Plotkin's framework is the following:

Theorem 3 (Existence of LGS in C) Let C be a clausal language. Then for every finite S ⊆ C, there exists an LGS of S in C.

If S only contains Horn clauses, then it can be shown that the LGS of S is
itself also a Horn clause. Thus the question for the existence of an LGS
of every finite set S of clauses is answered positively for both clausal
languages and for Horn languages.

Plotkin established the existence of an LGS, but he seems to have ignored
the GSS in (1970, 1971b), possibly because it is a very straightforward result.
It is in fact fairly easy to show that the GSS of some finite set S of clauses
is simply the union of all clauses in S after they are standardized apart.⁸
We include the proof here.

Theorem 4 (Existence of GSS in C) Let C be a clausal language. Then for every finite S ⊆ C, there exists a GSS of S in C.

Proof Suppose S = {D1, …, Dn} ⊆ C. Without loss of generality, we assume the clauses in S are standardized apart. Let D = D1 ∪ … ∪ Dn; then Di ≥ D, for every 1 ≤ i ≤ n. Now let C ∈ C be such that Di ≥ C, for every 1 ≤ i ≤ n. Then for every 1 ≤ i ≤ n, there is a θi such that Diθi ⊆ C and θi only acts on variables in Di. If we let θ = θ1 ∪ … ∪ θn, then Dθ = D1θ1 ∪ … ∪ Dnθn ⊆ C. Hence D ≥ C, so D is a GSS of S in C. □

8. Note that this has nothing to do with unification. For instance, if S = {P(a,x), P(y,b)}, then the GSS of S in C would be P(a,x) ∨ P(y,b). However, if we would instantiate Γ in Definition 11 to the set of atoms, then the greatest specialization of two atoms in the set of atoms should itself also be an atom. The GSS of two atoms is then their most general unification (Reynolds, 1970). For instance, the GSS of S would in this case be P(a,b).
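The proof of Theorem 4 is constructive, and footnote 8's example can be reproduced. A sketch of our own (same assumed encoding as before; clauses are standardized apart simply by tagging each clause's variables with the clause index):

```python
def tag(t, i):
    # rename a term's variables apart by suffixing the clause index
    if isinstance(t, str) and t.startswith('?'):
        return '%s_%d' % (t, i)
    if isinstance(t, tuple):
        return (t[0],) + tuple(tag(a, i) for a in t[1:])
    return t  # constant

def gss(clauses):
    # GSS in a clausal language: the union of the clauses
    # after standardizing them apart (Theorem 4)
    return {(sg, p) + tuple(tag(t, i) for t in ts)
            for i, c in enumerate(clauses)
            for (sg, p, *ts) in c}

# S = {P(a, x), P(y, b)} as two one-literal clauses:
S = [[('+', 'P', 'a', '?x')], [('+', 'P', '?y', 'b')]]
print(gss(S))  # == {('+','P','a','?x_0'), ('+','P','?y_1','b')}
               #    i.e. P(a, x) v P(y, b), as in footnote 8
```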

This establishes that a clausal language C ordered by ≥ is a lattice.

Proving the existence of a GSS of every finite set of Horn clauses in H requires a little more work, but here also the result is positive. For example, D = P(a) ← P(f(a)), Q(y) is a GSS of D1 = P(x) ← P(f(x)) and D2 = P(a) ← Q(y). Note that D can be obtained by applying σ = {x/a} (the mgu of the heads of D1 and D2) to D1 ∪ D2, the GSS of D1 and D2 in C. This idea will be used in the following proof. Here we assume H contains an artificial bottom element (True) ⊥, such that C ≥ ⊥ for every C ∈ H, and ⊥ ≱ C for every C ≠ ⊥. Note that ⊥ is not subsume-equivalent with other tautologies.

Theorem 5 (Existence of GSS in H) Let H be a Horn language, with ⊥ ∈ H. Then for every finite S ⊆ H, there exists a GSS of S in H.

Proof Suppose S = {D1, …, Dn} ⊆ H. Without loss of generality we assume the clauses in S are standardized apart, D1, …, Dk are the definite program clauses in S, and Dk+1, …, Dn are the definite goals in S. If k = 0 (i.e., if S only contains goals), then it is easy to show that D1 ∪ … ∪ Dn is a GSS of S in H. If k ≥ 1 and the set {D1⁺, …, Dk⁺} is not unifiable, then ⊥ is a GSS of S in H. Otherwise, let σ be an mgu of {D1⁺, …, Dk⁺}, and let D = D1σ ∪ … ∪ Dnσ (note that actually Diσ = Di for k+1 ≤ i ≤ n, since the clauses in S are standardized apart). Since D has exactly one literal in its head, it is a definite program clause. Furthermore, we have Di ≥ D for every 1 ≤ i ≤ n, since Diσ ⊆ D.

To show that D is a GSS of S in H, suppose C ∈ H is some clause such that Di ≥ C for every 1 ≤ i ≤ n. For every 1 ≤ i ≤ n, let θi be such that Diθi ⊆ C and θi only acts on variables in Di. Let θ = θ1 ∪ … ∪ θn. For every 1 ≤ i ≤ k, Di⁺θ = Di⁺θi = C⁺, so θ is a unifier of {D1⁺, …, Dk⁺}. But σ is an mgu of this set, so there is a γ such that θ = σγ. Now Dγ = D1σγ ∪ … ∪ Dnσγ = D1θ ∪ … ∪ Dnθ = D1θ1 ∪ … ∪ Dnθn ⊆ C. Hence D ≥ C, so D is a GSS of S in H. See Figure 1 for an illustration of the case where n = 2.

Figure 1: D is a GSS of D1 and D2 (σ maps D1 and D2 into D; θ1 and θ2 map them into C; γ maps D into C)

□

Thus a Horn language H ordered by ≥ is also a lattice.
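The construction in the proof of Theorem 5 is mechanical enough to sketch in code. The following Python sketch is ours, not the paper's; it assumes a simple representation in which a term is a string (variables start with '?') or a tuple ('f', arg, …), an atom is a tuple ('P', arg, …), and a definite program clause is a pair (head, body-set). The occurs check is omitted for brevity.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(t1, t2, s):
    """Return an mgu extending s, or None if not unifiable.
    (No occurs check, for brevity.)"""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and \
       t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    """Apply substitution s to a term or atom."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

def gss_horn(clauses):
    """GSS of definite program clauses (head, body-set), assumed
    standardized apart: unify the heads, then take the union of the
    instantiated clauses (Theorem 5).  Returns None for bottom."""
    s = {}
    heads = [h for h, _ in clauses]
    for h in heads[1:]:
        s = unify(heads[0], h, s)
        if s is None:
            return None          # heads not unifiable: GSS is bottom
    head = subst(heads[0], s)
    body = set()
    for _, b in clauses:
        body |= {subst(a, s) for a in b}
    return head, body
```

For D1 = P(x) ← P(f(x)) and D2 = P(a) ← Q(y) this returns P(a) ← P(f(a)), Q(y), matching the example above.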

We end this section by briefly discussing Plotkin's (1971b) relative subsumption. This is an extension of subsumption which takes background knowledge into account. This background knowledge is rather restricted: it must be a finite set Σ of ground literals. Because of its restrictiveness, we have not included relative subsumption in Table 1. Nevertheless, we mention it here, because least generalization under relative subsumption forms the basis of the well-known ILP system Golem (Muggleton & Feng, 1992).

Definition 13 Let C, D be clauses, and Σ = {L1,…,Lm} be a finite set of ground literals. Then C subsumes D relative to Σ, denoted by C ≥Σ D, if C ≥ (D ∪ {¬L1,…,¬Lm}). □


It is easy to see that ≥Σ is reflexive and transitive, so it imposes a quasi-order on a set of clauses.

Suppose S = {D1,…,Dn} and Σ = {L1,…,Lm}. It is easy to see that an LGS of {(D1 ∪ {¬L1,…,¬Lm}),…,(Dn ∪ {¬L1,…,¬Lm})} is a least generalization of S under ≥Σ, so every finite set of clauses has a least generalization under ≥Σ in C. Moreover, if each Di is a Horn clause and each Lj is a positive ground literal (i.e., a ground atom), then this least generalization will itself also be a Horn clause. Accordingly, if Σ is a finite set of positive ground literals, then every finite set of Horn clauses has a least generalization under ≥Σ in H.
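The reduction just described is a one-liner in code. This sketch is ours, under an assumed representation of literals as (sign, atom) pairs with sign +1 or -1; it turns the relative-subsumption problem into an ordinary LGS problem by adding the negated background literals to every clause.

```python
def add_negated_background(S, Sigma):
    """Replace each clause D_i in S by D_i union {~L_1,...,~L_m}, so
    that an ordinary LGS of the result is a least generalization of S
    under subsumption relative to Sigma (a finite set of ground
    literals, represented as (sign, atom) pairs)."""
    negated = {(-sign, atom) for (sign, atom) in Sigma}
    return [frozenset(D) | negated for D in S]
```

If every Di is Horn and every literal in Sigma is positive, each resulting clause has the same single positive literal as Di, so the LGS stays Horn, as noted above.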

5. Least Generalizations under Implication

Now we turn from subsumption to the implication order. In this section we discuss LGI's; in the next section we handle GSI's. For Horn clauses, the LGI-question has already been answered negatively by Muggleton and De Raedt (1994).

Let D1 = P(f²(x)) ← P(x), D2 = P(f³(x)) ← P(x), C1 = P(f(x)) ← P(x) and C2 = P(f²(y)) ← P(x). Then we have both C1 ⊨ {D1,D2} and C2 ⊨ {D1,D2}. It is not very difficult to see that there are no more specific Horn clauses than C1 and C2 that imply both D1 and D2. For C1: no resolvent of C1 with itself implies D2, and no clause that is properly subsumed by C1 still implies D1 and D2. For C2: every resolvent of C2 with itself is a variant of C2, and no clause that is properly subsumed by C2 still implies D1 and D2. Thus C1 and C2 are both "minimal" generalizations under implication of {D1,D2}. Since C1 and C2 are not logically equivalent, there is no LGI of {D1,D2} in H.

However, the fact that there is no LGI of {D1,D2} in H does not mean that D1 and D2 have no LGI in C, since a Horn language is a more restricted space than a clausal language. In fact, it is shown by Muggleton and Page (1994) that C = P(f(x)) ∨ P(f²(y)) ← P(x) is an LGI of D1 and D2 in C. For this reason, it may be worthwhile for the LGI to consider a clausal language instead of only Horn clauses.

In the next subsection, we show that any finite set of clauses which contains at least one non-tautologous function-free clause has an LGI in C. An immediate corollary of this result is the existence of an LGI of any finite set of function-free clauses. In our usage of the word, a 'function-free' clause may contain constants, even though constants are sometimes seen as functions of arity 0.

Definition 14 A clause is function-free if it does not contain function symbols of arity 1 or more. □

Note that a clause is function-free iff it has depth 1. In case of sets of
clauses which all contain function symbols, the LGI-question remains open.

5.1 A Sufficient Condition for the Existence of an LGI

In this subsection, we will show that any finite set S of clauses containing at least one non-tautologous function-free clause has an LGI in C.

Definition 15 Let C be a clause, x1,…,xn all distinct variables in C, and K a set of terms. Then the instance set of C w.r.t. K is I(C,K) = {Cθ | θ = {x1/t1,…,xn/tn}, where ti ∈ K for every 1 ≤ i ≤ n}. If Σ = {C1,…,Ck} is a set of clauses, then the instance set of Σ w.r.t. K is I(Σ,K) = I(C1,K) ∪ … ∪ I(Ck,K). □

For example, if C = P(x) ∨ Q(y) and T = {a, f(z)}, then I(C,T) = {(P(a) ∨ Q(a)), (P(a) ∨ Q(f(z))), (P(f(z)) ∨ Q(a)), (P(f(z)) ∨ Q(f(z)))}.
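Definition 15 is directly computable when C and K are finite. A small Python sketch, ours rather than the paper's: variables are assumed to be strings starting with '?', atoms are tuples ('P', arg, …), a clause is a frozenset of atoms, and literal signs are folded into the predicate symbol.

```python
from itertools import product

def variables_of(t):
    """Variables occurring in a term or atom (vars start with '?')."""
    if isinstance(t, str):
        return {t} if t.startswith('?') else set()
    return set().union(*(variables_of(a) for a in t[1:]))

def apply_subst(t, theta):
    """One-pass substitution of variables in a term or atom."""
    if isinstance(t, str):
        return theta.get(t, t)
    return (t[0],) + tuple(apply_subst(a, theta) for a in t[1:])

def instance_set(clause, K):
    """I(C, K): all instances of clause C whose variables are mapped
    to terms in K (Definition 15)."""
    vs = sorted(set().union(*(variables_of(lit) for lit in clause)))
    insts = set()
    for terms in product(list(K), repeat=len(vs)):
        theta = dict(zip(vs, terms))
        insts.add(frozenset(apply_subst(lit, theta) for lit in clause))
    return insts
```

For C = P(x) ∨ Q(y) and T = {a, f(z)} this yields exactly the four instances listed above.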

Definition 16 Let S be a finite set of clauses, and σ be a Skolem substitution for S. Then the term set of S by σ is the set of all terms (including subterms) occurring in Sσ. □

A term set of S by some σ is a finite set of ground terms. For instance, the term set of D = P(f²(x),y,z) ← P(y,z,f²(x)) by σ = {x/a, y/b, z/c} is T = {a, f(a), f²(a), b, c}.
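Collecting the term set of Definition 16 is a straightforward recursion over the ground literals of Sσ. A minimal sketch of ours, under the same assumed representation (compound terms as tuples ('f', arg, …), literal signs folded into the predicate symbol):

```python
def term_set(ground_literals):
    """All terms, including subterms, occurring in a set of ground
    literals (Definition 16): the term set of S by a Skolem
    substitution sigma is term_set applied to S*sigma."""
    terms = set()
    def collect(t):
        terms.add(t)
        if isinstance(t, tuple):          # compound term ('f', args...)
            for arg in t[1:]:
                collect(arg)
    for lit in ground_literals:
        for arg in lit[1:]:               # skip the predicate symbol
            collect(arg)
    return terms
```

Applied to Dσ from the example above it returns {a, b, c, f(a), f²(a)}.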

Our definition of a term set corresponds to what Idestam-Almquist (1993, 1995) calls a 'minimal term set'. In his definition, if σ is a Skolem substitution for a set of clauses S = {D1,…,Dn} w.r.t. some other set of clauses S′, then a term set of S is a finite set of terms which contains the minimal term set of S by σ as a subset.

Using his notion of term set, he defines T-implication as follows: if C and D are clauses and T is a term set of {D} by some Skolem substitution σ w.r.t. {C}, then C T-implies D w.r.t. T if I(C,T) ⊨ Dσ. T-implication is decidable, weaker than logical implication and stronger than subsumption. Idestam-Almquist (1993, 1995) gives the result that any finite set of clauses has a least generalization under T-implication w.r.t. any term set T. However, as he also notes, T-implication is not transitive and hence not a quasi-order, so it does not fit into our general framework. For this reason we will not discuss it fully here, and for the same reason we have not included a row for T-implication in Table 1.

Let us now begin with the proof of our result concerning the existence of LGI's. Consider C = P(x,y,z) ← P(z,x,y) and D, σ and T as above. Then C ⊨ D and also I(C,T) ⊨ Dσ, since Dσ is a resolvent of P(f²(a),b,c) ← P(c,f²(a),b) and P(c,f²(a),b) ← P(b,c,f²(a)), which are in I(C,T). As we will show in the next lemma, this holds in general: if C ⊨ D and C is function-free, then we can restrict attention to the ground instances of C instantiated to terms in the term set of D by some σ.

The proof of Lemma 3 uses the following idea. Consider a derivation of a clause E from a set Σ of ground clauses. Suppose some of the clauses in Σ contain terms not appearing in E. Then any literals containing these terms must be resolved away in the derivation. This means that if we replace all the terms in the derivation that are not in E by some other term t, the result will be another derivation of E. For example, the left of Figure 2 shows a derivation of length 1 of E. The term f²(b) in the parent clauses does not appear in E. If we replace this term by the constant a, the result is another derivation of E (right of the figure).

Figure 2: Transforming the left derivation yields the right derivation. Left: P(b) ← P(f²(b)) and P(f²(b)) ← Q(a,f(a)) resolve to E = P(b) ← Q(a,f(a)). Right: P(b) ← P(a) and P(a) ← Q(a,f(a)) resolve to the same E.

Lemma 3 Let C be a function-free clause, D be a clause, σ be a Skolem substitution for D w.r.t. {C}, and T be the term set of D by σ. Then C ⊨ D iff I(C,T) ⊨ Dσ.

Proof
(⇐): Since C ⊨ I(C,T) and I(C,T) ⊨ Dσ, we have C ⊨ Dσ. Now C ⊨ D by Lemma 2.
(⇒): If D is a tautology, then Dσ is a tautology, so this case is obvious. Suppose D is not a tautology; then Dσ is not a tautology. Since C ⊨ Dσ, it follows from Theorem 1 that there exists a finite set Σ of ground instances of C such that Σ ⊨ Dσ. By the Subsumption Theorem, there exists a derivation from Σ of a clause E such that E ≥ Dσ. Since Σ is ground, E must also be ground, so we have E ⊆ Dσ. This implies that E only contains terms from T.

Let t be an arbitrary term in T, and let Σ′ be obtained from Σ by replacing every term in clauses in Σ which is not in T by t. Note that since each clause in Σ is a ground instance of the function-free clause C, every clause in Σ′ is also a ground instance of C. Now it is easy to see that the same replacement of terms in the derivation of E from Σ results in a derivation of E from Σ′: (1) each resolution step in the derivation from Σ can also be carried out in the derivation from Σ′, since the same terms in Σ are replaced by the same terms in Σ′, and (2) the terms in Σ that are not in T (and hence are replaced by t) do not appear in the conclusion E of the derivation.

Since there is a derivation of E from Σ′, we have Σ′ ⊨ E, and hence Σ′ ⊨ Dσ. Σ′ is a set of ground instances of C and all terms in Σ′ are terms in T, so Σ′ ⊆ I(C,T). Hence I(C,T) ⊨ Dσ. □

Lemma 3 cannot be generalized to the case where C contains function symbols of arity ≥ 1: take C = P(f(x),y) ← P(z,x) and D = P(f(a),a) ← P(a,f(a)) (from the example given on p. 25 of Idestam-Almquist, 1993). Then T = {a, f(a)} is the term set of D, and we have C ⊨ D, yet it can be seen that I(C,T) ⊭ D. The argument used in the previous lemma does not work here, because different terms in some ground instance need not relate to different variables. For example, in the ground instance P(f²(a),a) ← P(a,f(a)) of C, we cannot just replace f²(a) by some other term, for then the resulting clause would not be an instance of C.

On the other hand, Lemma 3 can be generalized to a set of clauses instead of a single clause. If Σ is a set of function-free clauses, C is an arbitrary clause, and σ is a Skolem substitution for C w.r.t. Σ, then we have that Σ ⊨ C iff I(Σ,T) ⊨ Cσ. The proof is almost literally the same as above.

This result implies that Σ ⊨ C is reducible to an implication I(Σ,T) ⊨ Cσ between ground clauses. Since, by the next lemma, implication between ground clauses is decidable, it follows that Σ ⊨ C is decidable in case Σ is function-free.

Lemma 4 The problem whether Σ ⊨ C, where Σ is a finite set of ground clauses and C is a ground clause, is decidable.

Proof Let C = L1 ∨ … ∨ Ln, and let A be the set of all ground atoms occurring in Σ and C. Now consider the following statements, which can be shown equivalent. (1) Σ ⊨ C. (2) Σ ∪ {¬L1,…,¬Ln} is unsatisfiable. (3) Σ ∪ {¬L1,…,¬Ln} has no Herbrand model. (4) No subset of A is a Herbrand model of Σ ∪ {¬L1,…,¬Ln}.

Then (1)⇔(2). (2)⇔(3) by Theorem 4.2 of (Chang & Lee, 1973). Since also (3)⇔(4), we have (1)⇔(4). (4) is decidable because A is finite, so (1) is decidable as well. □
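The equivalence (1)⇔(4) in the proof of Lemma 4 immediately yields a (very naive) decision procedure: enumerate all subsets of A and check whether any of them is a Herbrand model. A brute-force Python sketch of this idea, ours rather than the paper's; a ground literal is assumed to be a (sign, atom) pair with sign +1 or -1, and a clause a collection of such pairs.

```python
from itertools import combinations

def ground_implies(sigma, c):
    """Decide Sigma |= C for ground Sigma and ground C via statement
    (4) of Lemma 4: no subset of the atoms A is a Herbrand model of
    Sigma together with the negations of C's literals."""
    neg_c = [(-s, a) for (s, a) in c]
    clauses = [list(cl) for cl in sigma] + [[l] for l in neg_c]
    atoms = sorted({a for cl in clauses for (_, a) in cl})
    for r in range(len(atoms) + 1):
        for true_atoms in combinations(atoms, r):
            ta = set(true_atoms)
            # a clause is satisfied iff some literal is true in ta
            if all(any((s == 1) == (a in ta) for (s, a) in cl)
                   for cl in clauses):
                return False      # found a Herbrand model: Sigma |/= C
    return True
```

The procedure is exponential in |A|, but that is all Lemma 4 needs: decidability, not efficiency.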

Corollary 1 The problem whether Σ ⊨ C, where Σ is a finite set of function-free clauses and C is a clause, is decidable.

The following sequence of lemmas more or less follows the pattern of Idestam-Almquist's (1995) Lemma 10 to Lemma 12 (similar to Lemma 3.10 to Lemma 3.12 of Idestam-Almquist, 1993). There he gives a proof of the existence of a least generalization under T-implication of any finite set of (not necessarily function-free) clauses. We adjust the proof in such a way that we can use it to establish the existence of an LGI of any finite set of clauses containing at least one non-tautologous function-free clause.

Lemma 5 Let S be a finite set of non-tautologous clauses, V = {x1,…,xm} be a set of variables, and let G = {C1, C2, …} be a (possibly infinite) set of generalizations of S under implication. Then the set G′ = I(C1,V) ∪ I(C2,V) ∪ … is a finite set of clauses.

Proof Let d be the maximal depth of the terms in clauses in S. It follows from Lemma 1 that G (and hence also G′) cannot contain terms of depth greater than d, nor predicates, functions or constants other than those in S. The set of literals which can be constructed from predicates in S and from terms of depth at most d consisting of functions and constants in S and variables in V is finite. Hence the set of clauses which can be constructed from those literals is finite as well. G′ is a subset of this set, so G′ is a finite set of clauses. □

Lemma 6 Let D be a clause, C be a function-free clause such that C ⊨ D, T = {t1,…,tn} be the term set of D by some Skolem substitution σ, V = {x1,…,xm} be a set of variables, and m ≥ n. If E is an LGS of I(C,V), then E ⊨ D.

Proof Let γ = {x1/t1,…,xn/tn, xn+1/tn,…,xm/tn} (it does not matter to which terms the variables xn+1,…,xm are mapped by γ, as long as they are mapped to terms in T). Suppose I(C,V) = {Cρ1,…,Cρk}. Then I(C,T) = {Cρ1γ,…,Cρkγ}. Let E be an LGS of I(C,V) (note that E must be function-free). Then for every 1 ≤ i ≤ k, there is a θi such that Eθi ⊆ Cρi. This means that Eθiγ ⊆ Cρiγ and hence E ⊨ Cρiγ, for every 1 ≤ i ≤ k. Therefore E ⊨ I(C,T).

Since C ⊨ D, we know from Lemma 1 that constants appearing in C must also appear in D. This means that σ is a Skolem substitution for D w.r.t. {C}. Then from Lemma 3 we know I(C,T) ⊨ Dσ, hence E ⊨ Dσ. Furthermore, since E is an LGS of I(C,V), all constants in E also appear in C, hence all constants in E must appear in D. Thus σ is also a Skolem substitution for D w.r.t. {E}. Therefore E ⊨ D by Lemma 2. □

Consider C = P(x,y,z) ← P(y,z,x) and D = Q(w). Both C and D imply the clause E = P(x,y,z) ∨ Q(b) ← P(z,x,y). Now note that C ∪ D = P(x,y,z) ∨ Q(w) ← P(y,z,x) also implies E. This holds for clauses in general, even in the presence of background knowledge Σ. The next lemma is very general, but in this section we only need the special case where C and D are function-free and Σ is empty. We need the general case to prove the existence of a GSR in Section 8.

Lemma 7 Let C, D and E be clauses such that C and D are standardized apart, and let Σ be a set of clauses. If C ⊨Σ E and D ⊨Σ E, then C ∪ D ⊨Σ E.

Proof Suppose C ⊨Σ E and D ⊨Σ E, and let M be a model of Σ ∪ {C ∪ D}. Since C and D are standardized apart, the clause C ∪ D is equivalent to the formula ∀(C) ∨ ∀(D) (where ∀(C) denotes the universally quantified clause C). This means that M is a model of C or a model of D. Furthermore, M is also a model of Σ, so it follows from Σ ∪ {C} ⊨ E or Σ ∪ {D} ⊨ E that M is a model of E. Thus Σ ∪ {C ∪ D} ⊨ E, hence C ∪ D ⊨Σ E. □

Now we can prove the existence of an LGI of any finite set S of clauses which contains at least one non-tautologous function-free clause. In fact, we can prove something stronger, namely that this LGI is a special LGI: an LGI that is not only implied, but actually subsumed, by any other generalization of S:

Definition 17 Let C be a clausal language and S be a finite subset of C. An LGI C of S in C is called a special LGI of S in C if C′ ≥ C for every generalization C′ ∈ C of S under implication. □

Note that if D is an LGI of a set containing at least one non-tautologous function-free clause, then by Lemma 1 D is itself function-free, because it should imply the function-free clause(s) in S. For instance, C = P(x,y,z) ∨ Q(w) ← P(y,z,x) is an LGI of D1 = P(x,y,z) ∨ Q(f(a)) ← P(y,z,x) and D2 = P(x,y,z) ∨ Q(b) ← P(z,x,y). Note that this LGI is properly subsumed by the LGS of {D1,D2}, which is P(x,y,z) ∨ Q(w) ← P(x′,y′,z′). An LGI may sometimes be the empty clause □, for example if S = {P(a), Q(a)}.

Theorem 6 (Existence of special LGI in C) Let C be a clausal language. If S is a finite set of clauses from C and S contains at least one non-tautologous function-free clause, then there exists a special LGI of S in C.

Proof Let S = {D1,…,Dn} be a finite set of clauses from C, such that S contains at least one non-tautologous function-free clause. We can assume without loss of generality that S contains no tautologies. Let σ be a Skolem substitution for S, T = {t1,…,tm} be the term set of S by σ, V = {x1,…,xm} be a set of variables, and G = {C1, C2, …} be the set of all generalizations of S under implication in C. Note that □ ∈ G, so G is not empty. Since each clause in G must imply the function-free clause(s) in S, it follows from Lemma 1 that all members of G are function-free. By Lemma 5, the set G′ = I(C1,V) ∪ I(C2,V) ∪ … is a finite set of clauses. Since G′ is finite, the set of I(Ci,V)'s is also finite. For simplicity, let {I(C1,V),…,I(Ck,V)} be the set of all distinct I(Ci,V)'s.

Let Ei be an LGS of I(Ci,V), for every 1 ≤ i ≤ k, such that E1,…,Ek are standardized apart. For every 1 ≤ j ≤ n, the term set of Dj by σ is some set {tj1,…,tjs} ⊆ T, with s ≤ m. From Lemma 6, we have that Ei ⊨ Dj for every 1 ≤ i ≤ k and 1 ≤ j ≤ n, hence Ei ⊨ S. Now let F = E1 ∪ … ∪ Ek; then we have F ⊨ S from Lemma 7 (applying the case of Lemma 7 where Σ is empty).

To prove that F is a special LGI of S, it remains to show that Cj ≥ F for every j ≥ 1. For every j ≥ 1, there is an i (1 ≤ i ≤ k) such that I(Cj,V) = I(Ci,V). So for this i, Ei is an LGS of I(Cj,V). Cj is itself also a generalization of I(Cj,V) under subsumption, hence Cj ≥ Ei. Then finally Cj ≥ F, since Ei ⊆ F. □

As a consequence, we also immediately have the following:

Corollary 2 (Existence of LGI for function-free clauses) Let C be a clausal language. Then for every finite set of function-free clauses S ⊆ C, there exists an LGI of S in C.

Proof Let S be a finite set of function-free clauses in C. If S only contains tautologies, any tautology will be an LGI of S. Otherwise, let S′ be obtained by deleting all tautologies from S. By the previous theorem, there is a special LGI of S′. Clearly, this is also a special LGI of S itself in C. □

This corollary is not trivial: even though the number of Herbrand interpretations of a language without function symbols is finite (due to the fact that the number of all possible ground atoms is finite in this case), S may nevertheless be implied by an infinite number of non-equivalent clauses. This may seem like a paradox, since there are only finitely many categories of clauses that can "behave differently" in a finite number of finite Herbrand interpretations. Thus it would seem that the number of non-equivalent function-free clauses should also be finite. This is a misunderstanding, since logical implication (and hence also logical equivalence) is defined in terms of all interpretations, not just Herbrand interpretations. For instance, define D1 = P(a,a), D2 = P(b,b), and Cn = {P(xi,xj) | i ≠ j, 1 ≤ i,j ≤ n}. Then we have Cn ⊨ {D1,D2}, Cn ⊨ Cn+1 and Cn+1 ⊭ Cn, for every n ≥ 1; see (van der Laag & Nienhuys-Cheng, 1994).

Another interesting consequence of Theorem 6 concerns self-saturation (see the introduction to this paper for the definition of self-saturation). If C is a special LGI of some set S, then it is clear that C is self-saturated: any clause which implies C also implies S and hence must subsume C, since C is a special LGI of S. Now consider S = {D}, where D is some non-tautologous function-free clause. Then a special LGI C of S will be logically equivalent to D. Moreover, since this C will be self-saturated, it is a self-saturation of D.

Corollary 3 If D is a non-tautologous function-free clause, then there exists
a self-saturation of D.

5.2 The LGI is Computable

In the previous subsection we proved the existence of an LGI in C of every finite set S of clauses containing at least one non-tautologous function-free clause. In this subsection we establish the computability of such an LGI. The next algorithm, extracted from the proof of the previous subsection, computes this LGI of S.

LGI-Algorithm
Input: a finite set S of clauses, containing at least one non-tautologous function-free clause.
Output: an LGI of S in C.

1. Remove all tautologies from S (a clause is a tautology iff it contains literals A and ¬A); call the remaining set S′.
2. Let m be the number of distinct terms (including subterms) in S′, and let V = {x1,…,xm}. (Notice that this m is the same number as the number of terms in the term set T used in the proof of Theorem 6.)
3. Let G be the (finite) set of all clauses which can be constructed from predicates and constants in S′ and variables in V.
4. Let {U1,…,Un} be the set of all subsets of G.
5. Let Hi be an LGS of Ui, for every 1 ≤ i ≤ n. These Hi can be computed by Plotkin's (1970) algorithm.
6. Remove from {H1,…,Hn} all clauses which do not imply S′ (since each Hi is function-free, by Corollary 1 this implication is decidable), and standardize the remaining clauses {H1,…,Hq} apart.
7. Return the clause H = H1 ∪ … ∪ Hq.

The correctness of this algorithm follows from the proof of Theorem 6. First notice that H ⊨ S by Lemma 7. Furthermore, note that all I(Ci,V)'s mentioned in the proof of Theorem 6 are elements of the set {U1,…,Un}. This means that for every Ei in the set {E1,…,Ek} mentioned in that proof, there is a clause Hj in {H1,…,Hq} such that Ei and Hj are subsume-equivalent. It then follows that the LGI F = E1 ∪ … ∪ Ek of that proof subsumes the clause H = H1 ∪ … ∪ Hq that our algorithm returns. On the other hand, F is a special LGI, so F and H must be subsume-equivalent.

Suppose the number of distinct constants in S′ is c and the number of distinct variables in step 2 of the algorithm is m. Furthermore, suppose there are p distinct predicate symbols in S′, with respective arities a1,…,ap. Then the number of distinct atoms that can be formed from these constants, variables and predicates is l = (c+m)^{a_1} + … + (c+m)^{a_p}, and the number of distinct literals that can be formed is 2·l. The set G of distinct clauses which can be formed from these literals is the power set of this set of literals, so |G| = 2^(2l). Then the set {U1,…,Un} of all subsets of G contains 2^|G| = 2^(2^(2l)) members.

Thus the algorithm outlined above is not very efficient (to say the least).
A more efficient algorithm may exist, but since implication is harder than
subsumption and the computation of an LGS is already quite expensive, we
should not put our hopes too high. Nevertheless, the existence of the LGI-algorithm
does establish the theoretical point that the LGI of any finite set of clauses
containing at least one non-tautologous function-free clause is effectively
computable.
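The size estimates above are easy to check mechanically. The following small computation (ours, not the paper's) reproduces them; even the trivial signature with one constant, one variable and one unary predicate already gives 2^16 = 65536 candidate subsets:

```python
def search_space_sizes(c, m, arities):
    """Sizes from the counting argument above: l distinct atoms,
    2*l literals, |G| = 2**(2*l) clauses, and 2**|G| subsets of G."""
    l = sum((c + m) ** a for a in arities)
    literals = 2 * l
    num_clauses = 2 ** literals        # |G|
    num_subsets = 2 ** num_clauses     # |{U_1,...,U_n}|
    return l, literals, num_clauses, num_subsets
```

For c = 1, m = 1 and a single unary predicate this returns (2, 4, 16, 65536), making the doubly exponential blow-up concrete.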

Theorem 7 (Computability of LGI) Let C be a clausal language. If S is a finite
set of clauses from C, and S contains at least one non-tautologous function-free
clause, then the LGI of S in C is computable.

6. Greatest Specializations under Implication

Now we turn from least generalizations under implication to greatest specializations. Finding least generalizations of sets of clauses is common practice in ILP. On the other hand, the greatest specialization, which is the dual of the least generalization, is hardly ever used. Nevertheless, the GSI of two clauses D1 and D2 might be useful. Suppose we have one positive example e+ and two negative examples e1− and e2−, and suppose that D1 implies e+ and e1−, while D2 implies e+ and e2−. Then it might very well be that the GSI of D1 and D2 still implies e+, but implies neither e1− nor e2−. Thus we could obtain a correct specialization by taking the GSI of D1 and D2.

It is obvious from the previous sections that the existence of an LGI of
S is quite hard to establish. For clauses which all contain functions, the
existence of an LGI is still an open question, and even for the case where
S contains at least one non-tautologous function-free clause, the proof was
far from trivial. However, the existence of a GSI in C is much easier to
prove. In fact, a GSI of a finite set S is the same as the GSS of S, namely
the union of the clauses in S after these are standardized apart.

To see the reason for this asymmetry, let us take a step back from the clausal framework and consider full first-order logic for a moment. If φ1 and φ2 are two arbitrary first-order formulas, then it can easily be shown that their least generalization is just φ1 ∧ φ2: this conjunction implies φ1 and φ2, and must be implied by any other formula which implies both φ1 and φ2. Dually, the greatest specialization is just φ1 ∨ φ2: this is implied by both φ1 and φ2, and must imply any other formula that is implied by both φ1 and φ2. See Figure 3.

Figure 3: Least generalization and greatest specialization in first-order logic (φ1 ∧ φ2 implies each of φ1 and φ2, and each of φ1 and φ2 implies φ1 ∨ φ2)

Now suppose φ1 and φ2 are clauses. Then why do we have a problem in finding the LGI of φ1 and φ2? The reason is that φ1 ∧ φ2 is not a clause. Instead of using φ1 ∧ φ2, we have to find some least clause which implies both φ1 and φ2. Such a clause sometimes appears quite hard to find.

On the other hand, in the case of specialization there is no problem. Here we can take φ1 ∨ φ2 as GSI, since φ1 ∨ φ2 is equivalent to a clause if we handle the universal quantifiers in front of a clause properly. If φ1 and φ2 are standardized apart, then the formula φ1 ∨ φ2 is equivalent to the clause which is the union of φ1 and φ2. This fact was used in the proof of Lemma 7.

Suppose S = {D1,…,Dn}, and D1′,…,Dn′ are variants of these clauses which are standardized apart. Then clearly D = D1′ ∪ … ∪ Dn′ is a GSI of S, since it follows from Lemma 7 that any specialization of S under implication is implied by D. Thus we have the following result:

Theorem 8 (Existence of GSI in C) Let C be a clausal language. Then for every finite S ⊆ C, there exists a GSI of S in C.
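The GSI construction of Theorem 8 is simple enough to state in code: rename the variables of each clause apart, then take the union. A minimal sketch of ours, under the same assumed representation of literals as (sign, atom) pairs with variables as strings starting with '?':

```python
def standardize_apart(clauses):
    """Rename variables so that no two clauses share a variable,
    by suffixing each variable with the index of its clause."""
    out = []
    for i, cl in enumerate(clauses):
        def rename(t):
            if isinstance(t, str):
                return t + '_' + str(i) if t.startswith('?') else t
            return (t[0],) + tuple(rename(a) for a in t[1:])
        out.append(frozenset((sign, rename(atom)) for (sign, atom) in cl))
    return out

def gsi(clauses):
    """GSI of a finite list of clauses: the union of variants that
    are standardized apart (Theorem 8)."""
    result = set()
    for cl in standardize_apart(clauses):
        result |= cl
    return frozenset(result)
```

Standardizing apart matters: without the renaming, the union of P(x) and ¬P(x) would collapse into a tautology rather than the GSI P(x) ∨ ¬P(x′).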

The previous theorem holds for clauses in general, so in particular also
for function-free clauses. Furthermore, Corollary 2 guarantees us that in
a function-free clausal language an LGI of every finite S exists. This means
that the set of function-free clauses quasi-ordered by logical implication
is in fact a lattice.

Corollary 4 (Lattice structure of function-free clauses under ⊨) A function-free clausal language ordered by implication is a lattice.

In case of a Horn language H, we cannot apply the same proof method as in
the case of a clausal language, since the union of two Horn clauses need
not be a Horn clause itself. In fact, we can show that not every finite set
of Horn clauses has a GSI in H. Here we can use the same clauses that we
used to show that sets of Horn clauses need not have an LGI in H, this time
from the perspective of specialization instead of generalization.

Again, let D1 = P(f²(x)) ← P(x), D2 = P(f³(x)) ← P(x), C1 = P(f(x)) ← P(x) and C2 = P(f²(y)) ← P(x). Then C1 ⊨ {D1,D2} and C2 ⊨ {D1,D2}, and there is no Horn clause D such that D ⊨ D1, D ⊨ D2, C1 ⊨ D and C2 ⊨ D. Hence there is no GSI of {C1,C2} in H.

7. Least Generalizations under Relative Implication

Implication is stronger than subsumption, but relative implication is even more powerful, because background knowledge can be used to model all sorts of useful properties and relations. In this section, we discuss least generalizations under implication relative to some given background knowledge Σ (LGR's). In the next section we treat greatest specializations under relative implication.

First, we prove the equivalence between our definition of relative implication and a definition given by Niblett (1988, p. 133). He gives the following definition of subsumption relative to background knowledge Σ (to distinguish it from our notion of subsumption, we will call this 'N-subsumption'):9

Definition 18 Clause C N-subsumes clause D with respect to background knowledge Σ if there is a substitution θ such that Σ ⊢ (Cθ → D) (here '→' is the implication connective, and '⊢' is an arbitrary complete proof procedure). □

Proposition 1 Let C and D be clauses and Σ be a set of clauses. Then C N-subsumes D with respect to Σ iff C ⊨Σ D.

Proof Consider the following six statements, which can be shown equivalent. (1) C N-subsumes D with respect to Σ. (2) There is a substitution θ such that Σ ⊢ (Cθ → D). (3) There is a substitution θ such that Σ ⊨ (Cθ → D). (4) There is a substitution θ such that Σ ∪ {Cθ} ⊨ D. (5) Σ ∪ {C} ⊨ D. (6) C ⊨Σ D. (1)⇔(2) by definition. (2)⇔(3) by the completeness of ⊢. (3)⇔(4) by the Deduction Theorem. (4)⇒(5) is obvious and (5)⇒(4) follows from letting θ be the empty substitution, hence (4)⇔(5). Finally, (5)⇔(6) by definition. Thus these six statements are equivalent. □

9. Niblett attributes this definition to Plotkin, though Plotkin gives a rather different definition of relative subsumption in (Plotkin, 1971b), as we have seen in Section 4.

Since ⊨ is the special case of ⊨Σ where Σ is empty, our counterexamples to the existence of LGI's or GSI's in H are also counterexamples to the existence of LGR's or GSR's in H. In other words, the '−'-entries in the second row of Table 1 carry over to the third row.

For general clauses, the LGR-question also has a negative answer. We will show here that even if S and Σ are both finite sets of function-free clauses, an LGR of S relative to Σ need not exist. Let D1 = P(a), D2 = P(b), S = {D1,D2}, and Σ = {(P(a) ∨ ¬Q(x)), (P(b) ∨ ¬Q(x))}. We will show that this S has no LGR relative to Σ in C.

Suppose C is an LGR of S relative to Σ. Note that if C contains the literal P(a), then the Herbrand interpretation which makes P(a) true and all other ground literals false would be a model of Σ ∪ {C} but not of D2, so then we would have C ⊭Σ D2. Similarly, if C contains P(b), then C ⊭Σ D1. Hence C cannot contain P(a) or P(b).

Now let d be a constant not appearing in C. Let D = P(x) ∨ Q(d); then D ⊨Σ S. By the definition of an LGR, we should have D ⊨Σ C. Then by the Subsumption Theorem, there must be a derivation from Σ ∪ {D} of a clause E which subsumes C. The set of all clauses which can be derived (in 0 or more resolution steps) from Σ ∪ {D} is Σ ∪ {D} ∪ {(P(a) ∨ P(x)), (P(b) ∨ P(x))}. But none of these clauses subsumes C, because C does not contain the constant d, nor the literals P(a) or P(b). Hence D ⊭Σ C, contradicting the assumption that C is an LGR of S relative to Σ in C.

Thus in general the LGR of S relative to Σ need not exist. However,
we can identify a special case in which the LGR does exist. This case might
be of practical interest. Suppose Σ = {L1, …, Lm} is a finite
set of function-free ground literals. We can assume Σ does not contain
complementary literals (i.e., both A and ¬A), for otherwise Σ would be inconsistent.
Also, suppose S = {D1, …, Dn} is a set of clauses, at least one of which
is non-tautologous and function-free. Then C ⊨_Σ Di iff {C} ∪ Σ
⊨ Di iff C ⊨ Di ∨ ¬(L1 ∧ … ∧ Lm) iff C ⊨ Di ∨ ¬L1 ∨ … ∨ ¬Lm. This
means that an LGI of the set of clauses {(D1 ∨ ¬L1 ∨ … ∨ ¬Lm), …,
(Dn ∨ ¬L1 ∨ … ∨ ¬Lm)} is also an LGR of S relative to Σ. If some
Dk ∨ ¬L1 ∨ … ∨ ¬Lm is non-tautologous and function-free, this LGI exists
and is computable. Hence in this special case, the LGR of S relative to Σ
exists and is computable.
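The reduction in this special case is purely syntactic: add the negations of Σ's literals to every clause of S, then take an LGI of the result. A minimal sketch, using an assumed tuple encoding of literals that is our own, not the paper's:

```python
# A literal is (sign, predicate, args); a clause is a frozenset of literals.
def negate(lit):
    sign, pred, args = lit
    return (not sign, pred, args)

def lgr_to_lgi_input(S, sigma):
    """Given clauses S and a finite set sigma of function-free ground
    literals (no complementary pair), return the clauses
    D_i ∨ ¬L1 ∨ ... ∨ ¬Lm, whose LGI is an LGR of S relative to sigma."""
    negated = frozenset(negate(lit) for lit in sigma)
    return [frozenset(d) | negated for d in S]

# Example: S = {P(a), P(b)} and sigma = {Q(c)} yield the two clauses
# P(a) ∨ ¬Q(c) and P(b) ∨ ¬Q(c).
P = lambda t: (True, 'P', (t,))
Q = lambda t: (True, 'Q', (t,))
transformed = lgr_to_lgi_input([frozenset({P('a')}), frozenset({P('b')})],
                               {Q('c')})
print(transformed)
```

Any LGI procedure for the transformed clauses then directly yields the LGR of S relative to Σ.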

8. Greatest Specializations under Relative Implication

Since the counterexample to the existence of GSI's in ℋ is also a counterexample
to the existence of GSR's in ℋ, the only remaining question in the ⊨_Σ-order
is the existence of GSR's in 𝒞. The answer to this question is positive. In fact,
like the GSS and the GSI, the GSR of some finite set S in 𝒞 is just the union
of the (standardized apart) clauses in S.

Theorem 9 (Existence of GSR in 𝒞) Let 𝒞 be a clausal language and Σ ⊆ 𝒞.
Then for every finite S ⊆ 𝒞, there exists a GSR of S relative to Σ in 𝒞.


Proof Suppose S = {D1, …, Dn} ⊆ 𝒞. Without loss of generality, we assume
the clauses in S are standardized apart. Let D = D1 ∪ … ∪ Dn; then
Di ⊨_Σ D, for every 1 ≤ i ≤ n. Now let C ∈ 𝒞 be such that Di ⊨_Σ C,
for every 1 ≤ i ≤ n. Then from Lemma 7, we have D ⊨_Σ C. Hence D is a
GSR of S relative to Σ in 𝒞. □
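The construction in this proof is directly computable: rename the variables of the clauses apart, then take the union. A small sketch under an assumed tuple encoding of clauses (our own illustration, not the paper's):

```python
# A clause is a frozenset of (sign, predicate, args) literals; terms are
# tagged as ('var', name) or ('const', name).
def standardize_apart(clauses):
    """Rename variables so that no two clauses share a variable,
    by suffixing each variable name with its clause's index."""
    renamed = []
    for i, clause in enumerate(clauses):
        renamed.append(frozenset(
            (sign, pred, tuple(('var', f"{name}_{i}") if kind == 'var'
                               else (kind, name)
                               for (kind, name) in args))
            for (sign, pred, args) in clause))
    return renamed

def gsr(clauses):
    """GSR of a finite S relative to any Sigma, per Theorem 9:
    the union of the standardized-apart clauses of S."""
    return frozenset().union(*standardize_apart(clauses))

D1 = frozenset({(True, 'P', (('var', 'x'),))})    # P(x)
D2 = frozenset({(False, 'Q', (('var', 'x'),))})   # ¬Q(x)
print(gsr([D1, D2]))   # P(x_0) ∨ ¬Q(x_1), as one frozenset of literals
```

Note that standardizing apart is essential: without it, the shared variable x would incorrectly link the two clauses in the union.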

9. Conclusion

In ILP, the three main generality orders are subsumption, implication,
and relative implication. The two main languages are clausal languages and
Horn languages. This gives a total of six different ordered sets. In this
paper, we have given a systematic treatment of the existence or non-existence
of least generalizations and greatest specializations in each of these six
ordered sets. The outcome of this investigation is summarized in Table 1.
The only remaining open question is the existence or non-existence of a least
generalization under implication in 𝒞 for sets of clauses which all contain
function symbols.

Table 1 makes explicit the trade-off between different generality orders.
On the one hand, implication is better suited as a generality order than
subsumption, particularly in the case of recursive clauses. Relative implication
is better still, because it allows us to take background knowledge into account.
On the other hand, we can see from the table that as far as the existence
of least generalizations goes, subsumption is more attractive than logical
implication, and logical implication is in turn more attractive than relative
implication. For subsumption, least generalizations always exist. For logical
implication, we can only prove the existence of least generalizations in
the presence of a function-free clause. And finally, for relative implication,
least generalizations need not even exist in a function-free language. In
practice this means that we cannot have it all. If we choose to use a very
strong generality order such as relative implication, least generalizations
only exist in very limited cases. On the other hand, if we want to guarantee
that least generalizations always exist, we are committed to the weakest
generality order: subsumption.

Acknowledgements

We would like to thank Peter Idestam-Almquist and the referees
for their comments, which helped to improve the paper.
