S. H. Nienhuys-Cheng

Volume 4, 1996

**Links to Full Text:**

The main operations in Inductive Logic Programming (ILP) are generalization and specialization, which only make sense in a generality order. In ILP, the three most important generality orders are subsumption, implication and implication relative to background knowledge. The two languages used most often are languages of clauses and languages of only Horn clauses. This gives a total of six different ordered languages. In this paper, we give a systematic treatment of the existence or non-existence of least generalizations and greatest specializations of finite sets of clauses in each of these six ordered sets. We survey results already obtained by others and also contribute some answers of our own. Our main new results are, firstly, the existence of a computable least generalization under implication of every finite set of clauses containing at least one non-tautologous function-free clause (among other, not necessarily function-free clauses). Secondly, we show that such a least generalization need not exist under relative implication, not even if both the set that is to be generalized and the background knowledge are function-free. Thirdly, we give a complete discussion of existence and non-existence of greatest specializations in each of the six ordered languages.

Journal of Artificial Intelligence Research 4 (1996) 341-363 Submitted 11/95; published 5/96 Least Generalizations and Greatest Specializations of Sets of Clauses Shan-Hwei Nienhuys-Cheng cheng@cs.few.eur.nl Ronald de Wolf bidewolf@cs.few.eur.nl Erasmus University of Rotterdam Department of Computer Science, H4-19 P.O. Box 1738, 3000 DR Rotterdam, the Netherlands Abstract The main operations in Inductive Logic Programming (ILP) are generalization and specialization, which only make sense in a generality order. In ILP, the three most important generality orders are subsumption, implication and implication relative to background knowledge. The two languages used most often are languages of clauses and languages of only Horn clauses. This gives a total of six different ordered languages. In this paper, we give a systematic treatment of the existence or non-existence of least generalizations and greatest specializations of finite sets of clauses in each of these six ordered sets. We survey results already obtained by others and also contribute some answers of our own. Our main new results are, firstly, the existence of a computable least generalization under implication of every finite set of clauses containing at least one non-tautologous function-free clause (among other, not necessarily function-free clauses). Secondly, we show that such a least generalization need not exist under relative implication, not even if both the set that is to be generalized and the background knowledge are function-free. Thirdly, we give a complete discussion of existence and non-existence of greatest specializations in each of the six ordered languages. 1. Introduction Inductive Logic Programming (ILP) is a subfield of Logic Programming and Machine Learning that tries to induce clausal theories from given sets of positive and negative examples. An inductively inferred theory should imply all of the positive and none of the negative examples. For instance, suppose we are given P (0), P (s2(0)), P (s4(0)), P (s6(0)) as positive examples and P (s(0)); P (s3(0)); P (s5(0)) as negative examples.1 Then the set Sigma = fP (0); (P (s2(x)) P (x))g is a solution: it implies all positive and no negative examples. Note that this set can be seen as a description of the even integers, learned from these examples. Thus induction of clausal theories is a form of learning from examples. For a more extensive introduction to ILP, we refer to (Lavra^c & D^zeroski, 1994; Muggleton & De Raedt, 1994). Learning from examples means modifying a theory to bring it more in accordance with the examples. The two main operations in ILP for modification of a theory are generalization and specialization. Generalization strengthens a theory that is too weak, while specialization weakens a theory that is too strong. These operations only make sense within a generality order. This is a relation stating when some clause is more general than some other clause. 1. Here s 2(0) abbreviates s(s(0)), s3(0) abbreviates s(s(s(0))), etc. cfl1996 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved. Nienhuys-Cheng & de Wolf The three most important generality orders used in ILP are subsumption (also called `-subsumption), logical implication and implication relative to background knowledge.2 In the subsumption order, we say that clause C is more general than D--or, equivalently, D is more specific than C--in case C subsumes D. In the implication order C is more general than D if C logically implies D. Finally, C is more general than D relative to background knowledge Sigma (Sigma is a set of clauses), if fCg [ Sigma logically implies D. Of these three orders, subsumption is the most tractable. In particular, subsumption is decidable, whereas logical implication is not decidable, not even for Horn clauses, as established by Marcinkowski and Pacholski (1992). In turn, relative implication is harder than implication: both are undecidable, but proof procedures for implication need to take only derivations from fCg into account, whereas a proof procedure for relative implication should check all derivations from fCg [ Sigma . Within a generality order, there are two approaches to generalization or specialization. The first approach generalizes or specializes individual clauses. We do not discuss this in any detail in this paper, and merely mention it for completeness' sake. This approach can be traced back to Reynolds' (1970) concept of a cover. It was implemented for example by Shapiro (1981) in the subsumption order in the form of refinement operators. However, a clause C which implies another clause D need not subsume D. For instance, take C = P (f (x)) P (x) and D = P (f 2(x)) P (x). Then C does not subsume D, but C j= D. Thus subsumption is weaker than implication. A further sign of this weakness is the fact that tautologies need not be subsume-equivalent, even though they are logically equivalent. The second approach generalizes or specializes sets of clauses. This is the approach we will be concerned with in this paper. Here the concept of a least generalization3 is important. The use of such least generalizations allows us to generalize cautiously, avoiding over-generalization. Least generalizations of sets of clauses were first discussed by Plotkin (1970, 1971a, 1971b). He proved that any finite set S of clauses has a least generalization under subsumption (LGS). This is a clause which subsumes all clauses in S and which is subsumed by all other clauses that also subsume all clauses in S. Positive examples can be generalized by taking their LGS.4 Of course, we need not take the LGS of all positive examples, which would yield a theory consisting of only one clause. Instead, we might divide the positive examples into subsets, and take a separate LGS of each subset. That way we obtain a theory containing more than one clause. For this second approach, subsumption is again not fully satisfactory. For example, if S consists of the clauses D1 = P (f 2(a)) P (a) and D2 = P (f (b)) P (b), then the LGS of S is P (f (y)) P (x). The clause P (f (x)) P (x), which seems more appropriate as a least generalization of S, cannot be found by Plotkin's approach, because it does not subsume D1. As this example also shows, the subsumption order is particularly unsatisfactory when we consider recursive clauses: clauses which can be resolved with themselves. 2. There is also relative subsumption (Plotkin, 1971b), which will be briefly touched in Section 4. 3. Least generalizations are also often called least general generalizations, for instance by Plotkin (1971b), Muggleton and Page (1994), Idestam-Almquist (1993, 1995), Niblett (1988), though not by Plotkin (1970), but we feel this `general' is redundant. 4. There is also a relation between least generalization under subsumption and inverse resolution (Muggleton, 1992). 342 Least Generalizations and Greatest Specializations Because of the weakness of subsumption, it is desirable to make the step from the subsumption order to the more powerful implication order. Accordingly, it is important to find out whether Plotkin's positive result on the existence of LGS's holds for implication as well. So far, the question whether any finite set of clauses has a least generalization under implication (LGI) has only been partly answered. For instance, Idestam-Almquist (1993, 1995) studies least generalizations under T-implication as an approximation to LGI's. Muggleton and Page (1994) investigate self-saturated clauses. A clause is self-saturated if it is subsumed by any clause which implies it. A clause D is a self-saturation of C if C and D are logically equivalent and D is self-saturated. As Muggleton and Page (1994) state, if two clauses C1 and C2 have self-saturations D1 and D2, then an LGS of D1 and D2 is also an LGI of C1 and C2. This positively answers our question concerning the existence of LGI's in the case of clauses which have a self-saturation. However, Muggleton and Page also show that there exist clauses which have no self-saturation. Hence the concept of self-saturation cannot solve our question in general. Use of the third generality order, relative implication, is even more desirable than the use of "plain" implication. Relative implication allows us to take background knowledge into account, which can be used to formalize many useful properties and relations of the domain of application. For this reason, least generalizations under implication relative to background knowledge also deserve attention. Apart from the least generalization, there is also its dual: the greatest specialization. Greatest specializations have been accorded much less attention in ILP than least generalizations, but the concept of a greatest specialization may nevertheless be useful (see the beginning of Section 6). In this paper, we give a systematic treatment of the existence and non-existence of least generalizations and greatest specializations, applied to each of these three generality orders. Apart from distinguishing between these three orders, we also distinguish between languages of general clauses and more restricted languages of Horn clauses. Though most researchers in ILP restrict attention to Horn clauses, general clauses are also sometimes used (Plotkin, 1970, 1971b; Shapiro, 1981; De Raedt & Bruynooghe, 1993; Idestam-Almquist, 1993, 1995). Moreover, many researchers who do not use general clauses actually allow negative literals to appear in the body of a clause. That is, they use clauses of the form A L1; : : : ; Ln, where A is an atom and each Li is a literal. These are called program clauses (Lloyd, 1987). Program clauses are in fact logically equivalent to general clauses. For instance, the program clause P (x) Q(x); :R(x) is equivalent to the non-Horn clause P (x) . :Q(x) . R(x). For these two reasons we consider not only languages of Horn clauses, but also pay attention to languages of general clauses. The combination of three generality orders and two different possible languages of clauses gives a total of six different ordered languages. For each of these, we can ask whether least generalizations (LG's) and greatest specializations (GS's) always exist. We survey results already obtained by others and also contribute some answers of our own. For the sake of clarity, we will summarize the results of our survey right at the outset. In the following table `+' signifies a positive answer, and `Gamma ' means a negative answer. 343 Nienhuys-Cheng & de Wolf Horn clauses General clauses Quasi-order LG GS LG GS Subsumption (*) + + + + Implication (j=) Gamma Gamma + for function-free + Relative implication (j=Sigma ) Gamma Gamma Gamma + Table 1: Existence of LG's and GS's Our own contributions to this table are threefold. First and foremost, we prove that if S is a finite set of clauses containing at least one non-tautologous function-free clause5 (apart from this non-tautologous function-free clause, S may contain an arbitrary finite number of other clauses, including clauses which contain functions), then there exists a computable LGI of S. This result is on the one hand based on the Subsumption Theorem for resolution (Lee, 1967; Kowalski, 1970; Nienhuys-Cheng & de Wolf, 1996), which allows us to restrict attention to finite sets of ground instances of clauses, and on the other hand on a modification of some proofs concerning T-implication which can be found in (Idestam-Almquist, 1993, 1995). An immediate corollary of this result is the existence and computability of an LGI of any finite set of function-free clauses. As far as we know, both our general LGI-result and this particular corollary are new results. Niblett (1988, p. 135) claims that "it is simple to show that there are lggs if the language is restricted to a fixed set of constant symbols since all Herbrand interpretations are finite." Yet even for this special case of our general result, it appears that no proof has been published. Initially, we found a direct proof of this case, but this was not really any simpler than the proof of the more general result that we give in this paper. Niblett's idea that the proof is simple may be due to some confusion about the relation between Herbrand models and logical implication (which is defined in terms of all models, not just Herbrand models). We will describe this at the end of Subsection 5.1. Or perhaps one might think that the decidability of implication for function-free clauses immediately implies the existence of an LGI. But in fact, decidability is not a sufficient condition for the existence of a least generalization. For example, it is decidable whether one function-free clause C implies another function-free clause D relative to function-free background knowledge. Yet least generalizations relative to function-free background knowledge do not always exist, as we will show in Section 7. Our LGI-result does not solve the general question of the existence of LGI's, but it does provide a positive answer for a large class of cases: the presence of one non-tautologous function-free clause in a finite S already guarantees the existence and computability of an LGI of S, no matter what other clauses S contains.6 Because of the prominence of functionfree clauses in ILP, this case may be of great practical signifcance. Often, particularly in implementations of ILP-systems, the language is required to be function-free, or function 5. A clause which only contains constants and variables as terms. 6. Note that even for function-free clauses, the subsumption order is still not enough. Consider D1 = P (x; y; z) P (y; z; x) and D2 = P (x; y; z) P (z; x; y) (this example is adapted from IdestamAlmquist). D1 is a resolvent of D2 and D2 and D2 is a resolvent of D1 and D1. Hence D1 and D2 are logically equivalent. This means that D1 is an LGI of the set fD1; D2g. However, the LGS of these two clauses is P (x; y; z) P (u; v; w), which is clearly an over-generalization. 344 Least Generalizations and Greatest Specializations symbols are removed from clauses and put in the background knowledge by techniques such as flattening (Rouveirol, 1992). Well-known ILP-systems such as Foil (Quinlan & Cameron-Jones, 1993), Linus (Lavra^c & D^zeroski, 1994) and Mobal (Morik, Wrobel, Kietz, & Emde, 1993) all use only function-free clauses. More than one half of the ILPsystems surveyed by Aha (1992) is restricted to function-free clauses. Function-free clauses are also sufficient for most applications concerning databases. Our second contribution shows that a set S need not have a least generalization relative to some background knowledge Sigma , not even when S and Sigma are both function-free. Thirdly, we contribute a complete discussion of existence and non-existence of greatest specializations in each of the six ordered languages. In particular, we show that any finite set of clauses has a greatest specialization under implication. Combining this with the corollary of our result on LGI's, it follows that a function-free clausal language is a lattice. 2. Preliminaries In this section we will define some of the concepts we need. For the definitions of `model', `tautology', `substitution', etc., we refer to standard works such as (Chang & Lee, 1973; Lloyd, 1987). A positive literal is an atom, a negative literal is the negation of an atom. A clause is a finite set of literals, which is treated as the universally quantified disjunction of those literals. A definite program clause is a clause with one positive and zero or more negative literals and a definite goal is a clause without positive literals. A Horn clause is either a definite program clause or a definite goal. If C is a clause, we use C+ to denote the positive literals in C, and CGamma to denote the negative literals in C. The empty clause, which represents a contradiction, is denoted by 2. Definition 1 Let A be an alphabet of the first-order logic. Then the clausal language C by A is the set of all clauses which can be constructed from the symbols in A. The Horn language H by A is the set of all Horn clauses which can be constructed from the symbols in A. 2 In this paper, we just presuppose some arbitrary alphabet A, and consider the clausal language C and Horn language H based on this A. We will now define three increasingly strong generality orders on clauses: subsumption, implication and relative implication. Definition 2 Let C and D be clauses and Sigma be a set of clauses. We say that C subsumes D, denoted as C * D, if there exists a substitution ` such that C` ` D.7 C and D are subsume-equivalent if C * D and D * C. Sigma (logically) implies C, denoted as Sigma j= C, if every model of Sigma is also a model of C. C (logically) implies D, denoted as C j= D, if fCg j= D. C and D are (logically) equivalent if C j= D and D j= C. C implies D relative to Sigma , denoted as C j=Sigma D, if Sigma [ fCg j= D. C and D are equivalent relative to Sigma if C j=Sigma D and D j=Sigma C. 2 7. Right from the very first applications of subsumption in ILP, there has been some controversy about the symbol used for subsumption: Plotkin (1970) used `^', while Reynolds (1970) used `*'. We use `*' here, similar to Reynolds' `*', because we feel it serves the intuition to view C as somehow "bigger" or "stronger" than D, if C * D holds. 345 Nienhuys-Cheng & de Wolf If C does not subsume D, we write C 6* D. Similarly, we use the notation C 6j= D and C 6j=Sigma D. If C * D, then C j= D. The converse does not hold, as the examples in the introduction showed. Similarly, if C j= D, then C j=Sigma D, and again the converse need not hold. Consider C = P (a) . :P (b), D = P (a), and Sigma = fP (b)g: then C j=Sigma D, but C 6j= D. We now proceed to define a proof procedure for logical implication between clauses, using resolution and subsumption. Definition 3 If two clauses have no variables in common, then they are said to be standardized apart. Let C1 = L1 . : : : . Li . : : : . Lm and C2 = M1 . : : : . Mj . : : : . Mn be two clauses which are standardized apart. If the substitution ` is a most general unifier (mgu) of the set fLi; :Mjg, then the clause ((C1 Gamma Li) [ (C2 Gamma Mj ))` is called a binary resolvent of C1 and C2. The literals Li and Mj are said to be the literals resolved upon. 2 If C1 and C2 are not standardized apart, we can take a variant C02 of C2, such that C1 and C02 are standardized apart. For simplicity, a binary resolvent of C1 and C02 is also called a binary resolvent of C1 and C2 itself. Definition 4 Let C be a clause and ` an mgu of fL1; : : : ; Lng ` C (n * 1). Then the clause C` is called a factor of C. 2 Note that any non-empty clause C is a factor of itself, using the empty substitution " as an mgu of a single literal in C. Definition 5 A resolvent C of clauses C1 and C2 is a binary resolvent of a factor of C1 and a factor of C2, where the literals resolved upon are the literals unified in the respective factors. C1 and C2 are the parent clauses of C. 2 Definition 6 Let Sigma be a set of clauses and C a clause. A derivation of C from Sigma is a finite sequence of clauses R1; : : : ; Rk = C, such that each Ri is either in Sigma , or a resolvent of two clauses in fR1; : : : ; RiGamma 1g. If such a derivation exists, we write Sigma `r C. 2 Definition 7 Let Sigma be a set of clauses and C a clause. We say there exists a deduction of C from Sigma , written as Sigma `d C, if C is a tautology, or if there exists a clause D such that Sigma `r D and D * C. 2 The next result, proved by Nienhuys-Cheng and de Wolf (1996), generalizes Herbrand's Theorem: Theorem 1 Let Sigma be a set of clauses and C be a ground clause. If Sigma j= C, then there exists a finite set Sigma g of ground instances of clauses in Sigma , such that Sigma g j= C. The following Subsumption Theorem gives a precise characterization of implication between clauses in terms of resolution and subsumption. It was proved by Lee (1967), Kowalski (1970) and reproved by Nienhuys-Cheng and de Wolf (1996). 346 Least Generalizations and Greatest Specializations Theorem 2 (Subsumption theorem) Let Sigma be a set of clauses and C be a clause. Then Sigma j= C iff Sigma `d C. The next lemma was first proved by Gottlob (1987). Actually, it is an immediate corollary of the subsumption theorem: Lemma 1 (Gottlob) Let C and D be non-tautologous clauses. If C j= D, then C+ * D+ and CGamma * DGamma . Proof Since C+ * C, if C j= D, then we have C+ j= D. Since C+ cannot be resolved with itself, it follows from the subsumption theorem that C+ * D. But then C+ must subsume the positive literals in D, hence C+ * D+. Similarly CGamma * DGamma . 2 An important consequence of this lemma concerns the depth of clauses, defined as follows: Definition 8 Let t be a term. If t is a variable or constant, then the depth of t is 1. If t = f (t1; : : : ; tn), n * 1, then the depth of t is 1 plus the depth of the ti with largest depth. The depth of a clause C is the depth of the term with largest depth in C. 2 For example, the term t = f (a; x) has depth 2. C = P (f (x)) P (g(f (x); a)) has depth 3, since g(f (x); a) has depth 3. It follows from Gottlob's lemma that if C j= D, then the depth of C is smaller than or equal to the depth of D, for otherwise C+ cannot subsume D+ or CGamma cannot subsume DGamma . For instance, take D = P (x; f (x; g(y))) P (g(a); b), which has depth 3. Then a clause C containing a term f (x; g2(y)) (depth 4) cannot imply D. Definition 9 Let S and S0 be finite sets of clauses, x1; : : : ; xn all distinct variables appearing in S, and a1; : : : ; an distinct constants not appearing in S or S0. Then oe = fx1=a1; : : : ; xn=ang is called a Skolem substitution for S w.r.t. S0. If S0 is empty, we just say that oe is a Skolem substitution for S. 2 Lemma 2 Let Sigma be a set of clauses, C be a clause, and oe be a Skolem substitution for C w.r.t. Sigma . Then Sigma j= C iff Sigma j= Coe. Proof ): Obvious. (: Suppose C is not a tautology and let oe = fx1=a1; : : : ; xn=ang. If Sigma j= Coe, it follows from the subsumption theorem that there is a D such that Sigma `r D and D * Coe. Thus there is a `, such that D` ` Coe. Note that since Sigma `r D and none of the constants a1; : : : ; an appears in Sigma , none of these constants appears in D. Now let `0 be obtained by replacing in ` all occurrences of ai by xi, for every 1 ^ i ^ n. Then D`0 ` C, hence D * C. Therefore Sigma `d C and hence Sigma j= C. 2 347 Nienhuys-Cheng & de Wolf 3. Least Generalizations and Greatest Specializations In this section, we will define the concepts we need concerning least generalizations and greatest specializations. Definition 10 Let Gamma be a set and R be a binary relation on Gamma . 1. R is reflexive on Gamma , if xRx for every x 2 Gamma . 2. R is transitive on Gamma , if for every x; y; z 2 Gamma , xRy and yRz implies xRz. 3. R is symmetric on Gamma , if for every x; y 2 Gamma , xRy implies yRx. 4. R is anti-symmetric on Gamma , if for every x; y; z 2 Gamma , xRy and yRx implies x = y. If R is both reflexive and transitive on Gamma , we say R is a quasi-order on Gamma . If R is both reflexive, transitive and anti-symmetric on Gamma , we say R is a partial order on Gamma . If R is reflexive, transitive and symmetric on Gamma , R is an equivalence relation on Gamma . 2 A quasi-order R on Gamma induces an equivalence-relation , on Gamma , as follows: we say x; y 2 Gamma are equivalent induced by R (denoted x , y) if both xRy and yRx. Using this equivalence relation, a quasi-order R on Gamma induces a partial order R0 on the set of equivalence classes in Gamma , defined as follows: if [x] denotes the equivalence class of x (i.e., [x] = fy j x , yg), then [x]R0[y] iff xRy. We first give a general definition of least generalizations and greatest specializations for sets of clauses ordered by some quasi-order, which we then instantiate in different ways. Definition 11 Let Gamma be a set of clauses, * be a quasi-order on Gamma , S ` Gamma be a finite set of clauses and C 2 Gamma . If C * D for every D 2 S, then we say C is a generalization of S under *. Such a C is called a least generalization (LG) of S under * in Gamma , if we have C0 * C for every generalization C0 2 Gamma of S under *. Dually, C is a specialization of S under *, if D * C for every D 2 S. Such a C is called a greatest specialization (GS) of S under * in Gamma , if we have C * C0 for every specialization C0 2 Gamma of S under *. 2 It is easy to see that if some set S has an LG or GS under * in Gamma , then this LG or GS will be unique up to the equivalence induced by * in Gamma . That is, if C and D are both LG's or GS's of some set S, then we have C , D. The concepts defined above are instances of the mathematical concepts of (least) upper bounds and (greatest) lower bounds. Thus we can speak of lattice-properties of a quasi- or partially ordered set of clauses: Definition 12 Let Gamma be a set of clauses and * be a quasi-order on Gamma . If for every finite subset S of Gamma , there exist both a least generalization and a greatest specialization of S under * in Gamma , then the set Gamma ordered by * is called a lattice. 2 It should be noted that usually in mathematics, a lattice is defined for a partial order instead of a quasi-order. However, since in ILP we usually have to deal with individual clauses rather than with equivalence classes of clauses, it is convenient for us to define `lattice' for a quasi-order here. Anyhow, if a quasi-order * is a lattice on Gamma , then the partial order induced by * is a lattice on the set of equivalence classes in Gamma . 348 Least Generalizations and Greatest Specializations In ILP, there are two main instantiations for the set of clauses Gamma : either we take a clausal language C, or we take a Horn language H. Similarly, there are three interesting choices for the quasi-order *: we can use either * (subsumption), j= (implication), or j=Sigma (relative implication) for some background knowledge Sigma . In the *-order, we will sometimes abbreviate the terms `least generalization of S under subsumption' and `greatest specialization of S under subsumption' to `LGS of S' and `GSS of S', respectively. Similarly, in the j=-order we will sometimes speak of an LGI (least generalization under implication) and a GSI. In the j=Sigma -order, we will use LGR (least generalization under relative implication) and GSR. These two different languages and three different quasi-orders give a total of six combinations. For each combination, we can ask whether an LG or GS of every finite set S exists. In the next section, we will review the answers for subsumption given by others or by ourselves. Then we devote two sections to least generalizations and greatest specializations under implication, respectively. Finally, we discuss least generalizations and greatest specializations under relative implication. The results of this survey have already been summarized in Table 1 in the introduction. 4. Subsumption First we devote some attention to subsumption. Least generalizations under subsumption have been discussed extensively by Plotkin (1970). The main result in Plotkin's framework is the following: Theorem 3 (Existence of LGS in C) Let C be a clausal language. Then for every finite S ` C, there exists an LGS of S in C. If S only contains Horn clauses, then it can be shown that the LGS of S is itself also a Horn clause. Thus the question for the existence of an LGS of every finite set S of clauses is answered positively for both clausal languages and for Horn languages. Plotkin established the existence of an LGS, but he seems to have ignored the GSS in (1970, 1971b), possibly because it is a very straightforward result. It is in fact fairly easy to show that the GSS of some finite set S of clauses is simply the union of all clauses in S after they are standardized apart.8 We include the proof here. Theorem 4 (Existence of GSS in C) Let C be a clausal language. Then for every finite S ` C, there exists a GSS of S in C. Proof Suppose S = fD1; : : : ; Dng ` C. Without loss of generality, we assume the clauses in S are standardized apart. Let D = D1 [ : : : [ Dn, then Di * D, for every 1 ^ i ^ n. Now let C 2 C be such that Di * C, for every 1 ^ i ^ n. Then for every 1 ^ i ^ n, there is a `i such that Di`i ` C and `i only acts on variables in Di. If we let ` = `1 [ : : : [ `n, then D` = D1`1 [ : : : [ Dn`n ` C. Hence D * C, so D is a GSS of S in C. 2 8. Note that this has nothing to do with unification. For instance, if S = fP (a; x); P (y; b)g, then the GSS of S in C would be P (a; x) . P (y; b). However, if we would instantiate Gamma in Definition 11 to the set of atoms, then the greatest specialization of two atoms in the set of atoms should itself also be an atom. The GSS of two atoms is then their most general unification (Reynolds, 1970). For instance, the GSS of S would in this case be P (a; b). 349 Nienhuys-Cheng & de Wolf This establishes that a clausal language C ordered by * is a lattice. Proving the existence of a GSS of every finite set of Horn clauses in H requires a little more work, but here also the result is positive. For example, D = P (a) P (f (a)); Q(y) is a GSS of D1 = P (x) P (f (x)) and D2 = P (a) Q(y). Note that D can be obtained by applying oe = fx=ag (the mgu of the heads of D1 and D2) to D1 [ D2, the GSS of D1 and D2 in C. This idea will be used in the following proof. Here we assume H contains an artificial bottom element (True) ?, such that C * ? for every C 2 H, and ? 6* C for every C 6= ?. Note that ? is not subsume-equivalent with other tautologies. Theorem 5 (Existence of GSS in H) Let H be a Horn language, with ? 2 H. Then for every finite S ` H, there exists a GSS of S in H. Proof Suppose S = fD1; : : :; Dng ` H. Without loss of generality we assume the clauses in S are standardized apart, D1; : : : ; Dk are the definite program clauses in S, and Dk+1; : : : ; Dn are the definite goals in S. If k = 0 (i.e., if S only contains goals), then it is easy to show that D1 [ : : : [ Dn is a GSS of S in H. If k * 1 and the set fD+1 ; : : : ; D+k g is not unifiable, then ? is a GSS of S in H. Otherwise, let oe be an mgu of fD+1 ; : : : ; D+k g, and let D = D1oe [ : : : [ Dnoe (note that actually Dioe = Di for k + 1 ^ i ^ n, since the clauses in S are standardized apart). Since D has exactly one literal in its head, it is a definite program clause. Furthermore, we have Di * D for every 1 ^ i ^ n, since Dioe ` D. To show that D is a GSS of S in H, suppose C 2 H is some clause such that Di * C for every 1 ^ i ^ n. For every 1 ^ i ^ n, let `i be such that Di`i ` C and `i only acts on variables in Di. Let ` = `1 [ : : : [ `n. For every 1 ^ i ^ k, D+i ` = D+i `i = C+, so ` is a unifier of fD+1 ; : : : ; D+k g. But oe is an mgu of this set, so there is a fl such that ` = oefl. Now Dfl = D1oefl [ : : : [ Dnoefl = D1` [ : : : [ Dn` = D1`1 [ : : : [ Dn`n ` C. Hence D * C, so D is a GSS of S in H. See figure 1 for illustration of the case where n = 2. D1H HHHjoeJJ JJ JJ^ `1 D2Phi Phi Phi Phi ss oe Omega Omega Omega Omega Omega Omega AE `2 D ? fl C Figure 1: D is a GSS of D1 and D2 2 Thus a Horn language H ordered by * is also a lattice. We end this section by briefly discussing Plotkin's (1971b) relative subsumption. This is an extension of subsumption which takes background knowledge into account. This background knowledge is rather restricted: it must be a finite set Sigma of ground literals. Because of its restrictiveness, we have not included relative subsumption in Table 1. Nevertheless, we mention it here, because least generalization under relative subsumption forms the basis of the well-known ILP system Golem (Muggleton & Feng, 1992). Definition 13 Let C; D be clauses, Sigma = fL1; : : : ; Lmg be a finite set of ground literals. Then C subsumes D relative to Sigma , denoted by C *Sigma D, if C * (D [ f:L1; : : : ; :Lmg). 2 350 Least Generalizations and Greatest Specializations It is easy to see that *Sigma is reflexive and transitive, so it imposes a quasi-order on a set of clauses. Suppose S = fD1; : : : ; Dng and Sigma = fL1; : : : ; Lmg. It is easy to see that an LGS of f(D1 [ f:L1; : : : ; :Lmg); : : :; (Dn [ f:L1; : : : ; :Lmg)g is a least generalization of S under *Sigma , so every finite set of clauses has a least generalization under *Sigma in C. Moreover, if each Di is a Horn clause and each Lj is a positive ground literal (i.e., a ground atom), then this least generalization will itself also be a Horn clause. Accordingly, if Sigma is a finite set of positive ground literals, then every finite set of Horn clauses has a least generalization under *Sigma in H. 5. Least Generalizations under Implication Now we turn from subsumption to the implication order. In this section we will discuss LGI's, in the next section we handle GSS's. For Horn clauses, the LGI-question has already been answered negatively by Muggleton and De Raedt (1994). Let D1 = P (f 2(x)) P (x), D2 = P (f 3(x)) P (x), C1 = P (f (x)) P (x) and C2 = P (f 2(y)) P (x). Then we have both C1 j= fD1; D2g and C2 j= fD1; D2g. It is not very difficult to see that there are no more specific Horn clauses than C1 and C2 that imply both D1 and D2. For C1: no resolvent of C1 with itself implies D2 and no clause that is properly subsumed by C1 still implies D1 and D2. For C2: every resolvent of C2 with itself is a variant of C2, and no clause that is properly subsumed by C2 still implies D1 and D2. Thus C1 and C2 are both "minimal" generalizations under implication of fD1; D2g. Since C1 and C2 are not logically equivalent under implication, there is no LGI of fD1; D2g in H. However, the fact that there is no LGI of fD1; D2g in H does not mean that D1 and D2 have no LGI in C, since a Horn language is a more restricted space than a clausal language. In fact, it is shown by Muggleton and Page (1994) that C = P (f (x)) . P (f 2(y)) P (x) is an LGI of D1 and D2 in C. For this reason, it may be worthwhile for the LGI to consider a clausal language instead of only Horn clauses. In the next subsection, we show that any finite set of clauses which contains at least one non-tautologous function-free clause, has an LGI in C. An immediate corollary of this result is the existence of an LGI of any finite set of function-free clauses. In our usage of the word, a `function-free' clause may contain constants, even though constants are sometimes seen as functions of arity 0. Definition 14 A clause is function-free if it does not contain function symbols of arity 1 or more. 2 Note that a clause is function-free iff it has depth 1. In case of sets of clauses which all contain function symbols, the LGI-question remains open. 5.1 A Sufficient Condition for the Existence of an LGI In this subsection, we will show that any finite set S of clauses containing at least one non-tautologous function-free clause, has an LGI in C. Definition 15 Let C be a clause, x1; : : : ; xn all distinct variables in C, and K a set of terms. Then the instance set of C w.r.t. K is I(C; K) = fC` j ` = fx1=t1; : : :; xn=tng; 351 Nienhuys-Cheng & de Wolf where ti 2 K, for every 1 ^ i ^ ng. If Sigma = fC1; : : : ; Ckg is a set of clauses, then the instance set of Sigma w.r.t. K is I(Sigma ; K) = I(C1; K) [ : : : [ I(Ck; K). 2 For example, if C = P (x) . Q(y) and T = fa; f (z)g, then I(C; T ) = f(P (a) . Q(a)); (P (a) . Q(f (z))); (P (f (z)) . Q(a)); (P (f (z)) . Q(f (z)))g. Definition 16 Let S be a finite set of clauses, and oe be a Skolem substitution for S. Then the term set of S by oe is the set of all terms (including subterms) occurring in Soe. 2 A term set of S by some oe is a finite set of ground terms. For instance, the term set of D = P (f 2(x); y; z) P (y; z; f 2(x)) by oe = fx=a; y=b; z=cg is T = fa; f (a); f 2(a); b; cg. Our definition of a term set corresponds to what Idestam-Almquist (1993, 1995) calls a `minimal term set'. In his definition, if oe is a Skolem substitution for a set of clauses S = fD1; : : : ; Dng w.r.t. some other set of clauses S0, then a term set of S is a finite set of terms which contains the minimal term set of S by oe as a subset. Using his notion of term set, he defines T-implication as follows: if C and D are clauses and T is a term set of fDg by some Skolem substitution oe w.r.t. fCg, then C T-implies D w.r.t. T if I(C; T ) j= Doe. T-implication is decidable, weaker than logical implication and stronger than subsumption. Idestam-Almquist (1993, 1995) gives the result that any finite set of clauses has a least generalization under T-implication w.r.t. any term set T . However, as he also notes, T-implication is not transitive and hence not a quasi-order. Therefore it does not fit into our general framework here. For this reason, we will not discuss it fully here, and for the same reason we have not included a row for T-implication in Table 1. Let us now begin with the proof of our result concerning the existence of LGI's. Consider C = P (x; y; z) P (z; x; y) and D, oe and T as above. Then C j= D and also I(C; T ) j= Doe, since Doe is a resolvent of P (f 2(a); b; c) P (c; f 2(a); b) and P (c; f 2(a); b) P (b; c; f 2(a)), which are in I(C; T ). As we will show in the next lemma, this holds in general: if C j= D and C is function-free, then we can restrict attention to the ground instances of C instantiated to terms in the term set of D by some oe. The proof of Lemma 3 uses the following idea. Consider a derivation of a clause E from a set Sigma of ground clauses. Suppose some of the clauses in Sigma contain terms not appearing in E. Then any literals containing these terms in Sigma must be resolved away in the derivation. This means that if we replace all the terms in the derivation that are not in E, by some other term t, then the result will be another derivation of E. For example, the left of figure 2 shows a derivation of length 1 of E. The term f 2(b) in the parent clauses does not appear in E. If we replace this term by the constant a, the result is another derivation of E (right of the figure). P (b) P (f2(b)) P (f2(b)) Q(a; f(a)) @@ @R Gamma Gamma Gamma Psi E = P (b) Q(a; f(a)) P (b) P (a) P (a) Q(b; f(a)) @@ @R Gamma Gamma Gamma Psi E = P (b) Q(a; f(a)) Figure 2: Transforming the left derivation yields the right derivation Lemma 3 Let C be a function-free clause, D be a clause, oe be a Skolem substitution for D w.r.t. fCg and T be the term set of D by oe. Then C j= D iff I(C; T ) j= Doe. 352 Least Generalizations and Greatest Specializations Proof (: Since C j= I(C; T ) and I(C; T ) j= Doe, we have C j= Doe. Now C j= D by Lemma 2. ): If D is a tautology, then Doe is a tautology, so this case is obvious. Suppose D is not a tautology, then Doe is not a tautology. Since C j= Doe, it follows from Theorem 1 that there exists a finite set Sigma of ground instances of C, such that Sigma j= Doe. By the Subsumption Theorem, there exists a derivation from Sigma of a clause E, such that E * Doe. Since Sigma is ground, E must also be ground, so we have E ` Doe. This implies that E only contains terms from T . Let t be an arbitrary term in T and let Sigma 0 be obtained from Sigma by replacing every term in clauses in Sigma which is not in T , by t. Note that since each clause in Sigma is a ground instance of the function-free clause C, every clause in Sigma 0 is also a ground instance of C. Now it is easy to see that the same replacement of terms in the derivation of E from Sigma results in a derivation of E from Sigma 0: (1) each resolution step in the derivation from Sigma can also be carried out in the derivation from Sigma 0, since the same terms in Sigma are replaced by the same terms in Sigma 0, and (2) the terms in Sigma that are not in T (and hence are replaced by t) do not appear in the conclusion E of the derivation. Since there is a derivation of E from Sigma we have Sigma 0 j= E, and hence Sigma 0 j= Doe. Sigma 0 is a set of ground instances of C and all terms in Sigma 0 are terms in T , so Sigma 0 ` I(C; T ). Hence I(C; T ) j= Doe. 2 Lemma 3 cannot be generalized to the case where C contains function symbols of arity * 1, take C = P (f (x); y) P (z; x) and D = P (f (a); a) P (a; f (a)) (from the example given on p. 25 of Idestam-Almquist, 1993). Then T = fa; f (a)g is the term set of D and we have C j= D, yet it can be seen that I(C; T ) 6j= D. The argument used in the previous lemma does not work here, because different terms in some ground instance need not relate to different variables. For example, in the ground instance P (f 2(a); a) P (a; f (a)) of C, we cannot just replace f 2(a) by some other term, for then the resulting clause would not be an instance of C. On the other hand, Lemma 3 can be generalized to a set of clauses instead of a single clause. If Sigma is a set of function-free clauses, C is an arbitrary clause, and oe is a Skolem substitution for C w.r.t. Sigma , then we have that Sigma j= C iff I(Sigma ; T ) j= Coe. The proof is almost literally the same as above. This result implies that Sigma j= C is reducible to an implication I(Sigma ; T ) j= Coe between ground clauses. Since, by the next lemma, implication between ground clauses is decidable, it follows that Sigma j= C is decidable in case Sigma is function-free. Lemma 4 The problem whether Sigma j= C, where Sigma is a finite set of ground clauses and C is a ground clause, is decidable. Proof Let C = L1 . : : : . Ln and A be the set of all ground atoms occurring in Sigma and C. Now consider the following statements, which can be shown equivalent. (1) Sigma j= C. (2) Sigma [ f:L1; : : : ; :Lng is unsatisfiable. (3) Sigma [ f:L1; : : : ; :Lng has no Herbrand model. (4) No subset of A is an Herbrand model of Sigma [ f:L1; : : : ; :Lng. 353 Nienhuys-Cheng & de Wolf Then (1),(2). (2),(3) by Theorem 4.2 of (Chang & Lee, 1973). Since also (3),(4), we have (1),(4). (4) is decidable because A is finite, so (1) is decidable as well. 2 Corollary 1 The problem whether Sigma j= C, where Sigma is a finite set of function-free clauses and C is a clause, is decidable. The following sequence of lemmas more or less follows the pattern of Idestam-Almquist's (1995) Lemma 10 to Lemma 12 (similar to Lemma 3.10 to Lemma 3.12 of Idestam-Almquist, 1993). There he gives a proof of the existence of a least generalization under T-implication of any finite set of (not necessarily function-free) clauses. We can adjust the proof in such a way that we can use it to establish the existence of an LGI of any finite set of clauses containing at least one non-tautologous function-free clause. Lemma 5 Let S be a finite set of non-tautologous clauses, V = fx1; : : : ; xmg be a set of variables and let G = fC1; C2; : : :g be a (possibly infinite) set of generalizations of S under implication. Then the set G0 = I(C1; V ) [ I(C2; V ) [ : : : is a finite set of clauses. Proof Let d be the maximal depth of the terms in clauses in S. It follows from Lemma 1 that G (and hence also G0) cannot contain terms of depth greater than d, nor predicates, functions or constants other than those in S. The set of literals which can be constructed from predicates in S and from terms of depth at most d consisting of functions and constants in S and variables in V , is finite. Hence the set of clauses which can be constructed from those literals is finite as well. G0 is a subset of this set, so G0 is a finite set of clauses. 2 Lemma 6 Let D be a clause, C be a function-free clause such that C j= D, T = ft1; : : : ; tng be the term set of D by oe, V = fx1; : : : ; xmg be a set of variables and m * n. If E is an LGS of I(C; V ), then E j= D. Proof Let fl = fx1=t1; : : : ; xn=tn; xn+1=tn; : : : ; xm=tng (it does not matter to which terms the variables xn+1; : : : ; xm are mapped by fl, as long as they are mapped to terms in T ). Suppose I(C; V ) = fCae1; : : : ; Caekg. Then I(C; T ) = fCae1fl; : : :; Caekflg. Let E be an LGS of I(C; V ) (note that E must be function-free). Then for every 1 ^ i ^ k, there are `i such that E`i ` Caei. This means that E`ifl ` Caeifl and hence E`ifl j= Caeifl, for every 1 ^ i ^ k. Therefore E j= I(C; T ). Since C j= D, we know from Lemma 1 that constants appearing in C must also appear in D. This means that oe is a Skolem substitution for D w.r.t. fCg. Then from Lemma 3 we know I(C; T ) j= Doe, hence E j= Doe. Furthermore, since E is an LGS of I(C; V ), all constants in E also appear in C, hence all constants in E must appear in D. Thus oe is also a Skolem substitution for D w.r.t. fEg. Therefore E j= D by Lemma 2. 2 Consider C = P (x; y; z) P (y; z; x) and D = Q(w). Both C and D imply the clause E = P (x; y; z) P (z; x; y); Q(b). Now note that C [D = P (x; y; z) P (y; z; x); Q(w) also implies E. This holds for clauses in general, even in the presence of background knowledge 354 Least Generalizations and Greatest Specializations Sigma . The next lemma is very general, but in this section we only need the special case where C and D are function-free and Sigma is empty. We need the general case to prove the existence of a GSR in Section 8. Lemma 7 Let C, D and E be clauses such that C and D are standardized apart and let Sigma be a set of clauses. If C j=Sigma E and D j=Sigma E, then C [ D j=Sigma E. Proof Suppose C j=Sigma E and D j=Sigma E, and let M be a model of Sigma [ fC [ Dg. Since C and D are standardized apart, the clause C [ D is equivalent to the formula 8(C) . 8(D) (where 8(C) denotes the universally quantified clause C). This means that M is a model of C or a model of D. Furthermore, M is also a model of Sigma , so it follows from Sigma [ fCg j= E or Sigma [ fDg j= E that M is a model of E. Thus Sigma [ fC [ Dg j= E, hence C [ D j=Sigma E. 2 Now we can prove the existence of an LGI of any finite set S of clauses which contains at least one non-tautologous and function-free clause. In fact we can prove something stronger, namely that this LGI is a special LGI. This is an LGI that is not only implied, but actually subsumed by any other generalization of S: Definition 17 Let C be a clausal language and S be a finite subset of C. An LGI C of S in C is called a special LGI of S in C, if C0 * C for every generalization C0 2 C of S under implication. 2 Note that if D is an LGI of a set containing at least one non-tautologous function-free clause, then by Lemma 1 D is itself function-free, because it should imply the functionfree clause(s) in S. For instance, C = P (x; y; z) P (y; z; x); Q(w) is an LGI of D1 = P (x; y; z) P (y; z; x); Q(f (a)) and D2 = P (x; y; z) P (z; x; y); Q(b). Note that this LGI is properly subsumed by the LGS of fD1; D2g, which is P (x; y; z) P (x0; y0; z0); Q(w). An LGI may sometimes be the empty clause 2, for example if S = fP (a); Q(a)g. Theorem 6 (Existence of special LGI in C) Let C be a clausal language. If S is a finite set of clauses from C and S contains at least one non-tautologous function-free clause, then there exists a special LGI of S in C. Proof Let S = fD1; : : : ; Dng be a finite set of clauses from C, such that S contains at least one non-tautologous function-free clause. We can assume without loss of generality that S contains no tautologies. Let oe be a Skolem substitution for S, T = ft1; : : : ; tmg be the term set of S by oe, V = fx1; : : : ; xmg be a set of variables and G = fC1; C2; : : :g be the set of all generalizations of S under implication in C. Note that 2 2 G, so G is not empty. Since each clause in G must imply the function-free clause(s) in S, it follows from Lemma 1 that all members of G are function-free. By Lemma 5, the set G0 = I(C1; V ) [ I(C2; V ) [ : : : is a finite set of clauses. Since G0 is finite, the set of I(Ci; V )s is also finite. For simplicity, let fI(C1; V ); : : :; I(Ck; V )g be the set of all distinct I(Ci; V )s. Let Ei be an LGS of I(Ci; V ), for every 1 ^ i ^ k, such that E1; : : : ; Ek are standardized apart. For every 1 ^ j ^ n, the term set of Dj by oe is some set ftj1 ; : : : ; tjsg ` T , such that m * js. From Lemma 6, we have that Ei j= Dj, for every 1 ^ i ^ k and 1 ^ j ^ n, 355 Nienhuys-Cheng & de Wolf hence Ei j= S. Now let F = E1 [ : : : [ Ek, then we have F j= S from Lemma 7 (applying the case of Lemma 7 where Sigma is empty). To prove that F is a special LGI of S, it remains to show that Cj * F , for every j * 1. For every j * 1, there is an i (1 ^ i ^ k), such that I(Cj; V ) = I(Ci; V ). So for this i, Ei is an LGS of I(Cj; V ). Cj is itself also a generalization of I(Cj; V ) under subsumption, hence Cj * Ei. Then finally Cj * F , since Ei ` F . 2 As a consequence, we also immediately have the following: Corollary 2 (Existence of LGI for function-free clauses) Let C be a clausal language. Then for every finite set of function-free clauses S ` C, there exists an LGI of S in C. Proof Let S be a finite set of function-free clauses in C. If S only contains tautologies, any tautology will be an LGI of S. Otherwise, let S0 be obtained by deleting all tautologies from S. By the previous theorem, there is a special LGI of S0. Clearly, this is also a special LGI of S itself in C. 2 This corollary is not trivial, since even though the number of Herbrand interpretations of a language without function symbols is finite (due to the fact that the number of all possible ground atoms is finite in this case), S may nevertheless be implied by an infinite number of non-equivalent clauses. This may seem like a paradox, since there are only finitely many categories of clauses that can "behave differently" in a finite number of finite Herbrand interpretations. Thus it would seem that the number of non-equivalent functionfree clauses should also be finite. This is a misunderstanding, since logical implication (and hence also logical equivalence) is defined in terms of all interpretations, not just Herbrand interpretations. For instance, define D1 = P (a; a) and P (b; b), Cn = fP (xi; xj) j i 6= j; 1 ^ i; j ^ ng. Then we have Cn j= fD1; D2g, Cn j= Cn+1 and Cn+1 6j= Cn, for every n * 1, see (van der Laag & Nienhuys-Cheng, 1994). Another interesting consequence of Theorem 6 concerns self-saturation (see the introduction to this paper for the definition of self-saturation). If C is a special LGI of some set S, then it is clear that C is self-saturated: any clause which implies C also implies S and hence must subsume C, since C is a special LGI of S. Now consider S = fDg, where D is some non-tautologous function-free clause. Then a special LGI C of S will be logically equivalent to D. Moreover, since this C will be self-saturated, it is a self-saturation of D. Corollary 3 If D is a non-tautologous function-free clause, then there exists a self-saturation of D. 5.2 The LGI is Computable In the previous subsection we proved the existence of an LGI in C of every finite set S of clauses containing at least one non-tautologous function-free clause. In this subsection we will establish the computability of such an LGI. The next algorithm, extracted from the proof of the previous section, computes this LGI of S. 356 Least Generalizations and Greatest Specializations LGI-Algorithm Input: A finite set S of clauses, containing at least one non-tautologous functionfree clause. Output: An LGI of S in C. 1. Remove all tautologies from S (a clause is a tautology iff it contains literals A and :A), call the remaining set S0. 2. Let m be the number of distinct terms (including subterms) in S0, let V = fx1; : : : ; xmg. (Notice that this m is the same number as the number of terms in the term set T used in the proof of Theorem 6.) 3. Let G be the (finite) set of all clauses which can be constructed from predicates and constants in S0 and variables in V . 4. Let fU1; : : :; Ung be the set of all subsets of G. 5. Let Hi be an LGS of Ui, for every 1 ^ i ^ n. These Hi can be computed by Plotkin's (1970) algorithm. 6. Remove from fH1; : : : ; Hng all clauses which do not imply S0 (since each Hi is function-free, by Corollary 1 this implication is decidable), and standardize the remaining clauses fH1; : : :; Hqg apart. 7. Return the clause H = H1 [ : : : [ Hq. The correctness of this algorithm follows from the proof of Theorem 6. First notice that H j= S by Lemma 7. Furthermore, note that all I(Ci; V )'s mentioned in the proof of Theorem 6, are elements of the set fU1; : : : ; Ung. This means that for every Ei in the set fE1; : : : ; Ekg mentioned in that proof, there is a clause Hj in fH1; : : : ; Hqg such that Ei and Hj are subsume-equivalent. Then it follows that the LGI F = E1 [ : : : [ Ek of that proof subsumes the clause H = H1 [ : : : [ Hq that our algorithm returns. On the other hand, F is a special LGI, so F and H must be subsume-equivalent. Suppose the number of distinct constants in S0 is c and the number of distinct variables in step 2 of the algorithm is m. Furthermore, suppose there are p distinct predicate symbols in S0, with respective arities a1; : : : ; ap. Then the number of distinct atoms that can be formed from these constants, variables and predicates, is l = Ppi=1(c + m)ai, and the number of distinct literals that can be formed is 2 Delta l. The set G of distinct clauses which can be formed from these literals is the power set of this set of literals, so jGj = 22Delta l. Then the set fU1; : : : ; Ung of all subsets of G contains 2jGj = 22 2Delta l members. Thus the algorithm outlined above is not very efficient (to say the least). A more efficient algorithm may exist, but since implication is harder than subsumption and the computation of an LGS is already quite expensive, we should not put our hopes too high. Nevertheless, the existence of the LGI-algorithm does establish the theoretical point that the LGI of any finite set of clauses containing at least one non-tautologous function-free clause is effectively computable. Theorem 7 (Computability of LGI) Let C be a clausal language. If S is a finite set of clauses from C, and S contains at least one non-tautologous function-free clause, then the LGI of S in C is computable. 357 Nienhuys-Cheng & de Wolf 6. Greatest Specializations under Implication Now we turn from least generalizations under implication to greatest specializations. Finding least generalizations of sets of clauses is common practice in ILP. On the other hand, the greatest specialization, which is the dual of the least generalization, is used hardly ever. Nevertheless, the GSI of two clauses D1 and D2 might be useful. Suppose that we have one positive example e+ and two negative examples eGamma 1 and eGamma 2 and suppose that D1 implies e+ and eGamma 1 , while D2 implies e+ and eGamma 2 . Then it might very well be that the GSI of D1 and D2 still implies e+, but does not imply either eGamma 1 or eGamma 2 . Thus we could obtain a correct specialization by taking the GSI of D1 and D2. It is obvious from the previous sections that the existence of an LGI of S is quite hard to establish. For clauses which all contain functions, the existence of an LGI is still an open question, and even for the case where S contains at least one non-tautologous function-free clause, the proof was far from trivial. However, the existence of a GSI in C is much easier to prove. In fact, a GSI of a finite set S is the same as the GSS of S, namely the union of the clauses in S after these are standardized apart. To see the reason for this dissymmetry, let us take a step back from the clausal framework and consider full first-order logic for a moment. If OE1 and OE2 are two arbitrary first-order formulas, then it can be easily shown that their least generalization is just OE1 ^ OE2: this conjunction implies OE1 and OE1, and must be implied by any other formula which implies both OE1 and OE2. Dually, the greatest specialization is just OE1 . OE2: this is implied by both OE1 and OE2, and must imply any other formula that is implied by both OE1 and OE2. See figure 3. OE1 ^ OE2 Gamma Gamma Gamma Psi @@ @ROE 1 OE2@ @@R Gamma Gamma Gamma Psi OE1 . OE2 Figure 3: Least generalization and greatest specialization in first-order logic Now suppose OE1 and OE2 are clauses. Then why do we have a problem in finding the LGI of OE1 and OE2? The reason for this is that OE1 ^ OE2 is not a clause. Instead of using OE1 ^ OE2, we have to find some least clause which implies both clauses OE1 and OE2. Such a clause appears quite hard to find sometimes. On the other hand, in case of specialization there is no problem. Here we can take OE1 . OE2 as GSI, since OE1 . OE2 is equivalent to a clause, if we handle the universal quantifiers in front of a clause properly. If OE1 and OE2 are standardized apart, then the formula OE1 . OE2 is equivalent to the clause which is the union of OE1 and OE2. This fact was used in the proof of Lemma 7. Suppose S = fD1; : : : ; Dng, and D01; : : : ; D0n are variants of these clauses which are standardized apart. Then clearly D = D01 [ : : : [ D0n is a GSI of S, since it follows from Lemma 7 that any specialization of S under implication is implied by D. Thus we have the following result: 358 Least Generalizations and Greatest Specializations Theorem 8 (Existence of GSI in C) Let C be a clausal language. Then for every finite S ` C, there exists a GSI of S in C. The previous theorem holds for clauses in general, so in particular also for function-free clauses. Furthermore, Corollary 2 guarantees us that in a function-free clausal language an LGI of every finite S exists. This means that the set of function-free clauses quasi-ordered by logical implication is in fact a lattice. Corollary 4 (Lattice-structure of function-free clauses under j=) A function-free clausal language ordered by implication is a lattice. In case of a Horn language H, we cannot apply the same proof method as in the case of a clausal language, since the union of two Horn clauses need not be a Horn clause itself. In fact, we can show that not every finite set of Horn clauses has a GSI in H. Here we can use the same clauses that we used to show that sets of Horn clauses need not have an LGI in H, this time from the perspective of specialization instead of generalization. Again, let D1 = P (f 2(x)) P (x), D2 = P (f 3(x)) P (x), C1 = P (f (x)) P (x) and C2 = P (f 2(y)) P (x). Then C1 j= fD1; D2g and C2 j= fD1; D2g, and there is no Horn clause D such that D j= D1, D j= D2, C1 j= D and C2 j= D. Hence there is no GSI of fC1; C2g in H. 7. Least Generalizations under Relative Implication Implication is stronger than subsumption, but relative implication is even more powerful, because background knowledge can be used to model all sorts of useful properties and relations. In this section, we will discuss least generalizations under implication relative to some given background knowledge Sigma (LGR's). In the next section we treat greatest specializations under relative implication. First, we will prove the equivalence between our definition of relative implication and a definition given by Niblett (1988, p. 133). He gives the following definition of subsumption relative to a background knowledge Sigma (to distinguish it from our notion of subsumption, we will call this `N-subsumption'):9 Definition 18 Clause C N-subsumes clause D with respect to background knowledge Sigma if there is a substitution ` such that Sigma ` (C` ! D) (here `!' is the implication-connective, and ``' is an arbitrary complete proof procedure). 2 Proposition 1 Let C and D be clauses and Sigma be a set of clauses. Then C N-subsumes D with respect to Sigma iff C j=Sigma D. Proof Consider the following six statements, which can be shown equivalent. (1) C N-subsumes D with respect to Sigma . (2) There is a substitution ` such that Sigma ` (C` ! D). (3) There is a substitution ` such that Sigma j= (C` ! D). 9. Niblett attributes this definition to Plotkin, though Plotkin gives a rather different definition of relative subsumption in (Plotkin, 1971b), as we have seen in Section 4. 359 Nienhuys-Cheng & de Wolf (4) There is a substitution ` such that Sigma [ fC`g j= D. (5) Sigma [ fCg j= D. (6) C j=Sigma D. (1),(2) by definition. (2),(3) by the completeness of `. (3),(4) by the Deduction Theorem. (4))(5) is obvious and (5))(4) follows from letting ` be the empty substitution, hence (4),(5). Finally, (5),(6) by definition. Thus these six statements are equivalent. 2 Since j= is the special case of j=Sigma where Sigma is empty, our counterexamples to the existence of LGI's or GSI's in H are also counterexamples to the existence of LGR's or GSR's in H. In other words, the `Gamma '-entries in the second row of Table 1 carry over to the third row. For general clauses, the LGR-question also has a negative answer. We will show here that even if S and Sigma are both finite sets of function-free clauses, an LGR of S relative to Sigma need not exist. Let D1 = P (a), D2 = P (b), S = fD1; D2g, and Sigma = f(P (a).:Q(x)); (P (b). :Q(x))g. We will show that this S has no LGR relative to Sigma in C. Suppose C is an LGR of S relative to Sigma . Note that if C contains the literal P (a), then the Herbrand interpretation which makes P (a) true and which makes all other ground literals false, would be a model of Sigma [ fCg but not of D2, so then we would have C 6j=Sigma D2. Similarly, if C contains P (b) then C 6j=Sigma D1. Hence C cannot contain P (a) or P (b). Now let d be a constant not appearing in C. Let D = P (x) . Q(d), then D j=Sigma S. By the definition of an LGR, we should have D j=Sigma C. Then by the subsumption theorem, there must be a derivation from Sigma [ fDg of a clause E, which subsumes C. The set of all clauses which can be derived (in 0 or more resolution-steps) from Sigma [ fDg is Sigma [ fDg [ f(P (a) . P (x)); (P (b) . P (x))g. But none of these clauses subsumes C, because C does not contain the constant d, nor the literals P (a) or P (b). Hence D 6j=Sigma C, contradicting the assumption that C is an LGR of S relative to Sigma in C. Thus in general the LGR of S relative to Sigma need not exist. However, we can identify a special case in which the LGR does exist. This case might be of practical interest. Suppose Sigma = fL1; : : : ; Lmg is a finite set of function-free ground literals. We can assume Sigma does not contain complementary literals (i.e., A and :A), for otherwise Sigma would be inconsistent. Also, suppose S = fD1; : : : ; Dng is a set of clauses, at least one of which is non-tautologous and function-free. Then C j=Sigma Di iff fCg [ Sigma j= Di iff C j= Di . :(L1 ^ : : : ^ Lm) iff C j= Di . :L1 . : : : . :Lm. This means that an LGI of the set of clauses f(D1 . :L1 . : : : . :Lm); : : :; (Dn . :L1 . : : : . :Lm)g is also an LGR of S relative to Sigma . If some Dk .:L1 .: : :.:Lm is non-tautologous and function-free, this LGI exists and is computable. Hence in this special case, the LGR of S relative to Sigma exists and is computable. 8. Greatest Specializations under Relative Implication Since the counterexample to the existence of GSI's in H is also a counterexample to the existence of GSR's in H, the only remaining question in the j=Sigma -order is the existence of GSR's in C. The answer to this question is positive. In fact, like the GSS and the GSI, the GSR of some finite set S in C is just the union of the (standardized apart) clauses in S. Theorem 9 (Existence of GSR in C) Let C be a clausal language and Sigma ` C. Then for every finite S ` C, there exists a GSR of S relative to Sigma in C. 360 Least Generalizations and Greatest Specializations Proof Suppose S = fD1; : : : ; Dng ` C. Without loss of generality, we assume the clauses in S are standardized apart. Let D = D1 [ : : : [ Dn, then Di j=Sigma D, for every 1 ^ i ^ n. Now let C 2 C be such that Di j=Sigma C, for every 1 ^ i ^ n. Then from Lemma 7, we have D j=Sigma C. Hence D is a GSR of S relative to Sigma in C. 2 9. Conclusion In ILP, the three main generality orders are subsumption, implication, and relative implication. The two main languages are clausal languages and Horn languages. This gives a total of six different ordered sets. In this paper, we have given a systematic treatment of the existence or non-existence of least generalizations and greatest specializations in each of these six ordered sets. The outcome of this investigation is summarized in Table 1. The only remaining open question is the existence or non-existence of a least generalization under implication in C for sets of clauses which all contain function symbols. Table 1 makes explicit the trade-off between different generality orders. On the one hand, implication is better suited as a generality order than subsumption, particularly in case of recursive clauses. Relative implication is still better, because it allows us to take background knowledge into account. On the other hand, we can see from the table that as far as the existence of least generalizations goes, subsumption is more attractive than logical implication, and logical implication is in turn more attractive than relative implication. For subsumption, least generalizations always exist. For logical implication, we can only prove the existence of least generalizations in the presence of a function-free clause. And finally, for relative implication, least generalizations need not even exist in a function-free language. In practice this means that we cannot have it all. If we choose to use a very strong generality order such as relative implication, least generalizations only exist in very limited cases. On the other hand, if we want to guarantee that least generalizations always exist, we are committed to the weakest generality order: subsumption. Acknowledgements We would like to thank Peter Idestam-Almquist and the referees for their comments, which helped to improve the paper. References Aha, D. W. (1992). Relating relational learning algorithms. In Muggleton, S. (Ed.), Inductive Logic Programming, Vol. 38 of APIC Series, pp. 233-254. Academic Press. Chang, C.-L., & Lee, R. C.-T. (1973). Symbolic Logic and Mechanical Theorem Proving. Academic Press, San Diego. De Raedt, L., & Bruynooghe, M. (1993). A theory of clausal discovery. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pp. 1058-1063. Morgan Kaufmann. 361 Nienhuys-Cheng & de Wolf Gottlob, G. (1987). Subsumption and implication. Information Processing Letters, 24 (2), 109-111. Idestam-Almquist, P. (1993). Generalization of Clauses. Ph.D. thesis, Stockholm University. Idestam-Almquist, P. (1995). Generalization of clauses under implication. Journal of Artificial Intelligence Research, 3, 467-489. Kowalski, R. A. (1970). The case for using equality axioms in automatic demonstration. In Proceedings of the Symposium on Automatic Demonstration, Vol. 125 of Lecture Notes in Mathematics, pp. 112-127. Springer-Verlag. Lavra^c, N., & D^zeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. Ellis Horwood. Lee, R. C.-T. (1967). A Completeness Theorem and a Computer Program for Finding Theorems Derivable from Given Axioms. Ph.D. thesis, University of California, Berkeley. Lloyd, J. W. (1987). Foundations of Logic Programming (Second edition). Springer-Verlag, Berlin. Marcinkowski, J., & Pacholski, L. (1992). Undecidability of the horn-clause implication problem. In Proceedings of the 33rd Annual IEEE Symposium on Foundations of Computer Science, pp. 354-362 Pittsburg. Morik, K., Wrobel, S., Kietz, J.-U., & Emde, W. (1993). Knowledge Acquisition and Machine Learning: Theory, Methods and Applications. Academic Press, London. Muggleton, S. (1992). Inductive logic programming. In Muggleton, S. (Ed.), Inductive Logic Programming, Vol. 38 of APIC Series, pp. 3-27. Academic Press. Muggleton, S., & De Raedt, L. (1994). Inductive logic programming: Theory and methods. Journal of Logic Programming, 19, 629-679. Muggleton, S., & Feng, C. (1992). Efficient induction of logic programs. In Muggleton, S. (Ed.), Inductive Logic Programming, Vol. 38 of APIC Series, pp. 281-298. Academic Press. Muggleton, S., & Page, C. D. (1994). Self-saturation of definite clauses. In Wrobel, S. (Ed.), Proceedings of the 4th International Workshop on Inductive Logic Programming (ILP-94), Vol. 237 of GMD-Studien, pp. 161-174 Bad Honnef/Bonn. Gesellschaft f"ur Mathematik und Datenverarbeitung. Niblett, T. (1988). A study of generalisation in logic programs. In Sleeman, D. (Ed.), Proceedings of the 3rd European Working Sessions on Learning (EWSL-88), pp. 131- 138. Nienhuys-Cheng, S.-H., & de Wolf, R. (1996). The subsumption theorem in inductive logic programming: Facts and fallacies. In De Raedt, L. (Ed.), Advances in Inductive Logic Programming, pp. 265-276 Amsterdam. IOS Press. 362 Least Generalizations and Greatest Specializations Plotkin, G. D. (1970). A note on inductive generalization. Machine Intelligence, 5, 153-163. Plotkin, G. D. (1971a). Automatic Methods of Inductive Inference. Ph.D. thesis, Edinburgh University. Plotkin, G. D. (1971b). A further note on inductive generalization. Machine Intelligence, 6, 101-124. Quinlan, J. R., & Cameron-Jones, R. M. (1993). Foil: A midterm report. In Brazdil, P. (Ed.), Proceedings of the 6th European Conference on Machine Learning (ECML-93), Vol. 667 of Lecture Notes in Artificial Intelligence, pp. 3-20. Springer-Verlag. Reynolds, J. C. (1970). Transformational systems and the algebraic structure of atomic formulas. Machine Intelligence, 5, 135-151. Rouveirol, C. (1992). Extensions of inversion of resolution applied to theory completion. In Muggleton, S. (Ed.), Inductive Logic Programming, Vol. 38 of APIC Series, pp. 63-92. Academic Press. Shapiro, E. Y. (1981). Inductive inference of theories from facts. Research report 192, Yale University. van der Laag, P. R. J., & Nienhuys-Cheng, S.-H. (1994). Existence and nonexistence of complete refinement operators. In Bergadano, F., & De Raedt, L. (Eds.), Proceedings of the 7th European Conference on Machine Learning (ECML-94), Vol. 784 of Lecture Notes in Artificial Intelligence, pp. 307-322. Springer-Verlag. 363