L. Carsten, C. Areces, I. Horrocks and U. Sattler
Volume 23, 2005
Links to Full Text:
Journal of Artiï¬cial Intelligence Research 23 (2005) 667-726 Submitted 07/04; published 06/05 Keys, Nominals, and Concrete Domains Carsten Lutz lutz@tcs.inf.tu-dresden.de Theoretical Computer Science, TU DresdenD-01062 Dresden, Germany Carlos Areces areces@loria.fr INRIA Lorraine, Nancy54602 Villers les Nancy Cedex, France Ian Horrocks horrocks@cs.man.ac.uk Department of Computer ScienceUniversity of ManchesterOxford Road, Manchester M13 9PL, UK Ulrike Sattler sattler@cs.man.ac.uk Department of Computer Science University of Manchester Oxford Road, Manchester M13 9PL, UK Abstract Many description logics (DLs) combine knowledge representation on an abstract, logicallevel with an interface to âconcreteâ domains like numbers and strings with built-in predi-cates such as <, +, and preï¬x-of. These hybrid DLs have turned out to be useful in severalapplication areas, such as reasoning about conceptual database models. We propose tofurther extend such DLs with key constraints that allow the expression of statements likeâUS citizens are uniquely identiï¬ed by their social security numberâ. Based on this idea,we introduce a number of natural description logics and perform a detailed analysis oftheir decidability and computational complexity. It turns out that naive extensions withkey constraints easily lead to undecidability, whereas more careful extensions yield NExp-Time-complete DLs for a variety of useful concrete domains. 1. Motivation Description logics (DLs) are a family of formalisms that allow the representation of andreasoning about conceptual knowledge in a structured and semantically well-understoodmanner (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003). The centralentities for representing such knowledge are concepts, which are constructed from atomicconcept names (unary predicates) and role names (binary relations) by means of the conceptand role constructors oï¬ered by a particular DL. For example, in the basic propositionallyclosed description logic ALC (Schmidt-Schauà & Smolka, 1991), we can describe a companythat has part-time employees but only full-time managers using the concept Company âemployee.Parttime âemployee.(¬Manager ¬Parttime). In this example, all words beginning with uppercase letters denote concept names whileemployee denotes a role name. c 2005 AI Access Foundation. All rights reserved.
Lutz, Areces, Horrocks, & Sattler Rather than being viewed only as conceptual entities in a knowledge base, concepts can, more generally, be understood as the central notion in various kinds of class-centeredformalisms. In the last decade, this observation has given rise to various new and challeng-ing applications of description logics such as reasoning about database conceptual modelsexpressed in entity-relationship diagrams or object-oriented schemas (Calvanese, Lenzerini,& Nardi, 1998; Calvanese, De Giacomo, & Lenzerini, 1998) and reasoning about ontologiesfor use in the semantic web (Baader, Horrocks, & Sattler, 2002a; Horrocks, 2002; Horrocks,Patel-Schneider, & van Harmelen, 2002). These new applications have, in turn, stimulatedresearch in description logics since the expressive power of existing DLs was insuï¬cientfor the new tasks. One important extension is by providing expressive means that allowthe integration of numbers and other datatypes: suppose, for example, that we want toextend our earlier descriptions of companies and employees to include the founding year ofa company and the hiring year of an employee. Then, we may want to describe companiesthat were founded before 1970 and state that the hiring year of employees is not prior tothe founding year of the employing company. To do this, we obviously need a way to talkabout natural numbers (such as 1970) and comparisons between natural numbers. Nowadays, the standard approach to integrate datatypes into description logics is to extend DLs with concrete domains, as ï¬rst proposed by Baader and Hanschke (1991a) andrecently surveyed by Lutz (2003). More precisely, a concrete domain D consists of a set(such as the natural numbers) and predicates which are associated with a ï¬xed extensionover this set1 (such as the unary =0, the binary <, and the ternary +). The integration ofconcrete domains into, say, the description logic ALC is achieved by adding 1. abstract features, i.e. functional roles; 2. concrete features, i.e. (partial) functions associating values from the concrete domain (e.g., natural numbers) with logical objects; 3. a concrete domain-based concept constructor. The DL obtained by extending ALC in this way is called ALC(D), where D denotes aconcrete domain that can be viewed as a parameter to the logic. For example, using asuitable concrete domain D we can describe the constraints formulated above: the concept Employee âemployer.âfoundingyear.<1970 âhiringyear, (employer foundingyear).⥠describes the set of employees who are employed by a company founded before 1970 andwho have a hiring year not prior to the companyâs founding year. In this example, the termââfoundingyear.<1970â is an instance of the concrete domain concept constructor (not to beconfused with the existential value restriction as in âemployee.Parttime), and so is the thirdconjunct. While <1970 is a unary predicate and thus the former instance only takes oneconcrete feature foundingyear as argument, the second instance uses the binary predicateââ¥â requiring two arguments: the concrete feature hiringyear and the sequence of features(employer foundingyear) consisting of the abstract feature employer and the concrete featurefoundingyear. Concrete domains are rather important in many applications of DLs, including the two mentioned above: 1. This ï¬xed extension is why these predicates are often called âbuilt-inâ. 668
Keys, Nominals and Concrete Domains ⢠The standard way of using description logics for reasoning about conceptual database models is to translate a given model into a DL representation and then use a DL rea-soner such as FaCT (Horrocks, 1998) or RACER (Haarslev & M¨ oller, 2001) to compute the consequences of the information provided explicitly in the model. This includesdetecting inconsistencies and inferring additional, implicit containments between en-tities/classes (Calvanese et al., 1998). Since most databases store âconcreteâ datalike numbers and strings, constraints concerning such data are usually part of theconceptual model and should thus also be captured by the description logic used forreasoning. Indeed, the above example concepts can be viewed as the DL encodingof constraints from a database about companies and their employees. As discussedby Lutz (2002c), description logics with concrete domains are well-suited for concep-tual modeling applications involving concrete datatypes. ⢠So-called concrete datatypes play a prominent role in the construction of ontolo- gies (Horrocks et al., 2002). Say, for example, that we want to construct an ontologywhich can be used for describing car dealersâ web pages and web services. In such anontology, concrete datatypes such as prices, manufacturing years, and names of carmodels will doubtlessly be very important. To formulate this ontology using a DL, weneed a way to represent these concrete datatypes. Consequently, almost all DLs thathave been proposed as an ontology language are equipped with some form of concretedomain (Fensel, van Harmelen, Horrocks, McGuinness, & Patel-Schneider, 2001; Hor-rocks et al., 2002; Dean, Connolly, van Harmelen, Hendler, Horrocks, McGuinness,Patel-Schneider, & Stein, 2002). Furthermore, since these ontology languages provideinverse abstract roles and functional restrictions, the users and ontology designerswere quite surprised to ï¬nd that they do not provide inverse concrete functionalfeaturesâwhich is due to the fact that such features correspond to âconcrete keyconstraintsâ, for which no reasoning algorithms were known and whose eï¬ect on thedecidability/complexity was not yet investigated. In this paper, we propose to further enhance the expressive power of description logics withconcrete domains by extending them with âconcrete key constraintsâ. This extension isuseful both for knowledge representation and for the two applications sketched above. Thefollowing three examples describe the basic idea. 1. Suppose that, in a knowledge representation application, we represent nationalities by concept names such as US and German and, for US citizens, we store the socialsecurity number using a concrete feature ssn. Then it would be natural to state thatUS citizens are uniquely identiï¬ed by their social security number, i.e. any two distinctinstances of Human ânationality.US must have diï¬erent values for the ssn feature. In our extension of DLs with concretedomains, this can be expressed by using the key assertion2 (ssn keyfor Human ânationality.US). 2. Readers familiar with the relationship between DLs and ï¬rst order logic will notice that this key assertion is equivalent to âx1x2.((V (Human(x iâ1,2 i)â§âz.(nationality(xi, z)â§US(z)))â§Â¬(x1 = x2)) â ¬(ssn(x1) = ssn(x2))). 669
Lutz, Areces, Horrocks, & Sattler 2. Returning to our database about companies and employees, it could be useful to equip every employee with (i) a concrete feature branch storing the branch-ID inwhich she is working and (ii) a concrete feature id storing her personnel-ID. It wouldthen be natural to enforce that the branch-ID together with the personnel-ID uniquelyidentiï¬es employees, even though personnel-IDs are not unique. We can do this byusing the composite key assertion (branch, id keyfor Employee). 3. In the car dealersâ ontology, we may assume that cars as well as manufacturers are equipped with identiï¬cation numbers and that every car is uniquely identiï¬ed by thecombination of its own identiï¬cation number and its manufacturers one. To expressthis, we could employ a composite key assertion referring to sequences of features, inthis case (manufacturer id): (id, (manufacturer id) keyfor Car). More formally, we propose to extend DLs to provide for concrete domains with key boxes,which are sets of key assertions of the form (u1, . . . , un keyfor C), where each ui is a sequence f1 · · · fng of abstract features fj followed by a single concretefeature g, and C is a concept. As the above examples illustrate, the idea of key constraints isvery natural. Since, moreover, keys play an important role in databases and, as mentionedabove, reasoning about database conceptual models is an important, challenging applicationof description logics, several approaches to extend description logics with keys have alreadybeen investigated (Borgida & Weddell, 1997; Calvanese, De Giacomo, & Lenzerini, 2000;Khizder, Toman, & Weddell, 2001). What distinguishes our approach from existing ones,however, is the idea of using concrete domains for constructing key constraints, rather thandeï¬ning keys on an abstract, logical level. The goal of this paper is to provide a comprehensive analysis of the eï¬ects on decid- ability and computational complexity of adding key boxes to description logics with concretedomains. To this end, we extend the two description logics ALC(D) and SHOQ(D) with keyboxes, in this way obtaining ALCK(D) and SHOQK(D), respectively. While the basic DLwith concrete domains ALC(D) has already been discussed above, SHOQ(D) was proposedas an ontology language in (Horrocks & Sattler, 2001). It provides a wealth of expressivepossibilities such as general concept inclusion axioms (GCIs), transitive roles, role hierar-chies, nominals, and qualifying number restrictions. Moreover, it oï¬ers a restricted variantof the concrete domain constructor that disallows the use of sequences of features in orderto avoid undecidability of reasoning. The main outcome of our investigations is that keyconstraints can have a dramatic impact on the decidability and complexity of reasoning: forexample, whereas satisï¬ability of ALC(D)-concepts is known to be PSpace-complete (Lutz,2002b), we will show that satisï¬ability of ALCK(D)-concepts w.r.t. key boxes is, in general,undecidable. Decidability can be regained if we restrict the concepts used in key boxes 670
Keys, Nominals and Concrete Domains to Boolean combinations of concept names (Boolean key boxes). Interestingly, satisï¬abil-ity of ALCK(D)-concepts w.r.t. Boolean key boxes is still NExpTime-complete even forvery simple concrete domains. In the case of SHOQ(D) and SHOQK(D), the leap incomplexity is somewhat less dramatic since SHOQ(D)-concept satisï¬ability is already Ex-pTime-complete: again, the addition of key boxes results in NExpTime-complete reasoningproblems. It is interesting to note that there exists a close connection between key assertions and so-called nominals, i.e. concept names that can have at most one instance, such asPope. Nominals are a standard means of expressivity in description logics and sometimesappear in disguise as the âone-ofâ operator (Borgida & Patel-Schneider, 1994; Horrockset al., 2002). It is not hard to see that key boxes can âsimulateâ nominals: if, for example,we use a concrete domain based on the natural numbers and providing unary predicates=n for equality with n â N, then the key assertion (g keyfor ), where stands for logical truth, obviously makes the concept âg.=n behave like a nominal, for each n âN. For this reason, we also consider ALCO(D), the extension of ALC(D) with nominals,and ALCOK(D), the extension of ALCK(D) with nominals.3 Our main result concerningnominals is that, although in general being of lower expressive power than key boxes, theyalready make reasoning NExpTime-hard if combined with concrete domains: there existconcrete domains D such that ALCO(D)-concept satisï¬ability is NExpTime-complete. Weshould like to stress that this and the other NExpTime-hardness results obtained in thispaper are in accordance with the observation made in (Lutz, 2004) that the PSpace-upperbound for reasoning with ALC(D) is not robust w.r.t. extensions of the logic: there existseveral âseemingly harmlessâ extensions of ALC(D) (for example with acyclic TBoxes andwith inverse roles) which make the complexity of reasoning leap from PSpace-completenessto NExpTime-completeness for many natural concrete domains. The remainder of this paper is organized as follows: in Section 2, we formally introduce concrete domains, key boxes, and the DL ALCOK(D) together with its fragments ALCK(D)and ALCO(D). Moreover, we deï¬ne Boolean key boxes, which only allow Boolean com-binations of concept names to appear in key deï¬nitions. Additionally, we introduce someother important properties of key boxes: path-free key boxes prohibit the use of sequencesof features in key assertions; in unary key boxes, each key assertion involves exactly onesequence of features; and composite key boxes are simply non-unary ones. Section 3 is devoted to establishing lower bounds for extensions of ALC(D) with key boxes or nominals. In Section 3.1, we use a reduction of the Post Correspondence Problemto prove that ALCK(D)-concept satisï¬ability w.r.t. (non-Boolean) key boxes is undecidablefor a large class of concrete domains. We then shift our attention to Boolean key boxessince, in Section 4, we show that this restriction restores decidability. In Section 3.2, we introduce a NExpTime-complete variant of the domino problem and three concretedomains that are useful for the reduction of this problem to concept satisï¬ability in DLswith Boolean key boxes or nominals. In Section 3.3, we use these concrete domains toprove that ALCK(D)-concept satisï¬ability w.r.t. Boolean, path-free and unary key boxesis NExpTime-hard for some natural concrete domains. In Section 3.4, we prove that thereexist concrete domains D such that ALCO(D)-concept satisï¬ability without reference to 3. Note that the logic SHOQ(D) already provides for nominals. 671
Lutz, Areces, Horrocks, & Sattler key boxes is already NExpTime-hard; we show that this is true even for some concretedomains that are computationally very simple (PTime) when considered in isolation. The purpose of Section 4 is to develop reasoning procedures for description logics with key boxes and to prove upper complexity bounds matching the NExpTime lower boundsestablished in the previous section. We start in Section 4.1 with a tableau algorithm thatdecides ALCOK(D)-concept satisï¬ability w.r.t. Boolean key boxes, provided that the con-crete domain D is key-admissible. Intuitively, a concrete domain D is key-admissible if thereexists an algorithm that takes a ï¬nite conjunction c of predicates from D over some set ofvariables, decides whether this conjunction is satisï¬able and, if so, chooses a solution of cand returns the information on which variables take the same values in it. We call suchan algorithm a D-tester. We have chosen a tableau algorithm since this type of reasoningprocedure has the potential to be implemented in eï¬cient reasoners and has been shown tobehave well in practice (Horrocks, Sattler, & Tobies, 2000; Haarslev & M¨ oller, 2001). Our algorithm implies the following upper complexity bound: if D is a key-admissible concretedomain for which a non-deterministic polynomial time D-tester exists, then ALCO(D)-concept satisï¬ability w.r.t. Boolean key boxes is in NExpTime. In Section 4.2, we devise a tableau algorithm for SHOQK(D)-concept satisï¬ability w.r.t. path-free key boxes which might involve non-Boolean concepts. For the decidabilityof ALCOK(D), we restricted key boxes to Boolean ones. For SHOQK(D), such a restric-tion is not possible since SHOQ(D) provides TBoxes, and we can thus no longer distinguishbetween Boolean and non-Boolean concepts. On the other hand, it follows from an unde-cidability proof by Baader and Hanschke (1992) that SHOQ(D) is undecidable if we allowfor sequences of features in concrete domain constructors. Thus we restrict key assertionsanalogously to path-free ones, and show that this yields indeed a decidable logic. Its ex-pressive power is orthogonal to the one of ALCOK(D), and our previous undecidabilityresults imply that the combination of ALCOK(D) and SHOQK(D) is undecidable. As aby-product of the correctness proof of the algorithm, we obtain a bounded model prop-erty for SHOQK(D), which implies that SHOQK(D)-concept satisï¬ability w.r.t. path-freekey boxes is in NExpTime if D is a key-admissible concrete domain for which a non-deterministic polynomial time D-tester exists. In Section 5, we summarize the results obtained and give an outlook to possible future research. 2. Description Logics with Concrete Domains In the following, we introduce the description logic ALCOK(D). Let us start by deï¬ningconcrete domains: Deï¬nition 2.1 (Concrete Domain). A concrete domain D is a pair (âD, ΦD), where âDis a set and ΦD a set of predicate names. Each predicate name P â ΦD is associated withan arity n and an n-ary predicate P D â ân D. Based on concrete domains, we can now deï¬ne ALCOK(D)-concepts and key boxes. Deï¬nition 2.2 (ALCOK(D) Syntax). Let NC, NO, NR, NcF be pairwise disjoint and count-ably inï¬nite sets of concept names, nominals, role names, and concrete features. Further-more, we assume that NR contains a countably inï¬nite subset NaF of abstract features. A 672
Keys, Nominals and Concrete Domains path u is a composition f1 · · · fng of n abstract features f1, . . . , fn (n ⥠0) and a concretefeature g. Let D be a concrete domain. The set of ALCOK(D)-concepts is the smallest setsuch that ⢠every concept name and every nominal is a concept, and ⢠if C and D are concepts, R is a role name, g is a concrete feature, u1, . . . , un are paths, and P â ΦD is a predicate of arity n, then the following expressions are also concepts: ¬C, C D, C D, âR.C, âR.C, âu1, . . . , un.P, and gâ. A key assertion is an expression (u1, . . . , uk keyfor C), where u1, . . . , uk (k ⥠1) are paths and C is a concept. A ï¬nite set of key assertions iscalled a key box. As usual, we use as an abbreviation for an arbitrary propositional tautology, ⥠as an abbreviation for ¬ , C â D as an abbreviation for ¬C D, and C â D as an abbreviation for (C â D) (D â C). Throughout this paper, we will also consider several fragments of the description logic ALCOK(D). The DL ALCO(D) is obtained from ALCOK(D) byadmitting only empty key boxes. In particular, the set of ALCO(D)-concepts is just the setof ALCOK(D)-concepts. Furthermore, by disallowing the use of nominals, we obtain thefragment ALC(D) of ALCO(D) and ALCK(D) of ALCOK(D). The description logic ALCOK(D) is equipped with a Tarski-style set-theoretic semantics. Along with the semantics, we introduce the two standard inference problems: conceptsatisï¬ability and concept subsumption. Deï¬nition 2.3 (ALCOK(D) Semantics). An interpretation I is a pair (âI, ·I), where âI isa non-empty set, called the domain, and ·I is the interpretation function. The interpretationfunction maps ⢠each concept name C to a subset CI of âI, ⢠each nominal N to a singleton subset N I of âI, ⢠each role name R to a subset RI of âI à âI, ⢠each abstract feature f to a partial function f I from âI to âI, and ⢠each concrete feature g to a partial function gI from âI to âD. 673
Lutz, Areces, Horrocks, & Sattler If u = f1 · · · fng is a path, then uI(d) is deï¬ned as gI(f I · · · n (f I (d)) · · · ). The interpretation 1 function is extended to arbitrary concepts as follows: (¬C)I := âI CI (C D)I := CI â© DI (C D)I := CI ⪠DI (âR.C)I := d â âI | There is e â âI with (d, e) â RI and e â CI (âR.C)I := d â âI | For all e â âI, if (d, e) â RI, then e â CI (âu1, . . . , un.P )I := d â âI | âx1, . . . , xn â âD : uIi(d) = xi and (x1, . . . , xn) â P D (gâ)I := d â âI | gI(d) undeï¬ned. Let I be an interpretation. Then I is a model of a concept C iï¬ CI = â . Moreover, Isatisï¬es a key assertion (u1, . . . , un keyfor C) if, for any a, b â CI, uI (a) = uI (b), . . . , uI 1 1 n(a) = uI n(b) implies that a = b. I is a model of a key box K iï¬ I satisï¬es all key assertions in K. A concept C is satisï¬ablew.r.t. a key box K iï¬ C and K have a common model. C is subsumed by a concept D w.r.t.a key box K (written C K D) iï¬ C I â DI for all models I of K. It is well-known that, in description logics providing for all Boolean operators, subsumptioncan be reduced to (un)satisï¬ability and vice versa: C K D iï¬ C ¬D is unsatisï¬able w.r.t. K and C is satisï¬able w.r.t. K iï¬ C K â¥. This allows us to concentrate on concept satisï¬ability when devising complexity bounds for reasoning with description logics: lowerand upper complexity bounds for concept satisï¬ability imply corresponding bounds forconcept subsumptionâonly for the complementary complexity class. If decision procedures for description logics with concrete domains are to be devised without committing to a particular concrete domain, then a well-deï¬ned interface betweenthe decision procedure and a concrete domain reasoner is needed. Usually, this interfaceis based on the assumption that the concrete domain is admissible (Baader & Hanschke,1991a; Lutz, 2002a, 2003): Deï¬nition 2.4 (D-conjunction, Admissibility). Let D be a concrete domain and V a setof variables. A D-conjunction is a (ï¬nite) predicate conjunction of the form (i) c = (x , . . . , x(i)) : P 0 n i, i i<k (i) where Pi is an ni-ary predicate for i < k and the x are variables from V. A D-conjunction j c is satisï¬able iï¬ there exists a function δ mapping the variables in c to elements of âD such (i) (i) that (δ(x ), . . . , δ(x )) â P D for each i < k. Such a function is called a solution for c. 0 ni i We say that the concrete domain D is admissible iï¬ 1. ΦD contains a unary predicate D such that D D = âD; 2. ΦD is closed under negation, i.e., for each n-ary predicate P â ΦD, there is a predicate D P â ΦD of arity n such that P = ân D P D; 674
Keys, Nominals and Concrete Domains 3. satisï¬ability of D-conjunctions is decidable. We refer to the satisï¬ability of D-conjunctions as D-satisï¬ability. As we shall see, it sometimes makes a considerable diï¬erence w.r.t. complexity and decid-ability to restrict key boxes in various ways, for example to disallow paths of length greaterthan one. Therefore, we introduce some useful notions. Deï¬nition 2.5 (Boolean, Path-free, Simple). A key box K is called ⢠Boolean if all concepts appearing in (key assertions in) K are Boolean combinations of concept names; ⢠path-free if, for all key assertions (u1, . . . , un keyfor C) in K, u1, . . . , un â NcF; ⢠simple if it is both path-free and Boolean; ⢠unary if all key assertions in K are unary key assertions, i.e. of the form (u keyfor C). A concept C is called path-free if, in all its subconcepts of the form âu1, . . . , un.P , u1, . . . , unare concrete features. To emphasize that a key box might not necessarily be Boolean or path-free, we sometimescall such a key box general. Similarly, to emphasize that a key box is not necessarily aunary key box, we sometimes call such a key box composite. 3. Lower Bounds In this section, we prove lower complexity bounds for description logics with concrete do-mains and key boxes and/or nominals. In Section 3.1, we start by showing that the satisï¬-ability of ALCK(D)-concepts w.r.t. (general) key boxes is undecidable for many interestingconcrete domains. The discouraging picture painted by this result is mitigated by the factthat, in Section 4.1, we shall prove that the restriction to Boolean key boxes restores de-cidability. It is thus interesting to look for lower complexity bounds that apply under thisrestriction. In preparation for this, we introduce in Section 3.2 a NExpTime-complete vari-ant of the domino problem and three concrete domains that are well-suited for reductionsof this problem. In Section 3.3, we then prove that satisï¬ability of path-free ALCK(D)-concepts w.r.t. simple key boxes is NExpTime-hard for a large class of concrete domains D and that, formany concrete domains, this holds even if we restrict key boxes to unary ones. Finally,we consider the description logic ALCO(D) in Section 3.4 and identify several concretedomains such that ALCO(D)-concept satisï¬ability (without key boxes!) is NExpTime-hard. As we already mentioned, key boxes and nominals are closely related: key boxes canexpress nominals, but are in general more powerful. 3.1 Undecidability of ALCK(D) with General Key Boxes We prove that satisï¬ability of ALCK(D)-concepts w.r.t. key boxes is undecidable for alarge class of concrete domains if we allow complex ALCK(D)-concepts to occur in keyassertions. The proof is by a reduction of the well-known undecidable Post CorrespondenceProblem (Post, 1946; Hopcroft & Ullman, 1979). 675
Lutz, Areces, Horrocks, & Sattler Deï¬nition 3.1 (PCP). An instance P of the Post Correspondence Problem (PCP) is givenby a ï¬nite, non-empty list ( 1, r1), . . . , ( k, rk) of pairs of words over some alphabet Σ. Asequence of integers i1, . . . , im, with m ⥠1, is called a solution for P if i · · · = r · · · r . 1 im i1 im The PCP is to decide whether a given instance P has a solution. For reducing the PCP to the satisï¬ability of our DLs, we need an appropriate concretedomain. It is obviously natural to use a concrete domain based on words and concatenation.We will later see that the results obtained for this concrete domain carry over to otherconcrete domains based on numbers and arithmetics. The following concrete domain wasintroduced by Lutz (2004). Its deï¬nition presupposes a ï¬xed alphabet Σ that is at leastbinary. Deï¬nition 3.2 (Concrete domain W). The concrete domain W is deï¬ned by setting âW :=Σâ and deï¬ning ΦW as the smallest set containing the following predicates: ⢠unary predicates word and nword with wordW = âW and nwordW = â , ⢠unary predicates = and = with =W = and =W = Σ+, ⢠a binary equality predicate = and a binary inequality predicate = with the obvious interpretation, and ⢠for each w â Σ+, two binary predicates concw and nconcw with concW w = (u, v) | v = uw and nconcW w = (u, v) | v = uw. It is readily checked that W satisï¬es properties 1 and 2 of admissibility (see Deï¬nition 2.4).Moreover, W-satisï¬ability is decidable: Theorem 3.1 (Lutz, 2004). W-satisï¬ability is in PTime. Thus, W is admissible and even of low complexity. This is important since our aim isto demonstrate that the undecidability of ALCK(W)-concept satisï¬ability is due to thepresence of keys, and not due to the high complexity of W-satisï¬ability. We can now discuss the reduction of the PCP. A given instance ( 1, r1), . . . , ( k, rk) is translated into an ALCK(D)-concept CP and key box KP as deï¬ned in Figure 1 such thatP has a solution iï¬ CP is unsatisï¬able w.r.t. KP . The idea behind the reduction is that acommon model of CP and KP encodes all potential solutions for P (i.e., sequences i1, . . . , inof integers ij between 1 and k) and makes sure that none of them is in fact a solution. InFigure 1, f1, . . . , fk denote abstract features while g, , and r denote concrete features. Thedeï¬nition of the concept Step just serves as an abbreviation and should not be confusedwith so-called TBoxes (see Section 4.2 for the deï¬nition of TBoxes). Models of CP and KP ,such as the one displayed in Figure 2, have the form of an inï¬nite k-ary tree whose root isconnected to an âextra nodeâ x via the role R. Intuitively, each node of the tree representsone partial solution i1, . . . , in, its -successor represents the corresponding left concatenation i · · · , and its r-successor the corresponding right concatenation r · · · r . 1 in i1 in To enforce the existence of the inï¬nite tree, we employ the key box KP : consider for example the root nodeâs f1-successor in Figure 2âlet us call this node y. Due to Line 3 676
Keys, Nominals and Concrete Domains Step := âfi.(¬A âg.= â , r.=) 1â¤iâ¤k (â , fi .conc âr, f ) i ir.concri 1â¤iâ¤k CP := â .= âr.= âR.(A âg.= ¬Step) Step KP := g keyfor ¬Step Figure 1: The ALCK(W) reduction concept CP and key box KP . R r A = = x âg.= conc f 1 1 f conc k rk concr conc 1 k r r · · · f1 · · · fk f1 · · · fk · · · · · · Figure 2: An example model of CP and KP . of CP and Line 1 of Step, we have y â (âg.= )I. Due to Line 2 of CP , we also havex â (âg.= )I and x â (¬Step)I , where x is the âextra nodeâ mentioned above. In view ofthe key box KP , this implies that either (i) x = y or (ii) y â StepI. It is easy to see that(i) is impossible since Line 2 of CP and Line 1 of Step imply that x â AI and y â (¬A)I.Hence y â StepI and, by Line 2 of Step, y has the appropriate fi-successors for 1 ⤠i ⤠n.In the same way, the construction of the tree can be continued ad inï¬nitum. The secondline in the deï¬nition of Step enforces that I (z) = i · · · and rI (z) = r · · · r for z an 1 in i1 in fi · · · f -successor of the root node. Finally, the concept â , r.= in Line 1 of Step implies 1 in that I (z) = rI (z) holds at all nodes z of the tree (except for the root), which implies thatno potential solution is a solution. Since the size of CP and KP is clearly polynomial in k and the key box KP is a unary key box, we obtain the following proposition. Proposition 3.2. The satisï¬ability of ALCK(W)-concepts w.r.t. (non-Boolean) path-freeunary key boxes is undecidable. 677
Lutz, Areces, Horrocks, & Sattler To emphasize that this undecidability result was obtained using a very simple concretedomain, let us combine Theorem 3.1 with Proposition 3.2. Theorem 3.3. There exists a concrete domain D such that D-satisï¬ability is in PTimeand satisï¬ability of ALCK(D)-concepts w.r.t. (non-Boolean) path-free unary key boxes isundecidable. At ï¬rst sight, the concrete domain W might look artiï¬cial and one may question the rel-evance of lower bounds that have been obtained using W. However, it is straightforwardto encode words as natural numbers and to deï¬ne concatenation of words as rather sim-ple operations on the natural numbers (Baader & Hanschke, 1992): a word w = over the alphabet Σ of cardinality Σ can be interpreted as a number written at base Σ + 1in which the symbol that is the â0 digitâ does not occur. Hence, we can use the corre-sponding natural number (e.g., in base 10) to represent a word w, and the number 0 torepresent the empty word. The concatenation of two words v and w can then be expressedas vw = v · (Σ + 1)|w| + w, where |w| denotes the length of the word w. Moreover, exponen-tiation can be expressed as multiple multiplications, multiplication as multiple additions,and addition as multiple incrementation: this is shown in Section 5.6 of (Lutz, 2004) forthe case of ALC(D) extended with TBoxes (c.f. Section 4.2) and can easily be adapted toALC(D) with non-Boolean key boxes. This observation gives rise to the following theorem: Theorem 3.4. Let D be a concrete domain such that N â âD, ΦD contains a unary predi-cate =0 with (=0)D = 0, binary equality and inequality predicates, and a binary predicateincr with incrD â© (n, x) | n â N and x â âD = (k, k + 1) | k â N. Then satisï¬ability ofALCK(D)-concepts w.r.t. (non-Boolean) path-free unary key boxes is undecidable. 3.2 Domino Problems and Concrete Domains In this section, we introduce a NExpTime-complete variant of the well-known, undecidabledomino problem (Berger, 1966; Knuth, 1968), and then deï¬ne three concrete domains D1,D2, and D3. These concrete domains will be used in Sections 3.3 and 3.4 to establishlower bounds for reasoning with ALCK(D) and Boolean key boxes, and for reasoning withALCO(D). In general, a domino problem is given by a ï¬nite set of tile types. Intuitively, all tile types are of the same size, each type having a square shape and colored edges. An unlimitednumber of tiles of each type is available. In the NExpTime-hard variant of the dominoproblem that we use, the task is to tile a 2n+1 à 2n+1-torus (i.e., a 2n+1 à 2n+1-rectanglewhose borders are âgluedâ together) where neighboring edges have the same color. Deï¬nition 3.3 (Domino System). A domino system D is a triple (T, H, V ), where T N is a ï¬nite set of tile types and H, V â T à T represent the horizontal and vertical matchingconditions. Let D be a domino system and a = a0, . . . , anâ1 an initial condition, i.e. ann-tuple of tiles. A mapping Ï : 0, . . . , 2n+1 â 1 à 0, . . . , 2n+1 â 1 â T is a solution forD and a iï¬, for all x, y < 2n+1, the following holds: ⢠if Ï (x, y) = t and Ï (x â2n+1 1, y) = t , then (t, t ) â H ⢠if Ï (x, y) = t and Ï (x, y â2n+1 1) = t , then (t, t ) â V 678
Keys, Nominals and Concrete Domains â¢ Ï (i, 0) = ai for i < n. where âi denotes addition modulo i. It follows from results in (B¨ orger, Gr¨ adel, & Gurevich, 1997) that the above variant of the domino problem is NExpTime-complete. We deï¬ne the concrete domain D1 to be used in the reduction of the NExpTime- complete domino problem to ALCK(D1)-concept satisï¬ability w.r.t. Boolean key boxes. Deï¬nition 3.4 (Concrete Domain D1). The concrete domain D1 is deï¬ned by settingâD := 0, 1 and Φ to the (smallest) set containing the following predicates: 1 D1 ⢠unary predicates D with ( )D1 = â and ⥠with (⥠)D1 = â ; 1 D1 D1 D1 D1 ⢠unary predicates =0 and =1 with (=i)D1 = i, i â 0, 1. The second concrete domain D2 will be used for a reduction of the NExpTime-completedomino problem to ALCK(D2)-concept satisï¬ability w.r.t. Boolean unary key boxes. Forthis reduction we need to âstoreâ vectors of bits in single concrete domain elements. Deï¬nition 3.5 (Concrete Domain D2). For every n â N, a function v : 0, . . . , n â 1 â0, 1 is called a bit vector of dimension n. We use BVn to denote the set of all bit vectorsof dimension n. The concrete domain D2 is deï¬ned by setting âD := BV to 2 i>0 i and ΦD2 the (smallest) set containing the following predicates: ⢠unary predicates D with ( )D2 = â and ⥠with (⥠)D2 = â ; 2 D2 D2 D2 D2 ⢠for every k, i â N with i < k, unary predicates bit0i and bit1i with k k (bitni | k )D2 = v â âD v â BV 2 k and v(i) = n, and unary predicates bit0i and bit1i with (bitni )D2 = â (bitni )D2. k k k D2 k The last concrete domain D3 is used in the reduction of the NExpTime-complete dominoproblem to ALCO(D3)-concept satisï¬ability. In this reduction, the concrete domain D3contains two kinds of elements: ï¬rstly, there are elements of âD that can represent the 3 whole 2n+1 à 2n+1-torus, so-called domino arrays. Secondly, there are elements of âD that 3 represent positions in the torus. For technical reasons to be discussed later, these elementsare vectors of natural numbers rather than bit vectors, and in the following we shall just callthem vectors. A domino array is then a function mapping each pair of vectors (of certainlength) to a natural number which represents a tile type. Deï¬nition 3.6 (Concrete Domain D3). For every k â N, a function v : 0, . . . , k â 1 â Nis called a vector of dimension k. We use VEk to denote the set of all vectors of dimension k.For every k â N, a function k : VEk à VEk â N is called a domino array of dimension k.We use DAk to denote the set of all domino arrays of dimension k. The concrete domain D3is deï¬ned by setting âD := VE DA to the (smallest) set containing 3 i>0 i ⪠i>0 i and ΦD3 the following predicates: ⢠unary predicates D with ( )D3 = â and ⥠with (⥠)D3 = â ; 3 D3 D3 D3 D3 679
Lutz, Areces, Horrocks, & Sattler ⢠for every k, i â N with i < k, unary predicates pos0i and pos1i with k k (posni | k )D3 = v â âD v â VE 3 k and v(i) = n and unary predicates pos0i and pos1i with (posni )D3 = â (posni )D3; k k k D3 k ⢠for every k, i â N, a predicate tilei of arity 3 with k (tileik)D3 = (vx, vy, d) | vx, vy â VEk, d â DAk, and d(vx, vy) = i and a predicate tilei of arity 3 with (tilei )D3 = (â )3 (tilei )D3 . k k D3 k The reason for using vectors of natural numbers rather than bit vectors in the deï¬nition ofD3 is that we want D3-satisï¬ability to be of low complexity, preferably in PTime: considerthe D3-conjunction pos00(x) â§ pos00(y) â§ pos00(z) â§ 2 2 2 tile72(x, v, d) â§ tile82(y, v, d) â§ tile92(z, v, d). If we use bit vectors rather than vectors of natural numbers, then the upper line enforcesthat at least two out of the three variables x, y, and z must take the same value. Sincethe value of v is ï¬xed, the lower line makes the conjunction unsatisï¬able: it tries to assignthe three diï¬erent values 7, 8, 9 to two diï¬erent positions in the domino array. It seemsunlikely that this kind of inconsistency can be detected in polynomial time. This problemis circumvented by using vectors of natural numbers in the deï¬nition of D3 (but enforcingthem to be bit vectors in the reduction): in this case, the above conjunction is clearlysatisï¬able. Proposition 3.5. For each i â 1, 2, 3, the concrete domain Di is admissible and satisï¬-ability of Di-conjunctions is in PTime. For D1, this is trivial. For D2, a proof can be found in Appendix A. And for D3, a proofcan be found in (Lutz, Areces, Horrocks, & Sattler, 2002). 3.3 NExpTime-hardness of ALCK(D) with Boolean Key Boxes In this section, we prove two NExpTime-lower bounds for ALCK(D)-concept satisï¬abilityw.r.t. Boolean key boxes by reducing the NExpTime-complete domino problem introducedin the previous section. The ï¬rst reduction uses the very simple concrete domain D1, butdepends on composite key assertions. The second reduction uses the slightly more complexconcrete domain D2, but only needs unary key assertions. As we will see, the two reductionsyield diï¬erent, incomparable results. We ï¬rst reduce the NExpTime-complete domino problem to ALCK(D1)-concept satisï¬- ability w.r.t. Boolean composite key boxes. Each domino system D = (T, H, V ) with initialcondition a = a0, . . . , anâ1 is translated into an ALCK(D1)-concept CD,a as displayed inFigure 3. Names such as TreeX and TreeY are used as abbreviations only. We use âRi.C asan abbreviation for the n-fold nesting âR. · · · âR.C. The names xposi and yposi used in theï¬gure denote concrete features. In the deï¬nition of the Init concept, for each n â N, biti(n) 680
Keys, Nominals and Concrete Domains TreeX := âR.X0 âR.¬X0 âRi.(DistXiâ1 âR.Xi âR.¬Xi) i=1..n TreeY := DistXn âR.Y0 âR.¬Y0 âRi.(DistYiâ1 DistXn âR.Yi âR.¬Yi) i=1..n DistXk := ((Xi â âR.Xi) (¬Xi â âR.¬Xi)) i=0..k DistYk := ((Yi â âR.Yi) (¬Yi â âR.¬Yi)) i=0..k TransXPos := (Xi â âxposi. =1) (¬Xi â âxposi. =0) i=0..n TransYPos := (Yi â âyposi. =1) (¬Yi â âyposi. =0) i=0..n Succs := âRx.(TransXPos TransYPos) âRy.(TransXPos TransYPos) XSuccOk := (Yi â âRx.Yi) (¬Yi â âRx.¬Yi) i=0..n Xj â (Xk â âRx.¬Xk) (¬Xk â âRx.Xk) k=0..n j=0..k ¬Xj â (Xk â âRx.Xk) (¬Xk â âRx.¬Xk) k=0..n j=0..k YSuccOk := (Xi â âRy.Xi) (¬Xi â âRy.¬Xi) i=0..n Yj â (Yk â âRy.¬Yk) (¬Yk â âRy.Yk) k=0..n j=0..k ¬Yj â (Yk â âRy.Yk) (¬Yk â âRy.¬Yk) k=0..n j=0..k Label := Di ¬(Di Dj) iâT i,jâT,i=j CheckMatch := (Di âRx.Dj) (Di âRy.Dj) (i,j)âH (i,j)âV Init := ¬Xj Xj ¬Yj â Dai i=0..nâ1 j=0..n,bitj (i)=0 j=0..n,bitj (i)=1 j=0..n CD,a := TreeX âRn+1.TreeY âR2(n+1).(TransXPos TransYPos Succs XSuccOk YSuccOk) âR2(n+1).(Label CheckMatch Init) Figure 3: The ALCK(D1) reduction concept CD,a. is supposed to denote the iâth bit of the binary representation of n. We claim that CD,a issatisï¬able w.r.t. the key box (xpos0, . . . , xposn, ypos0, . . . , yposn keyfor ) iï¬ there exists a solution for D and a. To substantiate this claim, let us go through thereduction and explain the various parts of the concept CD,a. The ï¬rst step towards under- 681
Lutz, Areces, Horrocks, & Sattler standing the structure of models of CD,a (which is the key to understanding the reductionitself) is to note that the purpose of the ï¬rst line of CD,a is to enforce a tree structureof depth 2(n + 1), whose leaves correspond to positions in the 2n+1 à 2n+1-torus. Moreprecisely, the TreeX concept guarantees that, in every model I of CD,a, there exists a binarytree of depth n + 1. Moreover, the DistXk concepts (there exists one for each k â 0, . . . , n)ensure that the leaves of this tree are binarily numbered (from 0 to 2n+1 â 1) by the conceptnames X0, . . . , Xn. More precisely, for a domain object d â âI, set 1 if d â XI xpsn(d) = Σn i i=0αi(d) â 2i where αi(d) = 0 otherwise. The TreeX and DistX concepts ensure that there exist nodes d0, . . . , d2n+1â1 at level n + 1of the tree such that xpsn(di) = i. Intuitively, this numbering represents the horizontalpositions in the 2n+1 à 2n+1-torus. The vertical positions are coded in a similar way by theY0, . . . , Yn concept names. More speciï¬cally, the concepts TreeY, DistX, and DistY ensurethat every di (i ⤠2n+1 â 1) is the root of another tree, in which (i) every node has thesame âX0, . . . , Xn-conï¬gurationâ as its root node, and (ii) the leaves are numbered binarilyusing the concept names Y0, . . . , Yn (note that the TreeY concept appears in CD,a inside aâRn+1 value restriction). Deï¬ne 1 if d â Y I ypsn(d) = Σn i i=0βi(d) â 2i where βi(d) = 0 otherwise. In the set of leaf nodes of all the trees enforced by the TreeY concept, there exists, for eachi, j < 2n+1, an object4 ei,j â âI such that xpsn(ei,j) = i and ypsn(ei,j) = j, i.e., each ei,jrepresents the position (i, j) in the 2n+1 à 2n+1-torus. The next step is to translate the individual bits of the numbering of the ei,j-objects, which are up to now represented by concept names, into concrete domain values. This isdone by the TransXPos and TransYPos concepts which ensure that, for all ⤠n, we have xposI (ei,j) = 0 if ei,j â ¬X , xposI(ei,j) = 1 if ei,j â X , and similarly for ypos and Y .Since I is a model for the key box (xpos0, . . . , xposn, ypos0, . . . , yposn keyfor ), grid positions are uniquely represented by domain elements from (TransXPos TransYPos)I , i.e., if d, e â (TransXPos TransYPos)I such that xpsn(d) = xpsn(e) and ypsn(d) = yxpsn(e), then d = e. This fact is used in the concepts Succs, XSuccOk, and YSuccOk to enforce that,for the two roles Rx and Ry and each i, j ⤠n, the following holds: RI â© x (ei,j à âI) = (ei,j, e(iâ2n+11),j (â) RI â© y (ei,j à âI) = (ei,j, ei,(jâ2n+11). The Succs concept ensures that, for each ei,j, there exists an Rx-successor and an Ry-successor, and that both are in (TransXPos TransYPos)I . Let d be an Rx-successor of ei,j. Then the XSuccOk concept ensures that xpsn(d) = i â2n+1 1 and ypsn(d) = j. Before we 4. It does not matter if there is more than one such object. 682
Keys, Nominals and Concrete Domains explain how it does this, let us note that, since all ei,j are in (TransXPos TransYPos)I and the grid positions are uniquely represented by elements of (TransXPos TransYPos)I , this implies d = e(iâ2n+11),j which shows that the upper line of (â) does indeed hold. Let us now consider the XSuccOk concept in some more detail. It is essentially the DL-formulation of the well-known propositional formula n kâ1 n kâ1 ( xj = 1) â (xk = 1 â xk = 0) â§ ( xj = 0) â (xk = xk) k=0 j=0 k=0 j=0 which encodes incrementation modulo 2n+1, i.e., if t is the number (binarily) encoded bythe propositional variables x0, . . . , xn and t is the number encoded by the propositionalvariables x , . . . , x 0 n, then we have t = t + 1 modulo 2n+1 (see B¨ orger et al., 1997). Taking into account the âRx quantiï¬ers in XSuccOk, it is readily checked that this concept hasjust the desired eï¬ect: to ensure that, for every Rx-successor d of ei,j, we have xpsn(d) =xpsn(e(iâ2n+11),j) = i â2n+1 1. The explanation of YSuccOk and how it enforces the lower line of (â) is analogous to the XSuccOk case. It remains to ensure that every grid position is labeled with precisely one tile and that the initial condition as well as the horizontal and vertical matching conditions are satisï¬ed.The tiles are represented by concept names Di (where i is from the set of tiles T ) andthe described tasks are accomplished in the standard way by the concepts Label, Init, andCheckMatch. It is worth noting that the reduction concept is path-free and the key box is simple, i.e., path-free and Boolean. Path-freeness of concepts is often used to tame the complexityof description logics with concrete domains, although it largely sacriï¬ces their expressivepower (Lutz, 2003; Baader, Lutz, Sturm, & Wolter, 2002b; Haarslev, M¨ oller, & Wessel, 2001; Horrocks & Sattler, 2001). For example, if ALC(D) is augmented with general TBoxes, thenreasoning with arbitrary concepts is undecidable while reasoning with path-free conceptsis ExpTime-complete if D is admissible and D-satisï¬ability is in ExpTime (Lutz, 2002a).This âtaming approachâ does not work in the presence of key boxes since, as we have justseen, satisï¬ability of ALC(D)-concepts w.r.t. key boxes is (under some natural assumptions) NExpTime-hard, even if both concept and key box are path-free. Since the size of CD,a and of the used key box is clearly polynomial in n, we obtain the following proposition. Proposition 3.6. The satisï¬ability of path-free ALCK(D1)-concepts w.r.t. simple key boxesis NExpTime-hard. It has been shown that (non path-free) ALC(D)-concept satisï¬ability is PSpace-completeif D-satisï¬ability is in PSpace (Lutz, 2002b). Hence, it follows from Proposition 3.5 thatALC(D1)-concept satisï¬ability is PSpace-complete. Thus, there is a rather dramatic in-crease of complexity if key boxes are added to ALC(D1). To stress that this increase is dueto the key boxes themselves and not to the complexity of D1-satisï¬ability, we reformulateProposition 3.6: Theorem 3.7. There exists a concrete domain D such that D-satisï¬ability is in PTime andsatisï¬ability of path-free ALCK(D)-concepts w.r.t. simple key boxes is NExpTime-hard. 683
Lutz, Areces, Horrocks, & Sattler Succs2 := âRx.TransPos âRy.TransPos TransPos := (Xi â âbv.bit1i ) ¬X ) 2(n+1) i â âbv.bit0i2(n+1) i=0..n (Yi â âbv.bit1n+i+1 ) ¬Yi â âbv.bit0n+i+1 ) i=0..n 2(n+1) 2(n+1) CD,a := TreeX âRn+1.TreeY âR2(n+1).(TransPos Succs2 XSuccOk YSuccOk) âR2(n+1).(Label CheckMatch Init) Figure 4: The ALCK(D2) reduction concept CD,a. Although, due to its very low expressivity, the concrete domain D1 itself is not very naturalfor knowledge representation, it is a fragment of many concrete domains that have beenproposed in the literature (Baader & Hanschke, 1992; Haarslev & M¨ oller, 2001; Lutz, 2003, 2002b). Indeed, the presented reduction strategy can be adapted to several âstandardâconcrete domains. Let us formulate a (very weak) condition that a concrete domain mustsatisfy in order for the presented reduction strategy to be applicable. Theorem 3.8. Let D be a concrete domain. If there exist a, b â âD with a = b and P1, P2 âΦD such that P D = a and P D = b, then the satisï¬ability of path-free ALCK(D)-concepts 1 2 w.r.t. simple key boxes is NExpTime-hard. We now present the second NExpTime-hardness result for ALCK(D)-concept satisï¬ability.This time, we reduce the NExpTime-complete domino problem to the satisï¬ability of path-free ALCK(D2)-concepts w.r.t. simple unary key boxes. The reduction is very similar tothe previous one and we only discuss the diï¬erences. In the ï¬rst reduction, we represented the individual bits of grid positions by individual concrete features xposi and yposi and used a composite key box to ensure that each pointin the torus is represented by at most one element. In the second reduction, we use a singleconcrete feature bv and represent an entire position (i, j) in the torus using a bit vectorfrom the concrete domain D2. This allows us to enforce the above mentioned uniqueness ofrepresentations using a unary key box. The modiï¬ed reduction concept CD,a can be found in Figure 4, where the concepts TreeX, TreeY, DistXk, DistYk, XSuccOk, YSuccOk, Label, CheckMatch, and Init are deï¬nedas in Figure 3. The translation of the position in the torus encoded by X0, . . . , Xn, Y0, . . . , Yninto a bit vector is done by the TransPos concept in a straightforward manner. Given whatwas said about the ï¬rst reduction, it is not hard to see that CD,a is satisï¬able w.r.t. the keybox (bv keyfor ) iï¬ there exists a solution for D and a. We thus obtain the following proposition. Proposition 3.9. The satisï¬ability of path-free ALCK(D2)-concepts w.r.t. simple unarykey boxes is NExpTime-hard. Again, we relate the NExpTime lower bound to the complexity of D2-satisï¬ability, whichis determined in Proposition 3.5. 684
Keys, Nominals and Concrete Domains Theorem 3.10. There exists a concrete domain D such that D-satisï¬ability is in PTimeand the satisï¬ability of path-free ALCK(D)-concepts w.r.t. simple unary key boxes is NExp-Time-hard. Since the elements of âD are bit vectors, the concrete domain D 2 2 cannot be considered a natural choice for many application areas. But, in the reduction, D2 can be replaced byseveral natural concrete domains. The central observation is that we use bit vectors only to injectively translate sequences of bits into values of the concrete domain, i.e., we translate sequences of 2(n + 1) bits(represented by the concept names X0, . . . , Xn and Y0, . . . , Yn) into elements of âD such 2 that, for distinct sequences, the results of the translation are also distinct. Due to thisrestricted use of bit vectors, there are several ways to replace them by natural numbers.For example, we can replace TransPos with the following concept TransPos which ensuresthat, for each d â TransPos I , sI (d) = xpsn(d) + 2n+1 · ypsn(d): 2n+1 TransPos := âzero.=0 âti.=2i (¬X0 â âs0.=0) (X0 â âs0.=1) i=1...2n+1 ¬Xi â â(siâ1, zero, si).+ Xi â â(siâ1, ti, si).+ i=1..n ¬Yiâ(n+1) â â(siâ1, zero, si).+ (Yiâ(n+1) â â(siâ1, ti, si).+ i=n+1..2n+1 where zero, si, and ti are concrete features, =k (with k â N) denotes a unary predicate withthe obvious extension, and + denotes a ternary addition predicate such that, intuitively,the ï¬rst two arguments are the addends and the third one is the sum. I It is easy to check that, whenever two objects d, e â TransPos do not agree on the interpretation of the X0, . . . , Xn, Y0, . . . , Yn, then sI (d) = sI (e), and thus the key 2n+1 2n+1 box (s2n+1 keyfor ) can be used for the reduction. The size of TransPos is obviously polynomial in n if the numbers k appearing in =k predicates are coded in binary. We thusobtain the following theorem: Theorem 3.11. Let D be a concrete domain such that 1. N â âD, 2. ΦD contains, for each k â N, a predicate =k with (=k)D = k where the size of (the representation of ) =k is logarithmic in k, and 3. ΦD contains a predicate + with (+)D â© (k1, k2, x) | k1, k2 â N and x â âD = (k1, k2, k1 + k2) | k1, k2 â N. Then the satisï¬ability of path-free ALCK(D)-concepts w.r.t. simple unary key boxes is NExpTime-hard. For example, this theorem yields NExpTime-lower bounds for ALCK(D) instantiated withthe concrete domains proposed in (Baader & Hanschke, 1992; Haarslev & M¨ oller, 2001; Lutz, 2003, 2002b). An alternative to the addition predicate is to use multiplication to injectively 685
Lutz, Areces, Horrocks, & Sattler translate sequences of bits into natural numbers. More precisely, let p1, . . . , p2n+1 be theï¬rst 2n + 1 prime numbers and deï¬ne another version of TransPos as follows: TransPos := âone.=1 âti.=p (¬X i 0 â âs0.=0) (X0 â âs0.=1) i=1...2n+1 ¬Xi â â(siâ1, one, si).â Xi â â(siâ1, ti, si).â i=1..n ¬Yiâ(n+1) â â(siâ1, one, si).â Yiâ(n+1) â â(siâ1, ti, si).â i=n+1..2n+1 where â is a ternary multiplication predicate. Since the factorization of natural numbers into prime numbers is unique, we can again use the key box (s2n+1 keyfor ) for the reduction. Moreover, it is well-known that the kâth prime is polynomial in k (Graham, Knuth, & Patashnik, 1990), and thus the size ofthe concept TransPos is polynomial in n even if the numbers k in =k predicates are codedunarily. We thus obtain another theorem concerning quite natural concrete domains: Theorem 3.12. Let D be a concrete domain such that 1. N â âD, 2. ΦD contains, for each k â N, a predicate =k with (=k)D = k, and 3. ΦD contains a predicate â with (â)D â© (k1, k2, x) | k1, k2 â N and x â âD = (k1, k2, k1 · k2) | k1, k2 â N. Then the satisï¬ability of path-free ALCK(D)-concepts w.r.t. simple unary key boxes is NExpTime-hard. 3.4 NExpTime-hardness of ALCO(D) As we already pointed out in Section 1, the relationship between key boxes and nominalsis rather close: the latter can be âsimulatedâ by the former if the concrete domain providespredicates that can be used to uniquely describe elements of âD. For example, in ALCK(D1)the concept âg.=0 behaves as a nominal if we use the key assertion (g keyfor ). We can even deï¬ne n nominals using n single concrete features in unary-key assertions. In thelogics ALCK(D2) and ALCK(D3), a single concrete feature and unary key assertions aresuï¬cient to simulate an arbitrary number of nominals: for example, in ALCK(D2) theconcept C = âg.bit00 â 2 g.bit112 uniquely describes the bit vector (0, 1) â BV2 â âD , i.e., 2 a â CI implies gI (a) = (0, 1). Obviously, any other bit vector (of any length!) can bedescribed in a similar way. This illustrates that, for most non-trivial concrete domains D, the logic ALCK(D) is (at least) as expressive as ALCO(D). Although the converse does not hold, the expressivepower of ALCO(D) is still suï¬cient to prove NExpTime-hardness of concept satisï¬ability,provided that a suitable concrete domain D is used. Since ALCO concept satisï¬ability is PSpace-complete (Areces, Blackburn, & Marx, 1999), this is yet another example of a DLwhere an even seemingly harmless extension with concrete domains has a dramatic eï¬ecton the computational complexity (Lutz, 2003). 686
Keys, Nominals and Concrete Domains Nominal := âf.N XSucc := Xj â (Xk â ¬X ¬ k ) Xj â (Xk â Xk) k=0..n j=0..k k=0..n j=0..k YSucc := Yj â (Yk â ¬Y ¬ k ) Yj â (Yk â Yk) k=0..n j=0..k k=0..n j=0..k TransXPos := (Xi â âbvx.pos1in+1) (¬Xi â âbvx.pos0in+1) i=0..n TransYPos := (Yi â âbvy.pos1in+1) (¬Yi â âbvy.pos0in+1) i=0..n TransXSucc := (X â â â â i bvxs.pos1in+1) (¬Xi bvxs.pos0in+1) i=0..n TransYSucc := (Y â â â â i bvys.pos1in+1) (¬Yi bvys.pos0in+1) i=0..n CheckHMatch := (â(bvx, bvy, f ⦠darr).tilei â n+1 (bvxs, bvy, f ⦠darr).tilej ) n+1 i,jâH CheckVMatch := (â(bvx, bvy, f ⦠darr).tilei â n+1 (bvx, bvys, f ⦠darr).tilej ) n+1 i,jâV Init2 := ¬Xj Xj ¬Yj i=0..nâ1 j=0..n,bitj (i)=0 j=0..n,bitj (i)=1 j=0..n â â(bvx, bvy, f ⦠darr).tileai n+1 CD,a := TreeX âRn+1.TreeY âR2(n+1).Nominal âR2(n+1).(TransXPos TransYPos XSucc YSucc TransXSucc TransYSucc Init2 CheckHMatch CheckVMatch) Figure 5: The ALCO(D3) reduction concept CD,a. In this section, we reduce the NExpTime-complete domino-problem to ALCO(D3)- concept satisï¬ability. Again, let D = (T, H, V ) be a domino system and a = a0, . . . , anâ1an initial condition. The modiï¬ed reduction concept CD,a is deï¬ned in Figure 5, where bvx,bvy, bvxs, bvys, and darr denote concrete features, N denotes a nominal, and the conceptsTreeX, TreeY, DistXk, and DistYk are deï¬ned as in Figure 3. As in the previous reductions,we now give a detailed explanation of the reduction strategy to show that CD,a is satisï¬ableiï¬ there exists a solution for D and a. Formal details can then easily be worked out by theinterested reader. Let I be a model for CD,a. To explain the structure of I, it is convenient to start with the ï¬rst line of CD,a. As in the previous reductions, the TreeX and TreeY conceptsare used to ensure that I contains a tree-shaped substructure of depth n + 1 whose leafnodes are the roots of additional trees of depth n + 1 such that the set of the leafs of the 687
Lutz, Areces, Horrocks, & Sattler TreeX ... TreeY TreeY TreeY ... ... f ... f f N darr Figure 6: The structure of models of CD,a. latter trees correspond to the positions in the 2n+1 à 2n+1-torus, i.e., for each position,there is a leaf node representing it. The torus positions are binarily encoded by the conceptnames X0, . . . , Xn and Y0, . . . , Yn and we use ei,j to refer to the leaf with xpsn(ei,j) = i andypsn(ei,j) = j (see Section 3.3). As in the previous reductions, the numbers coded by X0, . . . , Xn and Y0, . . . , Yn are translated into concrete domain values, which is done by the TransXPos and TransYPosconcepts. Note that, in contrast to the ALCK(D2)-reduction, the x-position and the y-position are not stored in the same bit vector, but rather in the two distinct ones bvxand bvy. Also in contrast to the previous reduction, the actual tiling of the torus is notrepresented by the leaf nodes ei,j, but rather by a domino array: the last conjunct in theï¬rst line of CD,a ensures that every leaf ei,j is connected via the abstract feature f to the(unique) element w â N I .The domain element w is associated with a domino array via the concrete feature darr (aswe shall see later, this is guaranteed by the CheckHMatch and CheckVMatch concepts). Thisdomino array represents the tiling of the 2n+1 à 2n+1-torus. Summing up, the structure ofI is roughly as shown in Figure 6. Since the tiling is stored in a domino array, we need to explain the purpose of the leaf nodes ei,j: these nodes are used to enforce the initial condition and the horizontal andvertical matching condition. Let us discuss the horizontal matching condition (the verticalmatching condition is enforced analogously): the XSucc concept is the DL reformulationof the propositional logic formula for incrementation modulo 2n+1 and ensures that, foreach ei,j, the concept names X , . . . , X 0 n encode the number i â2n+1 1, i.e., the horizontal position of ei,jâs horizontal neighbor. In addition to the storage of the horizontal and verticalposition of ei,j in bvx(ei,j) and bvy(ei,j), we also store the horizontal position iâ2n+1 1 of ei,jâshorizontal successor in bvxs(ei,j). Finally, CheckHMatch veriï¬es that the tiles at positions 688
Keys, Nominals and Concrete Domains (i, j) and (i â2n+1 1, j), which are both stored in the domino array, are compatible with thehorizontal matching condition. Note that CheckHMatch also ensures that the domain element w (with w = N I ) has a domino array attached via the concrete feature darr and that, for each position (i, j), the(unique!) tile stored in the domino array is from the set T . The initial condition is ensuredvia the Init2 concept in a similar way. We (again) use bitj(i) to denote the jâth bit of thebinary encoding of the natural number i. Using the above considerations, the correctness of the reduction is readily checked. Moreover, the size of CD,a is at most polynomial in n. Note that CD,a is not path-free:paths of length two appear in the concepts CheckHMatch, CheckVMatch, and Init2. Summingup, the reduction described yields the following result: Proposition 3.13. The satisï¬ability of ALCO(D3)-concepts is NExpTime-hard. Again, we relate the NExpTime lower bound to the complexity of D3-satisï¬ability, whichis determined in Proposition 3.5. Theorem 3.14. There exists a concrete domain D such that D-satisï¬ability is in PTimeand the satisï¬ability of ALCO(D)-concepts is NExpTime-hard. Note that the reduction uses only a single nominal N . This is a dramatic increase of com-plexity since it has been shown that satisï¬ability of ALC(D)-concepts (i.e., without nominalsand key boxes) is PSpace-complete provided that D is admissible and D-satisï¬ability is inPSpace (Lutz, 2002b). As in previous sections, we note that D3 can be replaced by more natural concrete domains in the NExpTime-hardness proof presented. The idea is to represent the wholedomino array by a single natural number and then to use arithmetic operations to access theindividual positions: a natural number k can be viewed as a domino array by partitioningits binary representation into 2n+1 · 2n+1 = 22(n+1) âsectionsâ of length log(T ) , whereT denotes the cardinality of the set of tile types T . Each such section describes the tileof a single position in the torus. The sections can be accessed by using integer divisionand reminder operations: if k is the natural number representing the torus, then the tile ofposisition i is computed by (k div 2i· log(T ) ) mod 2 log(T ) + 1. Thus, we introduce ternary predicates div for integer division and mod for computing theremainder of a division, and a binary predicate 2x expressing exponentiation with basis 2.Then we modify the reduction as follows: we replace TransXPos and TransYPos by theTransPos concept from Section 3.3 to translate the two numbers encoded by X1, . . . , Xnand Y1, . . . , Yn into a single natural number that is stored in the concrete feature s2n+1. Wethen devise a new concept Tile[i] (for each i â T ) enforcing that the position identiï¬ed bythe feature s2n+1 is labeled with tile i: Tile[i] := âr.= log(T ) âs2n+1, r, r .â âr , r .2x âone.=1 âr, one, t.+ ât, t .2x âf torus, r , u.div âu, t , tile.mod âtile.=i. 689
Lutz, Areces, Horrocks, & Sattler Here, r, r , r , t, t , u, one, torus, and tile are concrete features. The torus feature is the coun-terpart of the darr feature in the original reduction, i.e., it stores the natural number thatrepresents the tiling array. We can use the Tile[i] concept in the obvious way inside theCheckHMatch, CheckVMatch, and Init2 concepts. The size of the resulting reduction conceptis polynomial in n if the numbers k appearing in =k predicates are coded in binary. Wethus obtain the following theorem: Theorem 3.15. Let D be a concrete domain such that 1. N â âD, 2. ΦD contains the predicates a predicate =k (for each k â N), 2x, +, â, div, mod with the following extensions (=k)D = k (2x)D â© (k, x) | k â N and x â âD = (k, 2k) | k â N (+)D â© (k1, k2, x) | k1, k2 â N and x â âD = (k1, k2, k1 + k2) | k1, k2 â N (â)D â© (k1, k2, x) | k1, k2 â N and x â âD = (k1, k2, k1 · k2) | k1, k2 â N (div)D â© (k1, k2, x) | k1, k2 â N and x â âD = (k1, k2, k1 div k2) | k1, k2 â N (mod)D â© (k1, k2, x) | k1, k2 â N and x â âD = (k1, k2, k1 mod k2) | k1, k2 â N Then the satisï¬ability of ALCO(D)-concepts is NExpTime-hard. 4. Reasoning Procedures This section is devoted to developing reasoning procedures for DLs with concrete domains,nominals, and keys. We start with devising a tableau algorithm that decides the satisï¬abilityof ALCOK(D)-concepts w.r.t. Boolean key boxes. This algorithm yields a NExpTime uppercomplexity bound matching the lower bounds established in Section 3.3. Then we consider the rather powerful description logic SHOQK(D). This DL, which is an extension of SHOQ(D) (Horrocks & Sattler, 2001; Pan & Horrocks, 2002), provides awealth of expressive means such as transitive roles, role hierarchies, nominals, and qualifyingnumber restrictions. Moreover, SHOQK(D) is equipped with a restricted variant of theconcrete domain constructor and with key boxes. We develop a tableau algorithm for deciding the satisï¬ability of SHOQK(D)-concepts w.r.t. path-free key boxes. Due to therestrictedness of SHOQK(D)âs concrete domain constructor, we can even admit generalrather than only Boolean key boxes. Again, the algorithm yields a tight NExpTime uppercomplexity bound. 4.1 A Tableau Algorithm for ALCOK(D) with Boolean Key Boxes Tableau algorithms decide the satisï¬ability of the input concept (in our case w.r.t. the inputkey box) by attempting to construct a model for it. More precisely, a tableau algorithmstarts with an initial data structure induced by the input concept and then repeatedly ap-plies so-called completion rules to it. This rule application can be thought of as attemptingto construct a model for the input concept. Finally, either the algorithm will ï¬nd an obvi-ous contradiction or it will encounter a situation that is contradiction-free and in which no 690
Keys, Nominals and Concrete Domains more completion rules are applicable. In the former case, the input concept is unsatisï¬able,while it is satisï¬able in the latter. When devising a tableau algorithm for a description logic with concrete domains but without committing to a particular concrete domain, it is commonly assumed that the con-crete domain is admissible, which implies decidability of the satisï¬ability of D-conjunctions.In the presence of keys, however, this is not enough: if a D-conjunction is satisï¬able, we alsowant to know which of its variables take the same values in an arbitrary but ï¬xed solution.As an example, consider the concrete domain N = (N, <n| n â N) and the N-conjunction c = <2(v1) â§ <2(v2) â§ <2(v3). Obviously, one solution δ for c satisï¬es δ(v1) = δ(v2), another satisï¬es δ(v1) = δ(v3), and soon. Our tableau algorithm uses such identity information passed from the concrete domainreasoner since, in the presence of key boxes, it can have an impact on the structure of theconstructed model. For example, this information reveals the unsatisï¬ability of âR.A âR.(¬A B) âR.(¬A ¬B) âR.âg.<2 w.r.t. (g keyfor ). To formalize this requirement, we strengthen the notion of admissibility into key-admissibility.Since the tableau algorithm developed in this section is non-deterministic, we formulate key-admissibility in a non-deterministic way. Deï¬nition 4.1 (Key-admissible). A concrete domain D is key-admissible iï¬ it satisï¬es thefollowing properties: 1. ΦD contains a name D for âD; 2. ΦD is closed under negation; 3. there exists an algorithm that takes as input a D-conjunction c, returns clash if c is unsatisï¬able, and otherwise non-deterministically outputs an equivalence relation â¼on the set of variables V used in c such that there exists a solution δ for c with thefollowing property: for all v, v â V δ(v) = δ(v ) iï¬ v â¼ v . An algorithm showing the behaviour described in item 3 above is called a D-tester, andthe equivalence relations â¼ are called concrete equivalences. We say that extended D- satisï¬ability is in NP if there exists a D-tester running in polynomial time. Please note that key-admissibility is less esoteric than it might seem: any concrete domainthat is admissible and provides for an equality predicate is also key-admissible. Due toadmissibility, the presence of an equality predicate implies that an inequality predicate isalso available. We can thus construct a D-tester from an algorithm for D-satisï¬ability:when presented with a predicate conjunction c, we simply âguessâ an equivalence relationâ¼ on the set of variables used in c. Then we decide the (non-extended) satisï¬ability ofthe conjunction c â§ =(v, v ) â§ =(v, v ), return clash if it is unsatisï¬able and vâ¼v vâ¼v â¼ otherwise. The rather weak condition that an equality predicate should be present is 691
Lutz, Areces, Horrocks, & Sattler ¬(C D) ¬C ¬D ¬(C D) ¬C ¬D ¬¬C C ¬(âR.C) âR.¬C ¬(âR.C) âR.¬C ¬(âu1, . . . , un.P ) âu1, . . . , un.P u1â · · · unâ ¬(gâ) âg. D Figure 7: The NNF rewrite rules. satisï¬ed by almost all concrete domains proposed in the literature (see, e.g. (Lutz, 2003;Baader & Hanschke, 1991b; Kamp & Wache, 1996; Haarslev, Lutz, & M¨ oller, 1998; Baader & Sattler, 1998)). Throughout this chapter, we assume that any concrete domain is equipped with an equality predicate. This assumption is w.l.o.g. since any D-conjunction using equality canbe translated into an equivalent one without equality by identifying variables according tothe stated equalities. This assumption must not be confused with what was discussed in theprevious paragraph: even if the concrete domain D is admissible and its set of predicates isthus closed under negation, this assumption does not imply the presence of an inequalitypredicate. We need some more prerequisites before we can start the presentation of the tableau algorithm: a concept is in negation normal form (NNF) if negation occurs only in front ofconcept names and nominals. It is easily seen that, if the concrete domain D is admissible,then every ALCOK(D)-concept can be converted into an equivalent one in NNF by exhaus-tively applying the rewrite rules displayed in Figure 7. We use Ë Â¬ C to denote the result of converting ¬C to NNF. A key box is in NNF if all concepts occurring in key assertions arein NNF. In what follows, we generally assume input concepts and key boxes to be in NNF.Let C be an ALCOK(D)-concept and K a key box. We use sub(C) to denote the set ofsubconcepts of C (including C itself) and con(K) to denote the set of concepts appearingon the right-hand side of key assertions in K. For a set of concepts Î, sub(Î) denotes theset sub(C). Moreover, we write cl(C, K) as abbreviation for the set CâÎ sub(C) ⪠sub(con(K)) âª Ë Â¬D | D â sub(con(K)). We now start the presentation of the tableau algorithm by introducing the underlying datastructure. Deï¬nition 4.2 (Completion System). Let Oa and Oc be disjoint and countably inï¬nitesets of abstract and concrete nodes. A completion tree for an ALCOK(D)-concept C and akey box K is a ï¬nite, labeled tree T = (Va, Vc, E, L) with nodes Va ⪠Vc such that Va â Oa,Vc â Oc, and all nodes from Vc are leaves. The tree is labeled as follows: ⢠each node a â Va is labeled with a subset L(a) of cl(C, K); ⢠each edge (a, b) â E with a, b â Va is labeled with a role name L(a, b) occurring in C or K; ⢠each edge (a, x) â E with a â Va and x â Vc is labeled with a concrete feature L(a, x) occurring in C or K. 692
Keys, Nominals and Concrete Domains For a â Va, we use levT(a) to denote the depth at which a occurs in T (starting with theroot node on depth 0). A completion system for an ALCOK(D)-concept C and a key boxK is a tuple (T, P, , â¼), where ⢠T = (Va, Vc, E, L) is a completion tree for C and K, ⢠P is a function mapping each P â ΦD of arity n in C to a subset of V n c , ⢠is a linear ordering of Va such that levT(a) ⤠levT(b) implies a b, and ⢠⼠is an equivalence relation on Vc. Let (Va, Vc, E, L) be a completion tree. A node b â Va is an R-successor of a node a â Vaif (a, b) â E and L(a, b) = R, while a node x â Vc is a g-successor of a if (a, x) â E andL(a, x) = g. For a path u, the notion of u-successor is deï¬ned in the obvious way. Intuitively, the relation â¼ records equalities between concrete nodes found during the (non-deterministic) model construction process. The recording is necessary since equali-ties between concrete nodes can induce equalities between abstract nodes which, in turn,can imply more equalities between concrete nodes. This can be seen in the following ex-ample: assume the completion tree contains, for i â 1, 2, an abstract node ai with aconcrete g-successor xi and a concrete g -successor yi. Now assume that the key box con-tains (g keyfor ), and that the D-tester returns x1 â¼ x2. As a consequence, a1 and a2 represent the same element and thus functionality of g implies that also y1 and y2 representthe same (concrete) element. To deal with such eï¬ects, we deï¬ne an equivalence relationâa on abstract nodes and a second equivalence relation âc on concrete nodes. Deï¬nition 4.3 (âa and âc Relations). Let S = (T, P, , â¼) be a completion system fora concept C and a key box K with T = (Va, Vc, E, L), and let â be an equivalence relationon Va. For each R â NR, a node b â Va is an R/â-neighbor of a node a â Va if there existsa node c â Va such that a â c and b is an R-successor of c. Similarly, for each g â NcF, anode x â Vc is a g/â-neighbor of a if there exists a node c â Va such that a â c and x is ag-successor of c. For paths u, the notion of u/â-neighbor is deï¬ned in the obvious way. We deï¬ne a sequence of equivalence relations â0 â â1 â · · · a a on Va as follows: â0 ⪠a = (a, a) â V 2 a (a, b) â V 2 | a there is an N â NO such that N â L(a) â© L(b) âi+1 ⪠a = âia(a, b) â V 2 | a there is a c â Va and an f â NaF such that a and b are f /âia-neighbors of c ⪠(a, b) â V 2 | a there is a (u1, . . . , un keyfor C) â K, ui/âia-neighbors xi of a for 1 ⤠i ⤠n, andui/âia-neighbors yi of b for 1 ⤠i ⤠nsuch that C â L(a) â© L(b) and xi â¼ yi for 1 ⤠i ⤠n. Finally, set âa = âi iâ¥0 a. Then deï¬ne âc = ⼠⪠(x, y) â V 2 | c there is an a â Va and a g â NcF such that x and y are g/âa-neighbors of a. 693
Lutz, Areces, Horrocks, & Sattler This deï¬nition reï¬ects the above mentioned tight coupling between the concrete and ab-stract equalities: if the D-tester ï¬nds (or guesses) that two concrete nodes are equal, thetableau algorithm may use this to deduce (via the computation of âa and âc) even moreequalities between concrete nodes. Let D be a key-admissible concrete domain. To decide the satisï¬ability of an ALCOK(D)- concept C0 w.r.t. a Boolean key box K (both in NNF), the tableau algorithm is started withthe initial completion tree TC = (a 0 0, â , â , a0 â C0) in the initial completion system SC = (T , P 0 C0 â , â , â ), where Pâ maps each P â ΦD occurring in C0 to â . We now introduce an operation that isused by the completion rules to add new nodes to completion trees. Deï¬nition 4.4 (â+â Operation). An abstract or concrete node is called fresh in a comple-tion tree T if it does not appear in T. Let S = (T, P, , â¼) be a completion system with T = (Va, Vc, E, L). We use the following notions: ⢠Let a â Va, b â Oa fresh in T, and R â NR. We write S +aRb to denote the completion system S that can be obtained from S by adding b to Va and (a, b) to E and settingL(a, b) = R and L(b) = â . Moreover, b is inserted into such that b c implies levT(b) ⤠levT(c). ⢠Let a â Va, x â Oc fresh in T and g â NcF. We write S +agx to denote the completion system S that can be obtained from S by adding x to Vc and (a, x) to E and settingL(a, x) = g. When nesting the + operation, we omit brackets, writing, for example, S + aR1b + bR2c for(S + aR1b) + bR2c. Let u = f1 · · · fng be a path. When a â Va and x â Oc is fresh in T,we use S + aux to denote the completion system that is obtained from S by taking distinctnodes b1, . . . , bn â Oa which are fresh in T and setting S + aux := S + af1b1 + · · · + bnâ1fnbn + bngx. Strictly speaking, the S + aRb operation is non-deterministic since we did not specify howprecisely the node b is inserted into . However, since this is donât care non-determinism, we will view the â+â operation as being deterministic. The completion rules can be found in Figure 8. Note that the R and Rch rules are non-deterministic, i.e., they have more than one possible outcome (this is true donât knownon-determinism). Some further remarks on the completion rules are in order: the upperï¬ve rules are well-known from existing tableau algorithms for ALC(D)-concept satisï¬ability(see, e.g., Lutz, 2002a). Only the use of R/ âa-neighbors and u/ âa-neighbors in the rulesRâ, Râ, and Râc deserves a comment. Take for example Râ: intuitively, if we have a âa bfor two abstract nodes a and b of the completion tree, then a and b describe the samedomain element of the constructed model (and similarly for the âc relation on concrete 694
Keys, Nominals and Concrete Domains R if C1 C2 â L(a) and C1, C2 â L(a) then L(a) := L(a) ⪠C1, C2 R if C1 C2 â L(a) and C1, C2 â© L(a) = â then L(a) := L(a) ⪠C for some C â C1, C2 Râ if âR.C â L(a) and there is no R/âa-neighbor b of a such that C â L(b),then set S := S + aRb for a fresh b â Oa and L(b) := C Râ if âR.C â L(a), b is an R/âa-neighbor of a, and C / â L(b) then set L(b) := L(b) ⪠C Râc if âu1, . . . , un.P â L(a) and there exist no x1, . . . , xn â Vc such that xi is ui/âa-neighbor of a for 1 ⤠i ⤠n and (x1, . . . , xn) â P(P ) then set S := (S + au1x1 + · · · + aunxn) with x1, . . . , xn â Oc fresh and P(P ) := P(P ) ⪠(x1, . . . , xn) Rch if (u1, . . . , un keyfor C) â K and there exist x1, . . . , xn â Vc such that xi is ui/âa-neighbor of a for 1 ⤠i ⤠n and C, ˬC â© L(a) = â then set L(a) := L(a) ⪠D for some D â C, Ë Â¬C Rp if L(b) â L(a) and a â Va is minimal w.r.t. such that a âa b then set L(a) := L(a) ⪠L(b) Figure 8: Completion rules for ALCOK(D). nodes). Thus if a âa b and c is an R-successor of a, then c should also be an R-successorof b. However, since we want the completion tree to be a tree, we do not make the lattersuccessorship explicit. To compensate for this, the Râ rule talks about R/âa-neighborsrather than about R-successors. The lower two rules are necessary for dealing with key boxes. The Rch rule is a so-called âchoose ruleâ (Hollunder & Baader, 1991; Horrocks et al., 2000): intuitively,it guesses whether or not an abstract node a satisï¬es C if there exists a key assertion(u1, . . . , un keyfor C) â K such that there are neighbors of a for all the paths ui. This isnecessary since both possibilities may have ramiï¬cations: if a satisï¬es C, then it must betaken into account in the construction of the relation âa; if a does not satisfy C, then wemust deal with the consequences of it satisfying Ë Â¬C (e.g. in case that C is ). The Rp rule deals with equalities between abstract nodes as recorded by the âa relation: since a âa b means that a and b describe the same node in the constructed model, theirnode labels should be identical. It suï¬ces, however, to choose one representative for eachequivalence class of âa and make sure that this representativeâs node label contains thelabels of all its âa-equivalent nodes. As the representative, we use the node that is minimalw.r.t. the ordering , which has been introduced solely for this reason. The Rp rule does the appropriate copying of node labels. Let us now formalize what it means for a completion system to contain a contradiction. Deï¬nition 4.5 (Clash). Let S = (T, P, , â¼) be a completion system for a concept C and a key box K with T = (Va, Vc, , â¼). We say that the completion system S is concrete 695
Lutz, Areces, Horrocks, & Sattler deï¬ne procedure sat(S) do if S contains a clash then return unsatisï¬able â¼ := test(ζS)compute âacompute âc while â¼ = âcif S contains a clash then return unsatisï¬able if S is complete then return satisï¬able S := the application of a completion rule to Sreturn sat(S ) Figure 9: The ALCOK(D) tableau algorithm. domain satisï¬able iï¬ the conjunction ζS = P (x1, . . . , xn) â§ =(x, y) P used in C (x xâ 1,...,xn)âP (P ) cy is satisï¬able. S is said to contain a clash iï¬ 1. there is an a â Va and an A â NC such that A, ¬A â L(a), 2. there are a â Va and x â Vc such that gâ â L(a) and x is g/âa-neighbor of a, 3. S is not concrete domain satisï¬able. If S does not contain a clash, S is called clash-free. S is called complete iï¬ no completionrule is applicable to S. The tableau algorithm is described in Figure 9 in pseudo-code notation. In this ï¬gure, testcalls a D-tester as speciï¬ed in Deï¬nition 4.1. Let us say a few words about the while loop.There obviously exist close relationships between the relations â¼ and âc and the predicateconjunction ζS: ⢠⼠â âc (note that both âa and âc depend on â¼ and are thus recomputed in each step of the while loop); ⢠by deï¬nition of ζS and D-tester, the result of test(ζS) yields a relation containing âc (and thus also â¼). Using these facts, one may check that, in each step of the while loop, new tuples are addedto the â¼ relation, but none are deleted (see the proof of Lemma B.2 in the appendix).The while loop is needed because (i) âa is deï¬ned using â¼, (ii), âc is deï¬ned using âa, 696
Keys, Nominals and Concrete Domains and (iii) new concrete equalities in âc may then imply even more concrete and/or abstractequalities, and so on. A similar concrete-abstract interplay takes place in the course of several recursion steps: equalities between concrete nodes provided by the D-tester may make new rules applicable(for example Rp and Râc) which changes P and thus also ζS. This may subsequently lead tothe detection of more equalities between concrete nodes by the D-tester, and so on. Theseconsiderations show that, in the presence of keys, there exists a close interplay between theconcrete domain reasoner and the tableau algorithm, which is not needed if keys are notpresent: without keys, it suï¬ces to apply the concrete domain satisï¬ability check only onceafter the completion rules have been exhaustively applied (Baader & Hanschke, 1991a). The detailed proof of termination, soundness, and completeness together with a com- plexity analysis of the tableau algorithm deï¬ned in this section is given in Appendix B. Theorem 4.1. Let D be a key-admissible concrete domain. If extended D-satisï¬ability isin NP, then ALCOK(D)-concept satisï¬ability w.r.t. Boolean key boxes is in NExpTime. We should note that, in the way it is presented here, the algorithm leaves considerableroom for optimizations. One possible optimization concerns the âre-useâ of f -successors(for abstract features f ): for example, when applying the Râ rule to a concept âf.C â L(a),where a already has an f -successor b, we could simply add C to L(b) instead of adding anew f -successor c and recording that b âa c. Another candidate for optimizations is the test function. Recall that this function takes a predicate conjunction c with set of variables V and non-deterministically returns a concreteequivalence, i.e., a relation â¼ such that there exists a solution δ for c with vi â¼ vj iï¬Î´(vi) = δ(vj) (see Deï¬nition 4.1). It is not hard to devise an ALC(D)-concept that forcescompletion systems to have exponentially many concrete nodes by slightly adapting well-known ALC-concepts that require models of exponential size (Halpern & Moses, 1992).Hence, the size of input conjunctions c to test can be exponential in the size of the inputconcept. Even for trivial D-conjunctions c = D(v1) ⧠· · · â§ D(vk) we have an exponential number of distinct concrete equivalences â¼. Thus, the number ofpossible outcomes of a call to the test function may be double exponential in the size ofthe input concept. Considering the above example, a natural response to this problem isto require test to return only minimal concrete equivalences: intuitively, an equivalence isminimal if only those variables are equivalent whose equality is enforced by the conjunction. More precisely, â¼ is called minimal if there exists no concrete equivalence â¼ such that (x, y) | x â¼ y â (x, y) | x â¼ y. We conjecture that restricting test in this way doesnot destroy the soundness and completeness of the tableau algorithm. However, althoughthis deï¬nitely is a worthwhile optimization, it does not help to overcome the existence ofdoubly exponentially many outcomes of test in the worst caseâat least not for all concretedomains D: consider the concrete domain N from Page 691 and conjunctions of the form ci = <i(v1) ⧠· · · â§ <i(v2i). It is readily checked that, for each i ⥠1, the number of minimal concrete equivalences forci is exponential in i. Moreover, it is not hard to devise a concept Ci of size logarithmic 697
Lutz, Areces, Horrocks, & Sattler in i that leads to completion systems S such that ζS = ci. Hence, there are still doublyexponentially many possible outcomes of the test function. In the example just discussed, the exponential branching of test is clearly due to the discreteness of the natural numbers. Indeed, if we use a dense structure for deï¬ning concretedomains, it seems that the restriction to minimal concrete equivalences can have the desiredeï¬ect, namely that the number of testâs possible outcomes becomes polynomial in the sizeof its input and thus exponential in the size of the input concept. For example, considerthe concrete domain Q, which is deï¬ned as follows: ⢠âQ is the set Q of rational numbers; ⢠ΦQ contains unary predicates Q and its negation â¥Q, unary predicates =q and =q for each q â Q, binary comparison predicates <, â¤, =, =, â¥, >, a ternary additionpredicate +, and its negation + (all with the obvious semantics). It is readily checked that Q is key-admissible (note that it provides a binary equality pred-icate) and thus falls into our framework. We conjecture that there exists only one minimalconcrete equivalence for every Q-predicate conjunction c: intuitively, it seems possible to(inductively) determine a relation â¼ on the set of variables V used in c such that (i) x â¼ yimplies that δ(x) = δ(y) for every solution δ for c and (ii) there exists a solution δ for c suchthat v â¼ v implies δ(v) = δ(v ). Clearly, â¼ is a minimal concrete equivalence. Moreover,due to (i) it is the only one. 4.2 A Tableau Algorithm for SHOQK(D) Although ALCOK(D) is a quite powerful DL, it lacks several expressive means that can befound in most state-of-the-art description logic systems such as FaCT and RACER (Horrocks,1998; Horrocks et al., 2000; Haarslev & M¨ oller, 2001). In this section, we consider the very expressive description logic SHOQK(D) which provides for concrete domains, keyboxes, and nominals, but also for many other means of expressivity such as transitiveroles, role hierarchies, qualifying number restrictions, and general TBoxes. Modulo somedetails, SHOQK(D) can be viewed as the extension of the DL SHOQ(D) with key boxes.SHOQ(D) was proposed by Horrocks and Sattler (2001) (see also Pan & Horrocks, 2002)as a tool for ontology reasoning in the context of the semantic web (Berners-Lee, Hendler,& Lassila, 2001; Baader et al., 2002a). One very important feature of SHOQK(D) are so-called TBoxes, i.e. concept equations5 . of the form C = D that are used as a âbackground theoryâ in reasoning. Since it is well-known that combining general TBoxes and the concrete domain constructor easily leadsto undecidability (Baader & Hanschke, 1992; Lutz, 2004), SHOQK(D) only oï¬ers a path-free variant of the concrete domain constructorâi.e. only concrete features are admittedinside this constructor rather than paths of arbitrary length. This restriction indeed regainsdecidability (Haarslev et al., 2001; Horrocks & Sattler, 2001). Path-freeness of the concretedomain constructor obviously renders abstract features unnecessary, and thus this syntactictype is not available in SHOQK(D). 5. Some TBox formalisms also allow for concept inclusions C D, but these can be re-written into equivalent equations, see Section 2.2.2.5 of (Baader et al., 2003). 698
Keys, Nominals and Concrete Domains 4.2.1 The Description Logic SHOQK(D) Let us now deï¬ne SHOQK(D) in a formal way, starting with the syntax. Deï¬nition 4.6 (SHOQK(D) Syntax). A role axiom is either a role inclusion, which is ofthe form R S with R, S â NR, or a transitivity axiom Trans(R) where R â NR. A role box R is a ï¬nite set of role axioms. Let * be the reï¬exive-transitive closure of the role inclusions in R. A role name R is called simple if S * R implies Trans(S) / â R for all role names S. Let D be a concrete domain. The set of SHOQK(D)-concepts is the smallest setsuch that ⢠every concept name and every nominal is a concept, and ⢠if C and D are concepts, R is a role name, S a simple role name, n and k are natural numbers, g1, . . . , gn are concrete features, and P â ΦD is a predicate of arity n, thenthe following expressions are also concepts: ¬C, C D, C D, âR.C, âR.C, ( k S C), ( k S C), âg1, . . . , gn.P, and g1â. . A concept equation is an expression C = D with C and D concepts. A TBox is a ï¬nite setof concept equations. For SHOQK(D), we consider key boxes that diï¬er in two aspects from the ones we consid-ered for ALCOK(D): in the following, we assume key boxes to be path-free, but we admitcomplex concepts to occur in key assertions. Note that abstract features and paths donot occur in the syntax of SHOQK(D)âas will become clear after the semantics has beendeï¬ned, the former can be âsimulatedâ by the more general number restrictions ( n R C). As usual in description logics of the SHIQ/SHOQ family, we require role names in number restrictions to be simple since admitting arbitrary roles yields undecidability ofreasoning (Horrocks et al., 2000; Horrocks & Sattler, 2001). If the role box R is clear fromthe context, we will usually write Trans(R) instead of Trans(R) â R. We now introduce thesemantics of SHOQK(D) and the relevant reasoning problems. Deï¬nition 4.7 (SHOQK(D) Semantics). Interpretations I = (âI, ·I) are deï¬ned as inDeï¬nition 2.3, where the function ·I is extended to the novel SHOQK(D)-concepts asfollows: ( k R C)I := d â âI | e | (d, e) â RI ⤠k and ( k R C)I := d â âI | e | (d, e) â RI ⥠k. . Let I be an interpretation. Then I satisï¬es a concept equation C = D if CI = DI . I isa model of a TBox T if I satisï¬es all concept equations in T . Similarly, I satisï¬es a roleinclusion R S if RI â SI and a transitivity axiom Trans(R) if RI is a transitive relation. I is a model of a role box R if I satisï¬es all role inclusions and transitivity axioms in R. Let T be a TBox, R a role box, and K a key box. A concept C is satisï¬able w.r.t. T , R, and K iï¬ C, T , R, and K have a common model. C is subsumed by a concept D w.r.t.T , R, and K (written C T ,R,K D) iï¬ C I â DI for all common models I of T , R, and K. 699
Lutz, Areces, Horrocks, & Sattler Note that, due to the requirement that role names used inside number restrictions should besimple, existential and universal value restrictions are not just syntactic sugar: in contrastto number restrictions, they can be used on all roles. It is well-known that, in many expressive description logics, reasoning with TBoxes can be reduced to reasoning without them (Schild, 1991; Horrocks & Sattler, 2001): forSHOQK(D), a concept C is satisï¬able w.r.t. T , R, and K iï¬ the concept âR.C âR. . D â E âR.N D=EâT nominal N used in C, T , or K is satisï¬able w.r.t. to R , K, and the empty TBox, where R is a fresh role not appearing inC, R, and T , and R := R ⪠Trans(R) ⪠S R. role name S used in C, T , R, or K Since subsumption can be reduced to satisï¬ability as described in Section 2, in the followingwe will only consider satisï¬ability of concepts w.r.t. role boxes and key boxes, but withoutTBoxes. We will also generally assume role boxes R to be acyclic, i.e. to satisfy the followingcondition: for each role name R, there are no role names R1, . . . , Rk such that R = R1 = Rkand Ri Ri+1 â R for 1 ⤠i < k. It is not hard to see that this is not a restriction since cycles can be eliminated: if R1, . . . , Rk is a cycle in R, then we have RI = · · · = RI 1 k for all interpretations I. Thus we can simply remove the cycle from R and replace everyoccurrence of R2, . . . , Rk in C, R, and K with R1, and add Trans(R1) if, before the cycleelimination, we had Trans(Ri) for some i with 1 ⤠i ⤠n. Before we turn our attention to the construction of a tableau algorithm for SHOQK(D), let us comment on a few minor diï¬erences between SHOQK(D) as introduced here andthe original version of SHOQ(D) as described in (Horrocks & Sattler, 2001). The maindiï¬erence is that our logic, like the extensions investigated in (Haarslev et al., 2001; Pan &Horrocks, 2002), allows n-ary predicates while Horrocks and Sattler restrict themselves tounary predicates. Moreover, SHOQ(D) as introduced in (Horrocks & Sattler, 2001) usesconcrete roles rather than concrete features, the diï¬erence being that concrete roles are notnecessarily functional. Due to this non-functionality, the original SHOQ(D) admits twovariants âT.P and âT.P of the concrete domain constructor (where T is a concrete role andP a unary predicate). In SHOQK(D), we can simulate the universal variant by writingâg.P gâ since concrete features g are interpreted as partial functions and, in contrast to Horrocks and Sattler, we have the undeï¬nedness constructor gâ available. Except for then-ary predicates which provide important additional expressivity, we view these deviationsas minor ones since it is easy to see that they do not aï¬ect decidability and complexity ofreasoning. 4.2.2 A Tableau Algorithm for SHOQK(D) The basic intuitions of the SHOQK(D) tableau algorithm are similar to the ALCOK(D)algorithm, with one exception: to deal with the various expressive means of SHOQK(D), 700
Keys, Nominals and Concrete Domains ¬( n R C) ( (n â 1) R C) if n ⥠1 ¬( 0 R C) ⥠¬( n R C) ( (n + 1) R C) Figure 10: The SHOQK(D) NNF rewrite rules. it is convenient to introduce a certain abstraction of models, so-called tableaux. The maindiï¬erence between tableaux and models is that, in tableaux, roles declared to be transitiveare not necessarily described by transitive relations. We show that there exists a tableaufor a given concept and key box if and only if they have a common model. The aim of theSHOQK(D) algorithm is then to construct a tableau for its input rather than trying toconstruct a model. To do this, the algorithm employs completion forests as its underlyingdata structure. We ï¬rst introduce tableaux. Let us start by discussing some preliminaries. As for ALCOK(D), we assume all concepts and key boxes to be in NNF, i.e. negation occurs onlyin front of concept names and nominals. We again use Ë Â¬ C to denote the NNF of ¬C. The additional NNF rewrite rules for SHOQK(D) can be found in Figure 10 and complete thosegiven for ALCOK(D) in Figure 7. For a concept D, role box R, and key box K, we deï¬ne cl(D, K) := sub(D) ⪠sub(con(K)) âª Ë Â¬C | C â sub(D) ⪠sub(con(K)) cl(D, R, K) := cl(D, K) ⪠âR.C | R * S and âS.C â cl(D, K). Obviously, the cardinality of cl(D, R, K) is linear in the size of D, R, and K. In whatfollows, we write ND,R,K to denote the set of role names occurring in D, R, or K, and ND,K R cF to denote the sets of concrete features occurring in D or K. We are now ready to deï¬netableaux. Deï¬nition 4.8 (Tableau). Let D be a SHOQK(D)-concept in NNF, R a role box, and Ka path-free key box in NNF. A tableau T for D w.r.t. R and K is a tuple (Sa, Sc, L, E, e, P)such that ⢠Sa, Sc are sets of abstract and concrete individuals, ⢠L : Sa â 2cl(D,R,K) maps each abstract individual to a subset of cl(D, R, K), ⢠E : Sa à Sa â 2ND,R,K R maps pairs of abstract individuals to sets of roles, ⢠e : SaÃND,K â S cF c maps pairs of abstract individuals and concrete features to concrete individuals, ⢠P maps each n-ary concrete predicate in cl(D, R, K) to a set of n-tuples over Sc, ⢠there is an abstract individual s0 â Sa such that D â L(s0), and for all s, t â Sa, C, C1, C2 â cl(D, R, K), R, S â ND,R,K, and for R ST (s, C) := t â Sa | S â E(s, t) and C â L(t), it is the case that: 701
Lutz, Areces, Horrocks, & Sattler (T1) if C â L(s), then Ë Â¬ C / â L(s), (T2) if C1 C2 â L(s), then C1 â L(s) and C2 â L(s), (T3) if C1 C2 â L(s), then C1 â L(s) or C2 â L(s), (T4) if R â E(s, t) and R * S, then S â E(s, t), (T5) if âR.C â L(s) and R â E(s, t), then C â L(t), (T6) if âR.C â L(s), then there is some t â Sa such that R â E(s, t) and C â L(t), (T7) if âS.C â L(s) and R â E(s, t) for some R * S with Trans(R), then âR.C â L(t), (T8) if ( n S C) â L(s), then ST (s, C) n, (T9) if ( n S C) â L(s), then ST (s, C) n, (T10) if either ( n S C) â L(s) and S â E(s, t) or (g1, . . . , gn keyfor C) â K and e(t, gi) is deï¬ned for all 1 ⤠i ⤠n, then C, Ë Â¬ C â© L(t) = â , (T11) if N â L(s) â© L(t), then s = t, (T12) if âg1, . . . , gn.P â L(s), then there are x1, . . . , xn â Sc with e(s, gi) = xi and (x1, . . . , xn) â P(P ), (T13) P (x x = y is satisï¬able, P used in D,K (x 1, . . . , xn) â§ 1,...,xn)âP (P ) x=y (T14) if (g1, . . . , gn keyfor C) â K, C â L(s) â© L(t), and e(s, gi) = e(t, gi) for all 1 ⤠i ⤠n, then s = t, (T15) if gâ â L(s), then e(s, g) is undeï¬ned. Note that the predicate conjunction in (T13) uses a binary inequality predicate. In general,we do not require the concrete domain D to be equipped with such a predicate and thus thispredicate conjunction is not necessarily a D-conjunction. However, it is nevertheless âsafeâto use (T13) in the given form since tableaux are only used in proofs and we do not needa concrete domain reasoner that is capable of deciding the satisï¬ability of this conjunction.The following lemma, whose proof is provided in Appendix C, shows that our deï¬nition oftableaux provides an adequate abstraction of models. Lemma 4.2. Let D be a SHOQK(D)-concept in NNF, R a role box, and K a key box inNNF. Then D is satisï¬able w.r.t. R and K iï¬ D has a tableau w.r.t. R and K. Given Lemma 4.2, in order to decide satisï¬ability of SHOQK(D)-concepts w.r.t. role andkey boxes, we may use a (tableau) algorithm that tries to construct a tableau for the input.In the following, we will describe such an algorithm in detail. As in the previous section, the algorithm works on completion systems. However, in the case of SHOQK(D) the core component of completion systems is a completion forestrather than a completion tree. The reason for this is that some completion rules removenodes and edges from the completion system and in this way can disconnect one tree intotwo subtrees. 702
Keys, Nominals and Concrete Domains Deï¬nition 4.9 (Completion System). Let D be a SHOQK(D)-concept in NNF, R a rolebox, and K a path-free key box in NNF. For each concept ( n R C) â cl(D, R, K) and 1 ⤠i ⤠n, we reserve a concept name AnRC not appearing in cl(D, R, K) and deï¬ne an i extended closure cl+(D, R, K) := cl(D, R, K) ⪠AnRc | 1 , . . . , AnRc n ( n R C) â cl(D, R, K). Let Oa and Oc be disjoint and countably inï¬nite sets of abstract and concrete nodes. Acompletion forest for D, R, and K is a ï¬nite forest F = (Va, Vc, E, L) with nodes Va ⪠Vcsuch that Va â Oa, Vc â Oc, and all nodes from Vc are leaves. The forest is labelled asfollows: ⢠each node a â Va is labelled with a subset L(a) of cl+(D, R, K), ⢠each edge (a, b) â E with a, b â Va is labeled with a non-empty set of role names L(a, b) occurring in D, R, or K, and ⢠each edge (a, x) â E with a â Va and x â Vc is labeled with a concrete feature L(a, x) occurring in D, R, or K. A completion system for D, R, and K is a tuple S = (F, P, â¼c, ) such that ⢠F = (Va, Vc, E, L) is a completion forest for D, R, and K, ⢠P maps each n-ary concrete predicate in cl(D, R, K) to a set of n-tuples in Vc, ⢠â¼c is an equivalence relation on Vc, and ⢠is a linear ordering on Va. A node t â Va is called an R-successor of a node s â Va if, for some R with R * R, we haveR â L(s, t). A node x â Vc is called a g-successor of a node s â Va if L(s, x) = g. Finally, . we write s = t if s and t are R-successors of the same node and there is some AnRC â L(s) i and AnRC â L(t) with i = j. j Some remarks are in order here. Firstly, in contrast to the ALCOK(D) case, the relation is no longer required to respect the level of a node. This is due to the fact that (a) we have to enforce termination artiï¬cially and the mentioned property of was used to ensure âautomaticâ termination, and (b) the level of a node might change since a nodemight become a root node because some completion rules will remove nodes and edges. Secondly, the relation â¼c will be returned by a D-tester, and is used to compute a relation âa which is then used by the tableau algorithm. However, we do not need tocompute the relation âc from âa as in the ALCOK(D) case since all concepts and keyboxes are assumed to be path-free. Thirdly, the new concept names AnRC will be used to ensure that successors of a node i x generated for some ( n R C) â L(x) will not be merged later due to a concept ( n R C ) â L(x): each generated successor is labelled with a diï¬erent concept AnRC ; i since merging two nodes means unifying their node labels, it then suï¬ces to disallow the 703
Lutz, Areces, Horrocks, & Sattler occurrence of distinct concepts AnRC in the same node label through a suitable deï¬nition i of âclashâ. Since SHOQK(D) provides for transitive roles, we need some cycle-detection mechanism in order to guarantee termination of our algorithm: roughly speaking, if we encounter anode which is âsimilarâ to an already existing one, then this node does not need to befurther explored. Speaking in terms of (Horrocks et al., 2000; Baader & Sattler, 2000), weemploy a mechanism called subset blocking. Deï¬nition 4.10 (Blocked). Let be the reï¬exive closure of . A node t â Va is blocked by a node s â Va if L(t) â L(s), and s s , for all s with L(t) â L(s ). Note that, unlike to what is done, e.g., in (Horrocks et al., 2000), the blocking node is notnecessarily an ancestor of the blocked node, but can be anywhere in the forest. It may evenbe that blocked nodes have unblocked successors. This modiï¬cation is used later to obtaina NExpTime upper bound. To decide the satisï¬ability of an ALCOK(D)-concept D w.r.t. a role box R and a path- free key box K (where D and K are in NNF), the tableau algorithm is started with theinitial completion system SD = (FD, Pâ , â , â ), where FD = (s0, â , â , s0 â D) and Pâ maps each P â ΦD occurring in D and K to â . Then the algorithm repeatedly applies completion rules. Before the actual rules are given,we introduce some new notions: ï¬rstly, we deï¬ne the equivalence relation âa over Va asfollows: s âa t if one of the following conditions is satisï¬ed: ⢠N â L(s) â© L(t) for some nominal N or ⢠(g1 . . . , gn keyfor C) â K, C â L(s)â©L(t), and there are xi, yi such that gi â E(s, xi)â© E(t, yi) and xi â¼c yi for 1 ⤠i ⤠n. Intuitively, two abstract nodes related via the âa relation describe the same individual ina tableau. Secondly, we use the following abbreviations in the formulation of the rules (written in italic): ⢠To remove an abstract node s and all its incoming and outgoing edges, remove s from Va and each (s, t) and (t, s) from E for all t â Va ⪠Vc. ⢠Adding a g-successor of an abstract node s means doing nothing if there exists a g-successor x â Vc of s and, otherwise, adding E(s, x) = g for some x â Oc that doesnot yet occur in the completion forest. ⢠To update the relation â¼c, a D-tester is asked to decide the satisï¬ability of the D- conjunction ζS := P (x1, . . . , xn) â§ x = y P used in D,K xâ¼cy (x1,...,xn)âP(P ) and returns, in case that this conjunction is satisï¬able, an âupdatedâ concrete equiv-alence â¼c as deï¬ned in Deï¬nition 4.1. 704
Keys, Nominals and Concrete Domains Concerning the predicate conjunction used in updates, recall that we can w.l.o.g. assumethe concrete domain to contain an equality predicate as discussed after Deï¬nition 4.1. The completion rules are given in Figure 11. We generally assume that new nodes x are introduced into the completion forest such that y x for all already existing nodes y. Before further describing the tableau algorithm, we comment on the completion rules. Therules R , R , Râc, and Rch are non-deterministic, i.e., their application has more than onepossible outcome. For the Râc rule, this is due to the update operation performed on â¼cusing the D-tester: as discussed at the end of Section 4.1, computing a concrete equivalencefor a given D-conjunction may result in a high degree of non-determinism. Please note that,in contrast to ALCOK(D), we now only need to call the D-tester in this ruleâand not aftereach rule application. Next, the Râa rule takes care of abstract nodes related via âa. Since all the nodes from a âa-equivalence class denote the same individual, we choose only one representative whosenode label contains the labels of all other nodes in the class. This representative simplyis the -minimal node of the equivalence class and the Râa rule performs the appropriate copying of node labels. The R rule is the only rule to remove nodes and edges: it removes a surplus R-successor t of a node s with ( n R C) â L(s). Since the subtree below t is not removed, tâs successors are new, additional root nodes. This behavior is the reason why we work on a completionforest. As in the ALCOK(D) case, the tableau algorithm stops applying rules if it ï¬nds an obvious contradiction, a âclashâ, or if no more completion rules are applicable. Deï¬nition 4.11 (Clash). Let S = (F, P, â¼c, ) be a completion system for D, R and K,and F = (Va, Vc, E, L). Then S is said to contain a clash if one of the following conditionsapplies: (C1) for some concept name A â NC and some node s â Va, A, ¬A â L(s); (C2) the D-conjunction ζS deï¬ned above is not satisï¬able; . (C3) s = s for some s â Va; (C4) for some s â Va and g â NcF, we have gâ â L(s) and s has a g-successor. A completion system not containing a clash is called clash-free. The completion system iscomplete if none of the completion rules is applicable. Due to the simplicity of the algorithm, we refrain from describing it in pseudo-code nota-tion: the algorithm starts with the initial completion system and then repeatedly appliesthe completion rules, checking for clashes after each rule application. If a clash is de-tected, it returns unsatisï¬able. If a complete and clash-free completion system is found,then the algorithm returns satisï¬able. Note that, since some of the completion rules arenon-deterministic, the algorithm is also non-deterministic. Details of the proof of termination, soundness, and completeness are given in Ap- pendix C. Unfortunately, we have to leave the complexity of the algorithm as an openproblem: it is not hard to prove that it runs in double exponential time, but it is not clearwhether exponential time also suï¬ces. However, we can still use the algorithm to obtain a 705
Lutz, Areces, Horrocks, & Sattler R if C1 C2 â L(s), s is not blocked, and C1, C2 â L(s), then L(s) := L(s) ⪠C1, C2 R if C1 C2 â L(s), s is not blocked, and C1, C2 â© L(s) = â , then L(s) := L(s) ⪠C for some C â C1, C2 Râ if âR.C â L(s), s is not blocked, and s has no R-successor t with C â L(t)then create a new node t such that t t for all t â Va and set E(s, t) := R and L(t) := C R if ( n S C) â L(s), s is not blocked, and there are no n S-successors . t1, . . . , tn of s with C â L(ti) and ti = tj for 1 ⤠i < j ⤠n, then create n new nodes t1, . . . , tn s.t. t ti for 1 ⤠i ⤠n and all t â Va, and set E(s, ti) := S and L(ti) := C, AnSC for 1 ⤠i ⤠n i R if ( n S C) â L(s), s is not blocked, s has n + 1 S-successors t0, . . . , tn with C â L(ti) for 0 ⤠i ⤠n, then choose i, j such that ti tj, set L(ti) := L(ti) ⪠L(tj), L(s, ti) := L(s, ti) ⪠L(s, tj), and remove tj and all its incomingand outgoing edges Râc if âg1, . . . , gn.P â L(s), s is not blocked, and there are no gi-successors xi with (x1, . . . , xn) â P(P ) then add a gi-successor of s for each 1 ⤠i ⤠n, for yi the gi-successor of s, add (y1, . . . , yn) to P(P ), andupdate â¼c Râ if âR.C â L(s), s is not blocked, and there is an R-successor t of s with C / â L(t), then L(t) := L(t) ⪠C Râ+ if âS.C â L(s), s is not blocked, there is some R with Trans(R) and R * S, and an R-successor t of s with âR.C / â L(t), then L(t) := L(t) ⪠âR.C Rch if s is an S-successor of s and ( n S C) â L(s ) or s has gi-successors xi for all 1 ⤠i ⤠n and (g1, . . . gn keyfor C) â K ands is not blocked and C, Ë Â¬ C â© L(s) = â , then L(s) := L(s) ⪠E for some E â C, Ë Â¬ C Râa if s âa t, L(t) â L(s), s t, and s is not blocked, then set L(s) := L(s) ⪠L(t) Figure 11: The completion rules for SHOQK(D). 706
Keys, Nominals and Concrete Domains tight complexity bound for SHOQK(D): the following corollary is an easy by-product ofthe correctness proofs (for a proof see again Appendix C). Corollary 4.3. If a SHOQK(D)-concept D is satisï¬able w.r.t. a role box R and a path-freekey box K, then D is satisï¬able w.r.t. R and K in a model of size at most |âI| ⤠2m form = cl+(D, R, K). Thus the following is an alternative algorithm for deciding satisï¬ability of a SHOQK(D)-concept D w.r.t. a role box R and a path-free key box K: ï¬rst, guess an interpretationI with cardinality of âI bounded by 2m, using placeholder variables from Oc instead ofconcrete values in the interpretation of concrete features. Let Vc be the set of variablesfrom Oc occuring in I. Additionally guess an âinterpretationâ P for the concrete domainpredicates: just as in completion forests, P maps each n-ary concrete predicate used in D orK to an n-ary relation on Vc. Then perform standard (polynomial-time) model checking toensure that I is a model of D. In doing this, treat concepts of the form âg1, . . . , gn.P usingthe interpretation of predicates P. It is easily checked in polynomial time that I is also amodel of R and Kâfor the latter, assume that all placeholder variables stand for diï¬erentvalues. Finally, use the concrete domain D-tester to check whether the conjunction P (x1, . . . , xn) P used inD,K (x1,...,xn)âP(P ) is satisï¬able. Answer âyesâ if it is and ânoâ otherwise. Since this algorithm can clearly beimplemented in NExpTime provided that there is a D-tester running in non-deterministicpolynomial time, we obtain the following: Theorem 4.4. Let D be a key-admissible concrete domain such that extended D-satisï¬abilityis in NP, then SHOQK(D)-concept satisï¬ability w.r.t. TBoxes, role boxes, and path-freekey boxes is in NExpTime. 5. Conclusion In this paper, we have identiï¬ed key constraints as an interesting extension of descriptionlogics with concrete domains. Starting from this observation, we introduced a number ofnatural description logics and provided a comprehensive analysis of the decidability andcomplexity of reasoning. The main observation of our investigations is that key boxes canhave dramatic consequences on the complexity of reasoning: for example, the PSpace-complete DL ALC(D) becomes NExpTime-complete if extended with path-free, unary,Boolean key boxes and undecidable if extended with path-free, unary, non-Boolean keyboxes. Thus the eï¬ect of our key boxes on the complexity are quite diï¬erent from the eï¬ectof key assertions where only abstract features are allowed (Calvanese et al., 2000): theseabstract key assertions can be said to be âfor freeâ since they do not increase the complexityfor expressive description logics. We show that the restriction to Boolean key boxes (in the ALCOK(D) case) and to path-free key boxes (in the SHOQK(D) case) yield decidabile and NExpTime-completereasoning problems. We selected ALC(D) and SHOQ(D) as the basis for our analysis since, 707
Lutz, Areces, Horrocks, & Sattler in our opinion, these are the most fundamental description logics with concrete domains.Going one step further, it would be interesting to combine key boxes with other extensionsof concrete domains, such as the ones presented by Lutz (2003, 2004). To name only onepossibility, the extension of both ALCOK(D) and SHOQ(D) with inverse roles seems tobe a natural idea. Note that inverse roles interact with several of the available means ofexpressivity: while ALC with inverse roles is PSpace complete (Horrocks, Sattler, & Tobies,1999), ALCO with inverse roles is ExpTime-complete (Areces et al., 1999) and ALC(D)with inverse roles even NExpTime-complete (Lutz, 2004). Other options for future research are more closely related to the material presented in this paper. For example, is SHOQK(D)-concept satisï¬ability still decidable if we dropthe requirement of key boxes to be path-free? Moreover, we had to leave the exact timerequirements of our tableau algorithm as an open problem. If this algorithm runs in (non-deterministic) exponential time, it directly yields Theorem 4.4 rather than via a boundedmodel property. Acknowledgments We would like to thank the anonymous reviewers for valuable comments. This paper is anextended version of (Lutz, Areces, Horrocks, & Sattler, 2003). Appendix A. Proofs of Section 3.2 We prove that D2-satisï¬ability can be decided in PTime. Proposition A.1. D2-satisï¬ability is in PTime. Proof. Let c be a D2-conjunction. We show that c is satisï¬able iï¬ none of the followingconditions applies: 1. c contains a conjunct â¥D (x); 2 2. c contains conjuncts bit0i (x) and bit1i (x); k k 3. c contains conjuncts bitni (x) and bitmj(x) with k = ; k 4. c contains conjuncts bitni (x) and bitni (x); k k 5. c contains conjuncts bitni (x), bit0j (x), and bit1j (x). k k k It is easily seen that c is unsatisï¬able if one of the conditions applies. Assume now thatConditions 1 to 5 do not apply to c and let X be the set of variables used in c. For eachx â X, set t(x) = k if bitni (x) â c for some n, i â N.6 If bitni (x) / â c for all n, i, k â N, then k k set t(x) = r for some r not appearing as an index ·r to a predicate in c. The mapping t iswell-deï¬ned since c is ï¬nite, Condition 3 does not apply, and the only predicates availableare bitni (·), ⥠(·), and (·). We deï¬ne a solution δ for c as follows: for each x â X, set k D2 D2 δ(x) to the bit vector v â BVt(x) in which the iâth bit is 1 if bit1i (x) â c or bit0i (x) â c, t(x) t(x) and 0 otherwise. It remains to prove that δ is indeed a solution for c: 6. We use âP (x) â câ as an abbreviation for âP (x) is a conjunct in câ. 708
Keys, Nominals and Concrete Domains ⢠Let bit0i (x) â c. Then t(x) = k and thus δ(x) â BV k k . Since Condition 2 does not apply, we have bit1i (x) / â c. Moreover, non-applicability of Condition 4 implies k bit0i (x) / â c. By deï¬nition of δ, the iâth bit of δ(x) is thus 0. k ⢠Let bit1i (x) â c. Then t(x) = k and δ(x) â BV k k . By deï¬nition of δ, the iâth bit of δ(x) is 1. ⢠Let bit0i (x) â c. If t(x) = k, then δ(x) / â BV )D2 and we are k k . Thus δ(x) â (bit0ik done. If t(x) = k, then the iâth bit of δ(x) is 1 by deï¬nition of δ and thus again δ(x) â (bit0i )D2 . k ⢠Let bit1i (x) â c. If t(x) = k, then δ(x) / â BV k k and we are done. If t(x) = k, then bitnj (x) â c for some n, j â N. Since Condition 5 does not apply, we thus have k bit0i (x) / â c. Moreover, non-applicability of Condition 4 yields bit1i (x) / â c. Thus, by k k deï¬nition of δ, the iâth bit of δ(x) is 0. It is obvious that the listed properties can be checked in polynomial time. Appendix B. Proofs of Section 4.1 We prove termination, soundness, and completeness of the ALCOK(D) tableau algorithmpresented in Section 4.1, starting with termination. We start with establishing a few notionsand technical lemmas. Let C be a concept and K a key box. We use |C| to denote the length of C, i.e. the number of symbols used to write it down, and |K| to denote |C|. For (u1,...,uk keyfor C)âK a path u = f1 · · · fkg, we use |u| to denote k + 1. The role depth of concepts is deï¬nedinductively as follows: rd(A) = rd(N ) = rd(gâ) = 0rd(âu1, . . . , un.P ) = max|ui| | 1 ⤠i ⤠n â 1 rd(C D) = rd(C D) = maxrd(C), rd(D) rd(âR.C) = rd(âR.C) = rd(C) + 1. The following series of lemmas will eventually allow us to prove termination. Lemma B.1. There is a constant k such that, if the tableau algorithm is started on inputC0, K and T = (Va, Vc, E, L) is a completion tree constructed during the run of the algorithm,then Va ⤠2|C0|k and Vc ⤠2|C0|k . Proof. Using induction on the number of rule applications and a case distinction accordingto the applied rule, it is straightforward to show that C â L(a) implies rd(C) ⤠|C0| â levT(a) (â) for all constructed completion trees T. We omit the details but note that, (1) for treatingthe Rch rule, one needs to employ the fact that K is Boolean and thus only adds conceptsof role depth 0 to node labels, and (2) for treating the Rp rule, we use that a b implies levT(a) ⤠levT(b). 709
Lutz, Areces, Horrocks, & Sattler This implies an upper bound on the depth of constructed completion trees: ï¬rst, only the Râ and Râc rules generate new nodes, and an application of either rule to a nodea â Va implies L(a) = â and thus levT(a) ⤠|C0| by (â). Second, each new (abstract orconcrete) node b generated by an application of these rules to a node a â Va clearly satisï¬eslevT(b) ⤠levT(a) + max(1, mpl(C0)), where mpl(C0) denotes the maximum length of pathsin C0 (note that concepts in K may not contain any paths since it is Boolean). Sincempl(C0) ⤠|C0|, the above observations imply that the depth of constructed completiontrees is bounded by 2 · |C0|. Now for the out-degree. If a node a is generated, then this is due to the application of a rule Râ or Râc and, initially, a has at most one successor. Let us analyze the numberof successors generated by later applications of the rules Râ and Râc to a: these rulescan be applied at most once for each concept of the form âR.C or âu1, . . . , un.P in L(a).By deï¬nition of cl(C0, K) and since K is Boolean, the number of such concepts per nodelabel is bounded by sub(C0) ⤠|C0|. Moreover, each rule application creates at most |C0|successors. Hence, the out-degree of constructed completion trees is bounded by |C0|2 + 1. Lemma B.2. There is a constant k such that, if the tableau algorithm is started with C0, K,then, in every recursion step, the while loop terminates after at most 2|C0|k steps. Proof. Fix an argument S = (T, P, , â¼) with T = (Va, Vc, E, L) passed to the sat function, let â¼1, â¼2, . . . be the sequence of concrete equivalences computed in the while loop, and letâ1c, â2c, . . . be the corresponding âc relations. Since test(ζS) calls a D-tester, each of thesecalls indeed terminates. We show that â¼1 â¼2 · · · , (â) which implies Lemma B.2: by Lemma B.1, there exists a constant k such that Vc ⤠2|C0|k .Hence, we have â¼i ⤠22·|C0|k which, together with (â), implies that the number of stepsperformed by the while loop is also bounded by 22·|C0|k . Now for the proof of (â). If the while loop reaches the i-th step, then we had â¼iâ1 = âiâ1 c after step i â 1. Since â¼iâ1 â âiâ1 c by deï¬nition, this implies â¼iâ1 âiâ1 c . By deï¬nition of ζS, it is easy to see that âiâ1 â â¼ c i for i ⥠1. Hence â¼iâ1 â¼i. Lemma B.3. There is a constant k such that, if the tableau algorithm is started with C0, K,then the number of recursive calls is bounded by 2(|C0|+|K|)k . Proof. It obviously suï¬ces to establish an appropriate upper bound on the number of ruleapplications. The R , R , Râ, and Râc rules can be applied at most once for each conceptin a node label. By Lemma B.1, the number of nodes is at most exponential in |C0| + |K|.Since neither nodes nor concepts in node labels are ever deleted, the fact that node labelsare subsets of cl(C0, K) thus implies that the number of applications of these rules is at mostexponential in |C0| + |K|. The same holds for the rules Râ and Rp, which can be appliedat most once for every concept C â cl(C0, K) and every pair of (abstract) nodes. Finally,the number of Rch applications is at most exponential in |C0| + |K| since this rule can beapplied at most once for every abstract node and every key assertion in K. 710
Keys, Nominals and Concrete Domains Termination is now an obvious consequence of Lemmas B.2 and B.3. Corollary B.4 (Termination). The tableau algorithm terminates on any input. Let us now prove soundness of the algorithm. Lemma B.5 (Soundness). If the tableau algorithm returns satisï¬able, then the input conceptC0 is satisï¬able w.r.t. the input key box K. Proof. If the tableau algorithm returns satisï¬able, then there exists a complete and clash-free completion system S = (T, P, , â¼) for C0. Let T = (Va, Vc, E, L). By deï¬nition of the tableau algorithm, there is a completion system S = (T, P, , â¼ ) such that a call to test(ζS ) returned â¼. Moreover, we have â¼ = âc in S. Thus, there exists a solution δ forζS such that δ(x) = δ(y) iï¬ x âc y. (â ) Clearly, δ is also a solution for ζS: since the second component P is the same in S and S ,δ is a solution for the ï¬rst part P (x1, . . . , xn) of ζS. Moreover, for P used in C (x1,...,xn)âP(P ) each conjunct =(x, y) from the second part of ζS, we have x âc y by deï¬nition of ζS andthus δ(x) = δ(y) by (â ). We now use S and δ to construct an interpretation I by setting âI = a â Va | there is no b â Va such that a âa b and b a ⪠w AI = a â âI | A â L(a) a â â N I = I | N â L(a) if there is an a â âI such that N â L(a) w otherwise RI = (a, b) â âI à âI | there are a , b â Va such that a âa a , b âa b , and b is an R-successor of a gI = (a, δ(x)) â âI à âD | x is a g/âa-neighbor of a for all A â NC, N â NO, R â NR, and g â NcF. We ï¬rst show that I is well-deï¬ned: ⢠N I is a singleton for each N â NO. Assume that there exist a, b â âI such that a = b and N â L(a) â© L(b). By deï¬nition of âa (Deï¬nition 4.3), N â L(a) â© L(b) impliesa âa b. This, together with a, b â âI, yields a b and b a, contradicting being a linear ordering. ⢠f I is functional for each f â NaF. Assume that there exist a, b, c â âI such that (a, b), (a, c) â f I and b = c. Then there exist a1, a2, b , c â Va such that a âa a1 âaa2, b âa b , c âa c , b is an f -successor of a1, and c is an f -successor of a2. Bydeï¬nition of âa, we thus have b âa c implying b âa c. Since b, c â âI, this yieldsb c and c b, a contradiction. ⢠gI is functional for each g â NcF. Assume that there exist an a â âI and x, y â Vc such that (a, δ(x)), (a, δ(y)) â f I and δ(x) = δ(y). Then x and y are both g/âa-neighbors of a. By deï¬nition of âc, we thus have x âc y implying δ(x) = δ(y) by (â ),a contradiction. 711
Lutz, Areces, Horrocks, & Sattler Now we show the following claim. In the proof, we use the notion of f1 · · · fk/âa-neighbors(with f1, . . . fk abstract features), which is deï¬ned analogously to u/âa-neighbors for paths u. Claim 1: For all a â âI and all paths u, we have uI(a) = α iï¬ there is a ui/ âa-neighborx of a with δ(x) = α. Proof: Let u = f1 · · · fkg. Using induction on i it is easily proved that, for all i and allb â âI, we have f I(· · · (f I(a)) · · · ) = b iï¬ there is a f i 1 1 · · · fi/âa-neighbor b of a with b âa b . Thus we have in particular that f I(· · · (f I(a)) · · · ) = b iï¬ there is an f k 1 1 · · · fk/âa- neighbor b of a with b âa b . To prove the claim, it hence remains to use the deï¬nitionof gI together with (â ). The following claim is central for showing that I is a model for C0 and K. Claim 2: For all a â âI and C â cl(C0, K), if C â L(a), then a â CI. Since C0 is in the label of the root node, Claim 2 clearly implies that I is a model for C0.Moreover, we can use it to prove that I satisï¬es all key assertions (u1, . . . , un keyfor C) inK: ï¬x a, b â CI such that uI (a) = uI (b) for 1 ⤠i ⤠n. Non-applicability of Rch yields i i C, Ë Â¬ C â© L(a) = â . If Ë Â¬ C â L(a), then Claim 2 implies a â ( Ë Â¬ C)I in contradiction to a â CI . Thus we obtain C â L(a). In an analogous way, we can argue that C â L(b). SinceuI (a) and uI (b) are deï¬ned for 1 ⤠i ⤠n, Claim 1 yields that a has a u i i i/âa-neighbor xi with δ(xi) = uI(a) and b a u (b) for 1 ⤠i ⤠n. Thus the i i/âa-neighbor yi with δ(yi) = uI i fact that uI (a) = uI (b) yields δ(x i i i) = δ(yi) for 1 ⤠i ⤠n. By (â ) we obtain xi âc yi and thus xi â¼ yi for 1 ⤠i ⤠n. By deï¬nition of âa, we thus get a âa b. Since a, b â âI, weobtain a b and b a by deï¬nition of âI and thus a = b. It remains to prove Claim 2, which we do using structural induction: ⢠C is a concept name or a nominal. Easy by construction of I. ⢠C = ¬D. Since C â cl(C0, K), C is in NNF and D is a concept name. Since S is clash-free, C â L(a) implies D / â L(a). Thus, a / â DI by construction of I, which yields a â (¬D)I . ⢠C = âu1, . . . , un.P . Since the Râc rule is not applicable, there exist x1, . . . , xn â Vc such that xi is a ui/âa-neighbor of a for 1 ⤠i ⤠n and (x1, . . . , xn) â P(P ). Claim 1yields uI (a) = δ(x i i) for 1 ⤠i ⤠n. Since (x1, . . . , xn) â P (P ) and δ is a solution for ζS, we have (δ(x1), . . . , δ(xn)) â P D and thus a â CI. ⢠C = gâ. Since S is clash-free, there exists no x â Vc such that x is g/âa-neighbor of a. Thus by Claim 1 there is no α such that (a, α) â gI . ⢠C = D E or C = D E. Straightforward using completeness and the induction hypothesis. ⢠C = âR.D. Since the Râ rule is not applicable, a has an R/âa-neighbor b such that D â L(b). Let b be minimal w.r.t. such that b âa b . By deï¬nition of I, we have (a, b ) â RI . Non-applicability of the Rp rule yields D â L(b ). By induction, we getb â DI and thus a â CI . 712
Keys, Nominals and Concrete Domains ⢠C = âR.D. Let (a, b) â RI . By deï¬nition of I, this implies that there exist a , b â Va such that a is minimal w.r.t. and a âa a , b is minimal w.r.t. and b âa b , and b is an R-successor of a . Since b is clearly an R/âa-neighbor of a, non-applicability ofRâ yields D â L(b ), which implies D â L(b) due to the non-applicability of Rp. Byinduction, we get b â DI . Since this holds independently of the choice of b, we obtaina â (âR.D)I . Lemma B.6 (Completeness). If the input concept C0 is satisï¬able w.r.t. the input key boxK, then the tableau algorithm returns satisï¬able. Proof. Let I be a model of C0 and K. We use I to âguideâ (the non-deterministic partsof) the algorithm such that it constructs a complete and clash-free completion system. Acompletion system S = (T, P, , â¼) with T = (Va, Vc, E, L) is called I-compatible if there exist mappings Ï : Va â âI and Ï : Vc â âD such that (Ca) C â L(a) â Ï(a) â CI (Cb) b is an R-successor of a â (Ï(a), Ï(b)) â RI (Cc) x is a g-successor of a â gI (Ï(a)) = Ï (x) (Cd) (x1, . . . , xn) â P(P ) â (Ï (x1), . . . , Ï (xn)) â P D (Ce) x â¼ y â Ï (x) = Ï (y). We ï¬rst establish the following claim: Claim 1: If a completion system S is I-compatible, then (i) a âa b implies Ï(a) = Ï(b)and (ii) x âc y implies Ï (x) = Ï (y). Proof: We show by induction on i that a âia b implies Ï(a) = Ï(b) (see Deï¬nition 4.3),which yields (i). ⢠Start. If a â0a b, then there exists a nominal N such that N â L(a) â© L(b). By (Ca) we obtain Ï(a) â N I and Ï(b) â N I , which yields Ï(a) = Ï(b) by deï¬nition of thesemantics. ⢠Step. For a âia b, we distinguish three cases: 1. If a âiâ1 a b, then Ï(a) = Ï(b) by induction. 2. There is a c â Va and an f â NaF such that both a and b are f /âiâ1 a -neighbors of c. Hence, there exist c1, c2 â Va such that c âiâ1 a c1 âiâ1 a c2, a is an f -successor of c1, and b is an f -successor of c2. By induction, we have Ï(c) = Ï(c1) = Ï(c2).Thus (Cb) yields (Ï(c), Ï(a)), (Ï(c), Ï(b)) â f I , which implies Ï(a) = Ï(b) bydeï¬nition of the semantics. 3. There exist (u1, . . . , un keyfor C) â K, ui/âiâ1 a -neighbors xi of a and ui/âiâ1 a - neighbors yi of b for 1 ⤠i ⤠n such that C â L(a)â©L(b) and xi â¼ yi for 1 ⤠i ⤠n.(Ca) yields a, b â CI . Using induction, (Cb), and (Cc), it is straightforward to 713
Lutz, Areces, Horrocks, & Sattler show that uI (Ï(a)) = Ï (x (Ï(b)) = Ï (y i i) and uI i i) for 1 ⤠i ⤠n. By (Ce), this implies uI (Ï(a)) = uI (Ï(b)) for 1 ⤠i ⤠k. Since I is a model of the key box K, i i this yields Ï(a) = Ï(b) by deï¬nition of the semantics. Now for Part (ii) of Claim 1. If x âc y, then either x â¼ y or there is an a â Va and a g â NcFsuch that both x and y are g/âa-neighbors of a. In the former case, (Ce) yields Ï (x) = Ï (y).In the latter case, Part (i) of the claim and (Cc) yields (Ï(a), Ï (x)), (Ï(a), Ï (y)) â gI whichimplies Ï (x) = Ï (y). This ï¬nishes the proof of Claim 1. We can now show that the completion rules can be applied such that I-compatibility ispreserved. Claim 2: If a completion system S is I-compatible and a rule R is applicable to S, then Rcan be applied such that an I-compatible completion system S is obtained. Proof: Let S be an I-compatible completion system, let Ï and Ï be functions satisfying(Ca) to (Ce), and let R be a completion rule applicable to S. We make a case distinctionaccording to the type of R. R The rule is applied to a concept C1 C2 â L(a). By (Ca), C1 C2 â L(a) implies Ï(a) â (C1 C2)I and hence Ï(a) â CI and Ï(a) â CI. Since the rule adds C 1 2 1 and C2 to L(a), it yields a completion system that is I-compatible via the same Ï and Ï . R The rule is applied to C1 C2 â L(a). C1 C2 â L(a) implies Ï(a) â CI or Ï(a) â CI. 1 2 Since the rule adds either C1 or C2 to L(a), it can be applied such that it yields acompletion system that is I-compatible via the same Ï and Ï . Râ The rule is applied to a concept âR.C â L(a), generates a new R-successor b of a and sets L(b) = C. By (Ca), we have Ï(a) â (âR.C)I and, hence, there exists a d â âIsuch that (Ï(a), d) â RI and d â CI . Set Ï := Ï âª b â d. It is readily checkedthat the resulting completion system is I-compatible via Ï and Ï . Râ The rule is applied to a concept âR.C â L(a) and adds C to L(b) of an existing R/âa-neighbor b of a. Hence, there exists an a such that a âa a and b is an R-successor of a . By Part (i) of Claim 1, we have Ï(a) = Ï(a ). Thus, by (Ca) we haveÏ(a ) â (âR.C)I while (Cb) yields ((Ï(a ), Ï(b)) â RI . By deï¬nition of the semantics,Ï(b) â CI and thus the resulting completion system is I-compatible via Ï and Ï . (i) (i) Râc The rule is applied to a concept âu1, . . . , un.P â L(a) with ui = f · · · f g 1 k i for i (i) 1 ⤠i ⤠n. The rule application generates new abstract nodes b and x j j for 1 ⤠i ⤠n and 1 ⤠j ⤠ki such that (i) (i) â b is an f -successor of a for 1 ⤠i ⤠n, 1 1 (i) (i) (i) â b is an f -successor of b for 1 ⤠i ⤠n and 2 ⤠j ⤠k j j jâ1 i, (i) â xi is gi-successor of b for 1 ⤠i ⤠n, and ki â (x1, . . . , xn) â P(P ). 714
Keys, Nominals and Concrete Domains (i) By (Ca), we have Ï(a) â (âu1, . . . , un.P )I. Hence, there exist d â â j I for 1 ⤠i ⤠n and 1 ⤠j ⤠ki and α1, . . . , αn â âD such that (i) (i) â (Ï(a), d ) â (f )I for 1 ⤠i ⤠n, 1 1 (i) (i) (i) â (d , d ) â (f )I for 1 ⤠i ⤠n and 2 ⤠j ⤠k jâ1 j j i, (i) â gI (d ) = α i k i for 1 ⤠i ⤠n, and i â (α1, . . . , αn) â P D. (i) (i) Set Ï := Ï âª b â d and Ï := Ï âª x 1â¤iâ¤n and 1â¤jâ¤k i â αi. i j j 1â¤iâ¤n The resulting completion system is I-compatible via Ï and Ï . Rch The rule is applied to an abstract node a and a key assertion (u1, . . . , un keyfor C) â K and non-deterministically adds either C or Ë Â¬ C. By deï¬nition of the semantics, Ï(a) â CI or Ï(a) â ( Ë Â¬ C)I . Hence, Rch can be applied such that the resulting completion system is I-compatible via Ï and Ï . Rp The rule is applied to a concept C â L(a) and adds C to the label L(b) of a node b with a âa b. By (Ca), we have Ï(a) â CI. Since Claim 1 yields Ï(a) = Ï(b), it followsthat the resulting completion system is I-compatible via Ï and Ï . Finally, we show that I-compatibility implies clash-freeness. Claim 3: Every I-compatible completion system is clash-free. Proof: Let S = (T, P, , â¼) be an I-compatible completion system. Consider the three kinds of a clash: ⢠Due to (Ca), a clash of the form A, ¬A â L(a) clearly contradicts the semantics. ⢠Assume that there are a â Va and x â Vc such that gâ â L(a) and x is g/âa-neighbor of a. Then there exists b â Va such that a âa b and x is g-successor of b. By Claim 1,a âa b implies Ï(a) = Ï(b). Thus, gâ â L(a) and (Ca) give Ï(b) â (gâ)I. We obtain acontradiction since (Cc) yields (Ï(b), Ï (x)) â gI . ⢠Properties (Cd) and (Ce) and Part (ii) of Claim 1 imply that Ï is a solution for ζS. Thus, S is concrete domain satisï¬able. We can now describe the âguidanceâ of the tableau algorithm by the model I in detail:we ensure that, at all times, the considered completion systems are I-compatible. Thisobviously holds for the initial completion system SC = (T , P = (a 0 C0 â , â , â ) with TC0 0, â , â , a0 â C ). We guide the non-deterministic test function such that, when given a predicate conjunctionζS with set of variables Vc â Oc as input, it returns the relation â¼ deï¬ned by setting x â¼ yiï¬ Ï (x) = Ï (y) for all x, y â V . The relation â¼ is a concrete equivalence since Ï is a solutionfor ζS (see above). With this guidance, (Ce) is obviously satisï¬ed after each call to test, andthe other properties are not aï¬ected by such a call. According to Claim 2, we can apply the 715
Lutz, Areces, Horrocks, & Sattler completion rules such that I-compatibility is preserved. By Corollary B.4, the algorithmalways terminates, hence also when guided in this way. Since, by Claim 3, we will not ï¬nda clash, the algorithm returns satisï¬able. The tableau algorithm yields decidability and a tight upper complexity bound for ALCOK(D)-concept satisï¬ability w.r.t. key boxes. Theorem B.7 (Theorem 4.1 of Section 4.1). Let D be a key-admissible concrete domain.If extended D-satisï¬ability is in NP, then ALCOK(D)-concept satisï¬ability w.r.t. Booleankey boxes is in NExpTime. Proof. Corollary B.4 and Lemmas B.5 and B.6 yield decidability of ALCOK(D)-conceptsatisï¬ability w.r.t. Boolean key boxes. For complexity, Lemma B.3 provides an exponentialbound on the number of recursive calls. Hence, it remains to show that each single recursionstep needs at most exponential time. By Lemma B.2, the while loop terminates after at mostexponentially many steps. In each such step, we compute the relations âa and âc, whichare used in the construction of the predicate conjunction ζS and for checking terminationof the while loop. Since, by Lemma B.1, there exists an exponential bound on the numberof abstract and concrete nodes in the completion system S, this can obviously be done inexponential time. Moreover, Lemma B.1 implies that the size of ζS is at most exponential.This together with the fact that extended D-satisï¬ability is in NP implies that the call totest needs at most exponential time. All remaining tasks (checking for clashes, completeness,and rule applicability) can clearly also be performed in exponential time. Appendix C. Proofs of Section 4.2 We ï¬rst provide the proof of Lemma 4.2 which shows that the notion of tableaux introducedin Section 4.2 is an adequate abstraction of models. Lemma C.1 (Lemma 4.2 in Section 4.2). Let D be a SHOQK(D)-concept in NNF, R arole box, and K a path-free key box in NNF. Then D is satisï¬able w.r.t. R and K iï¬ D hasa tableau w.r.t. R and K. Proof. For the âonly-ifâ direction, we construct a tableau T from a common model I of D,R, and K as follows: Sa := âI Sc := x â âD | gI(s) = x for some s â Sa L(s) := C â cl(D, R, K) | s â CI E(s, t) := S â ND,R,K | (s, t) â SI R e(s, g) := gI (s) if gI (s) is deï¬ned P(P ) := (x1, . . . , xn) â Sn | c (x1, . . . , xn) â P D. It can be easily veriï¬ed that T is a tableau for D w.r.t. R and K: the proof that T satisï¬es(T1) â (T9) is identical to the corresponding cases in (Horrocks et al., 2000; Horrocks &Sattler, 2001); (T10) holds by deï¬nition of L; (T11) by deï¬nition of L and the fact thatnominals are interpreted as singleton sets; (T12) by deï¬nition of L, e, and P together with 716
Keys, Nominals and Concrete Domains the semantics of concepts âg1, . . . , gn.P ; (T13) since the identity function on Sc is clearly asolution for the listed predicate conjunction; (T14) by deï¬nition of L and e together withthe semantics of key constraints; and ï¬nally (T15) by deï¬nition of L and e together withthe semantics of concepts gâ. For the âifâ direction, let T = (Sa, Sc, L, E, e, P) be a tableau for D w.r.t. R and K and let δ be a solution for the predicate conjunction in (T13). We construct a model I for Das follows: âI := Sa AI := s â âI | A â L(s) for concept names A N I := s â âI | N â L(s) for nominals N SI ⪠(s, t) | R â E(s, t) for R â N S * R R NcF with not Trans(R) RI := S=R (s, t) | R â E(s, t)+ for R â NR NcF with Trans(R) δ(x) if e(s, g) = x gI (s) := for g â N undeï¬ned if e(s, g) is undeï¬ned cF. Due to (T11), the interpretation of nominals is a singleton. Moreover, the interpretation ofroles is well-deï¬ned since role boxes are acyclic. The following claim is central for provingthat I is indeed a model for D, R, and K: Claim: For each C â cl(D, R, K), C â L(s) implies s â CI . Proof: We proceed by induction on the norm ||C|| of C, which is deï¬ned as follows: ||A|| := ||¬A|| := 0 for A concept name ||gâ|| := ||âu1, . . . , un.P || := 0 ||C1 C2|| := ||C1 C2|| := 1 + ||C1|| + ||C2|| ||( n R C)|| := ||( n R C)|| := 1 + ||C|| For concept names A and nominals N , the claim follows by deï¬nition of AI and N I . For thenegation of concept names A and nominals N (note that C is in NNF), the claim follows bydeï¬nition of AI and N I together with (T1). Concepts C of the form C1 C2 and C1 C2 can be treated using (T2) and (T3) together with the induction hypothesis. For existential,universal, and number restrictions, the proof is analogous to the one for SHIQ in (Horrockset al., 2000). For concepts of the form C = âg1, . . . gn.P â L(s), s â CI is an immediateconsequence of (T12), the deï¬nition of gI , and the fact that (x i 1, . . . , xn) â P (P ) implies (δ(x1), . . . , δ(xn)) â P D by (T13). Finally, for concepts C = gâ, s â CI is an immediateconsequence of the deï¬nition of gI together with (T15). This ï¬nishes the proof of theclaim. By deï¬nition of tableaux, there exists an s0 â Sa such that C â L(s0). By the claim,s0 â CI and thus I is a model of C. Next, we show that I is a model of R. By deï¬nition of RI , it is obvious that Trans(R) â R implies that RI is a transitive relation. Now let S R â R. If Trans(R) / â R, then we have SI â RI by deï¬nition of RI . Now let Trans(R) â R and (s, t) â SI . If S â E(s, t),then (T4) implies R â E(s, t), and thus (s, t) â RI . Otherwise, there is an S * S withTrans(S ) â R and (s, t) â (u, v) | S â E(u, v)+. Now (T4) together with S * R implies 717
Lutz, Areces, Horrocks, & Sattler that (u, v) | S â E(u, v) â (u, v) | R â E(u, v), and thus Trans(R) â R implies that(s, t) â RI . It remains to show that I is a model of K. To this end, let (g1, . . . , gn keyfor C) â K and s, t â CI such that gI (s) = gI (t) for 1 ⤠i ⤠n. Since the predicate conjunction in i i (T13) contains explicit inequalities for all distinct concrete individuals, this implies thate(s, gi) = e(t, gi) for 1 ⤠i ⤠n. (T10) implies C, ˬ C â© L(s) = â and C, ˬ C â© L(t) = â .If Ë Â¬ C â L(s), then the claim yields s â ( Ë Â¬ C)I contradicting s â CI . Thus we obtain C â L(s) and, in a similar way, C â L(t). Finally, (T14) implies that s = t, and thus Isatisï¬es K. We now proceed to prove termination, soundness, and completeness of the tableau algorithmpresented in Section 4.2, starting with termination. In the following, we use |D, R, K| todenote | cl+(D, R, K)|. Recall that this number is polynomial in the size of D, R, K. Lemma C.2 (Termination). Let D be a key-admissible concrete domain. When startedwith a SHOQK(D) concept D in NNF, a role box R, and a path-free key box K in NNF,the tableau algorithm terminates. Proof. Assume that there are D, R, and K such that the tableau algorithm does not ter-minate. Since D is key-admissible, this means that there is an inï¬nite sequence S0, S1, . . .of completion systems such that (a) S0 is the initial completion system SD and (b) Si+1 isthe result of applying a completion rule to Si. This is only possible if the Râ or the R rules are applied inï¬nitely often: it is easily seen that the rules R , R , R , Râc, Râ, Râ+, Rch, and Râa can only be applied ï¬nitely oftento completion systems whose set of abstract nodes Va does not increase since they eitheradd concepts into node labels (whose size is bounded), or they add concrete nodes (whosenumber is bounded linearly by the number of abstract nodes), or they remove abstractnodes from the forest. Hence there is a sub-sequence Si , S , . . . of S 1 i2 0, S1, . . . such that Si is the result of applying the Râ or the R rule to S j ij â1. Let si be the abstract node to which the Râ or the R rule was applied in Si â1. Since s t implies that t was not generated before s, the linear ordering is well-founded. Thus, we ï¬nd an inï¬nite sub- sequence Sj , S , . . . of S , S , . . . such that either s for each ⥠1 or s 1 j2 i1 i2 j = sj +1 j sj +1 for each ⥠1. The former, however, is not possible since the Râ and the R rules are applied at most once per node and concept in cl(D, R, K): even if a node is removed, thelabel copying performed by the R rule together with clashes of type (C3) ensures that the R rule is not re-applied to the same concept and node. Thus only the second option remains: there is a subsequence Sj , S , . . . of S , S , . . . such that s for each 1 j2 i1 i2 j sj +1 ⥠1. Let Lj be the labeling function in Sj. Since each abstract node is labeled with a subset Lj of cl+(D, R, K), there are nodes sj s (s ) = L k j with k and Ljk jk j (sj ). Now node labels can only increase and, if a node t is removed, its label is conjoined to thelabel of a node Ë t with Ë t t. Thus there is a node t in the completion system Sj with t sj and Lj (sj ) â Lj (t). By deï¬nition, sj is thus blocked in Sj , contradicting the assumption that the Râ or the R rule is applied to sj in Sj . Lemma C.3 (Soundness). If the expansion rules can be applied to a SHOQK(D) conceptD in NNF, a role box R, and a path-free key box K such that they yield a complete andclash-free completion forest, then D has a tableau w.r.t. R and K. 718
Keys, Nominals and Concrete Domains Proof. Let S = ((Va, Vc, E, L), P, â¼c, ) be a complete and clash-free completion system.We can ï¬nd a solution δ for ζS such that δ(x) = δ(y) iï¬ x â¼c y: only the Râc rule updatesthe predicate conjunction ζS, and after each rule application the â¼c relation is updatedusing the concrete equivalence that a D-tester returns for ζS (note that ζS is satisï¬able dueto clash-freeness). According to Deï¬nition 4.1, we can thus ï¬nd a solution δ as required.From S and δ, we deï¬ne a ï¬nite tableau T = (Sa, Sc, Ë E, Ë L, Ë P) as follows: Sa := s â Va | s occurs in S and is not blocked Sc := δ(x) | (s, x) â E(g) for some s â Sa and some g Ë L(s) := L(s) â© cl(D, R, K) (the intersection is due to the auxiliary concepts AnRC i ), Ë E(s, t) := R | t is an R-successor of s or t blocks an R-successor t of s δ(x) if x is a g-successor of s e(s, g) := undeï¬ned if x has no g-successor Ë P := the restriction of P to Sc. Note that the function e is well-deï¬ned due to the deï¬nition of adding g-successors. Itremains to show that T satisï¬es (T1)â(T14), which is basically a consequence of S beingclash-free and complete. ⢠(T1) is satisï¬ed since S does not contain a clash (C1). ⢠(T2) is satisï¬ed since the R rule cannot be applied, and thus C1 C2 â Ë L(s) implies C1, C2 â Ë L(s). ⢠(T3) is satisï¬ed since the R rule cannot be applied, and thus C1 C2 â Ë L(s) implies C1, C2 â© Ë L(s) = â . ⢠For (T4), consider s, t â Sa with R â Ë E(s, t) and R * R . Then R â Ë E(s, t) implies that t is or blocks an R-successor of s. By deï¬nition of âsuccessorâ, t is or blocks anR -successor of s, and thus R â Ë E(s, t). ⢠For (T5), let âR.C â Ë L(s) and R â Ë E(s, t). If t is an R-successor of s, then s not being blocked implies C â L(t) since the Râ rule cannot be applied. If t blocks anR-successor t of s, then s not being blocked and the fact that the Râ rule cannot beapplied yields C â L(t ), and the blocking condition implies C â L(t). In both cases, we thus have C â Ë L(t). ⢠(T6) and (T7) are satisï¬ed for the same reasons as (T5) with Râ replaced with Râ and Râ+. ⢠For (T8), consider s with ( n R C) â Ë L(s). Hence ( n R C) â L(s) and completeness of S implies the existence of R-successors t1, . . . , tn of s with C â L(ti) . and ti = tj for i = j. The latter implies, for each i = j, the existence of integers k,such that k = , AnRC â L(t k i), and AnRC â L(tj ). For (T8) to be satisï¬ed, it remains to verify that â no ti can block a tj: if this was the case, the blocking condition would imply that AnRC , AnRC â L(t k i). 719
Lutz, Areces, Horrocks, & Sattler â no t can block both ti and tj with i = j: similarly, this would imply that AnRC , AnRC â L(t). k In each case, we would have a clash (C3), in contradiction to S being clash-free. ⢠For (T9), consider s with ( n R C) â Ë L(s). Hence ( n R C) â L(s) and, since the R rule cannot be applied, there are at most n R-successors ti of s. Since each ti is either not blocked or blocked by exactly one other node (due to being a linear ordering), there are at most n ui â Sa with R â Ë E(s, ui) and C â Ë L(ui). ⢠For (T10), let ( n R C) â Ë L(s) and R â Ë E(s, t). Hence ( n R C) â L(s) and t either is an R-successor of s or blocks an R-successor of s. In the ï¬rst case, non-applicability of the Rch rule implies that C, Ë Â¬ C â© L(t) = â . In the second case, C, Ë Â¬ C â© L(t ) = â for t the R-successor of s blocked by t, and thus the blocking condition yields C, Ë Â¬ C â© L(t) = â . In both cases, this implies C, Ë Â¬ C â© Ë L(t) = â . Next, consider (g1, . . . , gn keyfor C) â K and s such that e(s, gi) is deï¬ned for eachi. Hence s has a gi-successor for each i, and thus s not being blocked and the non-applicability of the Rch rule imply that C, Ë Â¬ C â© Ë L(t) = â . ⢠For (T11), consider N â Ë L(s) â© Ë L(t). By deï¬nition, N â L(s) â© L(t) and thus s âa t. Moreover, totality of implies that we can assume without loss of generality that s t or s = t. Thus non-applicability of the Râa rule implies that L(t) â L(s), and thus t not being blocked implies s = t. ⢠(T12) is satisï¬ed since the rule Râc cannot be applied. ⢠For (T13), clash-freeness implies the satisï¬ability of P (x1, . . . , xn). P used in D,K (x1,...,xn)âP(P ) By choice of δ, δ(x) = δ(y) iï¬ x â¼c y, and thus (T13) is satisï¬ed. ⢠For (T14), let (g1, . . . , gn keyfor C) â K, C â Ë L(s) â© Ë L(t), and e(s, gi) = e(t, gi), for all 1 ⤠i ⤠n. Thus C â L(s) â© L(t) and, by choice of e and δ, we have xi â¼c yifor gi â E(s, xi) â© E(t, yi). Hence s âa t. Without loss of generality, we assume thats t or s = t. Thus non-applicability of the Râa rule implies that L(t) â L(s), and thus t not being blocked implies s = t. ⢠(T15) is satisï¬ed by deï¬nition of T and since S does not contain a clash (C4). Lemma C.4 (Completeness). If a SHOQK(D)-concept D in NNF has a tableau w.r.t. arole box R and a path-free key box K, then the expansion rules can be applied to D, R, andK such that they yield a complete and clash-free completion forest. 720
Keys, Nominals and Concrete Domains Proof. Given a tableau T = (Sa, Sc, Ë L, Ë E, e, Ë P) for D w.r.t. R and K, we can âguideâ the non-deterministic rules R , Rch, and Râa in such a way that each rule application preservesclash-freeness. This together with termination from Lemma C.2 ï¬nishes the proof. Along with rule application, we perform a stepwise construction of a total mapping Ï that takes abstract nodes of the completion forest to elements of Sa and concrete nodes ofthe completion forest to elements of Sc. ⢠L(s) â© cl(D, R, K) â Ë L(Ï(s)) for each s â Va, ⢠if t is an R-successor of s, then R â Ë E(Ï(s), Ï(t)), ⢠if x is a g-successor of s, then e(Ï(s), g) = Ï(x), ⢠x â¼c y iï¬ Ï(x) = Ï(y), and . ⢠if s = t, then Ï(s) = Ï(t). A mapping satisfying these four conditions is called correct in the following. Note that acompletion system for which there exists a correct mapping does not contain a clash: due to(T1) and the ï¬rst property, we do not encounter a clash (C1). A clash (C3) cannot occurdue to the last property. The ï¬rst and the third property together with (T15) ensure thata clash (C4) does not occur. Finally, a clash (C2) cannot occur for the following reason:by construction of P and since edges labelled with abstract features are never removed, foreach tuple (x1, . . . , xn) â P(P ), we ï¬nd an abstract node s and paths u1, . . . , un such thatâu1, . . . , un.P â L(s) and xi is a ui-successor of s for 1 ⤠i ⤠n. Thus, the ï¬rst, second,and third property together with (T12) and (T13) ensure that the conjunction P (Ï(x1), . . . , Ï(xn)) P used inD,K (x1,...,xn)âP(P ) has a solution δ with δ(Ï(x)) = δ(Ï(x)) iï¬ Ï(x) = Ï(y). By the fourth property, settingδ (x) := δ(Ï(x)) for all x â Vc thus yields a solution δ for ζS. The total mapping Ï is inductively deï¬ned as follows: let δ be a solution for the equation in (T13). Choose a node Ë s0 with D â Ë L(Ë s0), and set Ï(s0) := Ë s0 for s0 the (only) node of the initial completion forest. Obviously, Ï is correct. We will now show that each completionrule can be applied in such a way that Ï either is still correct or that Ï can be extended toa correct mapping. ⢠An application of the rule R preserves correctness of Ï due to (T2). ⢠Due to (T3), the rule R can be applied such that correctness is preserved. ⢠If the rule Râ adds a new node t for âR.C â L(s), then correctness implies âR.C â Ë L(Ï(s)), and thus (T6) implies the existence of some Ë t â Sa with R â E(Ï(s), Ë t) and C â Ë L(Ë t). Thus extending Ï with Ï(t) := Ë t obviously yields a correct mapping. ⢠If the rule R adds n nodes ti for ( n R C) â L(s), then correctness implies ( n R C) â Ë L(Ï(s)), and thus (T8) implies the existence of Ë t1, . . . , Ë tn â Ë Sa with Ë ti = Ë tj for i = j, R â E(Ï(s), Ë ti), and C â Ë L(Ë ti). Thus extending Ï with Ï(ti) := Ë ti obviously yields a correct mapping. 721
Lutz, Areces, Horrocks, & Sattler ⢠Assume that the R rule is applicable to a node s with ( n R C) â L(s) and more than n R-successors ti with C â L(ti). Then correctness implies that ( n R C) â Ë L(Ï(s)), R â Ë E(Ï(s), Ï(ti)), and C â Ë L(ti). Thus, by (T9), there are i = j with . Ï(ti) = Ï(tj). Again, correctness implies that not ti = tj and, without loss of general-ity, we can assume that ti tj. Hence applying the rule and thereby merging L(tj) into L(ti) preserves correctness. ⢠For the rule Râc, Ï can be extended in a similar way as for Râ: if a new gi-successor xi of s is added, then extending Ï with Ï(xi) := e(Ï(s), gi) yields a correct Ï. Moreover,(T13) ensures that â¼c can be updated in such a way that the fourth condition ispreserved. ⢠For the Râ rule, Ï does not need to be extended, and (T5), (T4), and the deï¬nition of R-successors imply that correctness is preserved. ⢠The Râ+ rule is similar, with the only diï¬erence that (T7) takes the place of (T5). ⢠Due to (T10), the rule Rch can be applied without violating correctness. ⢠For Râa, we consider two reasons for Râa to be applicable: â N â L(s) â© L(t). Then correctness of Ï and (T11) imply that Ï(s) = Ï(t). â (g1, . . . , gn keyfor C) â K, C â L(s) â© L(t), and gi â E(s, xi) â© E(t, yi) and xi â¼c yi for 1 ⤠i ⤠n. Then correctness implies that Ëe(Ï(s), gi) = e(Ï(t), gi), andthus (T14) together with the ï¬rst property of correctness imply that Ï(s) = Ï(t). In both cases, applying Râa to s and t preserves correctness. As an immediate consequence of Lemmas 4.2, C.2, C.3, and C.4, the tableau algorithmalways terminates and answers âD is satisï¬able w.r.t. R and Kâ if and only if the inputconcept D is satisï¬able w.r.t. the input role box R and the input key box K. Since conceptsatisï¬ability w.r.t. TBoxes can be reduced to concept satisï¬ability without TBoxes, weobtain the following result: Proposition C.5. Let D be a key-admissible concrete domain. The tableau algorithm decides satisï¬ability of SHOQK(D) concepts w.r.t. TBoxes, role boxes, and path-free keyboxes. It is not hard to verify that the proof of Lemma C.4 together with Lemmas 4.2 and C.2yield a bounded model property for SHOQK(D), where the bound is exponential. Corollary C.6. If a SHOQK(D)-concept D is satisï¬able w.r.t. a role box R and a path-free key box K, then D is satisï¬able w.r.t. R and K in a model of size at most |âI| ⤠2mfor m = cl+(D, R, K). 722
Keys, Nominals and Concrete Domains Proof. If a SHOQK(D)-concept D is satisï¬able w.r.t. a role box R and a path-free keybox K, Lemma C.4 implies that the tableau algorithm constructs a complete and clash-freecompletion forest for D, R, and K. By the deï¬nition of blocking, the number of abstractnodes in a completion forest that are not blocked is bounded by 2m: if s = t â Va areabstract nodes in a completion forest and L(s) = L(t), then either s blocks t, t blocks s, orthey are both blocked by another node u. Moreover, it is easily seen that the number ofconcrete successors per abstract node is bounded by the number of concrete features in C, R,and K. Now, in the proof of Lemma C.4, the abstract nodes in the tableau constructed froma complete and clash-free completion forest coincide with the nodes that are not blockedin the completion forest. Finally, in the proof of Lemma 4.2 the interpretation domainof a model constructed from a tableau coincides with the abstract nodes in the tableau.Summing up, a SHOQK(D)-concept that is satisï¬able w.r.t. R and K has a model of size|âI| ⤠2m. References Areces, C., Blackburn, P., & Marx, M. (1999). A road-map on complexity for hybrid logics. In Flum, J., & Rodr´ıguez-Artalejo, M. (Eds.), Computer Science Logic, No. 1683 inLecture Notes in Computer Science, pp. 307â321. Springer-Verlag. Baader, F., Horrocks, I., & Sattler, U. (2002a). Description logics for the semantic web. KI â K¨ unstliche Intelligenz, 16 (4), 57â59. Baader, F., Lutz, C., Sturm, H., & Wolter, F. (2002b). Fusions of description logics and abstract description systems. Journal of Artiï¬cial Intelligence Research (JAIR), 16,1â58. Baader, F., & Sattler, U. (1998). Description logics with concrete domains and aggrega- tion. In Prade, H. (Ed.), Proceedings of the 13th European Conference on Artiï¬cialIntelligence (ECAIâ98), pp. 336â340. John Wiley & Sons. Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., & Patel-Schneider, P. F. (2003). The Description Logic Handbook: Theory, implementation and applications. Cam-bridge University Press, Cambridge, MA, USA. Baader, F., & Hanschke, P. (1991a). A scheme for integrating concrete domains into concept languages. In Proceedings of the 12th International Joint Conference on Artiï¬cialIntelligence (IJCAI-91), pp. 452â457, Sydney, Australia. Baader, F., & Hanschke, P. (1991b). A scheme for integrating concrete domains into concept languages. DFKI research report RR-91-10, German Research Center for Artiï¬cialIntelligence (DFKI). Baader, F., & Hanschke, P. (1992). Extensions of concept languages for a mechanical engineering application. In Proceedings of the 16th German AI-Conference (GWAI-92), Vol. 671 of Lecture Notes in Computer Science, pp. 132â143. Springer-Verlag. Baader, F., & Sattler, U. (2000). Tableau algorithms for description logics. In Dyckhoï¬, R. (Ed.), Proceedings of the International Conference on Automated Reasoning withTableaux and Related Methods (Tableaux 2000), Vol. 1847 of Lecture Notes in Artiï¬cialIntelligence, pp. 1â18. Springer-Verlag. 723
Lutz, Areces, Horrocks, & Sattler Berger, R. (1966). The undecidability of the domino problem. Memoirs of the American Mathematical Society, 66, 1â72. Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The semantic web. Scientiï¬c American, 284 (5), 34â43. B¨ orger, E., Gr¨ adel, E., & Gurevich, Y. (1997). The Classical Decision Problem. Perspectives in Mathematical Logic. Springer-Verlag. Borgida, A., & Patel-Schneider, P. F. (1994). A semantics and complete algorithm for subsumption in the CLASSIC description logic. Journal of Artiï¬cial Intelligence Re-search, 1, 277â308. Borgida, A., & Weddell, G. E. (1997). Adding uniqueness constraints to description logics (preliminary report). In Bry, F., Ramakrishnan, R., & Ramamohanarao, K. (Eds.),Proceedings of the 5th International Conference on Deductive and Object-OrientedDatabases (DOOD97), Vol. 1341 of LNCS, pp. 85â102. Springer. Calvanese, D., De Giacomo, G., & Lenzerini, M. (1998). On the decidability of query containment under constraints. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODSâ98), pp. 149â158. Calvanese, D., De Giacomo, G., & Lenzerini, M. (2000). Keys for free in description logics. In Baader, F., & Sattler, U. (Eds.), Proceedings of the 2000 International Workshop inDescription Logics (DL2000), No. 33 in CEUR-WS (http://ceur-ws.org/), pp. 79â88. Calvanese, D., Lenzerini, M., & Nardi, D. (1998). Description logics for conceptual data modeling. In Chomicki, J., & Saake, G. (Eds.), Logics for Databases and InformationSystems, pp. 229â263. Kluwer Academic Publisher. Dean, M., Connolly, D., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel-Schneider, P. F., & Stein, L. A. (2002). Web ontology language (OWL) referenceversion 1.0. W3C Working Draft. Fensel, D., van Harmelen, F., Horrocks, I., McGuinness, D. L., & Patel-Schneider, P. F. (2001). OIL: An ontology infrastructure for the semantic web. IEEE Intelligent Systems, 16 (2), 38â45. Graham, R. L., Knuth, D. E., & Patashnik, O. (1990). Concrete Mathematics. Addison Wesley Publ. Co., Reading, Massachussetts. Haarslev, V., Lutz, C., & M¨ oller, R. (1998). Foundations of spatioterminological reasoning with description logics. In Cohn, A., Schubert, L., & S.C.Shapiro (Eds.), Proceedingsof the 6th International Conference on Principles of Knowledge Representation andReasoning (KRâ98), pp. 112â124. Morgan Kaufman. Haarslev, V., & M¨ oller, R. (2001). RACER system description. In Gor´ e, R., Leitsch, A., & Nipkow, T. (Eds.), Proceedings of the 1st International Joint Conference onAutomated Reasoning (IJCARâ01), No. 2083 in Lecture Notes in Artiï¬cial Intelligence,pp. 701â705. Springer-Verlag. Haarslev, V., M¨ oller, R., & Wessel, M. (2001). The description logic ALCN HR+ extended with concrete domains: A practically motivated approach. In Gor´ e, R., Leitsch, A., 724
Keys, Nominals and Concrete Domains & Nipkow, T. (Eds.), Proceedings of the 1st International Joint Conference on Auto-mated Reasoning IJCARâ01, No. 2083 in Lecture Notes in Artiï¬cial Intelligence, pp.29â44. Springer-Verlag. Halpern, J. Y., & Moses, Y. (1992). A guide to completeness and complexity for modal logics of knowledge and belief. Artiï¬cial Intelligence, 54 (3), 319â380. Hollunder, B., & Baader, F. (1991). Qualifying number restrictions in concept languages. In Proceedings of the 2nd International Conference on Principles of Knowledge Rep-resentation and Reasoning (KRâ91), pp. 335â346, Boston, MA, USA. Hopcroft, J. E., & Ullman, J. D. (1979). Introduction to Automata Theory, Languages and Computation. Addison-Wesley. Horrocks, I., Sattler, U., & Tobies, S. (2000). Practical reasoning for very expressive de- scription logics. Logic Journal of the IGPL, 8 (3), 239â264. Horrocks, I. (1998). Using an expressive description logic: FaCT or ï¬ction?. In Proceedings of the 6th International Conference on the Principles of Knowledge Representationand Reasoning (KR98), pp. 636â647. Horrocks, I. (2002). Reasoning with expressive description logics: Theory and practice. In Voronkov, A. (Ed.), Proceedings of the 18th International Conference on AutomatedDeduction (CADE 2002), No. 2392 in Lecture Notes in Artiï¬cial Intelligence, pp. 1â15.Springer. Horrocks, I., Patel-Schneider, P. F., & van Harmelen, F. (2002). Reviewing the design of DAML+OIL: An ontology language for the semantic web. In Proceedings of the 18thNational Conference on Artiï¬cial Intelligence (AAAI 2002), pp. 792â797. Horrocks, I., & Sattler, U. (2001). Ontology reasoning in the SHOQ(D) description logic. In Nebel, B. (Ed.), Proceedings of the 17th International Joint Conference on Artiï¬cialIntelligence (IJCAIâ01), pp. 199â204. Morgan-Kaufmann. Horrocks, I., Sattler, U., & Tobies, S. (1999). Practical reasoning for expressive description logics. In Ganzinger, H., McAllester, D., & Voronkov, A. (Eds.), Proceedings of the6th International Conference on Logic for Programming and Automated Reasoning(LPARâ99), No. 1705 in Lecture Notes in Artiï¬cial Intelligence, pp. 161â180. Springer-Verlag. Kamp, G., & Wache, H. (1996). CTL - a description logic with expressive concrete domains. Tech. rep. LKI-M-96/01, Laboratory for Artiï¬cial Intelligence (LKI), Universitity ofHamburg, Germany. Khizder, V. L., Toman, D., & Weddell, G. E. (2001). On decidability and complexity of de- scription logics with uniqueness constraints. In den Bussche, J. V., & Vianu, V. (Eds.),Proceedings of the 8th International Conference on Database Theory (ICDT2001), Vol.1973 of LNCS, pp. 54â67. Springer. Knuth, D. (1968). The Art of Computer Programming, Vol. 1. Addison-Wesley. Lutz, C. (2003). Description logics with concrete domainsâa survey. In Advances in Modal Logics Volume 4, pp. 265â296. World Scientiï¬c Publishing Co. Pte. LTd. 725
Lutz, Areces, Horrocks, & Sattler Lutz, C. (2002a). The Complexity of Reasoning with Concrete Domains. Ph.D. thesis, LuFG Theoretical Computer Science, RWTH Aachen, Germany. Lutz, C. (2002b). PSpace reasoning with the description logic ALCF(D). Logic Journal of the IGPL, 10 (5), 535â568. Lutz, C. (2002c). Reasoning about entity relationship diagrams with complex attribute dependencies. In Horrocks, I., & Tessaris, S. (Eds.), Proceedings of the InternationalWorkshop in Description Logics 2002 (DL2002), No. 53 in CEUR-WS (http://ceur-ws.org/), pp. 185â194. Lutz, C. (2004). NExpTime-complete description logics with concrete domains. ACM Transactions on Computational Logic, 5 (4), 669â705. Lutz, C., Areces, C., Horrocks, I., & Sattler, U. (2002). Keys, nominals, and concrete domains. LTCS-report 02-04, Technical University Dresden. See http://lat.inf.tu-dresden.de/research/reports.html. Lutz, C., Areces, C., Horrocks, I., & Sattler, U. (2003). Keys, nominals, and concrete domains. In Proceedings of the 18th International Joint Conference on Artiï¬cial In-telligence (IJCAIâ03), pp. 349â354. Morgan-Kaufmann Publishers. Pan, J. Z., & Horrocks, I. (2002). Reasoning in the SHOQ(Dn) description logic. In Hor- rocks, I., & Tessaris, S. (Eds.), Proceedings of the International Workshop in Descrip-tion Logics 2002 (DL2002), No. 53 in CEUR-WS (http://ceur-ws.org/), pp. 53â62. Post, E. M. (1946). A variant of a recursively unsolvable problem. Bulletin of the American Mathematical Society, 52, 264â268. Schild, K. D. (1991). A correspondence theory for terminological logics: Preliminary report. In Mylopoulos, J., & Reiter, R. (Eds.), Proceedings of the 12th International JointConference on Artiï¬cial Intelligence (IJCAI-91), pp. 466â471. Morgan Kaufmann. Schmidt-SchauÃ, M., & Smolka, G. (1991). Attributive concept descriptions with comple- ments. Artiï¬cial Intelligence, 48 (1), 1â26. 726