J. Keppens and Q. Shen
Volume 21, 2004
Links to Full Text:
Journal of Artiï¬cial Intelligence Research 21 (2004) 499-550 Submitted 08/03; published 04/04 Compositional Model Repositories via Dynamic Constraint Satisfaction with Order-of-Magnitude Preferences Jeroen Keppens JEROEN@INF.ED.AC.UK Qiang Shen QIANGS@INF.ED.AC.UK School of Informatics, The University of Edinburgh Appleton Tower, Crichton Street, Edinburgh EH8 9LE, UK Abstract The predominant knowledge-based approach to automated model construction, compositional modelling, employs a set of models of particular functional components. Its inference mechanismtakes a scenario describing the constituent interacting components of a system and translates it intoa useful mathematical model. This paper presents a novel compositional modelling approach aimedat building model repositories. It furthers the ï¬eld in two respects. Firstly, it expands the appli-cation domain of compositional modelling to systems that can not be easily described in terms ofinteracting functional components, such as ecological systems. Secondly, it enables the incorpora-tion of user preferences into the model selection process. These features are achieved by casting thecompositional modelling problem as an activity-based dynamic preference constraint satisfactionproblem, where the dynamic constraints describe the restrictions imposed over the composition ofpartial models and the preferences correspond to those of the user of the automated modeller. Inaddition, the preference levels are represented through the use of symbolic values that differ inorders of magnitude. 1. Introduction Mathematical models form an important aid in understanding complex systems. They also helpproblem solvers to capture and reason about the essential features and dynamics of such systems.Constructing mathematical models is not an easy task, however, and many disciplines have con-tributed approaches to automate it. Compositional modelling (Falkenhainer & Forbus, 1991; Kep-pens & Shen, 2001b) is an important class of approaches to automated model construction. It usespredominantly knowledge-based techniques to translate a high level scenario into a mathematicalmodel. The knowledge base usually consists of generic fragments of models that provide one ofthe possible mathematical representation of a process that occurs in one or more components. Theinference mechanisms instantiate this knowledge base, search for the most appropriate selection ofmodel fragments, and compose them into a mathematical model. Compositional modelling has beensuccessfully applied to a variety of application domains ranging from simple physics, over variousengineering problems to biological systems. The present work aims at a compositional modelling approach for building model repositories of ecological systems. In the ecological modelling literature, a range of models have been devisedto formally characterise the various phenomena that occur in ecological systems. For example,the logistic growth (Verhulst, 1838) and the Holling predation (Holling, 1959) models describe thechanges in the size of a population. The former expresses changes due to births and deaths and thelatter changes due to one population feeding on another. A compositional model repository aims c 2004 AI Access Foundation. All rights reserved.
KEPPENS & SHEN to make such (partial) models more generally usable by providing a mechanism to instantiate andcompose them into larger models for more complex systems involving many interacting phenomena. Thus, the input to a compositional model repository is a scenario describing the conï¬guration of a system to be modelled. A sample scenario may include a number of populations and variouspredation and competition relations between them. The output is a mathematical model, called ascenario model, representing the behaviour of the system speciï¬ed in the given scenario. A set ofdifferential equations describing the changes in the population sizes in the aforementioned scenariodue to births, natural deaths, deaths because of predation, available food supply or competitionwould constitute such a scenario model. This application domain poses three important new challenges to compositional modelling. Firstly, the processes and components of an ecological system that are to be represented in theresulting composed model depend on one another and on the ways they are described. In popu-lation dynamics for example, models describing the predation or competition phenomena betweentwo populations rely on the existence of a population growth model for each of the populationsinvolved in the phenomenon. This inhibits the conventional approach of searching for a consis-tent and adequate combination of partial models, one for each component in the scenario. Thisapproach provides an adequate solution for physical systems because these are comprised of com-ponents implementing a particular functionality that can be described by one or multiple partialmodels. Although the seminal work on compositional modelling (Falkenhainer & Forbus, 1991)recognised the existence of more complex interdependencies in model construction in general, itprovided only a partial solution for it: all the conditions under which certain modelling choiceswere relevant had to be speciï¬ed manually in the knowledge base. Secondly, the domain of ecology lacks a complete theory of what constitutes an adequate model. Most existing compositional modellers are based on a predeï¬ned concept of model adequacy. Theyemploy inference mechanisms that are guaranteed to ï¬nd a model that meets such adequacy criteria.However, criteria to determine how adequate an ecological model may be vary between ecologicaldomains and even between the ecologists that require the model within the same domain. Therefore,the compositional modeller requires a facility to deï¬ne the properties that the generated ecologicalmodels must satisfy. Thirdly, it is not possible to express all the criteria imposed on the scenario model in terms of hard requirements. Often, ecological models that describe mechanisms and behaviours are only par-tially understood. In such cases, the choice of one model over another becomes a matter of expertopinion rather than pure theory. Therefore, in the ecological domain, modelling approaches and pre-sumptions are, to some extent, selected based on preferences. Existing compositional modellers arenot equipped to deal with such user preferences and this paper presents the very ï¬rst compositionalmodeller that incorporates them. Generally speaking, the above three issues are tackled in this paper by means of a method to translate the compositional modelling problem into an activity-based dynamic preference constraintsatisfaction problem (aDPCSP) (Keppens & Shen, 2002). An aDPCSP integrates the concept ofactivity-based dynamic constraint satisfaction problem (aDCSP) (Miguel & Shen, 1999; Mittal &Falkenhainer, 1990) with that of order-of-magnitude preferences (Keppens & Shen, 2002). Theattributes and domains of this aDPCSP correspond to model design decisions, with constraints de-scribing the restrictions imposed by consistency requirements and properties and order-of-magnitudepreferences describing the userâs preferences on modelling choices. The translation method bringsthe additional advantage that compositional modelling problems can now be solved by means of 500
COMPOSITIONAL MODEL REPOSITORIES efï¬cient aDCSP techniques. As such, compositional modellers can beneï¬t from recent and futureadvances in constraint satisfaction research. The remainder of this paper is organised as follows. Section 2 introduces the concept of an aDPCSP, a preference calculus that is suitable to express subjective user preferences for modeldesign decisions and to be integrated with the general framework of aDPCSPs. It also gives asolution algorithm for aDPCSPs. Next, section 3 presents the compositional model repository andshows how such an aDPCSP is employed for automated model construction. These theoreticalideas are then illustrated by means of a large example in section 4, applying the compositionalmodel repository to population dynamics problems. Section 5 concludes this paper with a summaryand an outline of further research. 2. Dynamic Constraint Satisfaction with Order-of-Magnitude Preferences In this section, a preference calculus based on order-of-magnitude reasoning is introduced and inte-grated into the activity-based dynamic constraint satisfaction problem (aDCSP) to form an aDCSPwith order-of-magnitude preferences (aDPCSP). Then, a solution algorithm for such aDPCSPs ispresented. The theory is illustrated with examples from the compositional modelling domain. 2.1 Background: Activity-based dynamic preference constraint satisfaction A hard constraint satisfaction problem (CSP) is a tuple X, D, C , where ⢠X = x1, . . . , xn is a vector of n attributes, ⢠D = Dx , . . . , D is a vector containing exactly one domain for each attribute in X. 1 xn Each domain Dx â D is a set of values di1, . . . , din that may be assigned to the attribute i corresponding to the domain. ⢠C is a set of compatibility constraints. A compatibility constraint cxi,...,xj â C deï¬nes a relation over a subset of the domains Dx , ..., D , and hence c à . . . à D . i xj xi,...,xj â Dxi xj A solution to a hard constraint satisfaction problem is any tuple x1 : dx , . . . , x such 1 n : dxn that ⢠each attribute is assigned a value from its domain: âxi â X, dx â D , and i xi ⢠all compatibility constraints are satisï¬ed: âxx , . . . , d â c i,...,xj â C, dxi xj xi,...,xj . An activity-based dynamic CSP (aDCSP), originally proposed in by Mittal and Falkenhainer (1990), extends conventional CSPs with the notion of activity of attributes. In an aDCSP, not allattributes are necessarily assigned in a solution, but only the active ones. As such, each attribute iseither active and assigned a value or inactive: âxi â X, âdx â D , x â active(x i xi i : dxi i) The activity of attributes in an aDCSP is governed by activity constraints that enforce under whichassignments of attributes, an assignment to another attribute is relevant or possible. This informationis important because it not only dictates for which attributes a value must be searched, but also theset of compatibility constraints that must be satisï¬ed. Clearly, only the compatibility constraints 501
KEPPENS & SHEN cxi,...,xj â C for which all attributes xi, . . . , xj are active must be satisï¬ed, and a hard CSP is asub-type of aDCSP in which all attributes are always active. In summary, an activity-based dynamic constraint satisfaction problem (aDCSP) is a tuple X, D, C, A , where ⢠X, D, C is a hard CSP, and ⢠A is a set of activity constraints. An activity constraint restricts the sets of attribute-value assignments under which an attribute is active or inactive: ax à . . . à D à active(x i,xj ,...,xk â Dxj xk i), ¬active(xi) where xi â xj, . . . , xk. A solution to an activity-based dynamic constraint satisfaction problem is any tuple x1 : dx , . . . , x such that 1 l : dxl ⢠the attributes that are part of the solution are assigned a value from their domain: âxi â x1, . . . , xl, dx â D , i xi ⢠all activity constraints are satisï¬ed: âaxi,xj,...,xk â A, xj â x1, . . . , xl ⨠. . . ⨠xk â x1, . . . , xl ⨠xi â x1, . . . , xl â§ dx , . . . , d , active(x j xk i) â axi,xj,...,xk ⨠xi â x1, . . . , xl â§ dx , . . . , d , ¬active(x j xk i) â axi,xj,...,xk and ⢠all compatibility constraints are satisï¬ed: âcx , . . . , d â c i,...,xj â C, ¬active(xi) ⨠. . . ⨠¬active(xj ) ⨠dxi xj xi,...,xj 2.2 Order-of-magnitude preferences (OMPs) Although an aDCSP can capture the hard constraints over decisions in a given problem as well astheir dynamically changing solution space (as described by the activity constraints), the represen-tation scheme it employs does not take into account any preferences users may have over possiblealternative value assignments. Therefore, this work is extended to allow preference information tobe attached to attribute-value assignments. The way in which this can be achieved depends on therepresentation and reasoning mechanisms underlying the preference calculus. In general, a prefer-  ence calculus can be deï¬ned as a tuple , â, where: ⢠ is the set of preferences, ⢠â  is a commutative, associative operator that is closed in , and ⢠forms a partial order, that is, reï¬exive, anti-symmetric and transitive relation deï¬ned over  à  . Because is reï¬exive, antisymmetric and transitive, comparing preferences with the relation yields one of four possible results: 502
COMPOSITIONAL MODEL REPOSITORIES ⢠Two preferences P1, P2 â  are equal to one another (denoted P1 = P2) iff P1 P2 and P2 P1. ⢠A preference P1 â  is strictly greater than a preference P2 â  (denoted P1 P2) iff P1 P2 and P2 P1. ⢠A preference P1 â  is strictly smaller than a preference P2 â  (denoted P1 P2) iff P1 P2 and P2 P1. ⢠Two preferences P1, P2 â  are incomparable with one another (denoted P1?P2) iff P1 P2 and P2 P1. Thus, an activity-based dynamic preference constraint satisfaction problem (aDPCSP) is a tuple X, D, C, A,  , â, , P where ⢠X, D, C, A is an aDCSP, ⢠ , â, is a preference calculus, and ⢠P is a mapping Dx ⪠. . . ⪠D â  from the individual attribute-value assignments to the 1 xn preferences. The preferences attached to attribute-value assignments express the relative desirability of these assignments. The aim of the aDPCSP is to ï¬nd a solution with the highest combined preference.That is, given an aDPCSP X, D, C, A,  , â, , P , any solution xi : dx , . . . , x of the i j : dxj aDCSP X, D, C, A such that no other solution xk : dx , . . . , x of X, D, C, A exists k l : dxl with P (xi : dx ) â . . . â P (x ) P (x ) â . . . â P (x ) is a solution to the aDPCSP. i j : dxj k : dxk l : dxl In this section, a preference calculus is introduced to extend an aDCSP into an aDPCSP. The calculus will be illustrated with examples from the compositional modelling domain. 2.2.1 REPRESENTATION OF OMPS Technically, OMPs are combinations of so-called basic preference quantities (BPQs), which are theprimitive units of preference or utility valuation associated with possible design decisions. Becauseit is often difï¬cult to evaluate these BPQs numerically, they are ordered relative to one another em-ploying similar ordering relations as those employed by relative order-of-magnitude calculi (Dague,1993a, 1993b). Let be the set of all BPQs with respect to a particular decision problem. The BPQs in are   ordered with respect to one another at two levels of granularity, by two relations and <. First,  is partitioned into orders of magnitude, which are ordered by . Then, the BPQs within each order of magnitude are ordered by <. Formally, an order-of-magnitude ordering over BPQs is a tuple  O, , where O = O1, . . . , Oq is a partition of and is an irreï¬exive and transitive binary  relation over O. Any subset of BPQs O â O is said to be an order of magnitude in . Similarly, a  within-magnitude ordering over a set of BPQs is a tuple O, < , where O is an order of magnitudein and < is an irreï¬exive and transitive binary relation over O.  To illustrate these ideas, consider the problem of constructing an ecological model describing a scenario containing a number of populations. Let some of the populations be parasites and othersbe hosts for these parasites. Also, assume that certain populations compete with others for scarceresources. In order to construct a scenario model, the compositional modeller must make a number 503
KEPPENS & SHEN b < b 15: Lotka-Volterra 13: Holling predation model predation model < b11: Rogerâs < host-parasitoid model < b < b 14: Thomsonâs 12: Nicholson-Baileyâs host-parasitoid model host-parasitoid model O1 (host-parasitoid phenomenon) b22: exponential growth model < b21: logistic growth model b < 23: other b31: competition growth model phenomenon O O 2 (population growth phenomena) 3: (competition phenomenon) Figure 1: Sample space of BPQs  of model design decisions: which population growth, host-parasitoid and competition phenomenaare relevant, and which types of model best describe these phenomena. Figure 1 shows a sample space of BPQs that correspond to the selection of types of model. For the sake of illustration, the presumption is made that the quality of a scenario model depends on theinclusion of types of model, rather than on the inclusion or exclusion of phenomena. Apart fromb23 and b31, all BPQs correspond to standard textbook ecological models1. BPQ b23 stands for theuse of a population growth model that is implicit in another population growth model (the Lotka-Volterra model, for instance, implicitly includes its own concept of growth). Finally, BPQ b31 is thepreference associated with a competition model (say, the only one included in the knowledge base). The 9 BPQs in this sample space are partitioned over 3 orders of magnitude. The relation orders the orders of magnitude: O2 O1 and O2 O3. The binary < relation orders indi- vidual BPQs within an order of magnitude. In the BPQ ordering within O1, for instance, Rogersâhost-parasitoid model (b11) is preferred over that by Nicholson and Bailey (b12) and the Hollingpredation model (b13). The latter two models can not be compared with one another, but they bothare preferred over the Lotka-Volterra model. Furthermore, Thompsonâs host-parasitoid model isless preferred than that of Nicholson and Bailey, but it can not be compared with the Lotka-Volterraand Holling models. 2.2.2 COMBINATIONS OF OMPS By deï¬nition, OMPs are combinations of BPQs. The implicit value of an OMP p equals the com-bination b1 â . . . â bn of its constituent BPQs b1, . . . , bn. This property allows OMPs to be deï¬nedas functions such that an OMP P = b1 â . . . â bn is a function fP : â : b â fP (b) where    1. To be precise, the BPQs b11, b12, b13, b14, b15, b21 and b22 respectively correspond to the inclusion of Rogersâ host-parasitoid model (1972), the host-parasitoid model by Nicholson and Bailey (1935), Hollingâs predation model(1959), Thompsonâs host-parasitoid model (1929), the predation model by Lotka and Volterra (1925, 1926), a logisticpopulation growth model (Verhulst, 1838) and an exponential population growth model (Malthus, 1798). 504
COMPOSITIONAL MODEL REPOSITORIES is the set of BPQs, is the set of natural numbers and fP (b) equals the number of occurrences of b  in b1, . . . , bn. For example, let Pmodel denote the OMP associated with the scenario model that contains three logistic population growth models (b21), two Holling predation model (b13) and one competitionmodel (b31). Therefore, Pmodel = b21 â b21 â b21 â b13 â b13 â b31 and hence: 3 if b = b  21 2 if b = b f  13 P (b) =  model 1 if b = b  31 0 otherwise By describing OMPs as functions, the concept of combinations of OMPs becomes clear. For two OMPs P1 and P2, the combined preference P1 â P2 is deï¬ned as: fP : â : b â f (b) = f (b) + f (b) 1âP2 P P P   1âP2 1 2 Note that the combination operator â is assumed to be commutative, associative and strictly mono-tonic (P P â P ). The latter assumption is made to better reï¬ect the ideas underpinning conven- tional utility calculi (Binger & Hoffman, 1998). 2.2.3 PARTIAL ORDERING OF OMPS Based on the combinations of OMPs, a partial order over the OMPs can be computed by exploit- ing the constituent BPQs of the OMPs considered. This partial order implies that a comparison ofany pair of OMPs either returns equal preference (=), smaller preference ( ), greater preference( ) or incomparable preference (?). This calculus is developed assuming the following: ⢠Prioritisation: A combination of BPQs is never an order of magnitude greater than its con- stituent BPQs. That is, given the set of BPQs belonging to the same order of magnitudeb1, b2, . . . , bn â O1 and a BPQ b â O2 belonging to a higher order of magnitude, i.e.O1 O2, then b1 â b2 â . . . â bn b With respect to the ongoing example, this means that any BPQ taken from the order of magni-tude O1 is preferred over any combination of BPQs taken from O2. In other words, the choiceof a model to describe a host-parasitoid phenomenon is considered more important than thechoice of population growth model (see Figure 1). Prioritisation also means that distinctions at higher orders of magnitude are considered tobe more signiï¬cant than those at lower orders of magnitude. Consider a number of BPQsb1, . . . , bmâ1, bm, . . . , bn taken from one order of magnitude O1 and a pair of BPQs b, b taken from an order of magnitude that is higher than O1. If b < b , then (irrespective of theordering of the BPQs taken from O1) b1 â . . . â bmâ1 â b bm â . . . â bn â b 505
KEPPENS & SHEN ⢠Strict monotonicity: Even though distinctions at higher orders of magnitude are more signif- icant, distinctions at lower orders of magnitude are not negligible. That is, given an OMPP and two BPQs b1 and b2 taken from the same order of magnitude with b1 < b2, then(irrespective of the orders of magnitude of the BPQs that constitute P ) b1 â P b2 â P For instance, the preference ordering depicted in Figure 1 shows that a scenario model with aRogerâs host-parasitoid model and two logistic predation models is preferred over one with aRogerâs host-parasitoid model and two exponential predation models: b11 â b22 â b22 b11 â b21 â b21 Note that this is a departure from conventional order-of-magnitude reasoning. If the OMPsassociated with two (partial) outcomes contain equal BPQs at a higher order of magnitude, it isusually desirable to compare both solutions further in terms of the (less important) constituentBPQs at lower orders of magnitude, as the example illustrated. However, conventional order-of-magnitude reasoning techniques can not handle this. ⢠Partial ordering maintenance: Conventional order-of-magnitude reasoning is motivated by the need for abstract descriptions of real-world behaviour, whereas the OMP calculus is mo-tivated by incomplete knowledge for decision making. As opposed to conventional order-of-magnitude reasoning and real numbers, OMPs are not necessarily totally ordered. Thisimplies that, when the user states, for example, that b1 < b2 < b and that b3 < b4 < b, theexplicit absence of ordering information between the BPQs in b1, b2 and those in b3, b4means that the user is unable to compare them (e.g. because they are entirely different things).Consequently, b1 â b2 would be deemed incomparable to b3 â b4 (i.e. b1 â b2?b3 â b4), ratherthan roughly equivalent. From the above, it can be derived that given two OMPs P1 and P2 and an order of magnitude O, P1 is less or equally preferred to P2 with respect to the order of magnitude O (denoted P1 O P2) provided that âbi â O, fP (b f (b (b f (b 1 i) + P1 j ) ⤠fP2 i) + P2 j ) bj âO,bi<bj bj âO,bi<bj Thus, comparing two OMPs within an order of magnitude can yield four possible results: ⢠P1 is less preferred than P2 with respect to O (P1 O P2) iff (P1 O P2) ⧠¬(P2 P1), ⢠P1 is more preferred than P2 with respect to O (P1 O P2) iff ¬(P1 O P2) â§ (P2 P1), ⢠P1 is equally preferred than P2 with respect to O (P1 =O P2) iff (P1 O P2) â§ (P2 P1), and ⢠P1 is incomparable to P2 with respect to O (P1?OP2) iff ¬(P1 O P2) ⧠¬(P2 P1). 506
COMPOSITIONAL MODEL REPOSITORIES In the ongoing example of Figure 1, for instance, the preference of a scenario model with a Rogerâs host-parasitoid model and a Holling predation model is P1 = b11 â b13 and the preferenceof a scenario model with a Rogerâs host-parasitoid model and a Lotka-Volterra predation modelis P2 = b11 â b15. The latter model is less than or equally preferred to the former within theâhost-parasitoidâ order of magnitude (O1), i.e. P2 O P 1 1, because fP (b (b 2 11) = 1 ⤠1 = fP1 11), fP (b (b (b (b 2 11) â fP2 12) = 1 ⤠1 = fP1 11) â fP1 12), fP (b (b (b (b 2 11) â fP2 13) = 1 ⤠2 = fP1 11) â fP1 13), fP (b (b (b (b (b (b 2 11) â fP2 12) â fP2 14) = 1 ⤠1 = fP1 11) â fP1 12) â fP1 14), fP (b (b (b (b (b (b (b (b 2 11) â fP2 12) â fP2 13) â fP2 14) = 2 ⤠2 = fP1 11) â fP1 12) â fP1 13) â fP1 14). Similarly, it can be established that the reverse, i.e. P1 O P 1 2, is not true. Therefore, the latter scenario model is less preferred than the former within O1, i.e. P2 O P 1 1. The above result can be further generalised such that given two OMPs P1 and P2, P1 is less or equally preferred to P2 (denoted P1 P2) if âOi â O, (P1 O P P i 2) ⨠(âOj â O, Oi Oj â§ P1 Oj 2) More generally, the relations , , = and ? can be derived in the same manner as with the relation where O, O, =O and ?O with O. To illustrate the utility of such orderings, consider the scenario of one predator population that feeds on two prey populations while the two prey populations compete for scarce resources. Thefollowing are two plausible scenario models for this scenario: ⢠Model 1 contains two Holling predation models and three logistic population growth models, and has preference P1 = b13 â b13 â b21 â b21 â b21. ⢠Model 2 contains one competition model, two Holling predation models, two logistic pop- ulation growth models and an exponential population growth model, and has preferenceP2 = b13 â b13 â b21 â b21 â b22 â b31. As demonstrated earlier, it can be shown that P1 =O P P P 1 2, P1 O2 2, and P1 O3 2. From these relations it follows that P1 P2 because ⢠for O1: P1 O P P 1 2 since P1 =O1 2, ⢠for O2: there exists an order of magnitude O3 where O3 O2 and P1 O P 3 2, ⢠for O3: P1 O P P 3 2 since P1 O3 2. As the reverse is not true, it can be concluded that scenario model 2 is preferred over scenario model1. 2.3 Solving aDPCSPs This section presents a basic algorithm for solving aDPCSPs. Although OMPs are used in thiswork, this algorithm can take any aDPCSP provided that it employs a preference calculus with a 507
KEPPENS & SHEN commutative, associative and monotonic combination operator. However, the use of OMPs providesa convenient way of specifying incomplete preference information. An aDPCSP is similar to valued CSPs as presented by Schiex, Fargier and Verfaillie (1995) and also to semiring based CSPs (Bistarelli, Montanari, & Rossi, 1997). However, it extends bothapproaches with activity constraints and involves different underlying presumptions in its valuationstructure. The preference valuations in this work are allowed to be ordered partially, as opposed tothe valued CSPs. An aDPCSP represents an important type of constraint satisfaction optimisation problem (Tsang, 1993). In order to tackle the optimisation of preferences an A* type algorithm is employed (Hart,Nilsson, & Raphael, 1968; Raphael, 1990). A* algorithms are known to be efï¬cient in terms ofthe total number of nodes explored in an effort to ï¬nd optimal solutions, with a given amount ofinformation. On the downside, they have an exponential space complexity. Naturally, a number ofalternative approaches could have been explored, including conventional constraint-based solvingmethods such as depth ï¬rst branch and bound search. However, the use of an A*-like algorithm issufï¬cient for solving the aDPCSPs in the domain of the present interest. In particular, algorithm 1implements an A* search strategy that is capable of handling activity constraints, which involvesthe use of basic CSP techniques such as constraint propagation and backtracking. An A* algorithm maintains the explored attribute-value assignments by means of a priority queue Q of nodes. Each node n in Q corresponds to a set of attribute-value assignments: solution(n).The search proceeds through a number of iterations. At each iteration, a node n is taken from Q,and replaced by nodes that extend solution(n) with an additional attribute-value assignment. Morespeciï¬cally, for each node n in Q, a set Xu(n) of remaining active but unassigned attributes ismaintained. At each iteration, the possible assignments of the ï¬rst attribute x â Xu(n), wheren is the node taken from Q at the current iteration, are processed. For every assignment x : dthat is consistent with solution(n) (i.e. solution(n) ⪠x : d, C â¥), a new child node n , with solution(n ) = solution(n) ⪠x : d and Xu(n ) = Xu(n) â x, is created and added to Q. The activity constraints are processed via propagation rather than constraint satisfaction. When- ever a node n is taken from Q such that Xu(n) is empty, the activity constraints are ï¬red in order toobtain a new set of active but unassigned attributes. That is, Xu(n) is assigned xi | solution(n), A active(xi) â Xa(n) where Xa(n) represents the active, but already assigned attributes in node n. In the priority queue Q, nodes are maintained by means of two heuristics: committed preference CP (n) and potential preference P P (n). Here, given a node n, CP (n) = âx:dâsolution(n)P (x : d)P P (n) = CP (n) â (âxâX P (x : d)) nd(n) max dâDx where Xnd(n) is the set of unassigned attributes that can still be activated given the partial assign-ment solution(n) (as indicated previously, the actual implementation employs an assumption-basedtruth maintenance system (de Kleer, 1986) to efï¬ciently determine which attributeâs activity can nolonger be supported). In other words, CP (n) is the preference associated with the partial attribute-value assignment in node n and P P (n) is CP (n) combined with the highest possible preferenceassignments taken from all the values of the domains of those attributes in Xnd(n). Thus, P P (n) 508
COMPOSITIONAL MODEL REPOSITORIES Algorithm 1: SOLVE(X, D, C, A, P ) n â new node;solution(n) â ;Xu(n) â xi | , A active(xi); Xa(n) â ;CP (n) â 0;P P (n) â âxâX maxdâD(x) P (x : d);Q â createOrderedQueue();enqueue(Q, n, P P (n), CP (n)); while Q = â n â dequeue(Q); if X  u(n) = â  x â ï¬rst(X  u(n));  then  PROCESS(x, n, C, A, P, Q);  X  u(n) â xi | solution(n), A active(xi) â Xa(n);  if Xu(n) = â do    nnext â ï¬rst(Q);    if CP (n) P P (nï¬rst)         else  then then return (solution(n));     P P (n) â CP (n);  else      enqueue(Q, n, P P (n), CP (n));          x â ï¬rst(X   u(n));   else   PROCESS(x, n, C, A, P, Q); procedure PROCESS(x, nparent, C, A, P, Q) for d â D(x) if solution(nparent)âªx : d,C ⥠ n  child â new node;    solution(n  child) â solution(nparent) ⪠x : d;      X  d â deactivated(solution(nchild),X(nparent)); X do    nd(nchild) â Xnd(nparent)âxâXd; then Xa(nchild) â Xa(nparent) ⪠x;    Xu(nchild) â Xu(nparent) â x;    CP (nchild) â CP (nparent) â P (x : d);      P P (n   child) â C P (nchild) â âxâXnd(n) maxdâD(x) P (x : d);    enqueue(Q,nchild,PP(nchild),CP(nchild)); computes an upper boundary on the preference of an aDPCSP solution that includes the partialattribute-value assignments corresponding to n. The following theorem shows that algorithm 1 is guaranteed to ï¬nd the set of attribute-value pairs with the highest combined preferences, within the set of solutions that satisfy the constraints. Theorem 1 SOLVE(X, D, C, A, P ) is admissibleProof: SOLVE(X, D, C, A, P ) is an A* algorithm guided by a heuristic function P P (n) = CP (n)âh(n), where CP (n) is the actual preference of node n and h(n) = âxâX P (x : d). nd(n) maxdâDx It follows from the previous discussion that h(n) is greater than or equal to the combined preferenceof any value-assignment of unassigned attributes that is consistent with the partial solution of n. Inthis algorithm, the nodes n are maintained in a priority queue in descending order of P P (n). Let δbe a distance function that reverses the preference ordering such that δ(P1) δ(P2) â P1 P2. SOLVE(X, D, C, A, P ) can then be described as an A* algorithm, where the nodes n in the priority 509
KEPPENS & SHEN queue Q are ordered in ascending order of δ(P P (n)), such that δ(P P (n)) = δ(CP (n)) â δ(h(n))and δ(h(n)) is a lower bound on the distance between n and the optimal solution. Therefore, fol-lowing the work by Hart, Nilsson and Raphael (1968), SOLVE(X, D, C, A, P ) is an admissiblealgorithm, guaranteed to ï¬nd a solution S with a minimal δ(P (S)) or a maximal P (S). To illustrate algorithm 1, consider the problem of ï¬nding an ecological model that describes the behaviour of two populations, one of which predates on the other. An aDPCSP is constructed forthe compositional modelling problem with the following attributes and domains. Note that section3 demonstrates how the attributes, domains and constraints of this problem can be constructedautomatically and that section 4 illustrates these ideas in the context of a larger example. X = x1, x2, x3, x4, x5, x6 Dx = yes, no 1 Dx = yes, no 2 Dx = yes, no 3 Dx = other, logistic 4 Dx = other, logistic 5 Dx = Holling, Lotka-Volterra 6 The attributes x1, x2 and x3 respectvely describe the relevance of the following phenomena: the change in size of the predator population, the change in size of the prey population and thepredation of the prey by the predator. The attributes x4 and x5 represent the choice of type ofpopulation growth model. Two types of such models are incorporated in the problem: the logisticone and the âotherâ. Finally, attribute x6 is associated with the choice of model type of the predationphenomenon. Here, two types of model, the Holling model and the Lotka-Volterra model, areincluded. Because the Holling predation model presumes that logistic models are employed to describe population growth, and because the Lotka-Volterra Model incorporates its own population growthmodel, the combinations of assignments to x4, x5, and x6 are restricted. Hence, the aDPCSPcontains a set C = cx4,x6, cx5,x6 of compatibility constraints, with: cx4,x6 = x4 : other, x6 : Lotka-Volterra , x4 : logistic, x6 : Holling cx5,x6 = x5 : other, x6 : Lotka-Volterra , x5 : logistic, x6 : Holling Furthermore, a model type of predator/prey growth must be selected if and only if the cor- responding population growth phenomenon is deemed relevant. Also, a model type of preda-tion must be selected if and only if both population growth phenomena and the predation phe-nomenon are deemed relevant (because ecological models describing predation rely on submodelsdescribing population growth of the predator and the prey). Hence, the aDPCSP contains a setA = ax4,x1, ax5,x2, ax6,x1,x2,x3 of activity constraints, with: 510
COMPOSITIONAL MODEL REPOSITORIES ax4,x1 = x1 : yes, active(x4) , x1 : no, ¬active(x4) ax5,x2 = x2 : yes, active(x5) , x2 : no, ¬active(x5) ax6,x1,x2,x3 = x1 : yes, x2 : yes, x3 : yes, active(x4) , x1 : yes, x2 : yes, x3 : no, ¬active(x4) , x1 : yes, x2 : no, x3 : yes, ¬active(x4) , x1 : yes, x2 : no, x3 : no, ¬active(x4) ,x1 : no, x2 : yes, x3 : yes, ¬active(x4) , x1 : no, x2 : yes, x3 : no, ¬active(x4) ,x1 : no, x2 : no, x3 : yes, ¬active(x4) , x1 : no, x2 : no, x3 : no, ¬active(x4) Finally, let the preference calculus consist of two orders of magnitude Ogrowth and Opredation, with Ogrowth Opredation, where Ogrowth =pother, plogistic with plogistic < pother Opredation =pHolling, pLotka-Volterra with pLotka-Volterra < pHolling The OMP assignments are as follows: P (x4 : other) = P (x5 : other) =pother P (x4 : logistic) = P (x5 : logistic) =plogistic P (x6 : Holling) =pHolling P (x6 : Lotka-Volterra) =pLotka-Volterra When applied to this problem, algorithm 1 initialises the search by creating a node n0, where: ⢠Xu(n0), the set of currently active attributes, is initialised to x1, x2, x3, because the activity of these attributes does not depend on other attribute-value assignments. ⢠Xa(n0) and CP (n0) are initialised to the empty set and to 0 respectively, since no attributes have been assigned yet. ⢠Finally, P P (n0) equals pother â pother â pHolling because this is the combination of highest OMPs associated with each domain. This initial node is enqueued in Q. Next, the algorithm proceeds through a number of iterations. At each iteration, the node with most potential (as measured by P P and CP ) is dequeued, and itschildren are generated and enqueued in Q. The nodes that are created in this way are depicted inFigure 2. The number i in the subscript of each node ni indicates the order of node generation, andthe thick arrows show the order in which the search space is explored. Note that there are three important features of the algorithm that could not be clearly demon- strated within Figure 2. Firstly, at node n5, the initial set of unassigned attributes is exhausted:Xu(n5) = . Therefore, the activity constraints are ï¬red when n5 is explored. Because n5 corre-sponds to the assignment x1 : yes, x2 : yes, x3 : yes, the remaining attributes are activated andXu(n5) is reset to x4, x5, x6. Secondly, node n12 corresponds to an assignment of all (active) attributes that is consistent with the activity and compatibility constraints: x1 : yes, x2 : yes, x3 : yes, x4 : other, x5 : other, x6 : Lotka-Volterra 511
x1 n1 yes n2 no P P = pother â pother â pHolling P P = pother CP = 0 CP = 0 x2 n3 yes n4 no P P = pother â pother â pHolling P P = pother CP = 0 CP = 0 x3 n5 yes n6 no P P = pother â pother â pHolling P P = pother â pother N CP = 0 CP = 0 EHS&S x 512 N 4 E n n P 7 other 8 logistic PE P P = p P P = p K other â pother â pHolling logistic â pother â pHolling CP = pother CP = plogistic x5 x5 n9 other n10 logistic n15 other n16 logistic P P = pother â pother â pHolling P P = pother â plogistic â pHolling P P = plogistic â pother â pHolling P P = plogistic â plogistic â pHolling CP = pother â pother CP = pother â plogistic CP = plogistic â pother CP = plogistic â plogistic x6 x6 x6 x6 n n 11 n12 n13 14 Lotka-Volterra n17 n18 n19 Holling n20 Holling Holling Lotka-Volterra Holling Lotka-Volterra Lotka-Volterra P P = p P P = p other â pother â pLotka-Volterra logistic â plogistic â pHolling inconsistent CP = p inconsistent inconsistent inconsistent inconsistent CP = p inconsistent other â pother â pLotka-Volterra logistic â plogistic â pHolling Figure 2: Search space explored by algorithm 1 when solving sample aDPCSP
COMPOSITIONAL MODEL REPOSITORIES This assignment is not a solution to the aDPCSP, because the corresponding preference is not guar-anteed to be maximal (and, the assignment is, in fact, not optimal). After the creation of n12, the pri-ority queue Q looks as follows (the ordering between n2 and n4 may vary since P P (n2) = P P (n4)and CP (n2) = CP (n4)): n10, n8, n12, n6, n2, n4 Therefore, the next node to be explored (after n9 and the subsequent creation of n12) is n10. Thirdly, node n19 does correspond with an optimal solution. After its creation, Q equals: n19, n12, n6, n2, n4 As a consequence, n19 is dequeued in the next iteration. Because no children of n19 can be created(Xu(n19) = â and the activity constraints activate no more attributes), n19 is retained as a solution. If the user is interested in ï¬nding multiple alternative solutions, the search may proceed until Q contains no more nodes with a P P value that is not smaller than the maximum preference ofthe ï¬rst solution. In this case, P P (n12) CP (n19) and hence, there is only one solution to this aDPCSP. 3. Compositional Model Repositories The aDPCSPs discussed in the previous section provide the foundation for the development of thecompositional model repositories. This section speciï¬es the problem that a compositional modelrepository is built to solve and shows how it can be translated into an aDPCSP, and hence be resolvedusing the proposed aDPCSP solution algorithm. 3.1 Background: assumption based truth maintenance An ATMS is a mechanism that keeps track of how each piece of inferred information dependson presumed information and facts and of how inconsistencies arise. In an ATMS, each piece ofinformation used or derived by the problem solver is stored as a node. Certain pieces of informationare not known to be true and cannot be inferred from other pieces of information, yet plausibleinference may be drawn from them. Such nodes are categorised by a special type and referred to asassumptions. Inferences between pieces of information are maintained within the ATMS as dependencies be- tween the corresponding nodes. In its extended form (see de Kleer, 1988; or Keppens, 2002), theATMS can take inferences, called justiï¬cations of the form ni â§ . . . â§ nj ⧠¬nk â§ . . . ⧠¬nl â nm,where ni, . . . , nj, nk, . . . , nl, nm are nodes that the problem solver is interested in. An ATMScan also take a speciï¬c type of justiï¬cation, called nogood, that leads to an inconsistency, of theform ni â§ . . . â§ nj ⧠¬nk â§ . . . ⧠¬nl â ⥠(meaning that at least one of the statements inni, . . . , nj, ¬nk, . . . , ¬nl must be false). In the ATMS, these nogoods are represented as jus-tiï¬cations of a special node, called the nogood node. Based on the given justiï¬cations and nogoods, the ATMS computes a label for each (non- assumption) node. A label is a set of environments and an environment is a set of assumptions.In particular, an environment E depicts a possible world where all the assumptions in E are true.Thus, the label L(n) of a node n describes all possible worlds in which n can be true. The labelcomputation algorithm of the ATMS guarantees that each label is: 513
KEPPENS & SHEN ⢠Sound - All assumptions in any environment within the label of a node being true is a sufï¬cient condition to derive that node: âE â L(n), [(â§niâEni) â§ (â§Â¬niâE¬ni)] n ⢠Consistent - No environment in the label of a node, other than the nogood node, describes an impossible world: âE â L(n), [(â§niâEni) â§ (â§Â¬niâE¬ni)] ⥠⢠Minimal - The label does not contain possible worlds that are less general than one of the other possible worlds it contains (i.e. environments that are supersets of other environmentsin the label): âE â L(n) E â L(n), E â E ⢠Complete - The label of each node, other than the nogood node, describes all possible worlds in which that node can be inferred: âE,[(â§niâEni) â§ (â§Â¬niâE¬ni) n] âE â L(n), [(â§niâE ni) â§ (â§Â¬niâE ¬ni) n] 3.2 Knowledge Representation As with any other knowledge-based approach, building a compositional modeller requires a formal-ism for the speciï¬cation of its inputs, its outputs and its knowledge base. The work developed hereis loosely based on the compositional modelling language (Bobrow, Falkenhainer, Farquhar, Fikes,Forbus, Gruber, Iwasaki, & Kuipers, 1996), a proposed standard knowledge representation formal-ism for compositional modellers, but adapted to meet the challenges of the ecological compositionalmodelling problems identiï¬ed in the introduction. 3.2.1 PRELIMINARY CONCEPTS The most primitive constructs in a compositional modeller are participants, relations and assump-tions. This subsection summarises these concepts and explains how they are represented herein. Participants2 refer to the objects of interest, which are involved in the scenario or its model. These participants may be real-world objects or conceptual objects, such as variables that expressfeatures of real-world objects in a mathematical model. For instance, a population of a species isa typical example of a real-world object, and a variable that expresses the number of individualsof this species forms an example of a conceptual object. It is natural to group objects that sharesomething in common into classes. Participants are herein grouped into participant classes, witheach representing a set of participants that share certain common features. Each class will be givena name for easy reference. Relations describe how the participants are related to one another. As with participants, some relations represent a real-world relationship, such as: 2. Some of the previous work in compositional modelling refers to these as individuals and quantities, but such names would not suit the present application. Ecological models typically describe the behaviour of populations rather thanthat of individuals and it is often hard to distinguish between quantities. 514
COMPOSITIONAL MODEL REPOSITORIES predation(frog, insect) (1) Other relations may be conceptual in nature, such as equation (2), which describes an important textbook model of logistic population growth (Ford, 1999): d size change = parameter à size à (1 â ) (2) dt capacity To be consistent with other compositional modelling approaches, this paper employs a LISP- style notation for relations. As such, the above two sample relations become: (predation frog insect) (1) (d/dt change (* change-rate size (- 1 (/ size capacity)))) (2) Assumptions form a special type of relation that are employed to distinguish between alternative model design decisions. Internally, assumptions will be stored in the form of assumption nodes inthe ATMS (see section 3.3.1), but in the knowledge base, assumptions appear as relations with aspeciï¬c syntax and semantics. Two types of assumptions are employed in this article. Relevance assumptions state what phe- nomena are to be included in or excluded from the scenario model. Typical examples of phenomenaare the population growth and predation phenomena. The general format of a relevance assumptionis shown in (3). The phenomenon that is incorporated in the scenario model when describing a rele-vance assumption is identiï¬ed by name and is speciï¬c to the subsequent participants or relations.For example, relevance assumption (4) states that the growth of participant ?population is to beincluded in the model. (relevant name [ participant | relation ]) (3) (relevant growth ?population) (4) Model assumptions specify which type of model is utilised to describe the behaviour of a certain participant or relation. Typical examples of model types include the exponential (Malthus, 1798)and the logistic (Verhulst, 1838) model types of population growth. The formal speciï¬cation of amodel assumption is given in (5). Often the name in (5) corresponds to the name of a known(partial) model of the phenomenon or process being described. The example in (6) states that thepopulation ?population is being modelled using the logistic approach. (model [ participant | relation ] name ) (5) (model ?population logistic) (6) 515
KEPPENS & SHEN Predators natality mortality mortalityârate natalityârate preyârequirement capacity Prey natality mortality mortalityârate predation natalityârate capacity searchârate preyâhandlingâtime Figure 3: Stock ï¬ow diagram of predator prey scenario model 3.2.2 SCENARIOS AND SCENARIO MODELS As formalised by Keppens and Shen (2001b), a compositional modeller takes two inputs and pro-duces one output. The ï¬rst input is a representation (which is itself a model) that describes thesystem of interest by means of an accessible formalism. This model, which normally consists of(mainly) real-world participants and their interrelationships, is called the scenario. The second inputis the task description. It is a formal description of the criteria by which the adequacy of the outputis evaluated. The output is a new model that describes the scenario in a more detailed formalism,usually a set of variables and equations, which the model-based reasoner can employ readily. Such amodel, which normally contains conceptual participants and interrelationships, is called a scenariomodel. The aim of any compositional modeller is to translate the scenario into a scenario model, bymeans of the task description. In this work, a model is formally deï¬ned by a tuple P, R , where P is a set of participants and R is a set of relations over the participants in P . This deï¬nition applies to both the scenario and thescenario model. A typical example of a scenario is a description of a predator population, a preypopulation and a predation relation between the predator and the prey. This scenario is a model P, R with: P = predator, prey R = (predation predator prey) The aim of the compositional model repository is to translate a scenario into a scenario model. Within this work, both systems dynamics stock-ï¬ow formalism (Forrester, 1968) and ordinary dif-ferential equations (ODEs) will be employed as the modelling formalisms. For example, a scenariomodel that corresponds to the above scenario is depicted in Figure 3. Formally, a scenario model isanother model P, R and in this case P = Npredator, Bpredator, Dpredator, Nprey, Bprey, Dprey, Pprey, bpredator, bprey, dpredator, dprey, Cpredator, Cprey,s(prey,predator), t(prey,predator), r(predator,prey) 516
COMPOSITIONAL MODEL REPOSITORIES Symbol Variable name Npredator, Nprey number of predators, prey Bpredator, Bprey natality of predators, prey Dpredator, Dprey mortality of predators, prey Pprey predation of prey bpredator, bprey natality-rate of predators, prey dpredator, dprey mortality-rate of predators, prey Cpredator, Cprey capacity of predators, prey s(prey,predator) search-rate t(prey,predator) prey-handling-time r(predator,prey) prey-requirement Table 1: Variables in the stock ï¬ow diagram and the mathematical model d R = N dt predator = Bpredator â Dpredator, d N dt prey = Bprey â Dprey â Pprey, Bpredator = bpredator à Npredator,Bprey = bprey à Nprey, N D predator predator = dpredator à Npredator à , Cpredator N D prey prey = dprey à Nprey à , Cprey s( P prey,predator) à Nprey à Npredator prey = , 1 + s(prey,predator) à Nprey à t(prey,predator) Cpredator = r(predator,prey) à Nprey,Cprey = Nprey The relation between the variables of the mathematical model and those used in the stock-ï¬ow dia-gram is given in table 1. Generally speaking, stock-ï¬ow diagrams are graphical representations ofsystems of (ordinary or qualitative) differential equations. In the automated modelling literature ingeneral, and engineering and physical systems modelling in particular, more sophisticated represen-tational formalisms have been developed to enable the identiï¬cation of mathematical models of thebehaviour of dynamic systems from observations. Examples include bond graphs (Karnopp, Mar-golis, & Rosenberg, 1990) and generalised physical networks (Easley & Bradley, 1999). However,the potential beneï¬ts of these more advanced formalisms are not exploited here, but remain as aninteresting future work. Instead, stock-ï¬ow diagrams are employed throughout this paper as theyare far more commonly used in ecological modelling (Ford, 1999). It is often possible to construct multiple scenario models from a single given scenario, and the task speciï¬cation is employed to guide the search for the most appropriate one(s). In this work,scenario models are selected on the basis of hard constraints and user preferences. The hard con-straints stem from restrictions imposed on compositionality by the representational framework (seesection 3.2.3) and from properties the scenario model is required to satisfy (see section 3.2.3). The 517
KEPPENS & SHEN Name Syntax (inï¬x notation) Syntax (preï¬x notation) ?var = C+(formula) (== ?var (C-add formula)) Addition ?var = Câ(formula) (== ?var (C-sub formula)) ?var = CÃ(formula) (== ?var (C-mul formula)) Multiplication ?var = C÷(formula) (== ?var (C-div formula)) ?var = Cif,p(antecedent, formula) (== ?var (C-if antecedent formula :priority p)) Selection ?var = Celse(formula) (== ?var (C-else formula) Table 2: Composable functors and composable relations user preferences express the userâs subjective view as to which modelling approaches are moreappropriate in the context of the current scenario (see section 2.2). 3.2.3 THE KNOWLEDGE BASE To construct scenario models from a given scenario, a compositional modeller relies on the useof a knowledge base that is particular to the problem domain. To illustrate the ideas, this sectionpresents the constructs employed in the compositional modeller that is developed to synthesisescenario models in the ecological domain. Composable relations The knowledge base in this approach consists of partial models that can be instantiated and composed into more complex scenario models. The composition of partial modelsinto a scenario model may involve the composition of partial relations (coming from different partialmodels) in compounded relations. In the sample scenario model of section 3.2.2, the followingrelation describes the changes of population size of the prey population d N dt prey = Bprey â Dprey â Pprey (7) In (7), Nprey is the population size, Bprey the number of births, Dprey the number of natural deathsand Pprey the number of prey who died due to predation. Thus, relation (7) actually describes twophenomena that affect the population size Nprey: natural population growth (Bprey â Dprey) andpredation related deaths (Pprey). When constructing the knowledge base, it is desirable to representthese two phenomena in isolation because they do not always occur in combination. For example,some species do not have predators, and it is therefore unnecessary to always include predationas a cause of death. From this viewpoint, relation (7) can be seen as composed from differentcomposable relations in the knowledge base: d d d N N N dt prey = C+(Bprey) dt prey = Câ(Dprey) dt prey = Câ(Pprey) The use of composable relations enables the knowledge base to cover as many combinations of the phenomena that may affect a relation as possible, by representing each phenomenon indi-vidually rather than precompiling everything together. Because only the component parts (i.e. thecomposable relations) of relations need to be represented, instead of all possible, and however com-plex, combinations of them, the knowledge base can be smaller and more effective. This sectiondescribes how such composable relations are represented in the knowledge base, as well as whetherand how they can be composed to form compounded relations. 518
COMPOSITIONAL MODEL REPOSITORIES Composable relations are those containing composable functors and for which a method of composition exists (that describes how a complete set of composable relations can be composed).The composable functors employed are those proposed by Bobrow et al. (1996) with a new addition:composable selection. A summary of such composable relations is presented in table 2. The composable relations introduced by Bobrow et al. (1996) are easy to understand. The formulae f in v = C+(f ) and v = Câ(f ) represent terms (respectively f and âf ) of a sum, andthe formulae f in v = CÃ(f ) and v = C÷(f ) represent factors (respectively f and 1 ) of a product. f However, ecological models in use typically contain selection statements which declare that one certain equation must be employed when a condition is satisï¬ed and some other one otherwise.Formally, a selection is a relation of the form if c1 then v = r1 else if c2 . . . else v = rn (8) where v is a participant, each ci (with i = 1, . . . , nâ1) is a relation describing a condition statementand each rj (with j = 1, . . . , n) is a relation. This selection relation consists of the partial relations: if ci then v = ri with i = 1, . . . , n â 1 else v = rn Therefore, a selection relation can be composed from two types of composable relation. The ï¬rstis a composable âifâ relation, which has the form v = C if,p(a, f ), where v is a participant, p is anelement taken from a total order, such as the set of natural numbers , which denotes the priority of  the composable âifâ relation in the sequence, and a and f are two given relations. The second typeof composable relation is a composable âelseâ relation, which has the form v = C else(felse), wherefelse is a given relation assigned to v if none of the antecedents in the composable âifâ relations istrue. To illustrate this notation, the selection relation (8) can be composed from the following com- posable relations: v = Cif,p1(c1, r1) ... v = Cif,pnâ1(cnâ1, rnâ1) v = Celse(rn) with p1 > . . . > pnâ1. To combine the composable relations, a number of rules are deï¬ned to implement the semantics of the representational formalism. In theory, a set of rules can be generated that enables the aggre-gation of any set of composable relations. In practice, however, a trade-off must be made betweenï¬exibility (the ability to combine many different types of composable relation) and comprehensi-bility (the use of a set of rules that is easily understood by the knowledge engineer who employscomposable relations). Thus, the types of composable relations that can be combined has to berestricted. Table 3 summarises what composable relations can be joined to form compounded relations. The principle guiding the construction of this table is to allow only the composition of relations ofcertain types for which a resulting compound relation is intuitively obvious. For example, according 519
KEPPENS & SHEN C+(f2) Câ(f2) CÃ(f2) C÷(f2) Cif,p2(a2, f2) Celse(f2) C+(f1) yes yes no no no no Câ(f1) yes yes no no no no CÃ(f1) no no yes yes no no C÷(f1) no no yes yes no no Cif,p1(a1, f1) no no no no yes yes Cif,p2(a1, f1) no no no no no yes Celse(f1) no no no no yes no Table 3: Composibility of composable relations to Table 3, a composable addition relation x = C +(y) can be combined with a composable sub-traction relation x = Câ(z) because their combination is clearly x = y â z. However, according toTable 3, a composable addition relation x = C +(y) can not be combined with a composable multi-plication relation x = CÃ(z), because an arbitrary and non-intuitive rule would otherwise have tobe deï¬ned to decide whether the compound relation would be x = y + z or x = y à z. The order in which the composable selections must be considered is deï¬ned by the priorities (or is implicit in the case of Celse). Therefore, composable selections can be combined with oneanother provided no two composable âifâ relations have the same priority. In order to derive the actual rules of composition, the sets of all composable relations with the same functor for a given model P, R are deï¬ned ï¬rst: R(v, C+) = v = C+(fi) | (v = C+(fi)) â RR(v, Câ) = v = Câ(fi) | (v = Câ(fi)) â RR(v, CÃ) = v = CÃ(fi) | (v = CÃ(fi)) â RR(v, C÷) = v = C÷(fi) | (v = C÷(fi)) â R R(v, Cif) = v = Cif,pi(ai, fi) | (v = Cif,pi(ai, fi)) â R R(v, Celse) = v = Celse(fi) | (v = Celse(fi)) â R From this, the rules of composition can be built as given in the expressions (9), (10) and (11). They jointly state how a given set of composable relations can be rewritten as a single compoundrelation. Each of these rules contains a complete set of all composable relations in the antecedent.In particular, the antecedent of rule (9) contains the set of all composable addition and subtractionrelations with the same participant v in the left-hand side. Similarly, the antecedent rule (10) contains the complete set of composable multiplication rela- tions. Finally, the antecedent of rule (11) is satisï¬ed for the complete set of composable if and elserelations with the same left-hand participant v, provided that the priorities are strictly ordered (i.e.no two priorities are equal) and that there is only a single composable else relation. The latter twoconditions are added because two composable if relations with the same priority or two composableelse relations can not be compounded. The consequents of the rules of composition explain howthese complete sets of composable relations can be joined. This is simply a matter of applying theappropriate mathematical operation to the provided terms. 520
COMPOSITIONAL MODEL REPOSITORIES R(v, C+) = v = C+(f1+), . . . , v = C+(fm+)â§ R(v, Câ) = v = Câ(f1â), . . . , v = Câ(fnâ) â (9) v = f1+ + . . . + fm+ â (f1â + . . . + fnâ) R(v, CÃ) = v = CÃ(f1Ã), . . . , v = CÃ(fmÃ)â§ R(v, C÷) = v = C÷(f1÷), . . . , v = C÷(fn÷) â (10) 1 à f v = 1à à . . . à fmà f1÷ à . . . à fn÷ R(v, Cif) =v = Cif,p1(a1, f1), . . . , v = Cif,pm(am, fm)â§ R(v, Celse) =v = Celse(felse) â§ p1 > . . . > pm â (11) v =if a1 then f1, else . . . , if am then fm, else felse Property deï¬nitions Property deï¬nitions describe features of interest to the application requiring a scenario model. A property deï¬nition Î is a tuple P s, Φ, Ï where P s = ps1, . . . psm is a set ofsource-participants, a predicate calculus sentence Φ whose free variables are elements of P s, andÏ is a relation, whose free variables are also elements of P s, such that âps1, . . . , âpsmΦ â Ï A typical example of a feature of interest is the requirement that a certain variable in the model is endogenous or exogenous. To be more speciï¬c, the property deï¬nitions below describe when avariable ?v is endogenous and exogenous respectively. (defproperty endogenous :source-participants ((?v :type variable)):structural-condition ((or (== ?v *) (d/dt ?v *))):property (endogenous ?v)) (defproperty exogenous :source-participants ((?v :type variable)):structural-condition ((not (endogenous ?v))):property (exogenous ?v)) The ï¬rst deï¬nition states that whenever either ?v = * or d ?v = * is true (where * matches dt any constant or formula), ?v is deemed to be endogenous. The second property deï¬nition indicatesthat a variable is said to be exogenous if such an object exists and it is not endogenous. By describing such features formally in the knowledge base, property deï¬nitions enable them to be imposed as criteria on the selection of scenario models. In this way, the variable describingthe size of a particular population in an eco-system, for instance, can be forced to be endogenous. Note that required properties can be speciï¬ed in two different ways: either globally as goals for the scenario model construction or locally as a required purpose of a certain model fragment. Thelatter use of model properties will be illustrated later. 521
KEPPENS & SHEN Model fragments Model fragments are the building blocks with which scenario models are con- structed. A model fragment µ is a tuple P s, P t, Φs, Φt, A, Î where P s = ps1, . . . psm is aset of variables called source-participants, P t = pt1, . . . , ptn is a set of variables called target-participants, Φs = Ïs1, . . . , Ïsv is a set of relations, called structural conditions, whose free vari-ables are elements of P s, Φt = Ït1, . . . , Ïtx is a set of relations, called postconditions, whose freevariables are elements of P s ⪠P t, A = a1, . . . , ay is a set of relations, called assumptions, andÎ = is a set of relations, called purpose-required properties, such that: âÏti â Φt, âps1, . . . , âpsm, âpt1, . . . , âptn, Ïs1 â§ . . . â§ Ïsv â (a1 â§ . . . â§ ay â Ïti) (12) âÏ â Î , âps1, . . . , âpsm, âpt1, . . . , âptn, Ïs1 â§ . . . â§ Ïsv â§ a1 â§ . . . â§ ax â§ Â¬Ï â ⥠(13) Note that, in this work, each property deï¬nition P s, Φ, Ï is equivalent to a model fragment P s, , Φ, Ï, , . For example, the model fragment below states that a population ?p can be described by two variables ?p-size (describing the size of ?p) and ?p-change (describing the rate of change inpopulation size) and a differential equation d ?p-size = ?p-change dt The usage of this partial scenario model is subject to two conditions: (1) the growth phenomenon isrelevant with regard to ?p, and (2) the variable ?p-change is endogenous in the eventual scenariomodel. The former requirement is indicated by the relevance assumption and the latter by thepurpose-required property: (defModelFragment population-growth :source-participants ((?p :type population)):assumptions ((relevant growth ?p)):target-participants ((?p-size :type variable) (?p-change :type variable)) :postconditions ((size-of ?p-size ?p) (change-of ?p-change ?p)(d/dt ?p-size ?p-change)) :purpose-required ((endogenous ?p-change))) The purpose-required property is usually satisï¬ed by additional model fragments, such as the one below: (defModelFragment logistic-population-growth :source-participants ((?p :type population) (?p-size :type variable)(?p-change :type variable)) :structural-conditions ((size-of ?p-size ?p) (change-of ?p-births ?p)) :assumptions ((model ?p-size logistic)):target-participants ((?r :type parameter) (?k :type variable)(?d :type variable)) :postconditions ((capacity-of ?k ?p) (density-of ?d ?p-size)(== ?d (C-add (/ ?p-size ?k))) (== ?p-change (- (* ?r ?p-size (- 1 ?d)))))) 522
COMPOSITIONAL MODEL REPOSITORIES Model fragments are rules of inference that describe how new knowledge can be derived from existing knowledge by committing the emerging model to certain assumptions. They are usedto generate a space of possible models. Model fragments are instantiated by matching source-participants to existing participants in the scenario or an emerging model, and by matching thestructural conditions to corresponding relations. For each possible instantiation, a new instance isgenerated for each of the target-participants, and where necessary, new instances are also created forthe postconditions and assumptions. Such instances, as well as the inferential relationships betweenthe instances of the source-participants, structural conditions and assumptions on the one hand, andthose of the target-participants and postconditions on the other, are stored in an ATMS, forming themodel space. This is to be further explained in section 3.3.1. A model fragment is said to be applied if it is instantiated and the underlying assumptions hold. If a model fragment is applied, the instances of the target-participants and postconditionscorresponding to the instantiation of that model fragment must be added to the resulting model. Withrespect to the above example, the model fragment that implements the logistic population growthmodel is instantiated whenever variables exist that describe the size and change in a population, andit is applied if the logistic model for population size has also been selected. Note that in most compositional modellers, such as the ones devised by Heller and Struss (1998, 2001); Levy, Iwasaki and Fikes (1997); Nayak and Joskowicz (1996); and Rickel and Porter (1997),model fragments represent direct translations of components of physical systems into inï¬uences be-tween variables. Because the compositional modeller presented herein aims to serve as an ecologicalmodel repository, the contents of the model fragments employed differs from that of conventionalcompositional modellers in two important regards: Firstly, model fragments contain partial models describing certain phenomena instead of in- ï¬uences. These partial models normally correspond to those developed in ecological modellingresearch. Typical examples include the logistic population growth model (Verhulst, 1838) and theHolling predation model (Holling, 1959) devised in the population dynamics literature. Secondly, the partial models contained in the model fragments often need to be composed incre- mentally. For example, the aforementioned sample model fragment logistic-population-growth requires an emerging scenario model, which may be generated by the other sample modelfragment population-growth. Thus, one model fragment, e.g. logistic-population-growth, can expand on the partial model contained in another, e.g. population-growth. Be-cause of this feature, it is (correctly) presumed that no model fragment µ generates new relationsthat are preconditions of model fragments that µ expands on. Violating this presumption wouldmake little sense in the context of the present application as it would imply a recursive extension ofan emerging scenario model with the same set of variables and equations. 3.2.4 PARTICIPANT CLASS DECLARATION AND PARTICIPANT TYPE HIERARCHIES In general, participant classes need not be deï¬ned. However, certain types of participant may bedescribed in terms of other interesting participants, irrespective of the modelling choices. Thisfeature provides syntactic sugar for describing important relations between participants, making iteasier to declare required properties of a scenario model in terms of the participants of the scenario.For example, the behaviour of a population may be described in terms of population size and growthrate variables: (defEntity population :participants (size growth-rate)) 523
KEPPENS & SHEN Participant class declarations may also be employed within model fragments to provide a more speciï¬c deï¬nition of the meaning of the source-participants and the target-participants. In this way,participant speciï¬cations are constrained to be a feature of another participant by means of the:entity statement, as the following example illustrates: (defModelFragment define-population-growth-phenomenon :source-participants ((?p :type population)):target-participants ((?ps :type stock :entity (size ?p)) (?pg :type variable :entity (growth-rate ?p))(?pb :type flow)(?pd :type flow)) :assumptions ((relevant growth ?p)):postconditions ((== ?pg (- ?pb ?pd)) (flow ?pb source ?pl)(flow ?pd ?pl sink))) Furthermore, participant class declarations may deï¬ne one class to be an immediate subclass of another. For example, the population participant class of holometabolous insects (e.g. butterï¬ies)may be deï¬ned as a subclass of the population participant class: (defEntity holometabolous-insect-population :subclass-of (population):participants (larva-number pupa-number adult-number)) In this way, a participant type hierarchy is deï¬ned. Each subclass inherits all participants of its superclasses (i.e. its immediate superclass and superclasses of superclasses). In summary, a participant class declaration is a tuple Î = Î S, P where Î S is a participant class, called the immediate superclass of the participant class and P is a set of participants classesthat describe important features of the participant class. 3.3 Inference The compositional modelling method presented herein employs a four step inference procedure: 1. Model space construction. The model space is an ATMS that efï¬ciently stores all the partici- pants, relations and model design decisions (represented in the form of relevance and modelassumptions) that may be part of the ï¬nal scenario model, as well as the conditions underwhich each of these participants and relations must or must not be part of the scenario model. 2. aDCSP construction. The model space contains a number of hard constraints on the partici- pants and relations that may be combined. This inference step extracts such restrictions andtranslates them into an aDCSP. 3. Inclusion of order-of-magnitude preferences. Preferences are associated with relevance and model assumptions in the scenario space as they reï¬ect the relative appropriateness of theseassumptions, resulting in an aDPCSP. 4. Scenario model selection. This inference step solves the aDPCSP. The resulting solutions correspond to scenario models that are consistent according to the domain knowledge andoptimise the overall preference with respect to the order-of-magnitude preference calculus. 524
COMPOSITIONAL MODEL REPOSITORIES Problem Specification Compositional Model Repository Application population(prey)population(predator)predation(predator,prey) STEP 1 Model Space Construction Scenario Requirements and Model Space Inconsistencies Generation Knowledge Base STEP 2 Requirements Activityâbased Dynamic Constraint and Inconsistencies Satisfaction Problem Construction Scenario Model Construction Dynamic Constraint Satisfaction Problem Scenario Model STEP 3 Inclusion of OrderâofâMagnitude Preferences or Preferences Preference Ordering Prey birth death rate rate crowding Dynamic Preference Constraint Application max crowd Satisfaction Problem sustainablepopulation foodâdemand consumption Predator STEP 4 birth death rate rate Scenario Model Selection crowding (aDCSP solver) Assumption Set Knowledge elements Inference elements Figure 4: Inference procedures of the compositional modeller 525
KEPPENS & SHEN These four steps correspond to the four squares of the compositional model repository in Figure 4 In this section, each of these inference steps is discussed in detail and illustrated by means of simple examples. The next section contains a more detailed example and shows how this procedurecan be applied to a non-trivial ecological modelling domain. 3.3.1 SCENARIO + KNOWLEDGE BASE = MODEL SPACE As previously stated, the aim of a compositional modeller is to translate a scenario into a scenariomodel. Both are representations of the system of interest though they model the system at a differentlevel of detail. The knowledge base provides the foundation for translation. All the scenario modelsthat can be constructed from the given scenario, with regard to the knowledge base, are stored in themodel space. A model space is an ATMS (de Kleer, 1986) containing all the participants, relations and as- sumptions that can be instantiated from a given scenario. In this work, the generalised version ofthe ATMS, as introduced by de Kleer (1988), is employed as it allows the use of negations of nodesin the justiï¬cations. The algorithm GENERATEMODELSPACE( O, R ) describes how such a modelspace can be created from a scenario O, R . It ï¬rst initialises the model space θ with the partic-ipant instances (O) and the relation instances (R) from the scenario. Then, for each model frag-ment whose source-participants and structural conditions match participants and relations alreadyin θ, new instances of its target-participants, assumptions and postconditions are added to θ. Be-cause each property deï¬nition P s, Φ, Ï is equivalent to a model fragment P s, , Φ, Ï, , ,this procedure applies to property deï¬nitions as well as model fragments. Matching the source-participants and structural conditions of a model fragment µ to the emerging model space is per-formed by the function match(µ, θ, Ï) as speciï¬ed below, where µ is the model fragment beingmatched, and Ï is a substitution from the source-participants of µ to participant instances. true if Ï = ps  1/o1, . . . , psm/om⧠ P s(µ) = ps1, . . . , psmâ§ match(µ, θ, Ï) =   o1 â θ â§ . . . â§ om â θ⧠ âÏ â Φs(µ), ÏÏ â θ false otherwise Each match, speciï¬ed by a model fragment µ and a substitution Ï, is processed as follows: ⢠For each assumption a â A(µ), a new node, denoting the assumption instance Ïa, is created and added to θ. ⢠Then, a new node n(Ï,µ), denoting the instantiation of µ via substitution Ï, is created, added to θ and justiï¬ed by the implication: (â§aâA(µ)Ïa) â§ (â§pâPs(µ)Ïp) â§ (â§ÏâΦs(µ)ÏÏ) â n(Ï,µ) ⢠Finally, a new instance for each target-participant p â P t(µ) and for each postcondition Ï â Φt(µ), provided ÏÏ does not already exist in the model space θ, is created. For thetarget-participants, this involves creating a new symbol for each new participant instance withthe function gensym() and extending Ï with the substitution p/gensym(). A new node n 526
COMPOSITIONAL MODEL REPOSITORIES Algorithm 1: GENERATEMODELSPACE( O, R ) θ â new ATMS;for each o â O, add-node(θ, o);for each r â R, add-node(θ, r);for each µ, Ï, match(µ, θ, Ï) justiï¬cation â â ; for each a â A(µ)  newnode â add-node(θ, (Ïa));  do  justiï¬cation â justiï¬cation ⪠newnode; for each p â Ps(µ)  do justiï¬cation â justiï¬cationâªï¬nd-node(θ,(Ïp)); for each Ï â Φs(µ)  do justiï¬cation â justiï¬cationâªï¬nd-node(θ,(ÏÏ)); add-node(θ,n(Ï,µ)); do  add-justiï¬cation(θ,n(Ï,µ),â§nâjustiï¬cationn); for each p â Pt(µ)  Ï â Ï âª p/gensym();    do o â add-node(θ,(Ïp));    add-justiï¬cation(θ, o, n  (Ï,µ)); for   each Ï â Φt(µ)  if (ÏÏ â θ)    then o â get-node(θ, (ÏÏ));    do    else o â add-node(θ, (ÏÏ));  add-justiï¬cation(θ,o,n(Ï,µ)); for each n1, . . . , nm, inconsistent(n1, . . . , nm) do add-justiï¬cation(θ, nâ¥, n1 â§ . . . â§ nm); Instances of ... A(µ) = a1, . . . , aÏa1 assumptions: t Ïpt1x Ïat Instances of target ... participants: Ïps P t(µ) = pt , . . . , pt n 1 Ïpt 1 n Instances of source- . participants: .. µ
 ¡ ¡ ¢¡¢  ¡ ¡ ¢¡¢  ¡ ¡ ¢¡¢ P s(µ) = ps, . . . , ps 1  ¡ ¡ ¢¡¢ m ÏÏt1 Ïpsm . Instances of postconditions: .. Φt(µ) = Ït , . . . , Ït 1 s ÏÏs1 ÏÏts Instances of structural . conditions: ..
Φs(µ) = Ïs, . . . , Ïs 1
v ÏÏsv Figure 5: Model fragment application is created and added to θ for each new participant instance Ïp and for each new instantiatedrelation ÏÏ. Each of these nodes is justiï¬ed by the implication n(Ï,µ) â n. 527
KEPPENS & SHEN Ï is a global property that must by satisï¬ed Ï is a purpose-required property in a model fragment µ, r1 and r2 are by all consistent scenario models and µ is applied with a substitution Ï. non-composable relations v = r1(. . .) â§ â¥ â§ Ï â§ Ï v = r2(. . .) Â¬Ï â§ â¥ â§ â¥ Ïµ â§ (a) Inconsistency caused by a (b) Inconsistency caused by a (c) Inconsistency caused by global property purpose-required property non-composable relations Figure 6: Sources of inconsistency To illustrate this procedure, Figure 5 shows a graphical representation of the inferences that areconstructed by applying a model fragment µ = P s, P t, Φs, Φt, A, with respect to a substitutionÏ. Once all possible applications of model fragments have been exhausted, the inconsistencies in the model space are identiï¬ed and recorded in the ATMS. In the algorithm, nogoods are generatedfor each set n1, . . . , nm of inconsistent nodes, denoted inconsistent(n1, . . . , nm). There arethree sources of inconsistencies that are each reported to the ATMS in a different way: ⢠Global properties: Let Ï be an instance of a global property that any scenario model must satisfy. Then, any combination of assumptions and negations of assumptions that prevents Ïfrom being satisï¬ed is inconsistent. Therefore, inconsistent(¬Ï) must be reported for anyrequired global property Ï. This type of inconsistency is depicted in Figure 6(a). ⢠Purpose-required properties: Any application of a model fragment µ without satisfying its purpose-required properties Î (µ) yields an inconsistency (see (13)). Hence, for each noden(Ï,µ) denoting the instantiation of µ via substitution Ï, and for each node nÏÏ describing theappropriate instance of a purpose-required property Ï â Î (µ), inconsistent(n(Ï,µ), ¬nÏÏ)is reported. This type of inconsistency is depicted in Figure 6(b). ⢠Non-composable relations: In any mathematical formalism designed to describe simulation models of dynamic systems, certain combinations of relations may over-constrain the model,and hence, be unsuitable for generating the behaviour of a system of interest. Within thesystem dynamics and ODE formalisms used in this paper, assignments of relations to thesame variable are only composable if those relations are explicitly deemed composable. Inother words, two relations v = ri and v = rj can only be combined with one another if riand rj are composable. Examples of pairs of non-composable relations include x = C+(y) and x = CÃ(z) because C+ and Cà relations are not composable, and a = C+(b) and a = c + d because c + d is not a composable relation. Combinations of such non-composable relations must be reported as an inconsistency as well.This type of inconsistency is depicted in Figure 6(c). 528
COMPOSITIONAL MODEL REPOSITORIES assumption: participant: (model nfrog logistic) parameter rfrog assumption: relation: participant: (relevant growth frog) d n parameter k dt frog = cfrog frog participant: participant: relation: µ1 population frog variable cfrog (capacity-of kfrog frog) µ1: population-growth participant: relation: µ n model fragment variable n 2 frog frog cfrog = rfrog à nfrog à (1 â ) kfrog µ2: logistic-population-growthmodel fragment relation: Ï: endogenous (change-of cfrog frog) property deï¬nition relation: relation: Ï (size-of nfrog frog) (endogenous cfrog) relation: ⧠⥠¬endogenous(cfrog) Figure 7: Partial model space To illustrate the model space construction algorithm, Figure 7 presents a small sample model space. It results from the application of the population-growth and logistic-popula-tion-growth model fragments and the endogenous property deï¬nition, which were describedearlier, for a single population âfrogâ. If a larger scenario involving multiple populations and rela-tions between these populations were speciï¬ed, a similar partial model space would be generatedfor each individual population. 3.3.2 FROM MODEL SPACE TO ADCSP Once the model space has been constructed, it can be translated into an aDCSP. The translationprocedure, summarised as algorithm CREATEADCSP(), consists of three steps as described below: Algorithm 2: CREATEADCSP() comment: Ï is the set of substitutions Ï â ;comment: Generate attributes and domains for each A, assumption-class(A) x â create-attribute(); D(x) â ; ï£´ï£´Ï â Ï âªA/x; do  for each a â A  v â create-value();    do D(x) â D(x)âªv;    ï£³Ï â Ï âªa/x : v; comment: Generate activity constraints for each A, assumption-class(A) s â subject(A); do for each a1 ,...,ap ,¬a1â¥,...,¬aq⥠â L(s)  do add(Ïa1 â§...â§Ïap â§Ï¬a1⥠â§...â§Ï¬aq⥠â active(ÏA)); comment: Generate compatibility constraints for each a1 , . . . , ap , ¬a1â¥, . . . , ¬aq⥠â L(nâ¥) do add(Ïa1 â§ . . . â§ Ïap ⧠Ϭa1⥠⧠. . . ⧠Ϭaq⥠â â¥); 529
KEPPENS & SHEN 1. Generate the attributes and domain values from the assumptions. The aDCSP attributes corre- spond to the underlying assumption classes (i.e. groups of assumptions indicating alternativechoices with regards to the same model construction decision). A relevance assumption andits negation jointly form an assumption class. For example, A1 =(relevant growthfrog), ¬(relevant growth frog) speciï¬es such an assumption class. The set ofmodel assumptions involving the same participants/relations, but with different model namesand hence different descriptions, also form an assumption class. For instance, A2 =(modelnfrog exponential), (model nfrog logistic), (model nfrog other), where nfrog is a variable denoting the size of a population, speciï¬es such an assumption class. Run-ning this step of the algorithm, an attribute is created for each assumption class, with thedomain of such an attribute consisting of all assumption instances in the assumption class. 2. Create activity constraints. The attributes and domain values generated in the previous step are only meaningful in situations where the participant and/or relation instances contained inthe arguments of the corresponding assumptions exist. For example, the assumption (modelnfrog logistic) is only relevant if the participant instance nfrog exists. Clearly, all as-sumptions within one assumption class have the same participant and/or relation instances astheir arguments. Because each assumption class corresponds to one attribute, the attributecan be activated if and only if the participant and/or relation instances associated with the re-lated assumption class are active. Therefore, this step creates activity constraints that activatean attribute based on the conjunction of the environments contained within the labels of theparticipants/relations of the assumption class. For instance, as can be deduced from Figure7, nfrog is activated when (relevant growth frog) is committed. Thus, the attributecorresponding to assumption class A2, deï¬ned in step 1, is activated under the attribute valueassignment associated with the (relevant growth frog) assumption. 3. Create compatibility constraints. In the ATMS (or model space), all sources of inconsisten- cies are contained in the label of the nogood node. Therefore, the compatibility constraintsare created directly by translating the environments in the label L(â¥) into the correspondingconjunctions of attribute-value assignments. 3.3.3 ADCSP + PREFERENCES = ADPCSP The aDCSP produced as above formalises the hard requirements imposed upon the scenario models.Among the scenario models that meet these requirements, some may be better than others, becausethe underlying model design decisions may be deemed more appropriate by the user. Preferencesthat express this (relative) level of appropriateness are attached to the assumptions that describe themodel design decisions, and by extension, to the attribute-value pairs in the aDCSP. As discussed insection 2, such an extension to the aDCSP constitutes an aDPCSP. More speciï¬cally, it is worth recalling that in section 2.2 an order-of-magnitude preference calculus is presented that enables representation and reasoning with subjective user preferences fordifferent relevance and modelling assumption. Next, section 2.3 introduces a solution algorithm foraDPCSPs that include an aDCSP, such as the ones constructed with the approach of section 3.3.2,and are extended with subjective user preferences for alternative design decisions. 530
COMPOSITIONAL MODEL REPOSITORIES 3.4 Outline analysis of complexity The complexity of the work arises from four major sources: 1) model space construction, 2) labelpropagation in the ATMS, 3) model space to aDCSP translation, and 4) aDPCSP solution. GENERATEMODELSPACE( O, R ) essentially performs a ï¬xed sequence of instructions and produces a small set of nodes and inferences for each match of a model fragment. Therefore, itstime and space complexity are linear with respect to the number of possible matches of modelfragments. CREATEADCSP() extracts certain information from the model space and rewrites it ina different formalism without further manipulations. Therefore, its time and space complexity arelinear with respect to the size of the model space. The label propagation algorithm of an ATMS is known to have an exponential time complexity. However, because the model space is built up incrementally (by GENERATEMODELSPACE( O, R ))from the root nodes of the ATMS network (i.e. those that correspond to facts and have no an-tecedents) to the leaf nodes (i.e. those that have have no consequents, other than the nogood node)and because the inconsistencies are added at the end, this complexity only increases exponentiallywith the depth of the network and the number of participants and relations in individual model frag-ments, rather than with the size of the model space. This fact signiï¬cantly limits the complexityimpact of label propagation. Firstly, the depth of the ATMS network is restricted by the domain.In many conventional compositional modellers, where model fragments are direct translations fromscenario components to scenario model equations, this depth would be only one. Empirically, con-structing the model space for sophisticated eco-systems, the depth of a model space never exceeded8. Secondly, the size of the individual model fragments does not change signiï¬cantly with the sizeof the knowledge base. The fourth and ï¬nal source of complexity is driven by the fact that the constraint satisfaction algorithm must determine a consistent combination of assumptions in the model space. The spaceof attribute value assignments increases exponentially with the size of the number of assumptionsand hence, with the model space. Thus, the overall complexity of the present approach is largelydominated by the constraint satisfaction algorithm employed. If the user does not specify any preference, the CSP is an aDCSP. Recently, a number of efï¬cient methods have been devised for solving aDCSPs as presented by Minton et al. (1992); Mittal andFalkenhainer (1990); and Verfaillie and Schiex (1994). This helps minimise the overhead incurredfor compositional modelling. With preferences, the CSP becomes an aDPCSP. As argued in section 2, this presents a new problem that has not yet been studied in detail. In this work, an A* algorithm has been proposed toimplement the CSP solution method. This approach is known to be the most efï¬cient in terms ofthe proportion of the search space the algorithm needs to explore before ï¬nding an optimal solution,when compared to other search methods that are based on the same heuristic (Hart et al., 1968). Adisadvantage is that it incurs an exponential space complexity. As explained by Miguel and Shen(2001a, 2001b); and Tsang (1993), a wide range of alternative solution techniques exist for ordinaryCSPs and many of these could also be extended to solve aDPCSPs. A detailed examination of thesetechniques is a topic of future research. 3.5 Automated modelling and scientiï¬c discovery As mentioned previously, a compositional model repository is designed in order to compose modelsfrom a systemâs structure and relevant domain knowledge. As such, this approach gives rise to a po- 531
KEPPENS & SHEN tentially beneï¬cial means to operationalise the outcomes of scientiï¬c discovery. More speciï¬cally,the resultant compositional model repositories will allow existing knowledge on model constructionto be applied to unexperienced scenarios and to support investigation into situations which may bephysically difï¬cult to replicate or create but which may be synthesised in computational represen-tations. The present work has been applied to the vegetation component of the MODMED n-species model (Legg, Muetzelfeldt, & Heathï¬eld, 1995). This n-species model offers a system dynamicsrepresentation of populations of Mediterranean vegetations and of how they are affected by popu-lations of farm animals, climate and environmental management. The purpose of the model is tobe instantiated with respect to various Mediterranean communities, and to serve as a componentof a very large scale simulation that is designed to simulate the effects of various environmentalpolicies on the Mediterranean landscape. A knowledge base containing approximately 60 modelfragments and 4 property deï¬nitions has been constructed, on the basis of the most complex partsof the n-species model in about two man-weeks. This knowledge base can be employed to recon-struct variations of the n-species model to accommodate a variety of possible scenarios, as well asto examine simpliï¬cations of the original n-species model which exclude certain phenomena. The compositional model repository is most closely related to the seminal work on compo- sitional modelling (Falkenhainer & Forbus, 1991). That approach has a similar functionality butit is devised speciï¬cally for physical systems and relies on a component-connection formalism torepresent scenarios. Another approach which has recently been developed and applied to the ecological domain by Heller and Struss (1998, 2001). This work derives a systemâs structure from observations of itsbehaviour and domain knowledge. Therefore, it is able to perform diagnosis of ecological systemsand therapy suggestion. Another important distinction of this work from the present study is thatit presumes that each process can only be described in just one way instead of allowing multiplealternative models. In the machine learning community, a number of approaches have been devised by Bradley, Easley and Stolle (2001); Langley et al. (2002); and Todorovski and D Ëzeroski (1997, 2001) toinduce sets of differential equations from a) observations of behaviour, b) domain knowledge rep-resented in the form of hypothetical equations, and c) a description of the structure of the system.These approaches aim at scientiï¬c discovery by generalising observed behaviour into mathematicalmodels. The speciï¬cations of the scenario and the domain knowledge in these methods are similarto those used in this article. This is especially true for the work by Langley et al. (2002); and Todor-ovski and DËzeroski (1997, 2001), because that work has also been applied to population dynamics.However, the internal mechanisms of these approaches are very different as they essentially rely onexhaustive search procedures instead of constraint satisfaction techniques. 4. A Population Dynamics Example The examples used throughout the previous sections were taken from a more extensive applicationstudy of the present work. The application was aimed to construct a repository of basic populationdynamic models, describing the phenomena of growth, predation and competition. This sectionpresents an overview of how the proposed approach is employed in this application to show theability of the work to scale to larger problems. 532
COMPOSITIONAL MODEL REPOSITORIES 4.1 Knowledge base This subsection illustrates how a set of model fragments can be constructed. The challenge ofthis task lies in the fact that model fragments must encompass a sufï¬ciently general and reusablecomponent part of the ecological models. In instances of models found in the literature on ecologicalmodelling, the boundaries of the recurring component parts are hidden, and it is therefore up to theknowledge engineer to identify them. First, a hierarchy of entity types is set up. The system dynamics models shown earlier contain only three types of participant: variables, stocks and ï¬ows. Here, stocks and ï¬ows are a specialtype of variable with a predetermined meaning. That is, a ï¬ow f into a stock s corresponds to theequation d s = C+(f ) and a ï¬ow f out of a stock s denotes d s = Câ(f ). Hence, stocks and dt dt ï¬ows are deï¬ned as subclasses of the participant class variable: (defEntity variable)(defEntity stock :subclass-of (variable)) (defEntity flow :subclass-of (variable)) The sample properties deï¬ned in section 3.2.3, which describe the condition under which a variable is endogenous or exogenous, are employed in this knowledge base: (defproperty endogenous-1 :source-participants ((?v :type variable)):structural-conditions ((== ?v *)):property (endogenous ?v)) (defproperty endogenous-2 :source-participants ((?v :type variable)):structural-conditions ((d/dt ?v *)):property (endogenous ?v)) (defproperty exogenous :source-participants ((?v :type variable)):structural-conditions ((not (endogenous ?v))):property (exogenous ?v)) The next three model fragments contain the rules of the stock-ï¬ow diagrams employed by sys- tems dynamics models. They respectively describe that: ⢠A ï¬ow ?flow into a stock ?stock corresponds to the composable differential equation: d ?stock = C+(?flow) dt ⢠A ï¬ow ?flow out of a stock ?stock corresponds to the composable differential equation: d ?stock = Câ(?flow) dt ⢠A ï¬ow ?flow from one stock ?stock1 to another stock ?stock2 corresponds to the composable differential equations: d d ?stock1 = Câ(?flow) and ?stock2 = C+(?flow) dt dt 533
KEPPENS & SHEN (defModelFragment inflow :source-participants ((?stock :type stock) (?flow :type flow)) :structural-conditions ((flow ?flow source ?stock)) :postconditions ((d/dt ?stock (C-add ?flow)))) (defModelFragment outflow :source-participants ((?stock :type stock) (?flow :type flow)) :structural-conditions ((flow ?flow ?stock sink)) :postconditions ((d/dt ?stock (C-sub ?flow)))) (defModelFragment inflow :source-participants ((?stock1 :type stock) (?stock2 :type stock)(?flow :type flow)) :structural-conditions ((flow ?flow ?stock1 ?stock2)) :postconditions ((d/dt ?stock1 (C-sub ?flow)) (d/dt ?stock2 (C-add ?flow)))) Once the above declarations are in place, the knowledge base of model fragments can be de- ï¬ned. The ï¬rst model fragment describes the population growth phenomenon. Note that all of theaforementioned growth, predation and competition models contain a stock representing populationsize and two ï¬ows, one ï¬ow of births into the stock and another ï¬ow of deaths out of the stock. Thiscommon feature of models on population dynamics is contained in a single model fragment. (defModelFragment population-growth :source-participants ((?population :type population)) :assumptions ((relevant growth ?population)) :target-participants ((?size :type stock :name size) (?birth-flow :type flow :name births)(?death-flow :type flow :name deaths)) :postconditions ((flow ?birth-flow source ?size) (flow ?death-flow ?size sink)(size-of ?size ?population)(births-of ?birth-flow ?population)(deaths-of ?death-flow ?population)) :purpose-required ((endogenous ?birth-flow) (endogenous ?death-flow))) The variables ?birth-flow and ?death-flow become endogenous if the model contains an equation describing birth ï¬ow and death ï¬ow. These equations differ between population growthmodels. Two types of population growth model are the exponential growth model (Malthus, 1798),which is shown in Figure 8(a), and the logistic growth model (Verhulst, 1838), which is shown inFigure 8(b). The following two model fragments formally describe these component models: 534
COMPOSITIONAL MODEL REPOSITORIES ¢¤ ¦!¨" %&!¨"'!)(  ¢¡¤£¦¥¨§ ¡ ¥¨§ 0 1 © "¢3 §¡ ¢ 1 2 © 4
£ (a) Exponential growth (b) Logistic growth Figure 8: Population growth models (defModelFragment exponential-population-growth :source-participants ((?population :type population) (?size :type variable)(?birth-flow :type variable)(?death-flow :type variable)) :structural-conditions ((size-of ?size ?population) (births-of ?birth-flow ?population)(deaths-of ?death-flow ?population)) :assumptions ((model ?size exponential)) :target-participants ((?birth-rate :type variable :name birth-rate) (?death-rate :type variable :name death-rate)) :postconditions ((== ?birth-flow (* ?birth-rate ?size)) (== ?death-flow (* ?death-rate ?size)))) (defModelFragment logistic-population-growth :source-participants ((?population :type population) (?size :type variable)(?birth-flow :type variable)(?death-flow :type variable)) :structural-conditions ((size-of ?size ?population) (births-of ?birth-flow ?population)(deaths-of ?death-flow ?population)) :assumptions ((model ?size logistic)) :target-participants ((?birth-rate :type variable :name birth-rate) (?death-rate :type variable :name death-rate)(?density :type variable :name total-population)(?capacity :type variable :name capacity)) :postconditions ((== ?birth-flow (* ?birth-rate ?size)) (== ?death-flow (* ?death-rate ?size ?density))(== ?density (C-add (/ ?size ?capacity)))(density-of ?density ?population)(capacity-of ?capacity ?population))) There is one twist in compositional modelling of population growth. Sometimes, the actual growth model is implicitly contained within another type of model. In such cases, the growthphenomenon and the corresponding differential equations are still relevant, but none of the dedicatedgrowth models can be employed. For example, as will be shown later, the Lotka-Volterra predationmodel comes with its own equations describing growth. 535
KEPPENS & SHEN The model fragment other-growth allows for an empty growth model, named other, to be selected. However, due to the purpose-required property that any instance of ?p-change mustbe endogenous, this empty model can only be selected if a growth model is implicitly includedelsewhere. (defModelFragment other-growth :source-participants ((?population :type population) (?size :type variable)(?birth-flow :type variable)(?death-flow :type variable)) :structural-conditions ((size-of ?size ?population) (births-of ?birth-flow ?population)(deaths-of ?death-flow ?population)) :assumptions ((model ?population other))) In addition to population growth, two other phenomena are included in the knowledge base: predation and competition. Predation and competition relations between species are represented bypredicates over the populations: e.g. (predation foxes rabbits) and (competitionsheep cows). However the existence of a phenomenon does not necessarily mean that it must becontained within the model. It would make little sense to model predation and competition withoutmodelling the size of the populations, because models of these phenomena relate population sizesto one another. Therefore, the incorporation of the predation phenomenon is made dependent uponthe existence of variables representing population size. Also, human expert modellers may preferto leave a phenomenon out of the resulting model. To keep this choice open, the following twomodel fragments construct a participant representing the phenomena of predation and competition,and make it dependent upon a relevance assumption: (defModelFragment predation-phenomenon :source-participants ((?predator :type population) (?prey :type population)(?predator-size :type variable)(?prey-size :type variable)) :structural-conditions ((predation ?predator ?prey) (size-of ?predator-size ?predator)(size-of ?prey-size ?prey)) :assumptions ((relevant predation ?predator ?prey)) :target-participant ((?predation-phenomenon :type phenomenon :name predation-phenomenon)) :postconditions ((predation-phenomenon ?predation-phenomenon ?predator ?prey)) :purpose-required ((has-model ?predation-phenomenon))) (defModelFragment competition-phenomenon :source-participants ((?population1 :type population) (?population2 :type population)(?size1 :type variable)(?size2 :type variable)) :structural-conditions ((competition ?population1 ?population2) (size-of ?size1 ?population1)(size-of ?size2 ?population2)) 536
COMPOSITIONAL MODEL REPOSITORIES N B D prey prey = bprey à Nprey prey = dprey à Nprey à Kprey Bprey D = bprey à Nprey prey = pprey à Nprey à Npred d N dt prey = Bprey â Dprey â P d N dt prey = Bprey â Dprey â P s d b ÃNpreyÃNpred prey prey P = 1+sÃNpreyÃth b pprey prey K s prey th N B D pred pred = bpred à Npred pred = dpred à Npred à Kpred N Bpred D pred = ppred à Nprey à Npred pred = dpred à Npred à K d pred N dt pred = Bpred â Dpred d N dt pred = Bpred â Dpred d b pred pred dpred ppred Kpred = k à Nprey (a) Lotka-Volterra predation (b) Holling predation Figure 9: Predation models :assumptions ((relevant competition ?population1 ?population2)) :target-participant ((?competition-phenomenon :type phenomenon :name competition-phenomenon)) :postconditions ((competition-phenomenon ?competition-phenomenon ?population1 ?population2)) :purpose-required ((has-model ?competition-phenomenon))) Both model fragments have a purpose-required property of the form (has-model ?phen). This property expresses the condition that a model must exist with respect to a phenomenon: (defproperty has-model :source-participants ((?p :type phenomenon)):structural-conditions ((is-model-of ?p *)):property (has-model ?p)) The next two model fragments implement such models (thereby satisfying the above has-model purpose-required property) for the predation phenomenon between two populations. They describetwo well-known predation models: the Lotka-Volterra model (1925, 1926), which is shown in Fig-ure 9(a), and the Holling model (1959), which is shown graphically in Figure 9(b). (defModelFragment Lotka-Volterra :source-participants ((?predation-phenomenon :type phenomenon) (?predator :type population)(?predator-size :type stock)(?predator-birth-flow :type flow)(?predator-death-flow :type flow)(?prey :type population)(?prey-size :type stock)(?prey-birth-flow :type flow)(?prey-death-flow :type flow)) :structural-conditions ((predation-phenomenon ?predation-phenomenon ?predator ?prey) 537
KEPPENS & SHEN (size-of ?predator-size ?predator)(births-of ?predator-birth-flow ?predator)(deaths-of ?predator-death-flow ?predator)(size-of ?prey-size ?prey)(births-of ?prey-birth-flow ?prey)(deaths-of ?prey-death-flow ?prey)) :assumptions ((model ?predation-phenomenon lotka-volterra)) :target-participants ((?prey-birth-rate :type variable :name birth-rate) (?predator-factor :type variable :name predator-factor)(?prey-factor :type variable :name prey-factor)(?predator-death-rate :type variable :name death-rate)) :postconditions ((== ?prey-birth-flow (* ?prey-birth-rate ?prey-size)) (== ?predator-birth-flow (* ?predator-factor ?prey-size ?predator-size))(== ?prey-death-flow (* ?prey-factor ?prey-size ?predator-size))(== ?predator-death-flow (* ?predator-death-rate ?predator-size))(is-model-of lotka-volterra ?predation-phenomenon))) As mentioned earlier, the Lotka-Volterra model introduces its own growth model for the prey and predator populations by assigning speciï¬c equations to the variables, which describe changes inthe sizes of the predator and prey populations, ?pred-change and ?prey-change respectively.Thus, it satisï¬es the purpose-required property in the application of the population-growthmodel fragment for the ?prey and ?pred populations. (defModelFragment Holling :source-participants ((?predation-phenomenon :type phenomenon) (?predator :type population)(?predator-size :type stock)(?capacity :type variable)(?prey :type population)(?prey-size :type stock)) :structural-conditions ((predation-phenomenon ?predation-phenomenon ?predator ?prey) (size-of ?predator-size ?predator)(size-of ?prey-size ?prey)(capacity-of ?capacity ?predator)) :assumptions ((model ?predation-phenomenon holling)) :target-participants ((?search-rate :type variable :name search-rate) (?handling-time :type variable :name handling-time)(?prey-requirement :type variable :name prey-requirement)(?predation :type flow :name predation)) :postconditions ((flow ?predation ?prey-size sink) (== ?predation (/ (* ?search-rate ?prey-size ?predator-size) (+ 1 (* ?search-rate ?prey-size ?handling-time)))) (== ?capacity (C-add (* ?prey-requirement ?prey)))(is-model-of holling ?predation-phenomenon))) The Holling model employs a variable denoting the capacity of a population. Such a variable may be introduced by a logistic growth model. In practice, logistic growth models and Hollingpredation models are often used in conjunction. The compositional modeller need not be aware ofsuch combinations of models, however. All it needs to know is the prerequisites of the individualcomponent models contained within each model fragment. 538
COMPOSITIONAL MODEL REPOSITORIES B D N1+w12ÃN2 1 = b1 à N1 1 = d1 à N1 à K1 d N dt 1 = B1 â D1 d b 1 1 w12 K1 B D w21ÃN1+N2 2 = b2 à N2 2 = d2 à N2 à K2 d N dt 2 = B2 â D2 d b 2 2 w21 K2 Figure 10: A species competition model The ï¬nal model fragment in the knowledge base implements a model of competition between two species. It formally describes the competition model type depicted in Figure 10. As this modelfragment contains the only population competition model in the knowledge base, it does not containa model assumption to represent the model. (defModelFragment competition :source-participants ((?competition-phenomenon :type phenomenon) (?population-1 :type population)(?size-1 :type stock)(?density-1 :type variable)(?capacity-1 :type variable)(?population-2 :type population)(?size-2 :type stock)(?density-2 :type variable)(?capacity-2 :type variable)) :structural-conditions ((competition-phenomenon ?competition-phenomenon ?population-1 ?population-2) (density-of ?density-1 ?size-1)(capacity-of ?capacity-1 ?size-1)(density-of ?density-2 ?size-2)(capacity-of ?capacity-2 ?size-2)) :assumptions ((relevant competition ?population-1 ?population-2)) :target-participants ((?weight-12 :type variable :name weight) (?weight-21 :type variable :name weight)) :postconditions ((== ?density-1 (C-add (/ (* ?weight-12 ?size-2) ?capacity-1))) (== ?density-2 (C-add (/ (* ?weight-21 ?size-1) ?capacity-2))))) 539
KEPPENS & SHEN Stock + relevant Growth Flows growth predator Predator Exponential model predator Exponential model is exponential Logistic model predator predation Logistic model is logistic predator,prey1 "Other" model predator relevant otherâgrowth model is other predation predator,prey1 predationâphen: LotkaâVolterra model comp. Predation LotkaâVolterra predator,prey1 Prey1 model is lotkaâvolterra Stock + relevant Growth Holling model comp. Flows Holling growth prey1 model is holling Exponential model predator otherâgrowth model is other Logistic model predator predation Logistic model is logistic predator,prey2 "Other" model predator relevant Exponential model is exponential predation predator,prey2 predationâphen: LotkaâVolterra model comp. Predation LotkaâVolterra predator,prey2 Prey2 model is lotkaâvolterra Stock + relevant Growth Holling model comp. Flows Holling growth prey2 model is holling Exponential model predator otherâgrowth model is other Logistic model predator Logistic model is logistic "Other" model predator Exponential model is exponential relevant competitionâphen: competition Competition prey1,prey2 prey1,prey2 competition prey1,prey2 Figure 11: Model space for the 1 predator and 2 competing prey scenario 4.2 Model space A model space is constructed when the knowledge base is instantiated with respect to a given sce-nario. Consider for example the following scenario, which describes a predator population thatpreys on two other populations, prey1 and prey2, whilst the two prey populations compete withone another: (defScenario pred-prey-prey-scenario :entities ((predator :type population) (prey1 :type population)(prey2 :type population)) :relations ((predation predator prey1) (predation predator prey2)(competition prey1 prey2))) The full speciï¬cation of the model space is too unwieldy to present here but an abstract graphical representation of the model space for this scenario is shown in Figure 11. This model space containsthe following knowledge: ⢠From each of the three populations in the scenario, a set of three population growth models (i.e. exponential, logistic and other) is derived. This inference is dependent upona relevance assumption of the population growth phenomenon, and a model assumption thatcorresponds to one of the three population growth models. 540
COMPOSITIONAL MODEL REPOSITORIES ⢠From both predation relations (i.e. (predation predator prey1) and (predation predator prey2)), and the populations related by them, a set of two predation models(i.e. Lotka-Volterra and Holling) is derived. This inference is dependent upon a rel-evance assumption of the predation phenomenon and a model assumption that corresponds toone of the two predation models. ⢠From the competition relation (competition prey1 prey2), and the populations re- lated by it, a competition model is derived. Because there is only one competition model,the inference of the competition model is only dependent upon a relevance assumption thatcorresponds to the competition phenomenon. In addition to the hypergraph of Figure 11, the model space also contains a number of constraints on the conjunctions of assumptions that are consistent. As explained earlier, these stem from twosources: 1) non-composable relations and 2) purpose-required properties. An example will be givenof each type. Let predation-phen-1 be the predation phenomenon between predator and prey1, and prey1-size be the variable representing the size of the prey1 population. In this ex-ample, the model fragments exponential-population-growth and Lotka-Volterrawill each generate an equation for computing the value of a variable representing the change inprey1-size. Because both equations can not be composed, the following inconsistency is gen-erated: (relevant growth prey1) â§ (model prey1-size exponential)â§ (relevant growth predator) â§ (relevant predation predator prey1)â§ (model predation-phen-1 lotka-volterra) â ⥠Inconsistencies also arise from purpose-required properties. For example, if the model frag- ment predation-phenomenon is applicable and the predation relation is deemed relevant, thenthe purpose-required property (has-model ?pred-phen) will become a condition for consis-tency. Under certain combinations of assumptions, this property may not be satisï¬ed. Say, when theHolling predation and exponential growth models are both selected, the Holling model is not gener-ated because there is no ?capacity for which (capacity ?capacity ?pred) is true. Nopredation model is created in this case (because the Holling model fragment can not be instanti-ated), even though the predation phenomenon is deemed relevant under this set of assumptions. Thisis inconsistent with the has-model purpose-required property in the predation-phenomenonmodel fragment, and the responsible combination of assumptions is therefore marked as nogood. (relevant growth predator) â§ (model predator-size exponential)â§ (relevant growth prey1) â§ (model prey1-size exponential)â§ (relevant predation predator prey1) â§ (model predation-phen-1 holling) â ⥠4.3 aDPCSP and solution The resultant model space is translated into an aDCSP to enable the selection of a consistent set ofassumptions, using advanced CSP solution techniques. The aDCSP derived from the above modelspace is depicted in Figure 12. 541
KEPPENS & SHEN Attribute Meaning x1 (relevant growth prey1) x2 (relevant growth prey2) x3 (relevant growth predator) x4 (relevant predation predator prey1) x5 (relevant predation predator prey2) x6 (relevant competition prey1 prey2) x7 (model size-1 *) x8 (model size-2 *) x9 (model size-3 *) x10 (model predation-phen-1 *) x11 (model predation-phen-2 *) Table 4: Attribute list Domain Content Meaning D1 d1,y, d1,n population,none D2 d2,y, d2,n population,none D3 d3,y, d3,n population,none D4 d4,y, d4,n (population,population),none D5 d5,y, d5,n (population,population),none D6 d6,y, d6,n (population,population),none D7 d7,l, d7,e, d7,o logistic,exponential,other D8 d8,l, d8,e, d8,o logistic,exponential,other D9 d9,l, d9,e, d9,o logistic,exponential,other D10 d10,h, d10,lv Holling,Lotka-Volterra D11 d11,h, d11,lv Holling,Lotka-Volterra Table 5: The aDCSP for the 1 predator and 2 competing prey scenario: domains and their contents and meaning This aDCSP contains 11 attributes. They are listed with the corresponding assumption classes in table 4. The ï¬rst 6 attributes correspond to the notion of relevance phenomenon: 3 populationgrowth phenomena, 2 predation phenomena and 1 competition phenomenon to be precise. The other5 attributes correspond to 5 sets of model types: 3 sets of population growth models and 2 sets ofpredation models. The assumptions from which the attributes were generated form domains of values. The result- ing domains of the aforementioned attributes are summarised in table 5. The activity constraints in the aDCSP describe the conditions that instantiate the subject of the assumptions that correspond to an attribute. Since each participant or relation has a label in themodel space, a minimal set of assumptions under which it becomes part of the emerging modelis available. When a participant or relation is the subject of an assumption, this label explicitlydescribes the sets of assumptions under which the attribute that corresponds to that subject should 542
COMPOSITIONAL MODEL REPOSITORIES x x x x x x 1 4 6 2 5 3 d d d d d d d d d d d d 1 1 ,y ,n 4,y 4,n 6,y 6,n 2,y 2,n 5,y 5,n 3,y 3,n x x x x x 7 10 8 11 9 d d d d d d d d d d d d d 7 7 7 10 10 8 8 8 11 11 9 9 ,l ,e ,o ,lv ,h ,l ,e ,o ,lv ,h ,l ,e 9,o attribute value compatibility constraint activity constraint Figure 12: aDCSP derived from the models space reï¬ecting the 1 predator and 2 competing prey scenario be activated. By translating the label of a subject into sets of attribute-value assignments, the an-tecedents of the activity constraints are constructed. In this example, the relevance assumptions (attributes x1, . . . , x6) take their subjects from the scenario, and hence, they are always active. The attributes related to the model assumptions forpopulation growth are active if the corresponding assumptions denoting relevance of populationgrowth are true. That is, x1 : d1,y â active(x7)x2 : d2,y â active(x8)x3 : d3,y â active(x9) The attributes related to the assumptions about the predation models are active if the correspondingassumptions denoting relevance of predation, and the assumptions describing relevance of popula-tion growth, are true for the populations involved in the predation relation. That is, x1 : d1,y â§ x3 : d3,y â§ x4 : d4,y â active(x10)x2 : d2,y â§ x3 : d3,y â§ x5 : d5,y â active(x11) Figure 12 shows a graphical representation of these activity constraints. The compatibility constraints correspond directly to the inconsistencies in the nogood node. These inconsistencies have been discussed in the previous section and are depicted in Figure 12. Once the aDCSP is constructed, preferences may be attached to attribute-value assignments. Suppose that preferences are only assigned to the standard population modelling choices, i.e. expo- 543
KEPPENS & SHEN Attribute Preference assignments x1, . . . , x5 no preference assignments x6 P (x6 : d6,y) = pcompetition x7 P (x7 : d7,l) = plogistic, P (x7 : d7,e) = pexponential x8 P (x8 : d8,l) = plogistic, P (x8 : d8,e) = pexponential x9 P (x9 : d9,l) = plogistic, P (x9 : d9,e) = pexponential x10 P (x10 : d10,h = pholling, P (x10 : d10,lv) = plotka-volterra x11 P (x11 : d11,h = pholling, P (x11 : d11,lv) = plotka-volterra Table 6: Preference assignments for the 1 predator and 2 competing prey problem nential growth, logistic growth, lotka-volterra predation and holling predation, and to the relevanceof competition (because only one type model has been implemented for this phenomenon). Forexample, the following BPQs could be employed: pexponential < plogistic plotka-volterra < pholling pcompetition The logistic and Holling models are preferred over the exponential and Lotka-Volterra models be-cause the former are generally regarded as being more accurate. Note that the preferences havebeen ordered in such a way that those corresponding to different phenomena are not related to oneanother. The justiï¬cation for this ordering is that, even though the models are structurally connected(there are restrictions over which models can combined with one another), models of different phe-nomena inherently describe behaviours that can not be compared with one another. The preferenceassignments for attribute value assignments are summarised in table 6. Solving this aDPCSP is simple. First, the attributes x1, . . . , x6 are activated. Each of these attributes is assigned xi : di,y because that assignment maximises the potential preference. Then,the attributes x7, . . . , x11 are activated. Here, attributes x7, . . . , x9 are assigned xi : di,l because thelogistic growth model has the highest preference. Finally, x10 and x11 are assigned x10 : d10,h andx11 : d11,h because the Holling models have the highest preference and are not inconsistent with thelogistic model committed earlier. The resulting solution satisï¬es the following set of assumptions: (relevant growth prey1), (relevant growth prey2), (relevant growth predator), (relevant competition prey1 prey2), (relevant predation predator prey1), (relevant predation predator prey2), (model size-1 logistic), (model size-2 logistic), (model size-3 logistic), (model predation-phen-1 holling), (model predation-phen-2 holling) 544
COMPOSITIONAL MODEL REPOSITORIES SYMBOLS Assumptions in the Stock + relevant aDPCSP solution Growth Flows growth predator Nodes entailed by theaDPCSP solution Predator Exponential model predator Exponential Nodes NOT entailed by the model is exponential aDPCSP solution Applied model fragment Logistic model predator predation Logistic model is logistic predator,prey1 Model fragment that isnot applied "Other" model predator relevant otherâgrowth model is other predation predator,prey1 predationâphen: LotkaâVolterra model comp. Predation LotkaâVolterra predator,prey1 Prey1 model is lotkaâvolterra Stock + relevant Growth Holling model comp. Flows Holling growth prey1 model is holling Exponential model predator otherâgrowth model is other Logistic model predator predation Logistic model is logistic predator,prey2 "Other" model predator relevant Exponential model is exponential predation predator,prey2 predationâphen: LotkaâVolterra model comp. Predation LotkaâVolterra predator,prey2 Prey2 model is lotkaâvolterra Stock + relevant Growth Holling model comp. Flows Holling growth prey2 model is holling Exponential model predator otherâgrowth model is other Logistic model predator Logistic model is logistic "Other" model predator Exponential model is exponential relevant competitionâphen: competition Competition prey1,prey2 prey1,prey2 competition prey1,prey2 Figure 13: Deducing a scenario model from the model space, given a set of assumptions 4.4 Sample scenario model Figure 13 shows how a scenario model can be deduced from the above set of assumptions by ex-ploiting the model space. The nodes corresponding to the aforementioned assumptions and thosethat logically follow from the assumption set are indicated in the Figure. When combining the participants and relations in the resulting scenario model, the model given in Figure 14 can be drawn. This model corresponds to the one that an ecologist would draw if thelogistic growth and Holling predation models were regarded to be appropriate for the task at hand. 5. Conclusion and Future Work This article has presented a novel approach to compositional modelling that enables the constructionof models of ecological systems. This work differs from existing approaches in that it automaticallytranslates the compositional modelling problem into an aDCSP with (order-of-magnitude) prefer-ence valuations. There are several beneï¬ts to this method. The use of a translation algorithm that converts the compositional modelling problem into an aDCSP allows criteria to be formalised. More importantly, it also enables efï¬cient, existing andfuture, aDCSP solution techniques to be effectively applied to solving compositional modellingproblems. 545
Growth Growth B D B D 1 = b1 à N1 1 = d1 à N1 à δ1 2 = b2 à N2 2 = d2 à N2 à δ2 d d N N dt 1 = B1 â D1 â P31 dt 2 = B2 â D2 â P32 Holling Holling P s31ÃN1ÃN3 31 = d s32ÃN2ÃN3 d 1+s P 31 1 b ÃN1Ãth,31 b 2 1 2 32 = 1+s32ÃN2Ãth,32 s s 31 32 K N1 w12ÃN2 K 1 δ + 2 δ w21ÃN1 N2 1 = 2 = + K1 K1 K2 K2 NE t t h,31 h,32 HS Logistic K3 Logistic &S 546 N δ N3 E 3 = K3 PPEK Growth B D 3 = b3 à N3 3 = d3 à N3 à δ3 d N dt 3 = B3 â D3 d b 3 3 Logistic Figure 14: Sample scenario model for the 1 predator and 2 competing prey scenario
COMPOSITIONAL MODEL REPOSITORIES The extension of the aDCSPs with (order-of-magnitude) preferences (to form aDPCSPs) also permits the incorporation of softer requirements in the compositional modelling problem. In thispaper, order-of-magnitude preferences have been employed to express the appropriateness of alter-native model types for certain phenomena. While such considerations may be described by hardconstraints in the physical systems domain3, they are more subjective in less understood problemdomains, such as the ecological modelling domain. The approach presented herein provides a meansto capture and represent the subtlety of the ï¬exible model design decisions. The theoretical ideas presented in this article have been applied to real-world ecological mod- elling problems. In this paper, it has been demonstrated how the resultant compositional modellercan be employed to create a repository of population dynamics models. The approach has also beenapplied to automated model construction of large and complex ecosystems such as the MODMEDmodel of Mediterranean vegetation (Legg et al., 1995), as reported by Keppens (2002). There are some practical and theoretical issues that need to be addressed, however. On the prac- tical side, the types of ecological model design decisions, as represented by the assumptions andassumption classes, and as supported by the inference mechanisms, should be extended. Ecologicalsystems tend to involve interrelated populations of individuals, instead of functional compositions ofindividual components as with physical systems. One particularly important type of design decisionin ecological modelling is therefore granularity. This requires the introduction of novel representa-tion formalisms and inference mechanisms such as aggregation and disaggregation. Initial work forconsidering populations as single entities and for dividing such entities into sub-populations whennecessary has been carried out (Keppens & Shen, 2001a). Integration of such work into the presentaDPCSP framework requires further investigation. On the theoretical side, the analysis of the complexity of the present approach is rather informal. Much remains to be done in this regard, especially when comparing to the complexity of existingcompositional modellers. For this comparison, additional work will be required to adapt the cur-rent translation procedure to suit existing compositional modelling problems. Most compositionalmodellers are of exponential complexity, however. As they employ problem-speciï¬c solution algo-rithms, little is known about opportunities for improving their efï¬ciency. This work hopes to be aï¬rst step toward further understanding this important issue. Acknowledgments This work is partly supported by the UK-EPSRC grant GR/S63267. The ï¬rst author has also beensupported by a College of Science and Engineering scholarship at the University of Edinburgh.We are very grateful to Robert Muetzelfeldt for helpful discussions and assistance in the researchreported, whilst taking the full responsibility of the views expressed here. Thanks also go to theanonymous referees for their constructive comments which are very useful in revising the earlierversion of this paper. References Binger, B., & Hoffman, E. (1998). Microeconomics with Calculus. Longman. 3. These are the so-called operating conditions, stating the range of values of certain variables within which the use of certain assumptions is permitted. 547
KEPPENS & SHEN Bistarelli, S., Montanari, U., & Rossi, F. (1997). Semiring-based constraint satisfaction and opti- mization. Journal of the ACM, 44(2), 201â236. Bobrow, D., Falkenhainer, B., Farquhar, A., Fikes, R., Forbus, K., Gruber, T., Iwasaki, Y., & Kuipers, B. (1996). A compositional modeling language. In Proceedings of the 10th InternationalWorkshop on Qualitative Reasoning about Physical Systems, pp. 12â21. Bradley, E., Easley, M., & Stolle, R. (2001). Reasoning about nonlinear system identiï¬cation. Artiï¬cial Intelligence, 133, 139â188. Dague, P. (1993a). Numeric reasoning with relative orders of magnitude. In Proceedings of the National Conference on Artiï¬cial Intelligence, pp. 541â547. Dague, P. (1993b). Symbolic reasoning with relative orders of magnitude. In Proceedings of the 13th International Joint Conference on Artiï¬cial Intelligence, pp. 1509â1514. de Kleer, J. (1986). An assumption-based TMS. Artiï¬cial Intelligence, 28, 127â162. de Kleer, J. (1988). A general labeling algorithm for assumption-based truth maintenance. In Proceedings of the 7th National Conference on Artiï¬cial Intelligence, pp. 188â192. Easley, M., & Bradley, E. (1999). Generalized physical networks for automated model building. In Proceedings of the 16th International Joint Conference on Artiï¬cial Intelligence, pp. 1047â1053. Falkenhainer, B., & Forbus, K. (1991). Compositional modeling: ï¬nding the right model for the job. Artiï¬cial Intelligence, 51, 95â143. Ford, A. (1999). Modeling the Environment - An Introduction to System Dynamics Modeling of Environmental Systems. Island Press. Forrester, J. (1968). Principles of Systems. Wright-Allen Press, Cambridge, MA, USA. Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimal cost paths. IEEE Transactions on Systems, Science and Cybernetics, SSC-4(2), 100â107. Heller, U., & Struss, P. (1998). Diagnosis and therapy recognition for ecosystems - usage of model- based diagnosis techniques. In Proceedings of the 12th International Symposium âComputerScience for Environment Protectionâ. Heller, U., & Struss, P. (2001). Transformation of qualitative dynamic models - application in hydro- ecology. In Hotz, L., Struss, P., & Guckenbienl, T. (Eds.), Intelligent Diagnosis in IndustrialApplications, pp. 95â106. Shaker Verlag. Holling, C. (1959). Some characteristics of simple types of predation and parasitism. Canadian Entomologist, 91, 385â398. Karnopp, D., Margolis, D., & Rosenberg, R. (1990). System Dynamics: A United Approach (Second Edition edition). John Wiley & Sons, Inc. Keppens, J. (2002). Compositional Ecological Modelling via Dynamic Constraint Satisfaction with Order-of-Magnitude Preferences. Ph.D. thesis, The University of Edinburgh. Keppens, J., & Shen, Q. (2001a). Disaggregation in compositional modelling of ecological systems via dynamic constraint satisfaction. In Proceedings of the 15th International Workshop onQualitative Reasoning about Physical Systems, pp. 21â28. 548
COMPOSITIONAL MODEL REPOSITORIES Keppens, J., & Shen, Q. (2001b). On compositional modelling. Knowledge Engineering Review, 16(2), 157â200. Keppens, J., & Shen, Q. (2002). On supporting dynamic constraint satisfaction with order of mag- nitude preferences. In Proceedings of the 16th International Workshop on Qualitative Rea-soning about Physical Systems, pp. 75â82. Langley, P., Sanchez, J., Todorovski, L., & DËzeroski, S. (2002). Inducing process models from continuous data. In Proceedings of the 19th International Conference on Machine Learning,pp. 347â354. Legg, C., Muetzelfeldt, R., & Heathï¬eld, D. (1995). Modelling vegetation dynamics in mediter- ranean ecosystems: Issues of scale. In Proceedings of the 39th Symposium of the InternationalAssociation for Vegetation Science. Levy, A., Iwasaki, Y., & Fikes, R. (1997). Automated model selection for simulation based on relevance reasoning. Artiï¬cial Intelligence, 96, 351â394. Lotka, A. (1925). Elements of physical biology. Williams & Wilkins Co., Baltimore. Malthus, T. (1798). An essay on the principle of population. Printed for J. Johnson in St. Paulâs Church Yard, London, England. Miguel, I., & Shen, Q. (1999). Hard, ï¬exible and dynamic constraint satisfaction. Knowledge Engineering Review, 14(3), 199â220. Miguel, I., & Shen, Q. (2001a). Solution techniques for constraint satisfaction problems: Advanced approaches. Artiï¬cial Intelligence Review, 15(4), 269â293. Miguel, I., & Shen, Q. (2001b). Solution techniques for constraint satisfaction problems: Founda- tions. Artiï¬cial Intelligence Review, 15(4), 243â267. Minton, S., Johnston, M., Philips, A., & Laird, P. (1992). Minimizing conï¬icts: A heuristic repair method for constraint satisfaction and scheduling problems. Artiï¬cial Intelligence, 58, 161â205. Mittal, S., & Falkenhainer, B. (1990). Dynamic constraint satisfaction problems. In Proceedings of the 8th National Conference on Artiï¬cial Intelligence, pp. 25â32. Nayak, P., & Joskowicz, L. (1996). Efï¬cient compositional modeling for generating causal expla- nations. Artiï¬cial Intelligence, 83, 193â227. Nicholson, A., & Bailey, V. (1935). The balance of animal populations. Proceedings of the Zoolog- ical Society of London, 1, 551â598. Raphael, B. (1990). A* algorithm. In Shapiro, S.C. (Ed.), Encyclopedia of Artiï¬cial Intelligence, Vol. 1, pp. 1â3. John Wiley & Sons. Rickel, J., & Porter, B. (1997). Automated modeling of complex systems to answer prediction questions. Artiï¬cial Intelligence, 93, 201â260. Rogers, D. (1972). Random search and insect population models. Journal of Animal Ecology, 41, 369â383. Schiex, T., Fargier, H., & Verfaillie, G. (1995). Valued constraint satisfaction problems: Hard and easy problems. In Proceedings of the 14th International Joint Conference on Artiï¬cial Intel-ligence, pp. 631â637. 549
KEPPENS & SHEN Thompson, W. (1929). On the relative value of parasites and predators in the biological control of insect pests. Bull. Etnomol. Res., 19, 343â350. Todorovski, L., & DËzeroski, S. (1997). Declarative bias in equation discovery. In Proceedings of the 14th International Conference on Machine Learning, pp. 432â439. Todorovski, L., & DËzeroski, S. (2001). Using domain knowledge on population dynamics mod- eling for equation discovery. In Proceedings of the 12th European Conference on MachineLearning, pp. 478â490. Tsang, E. (1993). Foundations of Constraint Satisfaction. Academic Press, London and San Diego. Verfaillie, G., & Schiex, T. (1994). Solution reuse in dynamic constraint satisfaction problems. In Proceedings of the 12th National Conference on Artiï¬cial Intelligence, pp. 307â312. Verhulst, P. (1838). Recherches math´ematiques sur la loi dâaccroissement de la population. Nou- veaux m´emoires de lâacad´emie royale des sciences et belles-lettres de Bruxelles, 18, 1â38. Volterra, V. (1926). Fluctuations in the abundance of a species considered mathematically. Nature, 118, 558â560. 550