R. Sebastiani

Volume 1, 1994

**Links to Full Text:**

In this paper we describe how to modify GSAT so that it can be applied to non-clausal formulas. The idea is to use a particular ``score'' function which gives the number of clauses of the CNF conversion of a formula which are false under a given truth assignment. Its value is computed in linear time, without constructing the CNF conversion itself. The proposed methodology applies to most of the variants of GSAT proposed so far.

Journal of Artificial Intelligence Research 1 (1994) 309-314 Submitted 1/94; published 6/94 Research Note Applying GSAT to Non-Clausal Formulas Roberto Sebastiani rseba@dist.unige.it Mechanized Reasoning Group, DIST, viale Causa 13, 16145 Genova, Italy. Mechanized Reasoning Group, IRST, loc. Pant`e, 38050 Povo, Trento, Italy. Abstract In this paper we describe how to modify GSAT so that it can be applied to non-clausal formulas. The idea is to use a particular "score" function which gives the number of clauses of the CNF conversion of a formula which are false under a given truth assignment. Its value is computed in linear time, without constructing the CNF conversion itself. The proposed methodology applies to most of the variants of GSAT proposed so far. 1. Introduction GSAT (Selman, Levesque, & Mitchell, 1992; Selman & Kautz, 1993) is an incomplete model-finding algorithm for clausal propositional formulas which performs a randomized local search. GSAT has been shown to solve many "hard" problems much more efficiently than other traditional algorithms like, e.g., DP (Davis & Putnam, 1960). Since GSAT applies only to clausal formulas, using it to find models for ordinary propositional formulas requires some previous clausal-form conversion. This requires extra computation (which can be extremely heavy if the "standard" clausal conversion is used). Much worse, clausal-form conversion causes either a large increase in the size of the input formula or an enlargement of the search space. In this paper we describe how to modify GSAT so that it can be applied to non-clausal formulas directly, i.e., with no previous clausal form conversion. An extended version of the paper (Sebastiani, 1994) provides the proofs of the theorems and a detailed description of the algorithm introduced. This achievement could enlarge GSAT's application domain. Selman et al. (1992) suggest that some traditional AI problems can be formulated as model-finding tasks; e.g., visual interpretation (Reiter & Mackworth, 1989), planning (Kautz & Selman, 1992), generation of "vivid" knowledge representation (Levesque, 1986). It is often the case that non-clausal representations are more compact for such problems. For instance, each rule in the form "Vi OEi oe " gives rise to several distinct clauses if some OEi are disjuncts or is a conjunct. In automated theorem proving (a.t.p.) some applications of model-finding have been proposed (see, e.g., (Artosi & Governatori, 1994; Klingerbeck, 1994)). For instance, some decision procedures for decidable subclasses of first-order logic iteratively perform nonclausal model-finding for propositional instances of the input formulas (Jeroslow, 1988). More generally, some model-guided techniques for proof search, like goal deletion (Ballantyne & Bledsoe, 1982), false preference, or semantic resolution (Slaney, 1993), seem to be applicable to non-clausal a.t.p. as well. cfl1994 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved. Sebastiani procedure GSAT(OE) for j := 1 to Max-tries do T := initial(OE) for k := 1 to Max-flips do if T j= OE then return T else Poss-flips := hill-climb(OE; T ) V := pick(Poss-flips) T := flip(V,T) UpdateScores(OE; V ) end end return "no satisfying assignment found". Figure 1: A general schema for GSAT. 2. GSAT If OE is a clausal propositional formula and T is a truth assignment for the variables of OE, then the number of clauses of OE which are falsified by T is called the score of T for OE (score(T; OE)). T satisfies OE iff score(T; OE) = 0. The notion of score plays a key role in GSAT, as it is considered as the "distance" from a truth assignment to a satisfying one. The schema of Figure 2 describes GSAT as well as many of its possible variants. We use the notation from (Gent & Walsh, 1993). GSAT performs an iterative search for a satisfying truth assignment for OE, starting from a random assignment provided by initial(). At each step, the successive assignment is obtained by flipping (inverting) the truth value of one single variable V in T . V is chosen to minimize the score. Let Ti be the assignment obtained from T by flipping its i-th variable Vi. hill-climb() returns the set Poss-flips of the variables Vr which minimize score(Tr; OE). If the current values of Delta si = score(Ti; OE) Gamma score(T; OE) are stored for every variable Vi, then hill-climb() simply returns the set of the variables Vr with the best Delta sr. pick() chooses randomly one of such variables. flip() returns T with V 's value flipped. After each flipping, UpdateScores() updates the values of Delta si, for all i. This paper exploits the observation that the functions initial(), hill-climb(), pick() and flip() do not depend on the structure of the input formula OE, and that the computation of the scores is the only step where the input formula OE is required to be in clausal form. The idea is thus to find a suitable notion of score for non-clausal formulas, and an efficient algorithm computing it. 3. An extended notion of score Let cnf(') be the result of converting a propositional formula ' into clausal form by the standard method (i.e., by applying the rules of De Morgan). Then the following definition extends the notion of score to all propositional formulas. Definition 3.1 The score of a truth assignment T for a propositional formula ' is the number of the clauses of cnf(') which are falsified by T . 310 Applying GSAT to Non-Clausal Formulas ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . ... ... ... ... . m Delta Delta Delta AA A [1,-] E -C F[1,-][0,-] [0,-] ... ... ... ... . ... ... ... ... . m m m m m m m m m m m m m m Delta Delta Delta AA AA SS S Omega Omega Omega Theta Theta Theta CC C "" """ . '' ' Theta Theta Theta BB B Delta Delta Delta JJ J AA A Theta Theta Theta BB B bb bb ,, ,, EE E Delta Delta Delta AA A ## ## SS SS @@ @ Theta Theta Theta AA A iiii iiii i PPPP PPPP PP ... ... ... ... .... ... ... ... . ... ... ... ... .... ... ... ... . ... ... ... ... .... ... ... ... . .. ... ... ... ...... ... ... ... Delta Delta Delta AA A .................. ... .... ... ..... ... ... ... ... ... ...................................... ... ... ... ... .... .... .... .... .... .. -D -B [1,-][1,-] [1,-] [0,-] [1,-] [1,-] [1 , 0][0,1][1 , 0] [0,1] [0,1] [0,-] [1,-] [2,-] [4,-] [7,-] [14,-] [2,-] [2,0] [2,-] [2,-] [2,-] [1,-] [2,-] -C[1,-] [1,-] A[1,-] [1,-] -A -B C -E -F -D A -E C F D -A -F D -B[1,-] [0,-] [1,-] [0,1] [1,-] [1,-][0,-] [0,-] [1,-] -E B -F D Figure 2: The computation tree of s(T; '). cnf() represents the "natural" clausal form conversion. cnf(') has the same number of propositional variables as ' and it is logically equivalent to '. The problem with cnf() is the exponential size growth of cnf('), that is, jcnf (')j = O(2j'j). Definition 3.1 overcomes such a problem, for it is possible to introduce a linear-time computable function s(T; ') which gives the score of T for a formula '. This is done directly, i.e., without converting ' into clausal form. We define s(T; ') recursively as follows: 1 ' s(T; ') sGamma (T; ') ' literal ( 0 if T j= '1 otherwise ( 1 if T j= '0 otherwise :'1 sGamma (T; '1) s(T; '1)V k 'k Pk s(T; 'k) Qk sGamma (T; 'k)W k 'k Qk s(T; 'k) Pk sGamma (T; 'k) '1 oe '2 sGamma (T; '1) Delta s(T; '2) s(T; '1) + sGamma (T; '2) '1 j '2 s Gamma (T; '1) Delta s(T; '2)+ s(T; '1) Delta sGamma (T; '2) (s(T; '1) + sGamma (T; '2))Delta (sGamma (T; '1) + s(T; '2)) sGamma (T; 'k) is s(T; :'k). The distinction between s(T; 'k) and sGamma (T; 'k) is due to the polarity of the current subformula 'k. During the computation of s(T; '), a call to the function s(T; 'j) [sGamma (T; 'j)] is invoked iff 'j is a positive [negative] subformula of '. Example 3.1 Figure 2 represents the computation tree of the score of a truth assignment T for the formula ' : (((:A ^ :B ^ C) . :D . (:E ^ :F )) ^ :C ^ ((:D ^ A ^ :E) j (C ^ F ))). (D ^ :E ^ B) . (((D ^ :A) . (:F ^ D ^ :B) . :F ) ^ A ^ ((E ^ :C ^ F ) . :B)): T assigns "true" to all the variables of '. The information in square brackets associated to any subformula 'j represents [s(T; 'j); sGamma (T; 'j)]. For instance, if we consider the small subtree in the left of Figure 2, then the score is computed in the following way: 1. Notice that the definition of s(T; ') can be easily extended to formulas involving other connectives (e.g., nand, nor, xor, if-then-else, : : : ) or more complicate boolean functions. 311 Sebastiani s(T; (:A ^ :B ^ C) . :D . (:E ^ :F ) ) = ; s(T; Wk 'k) = Qk s(T; 'k) s(T; :A ^ :B ^ C) Delta s(T; :D) Delta s(T; :E ^ :F ) = ; s(T; Vk 'k) = Pk s(T; 'k) (s(T; :A) + s(T; :B) + s(T; C)) Delta s(T; :D) Delta (s(T; :E) + s(T; :F )) = ; literals (1 + 1 + 0) Delta 1 Delta (1 + 1) = 4: Notice that cnf (') is 360 clauses long. 2 Theorem 3.1 Let ' be a propositional formula and T a truth assignment for the variables of '. Then the function s(T; ') gives the score of T for '. The proof follows from the consideration that, for any truth assignment T , the set of the false clauses of cnf('1 . '2) is the cross product between the two sets of the false clauses of cnf('1) and cnf('2). Theorem 3.2 Let ' be a propositional formula and T a truth assignment for the variables of '. Then the number of operations required for calculating s(T; ') grows linearly with the size of '. The proof follows from the fact that, if T ime(sSigma ('i; T )) is the number of operations required for computing both s(T; 'i) and sGamma (T; 'i), and if T ime(sSigma ('i; T )) ^ ai Delta j'ij + bi, then T ime(sSigma ('1 Pi '2; T )) ^ maxi(ai) Delta j'1 Pi '2j + 2 Delta maxi(bi) + 6, for any Pi 2 f^; .; oe; j g. The number of operations required for computing the score of an assignment T for a clausal formula OE is O(jOEj). If OE = cnf ('), then jOEj = O(2j'j). Thus the standard computation of the score of T for OE requires O(2j'j) operations, while s(T; ') performs the same result directly in linear time. 4. GSAT for non-clausal formulas It follows from Sections 2, 3 that we can extend GSAT to non-clausal formulas ' by simply using the extended notion of score of Definition 3.1. Let NC-GSAT (non-clausal GSAT) be a new version of GSAT in which the scores are computed by some implementation of the function s(). Then it follows from Theorem 3.1 that in NC-GSAT(') the function hillclimb() always returns the same sets of variables as in GSAT(cnf(')), so that NC-GSAT(') performs the same flips and returns the same result as GSAT(cnf(')). Theorem 3.2 ensures that every score computation is performed in linear time. The current implementation of GSAT (Selman & Kautz, 1993) provides a highlyoptimized implementation of Updatescores(OE; V ), which analyzes only the clauses which the last-flipped variable V occurs in. This allows a strong reduction in computational cost. In (Sebastiani, 1994) we describe in detail an analogous optimized version of the updating procedure for NC-GSAT, called NC-Updatescores('; V ), and prove the following properties: (i) if ' is in clausal form, i.e., ' = cnf ('), then NC-UpdateScores('; V ) has the same complexity as UpdateScores('; V ); (ii) if OE = cnf ('), then NC-UpdateScores('; V ) is O(j'j). UpdateScores(OE; V ) is O(2j'j). The latter mirrors the complexity issues presented in Section 3. 312 Applying GSAT to Non-Clausal Formulas The idea introduced in this paper can be applied to most variants of GSAT. In "CSAT" (Cautious SAT) hill-climb() returns all the variables which cause a decrease of the score; in "DSAT" (Deterministic SAT) the function pick() performs a deterministic choice; in "RSAT" (Random walk SAT) the variable is picked randomly among all the variables; in "MSAT" (Memory SAT) pick() remembers the last flipped variable and avoids picking it. All these variants, proposed in (Gent & Walsh, 1992, 1993), can be transposed into NCGSAT as well, as they are independent of the structure of the input formula. Selman and Kautz (1993) suggest some variants which improve the performance and overcome some problems, such as that of escaping local minima. The strategy "Averaging in" suggests a different implementation of the function initial(): instead of a random assignment, initial() returns a bitwise average of the best assignments of the two latest cycles. This is independent of the form of the input formula. In the strategy "random walk " the sequence hill-climb() - pick() is substituted with probability p by a simpler choice function: "choose randomly a variable occurring in some unsatisfied clause". This idea can be transposed into NC-GSAT as well: "choose randomly a branch passing only for nodes whose score is different from zero, and pick the variable at the leaf". One final observation is worth making. In order to overcome the exponential growth of CNF formulas, some algorithms have been proposed (Plaisted & Greenbaum, 1986; de la Tour, 1990) which convert propositional formulas ' into polynomial-size clausal formulas . Such methods are based on the introduction of new variables, each representing a subformula of the original input '. Unfortunately, the issue of size-polynomiality is valid only if no "j" occurs in ', as the number of clauses of grows exponentially with the number of "j" in '. Even worse, the introduction of k new variables enlarges the search space by a 2k factor and reduces strongly the solution ratio. In fact, any model for is also a model for ', but for any model of ' we only know that one of its 2k extensions is a model of (Plaisted & Greenbaum, 1986). Acknowledgements Fausto Giunchiglia and Enrico Giunchiglia have given substantial and continuous feedback during the whole development of this paper. Toby Walsh provided important feedback about a previous version of this paper. Aaron Noble, Paolo Pecchiari, and Luciano Serafini helped with the final revision. Bart Selman and Henry Kautz are thanked for assistance with the GSAT code. References Artosi, A., & Governatori, G. (1994). Labelled Model Modal Logic. In Proc. of CADE12 Workshop on Automated Model Building. Ballantyne, M., & Bledsoe, W. (1982). On Generating and Using Examples in Proof Discovery. In Michie, D. (Ed.), Machines intelligence, Vol. 10, pp. 3-39. Halsted Press. Davis, M., & Putnam, H. (1960). A computing procedure for quantification theory. Journal of the ACM, 7, 201-215. 313 Sebastiani de la Tour, T. B. (1990). Minimizing the Number of Clauses by Renaming. In Proc. of the 10th Conference on Automated Deduction, pp. 558-572. Springer-Verlag. Gent, I. P., & Walsh, T. (1992). The Enigma of SAT Hill-climbing Procedures. Tech. rep. 605, University of Edinburgh, Dept. of Artificial Intelligence. Gent, I. P., & Walsh, T. (1993). Towards an Understanding of Hill-climbing Procedures for SAT. In Proc. of the 11th National Conference on Artificial Intelligence, pp. 28-33. Jeroslow, R. (1988). Computation-Oriented Reduction of Predicate to Propositional Logic. Decision Support System, 4, 183-197. Kautz, H., & Selman, B. (1992). Planning as Satisfiability. In Proc. 10th European Conference on Artificial Intelligence, pp. 359-363. Klingerbeck, S. (1994). Generating Finite Counter Examples with Semantic Tableaux and Interpretation Revision. In Proc. of CADE12 Workshop on Automated Model Building. Levesque, H. (1986). Making believers out of computers. Artificial Intelligence., 30, 81-108. Plaisted, D., & Greenbaum, S. (1986). A Structure-preserving Clause Form Translation. Journal of Symbolic Computation, 2, 293-304. Reiter, R., & Mackworth, A. (1989). A logical framework for depiction and image interpretation. Artificial Intelligence., 41 (2), 125-155. Sebastiani, R. (1994). Applying GSAT to Non-Clausal Formulas. Tech. rep. 94-0018, DIST, University of Genova, Italy. Available via anonimous ftp from mrg.dist.unige.it, /pub/mrg-ftp/. Selman, B., & Kautz, H. (1993). Domain-Independent Extension to GSAT: Solving Large Structured Satisfiability Problems. In Proc. of the 13th International Joint Conference on Artificial Intelligence, pp. 290-295. Selman, B., Levesque, H., & Mitchell, D. (1992). A New Method for Solving Hard Satisfiability Problems. In Proc. of the 10th National Conference on Artificial Intelligence, pp. 440-446. Slaney, J. (1993). SCOTT: A Model-Guided Theorem Prover. In Proc. of the 13th International Joint Conference on Artificial Intelligence, pp. 109-114. Morgan Kaufmann. 314