A. Becker, R. Bar-Yehuda and D. Geiger
Volume 12, 2000
Links to Full Text:Journal of Artificial Intelligence Research 12 (2000) 219-234 Submitted 5/99; published 5/00 Randomized Algorithms for the Loop Cutset Problem Ann Becker anyuta@cs.technion.ac.il Reuven Bar-Yehuda reuven@cs.technion.ac.il Dan Geiger dang@cs.technion.ac.il Computer Science Department Technion, Haifa, 32000, Israel Abstract We show how to find a minimum weight loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in the method of conditioning for inference. Our randomized algorithm for finding a loop cutset outputs a minimum loop cutset after O(c 6kkn) steps with probability at least 1 Gamma (1 Gamma 16k )c6 k, where c ? 1 is a constant specified by the user, k is the minimal size of a minimum weight loop cutset, and n is the number of vertices. We also show empirically that a variant of this algorithm often finds a loop cutset that is closer to the minimum weight loop cutset than the ones found by the best deterministic algorithms known. 1. Introduction The method of conditioning is a well known inference method for the computation of posterior probabilities in general Bayesian networks (Pearl, 1986, 1988; Suermondt & Cooper, 1990; Peot & Shachter, 1991) as well as for finding MAP values and solving constraint satisfaction problems (Dechter, 1999). This method has two conceptual phases. First to find an optimal or close to optimal loop cutset and then to perform a likelihood computation for each instance of the variables in the loop cutset. This method is routinely used by geneticists via several genetic linkage programs (Ott, 1991; Lang, 1997; Becker, Geiger, & Schaffer, 1998). A variant of this method was developed by Lange and Elston (1975). Finding a minimum weight loop cutset is NP-complete and thus heuristic methods have often been applied to find a reasonable loop cutset (Suermondt & Cooper, 1990). Most methods in the past had no guarantee of performance and performed very badly when presented with an appropriate example. Becker and Geiger (1994, 1996) offered an algorithm that finds a loop cutset for which the logarithm of the state space is guaranteed to be at most a constant factor off the optimal value. An adaptation of these approximation algorithms has been included in version 4.0 of FASTLINK, a popular software for analyzing large pedigrees with small number of genetic markers (Becker et al., 1998). Similar algorithms in the context of undirected graphs are described by Bafna, Berman, and Fujito (1995) and Fujito (1996). While approximation algorithms for the loop cutset problem are quite useful, it is still worthwhile to invest in finding a minimum loop cutset rather than an approximation because the cost of finding such a loop cutset is amortized over the many iterations of the conditioning method. In fact, one may invest an effort of complexity exponential in the size of the loop cutset in finding a minimum weight loop cutset because the second phase of the conditioning algorithm, which is repeated for many iterations, uses a procedure of such cfl2000 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved. Becker, Bar-Yehuda, & Geiger complexity. The same considerations apply also to constraint satisfaction problems as well as other problems in which the method of conditioning is useful (Dechter, 1990, 1999). In this paper we describe several randomized algorithms that compute a loop cutset. As done by Bar-Yehuda, Geiger, Naor, and Roth (1994), our solution is based on a reduction to the weighted feedback vertex set problem. A feedback vertex set (FVS) F is a set of vertices of an undirected graph G = (V; E) such that by removing F from G, along with all the edges incident with F , a set of trees is obtained. The Weighted Feedback Vertex Set (WFVS) problem is to find a feedback vertex set F of a vertex-weighted graph with a weight function w : V ! IR+, such that Pv2F w(v) is minimized. When w(v) j 1, this problem is called the FVS problem. The decision version associated with the FVS problem is known to be NP-Complete (Garey & Johnson, 1979, pp. 191-192). Our randomized algorithm for finding a WFVS, called RepeatedWGuessI, outputs a minimum weight FVS after O(c 6kkn) steps with probability at least 1 Gamma (1 Gamma 16k )c6 k, where c ? 1 is a constant specified by the user, k is the minimal size of a minimum weight FVS, and n is the number of vertices. For unweighted graphs we present an algorithm that finds a minimum FVS of a graph G after O(c 4kkn) steps with probability at least 1 Gamma (1 Gamma 14k )c4 k. In comparison, several deterministic algorithms for finding a minimum FVS are described in the literature. One has a complexity O((2k + 1)kn2) (Downey & Fellows, 1995b) and others have a complexity O((17k4)!n) (Bodlaender, 1990; Downey & Fellows, 1995a). A final variant of our randomized algorithms, called WRA, has the best performance because it utilizes information from previous runs. This algorithm is harder to analyze and its investigation is mostly experimental. We show empirically that the actual run time of WRA is comparable to a Modified Greedy Algorithm (MGA), described by Becker and Geiger (1996), which is the best available deterministic algorithm for finding close to optimal loop cutsets, and yet, the output of WRA is often closer to the minimum weight loop cutest than the output of MGA. The rest of the paper is organized as follows. In Section 2 we outline the method of conditioning, explain the related loop cutset problem and describe the reduction from the loop cutset problem to the WFVS Problem. In Section 3 we present three randomized algorithms for the WFVS problem and their analysis. In Section 4 we compare experimentally WRA and MGA with respect to output quality and run time. 2. Background: The Loop Cutset Problem A short overview of the method of conditioning and definitions related to Bayesian networks are given below. See the book by Pearl (1988) for more details. We then define the loop cutset problem. Let P (u1; : : : ; un) be a probability distribution where each variable ui has a finite set of possible values called the domain of ui. A directed graph D with no directed cycles is called a Bayesian network of P if there is a 1-1 mapping between fu1; : : : ; ung and vertices in D, such that ui is associated with vertex i and P can be written as follows: P (u1; : : : ; un) = nY i=1 P (ui j ui1; : : : ; uij(i)) (1) where i1; : : : ; ij(i) are the source vertices of the incoming edges to vertex i in D. 220 Randomized Algorithms for the Loop Cutset Problem Suppose now that some variables fv1; : : : ; vlg among fu1; : : : ; ung are assigned specific values fv1; : : : ; vlg respectively. The updating problem is to compute the probability P (ui j v1 = v1; : : : ; vl = vl) for i = 1; : : :; n. A trail in a Bayesian network is a subgraph whose underlying graph is a simple path. A vertex b is called a sink with respect to a trail t if there exist two consecutive edges a ! b and b c on t. A trail t is active by a set of vertices Z if (1) every sink with respect to t either is in Z or has a descendant in Z and (2) every other vertex along t is outside Z. Otherwise, the trail is said to be blocked (d-separated) by Z. Verma and Pearl proved that if D is a Bayesian network of P (u1; : : :; un) and all trails between a vertex in fr1; : : : ; rlg and a vertex in fs1; : : : ; skg are blocked by ft1; : : :; tmg, then the corresponding sets of variables fur1; : : : ; urlg and fus1; : : : ; uskg are independent conditioned on fut1; : : : ; utmg (Verma & Pearl, 1988). Furthermore, Geiger and Pearl proved that this result cannot be enhanced (Geiger & Pearl, 1990). Both results were presented and extended by Geiger, Verma, and Pearl (1990). Using the close relationship between blocked trails and conditional independence, Kim and Pearl developed an algorithm update-tree that solves the updating problem on Bayesian networks in which every two vertices are connected with at most one trail (Kim & Pearl, 1983). Pearl then solved the updating problem on any Bayesian network as follows (Pearl, 1986). First, a set of vertices S is selected such that any two vertices in the network are connected by at most one active trail in S [ Z, where Z is any subset of vertices. Then, update-tree is applied once for each combination of value assignments to the variables corresponding to S, and, finally, the results are combined. This algorithm is called the method of conditioning and its complexity grows exponentially with the size of S. The set S is called a loop cutset. Note that when the domain size of the variables varies, then update-tree is called a number of times equal to the product of the domain sizes of the variables whose corresponding vertices participate in the loop cutset. If we take the logarithm of the domain size (number of values) as the weight of a vertex, then finding a loop cutset such that the sum of its vertices weights is minimum optimizes Pearl's updating algorithm in the case where the domain sizes may vary. We now give an alternative definition for a loop cutset S and then provide a probabilistic algorithm for finding it. This definition is borrowed from a paper by Bar-Yehuda et al. (1994). The underlying graph G of a directed graph D is the undirected graph formed by ignoring the directions of the edges in D. A cycle in G is a path whose two terminal vertices coincide. A loop in D is a subgraph of D whose underlying graph is a cycle. A vertex v is a sink with respect to a loop Gamma if the two edges adjacent to v in Gamma are directed into v. Every loop must contain at least one vertex that is not a sink with respect to that loop. Each vertex that is not a sink with respect to a loop Gamma is called an allowed vertex with respect to Gamma . A loop cutset of a directed graph D is a set of vertices that contains at least one allowed vertex with respect to each loop in D. The weight of a set of vertices X is denoted by w(X) and is equal to Pv2X w(v) where w(x) = log(jxj) and jxj is the size of the domain associated with vertex x. A minimum weight loop cutset of a weighted directed graph D is a loop cutset F Lambda of D for which w(F Lambda ) is minimum over all loop cutsets of G. The Loop Cutset Problem is defined as finding a minimum weight loop cutset of a given weighted directed graph D. 221 Becker, Bar-Yehuda, & Geiger The approach we take is to reduce the loop cutset problem to the weighted feedback vertex set problem, as done by Bar-Yehuda et al. (1994). We now define the weighted feedback vertex set problem and then the reduction. Let G = (V; E) be an undirected graph, and let w : V ! IR+ be a weight function on the vertices of G. A feedback vertex set of G is a subset of vertices F ` V such that each cycle in G passes through at least one vertex in F . In other words, a feedback vertex set F is a set of vertices of G such that by removing F from G, along with all the edges incident with F , we obtain a set of trees (i.e., a forest). The weight of a set of vertices X is denoted (as before) by w(X) and is equal to Pv2X w(v). A minimum feedback vertex set of a weighted graph G with a weight function w is a feedback vertex set F Lambda of G for which w(F Lambda ) is minimum over all feedback vertex sets of G. The Weighted Feedback Vertex Set (WFVS) Problem is defined as finding a minimum feedback vertex set of a given weighted graph G having a weight function w. The reduction is as follows. Given a weighted directed graph (D; w) (e.g., a Bayesian network), we define the splitting weighted undirected graph Ds with a weight function ws as follows. Split each vertex v in D into two vertices vin and vout in Ds such that all incoming edges to v in D become undirected incident edges with vin in Ds, and all outgoing edges from v in D become undirected incident edges with vout in Ds. In addition, connect vin and vout in Ds by an undirected edge. Now set ws(vin) = 1 and ws(vout) = w(v). For a set of vertices X in Ds, we define (X) as the set obtained by replacing each vertex vin or vout in X by the respective vertex v in D from which these vertices originated. Note that if X is a cycle in Ds, then (X) is a loop in D, and if Y is a loop in D, then Gamma 1(Y ) = Sv2Y Gamma 1(v) is a cycle in Ds where Gamma 1(v) = 8?!?: vin v is a sink on Y vout v is a source on Y fvin; voutg otherwise (A vertex v is a source with respect to a loop Y if the two edges adjacent to v in Y originate from v). This mapping between loops in D and cycles in Ds is one-to-one and onto. Our algorithm can now be easily stated. ALGORITHM LoopCutset Input: A Bayesian network D Output: A loop cutset of D 1. Construct the splitting graph Ds with weight function ws 2. Find a feedback vertex set F for (Ds; ws) using the Weighted Randomized Algorithm (WRA) 3. Output (F ). It is immediately seen that if WRA (developed in later sections) outputs a feedback vertex set F of Ds whose weight is minimum with high probability, then (F ) is a loop cutset of D with minimum weight with the same probability. This observation holds due to the one-to-one and onto correspondence between loops in D and cycles in Ds and because WRA never chooses a vertex that has an infinite weight. 222 Randomized Algorithms for the Loop Cutset Problem 3. Algorithms for the WFVS Problem Recall that a feedback vertex set of G is a subset of vertices F ` V such that each cycle in G passes through at least one vertex in F . In Section 3.1 we address the problem of finding a FVS with a minimum number of vertices and in Sections 3.2 and 3.3 we address the problem of finding a FVS with a minimum weight. Throughout, we allow G to have parallel edges. If two vertices u and v have parallel edges between them, then every FVS of G includes either u, v, or both. 3.1 The Basic Algorithms In this section we present a randomized algorithm for the FVS problem. First we introduce some additional terminology and notation. Let G = (V; E) be an undirected graph. The degree of a vertex v in G, denoted by d(v), is the number of vertices adjacent to v. A self-loop is an edge with two endpoints at the same vertex. A leaf is a vertex with degree less or equal 1, a linkpoint is a vertex with degree 2 and a branchpoint is a vertex with degree strictly higher than 2. The cardinality of a set X is denoted by jXj. A graph is called rich if every vertex is a branchpoint and it has no self-loops. Given a graph G, by repeatedly removing all leaves, and bypassing with an edge every linkpoint, a graph G0 is obtained such that the size of a minimum FVS in G0 and in G are equal and every minimum FVS of G0 is a minimum FVS of G. Since every vertex involved in a self-loop belongs to every FVS, we can transform G0 to a rich graph Gr by adding the vertices involved in self loops to the output of the algorithm. Our algorithm is based on the observation that if we pick an edge at random from a rich graph there is a probability of at least 1=2 that at least one endpoint of the edge belongs to any given FVS F . A precise formulation of this claim is given by Lemma 1 whose proof is given implicitly by Voss (1968, Lemma 4). Lemma 1 Let G = (V; E) be a rich graph, F be a feedback vertex set of G and X = V n F . Let EX denote the set of edges in E whose endpoints are all vertices in X and EF;X denote the set of edges in G that connect vertices in F with vertices in X. Then, jEXj ^ jEF;Xj. Proof. The graph obtained by deleting a feedback vertex set F of a graph G(V; E) is a forest with vertices X = V n F . Hence, jEXj ! jXj. However, each vertex in X is a branchpoint in G, and so, 3 jXj ^ X v2X d(v) = jEF;Xj + 2 jEXj: Thus, jEXj ^ jEF;Xj. 2 Lemma 1 implies that when picking an edge at random from a rich graph, it is at least as likely to pick an edge in EF;X than an edge in EX . Consequently, selecting a vertex at random from a randomly selected edge has a probability of at least 1=4 to belong to a minimum FVS. This idea yields a simple algorithm to find a FVS. 223 Becker, Bar-Yehuda, & Geiger ALGORITHM SingleGuess(G,j) Input: An undirected graph G0 and an integer j ? 0. Output: A feedback vertex set F of size ^ j, or "Fail" otherwise. For i = 1; : : :; j 1. Reduce GiGamma 1 to a rich graph Gi while placing self loop vertices in F . 2. If Gi is the empty graph Return F 3. Pick an edge e = (u; v) at random from Ei 4. Pick a vertex vi at random from (u; v) 5. F F [ fvig 6. V V n fvig Return "Fail" Due to Lemma 1, when SingleGuess(G; j) terminates with a FVS of size j, there is a probability of at least 1=4j that the output is a minimum FVS. Note that steps 3 and 4 in SingleGuess determine a vertex v by first selecting an arbitrary edge and then selecting an arbitrary endpoint of this edge. An equivalent way of achieving the same selection rule is to choose a vertex with probability proportional to its degree: p(v) = d(v)P u2V d(u) = d(v) 2 Delta jEj To see the equivalence of these two selection methods, define Gamma (v) to be a set of edges whose one endpoint is v, and note that for graphs without self-loops, p(v) = X e2Gamma (v) p(vje) Delta p(e) = 12 X e2Gamma (v) p(e) = d(v)2 Delta jEj This equivalent phrasing of the selection criterion is easier to extend to the weighted case and will be used in the following sections. An algorithm for finding a minimum FVS with high probability, which we call RepeatedGuess, can now be described as follows: Start with j = 1. Repeat SingleGuess c 4j times where c ? 1 is a parameter defined by the user. If in one of the iterations a FVS of size ^ j is found, then output this FVS, otherwise, increase j by one and continue. ALGORITHM RepeatedGuess(G,c) Input: An undirected graph G and a constant c ? 1. Output: A feedback vertex set F . For j = 1; : : : ; jV j Repeat c 4j times 1. F SingleGuess(G; j) 2. If F is not "Fail" then Return F End fRepeatg End fForg 224 Randomized Algorithms for the Loop Cutset Problem The main claims about these algorithms are given by the following theorem. Theorem 2 Let G be an undirected graph and c * 1 be a constant. Then, SingleGuess(G; k) outputs a FVS whose expected size is no more than 4k, and RepeatedGuess(G; c) outputs, after O(c 4kkn) steps, a minimum FVS with probability at least 1 Gamma (1 Gamma 14k )c4 k, where k is the size of a minimum FVS and n is the number of vertices. The claims about the probability of success and number of steps follow immediately from the fact that the probability of success of SingleGuess(G; j) is at least (1=4)j and that, in case of success, O(c 4j) iterations are performed each taking O(jn) steps. The result follows from the fact that Pkj=1 j4j is of order O(k4k). The proof about the expected size of a single guess is presented in the next section. Theorem 2 shows that each guess produces a FVS which, on the average, is not too far from the minimum, and that after enough iterations, the algorithm converges to the minimum with high probability. In the weighted case, discussed next, we managed to achieve each of these two guarantees in separate algorithms, but we were unable to achieve both guarantees in a single algorithm. 3.2 The Weighted Algorithms We now turn to the weighted FVS problem (WFVS) of size k which is to find a feedback vertex set F of a vertex-weighted graph (G; w), w : V ! IR+, of size less or equal k such that w(F ) is minimized. Note that for the weighted FVS problem we cannot replace each linkpoint v with an edge because if v has weight lighter than its branchpoint neighbors then v can participate in a minimum weight FVS of size k. A graph is called branchy if it has no endpoints, no self loops, and, in addition, each linkpoint is connected only to branchpoints (Bar-Yehuda, Geiger, Naor, & Roth, 1994). Given a graph G, by repeatedly removing all leaves, and bypassing with an edge every linkpoint that has a neighbor with equal or lighter weight, a graph G0 is obtained such that the weight of a minimum weight FVS (of size k) in G0 and in G are equal and every minimum WFVS of G0 is a minimum WFVS of G. Since every vertex with a self-loop belongs to every FVS, we can transform G0 to a branchy graph without self-loops by adding the vertices involved in self loops to the output of the algorithm. To address the WFVS problem we offer two slight modifications to the algorithm SingleGuess presented in the previous section. The first algorithm, which we call SingleWGuessI, is identical to SingleGuess except that in each iteration we make a reduction to a branchy graph instead of a reduction to a rich graph. It chooses a vertex with probability proportional to the degree using p(v) = d(v)= Pu2V d(u). Note that this probability does not take the weight of a vertex into account. A second algorithm, which we call SingleWGuessII, chooses a vertex with probability proportional to the ratio of its degree over its weight, p(v) = d(v)w(v)= X u2V d(u) w(u): (2) 225 Becker, Bar-Yehuda, & Geiger ALGORITHM SingleWGuessI(G,j) Input: An undirected weighted graph G0 and an integer j ? 0. Output: A feedback vertex set F of size ^ j, or "Fail" otherwise. For i = 1; : : :; j 1. Reduce GiGamma 1 to a branchy graph Gi(Vi; Ei) while placing self loop vertices in F . 2. If Gi is the empty graph Return F 3. Pick a vertex vi 2 Vi at random with probability pi(v) = di(v)= Pu2Vi di(u) 4. F F [ fvig 5. V V n fvig Return "Fail" The second algorithm uses Eq. 2 for computing p(v) in Line 3. These two algorithms have remarkably different guarantees of performance. Version I guarantees that choosing a vertex that belongs to any given FVS is larger than 1=6, however, the expected weight of a FVS produced by version I cannot be bounded by a constant times the weight of a minimum WFVS. Version II guarantees that the expected weight of its output is bounded by 6 times the weight of a minimum WFVS, however, the probability of converging to a minimum after any fixed number of iterations can be arbitrarily small. We first demonstrate via an example the negative claims. The positive claims are phrased more precisely in Theorem 3 and proven thereafter. Consider the graph shown in Figure 1 with three vertices a,b and c, and corresponding weights w(a) = 6, w(b) = 3ffl and w(c) = 3m, with three parallel edges between a and b, and three parallel edges between a and c. The minimum WFVS F Lambda with size 1 consists of vertex a. According to Version II, the probability of choosing vertex a is (Eq. 2): p(a) = ffl(1 + 1=m) Delta ffl + 1 So if ffl is arbitrarily small and m is sufficiently large, then the probability of choosing vertex a is arbitrarily small. Thus, the probability of choosing a vertex from some F Lambda by the criterion d(v)=w(v), as done by Version II, can be arbitrarily small. If, on the other hand, Version I is used, then the probability of choosing a; b, or c is 1=2; 1=4; 1=4, respectively. Thus, the expected weight of the first vertex to be chosen is 3=4 Delta (ffl + m + 4), while the weight of a minimum WFVS is 6. Consequently, if m is sufficiently large, the expected weight of a WFVS found by Version I can be arbitrarily larger than a minimum WFVS. The algorithm for repeated guesses, which we call RepeatedWGuessI(G; c; j) is as follows: repeat SingleWGuessI(G; j) c 6j times, where j is the minimal number of vertices of a minimum weight FVS we seek. If no FVS is found of size ^ j, the algorithm outputs that the size of a minimum WFVS is larger than j with high probability, otherwise, it outputs the lightest FVS of size less or equal j among those explored. The following theorem summarizes the main claims. 226 Randomized Algorithms for the Loop Cutset Problem c a b w(c) = 3ffl w(a) = 6 w(b) = 3m Figure 1: The minimum WFVS F Lambda = fag. Theorem 3 Let G be a weighted undirected graph and c * 1 be a constant. a) The algorithm RepeatedWGuessI(G; c; k) outputs after O(c 6kkn) steps a minimum WFVS with probability at least 1 Gamma (1 Gamma 16k )c6 k, where k is the minimal size of a minimum weight FVS of G and n is the number of vertices. b) The algorithm SingleWGuessII(G,k) outputs a feedback vertex set whose expected weight is no more than six times the weight of a minimum weight FVS. The proof of each part requires a preliminary lemma. Lemma 4 Let G = (V; E) be a branchy graph, F be a feedback vertex set of G and X = V n F . Let EX denote the set of edges in E whose endpoints are all vertices in X and EF;X denote the set of edges in G that connect vertices in F with vertices in X. Then, jEXj ^ 2 Delta jEF;Xj. Proof. Let Xb be the set of branchpoints in X. We replace every linkpoint in X by an edge between its neighbors, and denote the resulting set of edges between vertices in Xb by EbXb and between vertices in Xb and F by EbF;Xb. The proof of Lemma 1 shows that jEbXbj ^ jEbF;Xbj: Since every linkpoint in X has both neighbors in the set Xb [ F , the following holds: jEXj ^ 2 Delta jEbXbj and jEF;Xj = jEbF;Xbj: Hence, jEXj ^ 2 Delta jEF;Xj. 2 An immediate consequence of Lemma 4 is that the probability of randomly choosing an edge that has at least one endpoint that belongs to a FVS is greater or equal 1=3. Thus, selecting a vertex at random from a randomly selected edge has a probability of at least 1=6 to belong to a FVS. Consequently, if the algorithm terminates after c 6k iterations, with a WFVS of size k, there is a probability of at least 1 Gamma (1 Gamma 16k )c6 k that the output is a minimum WFVS of size at most k. This proves part (a) of Theorem 3. Note that since k is not known in advance, we use RepeatedWGuessI(G; c; j) with increasing values of j until a FVS is found, say when j=J. When such a set is found it is still possible that there exists a WFVS with more than J vertices that has a smaller weight than the one found. This happens when k ? J. However, among the WFVSs of size at most J, the algorithm finds one with minimum weight with high probability. The second part requires the following lemma. 227 Becker, Bar-Yehuda, & Geiger Lemma 5 Let G be a branchy graph and F be a FVS of G. Then,X v2V d(v) ^ 6 X v2F d(v): Proof. Denote by dY (v) the number of edges between a vertex v and a set of vertices Y . Then, X v2V d(v) = X v2X d(v) + X v2F d(v) =X v2X dX(v) + X v2X dF (v) + X v2F d(v): Due to Lemma 4, X v2X dX(v) = 2jEXj ^ 4jEF;Xj = 4 X v2X dF (v): (3) Consequently, X v2V d(v) ^ 4 X v2X dF (v)+X v2X dF (v) + X v2F d(v) ^ 6 X v2F d(v) as claimed. 2 We can now prove part (b) of Theorem 3 analyzing SingleWGuessII(G,k). Recall that Vi is the set of vertices in graph Gi in iteration i, di(v) is the degree of vertex v in Gi, and vi is the vertex chosen in iteration i. Furthermore, recall that pi(v) is the probability to choose vertex v in iteration i. The expected weight Ei(w(v)) = Pv2Vi w(v) Delta pi(v) of a chosen vertex in iteration i is denoted with ai. Thus, due to the linearity of the expectation operator, E(w(F )) = Pki=1 ai, assuming jF j = k. We define a normalization constant for iteration i as follows: fli = " X u2Vi di(u) w(u) # Gamma 1 Then, pi(v) = fli Delta di(v)w(v) and ai = X v2Vi w(v) Delta di(v)w(v) Delta fli = fli Delta X v2Vi di(v) Let F Lambda be a minimum FVS of G and F Lambda i be minimum weight FVS of the graph Gi. The expected weight Ei(w(v)jv 2 F Lambda i )) of a vertex chosen from F Lambda i in iteration i is denoted with bi. We have, bi = X v2FLambda i w(v) Delta pi(v) = fli Delta X v2FLambda i di(v) By Lemma 5, ai=bi ^ 6 for every i. 228 Randomized Algorithms for the Loop Cutset Problem Recall that by definition F Lambda 2 is the minimum FVS in the branchy graph G2 obtained from G1 n fv1g. We get, E(w(F Lambda )) * E1(w(v)jv 2 F Lambda 1 )) + E(w(F Lambda 2 )) because the right hand side is the expected weight of the output F assuming the algorithm finds a minimum FVS on G2 and just needs to select one additional vertex, while the left hand side is the unrestricted expectation. By repeating this argument we get, E(w(F Lambda )) * b1 + E(w(F Lambda 2 )) * Delta Delta Delta * kX i=1 bi Using Pi ai= Pi bi ^ maxi ai=bi ^ 6, we obtain E(w(F )) ^ 6 Delta E(w(F Lambda )): Hence, E(w(F )) ^ 6 Delta w(F Lambda ) as claimed. 2 The proof that SingleGuess(G; k) outputs a FVS whose expected size is no more than 4k (Theorem 2) where k is the size of a minimum FVS is analogous to the proof of Theorem 3 in the following sense. We assign a weight 1 to all vertices and replace the reference to Lemma 5 by a reference to the following claim: If F is a FVS of a rich graph G, then Pv2V d(v) ^ 4 Pv2F d(v). The proof of this claim is identical to the proof of Lemma 5 except that instead of using Lemma 4 we use Lemma 1. 3.3 The Practical Algorithm In previous sections we presented several algorithms for finding minimum FVS with high probability. The description of these algorithms was geared towards analysis, rather than as a prescription to a programmer. In particular, the number of iterations used within RepeatedWGuessI(G; c; k) is not changed as the algorithm is run with j ! k. This feature allowed us to regard each call to SingleWGuessI(G; j) made by RepeatedWGuessI as an independent process. Furthermore, there is a small probability for a very long run even when the size of the minimum FVS is small. We now slightly modify RepeatedWGuessI to obtain an algorithm, termed WRA, which does not suffer from these deficiencies. The new algorithm works as follows. Repeat SingleWGuessI(G; jV j) for min(Max; c 6w(F)) iterations, where w(F ) is the weight of the lightest WFVS found so far and Max is some specified constant determining the maximum number of iterations of SingleWGuessI. ALGORITHM WRA(G; c; Max) Input: An undirected weighted graph G(V; E) and constants Max and c ? 1 Output: A feedback vertex set F F SingleWGuessI (G; jV j) M min(Max; c 6w(F)); i 1; While i ^ M do 1. F 0 SingleWGuessI(G; jV j) 2. If w(F 0) ^ w(F ) then 229 Becker, Bar-Yehuda, & Geiger jV j jEj values size MGA WRA Eq. 15 25 2-6 3-6 12 81 7 15 25 2-8 3-6 7 89 4 15 25 2-10 3-6 6 90 4 25 55 2-6 7-12 3 95 2 25 55 2-8 7-12 3 97 0 25 55 2-10 7-12 0 100 0 55 125 2-10 17-22 0 100 0 31 652 17 Figure 2: Number of graphs in which MGA or WRA yield a smaller loop cutset. The last column records the number of graphs for which the two algorithms produced loop cutsets of the same weight. Each line in the table is based on 100 graphs. F F 0; M min(Max; c 6w(F)) 3. i i + 1; End fWhileg Return F Theorem 6 If Max * c6k, where k is the minimal size of a minimum WFVS of an undirected weighted graph G, then WRA(G; c; Max) outputs a minimum WFVS of G with probability at least 1 Gamma (1 Gamma 16k )c6 k. The proof is an immediate corollary of Theorem 3. The choice of Max and c depend on the application. A decision-theoretic approach for selecting such values for any-time algorithms is discussed by Breese and Horvitz (1990). 4. Experimental Results The experiments compared the outputs of WRA vis-`a-vis a greedy algorithm GA and a modified greedy algorithm MGA (Becker & Geiger, 1996) based on randomly generated graphs and on some real graphs contributed by the Hugin group (www.hugin.com). The random graphs are divided into three sets. Graphs with 15 vertices and 25 edges where the number of values associated with each vertex is randomly chosen between 2 and 6, 2 and 8, and between 2 and 10. Graphs with 25 vertices and 55 edges where the number of values associated with each vertex is randomly chosen between 2 and 6, 2 and 8, and between 2 and 10. Graphs with 55 vertices and 125 edges where the number of values associated with each vertex is randomly chosen between 2 and 10. Each instance of the three classes is based on 100 random graphs generated as described by Suermondt and Cooper (1990). The total number of random graphs we used is 700. The results are summarized in the table of Figure 2. WRA is run with Max = 300 and c = 1. The two algorithms, MGA and WRA, output loop cutsets of the same size in only 230 Randomized Algorithms for the Loop Cutset Problem Name jV j jEj jF Lambda j GA MGA WRA Water 32 123 16 40.7 42.7 29.5 Mildew 35 80 14 48.1 40.5 39.3 Barley 48 126 20 72.1 76.3 57.3 Munin1 189 366 59 159.4 167.5 122.6 Figure 3: Log size (base 2) of the loop cutsets found by GA, MGA, and WRA. 17 graphs and when the algorithms disagree, then in 95% of these graphs WRA performed better than MGA. The actual run time of W RA(G; 1; 300) is about 300 times slower than GA (or MGA) on G. On the largest random graph we used, it took 4.5 minutes. Most of the time is spend in the last improvement of WRA. Considerable run time can be saved by letting Max = 5. For all 700 graphs, WRA(G,1,5) has already obtained a better loop cutset than MGA. The largest improvement, with Max = 300, was from a weight of 58.0 (log2 scale) to a weight of 35.9. The improvements in this case were obtained in iterations 1, 2, 36, 83, 189 with respective weights of 46.7, 38.8, 37.5, 37.3, 35.9 and respective sizes of 22, 18, 17, 18, and 17 nodes. On the average, after 300 iterations, the improvement for the larger 100 graphs was from a weight of 52 to 39 and from size 22 to 20. The improvement for the smaller 600 graphs was from a weight of 15 to 12.2 and from size 9 to 6.7. The second experiment compared between GA, MGA and WRA on four real Bayesian networks showing that WRA outperformed both GA and MGA after a single call to SingleWGuessI. The weight of the output continued to decrease logarithmically with the number of iterations. We report the results with Max = 1000 and c = 1. Run time was between 3 minutes for Water and 15 minutes for Munin1 on a Pentium 133 with 32M RAM. 5. Discussion Our randomized algorithm, WRA, has been incorporated into the popular genetic software FASTLINK 4.1 by Alejandro Sch"affer who develops and maintains this software at the National Institute of Health. WRA replaced previous approximation algorithms for finding FVS because with a small Max value it already matched or improved FASTLINK 4.0 on most datasets examined. The datasets used for comparison are described by Becker et al. (1998). The main characteristics of these datasets is that they were all collected by geneticists, they have a small number of loops, and a large number of values at each node (tens to hundreds depending on the genetic analysis). For such networks the method of conditioning is widely used by geneticists. The leading inference algorithm, however, for Bayesian networks is the clique-tree algorithm (Lauritzen & Spiegelhalter, 1988) which has been further developed in several papers (Jensen, Lauritzen, & Olsen, 1990a; Jensen, Olsen, & Andersen, 1990b). For the networks presented in Table 3 conditioning is not a feasible method while the clique tree algorithm can and is being used to compute posterior probabilities in these networks. Furthermore, it has been shown that the weight of the largest clique is bounded by the weight of the loop cutset union the largest parent set of a vertex in a Bayesian network implying that the 231 Becker, Bar-Yehuda, & Geiger clique tree algorithm is always superior in time performance over the conditioning algorithm (Shachter, Andersen, & Szolovits, 1994). The two methods, however, can be combined to strike a balance between time and space requirements as done within the bucket elimination framework (Dechter, 1999). The algorithmic ideas behind the randomized algorithms presented herein can also be applied for constructing good clique trees and initial experiments confirm that an improvement over deterministic algorithms is often obtained. The idea is that instead of greedily selecting the smallest clique when constructing a clique tree, one would randomly select the next clique according to the relative weights of the candidate cliques. It remains to develop the theory behind random choices of clique trees before a solid assessment can be presented. Currently, there is no algorithm for finding a clique tree such that its size is guaranteed to be close to optimal with high probability. Horvitz et al. (1989) show that the method of conditioning can be useful for approximate inference. In particular, they show how to rank the instances of a loop cutset according to their prior probabilities assuming all variables in the cutset are marginally independent. The conditioning algorithm can then be run according to this ranking and the answer to a query be given as an interval that shrinks towards the exact solution as more instances of the loop cutset are considered (Horvitz, Suermondt, & Cooper, 1989; Horvitz, 1990). Applying this idea without making independence assumptions is described by Darwiche (1994). So if the maximal clique is too large to store one can still perform approximate inferences using the conditioning algorithm. Acknowledgment We thank Seffi Naor for fruitful discussions. Part of this work was done while the third author was on sabbatical at Microsoft Research. A variant of this work has been presented at the fifteenth conference on uncertainty in artificial intelligence, July 1999, Sweden. References Bafna, V., Berman, P., & Fujito, T. (1995). Constant ratio approximations of the weighted feedback vertex set problem for undirected graphs. In Proceedings of the Sixth Annual Symposium on Algorithms and Computation (ISAAC95), pp. 142-151. Bar-Yehuda, R., Geiger, D., Naor, J., & Roth, R. (1994). Approximation algorithms for the feedback vertex set problems with applications to constraint satisfaction and Bayesian inference. In Proceedings of the 5th Annual ACM-Siam Symposium On Discrete Algorithms, pp. 344-354. Becker, A., & Geiger, D. (1994). Approximation algorithms for the loop cutset problem. In Proceedings of the 10th conference on Uncertainty in Artificial Intelligence, pp. 60-68. Becker, A., & Geiger, D. (1996). Optimization of pearl's method of conditioning and greedylike approximation algorithms for the feedback vertex set problem. Artificial Intelligence, 83, 167-188. 232 Randomized Algorithms for the Loop Cutset Problem Becker, A., Geiger, D., & Schaffer, A. (1998). Automatic selection of loop breakers for genetic linkage analysis. Human Heredity, 48, 47-60. Bodlaender, H. (1990). On disjoint cycles. International Journal of Foundations of Computer Science (IJFCS), 5, 59-68. Breese, J., & Horvitz, E. (1990). Ideal reformulation of belief netwroks. In Proceedings of the 6th conference on Uncertainty in Artificial Intelligence, pp. 64-72. Darwiche, A. (1994). ffl-bounded conditioning: A method for the approximate updating of causal networks. Research note, Rockwell Science Center. Dechter, R. (1990). Enhancement schemes for constraint processing: backjumping, learning, and cutset decomposition. Artificial Intelligence, 41, 273-312. Dechter, R. (1999). Bucket elimination: A unifying framework for structure-driven inference. Artificial Intelligence, To appear. Downey, R., & Fellows, M. (1995a). Fixed-parameter tractability and completeness I: Basic results. SIAM Journal on Computing, 24 (4), 873-921. Downey, R., & Fellows, M. (1995b). Parameterized computational feasibility. In P. Clote, . J. R. (Ed.), Feasible Mathematics II, pp. 219-244. Birkhauser, Boston. Fujito, T. (1996). A note on approximation of the vertex cover and feedback vertex set problems - unified approach. Information Processing Letters, 59, 59-63. Garey, M., & Johnson, D. (1979). Computers and Intractability: A Guide to the Theory of NP-completeness. W. H. Freeman, San Francisco, California. Geiger, D., & Pearl, J. (1990). On the logic of causal models. In Uncertainty in Artificial Intelligence 4, pp. 3-14 New York. North-Holland. Geiger, D., Verma, T., & Pearl, J. (1990). Identifying independence in bayesian networks. Networks, 20, 507-534. Horvitz, E. J. (1990). Computation and action under bounded resources. Ph.D dissertation, Stanford university. Horvitz, E. J., Suermondt, H. J., & Cooper, G. H. (1989). Bounded conditioning: Flexible inference for decisions under scarce resources. In Proceedings of 5th conference on Uncertainty in Artificial Intelligence, pp. 182-193. Morgan Kaufmann. Jensen, F., Lauritzen, S. L., & Olsen, K. (1990a). Bayesian updating in causal probabilisitic networks by local computations. Computational Statistics Quarterly, 4, 269-282. Jensen, F., Olsen, K., & Andersen, S. (1990b). An algebra of bayesian belief universes for knowledge-based systems. Networks, 20, 637-659. Kim, H., & Pearl, J. (1983). A computational model for combined causal and diagnostic reasoning in inference systems. In Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI83), pp. 190-193. 233 Becker, Bar-Yehuda, & Geiger Lang, K. (1997). Mathematical and statistical methods for genetic analysis. Springer. Lange, K., & Elston, R. (1975). Extensions to pedigree analysis. I. likelihood calculation for simple and complex pedigrees. Human Heredity, 25, 95-105. Lauritzen, S., & Spiegelhalter, D. (1988). Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal Royal Statistical Society, B, 50, 157-224. Ott, J. (1991). Analysis of human genetic linkage (revised edition). The Johns Hopkins University Press. Pearl, J. (1986). Fusion, propagation and structuring in belief networks. Artificial Intelligence, 29, 241-288. Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann, San Mateo, California. Peot, M., & Shachter, R. (1991). Fusion and propagation with multiple observations in belief networks. Artificial Intelligence, 48, 299-318. Shachter, R., Andersen, S., & Szolovits, P. (1994). Global conditioning for probabilistic inference in belief networks. In Proceedings of the tenth conference on Uncertainty in Artificial Intelligence, pp. 514-522. Morgan Kaufmann. Suermondt, H., & Cooper, G. (1990). Probabilistic inference in multiply connected belief networks using loop cutsets. International Journal of Approximate Reasoning, 4, 283- 306. Verma, T., & Pearl, J. (1988). Causal networks: Semantics and expressiveness. In Proceedings of 4th Workshop on Uncertainty in Artificial Intelligence, pp. 352-359. Voss, H. (1968). Some properties of graphs containing k independent circuits. In Proceedings of Colloquium Tihany, pp. 321-334 New York. Academic Press. 234