
Journal of Artificial Intelligence Research 17 (2002) 265-287. Submitted 10/01; published 9/02.

**When do Numbers Really Matter?**

Hei Chan (hei@cs.ucla.edu) and Adnan Darwiche (darwiche@cs.ucla.edu)
Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095, USA

**Abstract**

Common wisdom has it that small distinctions in the probabilities (parameters) quantifying a belief network do not matter much for the results of probabilistic queries. Yet, one can develop realistic scenarios under which small variations in network parameters can lead to significant changes in computed queries. A pending theoretical question is then to analytically characterize parameter changes that do or do not matter. In this paper, we study the sensitivity of probabilistic queries to changes in network parameters and prove some tight bounds on the impact that such parameters can have on queries. Our analytic results pinpoint some interesting situations under which parameter changes do or do not matter. These results are important for knowledge engineers, as they help them identify influential network parameters. They also help explain some of the previous experimental results and observations with regard to network robustness against parameter changes.

1. Introduction

A belief network is a compact representation of a probability distribution (Pearl, 1988; Jensen, 2001). It consists of two parts, one qualitative and the other quantitative. The qualitative part of a belief network (called its structure) is a directed acyclic graph in which nodes represent domain variables and edges represent direct influences between these variables. The quantitative part of a belief network is a set of conditional probability tables (CPTs) that quantify our beliefs in such influences. Figure 1 depicts the structure of a belief network and Figure 2 depicts its CPTs.[1]

[1] This specific network and its CPTs are distributed with the evaluation version of the commercial HUGIN system at http://www.hugin.com/.

Automated reasoning systems based on belief networks have become quite popular recently, as they have enjoyed much success in a number of real-world applications. Central to the development of such systems is the construction of a belief network (hence, a probability distribution) that faithfully represents the domain of interest. Although the automatic synthesis of belief networks (based on design information in certain applications and on learning techniques in others) has been drawing a lot of attention recently, mainstream methods for constructing such networks continue to be based on traditional knowledge engineering (KE) sessions involving domain experts. One of the central issues that arise in such KE sessions is assessing the impact that changes in network parameters may have on probabilistic queries of interest.

[Figure 1: A belief network structure, over the variables Fire, Tampering, Smoke, Alarm, Leaving, and Report.]
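As a concrete illustration of the two parts (ours, not from the paper), the qualitative part of the Figure 1 network is just a directed acyclic graph, which might be encoded as a map from each variable to its parents; the quantitative part would then attach a CPT from Figure 2 to each entry.

```python
# Hypothetical encoding of the Figure 1 structure (the qualitative part):
# each variable maps to the list of its parents in the DAG.
structure = {
    "Fire": [],
    "Tampering": [],
    "Smoke": ["Fire"],
    "Alarm": ["Fire", "Tampering"],
    "Leaving": ["Alarm"],
    "Report": ["Leaving"],
}
```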
[Figure 2: The CPTs of the belief network shown in Figure 1. Each CPT is summarized here by the parameters for the value true; the remaining parameters follow from the sum-to-one constraint:]

* $\Pr(fire) = .01$
* $\Pr(tampering) = .02$
* $\Pr(smoke \mid fire) = .9$; $\Pr(smoke \mid \overline{fire}) = .01$
* $\Pr(alarm \mid fire, tampering) = .5$; $\Pr(alarm \mid fire, \overline{tampering}) = .99$; $\Pr(alarm \mid \overline{fire}, tampering) = .85$; $\Pr(alarm \mid \overline{fire}, \overline{tampering}) = .0001$
* $\Pr(leaving \mid alarm) = .88$; $\Pr(leaving \mid \overline{alarm}) = .001$
* $\Pr(report \mid leaving) = .75$; $\Pr(report \mid \overline{leaving}) = .01$

Consider, for example, the following common method for constructing belief networks in medical diagnosis applications (Coupé, Peek, Ottenkamp, & Habbema, 1999). First, the network structure is developed. Next, parameters are estimated by non-experts using a combination of statistical data and qualitative influences available from textbook materials. Finally, medical experts are brought in to evaluate the network and fine-tune its parameters. One method of evaluation is to pose diagnostic scenarios to the network and compare the results of such queries to those expected by the experts. For example, given some set of symptoms $e$, and two potential diagnoses $y$ and $z$, the network may give us the conclusion that $\Pr(y \mid e)/\Pr(z \mid e) = 2$, while a domain expert may believe that the ratio should be no less than 4. Assuming that the network structure is correct, a central question is then: which network parameters should be changed to give us the correct ratio, and by how much?

To automate the task of identifying such parameter changes, we have recently developed a belief network tool, called SamIam (Sensitivity Analysis, Modelling, Inference And More).[2] One of its features is sensitivity analysis, which allows domain experts to fine-tune network parameters in order to enforce constraints on the results of certain queries. Users can specify the constraint that they want to enforce, and SamIam will automatically decide whether a given parameter is relevant to this constraint and, if it is, will compute the minimum amount of change to that parameter which is needed to enforce the constraint. The technical details of our approach to sensitivity analysis are the subject of Section 2.

[2] SamIam is developed by the UCLA Automated Reasoning Group. Its web page is at http://reasoning.cs.ucla.edu/.

[Figure 3: A screen shot of SamIam performing sensitivity analysis on the belief network shown in Figure 1.]

As we experimented with SamIam, we ran into scenarios that we found surprising at first glance. Specifically, there were many occasions in which queries would be quite sensitive to small variations in certain network parameters. Consider for one example the scenario in Figure 3, which corresponds to the network detailed in Figures 1 and 2. Here, we have evidence $e = report, \overline{smoke}$: people are reported to be evacuating a building, but there is no evidence for any smoke. This evidence should make tampering more likely than fire, and the given belief network does indeed reflect this, with $\Pr(tampering \mid e) = .50$ and $\Pr(fire \mid e) = .03$. We wanted, however, the probability of tampering to be no less than .65. Hence, we asked SamIam to identify parameter changes that can enforce this constraint, and it made two recommendations:

1. either decrease the probability of a false report, $\Pr(report \mid \overline{leaving})$, from its current value of .01 to ≤ .0047,
2. or increase the prior probability of tampering from its current value of .02 to ≥ .036.
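The numbers in this scenario can be reproduced by brute force. The sketch below is our own illustration (the helper names `joint` and `query` are hypothetical): it enumerates all $2^6$ instantiations, scores each by the product of the consistent Figure 2 parameters, and then checks both the original posterior and the effect of the second recommendation.

```python
from itertools import product

def joint(f, t, s, a, l, r, p_tampering=0.02):
    """Probability of one full instantiation: the product of the network
    parameters (Figure 2) consistent with it."""
    p = 0.01 if f else 0.99                       # Pr(Fire)
    p *= p_tampering if t else 1 - p_tampering    # Pr(Tampering)
    ps = 0.9 if f else 0.01                       # Pr(smoke | Fire)
    p *= ps if s else 1 - ps
    pa = {(1, 1): 0.5, (1, 0): 0.99,
          (0, 1): 0.85, (0, 0): 0.0001}[(f, t)]   # Pr(alarm | Fire, Tampering)
    p *= pa if a else 1 - pa
    pl = 0.88 if a else 0.001                     # Pr(leaving | Alarm)
    p *= pl if l else 1 - pl
    pr = 0.75 if l else 0.01                      # Pr(report | Leaving)
    p *= pr if r else 1 - pr
    return p

def query(target, evidence, **kw):
    """Pr(target | evidence) by summing over all instantiations."""
    num = den = 0.0
    for w in product((1, 0), repeat=6):
        if evidence(*w):
            pw = joint(*w, **kw)
            den += pw
            if target(*w):
                num += pw
    return num / den

e = lambda f, t, s, a, l, r: r and not s        # evidence: report, no smoke
tampering = lambda f, t, s, a, l, r: t
print(query(tampering, e))                      # ~0.50, as reported above
print(query(tampering, e, p_tampering=0.036))   # ~0.65: recommendation 2 works
```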
Therefore, the distinctions between .02 and .036 and between .01 and .0047 really do matter in this case, as each induces an absolute change of .15 in the probabilistic query of interest. Note also that implicit in SamIam's recommendations is the claim that the parameters of the variables Fire, Smoke, Leaving, and Alarm are irrelevant to enforcing this constraint; i.e., no matter how much we change any of these parameters, we would not be able to enforce our desired constraint.

This example shows that the absolute change in a query can be much larger than the absolute change in the corresponding parameter. Later, we will show an example where an infinitesimal change to a network parameter leads to a change of .5 in a corresponding query. We also show examples in which the relative change in the probability of a query is larger than the corresponding relative change in a network parameter. One then wonders whether there is a different method for measuring probabilistic change (other than absolute or relative) which allows one to non-trivially bound the change in a probabilistic query in terms of the corresponding change in a network parameter.

To answer this and related questions, we conduct in Section 3 an analytic study of the partial derivative of a probabilistic query $\Pr(y \mid e)$ with respect to some network parameter $\theta_{x|u}$. Our study leads us to three main results:

1. a bound on the derivative in terms of $\Pr(y \mid e)$ and $\Pr(x \mid u)$ only, which is independent of any other aspect of the given belief network;
2. a bound on the sensitivity of queries to infinitesimal changes in network parameters;
3. a bound on the sensitivity of queries to arbitrary changes in network parameters.

The last bound in particular shows that the amount of change in a probabilistic query can be bounded in terms of the amount of change in a network parameter, as long as change is understood to be the relative change in odds. This result has a number of practical implications. First, it can relieve experts from having to be too precise when specifying certain parameters subjectively. Next, it can be important for approximate inference algorithms that pre-process network parameters to eliminate small distinctions in such parameters, in order to increase the efficiency of inference (Poole, 1998). Finally, it can be used to show that automated reasoning systems based on belief networks are robust and, hence, suitable for real-world applications (Pradhan, Henrion, Provan, Del Favero, & Huang, 1996). Section 4 is indeed dedicated to exploring the implications of the above bounds, where we provide an analytic explanation of why certain parameter changes don't matter. We finally close in Section 5 with some concluding remarks. Proofs of all theorems are given in Appendix A.

2. The Tuning of Network Parameters

We report in this section on a tool that we have been developing, called SamIam, for fine-tuning network parameters (Laskey, 1995; Castillo, Gutiérrez, & Hadi, 1997; Jensen, 1999; Kjaerulff & van der Gaag, 2000; Darwiche, 2000). Given a belief network, some evidence $e$, which is an instantiation of variables $E$ in the belief network, and two events $y$ and $z$ of variables $Y$ and $Z$ respectively, where $Y, Z \notin E$, our tool can efficiently identify the parameter changes needed to enforce the following types of constraints:

Difference: $\Pr(y \mid e) - \Pr(z \mid e) \ge \varepsilon$;
Ratio: $\Pr(y \mid e)/\Pr(z \mid e) \ge \varepsilon$.

These two constraints often arise when we debug belief networks.
For example, we can make event $y$ more likely than event $z$, given evidence $e$, by specifying the constraint $\Pr(y \mid e) - \Pr(z \mid e) \ge 0$, or we can make event $y$ at least twice as likely as event $z$, given evidence $e$, by specifying the constraint $\Pr(y \mid e)/\Pr(z \mid e) \ge 2$. We will discuss next how one would enforce the two constraints, but we need to settle some notational conventions and technical preliminaries first.

Variables are denoted by upper-case letters ($A$) and their values by lower-case letters ($a$). Sets of variables are denoted by bold-face upper-case letters ($\mathbf{A}$) and their instantiations by bold-face lower-case letters ($\mathbf{a}$). For a variable $A$ with values true and false, we use $a$ to denote $A = \text{true}$ and $\bar{a}$ to denote $A = \text{false}$. The CPT for variable $X$ with parents $U$ defines a set of conditional probabilities of the form $\Pr(x \mid u)$, where $x$ is a value of variable $X$, $u$ is an instantiation of the parents $U$, and $\Pr(x \mid u)$ is a probability known as a network parameter and denoted by $\theta_{x|u}$.

We finally recall a basic fact about belief networks: the probability of some instantiation $x$ of all network variables $X$ equals the product of all network parameters that are consistent with that instantiation. For example, the probability of the instantiation $fire, \overline{tampering}, smoke, alarm, \overline{leaving}, report$ in Figure 1 equals $.01 \times .98 \times .9 \times .99 \times .12 \times .01$, which is the product of the network parameters (from Figure 2) that are consistent with this instantiation.

2.1 Binary Variables

We first consider the parameters of a binary variable $X$, with two values $x$ and $\bar{x}$ and, hence, two parameters $\theta_{x|u}$ and $\theta_{\bar{x}|u}$ for each parent instantiation $u$. We assume that for each variable $X$ and parent instantiation $u$ we have a meta parameter $\tau_{x|u}$, such that $\theta_{x|u} = \tau_{x|u}$ and $\theta_{\bar{x}|u} = 1 - \tau_{x|u}$. Our goal is then to determine the amount of change to the meta parameter $\tau_{x|u}$ which would lead to a simultaneous change in both $\theta_{x|u}$ and $\theta_{\bar{x}|u}$. We use the meta parameter $\tau_{x|u}$ because it is not meaningful to change only $\theta_{x|u}$ or $\theta_{\bar{x}|u}$ without changing the other, since $\theta_{x|u} + \theta_{\bar{x}|u} = 1$.

First, we observe that the probability of an instantiation $e$, $\Pr(e)$, is a linear function in any network parameter $\theta_{x|u}$ of a belief network (Russell, Binder, Koller, & Kanazawa, 1995; Castillo et al., 1997). In fact, the probability is linear in any meta parameter $\tau_{x|u}$.

Theorem 2.1 The derivative of $\Pr(e)$ with respect to the meta parameter $\tau_{x|u}$ is given by:

$$\frac{\partial \Pr(e)}{\partial \tau_{x|u}} = \frac{\Pr(e, x, u)}{\theta_{x|u}} - \frac{\Pr(e, \bar{x}, u)}{\theta_{\bar{x}|u}}, \qquad (1)$$

when $\theta_{x|u} \neq 0$ and $\theta_{\bar{x}|u} \neq 0$.[3] We will designate this constant derivative by $\alpha_e$.

[3] If either of these parameters is zero, we can use the differential approach by Darwiche (2000) to compute the derivative directly.

In Theorem 2.1, $\alpha_e = \Pr(e, x, u)/\theta_{x|u} - \Pr(e, \bar{x}, u)/\theta_{\bar{x}|u}$ is a constant in terms of both $\theta_{x|u}$ and $\theta_{\bar{x}|u}$ (and, consequently, $\tau_{x|u}$), since $\Pr(e, x, u) = K_x \theta_{x|u}$ and $\Pr(e, \bar{x}, u) = K_{\bar{x}} \theta_{\bar{x}|u}$, where $K_x = \Pr(u)\Pr(e \mid x, u)$ and $K_{\bar{x}} = \Pr(u)\Pr(e \mid \bar{x}, u)$ are constants in terms of both $\theta_{x|u}$ and $\theta_{\bar{x}|u}$. By substituting $y, e$ and $z, e$ for $e$ in Theorem 2.1, we get:

$$\alpha_{y,e} = \frac{\partial \Pr(y, e)}{\partial \tau_{x|u}} = \frac{\Pr(y, e, x, u)}{\theta_{x|u}} - \frac{\Pr(y, e, \bar{x}, u)}{\theta_{\bar{x}|u}}; \qquad (2)$$

$$\alpha_{z,e} = \frac{\partial \Pr(z, e)}{\partial \tau_{x|u}} = \frac{\Pr(z, e, x, u)}{\theta_{x|u}} - \frac{\Pr(z, e, \bar{x}, u)}{\theta_{\bar{x}|u}}. \qquad (3)$$
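As a sanity check on Theorem 2.1 and the linearity claim, the following sketch (ours; the two-node chain and its numbers are hypothetical) compares a numerical derivative of $\Pr(e)$ with the constant $\alpha_e$, which for a root variable $X$ reduces to $\Pr(e \mid x) - \Pr(e \mid \bar{x})$.

```python
# Toy chain X -> E; tau is the meta parameter, i.e. the prior Pr(X = x).
def pr_e(tau, pe_x=0.8, pe_notx=0.3):
    # Pr(e) = Pr(e|x) * tau + Pr(e|~x) * (1 - tau): linear in tau.
    return pe_x * tau + pe_notx * (1 - tau)

tau, h = 0.4, 1e-6
numeric = (pr_e(tau + h) - pr_e(tau)) / h
# Theorem 2.1: alpha_e = Pr(e,x)/theta_x - Pr(e,~x)/theta_~x,
# which equals Pr(e|x) - Pr(e|~x) for a root variable.
analytic = 0.8 - 0.3
print(numeric, analytic)   # both ~0.5, independent of tau
```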
Now, if we want to enforce the Difference constraint, $\Pr(y \mid e) - \Pr(z \mid e) \ge \varepsilon$, it suffices to ensure that $\Pr(y, e) - \Pr(z, e) \ge \varepsilon \Pr(e)$. Suppose that this constraint does not hold, and we wish to establish it by applying a change of $\delta$ to the meta parameter $\tau_{x|u}$. Such a change leads to a change of $\alpha_e \delta$ in $\Pr(e)$. It also changes $\Pr(y, e)$ and $\Pr(z, e)$ by $\alpha_{y,e}\delta$ and $\alpha_{z,e}\delta$, respectively. Hence, to enforce the Difference constraint, we need to solve for $\delta$ in the following inequality:

$$[\Pr(y, e) + \alpha_{y,e}\delta] - [\Pr(z, e) + \alpha_{z,e}\delta] \ge \varepsilon[\Pr(e) + \alpha_e \delta].$$

Rearranging the terms, we get the following result.

Corollary 2.1 To satisfy the Difference constraint, we need to change the meta parameter $\tau_{x|u}$ by $\delta$, such that:

$$\Pr(y, e) - \Pr(z, e) - \varepsilon\Pr(e) \ge \delta[-\alpha_{y,e} + \alpha_{z,e} + \varepsilon\alpha_e],$$

where the $\alpha$ constants are defined by Equations 1, 2 and 3.

We can similarly solve for the parameter changes $\delta$ that enforce the Ratio constraint, $\Pr(y \mid e)/\Pr(z \mid e) \ge \varepsilon$, using the following inequality:

$$[\Pr(y, e) + \alpha_{y,e}\delta]/[\Pr(z, e) + \alpha_{z,e}\delta] \ge \varepsilon.$$

Rearranging the terms, we get the following result.

Corollary 2.2 To satisfy the Ratio constraint, we need to change the meta parameter $\tau_{x|u}$ by $\delta$, such that:

$$\Pr(y, e) - \varepsilon\Pr(z, e) \ge \delta[-\alpha_{y,e} + \varepsilon\alpha_{z,e}],$$

where the $\alpha$ constants are defined by Equations 2 and 3.

For both the Difference and Ratio constraints, the solution for $\delta$, if any, is always in one of two forms:

* $\delta \le q$, for some computed $q < 0$, in which case the new value of the meta parameter $\tau_{x|u}$ must be in the interval $[0, p + q]$;
* $\delta \ge q$, for some computed $q > 0$, in which case the new value of the meta parameter $\tau_{x|u}$ must be in the interval $[p + q, 1]$.

Note that $p$ is the current value of the meta parameter $\tau_{x|u}$ (before the change). For many parameters, these intervals are empty and, therefore, there is no way we can change those meta parameters to enforce the constraint.

The question now is how to solve these inequalities, efficiently, and for all meta parameters. Note that there may be more than one possible parameter change that would enforce the given constraint, so we need to identify all such changes. With either Corollary 2.1 or 2.2, we can easily solve for the amount of change needed, $\delta$, once we know the following probabilities: $\Pr(e)$, $\Pr(y, e)$, $\Pr(z, e)$, $\Pr(e, x, u)$, $\Pr(e, \bar{x}, u)$, $\Pr(y, e, x, u)$, $\Pr(y, e, \bar{x}, u)$, $\Pr(z, e, x, u)$, and $\Pr(z, e, \bar{x}, u)$. This leads to the following complexity result for our technique.

Corollary 2.3 If we have an algorithm that can compute $\Pr(i, x, u)$, for a given instantiation $i$ and all family instantiations $x, u$ of every variable $X$, in time $O(f)$, then we can solve Corollaries 2.1 and 2.2 for all parameters in time $O(f)$. We do this by running the algorithm three times, once with $i = e$, then with $i = y, e$, and finally with $i = z, e$.

Recall that the family of a variable $X$ is the set containing $X$ and its parents $U$ in the belief network. The join-tree algorithm (Jensen, Lauritzen, & Olesen, 1990) and the differential approach (Darwiche, 2000) can both compute $\Pr(i, x, u)$, for a given instantiation $i$ and all family instantiations $x, u$ of every variable $X$, in $O(n \exp(w))$ time. Here, $n$ is the number of variables in the belief network, and $w$ is the width of a given elimination order. SamIam uses the differential approach, and thus its running time to identify all possible parameter changes in a network is also $O(n \exp(w))$. Note that this is also the time needed to answer one of the simplest queries, that of computing the probability of evidence $e$.
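The inequalities in Corollaries 2.1 and 2.2 are linear in $\delta$, so once the probabilities above are available, each parameter can be processed in constant time. A sketch of the solving step (our own helpers, not SamIam's actual code):

```python
def solve_difference(pr_ye, pr_ze, pr_e, a_ye, a_ze, a_e, eps):
    """Corollary 2.1 sketch: the change delta must satisfy
       Pr(y,e) - Pr(z,e) - eps*Pr(e) >= delta * (-a_ye + a_ze + eps*a_e)."""
    lhs = pr_ye - pr_ze - eps * pr_e
    coef = -a_ye + a_ze + eps * a_e
    if coef == 0:
        return None                  # this parameter cannot enforce the constraint
    sense = "<=" if coef > 0 else ">="
    return sense, lhs / coef         # e.g. ("<=", q): we need delta <= q

def solve_ratio(pr_ye, pr_ze, a_ye, a_ze, eps):
    """Corollary 2.2 sketch: Pr(y,e) - eps*Pr(z,e) >= delta * (-a_ye + eps*a_ze)."""
    lhs = pr_ye - eps * pr_ze
    coef = -a_ye + eps * a_ze
    if coef == 0:
        return None
    return ("<=" if coef > 0 else ">="), lhs / coef
```

The returned bound must still be intersected with the requirement that the new value $p + \delta$ lies in $[0, 1]$, which yields the two interval forms listed above.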
2.2 Multi-Valued Variables

Our results can be easily extended to multi-valued variables, as long as we assume a model for changing co-varying parameters when one of them changes (Darwiche, 2000; Kjaerulff & van der Gaag, 2000). After the parameter $\theta_{x|u}$ changes, we need a scheme for changing the other parameters, $\theta_{x_i|u}$ for all $x_i \neq x$, in order to ensure the sum-to-one constraint. The most common way to do this is the proportional scheme, in which we change the other parameters so that the ratios between them remain the same. For example, suppose we have three parameters $\theta_{x_1|u} = .6$, $\theta_{x_2|u} = .3$ and $\theta_{x_3|u} = .1$. After $\theta_{x_1|u}$ changes to .8, the other two parameter values are changed to $\theta_{x_2|u} = .3(.2/.4) = .15$ and $\theta_{x_3|u} = .1(.2/.4) = .05$ accordingly. We now define the meta parameter $\tau_{x|u}$ such that it simultaneously changes all parameters according to the proportional scheme. We can then obtain a linear relation between $\Pr(e)$ and $\tau_{x|u}$, with the partial derivative given by:

$$\frac{\partial \Pr(e)}{\partial \tau_{x|u}} = \frac{\Pr(e, x, u)}{\theta_{x|u}} - \frac{\sum_{x_i \neq x} \Pr(e, x_i, u)}{\sum_{x_i \neq x} \theta_{x_i|u}}.$$

This is very similar to the result in Theorem 2.1, in that we have grouped all the values $x_i \neq x$ into the value $\bar{x}$. We can then use Corollaries 2.1 and 2.2 to solve for the Difference and Ratio constraints.
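A sketch of the proportional scheme (the helper name is our own), reproducing the three-parameter example above:

```python
def proportional_update(params, x, new_value):
    """Set params[x] to new_value and rescale the co-varying parameters so
    that their mutual ratios are preserved and the sum-to-one constraint holds."""
    scale = (1.0 - new_value) / (1.0 - params[x])
    return {xi: new_value if xi == x else p * scale
            for xi, p in params.items()}

print(proportional_update({"x1": 0.6, "x2": 0.3, "x3": 0.1}, "x1", 0.8))
# ~{'x1': 0.8, 'x2': 0.15, 'x3': 0.05}, matching the example above
# (up to floating point).
```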
We now present another example to illustrate how the above results are used in practice.

Example 2.1 Consider again the network in Figure 3. Here, we set the evidence such that we have smoke, but no report of people evacuating the building, i.e. $e = smoke, \overline{report}$. We then obtain the posteriors $\Pr(fire \mid e) = .25$ and $\Pr(tampering \mid e) = .02$. We thought in this case that the posterior on fire should be no less than .5, and asked SamIam to recommend the necessary changes to enforce the constraint $\Pr(fire \mid e) - \Pr(\overline{fire} \mid e) \ge 0$. There were five recommendations in this case, three of which could be ruled out based on qualitative considerations:

1. increase the prior on fire to ≥ .03 (from .01);
2. increase the prior on tampering to ≥ .80 (from .02);
3. decrease $\Pr(smoke \mid \overline{fire})$ to ≤ .003 (from .01);
4. increase $\Pr(leaving \mid \overline{alarm})$ to ≥ .923 (from .001);
5. increase $\Pr(report \mid \overline{leaving})$ to ≥ .776 (from .01).

Clearly, the only sensible change here is either to increase the prior on fire, or to decrease the probability of having smoke without a fire.

This example and other similar ones suggest that identifying such parameter changes and their magnitudes is inevitable for developing a faithful belief network, yet it is not trivial for experts to accomplish this task by visual inspection of the belief network, often due to its size and complexity. Sensitivity analysis tools such as SamIam can help facilitate this by identifying important parameters that need to be fine-tuned in order to satisfy certain constraints. Of course, if we are given multiple constraints, we need to be cautious when implementing a recommendation made by SamIam due to one constraint, because this may result in violating other constraints. In this case, the parameter changes recommended by SamIam should be used to help experts focus their attention on the relevant parameters.

Moreover, the previous examples illustrate the need to develop more analytic tools to understand and explain the sensitivity of queries to certain parameter changes. There is also a need to reconcile the sensitivities exhibited by our examples with previous experimental studies demonstrating the robustness of probabilistic queries against small parameter changes in certain application areas, such as diagnosis (Pradhan et al., 1996). We address these particular questions in the next two sections.

3. The Sensitivity of Probabilistic Queries to Parameter Changes

Our starting point in understanding the sensitivity of a query $\Pr(y \mid e)$ to changes in a meta parameter $\tau_{x|u}$ is to analyze the derivative $\partial \Pr(y \mid e)/\partial \tau_{x|u}$. In our analysis, we assume that $X$ is binary, but $Y$ and all other variables in the network can be multi-valued. The following theorem provides a simple bound on this derivative, in terms of $\Pr(y \mid e)$ and $\Pr(x \mid u)$ only. We then use this simple bound to study the effect of changes to meta parameters on probabilistic queries.

Theorem 3.1 If $X$ is a binary variable in a belief network, then:[4]

$$\left| \frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \right| \le \frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.$$

[4] This theorem and all results that follow require that $\tau_{x|u} \neq 0$ and $\tau_{x|u} \neq 1$, since we can only use the expression in Theorem 2.1 under these conditions.

The bound in Theorem 3.1 is tight; we will show later an example for which the derivative assumes the above bound exactly. The main point to note about this bound is that it is independent of any given belief network.[5] The plot of this bound against $\Pr(x \mid u)$ and $\Pr(y \mid e)$ is shown in Figure 4.

[5] Note that we have an exact closed form for the derivative $\partial \Pr(y \mid e)/\partial \tau_{x|u}$ (Darwiche, 2000; Greiner, Grove, & Schuurmans, 1997), but that form includes terms which are specific to the given belief network.

[Figure 4: The plot of the upper bound on the partial derivative $\partial \Pr(y \mid e)/\partial \tau_{x|u}$, as given in Theorem 3.1, against $\Pr(x \mid u)$ and $\Pr(y \mid e)$.]

A number of observations are in order about this plot:

* For extreme values of $\Pr(x \mid u)$, the bound approaches infinity, and thus a small absolute change in the meta parameter $\tau_{x|u}$ can have a big impact on the query $\Pr(y \mid e)$.
* On the other hand, the bound approaches 0 for extreme values of the query $\Pr(y \mid e)$. Therefore, a small absolute change in the meta parameter $\tau_{x|u}$ will have a small effect on the absolute change in the query.

One of the implications of this result is that if we have a belief network where the queries of interest $\Pr(y \mid e)$ have extreme values, then such queries will be robust against small changes in network parameters. This of course assumes that robustness is understood as a small change in the absolute value of the given query. Interestingly enough, if $y$ is a disease which is diagnosed by finding $e$ (that is, the probability $\Pr(y \mid e)$ is quite high), then it is not surprising that such queries would be robust against small perturbations to network parameters. This seems to explain some of the results by Pradhan et al. (1996), where robustness has been confirmed for queries with $\Pr(y \mid e) \ge .9$.

Another implication of the above result is that one has to be careful when changing parameters that are extreme. Such parameters are potentially very influential and one must handle them with care. Therefore, the worst situation from a robustness viewpoint materializes if one has extreme parameters with non-extreme queries. In such a case, the queries can be very sensitive to small variations in the parameters.
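Since the Figure 4 surface is not reproduced here, a quick tabulation of the bound (our own sketch) makes the observations above concrete:

```python
# Tabulating the Theorem 3.1 bound, as a stand-in for the Figure 4 surface.
def bound(px, py):
    return py * (1 - py) / (px * (1 - px))

for px in (0.01, 0.1, 0.5):          # Pr(x|u): extreme to moderate
    for py in (0.01, 0.5, 0.9):      # Pr(y|e)
        print(f"Pr(x|u)={px:<4} Pr(y|e)={py:<4} bound={bound(px, py):8.3f}")
# The bound blows up for extreme Pr(x|u) and vanishes for extreme Pr(y|e).
```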
Example 3.1 Consider the network structure in Figure 5. We have two binary root nodes, $X$ and $Y$, with respective parameters $\theta_x, \theta_{\bar{x}}$ and $\theta_y, \theta_{\bar{y}}$. We assume that $E$ is a deterministic binary node whose value is $e$ iff $X = Y$.

[Figure 5: The network used in Example 3.1.]

This dictates the following CPT for $E$: $\Pr(e \mid x, y) = 1$, $\Pr(e \mid \bar{x}, \bar{y}) = 1$, $\Pr(e \mid x, \bar{y}) = 0$ and $\Pr(e \mid \bar{x}, y) = 0$. The conditional probability $\Pr(y \mid e)$ can be expressed using the root parameters $\theta_x$ and $\theta_y$ as:

$$\Pr(y \mid e) = \frac{\theta_x \theta_y}{\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}}}.$$

Since $\partial \theta_x/\partial \tau_x = 1$ and $\partial \theta_{\bar{x}}/\partial \tau_x = -1$, the derivative of $\Pr(y \mid e)$ with respect to the meta parameter $\tau_x$ is given by:

$$\frac{\partial \Pr(y \mid e)}{\partial \tau_x} = \frac{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})\theta_y - \theta_x \theta_y (\theta_y - \theta_{\bar{y}})}{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2} = \frac{\theta_y \theta_{\bar{y}}}{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2}.$$

This is equal to the upper bound given in Theorem 3.1:

$$\frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x)(1 - \Pr(x))} = \frac{(\theta_x \theta_y)(\theta_{\bar{x}} \theta_{\bar{y}})}{\theta_x \theta_{\bar{x}} (\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2} = \frac{\theta_y \theta_{\bar{y}}}{(\theta_x \theta_y + \theta_{\bar{x}} \theta_{\bar{y}})^2}.$$

Now, if we set $\theta_x = \theta_{\bar{y}}$, the derivative becomes:

$$\frac{\partial \Pr(y \mid e)}{\partial \tau_x} = \frac{1}{4 \theta_x \theta_{\bar{x}}},$$

and as $\theta_x$ (or $\theta_{\bar{x}}$) approaches 0, the derivative approaches infinity. Finally, if we set $\theta_x = \theta_{\bar{y}} = \varepsilon$, we have $\Pr(y \mid e) = .5$, but if we keep $\theta_y$ and $\theta_{\bar{y}}$ constant and change $\tau_x$ from $\varepsilon$ to 0, we get the new result $\Pr(y \mid e) = 0$.
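Example 3.1 can also be checked numerically; the sketch below (ours) differentiates $\Pr(y \mid e)$ by finite differences and confirms that the derivative coincides with the Theorem 3.1 bound.

```python
def pr_y_given_e(tx, ty):
    # Example 3.1: Pr(y|e) = tx*ty / (tx*ty + (1-tx)*(1-ty))
    return tx * ty / (tx * ty + (1 - tx) * (1 - ty))

tx, ty, h = 0.3, 0.7, 1e-7
deriv = (pr_y_given_e(tx + h, ty) - pr_y_given_e(tx, ty)) / h
p = pr_y_given_e(tx, ty)
bound = p * (1 - p) / (tx * (1 - tx))
print(deriv, bound)   # ~1.1905 for both: the bound is attained exactly
```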
Example 3.1 thus illustrates three points. First, it shows that the bound in Theorem 3.1 is tight, i.e. we can construct a belief network that assumes the bound. Second, it gives an example network for which the derivative $\partial \Pr(y \mid e)/\partial \tau_{x|u}$ tends to infinity, and therefore we cannot bound the derivative by any constant. Finally, it shows that an infinitesimal absolute change in a meta parameter (changing $\tau_x$ from $\varepsilon$ to 0) can induce a non-infinitesimal absolute change in some query ($\Pr(y \mid e)$ changes from .5 to 0). The following theorem, however, shows that this is not possible if we consider a relative notion of change.

Theorem 3.2 Assume that $\tau_{x|u} \le .5$ without loss of generality.[6] Suppose that $\Delta\tau_{x|u}$ is an infinitesimal change applied to the meta parameter $\tau_{x|u}$, leading to a change of $\Delta\Pr(y \mid e)$ in the query $\Pr(y \mid e)$. We then have:

$$\left| \frac{\Delta \Pr(y \mid e)}{\Pr(y \mid e)} \right| \le 2 \left| \frac{\Delta \tau_{x|u}}{\tau_{x|u}} \right|.$$

[6] For a binary variable $X$, if $\tau_{x|u} > .5$, we can instead choose the meta parameter $\tau_{\bar{x}|u}$ without loss of generality.

For a function $f(x)$, the quantity

$$\lim_{(x - x_0) \to 0} \frac{(f(x) - f(x_0))/f(x_0)}{(x - x_0)/x_0}$$

is typically known as the sensitivity of $f$ to $x$ at $x_0$. Theorem 3.2 therefore shows that the sensitivity of $\Pr(y \mid e)$ to $\tau_{x|u}$ is bounded.

As an example application of Theorem 3.2, consider Example 3.1 again. The change of $\tau_x$ from $\varepsilon$ to 0 amounts to a relative change of $|-\varepsilon/\varepsilon| = 1$. The corresponding change of $\Pr(y \mid e)$ from .5 to 0 amounts to a relative change of $|-.5/.5| = 1$. Hence, the relative change in the query is not as great from this viewpoint.[7]

[7] If we consider the meta parameter $\tau_{\bar{x}} = 1 - \varepsilon$ instead, the relative change in $\tau_{\bar{x}}$ amounts to $\varepsilon/(1 - \varepsilon)$. But Theorem 3.2 will not be applicable in this case (assuming that $\varepsilon$ is close to 0), since the theorem requires that the chosen meta parameter be no greater than .5.

The relative change in $\Pr(y \mid e)$ may be greater than double the relative change in $\tau_{x|u}$ for non-infinitesimal changes, because the derivative $\partial \Pr(y \mid e)/\partial \tau_{x|u}$ depends on the value of $\tau_{x|u}$ (Darwiche, 2000; Jensen, 1999). Going back to Example 3.1, if we set $\theta_x = .5$ and $\theta_y = .01$, we obtain the result $\Pr(y \mid e) = .01$. If we now increase $\tau_x$ to .6, a relative change of 20%, we get the new result $\Pr(y \mid e) = .0149$, a relative change of 49%, which is more than double the relative change in $\tau_x$.

The question now is: if we change a meta parameter $\tau_{x|u}$ by an arbitrary amount (not an infinitesimal amount), what can we say about the corresponding change in the query $\Pr(y \mid e)$? We have the following result.

Theorem 3.3 Let $O(x \mid u)$ denote the odds of $x$ given $u$: $O(x \mid u) = \Pr(x \mid u)/(1 - \Pr(x \mid u))$, and let $O(y \mid e)$ denote the odds of $y$ given $e$: $O(y \mid e) = \Pr(y \mid e)/(1 - \Pr(y \mid e))$. Let $O'(x \mid u)$ and $O'(y \mid e)$ denote these odds after having applied an arbitrary change to the meta parameter $\tau_{x|u}$, where $X$ is a binary variable in a belief network. If the change is positive, then:

$$\frac{O(x \mid u)}{O'(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O'(x \mid u)}{O(x \mid u)};$$

or if it is negative, then:

$$\frac{O'(x \mid u)}{O(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O(x \mid u)}{O'(x \mid u)}.$$

Combining both results, we have:

$$|\ln O'(y \mid e) - \ln O(y \mid e)| \le |\ln O'(x \mid u) - \ln O(x \mid u)|.$$

Theorem 3.3 means that the relative change in the odds of $y$ given $e$ is bounded by the relative change in the odds of $x$ given $u$, if $X$ is a binary variable.[8] Note that the result makes no assumptions whatsoever about the structure of the given belief network.

[8] We have recently extended our results to multi-valued variables, where we arbitrarily change the parameters $\theta_{x|u}$ to new values $\theta'_{x|u}$, for all values $x$. The resulting bound is: $|\ln O'(y \mid e) - \ln O(y \mid e)| \le \ln(\max_x \theta'_{x|u}/\theta_{x|u}) - \ln(\min_x \theta'_{x|u}/\theta_{x|u})$ (Chan & Darwiche, 2002).

To illustrate this theorem, we go back to Example 2.1. We intend to increase the posterior $\Pr(fire \mid e)$ from .25 to .5, for $e = smoke, \overline{report}$. The log-odds change for the query is thus $\Delta_{\mathrm{lo}}(\Pr(y \mid e)) = |\ln O'(y \mid e) - \ln O(y \mid e)| = 1.1$. There were five recommendations made by SamIam, and we can calculate the log-odds change, $\Delta_{\mathrm{lo}}(\tau_{x|u}) = |\ln O'(x \mid u) - \ln O(x \mid u)|$, for each parameter change:

1. increase the prior on fire to ≥ .03 (from .01): $\Delta_{\mathrm{lo}}(\tau_{x|u}) = 1.1$;
2. increase the prior on tampering to ≥ .80 (from .02): $\Delta_{\mathrm{lo}}(\tau_{x|u}) = 5.3$;
3. decrease $\Pr(smoke \mid \overline{fire})$ to ≤ .003 (from .01): $\Delta_{\mathrm{lo}}(\tau_{x|u}) = 1.2$;
4. increase $\Pr(leaving \mid \overline{alarm})$ to ≥ .923 (from .001): $\Delta_{\mathrm{lo}}(\tau_{x|u}) = 9.4$;
5. increase $\Pr(report \mid \overline{leaving})$ to ≥ .776 (from .01): $\Delta_{\mathrm{lo}}(\tau_{x|u}) = 5.8$.

Therefore, we can see that all the recommended parameter changes satisfy Theorem 3.3, i.e. the log-odds change of the query is bounded by the log-odds change of the parameter.

An interesting special case of Theorem 3.3 is when $X$ is a root node and $X = Y$. From basic probability theory, we have:

$$O(x \mid e) = O(x) \frac{\Pr(e \mid x)}{\Pr(e \mid \bar{x})}.$$

As the ratio $\Pr(e \mid x)/\Pr(e \mid \bar{x})$ is independent of $\Pr(x)$, the ratio $O(x \mid e)/O(x)$ is also independent of this prior. Therefore, we can conclude that:

$$\frac{O'(x \mid e)}{O(x \mid e)} = \frac{O'(x)}{O(x)}. \qquad (4)$$

This means we can find the exact amount of change needed for a meta parameter $\tau_x$ in order to induce a particular change in the query $\Pr(x \mid e)$; there is no need to use the more expensive technique of Section 2 in this case.

Example 3.2 Consider the network in Figure 3. Suppose that $e = report, \overline{smoke}$. Currently, $\Pr(tampering) = .02$ and $\Pr(tampering \mid e) = .50$, and we wish to increase the conditional probability to .65. We can compute the new prior probability $\Pr'(tampering)$ using Equation 4:

$$\frac{.65/.35}{.50/.50} = \frac{\Pr'(tampering)/(1 - \Pr'(tampering))}{.02/.98},$$

giving us $\Pr'(tampering) = .036$, which is equal to the result we obtained using SamIam in Section 1. Both the change to $\Pr(tampering)$ and the change to $\Pr(tampering \mid e)$ amount to a log-odds difference of .616.
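Equation 4 gives a closed form for this computation; a sketch (with a hypothetical helper name):

```python
def required_prior(prior, posterior, target):
    """Equation 4: for a root variable X and query Pr(x | e),
    O'(x|e)/O(x|e) = O'(x)/O(x), so the new prior that achieves a
    target posterior follows directly from the odds ratio."""
    odds = lambda p: p / (1.0 - p)
    new_odds = odds(prior) * odds(target) / odds(posterior)
    return new_odds / (1.0 + new_odds)

print(required_prior(0.02, 0.50, 0.65))   # ~0.0365, the .036 of Example 3.2
```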
Theorem 3.3 has a number of implications. First, given a particular query $\Pr(y \mid e)$ and a meta parameter $\tau_{x|u}$, it can be used to bound the effect that a change in $\tau_{x|u}$ will have on the query $\Pr(y \mid e)$. Going back to Example 3.2, we may wish to know the impact on other conditional probabilities of applying the change that makes $\Pr'(tampering) = .036$. The log-odds changes of all conditional probabilities in the network will be bounded by .616. For example, currently $\Pr(fire \mid e) = .029$. Using Theorem 3.3, we can find the range of the new conditional probability $\Pr'(fire \mid e)$:

$$\left| \ln\left( \frac{\Pr'(fire \mid e)}{1 - \Pr'(fire \mid e)} \right) - \ln\left( \frac{.029}{.971} \right) \right| \le .616,$$

giving us the range $.016 \le \Pr'(fire \mid e) \le .053$. The exact value of $\Pr'(fire \mid e)$, obtained by inference, is .021, which is within the computed bounds.

Second, Theorem 3.3 can be used to efficiently approximate solutions to the Difference and Ratio problems we discussed in Section 2. That is, given a desirable change in the value of the query $\Pr(y \mid e)$, we can use Theorem 3.3 to immediately compute a lower bound on the minimum change to the meta parameter $\tau_{x|u}$ needed to induce the query change. This method can be applied in constant time and can serve as a preliminary recommendation, as the method proposed in Section 2 is much more expensive computationally.

Third, suppose that SamIam was used to recommend parameter changes that would induce a desirable change in a given query, and suppose further that SamIam returned a number of such changes, each of which is capable of inducing the necessary change. Which one of these changes should we adopt? The main principle applied in these situations is to adopt a "minimal" change. But what is minimal in this case? As Theorem 3.3 reveals, a notion of minimality based on the amount of absolute change can be very misleading. Instead, it suggests adopting the change that minimizes the relative change in the odds, as other queries can be shown to be robust against such a change in a precise sense. For example, suppose we are given two parameter changes, one from .1 to .15, and another from .4 to .45. Both changes give us the same absolute change of .05. However, the first change has a log-odds change of .462, while the second has a log-odds change of .205. Therefore, two parameter changes that give us the same absolute change can have different amounts of log-odds change. On the other hand, two parameter changes that give us the same relative change can also have different amounts of log-odds change. For example, suppose we are given two parameter changes, one from .1 to .2, and another from .2 to .4. Both changes double the original parameter value. However, the first change has a log-odds change of .811, while the second has a log-odds change of .981.

Finally, the result can be used to obtain a better intuitive understanding of parameter changes that do or do not matter, a topic which we discuss in the next section.
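The log-odds comparisons above are straightforward to reproduce; a minimal sketch:

```python
import math

def log_odds_change(p_old, p_new):
    """The quantity |ln O' - ln O| used throughout Section 3."""
    lo = lambda p: math.log(p / (1.0 - p))
    return abs(lo(p_new) - lo(p_old))

print(log_odds_change(0.1, 0.15), log_odds_change(0.4, 0.45))  # ~.462 vs ~.205
print(log_odds_change(0.1, 0.2),  log_odds_change(0.2, 0.4))   # ~.811 vs ~.981
```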
4. Changes that (Don't) Matter

We now return to a central question: When do changes in network parameters matter and when do they not matter? As mentioned earlier, there have been experimental studies investigating the robustness of belief networks against parameter changes (Pradhan et al., 1996). But we have also shown very simple and intuitive examples where networks can be very sensitive to small parameter changes. This calls for a better understanding of the effect of parameter changes on queries, so one can intuitively sort out situations in which such changes do or do not matter. Our goal in this section is to further develop such an understanding by looking more closely into some of the implications of Theorem 3.3. We start by highlighting the difference between this theorem and previous results on sensitivity analysis.

4.1 Network-Specific Sensitivity Analysis

One of the main differences between our results and other sensitivity analysis approaches is that we do not need to know the belief network and, hence, do not need to perform inference. To clarify this difference, we compare it with the sensitivity function approach (van der Gaag & Renooij, 2001), which computes the sensitivity function that relates a query, $f(x)$, and a parameter, $x$, in the form:

$$f(x) = \frac{a x + b}{c x + d},$$

where $a, b, c, d$ are constants that depend on the given network and are computed by performing inference, as suggested by van der Gaag and Renooij (2001).

Going back to Example 2.1, we can express the query $\Pr(fire \mid smoke, \overline{report})$ as a function of the parameter $x = \Pr(smoke \mid \overline{fire})$. The function is given by:

$$f(x) = \frac{0.003165}{0.9684 x + 0.003165},$$

and we plot this function in Figure 6. We can see that at the current parameter value .01, the query value is .25, but if we decrease the parameter to .003, the query value increases to .5, which is one of the parameter changes suggested by SamIam.

However, using Theorem 3.3 we can bound the relation between the query and the parameter without doing inference on the network (and without knowing the network). For example, by changing the current parameter value from .01 to .003, the new query value will be within the bounds of .09 and .53. On the other hand, if we want the query value to increase to .5, we have to at least decrease the parameter value from .01 to .003, or increase it to .03.

[Figure 6: The plot of the query $\Pr(fire \mid smoke, \overline{report})$ against the parameter $\Pr(smoke \mid \overline{fire})$. The second graph magnifies the region where $\Pr(smoke \mid \overline{fire})$ is between 0 and .02.]
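For reference, the quoted sensitivity function is easy to evaluate directly (the constants are network-specific and were obtained by inference, as noted above):

```python
# The Section 4.1 sensitivity function for Example 2.1 (constants from the text).
f = lambda x: 0.003165 / (0.9684 * x + 0.003165)
print(f(0.01), f(0.003))   # ~0.25 at the current value, ~0.5 after the change
```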
4.2 Assuring Query Robustness

One of the important issues we have yet to settle is: what does it mean for a parameter change to not matter? One can think of at least three definitions: first, that the absolute change in the probability $\Pr(y \mid e)$ is small; second, that the relative change in the probability $\Pr(y \mid e)$ is small; and third, that the relative change in the odds $O(y \mid e)$ is small. The first notion is the one most prevalent in the literature, so we shall adopt it in the rest of this section.

Suppose we have a belief network for a diagnostic application, and suppose we are concerned about the robustness of the query $\Pr(y \mid e)$ with respect to changes in network parameters. In this application, $y$ is a particular disease and $e$ is a particular finding which predicts the disease, with $\Pr(y \mid e) = .9$. Let us define robustness in this case as an absolute change of no more than .05 in the given query. Now, let $X$ be a binary variable in the network, and let us ask: what kind of changes to the parameters of $X$ are guaranteed to keep the query within the desirable range? We can easily use Theorem 3.3 to answer this question.

First, if we are changing a parameter by $\delta$, and if we want the value of the query to remain ≤ .95, we must ensure that:

$$|\ln((p + \delta)/(1 - p - \delta)) - \ln(p/(1 - p))| \le |\ln(.95/.05) - \ln(.9/.1)| = .7472,$$

where $p$ is the current value of the parameter. Similarly, if we want to ensure that the query remains ≥ .85, we must ensure that:

$$|\ln((p + \delta)/(1 - p - \delta)) - \ln(p/(1 - p))| \le |\ln(.85/.15) - \ln(.9/.1)| = .4626.$$

Figure 7 plots the permissible change $\delta$ as a function of $p$, the current value of the parameter. The main point to observe here is that the amount of permissible change depends on the current value of $p$, with smaller changes allowed for extreme values of $p$. It is also interesting to note that it is easier to guarantee that the query stays ≤ .95 than to guarantee that it stays ≥ .85. In general, it is more likely for a parameter change to reduce the value of a query which is close to 1 (and to increase the value of a query which is close to 0). Finally, if we are increasing the parameter, then a parameter value close to .4 will allow the biggest absolute change, but if we are decreasing the parameter, then a value close to .6 will allow the biggest absolute change.

[Figure 7: The amount of parameter change $\delta$ that would guarantee the query $\Pr(y \mid e) = .9$ to stay within the interval [.85, .95], as a function of the current parameter value $p$. The outer envelope guarantees the query to remain ≤ .95, while the inner envelope guarantees it to remain ≥ .85.]

Now let us repeat the same exercise, but assuming that the initial value of the query is $\Pr(y \mid e) = .6$, while insisting on the same measure of robustness. Figure 8 plots the permissible changes $\delta$ as a function of $p$, the current value of the parameter. Again, the amount of permissible change becomes smaller as the probability $p$ approaches 0 or 1. The other main point to emphasize is that the permissible changes are now much smaller than in the previous example, since the initial value of the query is not as extreme. Therefore, this query is much less robust than the previous one.

[Figure 8: The amount of parameter change $\delta$ that would guarantee the query $\Pr(y \mid e) = .6$ to stay within the interval [.55, .65], as a function of the current parameter value $p$. The outer envelope guarantees the query to remain ≤ .65, while the inner envelope guarantees it to remain ≥ .55.]
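The envelopes of Figures 7 and 8 can be reproduced from Theorem 3.3 with a short computation; the following sketch (our own helper) inverts the log-odds budget around the current parameter value $p$.

```python
import math

def permissible_range(p, budget):
    """Sketch: the interval of new parameter values p' around the current
    value p such that |ln O(p') - ln O(p)| <= budget, which by Theorem 3.3
    caps the log-odds movement of every query."""
    lo = math.log(p / (1.0 - p))
    inv = lambda z: 1.0 / (1.0 + math.exp(-z))   # logistic: inverts log-odds
    return inv(lo - budget), inv(lo + budget)

# Keeping a query at .9 inside [.85, .95]: the binding (inner) budget is
# |ln(.85/.15) - ln(.9/.1)| ~= .4626 (versus .7472 for the .95 side).
print(permissible_range(0.5, 0.4626))    # widest interval, at p = .5
print(permissible_range(0.05, 0.4626))   # much narrower at an extreme p
```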
281 Chan & Darwiche From Figure 10, we also notice that although the plot is symmetric for p = .5, it is notfor both p = .1 and p = .9, i.e. absolute changes of Delta p and -Delta p give us different amountsof log-odds change. For example, changing the parameter from .1 to .05 give us a largerlog-odds change than changing the parameter from .1 to .15. We also notice that the plotsfor p = .1 and p = .9 are mirror images of each other. Therefore, the log-odds change is thesame for complementary parameter changes on t,x|u and t,x|u. We close this section by emphasizing that the above figures identify parameter changesthat guarantee keeping queries within certain ranges. However, if the belief network has specific properties, such as a specific topology, then it is possible for the query to be robustagainst parameter changes that are outside the identified bounds. 5. Conclusion In this paper, we presented an efficient technique for fine-tuning the parameters of a beliefnetwork. The technique suggests minimal changes to network parameters which ensure that certain constraints are enforced on probabilistic queries. Based on this technique, wehave experimented with some belief networks, only to find out that these networks are more sensitive to parameter changes than previous experimental studies seem to suggest.This observation leads us to an analytic study on the effect of parameter changes, with the aim of characterizing situations under which parameter changes do or do not matter. Wehave reported on a number of results in this direction. Our central result shows that belief networks are robust in a very specific sense: the relative change in query odds is boundedby the relative change in the parameter odds. A closer look at this result, its meaning, and its implications provides interesting characterizations of parameter changes that door do not matter, and explains analytically some of the previous experimental results and observations on this matter. Acknowledgments A shorter version of this paper appeared in Proceedings of the 17th Conference on Uncer-tainty in Artificial Intelligence (UAI-01), pp. 65-74. This work has been partially supported by NSF grant IIS-9988543, MURI grant N00014-00-1-0617, and by DiMI grant 00-10065. Appendix A. Proofs Theorem 2.1 The derivative of Pr (e) with respect to the meta parameter #x|u is givenby: @Pr (e) @#x|u = Pr (e, x, u) t,x|u - Pr (e, x, u) t,x|u , when t,x|u 6= 0 and t,x|u 6= 0. 282 When do Numbers Really Matter? Proof From Russell et al. (1995), the semantics of the first derivative of Pr (e) with respectto parameter t,x|u is given by:9 @Pr (e) @t,x|u = Pr (e, x, u) t,x|u , if t,x|u 6= 0, and: @Pr (e) @t,x|u = Pr (e, x, u) t,x|u , if t,x|u 6= 0. Because t,x|u = #x|u and t,x|u = 1 - #x|u, we have: @Pr (e) @#x|u = @Pr (e) @t,x|u - @Pr (e) @t,x|u = Pr (e, x, u)t, x|u - Pr (e, x, u) t,x|u , if t,x|u 6= 0 and t,x|u 6= 0.2 Theorem 3.1 If X is a binary variable in a belief network, then:r'r'r'r'r' @Pr (y | e) @#x|u r'r'r'r'r' <= Pr (y | e)(1 - Pr (y | e)) Pr (x | u)(1 - Pr (x | u)) . Proof From Darwiche (2000), the derivative @Pr (y | e)/@t,x|u is equal to: @Pr (y | e) @t,x|u = Pr (y, x, u | e) - Pr (y | e)Pr (x, u | e) Pr (x | u) . 
Theorem 3.1 If $X$ is a binary variable in a belief network, then:

$$\left| \frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \right| \le \frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.$$

Proof. From Darwiche (2000), the derivative $\partial \Pr(y \mid e)/\partial \theta_{x|u}$ is equal to:

$$\frac{\partial \Pr(y \mid e)}{\partial \theta_{x|u}} = \frac{\Pr(y, x, u \mid e) - \Pr(y \mid e)\Pr(x, u \mid e)}{\Pr(x \mid u)}.$$

Since:

$$\frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} = \frac{\partial \Pr(y \mid e)}{\partial \theta_{x|u}} - \frac{\partial \Pr(y \mid e)}{\partial \theta_{\bar{x}|u}},$$

we have:

$$\begin{aligned}
\frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} &= \frac{\Pr(y, x, u \mid e) - \Pr(y \mid e)\Pr(x, u \mid e)}{\Pr(x \mid u)} - \frac{\Pr(y, \bar{x}, u \mid e) - \Pr(y \mid e)\Pr(\bar{x}, u \mid e)}{\Pr(\bar{x} \mid u)} \\
&= \frac{\Pr(y, x, u \mid e) - \Pr(y \mid e)\Pr(x, u \mid e) - \Pr(x \mid u)\left(\Pr(y, u \mid e) - \Pr(y \mid e)\Pr(u \mid e)\right)}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.
\end{aligned}$$

In order to find an upper bound on the derivative, we bound the term $\Pr(y, x, u \mid e) - \Pr(y \mid e)\Pr(x, u \mid e)$. Since $\Pr(y, x, u, e) \le \Pr(y, u, e)$ and $\Pr(y, x, u, e) \le \Pr(x, u, e)$, we have:

$$\begin{aligned}
\Pr(y, x, u \mid e) - \Pr(y \mid e)\Pr(x, u \mid e) &\le \Pr(y, x, u \mid e) - \Pr(y \mid e)\Pr(y, x, u \mid e) \\
&= \Pr(y, x, u \mid e)\Pr(\bar{y} \mid e) \\
&\le \Pr(y, u \mid e)\Pr(\bar{y} \mid e).
\end{aligned}$$

Therefore, the upper bound on the derivative is given by:

$$\frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \le \frac{\Pr(y, u \mid e)\Pr(\bar{y} \mid e) - \Pr(x \mid u)\left(\Pr(y, u \mid e) - \Pr(y \mid e)\Pr(u \mid e)\right)}{\Pr(x \mid u)(1 - \Pr(x \mid u))},$$

which is equal to the following term:

$$\frac{\Pr(\bar{y} \mid e)\Pr(y, u \mid e)}{\Pr(x \mid u)} + \frac{\Pr(y \mid e)\Pr(\bar{y}, u \mid e)}{1 - \Pr(x \mid u)},$$

since putting this term over the common denominator $\Pr(x \mid u)(1 - \Pr(x \mid u))$ gives:

$$\frac{(1 - \Pr(x \mid u))\Pr(\bar{y} \mid e)\Pr(y, u \mid e) + \Pr(x \mid u)\Pr(y \mid e)\Pr(\bar{y}, u \mid e)}{\Pr(x \mid u)(1 - \Pr(x \mid u))} = \frac{\Pr(y, u \mid e)\Pr(\bar{y} \mid e) - \Pr(x \mid u)\left(\Pr(y, u \mid e) - \Pr(y \mid e)\Pr(u \mid e)\right)}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.$$

Since $\Pr(y, u \mid e) \le \Pr(y \mid e)$ and $\Pr(\bar{y}, u \mid e) \le \Pr(\bar{y} \mid e)$, the upper bound on the derivative is given by:

$$\frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \le \frac{\Pr(\bar{y} \mid e)\Pr(y, u \mid e)}{\Pr(x \mid u)} + \frac{\Pr(y \mid e)\Pr(\bar{y}, u \mid e)}{1 - \Pr(x \mid u)} \le \frac{\Pr(\bar{y} \mid e)\Pr(y \mid e)}{\Pr(x \mid u)} + \frac{\Pr(y \mid e)\Pr(\bar{y} \mid e)}{1 - \Pr(x \mid u)} = \frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.$$

In order to find a lower bound on the derivative, we note that $\Pr(\bar{y} \mid e) = 1 - \Pr(y \mid e)$, and thus $\partial \Pr(\bar{y} \mid e)/\partial \tau_{x|u} = -\partial \Pr(y \mid e)/\partial \tau_{x|u}$. Therefore, we can get our lower bound by finding the upper bound on the derivative $\partial \Pr(\bar{y} \mid e)/\partial \tau_{x|u}$ and multiplying it by $-1$:

$$\frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \ge -\frac{\Pr(\bar{y} \mid e)(1 - \Pr(\bar{y} \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))} = -\frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.$$

Combining the upper bound and the lower bound, we have:

$$\left| \frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \right| \le \frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))}. \quad \Box$$

Theorem 3.2 Assume that $\tau_{x|u} \le .5$ without loss of generality. Suppose that $\Delta\tau_{x|u}$ is an infinitesimal change applied to the meta parameter $\tau_{x|u}$, leading to a change of $\Delta\Pr(y \mid e)$ in the query $\Pr(y \mid e)$. We then have:

$$\left| \frac{\Delta \Pr(y \mid e)}{\Pr(y \mid e)} \right| \le 2 \left| \frac{\Delta \tau_{x|u}}{\tau_{x|u}} \right|.$$

Proof. Because $\Delta\tau_{x|u}$ is infinitesimal, from Theorem 3.1:

$$\left| \frac{\Delta \Pr(y \mid e)}{\Delta \tau_{x|u}} \right| \approx \left| \frac{\partial \Pr(y \mid e)}{\partial \tau_{x|u}} \right| \le \frac{\Pr(y \mid e)(1 - \Pr(y \mid e))}{\Pr(x \mid u)(1 - \Pr(x \mid u))}.$$

Rearranging the terms, we have:

$$\left| \frac{\Delta \Pr(y \mid e)}{\Pr(y \mid e)} \right| \le \frac{1 - \Pr(y \mid e)}{1 - \Pr(x \mid u)} \left| \frac{\Delta \tau_{x|u}}{\tau_{x|u}} \right| \le \frac{1}{.5} \left| \frac{\Delta \tau_{x|u}}{\tau_{x|u}} \right| = 2 \left| \frac{\Delta \tau_{x|u}}{\tau_{x|u}} \right|,$$

since $\Pr(x \mid u) = \tau_{x|u} \le .5$. □
Theorem 3.3 Let $O(x \mid u)$ denote the odds of $x$ given $u$: $O(x \mid u) = \Pr(x \mid u)/(1 - \Pr(x \mid u))$, and let $O(y \mid e)$ denote the odds of $y$ given $e$: $O(y \mid e) = \Pr(y \mid e)/(1 - \Pr(y \mid e))$. Let $O'(x \mid u)$ and $O'(y \mid e)$ denote these odds after having applied an arbitrary change to the meta parameter $\tau_{x|u}$, where $X$ is a binary variable in a belief network. If the change is positive, then:

$$\frac{O(x \mid u)}{O'(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O'(x \mid u)}{O(x \mid u)};$$

or if it is negative, then:

$$\frac{O'(x \mid u)}{O(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O(x \mid u)}{O'(x \mid u)}.$$

Combining both results, we have:

$$|\ln O'(y \mid e) - \ln O(y \mid e)| \le |\ln O'(x \mid u) - \ln O(x \mid u)|.$$

Proof. We obtain this result by integrating the bound in Theorem 3.1. In particular, if we change $\tau_{x|u}$ to $\tau'_{x|u} > \tau_{x|u}$, and consequently $\Pr(y \mid e)$ changes to $\Pr'(y \mid e)$, we can separate the variables in the upper bound on the derivative in Theorem 3.1, integrate over the intervals, and obtain:

$$\int_{\Pr(y \mid e)}^{\Pr'(y \mid e)} \frac{d\Pr(y \mid e)}{\Pr(y \mid e)(1 - \Pr(y \mid e))} \le \int_{\tau_{x|u}}^{\tau'_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})}.$$

This gives us the solution:

$$\ln \Pr'(y \mid e) - \ln \Pr(y \mid e) - \ln(1 - \Pr'(y \mid e)) + \ln(1 - \Pr(y \mid e)) \le \ln \tau'_{x|u} - \ln \tau_{x|u} - \ln(1 - \tau'_{x|u}) + \ln(1 - \tau_{x|u}),$$

and after taking exponentials, we have:

$$\frac{\Pr'(y \mid e)/(1 - \Pr'(y \mid e))}{\Pr(y \mid e)/(1 - \Pr(y \mid e))} \le \frac{\tau'_{x|u}/(1 - \tau'_{x|u})}{\tau_{x|u}/(1 - \tau_{x|u})},$$

which is equivalent to:

$$\frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O'(x \mid u)}{O(x \mid u)}.$$

Similarly, we can separate the variables in the lower bound on the derivative in Theorem 3.1, integrate over the intervals, and obtain:

$$\int_{\Pr(y \mid e)}^{\Pr'(y \mid e)} \frac{d\Pr(y \mid e)}{\Pr(y \mid e)(1 - \Pr(y \mid e))} \ge -\int_{\tau_{x|u}}^{\tau'_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})}.$$

This gives us the solution:

$$\ln \Pr'(y \mid e) - \ln \Pr(y \mid e) - \ln(1 - \Pr'(y \mid e)) + \ln(1 - \Pr(y \mid e)) \ge -\ln \tau'_{x|u} + \ln \tau_{x|u} + \ln(1 - \tau'_{x|u}) - \ln(1 - \tau_{x|u}),$$

and after taking exponentials, we have:

$$\frac{\Pr'(y \mid e)/(1 - \Pr'(y \mid e))}{\Pr(y \mid e)/(1 - \Pr(y \mid e))} \ge \frac{\tau_{x|u}/(1 - \tau_{x|u})}{\tau'_{x|u}/(1 - \tau'_{x|u})},$$

which is equivalent to:

$$\frac{O'(y \mid e)}{O(y \mid e)} \ge \frac{O(x \mid u)}{O'(x \mid u)}.$$

Therefore, we have the following inequality if $\tau'_{x|u} > \tau_{x|u}$:

$$\frac{O(x \mid u)}{O'(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O'(x \mid u)}{O(x \mid u)}.$$

On the other hand, if we change $\tau_{x|u}$ to $\tau'_{x|u} < \tau_{x|u}$, we can instead integrate from $\tau'_{x|u}$ to $\tau_{x|u}$. The integrals will satisfy these two inequalities:

$$\int_{\Pr'(y \mid e)}^{\Pr(y \mid e)} \frac{d\Pr(y \mid e)}{\Pr(y \mid e)(1 - \Pr(y \mid e))} \le \int_{\tau'_{x|u}}^{\tau_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})};$$

$$\int_{\Pr'(y \mid e)}^{\Pr(y \mid e)} \frac{d\Pr(y \mid e)}{\Pr(y \mid e)(1 - \Pr(y \mid e))} \ge -\int_{\tau'_{x|u}}^{\tau_{x|u}} \frac{d\tau_{x|u}}{\tau_{x|u}(1 - \tau_{x|u})}.$$

We can solve these inequalities similarly and get the result:

$$\frac{O'(x \mid u)}{O(x \mid u)} \le \frac{O'(y \mid e)}{O(y \mid e)} \le \frac{O(x \mid u)}{O'(x \mid u)}.$$

Combining the results for both $\tau'_{x|u} > \tau_{x|u}$ and $\tau'_{x|u} < \tau_{x|u}$, we have:

$$|\ln O'(y \mid e) - \ln O(y \mid e)| \le |\ln O'(x \mid u) - \ln O(x \mid u)|. \quad \Box$$

References

Castillo, E., Gutiérrez, J. M., & Hadi, A. S. (1997). Sensitivity analysis in discrete Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, 27, 412-423.

Chan, H., & Darwiche, A. (2002). A distance measure for bounding probabilistic belief change. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), pp. 539-545.

Coupé, V. M. H., Peek, N., Ottenkamp, J., & Habbema, J. D. F. (1999). Using sensitivity analysis for efficient quantification of a belief network. Artificial Intelligence in Medicine, 17, 223-247.

Darwiche, A. (2000). A differential approach to inference in Bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 123-132.

Greiner, R., Grove, A., & Schuurmans, D. (1997). Learning Bayesian nets that perform well. In Proceedings of the 13th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 198-207.
Jensen, F. V., Lauritzen, S., & Olesen, K. (1990). Bayesian updating in recursive graphical models by local computation. Computational Statistics Quarterly, 4, 269-282.

Jensen, F. V. (1999). Gradient descent training of Bayesian networks. In Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU), pp. 190-200.

Jensen, F. V. (2001). Bayesian Networks and Decision Graphs. Springer-Verlag, New York.

Kjaerulff, U., & van der Gaag, L. C. (2000). Making sensitivity analysis computationally efficient. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 317-325.

Laskey, K. B. (1995). Sensitivity analysis for probability assessments in Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, 25, 901-909.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Mateo, California.

Poole, D. (1998). Context-specific approximation in probabilistic inference. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 447-454.

Pradhan, M., Henrion, M., Provan, G., Del Favero, B., & Huang, K. (1996). The sensitivity of belief networks to imprecise probabilities: an experimental investigation. Artificial Intelligence, 85, 363-397.

Russell, S., Binder, J., Koller, D., & Kanazawa, K. (1995). Local learning in probabilistic networks with hidden variables. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), pp. 1146-1152.

van der Gaag, L. C., & Renooij, S. (2001). Analysing sensitivity data from probabilistic networks. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 530-537.