
RF #97RM-102: RF2

generally be semi-Markov with non-constant failure rates. Because Markov chain models can be huge and difficult to specify, MCI-HARP uses the fault tree notation instead. Moreover, the fault tree notation includes order-dependent gates to increase modeling flexibility and power. This notation is completely compatible with HARP, so any fault tree model that HARP can solve, MCI-HARP can also solve. Having two different solution techniques available is useful for verifying a model's solution.

The Monte Carlo simulation offers one other important feature necessary for modeling today's complex systems, which often involve order-dependent failures. Because the technique searches through the fault tree to determine the system state, it does not need to store the system transition matrix in memory. For N system components the transition matrix has $2^{2N}$ elements, and even with sparse matrix techniques, analytical solutions quickly become limited by computer memory for modest to large system models. HARP has this limitation, which it attempts to mitigate with a number of approximation techniques, including truncation-bounding and behavioral decomposition.

MCI-HARP has made it practical to model some complex systems and to produce results that at first appear counterintuitive. When President Bush proposed that NASA should direct its attention toward a manned mission to Mars, some of us at NASA's Langley Research Center explored whether state-of-the-art (SOA) fault-tolerant guidance, navigation, and control (GN&C) systems could be reliable enough for such a mission. Using SOA constant failure rate data, we concluded that the reliability of the GN&C system would be too low to justify such a mission. Although I was aware at the time of the mounting data supporting the use of Weibull decreasing failure rates in spacecraft systems, I had no practical way to model such systems with cold or warm spares.
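The $2^{2N}$ count assumes each component has only two states (operational or failed), so the chain has $2^N$ states and a dense transition matrix has $(2^N)^2 = 2^{2N}$ entries. A short Python calculation, assuming 8-byte double-precision entries (an illustrative assumption, not a documented HARP detail), shows how quickly dense storage grows:

```python
def transition_matrix_elements(n_components: int) -> int:
    """Entries in a dense transition matrix for n two-state components.

    Each component is either working or failed, so the chain has 2**n
    states and the square matrix has (2**n) ** 2 == 2 ** (2 * n) entries.
    """
    n_states = 2 ** n_components
    return n_states * n_states


def dense_matrix_bytes(n_components: int, bytes_per_entry: int = 8) -> int:
    """Dense storage size, assuming 8-byte (double-precision) entries by default."""
    return transition_matrix_elements(n_components) * bytes_per_entry


if __name__ == "__main__":
    for n in (5, 10, 15, 20, 25):
        elements = transition_matrix_elements(n)
        gib = dense_matrix_bytes(n) / 2 ** 30
        print(f"N={n:2d}: {elements:.3e} entries, {gib:.3g} GiB")
```

Under these assumptions N = 15 already needs 8 GiB of dense storage and N = 20 needs 8 TiB, whereas a simulation only has to track the current state of each component.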
Intuitively, I believed that decreasing failure rates and warm spares would increase the predicted system reliability, but whether enough reliability gain could be attained was beyond my computational reach. The HARP program can correctly model Weibull decreasing failure rates with or without hot spares, but not with warm or cold spares.

By the end of 1990, a prototype Monte Carlo HARP had been developed by researchers at Northwestern University under the leadership of E.E. Lewis. The idea of structuring the simulation based on the Markov chain was first presented to me by Robert Geist at Clemson University. Under a NASA Langley grant, Northwestern implemented this concept using the HARP program's fault/error-handling models. Mark Boyd (a HARP codeveloper now at NASA's Ames Research Center) further integrated the Monte Carlo simulator with HARP's fault tree notation, and I later reengineered the entire program to be consistent with HARP (now called Monte Carlo integrated HARP, i.e., MCI-HARP). With a working program in hand, Boyd and I used MCI-HARP to explore the effects of decreasing failure rates with warm and cold spares on a GN&C system proposed by the Jet Propulsion Laboratory, a 3-dimensional hypercube fault-tolerant system. The details of the study can be found in the 1993 RAMS proceedings; however, some of the results are worth mentioning here.

Although the study examined system reliability for mission times of 1 to 10 years, 10 years was considered the target mission time. An acceptable GN&C unreliability for a 10-year mission was specified to be less than 50%, i.e., a reliability greater than 50%. When all components were assigned constant failure rates (CFR) and all spares were hot, the system unreliability was computed to be 63%, confirming our previous studies, which also predicted an unacceptable unreliability.
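To illustrate why spare type and failure-rate shape interact, here is a minimal Monte Carlo sketch in Python. It is a toy system with one active unit and one spare, not MCI-HARP and not the JPL hypercube model, and its numbers are not the study's results. The Weibull form $h(t) = \lambda (t/t_0)^{\beta-1}$ (chosen so that $h(t_0)$ equals the CFR $\lambda$), the perfect-switching cold spare whose failure clock starts at switch-in, and all parameter values are hypothetical assumptions. Closed-form CFR results are included as a cross-check, in the spirit of verifying a model with two solution techniques.

```python
import math
import random


def sample_lifetime(rng: random.Random, lam: float, beta: float, t0: float) -> float:
    """Draw one lifetime by inverting the cumulative hazard.

    Assumed (hypothetical) Weibull form: h(t) = lam * (t / t0) ** (beta - 1),
    so h(t0) == lam. beta == 1 is the CFR case; beta < 1 is a DFR.
    Cumulative hazard: H(t) = (lam * t0 / beta) * (t / t0) ** beta.
    Solving H(T) = E with E ~ Exp(1) gives a correctly distributed T.
    """
    e = rng.expovariate(1.0)
    return t0 * (beta * e / (lam * t0)) ** (1.0 / beta)


def system_fails(rng: random.Random, lam: float, beta: float, t0: float,
                 mission: float, spare: str) -> bool:
    """Simulate one mission of a toy system: one active unit plus one spare."""
    t_active = sample_lifetime(rng, lam, beta, t0)
    t_spare = sample_lifetime(rng, lam, beta, t0)
    if spare == "hot":
        # Both units are powered and age from t = 0, so the spare may fail
        # before it is ever needed; the system fails when both have failed.
        t_system = max(t_active, t_spare)
    elif spare == "cold":
        # Assumption: an unpowered spare cannot fail, and its failure clock
        # starts only when it is switched in (perfect switching).
        t_system = t_active + t_spare
    else:
        raise ValueError(f"unknown spare type: {spare}")
    return t_system <= mission


def unreliability(lam: float, beta: float, t0: float, mission: float,
                  spare: str, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(system fails before the end of the mission)."""
    rng = random.Random(seed)
    failures = sum(system_fails(rng, lam, beta, t0, mission, spare)
                   for _ in range(trials))
    return failures / trials


def cfr_unreliability_exact(lam: float, mission: float, spare: str) -> float:
    """Closed-form CFR results, used to cross-check the simulation."""
    x = lam * mission
    if spare == "hot":
        return (1.0 - math.exp(-x)) ** 2        # two independent exponential failures
    if spare == "cold":
        return 1.0 - math.exp(-x) * (1.0 + x)   # Erlang-2 CDF: sum of two exponentials
    raise ValueError(f"unknown spare type: {spare}")


if __name__ == "__main__":
    lam, t0, mission = 0.1, 1.0, 10.0  # hypothetical values, arbitrary time units
    for beta in (1.0, 0.5):
        for spare in ("hot", "cold"):
            q = unreliability(lam, beta, t0, mission, spare)
            print(f"beta={beta}, {spare} spare: unreliability ~ {q:.3f}")
    for spare in ("hot", "cold"):
        print(f"exact CFR, {spare} spare: {cfr_unreliability_exact(lam, mission, spare):.3f}")
```

Under the assumed clock model, a freshly activated DFR cold spare restarts in its high-hazard early-life region, so any hot-versus-cold comparison under DFRs is sensitive to how a spare's aging while dormant is modeled.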
Using decreasing failure rates (DFR) with hot spares and a conservative Weibull shape parameter of 0.5, a three-orders-of-magnitude improvement was computed, with an unreliability of 0.078%, clearly demonstrating the beneficial effect of DFRs. That is not to say that such an improvement can actually be obtained. The important point here is the potential for reliability gain when using DFRs; actual gains will depend on the accuracy of the DFR data. In this study, we assumed that the initial instantaneous failure rate was equal to a component's CFR; thus, the instantaneous failure rate of the DFR will always be lower than the CFR after the initial mission time. Whether this assumption is reasonable remains to be determined.

When the spares are allowed to be cold, the CFR model produced an unreliability of 57%, about a 10% improvement over the hot-spare CFR model, which is an expected trend but still unacceptable. This outcome is expected because hot spares can fail before they can be used to replace failed operational components, whereas