Rosenbloom 1993). For example, PRODIGY/EBL (Minton 1993) and Soar (Laird, Newell, & Rosenbloom 1987; Rosenbloom et al. 1991), two problem solvers that learn rules by variants of EBL, ignore many of the search-control rules during learning in order to increase the generality of the learned rules. However, the consequence of this omission is that the learned rules are not constrained by the path actually taken in the problem space, and thus can perform an exponential amount of search even when the original problem-solving search was highly directed (by the control rules). This problem was solved in (Kim & Rosenbloom 1993) by extending the explanation to include the search-control rules used during problem solving, thus creating more appropriately constrained rules.
In this article, we focus on a second difference: when the structure of the problem solving differs from the structure of the match process for the learned rules, time after learning can be greater than time before learning. During problem solving, the rules that fire tend to form a hierarchical structure in which the early rules provide information upon which the firing of later rules depends. This hierarchical structure is reflected in EBL most obviously in the structure of the explanation (and the more general explanation structure). However, if this hierarchical structure is then flattened into a linear sequence of conditions for use in matching the learned rule, as must be done in creating Ops-like rules or Prolog clauses, the time after learning can be greater than the time before learning. If instead the learning mechanism is made sensitive to the problem-solving structure, that is, if such hierarchical structure is reflected in the match of the learned rule, this source of expensiveness can be avoided.
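As a hypothetical illustration (this is not Soar code; the data and condition format are our own), the following sketch matches a rule whose conditions have been flattened into a linear sequence over (identifier, attribute, value) tuples. A naive linear join of this kind enumerates candidate tuple combinations for every condition, which is the sort of match cost that flattening can incur even when the original problem solving followed a single directed path:

```python
# Hypothetical illustration of matching a flattened linear condition
# sequence. Working memory holds (identifier, attribute, value) tuples;
# each condition is (attribute, object-variable, value-variable).
from itertools import product

wm = [("b1", "on", "b2"), ("b1", "on", "b3"),
      ("b2", "on", "b4"), ("b3", "on", "b4")]

def flat_match(conds, wm):
    """Join conditions left to right over all tuple combinations,
    returning every consistent set of variable bindings."""
    matches = []
    for combo in product(wm, repeat=len(conds)):
        bindings = {}
        ok = True
        for (attr_pat, var_obj, var_val), (obj, attr, val) in zip(conds, combo):
            if attr_pat != attr:
                ok = False
                break
            for var, sym in ((var_obj, obj), (var_val, val)):
                if bindings.setdefault(var, sym) != sym:
                    ok = False
                    break
            if not ok:
                break
        if ok:
            matches.append(bindings)
    return matches

# Two chained conditions: (?x on ?y) (?y on ?z)
conds = [("on", "x", "y"), ("on", "y", "z")]
print(len(flat_match(conds, wm)))  # prints 2
```

The join examines every combination of working-memory tuples (here 4^2 = 16) to find the two full instantiations; a matcher that preserved the hierarchical structure could instead gate the later conditions on the results of the earlier ones.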
The focus of the analysis in this paper is Soar/EBL (Kim & Rosenbloom 1995). Although our prior work is based on chunking (Laird, Rosenbloom, & Newell 1985) in Soar, we analyze an implementation of EBL in Soar here so that the resulting analysis can more easily be generalized to other EBL systems. Soar/EBL differs somewhat from the standard version of Soar with chunking. Soar/EBL creates an explanation structure (i.e., it replaces rule instantiations with rules) and employs regression in deciding which variables should be included in the learned rule, while chunking creates a new rule by directly variablizing one particular class of symbols in the explanation (i.e., in the rule instantiations).
In Soar, productions comprise the domain theory for Soar/EBL. Each production consists of a set of conditions and a set of actions. Conditions test working memory for the presence or absence of patterns of tuples, where each tuple consists of an object identifier, an attribute, and a value. Actions create preferences, each of which specifies the relative or absolute worth of a value for an attribute of a given object. Productions in Soar propose changes to working memory through these preferences, and do not actually make the changes themselves. Changes to working memory are made based on a synthesis of the preferences (by a fixed decision procedure). The cycle of production firings, creation
Figure 1: An example of the Soar/EBL process.
of preferences, and creation of working memory elements (WMEs) underlies the problem solving. In the remainder of this article, when we talk about the cost of problem solving, we will be referring to the match cost of the rules that fired plus the cost of making decisions.4
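The representations just described can be sketched as follows. This is a minimal sketch in our own notation, not Soar's implementation; the production, preference labels, and decision procedure shown are simplified stand-ins. It shows WMEs as (identifier, attribute, value) tuples and a production that proposes, rather than makes, a working-memory change through a preference:

```python
# Minimal sketch (assumed notation, not Soar's implementation) of WMEs,
# preferences, and a simplified decision procedure.
from collections import namedtuple

WME = namedtuple("WME", ["id", "attr", "value"])
Preference = namedtuple("Preference", ["wme", "worth"])  # worth: e.g. "acceptable"

working_memory = {WME("s1", "block", "b1"), WME("b1", "color", "red")}

def example_production(wm):
    """A stand-in production: propose (not make) a change for each red block."""
    prefs = []
    for w in wm:
        if w.attr == "color" and w.value == "red":
            prefs.append(Preference(WME("s1", "stack", w.id), "acceptable"))
    return prefs

def decide(prefs):
    """Simplified decision procedure: synthesize preferences into WM changes.
    Here we simply accept every 'acceptable' preference."""
    return {p.wme for p in prefs if p.worth == "acceptable"}

prefs = example_production(working_memory)
working_memory |= decide(prefs)
print(WME("s1", "stack", "b1") in working_memory)  # prints True
```

The separation between proposing preferences and committing changes mirrors the text: productions only create preferences, and the (fixed) decision procedure determines which working-memory changes actually occur.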
To create rules, Soar maintains an instantiated trace of the rules that fired. The set of instantiations connected to the goal achievement becomes the proof tree (or explanation) for Soar/EBL. The instantiations in the explanation are replaced by rules whose variables are given unique names across the rules. This new structure is called the explanation structure. A regression algorithm (ours is inspired by the EGGS generalization algorithm (Mooney & Bennett 1986)) is applied to this explanation structure. A set of substitutions is computed by unifying each connected action-condition pair, and the substitutions are then applied to the variables in the explanation structure. The operational conditions become the conditions of the new rule. The action of the rule is the generalization of the goal concept. An example of Soar/EBL is shown schematically in Figure 1. The two striped vertical bars mark the beginning and the end of the problem solving. T1 through T4 are traces of the rule firings. For example, T1 records a rule firing that examined WMEs A and B and generated a preference suggesting WME G. The highlighted rule traces are those included in the explanation; T2, T3, and T4 participated in creating the result. This explanation is generalized by regression, and a new rule is created.
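The regression step can be sketched schematically. This is an assumed, simplified representation inspired by the EGGS-style unification described above, not the Soar/EBL implementation: each connected action-condition pair in a toy explanation structure is unified, the accumulated substitution is applied across the structure, and the operational conditions (after substitution) become the conditions of the new rule:

```python
# Schematic sketch (assumed representation) of EGGS-style regression:
# unify each connected action-condition pair, then apply the accumulated
# substitution to the explanation structure. Variables start with '?'.

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def unify(a, b, subst):
    """Unify two flat tuples term by term, extending subst.
    Simplified: bindings are resolved only one level deep."""
    for x, y in zip(a, b):
        x, y = subst.get(x, x), subst.get(y, y)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None  # clash between two constants
    return subst

def apply_subst(term, subst):
    return tuple(subst.get(t, t) for t in term)

# Connected action-condition pairs from a toy explanation structure:
pairs = [(("on", "?a", "?b"), ("on", "?x", "?y")),
         (("clear", "?b"),    ("clear", "?y"))]
subst = {}
for action, condition in pairs:
    subst = unify(action, condition, subst)

operational_conditions = [("on", "?a", "?b"), ("clear", "?b")]
print([apply_subst(c, subst) for c in operational_conditions])
# prints [('on', '?x', '?y'), ('clear', '?y')]
```

After unification the variables from the different rules are identified with one another, so the operational conditions emerge in terms of a single consistent set of variables, as the learned rule requires.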
The match algorithm is critical in computing both the cost of problem solving and the cost of matching learned rules. Soar employs Rete as the match algorithm. When a new rule is created, it is compiled into a Rete network. Rete is one of the most efficient rule-match algorithms presently known. Its efficiency stems primarily from two key optimizations: sharing and state saving. Sharing of common conditions in a production, or across a set of productions, reduces the number of tests performed during match. State saving
4The cost of a problem-solving episode also includes the cost of firing rules (i.e., executing actions). However, we do not explicitly focus on this factor here because it drops out in the learning process.