In WCNN'93 World Congress on Neural Networks, vol. III, pp. 376-380, 1993.
GS: A Network that Learns Important Features
Cory Barker and Tony Martinez
Computer Science Department, Brigham Young University, Provo, Utah 84602
Abstract
GS is a network for supervised inductive learning from examples that uses ideas from neural networks and symbolic inductive learning to gain benefits of both methods. The network is built of many simple nodes that learn important features in the input space and then monitor the ability of the features to predict output values. The network avoids the exponential nature of the number of features by using information gained by general features to guide the creation of more specific features. Empirical evaluation of the model on real world data has shown that the network provides good generalization performance. Convergence is accomplished within a small number of training passes. The network provides these benefits while automatically allocating and deleting nodes and without requiring user adjustment of any parameters. The network learns incrementally and operates in a parallel fashion.
1. Introduction
This paper describes a network architecture for supervised learning that combines techniques used in neural networks (Ackley, Hinton, & Sejnowski, 1985; Rosenblatt, 1958; Rumelhart, Hinton, & Williams, 1986) with symbolic machine learning (Michalski, 1983; Mitchell, 1982; Quinlan, 1986) to gain advantages of both approaches. In supervised learning, the network is given a training set containing examples. Each example gives an input pattern along with the corresponding output that the network should produce when presented with that input. The job of the network is not only to converge to a representation that contains the information given by the training set, but also to generalize that information so that the network responds well to inputs on which it has not been trained.
One approach to generalization is to look for important features in the input space. A feature is some subset of network inputs along with their associated values. A feature is matched when the values on the network inputs that are part of the feature are equal to the values for those inputs as given in the feature. Inputs that are not part of the feature can be any value. A feature that predicts an output with high probability is an important feature. The number of inputs contained in a feature is the order of the feature and determines the generality of the feature. A feature with few inputs is a general feature, while a feature with many inputs is a specific feature. It is impractical to monitor all possible input features because the number of features is exponential in the number of inputs.
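The definitions above can be illustrated with a minimal sketch. The dictionary representation and the function names `matches`, `order`, and `count_features` are assumptions made for illustration and are not part of GS itself. The count relies on the observation that each input is either absent from a feature or fixed to one of its values, so inputs with v_1, ..., v_n possible values give (v_1 + 1)(v_2 + 1)...(v_n + 1) - 1 non-empty features.

```python
from math import prod

def matches(feature, example):
    """A feature matches when every input it names has the required value.
    Inputs that are not part of the feature may take any value."""
    return all(example.get(name) == value for name, value in feature.items())

def order(feature):
    """The order of a feature is the number of inputs it contains."""
    return len(feature)

def count_features(values_per_input):
    """Number of distinct non-empty features for the given input arities.
    Each input is either left out or fixed to one of its values;
    the single empty feature is subtracted."""
    return prod(v + 1 for v in values_per_input) - 1

# Hypothetical example pattern and two candidate features.
example = {"Visibility": "Clear", "Wind Direction": "Tail", "Wind Speed": "Strong"}
general = {"Wind Speed": "Strong"}                             # order 1
specific = {"Visibility": "Clear", "Wind Direction": "Head"}   # order 2

print(matches(general, example))    # True
print(matches(specific, example))   # False
print(order(general), order(specific))  # 1 2

# Assuming binary inputs: 3 inputs give 26 features, 20 inputs give over 3 billion.
print(count_features([2, 2, 2]))    # 26
print(count_features([2] * 20))     # 3486784400
```

Even with only two values per input, the count grows as 3^n, which is why monitoring every feature is impractical.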
This paper proposes GS (General to Specific), a network that monitors general input features and then specializes those features by combining the best general features. This section presents an overview of the model while later sections provide detail about the system. The network is made up of many simple nodes. Each node contains the input feature that it monitors. The node gathers statistics during training giving the discrete conditional probability of each possible output value given the input feature.
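As a rough sketch of the statistics a single node gathers, the class below estimates the conditional probability of each output as a relative frequency over the training examples that match the node's feature. The class layout, the method names, and the toy inputs `a` and `b` are hypothetical; the paper's own bookkeeping may differ.

```python
from collections import Counter

class Node:
    """One node: a fixed input feature plus output counts (hypothetical layout)."""

    def __init__(self, feature):
        self.feature = dict(feature)   # input name -> required value
        self.counts = Counter()        # output value -> times seen with the feature

    def matches(self, inputs):
        return all(inputs.get(k) == v for k, v in self.feature.items())

    def train(self, inputs, output):
        """Record the output only when the node's feature is matched."""
        if self.matches(inputs):
            self.counts[output] += 1

    def probability(self, output):
        """Estimate P(output | feature) as a relative frequency."""
        total = sum(self.counts.values())
        return self.counts[output] / total if total else 0.0

node = Node({"a": 1})
training_set = [
    ({"a": 1, "b": 0}, "yes"),
    ({"a": 1, "b": 1}, "yes"),
    ({"a": 0, "b": 1}, "no"),   # feature not matched, so this example is ignored
    ({"a": 1, "b": 1}, "no"),
]
for inputs, output in training_set:
    node.train(inputs, output)

print(round(node.probability("yes"), 2))  # 0.67
print(round(node.probability("no"), 2))   # 0.33
```

Because each node only increments counters for examples it matches, the statistics can be gathered incrementally and independently per node.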
To illustrate learning in one version of the model, we use a simplified version of the problem of deciding whether to use manual or automatic flight controls when landing an aircraft. Two examples will be presented to the network shown below, which has already processed several examples.
Node 1
{(Visibility, Clear)}
Manual .50, Automatic .50
Node 2
{(Wind Direction, Head)}
Manual .50, Automatic .50
Node 3
{(Wind Speed, Strong)}
Manual .33, Automatic .67
The problem has three inputs (Visibility, Wind Direction, and Wind Speed) and one output that can be set to either Manual or Automatic. Node 1 monitors the input Visibility and matches when the value is Clear. Node 1 predicts both outputs with equal probability. Node 3 matches the feature {(Wind Speed, Strong)} which has a higher probability for output Automatic.
Example 1: (Visibility, Clear) (Wind Direction, Tail) (Wind Speed, Strong) ⇒ Automatic
When a new example is presented, nodes are always created for every first-order feature in the example that does not already have one. The example above contains the feature {(Wind Direction, Tail)}, which does not have an existing node, so a new node is created. The network nodes compete according to probability to assert their most probable output value. Node 3 wins the competition and outputs Automatic. The output agrees with the example, so no additional nodes are created. The nodes that match the input example update their probabilities, giving an updated network.
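The processing of Example 1 can be traced with a short sketch. Several details here are assumptions rather than the paper's stated method: the raw counts are chosen only to reproduce the probabilities listed for Nodes 1 to 3, the competition is modeled as "the matching node with the highest single output probability wins", and a freshly created node with no statistics is assumed not to compete. The creation of higher-order nodes after a wrong prediction is omitted.

```python
# Hypothetical counts consistent with the probabilities shown for Nodes 1-3.
network = {
    ("Visibility", "Clear"):     {"Manual": 1, "Automatic": 1},
    ("Wind Direction", "Head"):  {"Manual": 1, "Automatic": 1},
    ("Wind Speed", "Strong"):    {"Manual": 1, "Automatic": 2},
}

def probabilities(counts):
    total = sum(counts.values())
    return {out: (n / total if total else 0.0) for out, n in counts.items()}

def present(network, inputs, target):
    """Present one example; return (probability, feature, output) of the winner."""
    # 1. Create a node for every first-order feature not yet monitored.
    for pair in inputs.items():
        network.setdefault(pair, {"Manual": 0, "Automatic": 0})
    # 2. Matching nodes that have statistics compete (assumed rule: highest probability).
    candidates = []
    for (name, value), counts in network.items():
        if inputs.get(name) == value and sum(counts.values()) > 0:
            probs = probabilities(counts)
            best = max(probs, key=probs.get)
            candidates.append((probs[best], (name, value), best))
    winner = max(candidates, key=lambda c: c[0]) if candidates else None
    # 3. Every matching node updates its counts with the correct output.
    for (name, value), counts in network.items():
        if inputs.get(name) == value:
            counts[target] += 1
    return winner

example_1 = {"Visibility": "Clear", "Wind Direction": "Tail", "Wind Speed": "Strong"}
winner = present(network, example_1, "Automatic")
print(winner[1], winner[2])   # ('Wind Speed', 'Strong') Automatic

for feature, counts in network.items():
    print(feature, {out: round(p, 2) for out, p in probabilities(counts).items()})
# ('Visibility', 'Clear') {'Manual': 0.33, 'Automatic': 0.67}
# ('Wind Direction', 'Head') {'Manual': 0.5, 'Automatic': 0.5}
# ('Wind Speed', 'Strong') {'Manual': 0.25, 'Automatic': 0.75}
# ('Wind Direction', 'Tail') {'Manual': 0.0, 'Automatic': 1.0}
```

Under these assumptions the sketch reproduces the updated network listed next: the node for {(Wind Direction, Head)} is unchanged because it does not match the example, and the new node for {(Wind Direction, Tail)} has seen only Automatic.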
Node 1
{(Visibility, Clear)}
Manual .33, Automatic .67
Node 2
{(Wind Direction, Head)}
Manual .50, Automatic .50
Node 3
{(Wind Speed, Strong)}
Manual .25, Automatic .75
Node 4
{(Wind Direction, Tail)}
Manual 0, Automatic 1
Example 2: (Visibility, Clear) (Wind Direction, Head) (Wind Speed, Light) ⇒ Manual