The Evolutionary Induction of Subroutines

Peter J. Angeline and Jordan B. Pollack
Laboratory for Artificial Intelligence Research
Computer and Information Science Department
The Ohio State University
Columbus, Ohio 43210
pja@cis.ohio-state.edu
pollack@cis.ohio-state.edu

Abstract

In this paper¹ we describe a genetic algorithm capable of evolving large programs by exploiting two new genetic operators which construct and deconstruct parameterized subroutines. These subroutines protect useful partial solutions and help to solve the scaling problem for a class of genetic problem-solving methods. We demonstrate that our algorithm acquires useful subroutines by evolving a modular program from "scratch" to play and win at Tic-Tac-Toe against a flawed "expert". This work also serves to amplify our previous note (Pollack, 1991) that a phase transition is the principle behind "induction" in dynamical cognitive models.

1. This research is supported by the Office of Naval Research under contract #N00014-89-J1200.

Introduction

While complex processes of cognition require some form of modularity, learning this modularity has been problematic. It is ignored by simple learning systems (which thus cannot learn complex processes) or built into the architectural "bias" of more complex learning systems (thus begging the question of the origin of such complexity). As a result, the issue of inducing modularity from a complex task in order to perform that task has not been addressed, although a few connectionists are beginning this research (Saunders et al., 1992; Jacobs & Jordan, 1991; Jacobs et al., 1991; Nowlan & Hinton, 1991).

In this paper we describe a genetic algorithm capable of evolving large programs by exploiting two new genetic operators which construct and deconstruct parameterized subroutines. These subroutines protect useful partial solutions and help to solve the scaling problem for a class of genetic problem-solving methods. After a brief background on genetic algorithms, we show that our system is able to learn how to play and win at Tic-Tac-Toe from "scratch" against an imperfect "expert" player. We discuss the formation and tuning of the subroutines and the reasons why their acquisition addresses the scaling problem within this framework. An analysis of the frequency of subroutine calls shows an exponential growth and decay of subroutine usage as subroutines are induced into or expelled from the language, leading us to name this phenomenon evolutionary induction, an amplification of our earlier principle of "induction by phase transition" (Pollack, 1991).
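The construct/deconstruct pair can be pictured concretely. The sketch below is only our own illustration under assumed details (an s-expression program representation and made-up names such as compress, expand, and library); it is not the paper's implementation. compress freezes a randomly chosen subtree into a named, parameterized subroutine, moving its body out of reach of crossover and mutation, while expand splices a definition back into the program so that it can be varied again.

import copy
import random

# A program is an s-expression: a terminal (string) or a list [op, child, ...].
library = {}          # evolved subroutines: name -> (parameter names, body tree)
_next_id = [0]


def subtrees(tree, path=()):
    """Yield (path, node) for every internal (list) node of a program tree."""
    if isinstance(tree, list):
        yield path, tree
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))


def replace_at(tree, path, new_node):
    """Return a copy of tree with the node at path replaced by new_node."""
    tree = copy.deepcopy(tree)
    if not path:
        return new_node
    node = tree
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = new_node
    return tree


def compress(tree):
    """Freeze a random subtree into a named, parameterized subroutine.

    The subtree's terminals become formal parameters, its definition moves into
    the library, and the original site becomes a call passing the old terminals
    as arguments.
    """
    candidates = [(p, n) for p, n in subtrees(tree) if p]   # any non-root subtree
    if not candidates:
        return tree
    path, node = random.choice(candidates)
    body, params, args = copy.deepcopy(node), [], []

    def parameterize(n):
        for i, child in enumerate(n[1:], start=1):
            if isinstance(child, list):
                parameterize(child)
            else:
                params.append(f"p{len(params)}")
                args.append(child)
                n[i] = params[-1]

    parameterize(body)
    name = f"sub{_next_id[0]}"
    _next_id[0] += 1
    library[name] = (params, body)
    return replace_at(tree, path, [name] + args)


def expand(tree):
    """Deconstruct: splice a subroutine's definition back in at one call site."""
    calls = [(p, n) for p, n in subtrees(tree) if n[0] in library]
    if not calls:
        return tree
    path, call = random.choice(calls)
    params, body = library[call[0]]
    bindings = dict(zip(params, call[1:]))
    return replace_at(tree, path, substitute(copy.deepcopy(body), bindings))


def substitute(node, bindings):
    """Replace parameter names in a subroutine body with the caller's arguments."""
    if isinstance(node, list):
        return [node[0]] + [substitute(child, bindings) for child in node[1:]]
    return bindings.get(node, node)

In this sketch, a compressed body lives only in the library, so recombination can no longer break up the partial solution it encodes; expanding it back into the program re-exposes it to variation.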

Genetic Algorithms Background

The genetic algorithm (Holland, 1975; Goldberg, 1989a) is a form of problem-solving search analogous to natural selection, and it is a surprisingly adept search method even in very large, ill-formed problem spaces. A simple genetic algorithm typically operates by reproducing and altering a population of fixed-length binary strings. A fitness function interprets the strings as task solutions and scores their ability to solve the task. Novel strings are added to the population by a process akin to biological reproduction using a collection of genetic operators. One such operator, the crossover operator, takes two "parent" strings selected for their fitness and returns a "child" string which is a complementary collection of components from both parents. The point mutation operator alters the value of a single position of a single parent string to create an offspring.
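As a concrete illustration of this cycle, the following minimal sketch (our own; the one-max fitness function, population size, and rates are placeholder choices, not values from the paper) shows fitness-proportionate selection, one-point crossover, and point mutation operating on a population of fixed-length binary strings.

import random

STRING_LENGTH = 20        # bits per individual (placeholder)
POPULATION_SIZE = 50      # placeholder
MUTATION_RATE = 0.01      # per-bit flip probability (placeholder)


def ones(bits):
    """Placeholder fitness function ("one-max"): count the 1 bits."""
    return sum(bits)


def select(population, fitness):
    """Fitness-proportionate ("roulette wheel") selection of one parent."""
    scores = [fitness(ind) + 1e-6 for ind in population]   # small floor keeps weights positive
    return random.choices(population, weights=scores, k=1)[0]


def crossover(mother, father):
    """One-point crossover: the child is a prefix of one parent and a suffix of the other."""
    point = random.randrange(1, len(mother))
    return mother[:point] + father[point:]


def mutate(bits):
    """Point mutation: independently flip each position with small probability."""
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit for bit in bits]


def evolve(generations=100, fitness=ones):
    """Run the reproduce-and-alter cycle and return the fittest string found."""
    population = [[random.randint(0, 1) for _ in range(STRING_LENGTH)]
                  for _ in range(POPULATION_SIZE)]
    for _ in range(generations):
        population = [mutate(crossover(select(population, fitness),
                                       select(population, fitness)))
                      for _ in range(POPULATION_SIZE)]
    return max(population, key=fitness)


if __name__ == "__main__":
    print(evolve())   # with one-max, this converges toward the all-ones string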

The schema theorem (Holland, 1975), often called the Fundamental Theorem of Genetic Algorithms, illustrates the power behind these search methods. Holland defines a schema to be a class of binary strings which share a collection of subsequences. We use the "#" notation to indicate "don't care" positions in the schema, i.e., positions where it does not matter whether the bit is a 1 or a 0. For instance, the string "100101" is a member of the schema "10##01", as is "100001". The intuition behind schemata is that certain combinations of bits will contribute more to the fitness of a particular string than others. The schema notation allows us to talk about such desirable organizations concisely. A schema's defining length is the number of positions at which, if we divided the schema into two parts, some of the defined positions (i.e., those not "#") would be separated. For instance, dividing the schema "#1.0.0##" at any position marked with a "." will separate some of its defined positions, so its defining length is 2.
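These definitions are easy to check mechanically. The short sketch below is our own illustration, not code from the paper: matches tests whether a string belongs to a schema, and defining_length computes the distance between the first and last defined positions, which is the same as the number of division points that separate defined positions.

def matches(string, schema):
    """True when the string is a member of the schema ('#' marks a don't-care position)."""
    return len(string) == len(schema) and all(
        s == '#' or s == c for s, c in zip(schema, string))


def defining_length(schema):
    """Distance between the first and last defined (non-'#') positions.

    This equals the number of division points at which cutting the schema
    would separate some of its defined positions.
    """
    defined = [i for i, s in enumerate(schema) if s != '#']
    return defined[-1] - defined[0] if defined else 0


assert matches("100101", "10##01") and matches("100001", "10##01")
assert defining_length("10##01") == 5
assert defining_length("#100##") == 2   # the "#1.0.0##" example, "." markers removed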