Genetic Algorithms
Evolution has created a wealth of species that are well adapted to their environments. Genetic algorithms (GAs) are artificial procedures seek to achieve similar success in a wide variety of optimization problems--design problems where the best possible solution is sought--by mimicking the principles behind biological evolution.
A GA searches a potentially huge space of solutions by maintaining a population of "individuals," that is, candidate solutions to a particular optimization problem. A crucial ingredient in any GA algorithm is the fitness function it uses to rate individuals' success. The optimization goal is to find the individual with the maximum possible "fitness"; this is accomplished by taking some fraction of the most fit individuals in the current population and discarding the rest. The remainder of relatively fit individuals is then modified to create a new generation by mutation and crossover. Mutation modifies a single individual randomly. Crossover involves the creation of offspring by combining two parent individuals. In crossover, data from each individual are combined to form the offspring. This overall "survival of the fittest" process is iterated from generation to generation. Over time, just as in biological evolution, the GA generates individuals that are optimal solutions to the problem at hand.
Consider the question of optimizing some function of n variables: f(x1, x2 . . . xn). (Here "f( )" means some function of whatever is inside the parentheses--not defined explicitly here--and the x's are the n inputs of the function.) We wish to find the global maximum of the function f( )--that is, the largest single value that the function takes on over all possible values of its input variables. In particular, we wish to know what the values of the n variables are which produce this maximum value. Assuming that the function's n inputs are represented as 32-bit floating-point numbers inside a digital computer, we can represent an "individual" in the solution space by a string of 32n bits. In keeping with the analogy between GAs and biology, the data string defining an individual is often also called a chromosome, since the operations of mutation and crossover act directly on this data. (In living systems, mutation and crossover occur only at the chromosomal level.) In this example, a mutation corresponds to flipping a randomly selected bit in each individual with some low probability. A crossover is implemented by picking a random number k between 1 and 32n and concatenating k bits from one parent with 32(n - k) bits from another parent to create a 32-bit offspring. If no better method is available, an initial population can be selected simply by generating a number of random individuals.
Say that a first generation has been created somehow, at random if necessary. The "fitness" of each individual is then measured against the fitness function appropriate for the problem. In this case, fitness corresponds simply to magnitude, since we want to know where the maximum value of f( ) occurs. Applying the fitness function might therefore consist simply of keeping that 10% of individuals which produce the largest values of f( ). After discarding most of the individuals from the first generation according to the fitness function, one (a) subjects the survivors to mutation and crossover, producing a generation of altered offspring; (b) re-evaluates f( ) for all the offspring; and (c) culls the offspring, as the parents had been culled, by applying the fitness function again.
As this process is repeated over time the population will—unless f( ) is ill-behaved--become more and more fit. Eventually one individual should come close to the global maximum of f( ). Convergence on the optimum solution stalls out when changes in new generations cease to produce improvements in fitness. The GA must therefore have a stopping criterion so that it knows when to give up.
This example illustrates the reason why GAs can be more successful than conventional approaches to some optimization problems. A typical conventional approach to the trial-and-error solution of an optimization problem involves gradient ascent or hill-climbing, in which one starts with a reasonable guess for a solution and then incrementally improves it by modifying it slightly so as to maximize its fitness at each step. One may do this by examining all solutions in a neighborhood of the solution and then picking the best one as the next guess. The main problem with hill-climbing is that it may only find a local maximum in the vicinity of the initial guess, a neighborhood hillock; the global maximum (the biggest maximum anywhere) may be far away, and still undiscovered. GAs often work better because, first, with a GA it is easy to assure that a large population is searching the landscape broadly to begin with. Second, the process of genetic change (mutation and crossover) allows solutions to jump all over the solution space randomly, thereby avoiding the local-maximum trap that bedevils the hill-climbing approach. The end result, if all goes well, is a population of solutions crowding around the highest peak of the fitness landscape.
We now illustrate the use of GAs on a less trivial and more important problem, namely the traveling salesman problem (TSP), first discussed by the mathematician Euler in 1759. The challenge is to visit a large number of cities while minimizing total distance traveled. Each city must be visited exactly once and at the end of the journey the salesman must return to his starting point. Many problems in engineering can be formulated as a TSP, including circuit-board layout and very-large-scale integrated circuit fabrication, where the number of "cities" (points to be connected by wiring) can number in the hundreds of thousands. The space of solutions for the TSP is the set of permutations of the n cities and is n! in size. (The factorial of n, written as n!, is given by n! = 1 x 2 x 3 x . . . [n - 2] x [n - 1] x n, and grows very rapidly with n.) The solution space is intractably large and cannot be searched methodically. A GA, however, is well-suited to the task. In this problem a chromosome (individual) can be represented by an ordered sequence of integers that label the cities to be visited. Furthermore, each integer only occurs once. A mutation may be introduced by simply flipping the order of two adjacent cities in the chromosome. Crossover can be more complicated; any crossover function must take two chromosomes and create a third that bears some structural similarity to its parents. One possibility is to choose a sub-sequence of cities from one parent and choose the remaining cities from the other parent, preserving their order. Simulations of this type have shown that for n = 100 cities, after about 20,000 generations sequences emerge which are only about 9.4% longer than optimum. This is better than random search, but still leaves room for improvement.
Another interesting application of GAs is strategy-seeking in games and similar scenarios, for example in poker, maze solving, and the prisoner's dilemma. In each such task the chromosome encodes a strategy relevant to the particular game and appropriate mutation and crossover operations are defined on the chromosome. Usually the fitness function is the average score received by the chromosome when its strategy is used to play the game several times. Often GAs come up with quite interesting and non-obvious strategies to win games.
There are several variants of GAs including genetic programming, which is simply the application of GAs to the creation of computer programs. GAs have also been applied successfully in production systems for artificial intelligence and in training neural networks. In setting up all these methods, the key issues are how to represent the chromosome, or solution to the given problem, in a manner suitable to the application of mutation and crossover operations and an appropriate fitness function. Another subject of much current research in GAs is how to make the basic framework more realistic from a biological point of view, in the hopes of making it more efficient. The application of GAs to specific problems and the setting of specific parameters controlling the evolutionary process remains to this day an art rather than an exact science and there is much room for progress.
This is the complete article, containing 1,360 words
(approx. 5 pages at 300 words per page).