Some time ago, I began work on a program to make movies. An editing program? Well, no. In fact, it was a program that was asked to make a presentation when given a topic.
For instance, it might be given the name of a city or a diet plan. The program would then search the internet for pictures to put into a small movie or presentation. The program thought about the kind of presentation it needed to make for the topic, e.g. one like a book, one flowing with music, or something else. It did a pretty good job making very simple presentations in its infancy.
And, what was its secret? Well, it organized media into groups, each targeted for some place in a presentation. So, for example, several pictures could be made candidates for a single place in the presentation. And, more candidates might be created by changing the existing candidates. The media could themselves be presentations — presentations within presentations.
The program made various versions of the presentation (not too many), and eventually figured out which presentation and media might be best for the topic. Further info can be found in a patent by Schoenbach and Leddy 2010.
The secret was about those candidates. The program gathered together a small population of candidates and made changes to them to get more on each cycle of the program.
Some might call the changes mutations. Some might call the cycles of the program the time of a new generation. And, you might recognize some of this lingo as belonging to discussions about evolution of species.
Perhaps it's a blast from your biological past, like something from a high school biology course. We recall from high school biology that species evolve to fit their environment. The best fit are those that can sustain themselves on the available resources and stay around long enough to produce offspring. Environments can include a number of features, such as the kind of food and water available, the exposure to the sun, the amount or kind of air, etc.
Environments in programs might be numerical constraints or how close a point is to a surface. In the movie making program, media had to fit the topics addressed, had to match colors, or had to be cropped or resized with attention to the aspect ratio.
Discovering What Evolutionary Computation Is
It turns out that the program that I wrote was an instance of evolutionary computation. I thought of the program as a sort of Darwin-inspired creation, but I did not know how much work in evolutionary computation had been done already. And, while it seems that evolutionary computation is not often the most talked about topic, it is in fact a thriving area of artificial intelligence. A well-written account of it is given by Fogel 2006.
It may have been a lucky choice for me to choose a form of evolutionary computation for the program. Evolutionary computation, or EC, has desirable computational properties that can make it fairly easy to choose measurements of fitness such that programs converge to a solution.
There is one theorem, from Hsu and Robbins, that says that if the EC program keeps combining population members to make new ones (a process called mating), then it will eventually converge to some solution; in particular, if the genetic material for the most fit element is present in the population, it will converge to the most fit element. This result is discussed in the context of evolution in Fogel 2006.
To my luck, the most fit media element (or some version of it) was likely returned by the internet search, so the program converged rather quickly to a final presentation.
The "Other" AI Camp
One of the most talked about subjects within AI these days is machine learning, or ML. Neural networks are also getting a lot of press. Neural networks have been explored since the early 1940s by McCulloch and Pitts and made big steps in the 1980s (e.g. McClelland and Rumelhart), but people got excited about them as soon as they started getting good at recognizing handwriting (Revow, Williams, and Hinton 1994). There have been a number of advances in the development of neural networks, or NNs, with recurrent and convolutional NNs and LSTMs giving rise to a host of new practical applications.
Also, with regard to machine learning, there have been advances in practical statistical techniques. What you can do with a radial basis function kernel in a support vector machine is pretty amazing.
All the while, evolutionary computation has been advancing as well — it can be used for machine learning. It might also work in a more process-oriented manner, as a form of search. As you get deeper into EC, it starts looking pretty versatile for a fairly simple algorithmic structure. The first work on it appeared in the 1950s and 1960s, and it began gaining traction in the 1990s.
Family of Algorithms for Global Optimization
At the high level, evolutionary computation has a pretty comprehensible structure. The programs put together a population and pick some subset of those to work on. The programs mutate and crossbreed them. Then the programs select the next generation by picking those that cross some threshold of fitness. Fitness may be measured in a number of ways — there is a lot of liberty in picking it within applications.
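To make that loop concrete, here is a minimal sketch in Python. The `fitness`, `mutate`, and `crossover` functions are hypothetical stand-ins for whatever a particular application would supply:

```python
import random

def evolve(population, fitness, mutate, crossover, generations=100):
    """A minimal evolutionary loop: vary, evaluate, select."""
    for _ in range(generations):
        # Pick pairs of parents and produce varied offspring.
        offspring = []
        for _ in range(len(population)):
            a, b = random.sample(population, 2)
            offspring.append(mutate(crossover(a, b)))
        # Select the fittest individuals to form the next generation.
        candidates = population + offspring
        candidates.sort(key=fitness, reverse=True)
        population = candidates[:len(population)]
    return population
```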
A simple version of evolutionary algorithms may be used to study their convergence properties. For such algorithms, it helps to define a payoff function, which provides a real-valued fitness with parameters taken from some parts of a binary representation of a candidate. The chosen parts of the representation have to be of constant type during the run of the algorithm: the values change, but the fields of the data structure get picked once. In ML lingo, one might say that the set of features remains fixed.
Next, with these fitness values taken from all the candidates of a generation, the algorithm draws a random variable on an interval [0, S], where S is the sum of the fitness measures. The random variable, \\(\theta\\), is a threshold for deciding the parents of the next generation. Fitness values are summed over random subsets of candidates, and the candidate whose contribution takes the running sum over \\(\theta\\) gets selected. This method of selection is referred to as Roulette Wheel Selection. As the population grows fitter, S grows, and new draws of \\(\theta\\) will tend to be greater than preceding ones.
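Here is a minimal sketch of Roulette Wheel Selection in Python; the names are mine, not from any particular library, and a non-negative fitness is assumed:

```python
import random

def roulette_wheel_select(candidates, fitness):
    """Select one candidate with probability proportional to fitness."""
    scores = [fitness(c) for c in candidates]
    total = sum(scores)                # S, the sum of fitness measures
    theta = random.uniform(0, total)   # the random threshold
    running = 0.0
    for candidate, score in zip(candidates, scores):
        running += score
        if running >= theta:           # this candidate takes the sum over theta
            return candidate
    return candidates[-1]              # guard against floating-point rounding
```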
An Example
Imagine a slot in the movie program that has gotten a lot of pictures assigned to it (perhaps from searching the internet). A geometric slot will appear on the screen and will have a height and width. The program could reshape the pictures to fit, shrinking or expanding.
Suppose the program wants to preserve aspect ratios as much as possible. The program might go through the pictures, scale each one so that its width fits the slot, and measure how far off the resulting height would be from the slot height. So, S could be the sum of all the slot height differences.
The program would then roll the dice to pick a threshold that candidates would have to pass to be in the next round. Then the program would randomly try throwing in candidates until the sum of heights goes over the threshold. (Here, the threshold crossing would have more to do with exclusion than inclusion.) Then, the next generation would all be closer to fitting the slot without deforming. Features other than height might be used; for instance, images might be scaled but would lose resolution, so the degree to which objects could be resolved might be figured into the loss function for the slot.
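As a sketch of the fitness measure described above (the slot dimensions and image sizes here are made up for illustration):

```python
def height_error(image_w, image_h, slot_w, slot_h):
    """Scale the image to the slot width, then measure how far the
    resulting height lands from the slot height (smaller is fitter)."""
    scaled_h = image_h * (slot_w / image_w)
    return abs(scaled_h - slot_h)

# Hypothetical candidate images as (width, height) pairs.
images = [(640, 480), (800, 600), (1024, 768), (300, 500)]
errors = [height_error(w, h, slot_w=400, slot_h=300) for w, h in images]
S = sum(errors)  # the sum of all the slot height differences
```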
While the progress towards greater fitness scores seems to drive the computation towards optimization, the real progress is made by running through a set of stochastic state transitions that can be guaranteed to converge on the fittest candidate, which can be viewed as an absorbing stochastic state.
One thing that makes evolutionary computation really worth looking into is that evolutionary computation algorithms can avoid getting stuck in local minima. In fact, there are theorems that say that EC programs will eventually get to the global optimum. These theorems make their arguments by examining certain Markov processes. We can take a look at some of that math, but any kind of program that can get to a global optimum is worth looking into.
These programs can also converge quickly. It depends on how many offspring are made in each generation. The number of offspring can be controlled, so the rate of convergence should be controllable to some extent.
Biomimicry
What is Biomimicry?
Biomimicry is an approach to design inspired by living things. Many people point to airplane wings and start talking about how the shape of birds’ wings led people like the Wright brothers to understand lift gained from gas flow.
Artificial intelligence attempts to figure out how to make an intelligent machine. AI may be pursued without reference to the biological, but it has always been closely tied to cognitive sciences. The study of the mind, processes of thought, the brain, consciousness, goal seeking, etc. are all given attention within cognitive science. One part of cognitive science has always been the exercise of creating programs that carry out the processes of thinking subjects with some watchfulness given to human memory limits, symbology of task domains, and problem solving strategies.
Evolutionary Computation and Neural Networks
Neural networks are simulations of brain cells that learn patterns from large amounts of data. Their learning algorithms are derived from mathematical analysis. The resulting computational framework does not wander far from classical numerical techniques. But, the emphasis is on the creation of connection weights between neurons with neural nodes being assigned particular activation functions.
Evolution is a biological process. The computational models built from it are focused on selecting points that optimize a function. So, the common theme of function optimization continues within EC, but it works differently from neural networks. Evolutionary computation is not about connection weights between entities in a collection.
A final result of EC might be a population candidate representing connections, but that would be one application. Evolutionary computation, or more correctly, evolutionary design, might result in a 3D-printed object or in a picture. The movie-making program made movies by building up a graph of the presentation. Similarly, evolutionary programming makes state machines, which are graphs. The focus for evolutionary computation is on the optimal set of points for a structure, or maybe on a single point for global optimization.
If we think of the Newton-Raphson method of root location, we may see a similarity. But, with the Newton-Raphson method, one point leads to another point at each step of the algorithm. In the evolutionary strategy, a lot of points are tried, and the most promising ones spawn a group of points not far from them to do more exploring. It seems natural to think that the points near the tops of different hills would be explored more by allowing for more offspring from those. And, it seems natural that the highest hill will eventually have the most successful offspring.
Evolutionary computation might be just a random casting of the point net, but processes are borrowed from the biology of genotypes and phenotypes. Two points from a thriving population can be used to make offspring by selecting parts from both parents at random. This process of knitting children together from parts of parents is referred to as crossover. Crossover leads to convergence to a local optimum. In another process, one parent or two may make children, and those children can be changed randomly. This process of random change is referred to as mutation. Mutation may provide enough randomness to move candidates out of local minima.
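Here are minimal sketches of the two operators, under the assumption that candidates are simple lists of numbers:

```python
import random

def crossover(parent_a, parent_b):
    """Knit a child together by taking each gene from one parent at random."""
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

def mutate(child, rate=0.1, scale=0.5):
    """Randomly perturb some genes; this can knock candidates out of
    local optima that crossover alone would converge to."""
    return [g + random.gauss(0, scale) if random.random() < rate else g
            for g in child]
```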
Self-Organization
Evolving populations may or may not be self-organizing. So, evolutionary computation is not just about how individual candidates act or express traits.
Self-organizing populations contain individuals that act independently but have some information about other entities in their population that influences how they act. We can think of self-organizing entities moving along trajectories, with time-dependent change. We could think of the change as a mutation and the movement of an individual as the replacement of that individual. For self-organization, it helps to think of the individuals as atomic things with some sort of motion, which might be anything: color change, loudness, size, velocity change, and so on. After some iterations, the population may become fixed in a particular configuration.
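A toy sketch of that kind of self-organization, where each hypothetical individual knows the population's average position and drifts toward it:

```python
import random

# Twenty hypothetical individuals on a line; each one drifts toward the
# average position of the group while wandering a little on its own.
positions = [random.uniform(0.0, 100.0) for _ in range(20)]
for step in range(200):
    mean = sum(positions) / len(positions)
    noise = 0.5 * (1.0 - step / 200)   # the wandering dies down over time
    positions = [p + 0.1 * (mean - p) + random.gauss(0.0, noise)
                 for p in positions]
# By the end, the population has settled into a tight configuration.
```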
For EC, there are extrinsic parameters that influence the replacement of one generation with the next so that a certain set of traits may remain constant. If a set of candidates are round and on a hill that has food at the top and no food at the bottom, successive generations may become more square or grow roots. The offspring that deform from the round shape will be the ones that don't starve. But, the shape change may occur even if there is no information shared between members of the population. Mating may only speed the rate of change relative to the rate yielded by pure mutation.
Evolutionary computation provides a lot to think about. So, in the rest of this discussion, I would like to get a little more formulaic and provide some brief points about the subject.
Common Core: Meta-heuristics and Stochastic Optimization
There is a manner of parlance when dealing with evolutionary computation, and some of it helps differentiate EC from other forms of search.
Evolutionary
Evolutionary computation is a family of algorithms that is meta-heuristic. This means that it is a strategy for structuring optimization search. When considering EC at the meta-heuristic level, people point out that it is population oriented, as opposed to individually oriented. EC is also nature based and a global search, as opposed to local. It may be parallelized.
Stochastic Optimization Definition
EC is a family of optimization algorithms. In particular, it selects its next generation of candidates by doing some form of random selection, using techniques like the Roulette Wheel Selection process.
It picks parts of the candidate representation to cross over or mutate by selecting locations (or fields) at random.
The randomness employed serves more to avoid local optima than to overcome modeling error. As for data, it may be symbolic or free of noise. So, EC algorithms actually introduce randomness in order to succeed. That is not to say that they cannot work with noisy data drawn from the environment of the populations.
The Four Functions of Lewontin and Atmar
Evolutionary computation mimics selection with respect to gene expression. The basic ideas can be clarified by four functional definitions fleshed out by Lewontin 1974 and Atmar 1992 (refs from Fogel 2006).
For the definitions to be made, three sets are assumed. One is the set of environmental factors, I, taken as the set of all sequences of factors. P is the set of phenotypes, traits that may be expressed. G is the set of all genotypes, which correspond to genetic material or, in the case of ML, the set of features.
These functions are as follows:
- Epigenesis: I x G → P
- Selection: P → P
- Genotypic survival: P → G
- Mutation: G → G
Epigenesis
Epigenesis describes the expression of the gene within the environment of the individual. For instance, a certain set of vector components may figure into the distance to a point, and the distance is an expression of those components. In the case of this example, all the points on a sphere would have the same phenotypic distance to a point.
Selection
Selection picks a new generation by picking preferred phenotypes. Continuing with the sphere example, if all our candidates were on a set of concentric spheres, our evolutionary computation program might only progress with those on the sphere with the smallest radius.
Genotypic Survival
Genotypic survival describes what genetic material is left after selection. Given this material, we should be able to make some predictions about the next generation. Going further with the sphere example, we would see that most of the vectors in the selected population would have smaller values in their configuration space coordinates. If by chance some other coordinates are clustered in some way, the phenotype related to those might not tell us much about future evolution, but it might leave candidates with some sticky properties.
Maybe all the candidates will be green, so we might expect future generations to be green and not red or blue. However, allowing for mutations, the colors may vary while the configuration space coordinates get smaller and smaller.
Mutation
Mutation may be random or not. It changes the genetic material before expressions can be evaluated. Mutations can increase the rate of convergence, but can also cause regression. So, in the case of our sphere, if crossover starts interchanging X's and Y's between parents, the population might stay on one sphere and never move closer to the center (but they might become multicolored). A mutation could make an X smaller, causing populations to move closer.
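Tying the sphere example to the four functions, here is a rough sketch in Python; the function names mirror the definitions above, and the details are assumptions for illustration:

```python
import math, random

def epigenesis(genotype):
    """I x G -> P: express a genotype (a coordinate vector) as its
    distance to the origin; every point on one sphere shares this value."""
    return math.sqrt(sum(x * x for x in genotype))

def selection(population):
    """P -> P: keep the candidates on the spheres of smallest radius."""
    return sorted(population, key=epigenesis)[:len(population) // 2]

def genotypic_survival(selected):
    """P -> G: the genetic material left after selection is just the
    coordinate vectors the survivors carry."""
    return [list(candidate) for candidate in selected]

def mutation(genotype, rate=0.2):
    """G -> G: randomly nudge individual coordinates before the next
    round of expression and selection."""
    return [x + random.gauss(0, 0.1) if random.random() < rate else x
            for x in genotype]
```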
Species of Evolutionary Computation
Once again, evolutionary computation describes a family of algorithms. Let's go into more detail about some of its population-based members.
Genetic Algorithms
Mutation, crossover, and selection are employed over generations, and there is perhaps less of a focus on phenotype. High-level descriptions of this algorithm family look much like those of evolutionary computation in general, which, however, includes more variants.
Evolutionary Strategy
The representation of a candidate is a numerical vector. The process of optimization is mostly mutation, and fitness ranking determines selection.
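A rough (mu + lambda)-style sketch, under the assumption that candidates are plain numeric vectors and that we are minimizing an objective:

```python
import random

def evolution_strategy(objective, dim=3, mu=5, lam=20, generations=200):
    """Keep the mu best vectors; each generation, spawn lam mutated offspring."""
    population = [[random.uniform(-5, 5) for _ in range(dim)]
                  for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = random.choice(population)
            offspring.append([x + random.gauss(0, 0.3) for x in parent])
        # Fitness ranking determines selection.
        population = sorted(population + offspring, key=objective)[:mu]
    return population[0]

best = evolution_strategy(lambda v: sum(x * x for x in v))
```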
Swarm Intelligence
Swarming is a form of self-organization — it mimics group animal behavior. Flocking is an instance of swarming, and swarm behavior may be driven by survival or the need for getting resources.
Swarm intelligence usually mimics a collection of intelligent agents spontaneously operating on some problem in order to solve it or to attain some goal. Often, these agents can be characterized as distributed peer-to-peer communicators, each with some problem solving capability. Individually, the agents may not be able to solve the same problem that a group might solve.
While swarming does not require evolutionary computation, it may arise in a set of individuals that evolve to swarm. So, one EC project might be to create a set of swarm bots.
Memetic Algorithms
These are hybrid learning algorithms in which individuals can learn using any number of learning techniques. So, an algorithm that evolves individuals into swarming entities might be memetic, especially if the individuals change by learning. Learning can be used to enhance phenotypic expression. And so, selection can occur after the effect of learning is apparent.
...and More
The many forms of evolutionary computation algorithms are something to explore. Hopefully, the few examples here elucidated the way in which algorithms are differentiated in the discussions.
Contrast with Neural Networks and Common AI Algorithms
The first thing one talks about when considering evolutionary computation algorithms is a set of candidates that get selected. The emphasis is on fitness of the individuals.
Candidate processing is not peculiar to evolutionary computation, though. We see it in tree search algorithms as well as in minimax algorithms like alpha-beta search.
Neural Networks
Although neural networks are a form of biomimicry, they don't manage a set of candidates. Many examples are sequenced through a network until it has an idea of how to match inputs to outputs. Evolutionary computation might be used to train neural networks, but it does not necessarily provide the mechanism for matching. In any case, if a matching process can be implemented using evolutionary computation, it will proceed to the optimal match by making selections from a set of candidates.
Search and Game Play Algorithms
In graph search, a node is said to be expanded, with its offspring becoming candidates for path selection. That is, the algorithm tries to figure out whether an optimal path to a goal can be found going through the parent and a particular offspring.
In search algorithms, the fitness of a particular candidate is determined by examining the future of each ply up to some point so that a heuristic evaluation of a node can be calculated. In contrast, evolutionary computation does not look ahead. The offspring may or may not survive a selection process. This state of affairs is not absolute, however, because some evolutionary computation algorithms include the parents in the next generation. So, it may be that search could be construed as an evolutionary computation algorithm by treating path finding as a surviving lineage where each individual lives for a really long time.
Constraint Satisfaction
In constraint satisfaction, all the individuals that may exist at certain nodes (points) of a structure are determined at the outset of the algorithm. Then, candidates are removed if they violate constraints.
The behavior of constraint satisfaction has the flavor of self-organization in that the programs could be written with each candidate checking on the others as neighbors. But, it has an evolutionary flavor in the sense that various specimens die off if they violate constraints. There is no attempt to change candidates in order to make the complete set of individuals more suitable to their nodal assignment.
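A bare-bones sketch of that removal process, on two hypothetical nodes with a single binary constraint:

```python
# Hypothetical: two nodes, each with a set of candidate values, and a
# constraint that the value at "a" must be less than the value at "b".
domains = {"a": {1, 2, 3}, "b": {1, 2, 3}}

def consistent(x, y):
    return x < y

# Remove candidates at each node that cannot satisfy the constraint
# with any remaining candidate at the other node.
changed = True
while changed:
    changed = False
    for x in set(domains["a"]):
        if not any(consistent(x, y) for y in domains["b"]):
            domains["a"].discard(x)
            changed = True
    for y in set(domains["b"]):
        if not any(consistent(x, y) for x in domains["a"]):
            domains["b"].discard(y)
            changed = True

print(domains)  # {'a': {1, 2}, 'b': {2, 3}}
```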
Stochastic Gradient Descent vs. Neuro-evolution
Stochastic Gradient Descent Definition
Stochastic Gradient Descent, or SGD, looks a little like the Newton-Raphson method on steroids. It is a multi-dimensional version in some sense.
The step of gradient descent is defined as follows:

\\[ w_{t+1} = w_t - \lambda \nabla L(w_t) \\]

Here \\(\lambda\\) is a learning rate, \\(\nabla\\) means gradient, and \\(L\\) is the objective function (loss function or payoff function) defined for the learning method.

A stochastic version of this:

\\[ w_{t+1} = w_t - \lambda \nabla L_X(w_t) \\]

where X is a randomly sampled subset of elements from the population and \\(L_X\\) is the loss evaluated on just that subset.
Picture two sketches. One gives the idea of gradient descent, in which every element in the population is considered in an analytic sense. The other shows a random walk that makes errors in direction by working with a small sample but is constrained to stay within a small tunnel by limiting the step size of the random straying. SGD goes through many more iterations, each of which incurs a fraction of the cost of any one step of gradient descent.
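Here is a small sketch of the stochastic step in Python, using a made-up one-dimensional squared-error loss; each update looks at only a small random batch:

```python
import random

# Hypothetical one-dimensional data; the loss is the mean squared
# distance from w to the data points.
data = [random.gauss(3.0, 1.0) for _ in range(1000)]

def sgd(data, lr=0.05, batch_size=8, steps=500):
    w = 0.0
    for _ in range(steps):
        batch = random.sample(data, batch_size)   # X, the random subset
        grad = sum(2 * (w - x) for x in batch) / batch_size
        w -= lr * grad                            # w <- w - lambda * grad
    return w

print(sgd(data))  # wanders toward the data mean, near 3.0
```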
Neural network training is formulated in terms of SGD.
Neuro-evolution: A Brief Overview
Neuro-evolution is an evolutionary computation strategy that evolves neural networks. Some variants both add nodes and figure the weights, while others just figure the weights. SGD is not necessarily needed to generate the weights of the neural nets. In any case, neuro-evolution may allow for an unsupervised learning scheme for neural networks.
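As a weights-only sketch, here is a tiny fixed network whose weight vector is evolved by mutation and ranking instead of SGD; the XOR task and network shape are just assumptions for illustration:

```python
import math, random

# Fixed topology: 2 inputs -> 2 hidden units -> 1 output (9 weights,
# including biases), all held in a flat list w.
def forward(w, x1, x2):
    h1 = math.tanh(w[0] * x1 + w[1] * x2 + w[2])
    h2 = math.tanh(w[3] * x1 + w[4] * x2 + w[5])
    return math.tanh(w[6] * h1 + w[7] * h2 + w[8])

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def loss(w):
    return sum((forward(w, *x) - y) ** 2 for x, y in XOR)

# Evolve the weight vectors: mutate, then keep the lowest-loss networks.
population = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(30)]
for _ in range(300):
    offspring = [[g + random.gauss(0, 0.1) for g in random.choice(population)]
                 for _ in range(30)]
    population = sorted(population + offspring, key=loss)[:30]
```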
Some Investigations of Neuro-evolution
A good survey of evolutionary computation in game playing can be found in Risi and Togelius 2015. They discuss a variety of approaches to game play learning. One of the earliest neuro-evolutionary computation applications was by Fogel 2001, who worked for a few years on creating a master-level checkers player, Blondie24. Blondie24 was beaten by a program that was carefully constructed by hand, but Blondie24 learned on its own.
Researchers at Uber referenced Risi and Togelius 2015 and carried the research forward in a few areas (see Stanley and Clune 2017). In particular, some results on the differences between SGD and EC can be found here: Zhang, Clune, Stanley 2017.
Conclusion
People are talking about changes in the world of AI. Some say that AI is due for a ten-year topic shift. The pundits claim that that is the way AI has progressed for many years.
Now, they are saying that deep learning will lose its attention, while reinforcement learning and perhaps evolutionary computation versions of it will be the wave of 2020 and on (see this article in Futurism January 2019).
Hopefully, you have gained some insight as to where to start looking in order to explore evolutionary computation.