Genetic Programming With clojure.spec
Clojure.spec is a new library for Clojure that enables you to write specifications for your program. In an earlier post, I showed off some of it’s power to generate test data from your specifications. It’s a pretty cool feature. Given some clojure.spec code, you can generate sample data for you based off of the specifications. But what if you could write a program that would generate your clojure.spec program based off of data so that you could generate more test data?
Genetic programming
Here is where we embark for fun. We are going to use genetic programming to generate clojure.spec creatures that contain a program. Through successive generations, those creatures will breed, mutate, and evolve to fit the data that we are going to give it. Going with our creature theme, we can say that it eats a sequence of data like this
1
|
|
Each creature will be represented by a map that has information about two key pieces, its program and the fitness score. Each program is going to start with a clojure.spec/cat, (which is the spec to describe a sequence). From here on out, I’m going to refer to the clojure.spec namespace as s/
. So, a simple creature would look like this.
1 2 |
|
How do we figure out a score from the creature’s spec? We run the spec and see how much of the data that it can successfully consume.
Scoring a creature
To score a creature, we’re going to use the clojure.spec explain-data
function. It enables us to run a spec against some data and get back the problems in a data format that we can inspect. If there are no problems and the spec passes, the result is nil.
1 2 |
|
However, if there is a problem, we can get information about what went wrong. In particular, we can see where it went wrong in the sequence.
1 2 |
|
In the above example, the :in
key tells us that it fails at index 1. This gives us all the information we need to write a score function for our creature.
1 2 3 4 5 6 7 |
|
This function tries to run the spec against the data. If there are no problems, the creature gets a 100 score. Otherwise, it records the farthest point in the sequence that it got. Creatures with a higher score are considered more fit.
1 2 |
|
Now that we have a fitness function to evaluate our creatures, we need a way to generate a random clojure.spec creature.
Create a random creature
This is where I really love Clojure. Code is data, so we can create the programs as lists and they are just themselves. To run the programs, we just need to call eval
on them. We are going to constrain the creatures somewhat. They are all going to start out with s/cat
and have a certain length of items in the sequence. Also, we are going to allow the parts of the spec to be created with certain predicates.
1
|
|
Also allowing, composition with ands and ors and other sequences.
1 2 |
|
We are also going to have some probability knobs to control how the random creature is constructed.
1 2 3 4 |
|
The seq-prob
is the probability that a new spec sub sequence will be constructed. The nest-prob
is set to zero right now, to keep things simple, but if turned up with increase the chance that a nested spec sequence would occur. We are going to be writing a recursive function for generation, so we’ll keep things to a limited depth with max-depth
. Finally, we have the chance that when constructing a spec sub sequence, that it will be an and/or with and-or-prob
. Putting it all together with code to construct a random arg.
1 2 3 4 |
|
Also creating a random sub sequence.
1 2 3 4 5 6 7 8 9 10 11 |
|
Finally, we can create a random s/cat
spec with
1 2 3 4 5 6 7 |
|
Let’s see it in action.
1 2 |
|
We can make a batch of new creatures for our initial population using this function.
1 2 3 |
|
Great! Now we have a way to make new random spec creatures. But, we need a way to alter them and let them evolve. The first way to do this is with mutation.
Mutating a creature
Mutation in our case, means changing part of the code tree of the creature’s program. To keep the program runnable, we don’t want to be able to mutate every node, only specific ones. We’re going to control this by defining a mutable
function that will only change nodes that start with our sequences or predicates.
1 2 3 4 |
|
Then, we can use postwalk
to walk the code tree and alter a node by a mutation probability factor
1 2 3 4 5 6 7 8 9 |
|
Trying it on one of our creatures.
1 2 |
|
We can change our creatures via mutation, but what about breeding it with other creatures?
Crossovers with creatures
Crossover is another way to modify programs. It takes two creatures and swaps a node from one creature to another. To accomplish this, we’re going to use the walk
function to select at a random probability the crossover node from the first node, then insert it into the second’s creatures program at another random spot.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
Taking two creatures and putting them together.
1 2 3 |
|
We have our ways to change our creatures to let them evolve and we have a way to rank them. What we need now is to put it together in a way that will let them evolve to the solution.
Evolving creatures
The process will be in general terms:
- Create initial population
- Rank them
- Take the top two best ones and carry them over (this is known as elitism)
- Create the next generation from by selecting creatures for crossover and mutation
- Repeat!
So how do we select the best creatures for our next population? This is an interesting question, there are many approaches. The one that we’re going to use is called tournament selection. It involves picking n creatures from the whole population and then, among those, picking the best scored one. This will allow diversity in our population that is needed for proper evolution.
1 2 3 |
|
We’re now ready to write our evolve function. In it, we pass in the population size, how many generations we want, the tournament size, and of course, our test data that our creatures are going to feed on. The loop ends when it reaches a perfect fitting solution, (a creature with a score of 100), or the max generations.
Note that we have a chance for a completely random creature to appear in the generations, to further encourage diversity.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
|
Trying it out. We get a perfect clojure.spec creature!
1 2 3 4 |
|
Of course, our clojure.spec creature can generate data on its own with the exercise
function. Let’s have it generate 5 more examples of data that conform to its spec.
1 2 3 4 5 6 |
|
If we wanted to, we could adjust our evolve function and let it continue to evolve creatures and lots of different solutions to choose from. We could even take the generated data from the exercise function
and let it generate more creatures who generate more data……
The mind boggles.
We’ll leave with a quick summary of Genetic Programming.
- Start with a way to generate random creatures
- Have a way to evaluate their fitness
- Create a way to change them for the next generations using
- Mutation
- Crossover
- Have an evolution process
- Create an initial population
- Rank them
- Create the next generation using selection techniques and mutation/ crossovers
- Don’t forget about diversity
Most importantly, have fun!
If you want to play with the code, it’s on github here https://github.com/gigasquid/genetic-programming-spec
If you want to learn more about clojure.spec this video is a great place to start. The guide is also a great reference with examples.
If you want to learn more about genetic programming, there are a couple of books I would recommend: Collective Intelligence and Genetic Algorithms + Data Structures = Evolution Programs