It’s holiday time and that means parties and getting together with friends. Bringing a baked good or dessert to a gathering is a time-honored tradition. But what if this year, you could take it to the next level? Everyone brings actual food. But with the help of Deep Learning, you can bring something completely different – you can bring an image of a baked good! I’m not talking about just any old image that someone captured with a camera or created with pen and paper. I’m talking about an image that the computer itself creates. This image would be never before seen, totally unique, and crafted by the creative process of the machine.
That is exactly what we are going to do. We are going to create a flan.
If you’ve never had a flan before, it’s a yummy dessert made of a baked custard with caramel sauce on it.
“Why a flan?” you may ask. There are quite a few reasons:
- It’s tasty in real life.
- Flan rhymes with GAN (unless you pronounce it “Gaaahn”).
- Why not?
On to the recipe. How are we actually going to make this work? We need some ingredients:
- Clojure – the most advanced programming language to create generative desserts.
- Apache MXNet – a flexible and efficient deep learning library that has a Clojure package.
- 1000-5000 pictures of flans – for Deep Learning you need data!
Gather Flan Pictures
The first thing you want to do is gather your 1000 or more images with a scraper. The scraper will crawl Google, Bing, or Instagram and download pictures of mostly flans to your computer. You may have to eyeball them and remove any clearly wrong ones from your stash.
Next, you need to gather all these images in a directory and run a tool called im2rec.py on them to pack them into a record file that MXNet’s image record iterator can read. This optimized format allows our deep learning program to cycle through the images efficiently. Run
python3 im2rec.py --resize 28 root flan
to produce a flan.rec file with images resized to 28x28 that we can use next.
Load Flan Pictures into MXNet
The next step is to load the image records into MXNet with the Clojure API. We can do this with the io namespace.
Add this to your require:
[org.apache.clojure-mxnet.io :as mx-io]
Now, we can load our images:
[Code listing not preserved.]
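As a rough sketch of this loading step, assuming the clojure-mxnet dependency is on the classpath and that flan.rec sits in the working directory. The batch size and the names batch-size and flan-iter are illustrative choices, not values confirmed by the original listing.

```clojure
(ns flan.load
  (:require [org.apache.clojure-mxnet.io :as mx-io]))

;; Assumed batch size, chosen only for illustration.
(def batch-size 10)

;; Iterator over the packed images. :data-shape is [channels height width],
;; matching the 28x28 color images written by im2rec.py.
(def flan-iter
  (mx-io/image-record-iter {:path-imgrec "flan.rec"
                            :data-shape  [3 28 28]
                            :batch-size  batch-size}))
```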
Now that we have the images, we need to create our model. This is what is actually going to do the learning and creating of images.
Creating a GAN Model
GAN stands for Generative Adversarial Network. This is an incredibly cool deep learning technique that has two different models pitted against each other, yet both learning and getting better at the same time. The two models are a generator and a discriminator. The generator model creates a new image from a random noise vector. The discriminator then tries to tell whether the image is a real image or a fake image. We need to create both of these models for our network.
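The tug of war between the two models can be made concrete with the standard binary cross-entropy losses commonly used for GANs. The sketch below is plain Clojure with no MXNet involved. The function names are made up for illustration, and d-real and d-fake stand for the discriminator’s estimated probability that a real or a generated image is real.

```clojure
(defn bce
  "Binary cross-entropy for one prediction p in (0,1) against label y (0 or 1)."
  [y p]
  (- (+ (* y (Math/log p))
        (* (- 1 y) (Math/log (- 1.0 p))))))

(defn discriminator-loss
  "The discriminator wants real images scored near 1 and fakes near 0."
  [d-real d-fake]
  (+ (bce 1 d-real) (bce 0 d-fake)))

(defn generator-loss
  "The generator wants its fakes scored near 1 (the non-saturating form)."
  [d-fake]
  (bce 1 d-fake))

;; When a fake is scored at 0.9, the discriminator has been fooled:
;; its loss is high while the generator's loss is low.
(println (discriminator-loss 0.9 0.9))
(println (generator-loss 0.9))
```

Each model lowers its own loss by raising the other’s, which is why both keep improving during training.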
First, the discriminator model. We are going to use the symbol namespace from the Clojure package:
[Code listing not preserved.]
There is a variable for the data coming in (which is the picture of the flan). It then flows through the other layers, which consist of convolution, normalization, and activation layers. The last three layers repeat another two times before ending in the output, which tells whether it thinks the image was a fake or not.
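To see how a stack of strided convolutions shrinks an image down toward a single real-or-fake score, here is the standard output-size formula in plain Clojure. The kernel, stride, and padding values are illustrative assumptions, not necessarily the ones in the original listing.

```clojure
(defn conv-out-size
  "Spatial size after a convolution: floor((in + 2*pad - kernel) / stride) + 1."
  [in {:keys [kernel stride pad]}]
  (inc (quot (- (+ in (* 2 pad)) kernel) stride)))

;; Hypothetical discriminator stack for a 28x28 input.
(def layers
  [{:kernel 4 :stride 2 :pad 3}
   {:kernel 4 :stride 2 :pad 1}
   {:kernel 4 :stride 2 :pad 1}
   {:kernel 4 :stride 1 :pad 0}])

;; reductions keeps every intermediate size, starting with the input.
(def sizes (reductions conv-out-size 28 layers))

(println sizes) ; prints (28 16 8 4 1)
```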
The generator model looks similar:
[Code listing not preserved.]
There is a variable for the data coming in, but this time it is a random noise vector. Another interesting point is that it uses a deconvolution layer instead of a convolution layer. The generator is basically the inverse of the discriminator. It starts with a random noise vector, which is translated up through the layers until it is expanded into an image output.
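The “inverse” relationship shows up in the size arithmetic too: a deconvolution (also called a transposed convolution) grows the spatial size instead of shrinking it. A plain-Clojure sketch, with illustrative layer parameters that are assumptions rather than the original ones:

```clojure
(defn deconv-out-size
  "Spatial size after a transposed convolution with no output adjustment:
   (in - 1) * stride - 2*pad + kernel."
  [in {:keys [kernel stride pad]}]
  (+ (- (* (dec in) stride) (* 2 pad)) kernel))

;; Hypothetical generator stack: a 1x1 noise 'image' grown to 28x28.
(def generator-layers
  [{:kernel 4 :stride 1 :pad 0}
   {:kernel 4 :stride 2 :pad 1}
   {:kernel 4 :stride 2 :pad 1}
   {:kernel 4 :stride 2 :pad 3}])

(def generator-sizes (reductions deconv-out-size 1 generator-layers))

(println generator-sizes) ; prints (1 4 8 16 28)
```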
Next, we iterate through all of our training images with reduce-batches. Here is just an excerpt where we get a random noise vector and have the generator run the data through and produce the output image:
[Code listing not preserved.]
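The random noise fed to a GAN generator is typically drawn from a standard normal distribution, one vector per image in the batch. Here is a plain-Clojure sketch of that sampling step. The batch size of 10 and noise length of 100 are assumptions for illustration, and a real run would hand these values to MXNet rather than keep them as Clojure vectors.

```clojure
(import 'java.util.Random)

(defn noise-vector
  "n samples from a standard normal distribution."
  [^Random rng n]
  (vec (repeatedly n #(.nextGaussian rng))))

(defn noise-batch
  "One noise vector of length z-dim per image in the batch."
  [^Random rng batch-size z-dim]
  (vec (repeatedly batch-size #(noise-vector rng z-dim))))

;; Seeded so that runs are repeatable.
(def batch (noise-batch (Random. 42) 10 100))

(println (count batch) "vectors of length" (count (first batch)))
```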
The whole code is here for reference, but let’s skip forward and run it and see what happens.
FLANS!! Well, they could be flans if you squint a bit.
Now that we have them kinda working for a small image size of 28x28, let’s biggerize it.
Turn on the Oven and Bake
Turning up the size to 128x128 requires some alterations in the layers’ parameters to make sure that it processes and generates the correct size, but other than that we are good to go.
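One way to think about those alterations: a stride-2 layer with kernel 4 and padding 1 doubles the spatial size in the generator (and halves it in the discriminator), so a bigger image mostly means more such layers. A plain-Clojure sketch, assuming the generator starts from a 4x4 feature map, which is an illustrative assumption and not something the original listing confirms:

```clojure
(defn upsample-sizes
  "Sizes visited when doubling from start until target is reached or passed."
  [start target]
  (let [sizes (iterate #(* 2 %) start)
        n     (count (take-while #(< % target) sizes))]
    (take (inc n) sizes)))

(println (upsample-sizes 4 128)) ; prints (4 8 16 32 64 128)
(println (upsample-sizes 4 28))  ; prints (4 8 16 32)
```

Under these assumptions 128 is hit exactly after five doublings, while 28 is overshot, so a 28x28 network needs at least one layer with different padding to land on the right size.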
Here comes the fun part, watching it train and learn:
In the beginning there was nothing but random noise.
It’s beginning to learn colors! Red, yellow, brown seem to be important to flans.
It’s learning shapes! It has learned that flans seem to be blob shaped.
It is moving into its surreal phase. Salvador Dalí would be proud of these flans.
Things take a weird turn. Does that flan have eyes?
Even worse. Are those demonic flans? Should we even continue down this path?
Answer: Yes – the training must go on.
Big moment here. It looks like something that could possibly be edible.
Ick! Green Flans! No one is going to want that.
We’ve achieved maximum flan (for the time being).
If you are interested in playing around with the pretrained model, you can check it out here with the pretrained function. It will load up the trained model and generate flans for you to explore and bring to your dinner parties.
Wrapping up, training GANs is a lot of fun. With MXNet, you can bring the fun with you to Clojure.
Want more? Check out this Clojure Conj video – Can You GAN?