There is an awesome new Clojure-first machine learning library called Cortex that was open sourced recently. I’ve been exploring it lately and wanted to share my discoveries so far in this post. In our exploration, we are going to tackle one of the classic classification problems of the internet. How do you tell the difference between a cat and dog pic?
Where to Start?
For any machine learning problem, we’re going to need data. For this, we can use Kaggle’s data for the Cats vs Dogs Challenge. The training data consists of 25,000 images of cats and dogs. That should be more than enough to train our computer to tell cats from doggies.
We also need some idea of how to train against the data. Luckily, the Cortex project has a very nice set of examples to help you get started. In particular, there is a suite classification example using the MNIST (handwritten digit) corpus. This example contains a number of cutting-edge features that we’ll want to use:
- Uses GPU for fast computation.
- Uses a deep, multi-layered convolutional network for feature recognition.
- Has “forever” training by image augmentation.
- Saves the network configuration as it trains to an external nippy file so that it can be imported later.
- Has a really nice ClojureScript front end to visualize the training progress with a confusion matrix.
- Has a way to import the saved nippy network configuration and perform inference on it to classify a new image.
Basically, it has everything we need to hit the ground running.
To use the example’s forever training, we need to get the data into the right form. All the images need to be the same size and sit in a directory structure that is split into training and testing images. Furthermore, we want all the dog images under a “dog” directory and all the cat images under a “cat” directory, so that the indexed images under them carry the correct “label”. That gives us a training directory and a testing directory, each with a “cat” and a “dog” subdirectory full of indexed images.
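As a rough sketch of how an image could be routed into the layout described above: the Kaggle training files are named like cat.0.jpg and dog.42.jpg, so the label can be read from the file name. The directory names and the helper functions `label-of` and `target-file` are made up for illustration and are not taken from the Cortex example.

```clojure
(require '[clojure.string :as str]
         '[clojure.java.io :as io])

;; Hypothetical directory names; only the cat/dog split comes from the post.
(def training-dir "data/cats-dogs-training")
(def testing-dir  "data/cats-dogs-testing")

(defn label-of
  "Returns the text before the first dot of a Kaggle file name,
  e.g. \"dog\" for \"dog.42.jpg\"."
  [filename]
  (first (str/split filename #"\.")))

(defn target-file
  "Builds the destination file for image number `idx` under
  <root-dir>/<label>/<idx>.png."
  [root-dir filename idx]
  (io/file root-dir (label-of filename) (str idx ".png")))

;; Example: where the 7th training image would go if it came from dog.42.jpg
(target-file training-dir "dog.42.jpg" 7)
```

Keeping the label in the directory name means nothing else has to remember which picture is which.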
For this task, we are going to use a couple of image libraries to help us out. With them, we can resize and rewrite the original images into the form we want. For an image size, we’re going to go with 52x52. The choice is arbitrary: I wanted it bigger than the 28x28 MNIST images so it will be easier to see, but not so big that it kills my CPU. This matters even more because we want to use RGB color, which is 3 channels as opposed to MNIST’s single grayscale channel. That works out to 52 x 52 x 3 = 8,112 input values per image, compared with 784 for MNIST.
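Here is a minimal sketch of the resizing step. It uses plain Java interop (`java.awt` and `javax.imageio`) rather than the image libraries used in the post, so treat it as one possible way to do it, not the code from the project. The names `resize` and `resize-file!` are hypothetical.

```clojure
(require '[clojure.java.io :as io])
(import '(java.awt Graphics2D RenderingHints)
        '(java.awt.image BufferedImage)
        '(javax.imageio ImageIO))

(def image-size 52) ; width and height chosen in this post

(defn resize
  "Returns a new size x size RGB copy of `src`, scaled with bilinear interpolation."
  ^BufferedImage [^BufferedImage src size]
  (let [out (BufferedImage. size size BufferedImage/TYPE_INT_RGB)
        ^Graphics2D g (.createGraphics out)]
    (try
      (.setRenderingHint g RenderingHints/KEY_INTERPOLATION
                         RenderingHints/VALUE_INTERPOLATION_BILINEAR)
      ;; Draw the source stretched (or shrunk) to fill the whole target.
      (.drawImage g src 0 0 (int size) (int size) nil)
      (finally (.dispose g)))
    out))

(defn resize-file!
  "Reads `in-file`, resizes it to image-size, and writes it to `out-file` as a PNG."
  [in-file out-file]
  (let [src (ImageIO/read (io/file in-file))]
    (io/make-parents out-file)
    (ImageIO/write (resize src image-size) "png" (io/file out-file))))
```

Note that this squashes non-square pictures instead of cropping them, which is the simplest choice but does distort the animals a little.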
As far as the split between training images and testing images goes, we are going to go for a simple even split.
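An even split could be sketched like this. The shuffle and the fixed seed are my own additions so that the two halves are not ordered by file name and the split is repeatable; `even-split` is a hypothetical helper.

```clojure
(defn even-split
  "Shuffles `files` with a fixed seed and splits them into two halves.
  With an odd count, the testing half gets the extra item."
  [files seed]
  (let [shuffled (java.util.ArrayList. ^java.util.Collection files)
        _        (java.util.Collections/shuffle shuffled (java.util.Random. seed))
        half     (quot (count shuffled) 2)
        [train test] (split-at half shuffled)]
    {:training (vec train)
     :testing  (vec test)}))

;; With the 25,000 Kaggle images this gives 12,500 images per half.
(let [{:keys [training testing]} (even-split (range 25000) 42)]
  [(count training) (count testing)])
;; => [12500 12500]
```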
The network layer configuration is the meat of the whole thing. We are going to go with the exact same network description as the MNIST example.
It uses a series of convolutional layers with max pooling for feature recognition. We’ll see if it works as well for color pictures of cats and dogs as it does for handwritten digits.
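To get a feel for what convolution and max pooling do to a 52x52 input, here is a small arithmetic sketch. The layer parameters (5x5 kernels with stride 1, 2x2 pooling with stride 2, no padding) are assumptions picked for illustration and are not read from the Cortex example. The output size of each layer follows the usual formula floor((in + 2*pad - kernel) / stride) + 1.

```clojure
(defn out-dim
  "Output width (or height) of a convolution or pooling layer."
  [in kernel pad stride]
  (inc (quot (- (+ in (* 2 pad)) kernel) stride)))

;; Hypothetical stack of [type kernel pad stride]; not the example's real layers.
(def hypothetical-stack
  [[:conv 5 0 1]
   [:pool 2 0 2]
   [:conv 5 0 1]
   [:pool 2 0 2]])

;; Walk a 52-pixel-wide image through the stack.
(reductions (fn [dim [_ kernel pad stride]] (out-dim dim kernel pad stride))
            52
            hypothetical-stack)
;; => (52 48 24 20 10)
```

Each convolution trims the border and each pooling layer halves the size, so the later layers see a much smaller, more abstract picture.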
We’ll also keep the image augmentation the same as in the example.
It injects one augmented image into our training data by slightly rotating it and adding noise.
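The idea of rotating slightly and adding noise can be sketched with plain Java interop. This is not the augmentation code from the Cortex example; `rotate-slightly` and `add-noise` are hypothetical helpers, and the noise function assumes pixel values already scaled to the range 0 to 1.

```clojure
(import '(java.awt Graphics2D)
        '(java.awt.image BufferedImage)
        '(java.util Random))

(defn rotate-slightly
  "Returns a same-sized RGB copy of `src` rotated by `degrees` around its centre.
  Corners that rotate out of frame are cut off; uncovered areas stay black."
  ^BufferedImage [^BufferedImage src degrees]
  (let [w   (.getWidth src)
        h   (.getHeight src)
        out (BufferedImage. w h BufferedImage/TYPE_INT_RGB)
        ^Graphics2D g (.createGraphics out)]
    (try
      (.rotate g (Math/toRadians degrees) (/ w 2.0) (/ h 2.0))
      (.drawImage g src 0 0 w h nil)
      (finally (.dispose g)))
    out))

(defn add-noise
  "Adds uniform noise in [-amount, amount] to each pixel value,
  clamping the result to [0, 1]."
  [pixels amount ^Random rng]
  (mapv (fn [p]
          (let [delta (* amount (- (* 2.0 (.nextDouble rng)) 1.0))]
            (-> (+ p delta) (max 0.0) (min 1.0))))
        pixels))
```

Because every pass can produce a slightly different picture, the network never sees exactly the same training set twice, which is what makes “forever” training worthwhile.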
It’s time to test it out. Using lein run, we’ll launch the training.
This opens a port to a localhost web page where we can view the progress.
The page shows the confusion matrix, which tracks the progress of the training in the classification: in particular, how many times it thought a cat was really a cat and how many times it got it wrong.
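If you have not met a confusion matrix before, it is just a count of every (actual, predicted) pair. A minimal sketch in plain Clojure, independent of the Cortex front end, with made-up sample labels:

```clojure
(defn confusion-matrix
  "Counts how often each [actual predicted] pair occurs."
  [actual predicted]
  (frequencies (map vector actual predicted)))

(defn accuracy
  "Fraction of predictions that match the actual label."
  [actual predicted]
  (/ (count (filter true? (map = actual predicted)))
     (double (count actual))))

;; Made-up labels for four test images.
(def actual    [:cat :cat :dog :dog])
(def predicted [:cat :dog :dog :dog])

(confusion-matrix actual predicted)
;; => {[:cat :cat] 1, [:cat :dog] 1, [:dog :dog] 2}

(accuracy actual predicted)
;; => 0.75
```

The entries where both labels agree are the correct guesses; everything else is a cat mistaken for a dog or the other way around.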
As we train on the data, the loss for each epoch is shown on the console, along with a message whenever it saves the network to the external file.
After only thirty minutes of training on my MacBook Pro, we get some pretty good results, with the correct percentage in the 99s.
It’s time to do some inference on our trained network.
Firing up a REPL, we can connect to our namespace and use the label-one function from the Cortex example to spot check our classification. It reads in the external nippy file that contains the trained network description, takes a random image from the testing directory, and classifies it.
(label-one) gives us a picture and classifies it as a cat. Yippee!
Not bad, but let’s try it with something harder. Personally, I’m not even sure whether this is a cat or a dog.
Feeding it through the program – it says it is a cat.
After much debate on the internet, I think that is the best answer the humans got too :)
So it seems like we have a pretty good model. Why don’t we submit our results to the Kaggle competition and see how it rates? All they need is for us to run the classification against their test data of 12,500 images and classify each one as 1 = dog or 0 = cat in CSV format.
We will take each image and resize it, then feed it into Cortex’s infer-n-observations function to do all our classification as a batch.
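Turning a batch of results into Kaggle labels might look like the sketch below. It assumes each result is a vector of per-class scores ordered cat first, dog second; that ordering and the shape of the results are assumptions on my part, so check them against what the inference call actually returns. `argmax` and `scores->labels` are hypothetical helpers.

```clojure
;; Assumed class order: index 0 is cat, index 1 is dog,
;; which lines up with Kaggle's 0 = cat, 1 = dog.
(def class-names ["cat" "dog"])

(defn argmax
  "Index of the largest score in `scores`."
  [scores]
  (first (apply max-key second (map-indexed vector scores))))

(defn scores->labels
  "Maps a batch of score vectors to a vector of Kaggle labels (0 or 1)."
  [batch]
  (mapv argmax batch))

;; Made-up scores for three images.
(scores->labels [[0.9 0.1] [0.2 0.8] [0.6 0.4]])
;; => [0 1 0]
```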
Finally, we just need to format our results as a CSV file and export it.
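A minimal sketch of the export step in plain Clojure follows. The id,label header is my assumption about the submission format, so check it against the sample submission file from Kaggle. `submission-csv` and `write-submission!` are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn submission-csv
  "Builds CSV text from a sequence of [id label] pairs,
  starting with an assumed id,label header row."
  [rows]
  (str/join "\n"
            (cons "id,label"
                  (map (fn [[id label]] (str id "," label)) rows))))

(defn write-submission!
  "Writes the rows to `path` as a CSV file with a trailing newline."
  [path rows]
  (spit path (str (submission-csv rows) "\n")))

;; Made-up predictions: image 1 is a cat (0), image 2 is a dog (1).
(submission-csv [[1 0] [2 1]])
;; => "id,label\n1,0\n2,1"
```

Since the ids and labels are plain integers, there is no quoting or escaping to worry about, which is why simple string joining is enough here.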
After uploading the file to Kaggle, I was pleased that the answer got in the top 91%! It made it onto the leaderboard.
Using an example setup from the Cortex project and 30 minutes of processing time on my laptop, we were able to crunch through some significant data and come up with a trained classification model that was good enough to make the charts in the Kaggle competition. On top of it all, it is in pure Clojure.
In my mind, this is truly impressive, and even though the Cortex library is in its early phases, it is on track to be as useful a tool as TensorFlow for machine learning.
Earlier this month, I watched an ACM Learning webcast with Peter Norvig speaking on AI. In it, he spoke of one of the next challenges of AI, which is to combine symbolic with neural. I can think of no better language than Clojure, with its simplicity, power, and rich LISP heritage, to take on the challenge for the future. With the Cortex library, it’s off to a great start.
If you want to see all the cats vs dogs Kaggle code, it’s out on GitHub here: https://github.com/gigasquid/kaggle-cats-dogs