Neural Networks in Clojure With core.matrix
After having spent some time recently looking at top-down AI, I thought I would spend some time looking at bottom-up AI: machine learning and neural networks.
I was pleasantly introduced to @mikera’s core.matrix at Clojure Conj this year and wanted to try making my own neural network using the library. The purpose of this post is to share what I learned along the way.
What is a neural network?
A neural network is an approach to machine learning that involves simulating (in an idealized way) the way our brains work on a biological level. There are three kinds of layers in a neural network: the input layer, the hidden layers, and the output layer. Each layer consists of neurons that have a value. Each neuron in a layer is connected to every neuron in the next layer by a connection strength. To get data into the neural network, you assign values to the input layer (values between 0 and 1). These values are then “fed forward” to the hidden layer neurons through an algorithm that relies on the input values and the connection strengths. The values are finally “fed forward” in a similar fashion to the output layer. The “learning” portion of the neural network comes from “training” the network on data. The training data consists of a collection of associated input values and target values. The training process at a high level looks like this:
- Feed forward input values to get the output values
- Measure how far off the output values are from the target values
- Calculate the error values and adjust the strengths of the network
- Repeat until you think it has “learned” enough, that is, until feeding in the input values gives output values that are close enough to the targets you are looking for
The beauty of this system is that the neural network (given the right configuration and the right training) can approximate a remarkably wide range of functions, just by being exposed to data.
Start Small
I wanted to start with a very small network so that I could understand the algorithms and actually do the maths for the tests along the way. The network configuration I chose is one with 1 hidden layer. The input layer has 2 neurons, the hidden layer has 3 neurons and the output layer has 2 neurons.
In this example we have:
- Input Neurons: neuronA neuronB
- Hidden Neurons: neuron1 neuron2 neuron3
- Output Neurons: neuronC neuronD
- Connections between the Input and Hidden Layers
- neuronA-neuron1
- neuronA-neuron2
- neuronA-neuron3
- neuronB-neuron1
- neuronB-neuron2
- neuronB-neuron3
- Connections between the Hidden and Output Layers
- neuron1-neuronC
- neuron1-neuronD
- neuron2-neuronC
- neuron2-neuronD
- neuron3-neuronC
- neuron3-neuronD
To give us a concrete example to work with, let’s actually assign all our neurons and connection strengths to some real values.
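As a sketch of what that can look like, here is one set of values held in plain Clojure vectors. The var names and the numbers are illustrative assumptions; the strengths are chosen so that feeding [1 0] forward with tanh reproduces the output values quoted later in this post.

```clojure
;; Illustrative values only (assumptions, not necessarily the original numbers).
(def input-neurons [1 0])

;; One row per input neuron (neuronA, neuronB),
;; one column per hidden neuron (neuron1, neuron2, neuron3).
(def input-hidden-strengths [[0.12 0.2 0.13]
                             [0.01 0.02 0.03]])

;; Hidden neurons start at zero until we feed values forward.
(def hidden-neurons [0 0 0])

;; One row per hidden neuron, one column per output neuron (neuronC, neuronD).
(def hidden-output-strengths [[0.15 0.16]
                              [0.02 0.03]
                              [0.01 0.02]])
```

Storing each set of connections as rows of the source layer means a whole layer can be fed forward with a single vector-matrix multiplication.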
Feed Forward
Alright, we have values in the input neuron layer, so let’s feed them forward through the network. The new value of a neuron in the hidden layer is the sum of all the inputs of its connections multiplied by the connection strengths. The neuron can also have its own threshold (meaning you would subtract the threshold from the sum of inputs), but to keep things as simple as possible in this example, the threshold is 0, so we will ignore it. The sum is then fed into an activation function that has an output in the range of -1 to 1. The activation function is the tanh function. We will also need the derivative of the tanh function a little later when we are calculating errors, so we will define both here.
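A minimal sketch of these pieces, assuming core.matrix is on the classpath and aliased as m. The function names are assumptions. mmul computes the sum of inputs times strengths for a whole layer in one call, and emap applies the activation to every element of the result.

```clojure
(require '[clojure.core.matrix :as m])

(defn activation-fn
  "tanh squashes any real number into the range (-1, 1)."
  [x]
  (Math/tanh (double x)))

(defn dactivation-fn
  "Derivative of tanh, written in terms of the neuron's output y = tanh(x)."
  [y]
  (- 1.0 (* y y)))

(defn layer-activation
  "Feeds `inputs` forward through `strengths`, where `strengths` has one row
  per input neuron and one column per neuron in the next layer."
  [inputs strengths]
  (m/emap activation-fn (m/mmul inputs strengths)))
```

Writing the derivative in terms of the output y is convenient because the neuron values we keep around are already the tanh outputs.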
Note how nicely core.matrix handles multiplying vectors <3.
So now if we calculate the hidden neuron values from the input [1 0], we get:
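A sketch of that calculation, assuming core.matrix is available and using assumed input-to-hidden strengths (illustrative numbers, not necessarily the original ones):

```clojure
(require '[clojure.core.matrix :as m])

(defn layer-activation [inputs strengths]
  (m/emap #(Math/tanh (double %)) (m/mmul inputs strengths)))

;; Assumed strengths: rows are neuronA and neuronB, columns are neuron1..neuron3.
(def input-hidden-strengths [[0.12 0.2 0.13]
                             [0.01 0.02 0.03]])

;; Only neuronA is "on", so each hidden value is tanh of neuronA's strength.
(def new-hidden-neurons
  (layer-activation [1 0] input-hidden-strengths))
;; roughly [0.1194 0.1974 0.1293]
```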
Let’s just remember those hidden neuron values for our next step
Now we do the same thing to calculate the output values
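As a sketch, the same helper can be applied twice, once per layer. The strengths are assumed, illustrative values; with them, the result agrees with the answer quoted just below.

```clojure
(require '[clojure.core.matrix :as m])

(defn layer-activation [inputs strengths]
  (m/emap #(Math/tanh (double %)) (m/mmul inputs strengths)))

;; Assumed strengths (illustrative values).
(def input-hidden-strengths [[0.12 0.2 0.13]
                             [0.01 0.02 0.03]])
(def hidden-output-strengths [[0.15 0.16]
                              [0.02 0.03]
                              [0.01 0.02]])

;; Input layer -> hidden layer.
(def new-hidden-neurons
  (layer-activation [1 0] input-hidden-strengths))

;; Hidden layer -> output layer, using exactly the same function.
(def new-output-neurons
  (layer-activation new-hidden-neurons hidden-output-strengths))
;; roughly [0.0232 0.0276]
```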
Alright! We have our answer [0.02315019005321053 0.027608061500083565]. Notice that the values are pretty much the same. This is because we haven’t trained our network to do anything yet.
Backwards Propagation
To train our network, we have to let it know what the answer (or target) should be, so we can calculate the errors and finally update our connection strengths. For this simple example, let’s just invert the data, so that an input of [1 0] should give us an output of [0 1].
Calculate the errors of the output layer
The first errors that we need to calculate are the ones for the output layer. Each one is found by subtracting the actual output value from the target value and then multiplying by the gradient (derivative) of the activation function.
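A minimal sketch of that calculation with core.matrix. The function and var names are assumptions; the output values are the ones quoted earlier in this post and the target is [0 1].

```clojure
(require '[clojure.core.matrix :as m])

(defn dactivation-fn
  "Derivative of tanh in terms of the output y = tanh(x)."
  [y]
  (- 1.0 (* y y)))

(defn output-deltas
  "Element-wise: gradient at each output times (target - output)."
  [targets outputs]
  (m/mul (m/emap dactivation-fn outputs)
         (m/sub targets outputs)))

(def targets [0 1])
(def new-output-neurons [0.02315019005321053 0.027608061500083565])

(def odeltas (output-deltas targets new-output-neurons))
;; roughly [-0.0231 0.9717]: neuronC should go down a little, neuronD up a lot
```

The sign matters here: with target minus output, a positive delta means the neuron's value should increase, which fits an update rule that adds the weight change.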
Great, let’s remember these output deltas for later.
Calculate the errors of the hidden layer
The errors of the hidden layer are based on the deltas that we just found for the output layer. In fact, for each hidden neuron, the error delta is the gradient of the activation function multiplied by the sum of the output deltas of the connected output neurons, each weighted by its connection strength. This should remind you of the forward propagation of the inputs, but this time we are doing it backwards with the error deltas.
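Sketched with core.matrix, the weighted sum becomes a matrix-vector product. The names and the numbers are assumptions: the strengths are illustrative, and the output deltas are rounded values that follow from them.

```clojure
(require '[clojure.core.matrix :as m])

(defn dactivation-fn [y] (- 1.0 (* y y)))

(defn hlayer-deltas
  "For each hidden neuron: gradient at its value times the sum of the
  output deltas weighted by its outgoing strengths."
  [odeltas neurons strengths]
  (m/mul (m/emap dactivation-fn neurons)
         (m/mmul strengths odeltas)))

;; Assumed values carried over from the earlier steps.
(def hidden-output-strengths [[0.15 0.16]
                              [0.02 0.03]
                              [0.01 0.02]])
(def new-hidden-neurons (mapv #(Math/tanh (double %)) [0.12 0.2 0.13]))
(def odeltas [-0.02313778 0.97165077])

(def hdeltas
  (hlayer-deltas odeltas new-hidden-neurons hidden-output-strengths))
;; roughly [0.1498 0.0276 0.0189]
```

Because the strengths are stored with one row per hidden neuron, multiplying the matrix by the output deltas gives one weighted sum per hidden neuron without transposing anything.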
Great let’s remember the hidden layer error deltas for later
Updating the connection strengths
Great! We have all the error deltas, now we are ready to go ahead and update the connection strengths. In general this is the same process for both the hidden-output connections and the input-hidden connections.
- weight-change = error-delta * neuron-value
- new-weight = weight + learning rate * weight-change
The learning rate controls how fast the weights should be adjusted in response to the errors. If the learning rate is too high, there is a danger that the network overshoots and settles on a poor solution, or fails to settle at all. If the learning rate is too low, it may take a very long time to converge on a good solution with the training data that it is using. For this example, let’s use a learning rate of 0.2.
Update the hidden-output strengths
To update this layer, we are going to use:
- weight-change = odelta * hidden value
- new-weight = weight + (learning rate * weight-change)
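A sketch of that update, assuming core.matrix and the learning rate of 0.2. The function name and the numeric values are assumptions (illustrative strengths, with rounded deltas that follow from them).

```clojure
(require '[clojure.core.matrix :as m])

(defn update-strengths
  "new-weight = weight + lrate * (delta * neuron-value), for every connection.
  `neurons` is the source layer, `deltas` belong to the destination layer."
  [deltas neurons strengths lrate]
  (let [weight-changes (mapv #(m/mul deltas %) neurons)] ; one row per source neuron
    (m/add strengths (m/mul weight-changes lrate))))

(def learning-rate 0.2)

;; Assumed values carried over from the earlier steps.
(def hidden-output-strengths [[0.15 0.16]
                              [0.02 0.03]
                              [0.01 0.02]])
(def new-hidden-neurons (mapv #(Math/tanh (double %)) [0.12 0.2 0.13]))
(def odeltas [-0.02313778 0.97165077])

(def new-hidden-output-strengths
  (update-strengths odeltas new-hidden-neurons
                    hidden-output-strengths learning-rate))
;; the strengths into neuronD grow, the strengths into neuronC shrink slightly
```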
Of course, let’s remember these values too
Update the input-hidden strengths
We are going to do the same thing with the input-hidden strengths too.
- weight-change = hdelta * input value
- new-weight = weight + (learning rate * weight-change)
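The same sketch applies to this layer, with the input neurons as the source values and the hidden deltas as the errors. As before, the names and numbers are assumptions.

```clojure
(require '[clojure.core.matrix :as m])

(defn update-strengths
  [deltas neurons strengths lrate]
  (let [weight-changes (mapv #(m/mul deltas %) neurons)]
    (m/add strengths (m/mul weight-changes lrate))))

(def learning-rate 0.2)

;; Assumed values carried over from the earlier steps.
(def input-neurons [1 0])
(def input-hidden-strengths [[0.12 0.2 0.13]
                             [0.01 0.02 0.03]])
(def hdeltas [0.14982559 0.02756922 0.01888075])

(def new-input-hidden-strengths
  (update-strengths hdeltas input-neurons
                    input-hidden-strengths learning-rate))
;; neuronB's row is unchanged because its input value was 0
```

An input of 0 contributes nothing to the weight change, so only the connections leaving neuronA move on this training example.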
These are our new strengths
Putting the pieces together
We have done it! In our simple example we have:
- Forward propagated the input to get the output
- Calculated the errors from the target through backpropagation
- Updated the connection strengths/weights
We just need to put all the pieces together. We’ll do this with the values that we got earlier to make sure it is all working.
Construct a network representation
It would be nice if we could represent an entire neural network in a data structure. That way the whole transformation of feeding forward and training the network could give us a new network back. So let’s define the data structure as [input-neurons input-hidden-strengths hidden-neurons hidden-output-strengths output-neurons].
We will start off with all the values of the neurons being zero.
Generalized feed forward
Now we can make a feed forward function that takes this network and constructs a new network based on input values and the layer-activation function that we defined earlier.
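A minimal sketch of such a function, assuming core.matrix, the five-element network layout described above, and illustrative strengths. The names nn and feed-forward are assumptions.

```clojure
(require '[clojure.core.matrix :as m])

(defn layer-activation [inputs strengths]
  (m/emap #(Math/tanh (double %)) (m/mmul inputs strengths)))

;; [input-neurons input-hidden-strengths hidden-neurons
;;  hidden-output-strengths output-neurons], with all neurons starting at zero.
(def nn [[0 0]
         [[0.12 0.2 0.13] [0.01 0.02 0.03]]
         [0 0 0]
         [[0.15 0.16] [0.02 0.03] [0.01 0.02]]
         [0 0]])

(defn feed-forward
  "Returns a new network whose neuron values reflect `input`.
  The strengths are passed through untouched."
  [input network]
  (let [[_ i-h-strengths _ h-o-strengths _] network
        new-h (layer-activation input i-h-strengths)
        new-o (layer-activation new-h h-o-strengths)]
    [input i-h-strengths new-h h-o-strengths new-o]))

(def fed-nn (feed-forward [1 0] nn))
;; (nth fed-nn 4) is roughly [0.0232 0.0276]
```

Returning a whole new network keeps the function pure, so every step of training is just a transformation from one network value to the next.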
This should match up with the values that we got earlier when we were just working on the individual pieces.
Generalized update weights / connection strengths
We can make a similar update-weights function that calculates the errors and returns a new network with the updated weights.
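One way to sketch it, assuming core.matrix, the same network layout, and illustrative strengths. All function names are assumptions. Note that the hidden deltas are computed from the old hidden-output strengths, before either set of strengths is updated.

```clojure
(require '[clojure.core.matrix :as m])

(defn dactivation-fn [y] (- 1.0 (* y y)))

(defn layer-activation [inputs strengths]
  (m/emap #(Math/tanh (double %)) (m/mmul inputs strengths)))

(defn output-deltas [targets outputs]
  (m/mul (m/emap dactivation-fn outputs) (m/sub targets outputs)))

(defn hlayer-deltas [odeltas neurons strengths]
  (m/mul (m/emap dactivation-fn neurons) (m/mmul strengths odeltas)))

(defn update-strengths [deltas neurons strengths lrate]
  (m/add strengths (m/mul (mapv #(m/mul deltas %) neurons) lrate)))

(defn feed-forward [input network]
  (let [[_ i-h _ h-o _] network
        new-h (layer-activation input i-h)
        new-o (layer-activation new-h h-o)]
    [input i-h new-h h-o new-o]))

(defn update-weights
  "Expects a network that has already been fed forward.
  Returns the same neurons with both sets of strengths adjusted."
  [network target learning-rate]
  (let [[in i-h h h-o out] network
        o-deltas (output-deltas target out)
        h-deltas (hlayer-deltas o-deltas h h-o)
        new-h-o  (update-strengths o-deltas h h-o learning-rate)
        new-i-h  (update-strengths h-deltas in i-h learning-rate)]
    [in new-i-h h new-h-o out]))

;; Illustrative network (assumed strengths), fed forward and then updated.
(def nn [[0 0] [[0.12 0.2 0.13] [0.01 0.02 0.03]] [0 0 0]
         [[0.15 0.16] [0.02 0.03] [0.01 0.02]] [0 0]])
(def new-nn (update-weights (feed-forward [1 0] nn) [0 1] 0.2))
```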
This too should match up with the pieces from the earlier examples.
Generalized train network
Now we can make a function that takes input and a target and feeds the input forward and then updates the weights.
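A sketch of how the pieces can compose, assuming core.matrix and illustrative strengths. The names train-network and ff are assumptions; ff is a small hypothetical helper that returns only the output neurons.

```clojure
(require '[clojure.core.matrix :as m])

(defn dactivation-fn [y] (- 1.0 (* y y)))
(defn layer-activation [inputs strengths]
  (m/emap #(Math/tanh (double %)) (m/mmul inputs strengths)))
(defn output-deltas [targets outputs]
  (m/mul (m/emap dactivation-fn outputs) (m/sub targets outputs)))
(defn hlayer-deltas [odeltas neurons strengths]
  (m/mul (m/emap dactivation-fn neurons) (m/mmul strengths odeltas)))
(defn update-strengths [deltas neurons strengths lrate]
  (m/add strengths (m/mul (mapv #(m/mul deltas %) neurons) lrate)))

(defn feed-forward [input network]
  (let [[_ i-h _ h-o _] network
        new-h (layer-activation input i-h)]
    [input i-h new-h h-o (layer-activation new-h h-o)]))

(defn update-weights [network target learning-rate]
  (let [[in i-h h h-o out] network
        o-deltas (output-deltas target out)
        h-deltas (hlayer-deltas o-deltas h h-o)]
    [in
     (update-strengths h-deltas in i-h learning-rate)
     h
     (update-strengths o-deltas h h-o learning-rate)
     out]))

(defn train-network
  "One training step: feed the input forward, then adjust the weights."
  [network input target learning-rate]
  (update-weights (feed-forward input network) target learning-rate))

(defn ff
  "Hypothetical helper: just the output neurons for an input."
  [input network]
  (last (feed-forward input network)))

;; Illustrative network (assumed strengths) and a few training steps.
(def nn [[0 0] [[0.12 0.2 0.13] [0.01 0.02 0.03]] [0 0 0]
         [[0.15 0.16] [0.02 0.03] [0.01 0.02]] [0 0]])
(def n1 (-> nn
            (train-network [1 0] [0 1] 0.2)
            (train-network [0.5 0] [0 0.5] 0.2)
            (train-network [0.25 0] [0 0.25] 0.2)))
;; (ff [1 0] n1) has a larger second value than (ff [1 0] nn)
```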
Try it out!
We are ready to try it out! Let’s train our network on a few examples of inverting the data.
We’ll also make a helper function that just returns the output neurons for the feed-forward function.
Let’s look at the results of the untrained and the trained networks
Whoa! The trained example isn’t perfect, but we can see that it is getting closer to the right answer. It is learning!
MOR Training Data
Well, this is really cool and it is working. But it would be nicer to be able to present a set of training data for it to learn from. For example, it would be nice to have a training data structure that is simply a sequence of [input target] pairs.
Let’s go ahead and define that.
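A minimal sketch, assuming core.matrix, illustrative strengths, and training data given as a sequence of [input target] pairs. The name train-data is an assumption; the reduce threads the network through one training step per pair.

```clojure
(require '[clojure.core.matrix :as m])

(defn dactivation-fn [y] (- 1.0 (* y y)))
(defn layer-activation [inputs strengths]
  (m/emap #(Math/tanh (double %)) (m/mmul inputs strengths)))
(defn output-deltas [targets outputs]
  (m/mul (m/emap dactivation-fn outputs) (m/sub targets outputs)))
(defn hlayer-deltas [odeltas neurons strengths]
  (m/mul (m/emap dactivation-fn neurons) (m/mmul strengths odeltas)))
(defn update-strengths [deltas neurons strengths lrate]
  (m/add strengths (m/mul (mapv #(m/mul deltas %) neurons) lrate)))

(defn feed-forward [input network]
  (let [[_ i-h _ h-o _] network
        new-h (layer-activation input i-h)]
    [input i-h new-h h-o (layer-activation new-h h-o)]))

(defn train-network [network input target learning-rate]
  (let [[in i-h h h-o out] (feed-forward input network)
        o-deltas (output-deltas target out)
        h-deltas (hlayer-deltas o-deltas h h-o)]
    [in
     (update-strengths h-deltas in i-h learning-rate)
     h
     (update-strengths o-deltas h h-o learning-rate)
     out]))

(defn train-data
  "Trains `network` on each [input target] pair in `data`, in order."
  [network data learning-rate]
  (reduce (fn [net [input target]]
            (train-network net input target learning-rate))
          network
          data))

;; Illustrative network (assumed strengths) and a small data set.
(def nn [[0 0] [[0.12 0.2 0.13] [0.01 0.02 0.03]] [0 0 0]
         [[0.15 0.16] [0.02 0.03] [0.01 0.02]] [0 0]])
(def n2 (train-data nn
                    [[[1 0] [0 1]]
                     [[0.5 0] [0 0.5]]
                     [[0.25 0] [0 0.25]]]
                    0.2))
```

Because reduce consumes any sequence, the same function works for a literal vector of pairs or for a finite slice of a lazy sequence.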
Let’s try that out on the example earlier
Cool. We can now train on data sets. That means we can construct data sets out of lazy sequences too, taking as many examples as we want. Let’s make a lazy training set of inputs and their inverse.
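A sketch of such a generator in plain Clojure. The name inverse-data and the choice of a random value between 0 and 1 are assumptions.

```clojure
(defn inverse-data
  "One random training pair: input [n 0] with target [0 n]."
  []
  (let [n (rand 1)]
    [[n 0] [0 n]]))

;; An infinite lazy sequence of training pairs; nothing is generated
;; until something asks for elements.
(def training-examples (repeatedly inverse-data))

;; Take only as many examples as you want to train on.
(def sample (take 5 training-examples))
```

A slice such as (take 400 training-examples) has the same [input target] shape as a hand-written data set, so it can be handed to whatever function trains on a collection of pairs.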
Let’s see how well our network is doing after we train it with some more data
Wow. The more examples it sees, the better that network is doing at learning what to do!
General Construct Network
The only piece that we are missing now is a function that will create a general neural network for us. We can choose how many input neurons, hidden neurons, and output neurons we want and have a network constructed with random weights.
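A minimal sketch in plain Clojure. The function names and the range of the random weights (small values between 0 and 0.1) are assumptions; other initialisation schemes are possible.

```clojure
(defn gen-strengths
  "A `from` x `to` matrix of small random strengths, as nested vectors:
  one row per source neuron, one column per destination neuron."
  [from to]
  (vec (repeatedly from
                   (fn [] (vec (repeatedly to #(rand 0.1)))))))

(defn construct-network
  "Builds [input-neurons input-hidden-strengths hidden-neurons
  hidden-output-strengths output-neurons] with all neurons at zero."
  [num-in num-hidden num-out]
  [(vec (repeat num-in 0))
   (gen-strengths num-in num-hidden)
   (vec (repeat num-hidden 0))
   (gen-strengths num-hidden num-out)
   (vec (repeat num-out 0))])

;; The same shape as the hand-built example: 2 inputs, 3 hidden, 2 outputs.
(def tnn (construct-network 2 3 2))
```

Starting from small random strengths rather than zeros gives each hidden neuron a different starting point, so they do not all learn exactly the same thing.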
Now we can construct our network from scratch and train it.
And that’s it. We have constructed a neural network with core.matrix.
Want more?
I put together a GitHub library based on the neural network code in this post. It is called K9, named after Dr. Who’s best dog friend. You can find the examples we have gone through in the tests. There is also an example program using it in the examples directory. It learns what colors are based on their RGB value.
There are a couple of web resources I would recommend if you want to look further as well.
Go forth and create Neural Networks!