Neural Networks in Clojure With core.matrix

After spending some time recently looking at top-down AI, I thought I would spend some time looking at bottom-up AI: machine learning and neural networks.

I was pleasantly introduced to @mikea’s core.matrix at Clojure Conj this year and wanted to try making my own neural network using the library. This post shares what I learned along the way.

What is a neural network?

A neural network is an approach to machine learning that simulates, in an idealized way, how our brains work on a biological level. A neural network has three kinds of layers: the input layer, the hidden layers, and the output layer. Each layer consists of neurons that have a value. Each neuron in a layer is connected to every neuron in the next layer by a connection strength. To get data into the neural network, you assign values to the input layer (values between 0 and 1). These values are then “fed forward” to the hidden layer neurons through an algorithm that relies on the input values and the connection strengths. The values are finally “fed forward” in a similar fashion to the output layer. The “learning” portion of the neural network comes from “training” the network on data. The training data consists of a collection of associated input values and target values. The training process at a high level looks like this:

  • Feed the input values forward to get the output values
  • Measure how far the output values are from the target values
  • Calculate the error values and adjust the connection strengths of the network
  • Repeat until you think it has “learned” enough, that is, until feeding in the input values produces output values close enough to the targets you are looking for

The beauty of this system is that the neural network, given the right configuration and the right training, can approximate a very wide range of functions just by being exposed to data.

Start Small

I wanted to start with a very small network so that I could understand the algorithms and actually do the maths for the tests along the way. The network configuration I chose is one with 1 hidden layer. The input layer has 2 neurons, the hidden layer has 3 neurons and the output layer has 2 neurons.

;;Neurons
;;  Input Hidden  Output
;;  A     1       C
;;  B     2       D
;;        3


;; Connection Strengths
;; Input to Hidden => [[A1 A2 A3] [B1 B2 B3]]
;; Hidden to Output => [[1C 1D] [2C 2D] [3C 3D]]

In this example we have:

  • Input Neurons: neuronA neuronB
  • Hidden Neurons: neuron1 neuron2 neuron3
  • Output Neurons: neuronC neuronD
  • Connections between the Input and Hidden Layers
    • neuronA-neuron1
    • neuronA-neuron2
    • neuronA-neuron3
    • neuronB-neuron1
    • neuronB-neuron2
    • neuronB-neuron3
  • Connections between the Hidden and Output Layers
    • neuron1-neuronC
    • neuron1-neuronD
    • neuron2-neuronC
    • neuron2-neuronD
    • neuron3-neuronC
    • neuron3-neuronD

To give us a concrete example to work with, let’s actually assign all our neurons and connection strengths to some real values.

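;; Assumed namespace setup (not shown in the original post; the name nn.core
;; is made up). The core.matrix operators replace the clojure.core arithmetic.
(ns nn.core
  (:refer-clojure :exclude [* + - ==])
  (:require [clojure.core.matrix :refer [transpose]]
            [clojure.core.matrix.operators :refer [* + - ==]]
            [clojure.test :refer [is testing]]))

(def input-neurons [1 0])
(def input-hidden-strengths [[0.12 0.2 0.13]
                             [0.01 0.02 0.03]])
(def hidden-neurons [0 0 0])
(def hidden-output-strengths [[0.15 0.16]
                              [0.02 0.03]
                              [0.01 0.02]])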

Feed Forward

Alright, we have values in the input neuron layer, so let’s feed them forward through the network. The new value of a neuron in the hidden layer is the sum of all the inputs of its connections, each multiplied by its connection strength. The neuron can also have its own threshold (meaning you would subtract the threshold from the sum of inputs), but to keep things as simple as possible in this example, the threshold is 0, so we will ignore it. The sum is then fed into an activation function that has an output in the range of -1 to 1. The activation function here is the tanh function. We will also need the derivative of the tanh function a little later when we are calculating errors, so we will define both here.

(def activation-fn (fn [x] (Math/tanh x)))
;; derivative of tanh, written in terms of the output y = tanh(x)
(def dactivation-fn (fn [y] (- 1.0 (* y y))))

(defn layer-activation
  "forward propagate the input of a layer"
  [inputs strengths]
  (mapv activation-fn
        (mapv #(reduce + %)
              (* inputs (transpose strengths)))))

Note how nicely core.matrix handles multiplying vectors <3.
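To see what the multiply and the row sums are doing, here is a minimal sketch in plain Clojure with no core.matrix dependency. The names weighted-sums and layer-activation-plain are made up for illustration, and the sketch assumes that strengths holds one row per input neuron, as input-hidden-strengths does.

```clojure
;; Plain-Clojure sketch (clojure.core only); the names are hypothetical.
(defn weighted-sums
  "For each neuron in the next layer, sums input * strength over all inputs.
  strengths has one row per input neuron."
  [inputs strengths]
  (apply mapv +
         (map (fn [in row] (mapv #(* in %) row)) inputs strengths)))

(defn layer-activation-plain
  "Applies tanh to each weighted sum."
  [inputs strengths]
  (mapv #(Math/tanh %) (weighted-sums inputs strengths)))

(weighted-sums [1 0] [[0.12 0.2 0.13]
                      [0.01 0.02 0.03]])
;=> [0.12 0.2 0.13]

;; tanh of those sums gives roughly [0.1194 0.1974 0.1293]
(layer-activation-plain [1 0] [[0.12 0.2 0.13]
                               [0.01 0.02 0.03]])
```

Because the second input is 0, the second row of strengths contributes nothing, so the hidden values depend only on the first row.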

So now if we calculate the hidden neuron values from the input [1 0], we get:

(layer-activation input-neurons input-hidden-strengths)
;=>  [0.11942729853438588 0.197375320224904 0.12927258360605834]

Let’s just remember those hidden neuron values for our next step

(def new-hidden-neurons
  (layer-activation input-neurons input-hidden-strengths))

Now we do the same thing to calculate the output values

(layer-activation new-hidden-neurons hidden-output-strengths)
;=>  [0.02315019005321053 0.027608061500083565]

(def new-output-neurons
  (layer-activation new-hidden-neurons hidden-output-strengths))

Alright! We have our answer [0.02315019005321053 0.027608061500083565]. Notice that the values are pretty much the same. This is because we haven’t trained our network to do anything yet.

Backwards Propagation

To train our network, we have to let it know what the answer (or target) should be, so we can calculate the errors and finally update our connection strengths. For this simple example, let’s just invert the data, so that an input of [1 0] should give us an output of [0 1].

(def targets [0 1])

Calculate the errors of the output layer

The first errors that we need to calculate are the ones for the output layer. Each one is found by subtracting the actual value from the target value and then multiplying by the gradient (derivative) of the activation function.

(defn output-deltas
  "measures the delta errors for the output layer: (desired value - actual value) multiplied by the gradient of the activation function"
  [targets outputs]
  (* (mapv dactivation-fn outputs)
     (- targets outputs)))

(output-deltas targets new-output-neurons)
;=> [-0.023137783141771645 0.9716507764442904]
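Note that dactivation-fn is applied to a neuron's output rather than to its summed input. That works because the slope of tanh can be written in terms of its own output: d/dx tanh(x) = 1 - tanh(x)^2. Here is a small sketch that checks this numerically in plain Clojure. The helper names numeric-slope, analytic-slope, and close? are made up for this illustration.

```clojure
;; Hypothetical helper: central finite-difference slope of tanh at x.
(defn numeric-slope [x]
  (let [h 1e-6]
    (/ (- (Math/tanh (+ x h)) (Math/tanh (- x h)))
       (* 2 h))))

;; The same slope computed from the output y = tanh(x), as dactivation-fn does.
(defn analytic-slope [x]
  (let [y (Math/tanh x)]
    (- 1.0 (* y y))))

(defn close? [a b tol]
  (< (Math/abs (double (- a b))) tol))

(close? (numeric-slope 0.5) (analytic-slope 0.5) 1e-6)
;=> true
```

The sign of each delta also tells you which way the output has to move: the first delta above is negative because the output is above its target of 0, and the second is positive because the output is far below its target of 1.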

Great, let’s remember these output deltas for later

(def odeltas (output-deltas targets new-output-neurons))

Calculate the errors of the hidden layer

The errors of the hidden layer are based on the deltas that we just found from the output layer. In fact, for each hidden neuron, the error delta is the gradient of the activation function multiplied by the sum of the output deltas of the connected output neurons, each weighted by its connection strength. This should remind you of the forward propagation of the inputs, but this time we are doing it backwards with the error deltas.

(defn hlayer-deltas [odeltas neurons strengths]
  (* (mapv dactivation-fn neurons)
     (mapv #(reduce + %)
           (* odeltas strengths))))

(hlayer-deltas
    odeltas
    new-hidden-neurons
    hidden-output-strengths)
;=>  [0.14982559238071416 0.027569216735265096 0.018880751432503236]
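As a sanity check, the delta for a single hidden neuron can be worked out by hand. The sketch below does this for neuron1 in plain Clojure, copying the numbers from the results above. The names hidden-delta, odeltas-ex, neuron1-out, and neuron1-strengths are made up for this illustration.

```clojure
;; Values copied from the earlier results in this post.
(def odeltas-ex [-0.023137783141771645 0.9716507764442904])
(def neuron1-out 0.11942729853438588)
(def neuron1-strengths [0.15 0.16]) ; neuron1-neuronC, neuron1-neuronD

;; Hypothetical helper: the error delta for one hidden neuron.
(defn hidden-delta [hidden-out strengths odeltas]
  (* (- 1.0 (* hidden-out hidden-out))        ; gradient of tanh at this neuron
     (reduce + (map * odeltas strengths))))   ; output deltas weighted by strengths

;; roughly 0.1498, matching the first hidden delta above
(hidden-delta neuron1-out neuron1-strengths odeltas-ex)
```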

Great, let’s remember the hidden layer error deltas for later

(def hdeltas (hlayer-deltas
              odeltas
              new-hidden-neurons
              hidden-output-strengths))

Updating the connection strengths

Great! We have all the error deltas, now we are ready to go ahead and update the connection strengths. In general this is the same process for both the hidden-output connections and the input-hidden connections.

  • weight-change = error-delta * neuron-value
  • new-weight = weight + learning rate * weight-change

The learning rate controls how fast the weights should be adjusted. If the learning rate is too high, there is a danger that the network will overshoot and settle too quickly on a poor solution instead of finding the best one. If the learning rate is too low, learning can be so slow that the network may never converge to the right solution with the training data it is given. For this example, let’s use a learning rate of 0.2.

(def learning-rate 0.2)

(defn update-strengths [deltas neurons strengths lrate]
  (+ strengths (* lrate
                  (mapv #(* deltas %) neurons))))
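The same update can be spelled out element by element. This is a minimal sketch in plain Clojure with no core.matrix dependency. The name update-strengths-plain and the toy numbers are made up for illustration. Each weight moves by learning rate * delta * neuron value, where the neuron is the one the connection starts from and the delta belongs to the neuron it ends at.

```clojure
;; Plain-Clojure sketch (clojure.core only); the name is hypothetical.
(defn update-strengths-plain [deltas neurons strengths lrate]
  (mapv (fn [neuron row]
          ;; one row of strengths per source neuron, one column per delta
          (mapv (fn [delta w] (+ w (* lrate delta neuron)))
                deltas row))
        neurons strengths))

(update-strengths-plain [0.5 -0.5]    ; deltas of the next layer
                        [1 0]         ; values of the source neurons
                        [[0.1 0.2]
                         [0.3 0.4]]
                        0.2)
;=> [[0.2 0.1] [0.3 0.4]]
```

The second row is unchanged because its source neuron has a value of 0, so a connection only learns when the neuron feeding it is active.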

Update the hidden-output strengths

When updating this layer, we use:

  • weight-change = odelta * hidden value
  • new-weight = weight + (learning rate * weight-change)

(update-strengths
       odeltas
       new-hidden-neurons
       hidden-output-strengths
       learning-rate)
;=> [[0.14944734341306073 0.18320832546991603]
;    [0.019086634528619688 0.06835597662949369]
;    [0.009401783798869296 0.04512156124675721]]

Of course, let’s remember these values too

(def new-hidden-output-strengths
  (update-strengths
       odeltas
       new-hidden-neurons
       hidden-output-strengths
       learning-rate))

Update the input-hidden strengths

We are going to do the same thing with the input-hidden strengths too.

  • weight-change = hdelta * input value
  • new-weight = weight + (learning rate * weight-change)

(update-strengths
       hdeltas
       input-neurons
       input-hidden-strengths
       learning-rate)
;=> [[0.14996511847614283 0.20551384334705303 0.13377615028650064]
;    [0.01 0.02 0.03]]

These are our new strengths

(def new-input-hidden-strengths
  (update-strengths
       hdeltas
       input-neurons
       input-hidden-strengths
       learning-rate))

Putting the pieces together

We have done it! In our simple example we have:

  • Forward propagated the input to get the output
  • Calculated the errors from the target through backpropagation
  • Updated the connection strengths / weights

We just need to put all the pieces together. We’ll do this with the values that we got earlier to make sure it is all working.

Construct a network representation

It would be nice if we could represent an entire neural network in a data structure. That way the whole transformation of feeding forward and training the network could give us a new network back. So let’s define the data structure as [input-neurons input-hidden-strengths hidden-neurons hidden-output-strengths output-neurons].

We will start off with all the values of the neurons being zero.

(def nn [[0 0] input-hidden-strengths hidden-neurons
         hidden-output-strengths [0 0]])

Generalized feed forward

Now we can make a feed forward function that takes this network and constructs a new network based on input values and the layer-activation function that we defined earlier.

(defn feed-forward [input network]
  (let [[in i-h-strengths h h-o-strengths out] network
        new-h (layer-activation input i-h-strengths)
        new-o (layer-activation new-h h-o-strengths)]
    [input i-h-strengths new-h h-o-strengths new-o]))

This should match up with the values that we got earlier when we were just working on the individual pieces.

(testing "feed forward"
  (is (== [input-neurons input-hidden-strengths new-hidden-neurons hidden-output-strengths new-output-neurons]
          (feed-forward [1 0] nn))))

Generalized update weights / connection strengths

We can make a similar update-weights function that calculates the errors and returns a new network with the updated weights

(defn update-weights [network target learning-rate]
  (let [[ in i-h-strengths h h-o-strengths out] network
        o-deltas (output-deltas target out)
        h-deltas (hlayer-deltas o-deltas h h-o-strengths)
        n-h-o-strengths (update-strengths
                         o-deltas
                         h
                         h-o-strengths
                         learning-rate)
        n-i-h-strengths (update-strengths
                         h-deltas
                         in
                         i-h-strengths
                         learning-rate)]
    [in n-i-h-strengths h n-h-o-strengths out]))

This too should match up with the pieces from the earlier examples.

(testing "update-weights"
  (is ( == [input-neurons
            new-input-hidden-strengths
            new-hidden-neurons
            new-hidden-output-strengths
            new-output-neurons]
           (update-weights (feed-forward [1 0] nn) [0 1] 0.2))))

Generalized train network

Now we can make a function that takes input and a target and feeds the input forward and then updates the weights.

(defn train-network [network input target learning-rate]
  (update-weights (feed-forward input network) target learning-rate))

(testing "train-network"
  (is (== [input-neurons
            new-input-hidden-strengths
            new-hidden-neurons
            new-hidden-output-strengths
           new-output-neurons]
          (train-network nn [1 0] [0 1] 0.2))))

Try it out!

We are ready to try it out! Let’s train our network on a few examples of inverting the data

(def n1 (-> nn
     (train-network [1 0] [0 1] 0.5)
     (train-network [0.5 0] [0 0.5] 0.5)
     (train-network [0.25 0] [0 0.25] 0.5)))

We’ll also make a helper function that just returns the output neurons for the feed-forward function.

(defn ff [input network]
  (last (feed-forward input network)))

Let’s look at the results of the untrained and the trained networks

;;untrained
(ff [1 0] nn) ;=> [0.02315019005321053 0.027608061500083565]
;;trained
(ff [1 0] n1) ;=> [0.03765676393050254 0.10552175312900794]

Whoa! The trained example isn’t perfect, but we can see that it is getting closer to the right answer. It is learning!
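One way to put a number on “getting closer” is the sum of squared differences between the target and the output. This is a minimal sketch in plain Clojure. The helper name squared-error is made up here, and the output values are copied from the results above.

```clojure
;; Hypothetical helper: sum of squared differences between target and output.
(defn squared-error [target output]
  (reduce + (map (fn [t o]
                   (let [d (- t o)]
                     (* d d)))
                 target output)))

;; Output values copied from the untrained and trained results above.
(def untrained-out [0.02315019005321053 0.027608061500083565])
(def trained-out   [0.03765676393050254 0.10552175312900794])

(squared-error [0 1] untrained-out) ; roughly 0.946
(squared-error [0 1] trained-out)   ; roughly 0.802
```

After only three training examples the error is still large, but it has moved in the right direction.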

More Training Data

Well this is really cool and it is working. But it would be nicer to be able to present a set of training data for it to learn on. For example, it would be nice to have a training data structure look like:

[ [input target] [input target] ... ]

Let’s go ahead and define that.

(defn train-data [network data learning-rate]
  (if-let [[input target] (first data)]
    (recur
     (train-network network input target learning-rate)
     (rest data)
     learning-rate)
    network))
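The recursion above is a fold over the training data, so the same idea could also be written with reduce. The sketch below is one possible alternative, not the version used in this post. The name train-data-reduce is made up, and the training step is passed in as train-fn so that the block runs on its own. toy-step is a made-up stand-in for train-network that treats the “network” as a single number.

```clojure
;; Hypothetical alternative to train-data, written as a reduce.
(defn train-data-reduce [train-fn network data learning-rate]
  (reduce (fn [net [input target]]
            (train-fn net input target learning-rate))
          network
          data))

;; Toy stand-in for train-network: the "network" is just a number that
;; moves by learning-rate * (target - input) on each example.
(defn toy-step [net input target lrate]
  (+ net (* lrate (- target input))))

(train-data-reduce toy-step 0 [[1 3] [2 2] [0 4]] 0.5)
;=> 3.0
```

With the real functions, the call would look like (train-data-reduce train-network nn data 0.5). Like the recur version, it needs a finite sequence of examples.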

Let’s try that out on the example earlier

(def n2 (train-data nn
                    [[[1 0] [0 1]]
                     [[0.5 0] [0 0.5]]
                     [[0.25 0] [0 0.25]]]
                    0.5))

(ff [1 0] n2) ;=> [0.03765676393050254 0.10552175312900794]

Cool. We can now train on data sets. That means we can construct data sets out of infinite lazy sequences too. Let’s make a lazy training set of inputs and their inverse.

(defn inverse-data []
  (let [n (rand 1)]
    [[n 0] [0 n]]))

Let’s see how well our network is doing after we train it with some more data

(def n3 (train-data nn (repeatedly 400 inverse-data) 0.5))

(ff [1 0] n3) ;=> [-4.958278484025221E-4 0.8211647699205362]
(ff [0.5 0] n3) ;=> [2.1645760787874696E-4 0.5579396715416916]
(ff [0.25 0] n3) ;=> [1.8183385523103048E-4 0.31130601296149013]

Wow. The more examples it sees, the better that network is doing at learning what to do!

General Construct Network

The only piece that we are missing now is a function that will create a general neural network for us. We can choose how many input neurons, hidden neurons, and output neurons we want and have a network constructed with random weights.

(defn gen-strengths
  "Returns `to` rows of `from` random strengths, each between 0 and 1/(to*from)."
  [to from]
  (let [l (* to from)]
    (map vec (partition from (repeatedly l #(rand (/ 1 l)))))))

(defn construct-network [num-in num-hidden num-out]
  (vec (map vec [(repeat num-in 0)
             (gen-strengths num-in num-hidden)
             (repeat num-hidden 0)
             (gen-strengths num-hidden num-out)
             (repeat num-out 0)])))

Now we can construct our network from scratch and train it.

(def tnn (construct-network 2 3 2))
(def n4 (train-data tnn (repeatedly 1000 inverse-data) 0.2))
;; exact values vary from run to run because the weights start out random
(ff [1 0] n4) ;=> [-4.954958580800465E-4 0.8160149309699489]

And that’s it. We have constructed a neural network with core.matrix.

Want more?

I put together a GitHub library based on the neural network code in the posts. It is called K9, named after Dr. Who’s best dog friend. You can find the examples we have gone through in the tests. There is also an example program using it in the examples directory. It learns what colors are based on their RGB values.

There are a couple of web resources I would recommend if you want to look further as well.

Go forth and create Neural Networks!
