Kolmogorov-Uspensky Machine

Published on:
Tags: All, Clojure

It happened again. I was sitting down reading a paper and I came across the phrase Kolmogorov-Uspensky machine and I had no idea what it was. My initial reaction was just to move on. It probably wasn’t important, I told myself, just a detail that I could skim over. I took a sip of my tea and continued on. In the next paragraph it appeared again. It was just sticking up like a thread waiting to be pulled. Still, I resisted. After all, I wasn’t even near my computer. I would have to get up and walk into the other room. After considering it for a moment, inertia won out and I continued my reading. There it was once more. This time right in the same paragraph, silently mocking me. I knew I had to do something, so I strode to my computer and pulled the thread.

What is a Kolmogorov-Uspensky machine?

The first thing I found is that the Kolmogorov-Uspensky machine (also referred to as KUM) is very similar to the Turing machine. In fact, it is in the same computational class: it is Turing-complete.

The Turing machine operates on a tape divided into cells. The head can move along the tape and read and write to it. The tape is the storage for the machine. The head’s movements are controlled by a collection of instructions which will be executed if certain prerequisites are met. The difference between the Turing machine and a Kolmogorov-Uspensky machine is that the KUM’s tape can change topology. It’s a graph.

The graph of a KUM is not just any graph. It’s a particular kind of graph: it must have the property that, starting from one vertex, all the other vertices are uniquely addressable. The graph also has an active node which, in turn, has an active neighborhood of other nodes associated with it. This is not unlike the Turing machine’s head pointing to the current cell. To figure out what to do, the KUM looks at the graph to decide which instruction to execute. The instructions can consist of adding a node or edge to the active neighborhood, removing a node or edge from the active neighborhood, or halting.
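
To make this a bit more concrete, here is a rough mental model of a KUM state in Clojure (my own sketch, not something from the paper): the storage is a graph kept as an adjacency map, with one distinguished active node.

;; A rough sketch (not from the paper): KUM storage as an adjacency map
;; plus an active node. The active neighborhood is the set of nodes
;; adjacent to the active node.
(def kum-state
  {:active :a
   :graph  {:a #{:b :c}
            :b #{:a}
            :c #{:a}}})

(defn active-neighborhood [{:keys [active graph]}]
  (get graph active))

(active-neighborhood kum-state) ;=> #{:b :c}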

After spending some time reading and researching, I felt like I had some idea of what a Kolmogorov-Uspensky machine was, but it was still a bit fuzzy. I wanted to really dig in and experience it by trying to implement one. I found an esoteric programming language called Eodermdrome that fit the bill and set to work building it out in Clojure.

Eodermdrome

An Eodermdrome program creates graphs from a string of letters. For example, the graph string abcdae produces a graph with nodes a, b, c, d, and e and an edge between each pair of adjacent letters: a–b, b–c, c–d, d–a, and a–e.

The program itself consists of a series of commands, or rules. A command will be executed if the following prerequisites are met:

  • The match graph in the command is a subgraph of the system graph.
  • If an input set is part of the command, the next character read from the system input must match it.

A command is of the form:

  • match-graph graph-replacement
    • This will execute if the match-graph is a subgraph and then transform the match to the replacement.
    • Example: a abc
  • (input-set) match-graph graph-replacement.
    • This will execute if the match is a subgraph and if the next character of the system input matches. On executing, it will read one char from the input and then transform the match graph with the replacement.
    • Example: (1) a abc
  • match-graph (output) graph-replacement.
    • This will execute if the match-graph is a subgraph. On executing, it will print the output to the system and transform the match with the replacement.
    • Example: a (1) abc
  • (input-set) match-graph (output) graph-replacement.
    • This will execute if the match is a subgraph and if the next character of the system input matches. On executing, it will read one char from the input, print the output to the system, and then transform the match graph with the replacement.
    • Example: (0) a (1) abc

Comments are also allowed as text in between commas. In this implementation, they must be contained on a single line. Example: ,this is a comment,(1) a abc

The initial state of the graph for a program is denoted by the graph string thequickbrownfoxjumpsoverthelazydog.
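
Since we will be implementing this in Clojure, here is a small sketch (my own illustration, not necessarily how the accompanying repo does it) of reading a graph string as a set of edges between adjacent letters:

;; Sketch: a graph string denotes an edge between each pair of adjacent
;; letters, so "abcdae" yields edges a-b, b-c, c-d, d-a, and a-e.
(defn graph-string->edges [s]
  (set (map set (partition 2 1 s))))

(graph-string->edges "abcdae")
;; => edges a-b, b-c, c-d, d-a, and a-e (as a set of two-element sets)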

We now have all we need to walk through an example program in the Kolmogorov-Uspensky machine.

An add program

Let’s take a program for adding two strings of ones together separated by zeros.

,takes input of ones separated by zeros and adds the ones, thequickbrownfoxjumpsoverthelazydog a
(1) a ab
(0) a a
ab (1) a

Given a system input of “101”, it will print out “11”. Let’s walk through what happens in the program.

Step 1 – The program starts with our graph in the initial state of our beloved thequickbrownfoxjumpsoverthelazydog configuration.

Step 2 – The first instruction, ,takes input of ones separated by zeros and adds the ones, thequickbrownfoxjumpsoverthelazydog a, matches with the active subgraph being the whole graph. The whole graph is replaced by the single graph node a.

Step 3 – The next instruction, (1) a ab, matches (a is a subgraph), takes a 1 off the input, and transforms the graph to ab.

Step 4 – The instruction (0) a a also matches (since a is a subgraph of ab); it takes the zero off the input and rewrites a back to a, so the graph is still ab.

Step 5 – The instruction ab (1) a now matches; it prints out a one and the graph ab changes to a.

Step 6 – Now the (1) a ab instruction matches again; it takes another 1 off the input (our last one) and transforms the graph to ab.

Step 7 – Finally, ab (1) a matches; it prints out a 1 and rewrites the graph back to a.

At this point the input is exhausted, and the only instruction that does not require input, ab (1) a, no longer has a matching subgraph, so the program ends.

Parting thoughts and threads

The Kolmogorov-Uspensky machine is quite interesting. Not only is the idea of graph storage and rewriting appealing, it is also pretty powerful compared to Turing machines. In fact, Dima Grigoriev proved that Turing machines cannot simulate Kolmogorov machines in real time.

It’s been quite a fun and enriching jaunt. The next time you see an unfamiliar term or concept, I encourage you to pull the thread. You never know where it will take you. It’s a great way to enlarge your world.

If you are interested in the code and hacking for yourself, here is the repo https://github.com/gigasquid/eodermdrome.


Fairy Tale Word Vectors

Published on:
Tags: All, Clojure

This post continues our exploration from the last blog post Why Hyperdimensional Socks Never Match. We are still working our way through Kanerva’s paper. This time, with the basics of hypervectors under our belts, we’re ready to explore how words can be expressed as context vectors. Once in a high dimensional form, you can compare two words to see how similar they are and even perform reasoning.

To kick off our word vector adventure, we need some words. Preferring whimsy over Google News, our text will be taken from ten freely available fairy tale books on http://www.gutenberg.org/.

Gather ye nouns

Our goal is to assemble a frequency matrix with all the different nouns as the rows and columns that record whether or not a word appears in a document. Our matrix will be binary, with just 1s and 0s. A document will be a sentence or a fragment of words. A small visualization is below.

Noun Doc1 Doc2
flower 1 0
king 0 1
fairy 1 0
gold 1 1

The size of the matrix will be big enough to support hypervector behavior, but not so big as to make computation too annoyingly slow. It will be nouns x 10,000.

The first task is to get a set of nouns to fill out the rows. Although there are numerous online sources for linguistic nouns, they unfortunately do not cover the same language spectrum as old fairy tale books. So we are going to collect our own. Using Stanford CoreNLP, we can collect a set of nouns using Grimm’s book as a guide. There are about 2500 nouns there, which gives us a nice sample to play with. This makes our total matrix size ~2500 x 10,000.
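
As a rough illustration of that collection step (my own sketch, assuming CoreNLP-style token maps shaped like {:token "cheese" :pos "NN"}; the repo’s actual code may differ), we could keep the tokens whose POS tag is a noun tag:

;; Sketch: collect a noun set from CoreNLP-style token maps
;; (assumed shape {:token "cheese" :pos "NN" ...}).
(def noun-tags #{"NN" "NNS" "NNP" "NNPS"})

(defn collect-nouns [token-maps]
  (->> token-maps
       (filter (comp noun-tags :pos))
       (map :token)
       set))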

Now that we have our nouns, let’s get down to business. We want to create a noun-to-row index, noun-idx, and then create a sparse matrix for our word frequency matrix.

(ns hyperdimensional-playground.context-vectors
  (:require [clojure.core.matrix :as m]
            [clojure.core.matrix.linear :as ml]
            [clojure.string :as string]
            [hyperdimensional-playground.core :refer [rand-hv cosine-sim mean-add inverse xor-mul]]
            [hyperdimensional-playground.fairytale-nouns :refer [fairy-tales-nouns book-list]]))

(m/set-current-implementation :vectorz)
;; size of the hypervectors and freq matrix columns
(def sz 10000)
;; The nouns come from a sampling of Grimm's fairy tale nouns these will
;; make up the rows in the frequency matrix
(def noun-idx (zipmap fairy-tales-nouns (range)))
(def freq-matrix (m/new-sparse-array [(count fairy-tales-nouns) sz]))

The next thing we need to do is to have some functions to take a book, read it in, split it into documents and then update the frequency matrix.

Random indexing for the win

The interesting thing about the update method is that we can use random indexing. We don’t need to worry about having a column for each document. Because of the nature of hyperdimensions, we can randomly assign 10 columns for each document.

(defn update-doc!
  "Given a document - update the frequency matrix using random indexing"
  [doc]
  (let [known-nouns (clojure.set/intersection fairy-tales-nouns (set doc))]
    ; use positive random indexing
    (doall (repeatedly 10 #(doseq [noun known-nouns]
                       (m/mset! freq-matrix (get noun-idx noun) (rand-int sz) 1))))))

The whole book is processed by slurping in the contents and using a regex to split it up into docs to update the matrix.

(defn process-book
  "Load a book and break it into sentence like documents and update the frequency matrix"
  [book-str]
  (let [book-text (slurp book-str)
        docs (partition 25 (map string/lower-case
                                (string/split book-text #"\s|\.|\,|\;|\!|\?")))
        doc-count (count docs)]
    (println "processing:" book-str "(with" doc-count "docs)")
    (doall (map-indexed (fn [idx doc]
                          (when (zero? (mod idx 1000)) (println "doc:" idx))
                          (update-doc! doc))
                        docs))
    (println "DONE with " book-str)))

We can now run the whole processing with:

(doseq [book book-list]
    (process-book book))

On my system, it only takes about 3 seconds.

Great! Now we have hypervectors associated with word frequencies. They are now context word vectors. What can we do with them?

How close is a king to a goat?

One of the things that we can do with them is find out how closely related the contexts of two words are by taking their cosine similarity. First, we need a handy function to turn a string word into a word vector by getting it out of our frequency matrix.

(defn wv
  "Get a hypervector for the word from the frequency matrix"
  [word]
  (let [i (get noun-idx word)]
    (assert (not (nil? i)) (str word " not found"))
    (m/slice freq-matrix i)))

Then we can make another nice function to compare two words and give an informational map back.

(defn compare-wvs
  "Compare two words and give the cosine similarity info map"
  [word1 word2]
  (let [wv1 (wv word1)
        wv2 (wv word2)]
    (when (not= word1 word2)
      {:word1 word1
       :word2 word2
       :cosine (cosine-sim wv1 wv2)})))

Let’s take a look at the similarities of some words to king.

(sort-by :cosine [(compare-wvs "king" "queen")
                 (compare-wvs "king" "prince")
                 (compare-wvs "king" "princess")
                 (compare-wvs "king" "guard")
                 (compare-wvs "king" "goat")])
  ;; ({:word1 "king", :word2 "goat", :cosine 0.1509151478896664}
  ;;  {:word1 "king", :word2 "guard", :cosine 0.16098893367403827}
  ;;  {:word1 "king", :word2 "queen", :cosine 0.49470535530616655}
  ;;  {:word1 "king", :word2 "prince", :cosine 0.5832521795716931}
  ;;  {:word1 "king", :word2 "princess", :cosine 0.5836922474743367})

As expected, the royal family is closer to the king than a guard or a goat is.

One of the interesting things is that now we can do addition and subtraction with these word vectors and see how it affects the relation with other words.

Boy + Gold = King, Boy + Giant = Jack

We can take a look at how close boy and king are together by themselves.

(cosine-sim (wv "boy") (wv "king")) ;=> 0.42996397142253145

Now we can add some gold to the boy and that new word vector will be closer to king than boy was alone.

(cosine-sim (mean-add (wv "boy") (wv "gold"))
              (wv "king")) ;=> 0.5876251031366048

Doing the same for boy and jack, we find that adding a giant moves the context closer.

(cosine-sim (wv "boy") (wv "jack")) ;=> 0.33102858702785953
;; boy + giant = jack
(cosine-sim (mean-add (wv "giant") (wv "boy"))
              (wv "jack")) ;=>0.4491473187787431
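
The examples below also use two small helpers, wv-add and wv-subtract, that come from the accompanying repo rather than being defined in this post. A minimal sketch of what they might look like (my assumption; the repo’s versions may differ):

;; Sketch of the helpers used below (assumed, not shown in the post):
;; wv-add takes the mean sum of two word vectors; wv-subtract subtracts
;; one from the other, clamping at zero to stay in 0/1 territory.
(defn wv-add [word1 word2]
  (mean-add (wv word1) (wv word2)))

(defn wv-subtract [word1 word2]
  (m/emap #(max 0.0 (double %))
          (m/sub (wv word1) (wv word2))))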

Amusingly, a frog and a princess make a prince.

;;; frog + princess = prince
  (cosine-sim (wv-add "frog" "princess") (wv "prince")) ;=> 0.5231641991974249

We can take this even further by subtracting some words and adding others. For example, a similarity to the word queen can be obtained by subtracting man from king and adding woman.

;;; queen= (king-man) + woman
  (cosine-sim (wv "queen")
              (mean-add (wv "woman") (wv-subtract "king" "man"))) ;=>0.5659832204544486

Similarly, a contextual closeness to father can be obtained by subtracting woman from mother and adding man.

(cosine-sim (wv "father")
              (mean-add (wv "man") (wv-subtract "mother" "woman"))) ;=>0.5959841177719538

But wait, that’s not all. We can also express facts with these word vectors and reason about them.

Reasoning with word vectors: the database as a hyperdimensional value

The curious nature of hypervectors allows us to store multiple entities and attributes in a single vector and to retrieve their likeness later with simple linear math – using only XOR multiplication and addition. This gives us the database as a value in the form of a high dimensional vector.

For example, say we want to express the fact that Hansel is a brother of Gretel. We can do this by adding the XOR product of brother with hansel and the XOR product of brother with gretel.

;; hansel is the brother of gretel
;; B*H + B*G
(def hansel-brother-of-gretel
   (mean-add
    (xor-mul (wv "brother") (wv "hansel"))
    (xor-mul (wv "brother") (wv "gretel"))))

We can also express that Jack is a brother of Hansel.

(def jack-brother-of-hansel
  (mean-add
     (xor-mul (wv "brother") (wv "jack"))
     (xor-mul (wv "brother") (wv "hansel"))))

We can add these two facts together to make a new hypervector value.

(def facts (mean-add hansel-brother-of-gretel
                       jack-brother-of-hansel))

Now we can actually reason about them and ask questions. Is Jack a brother of Hansel? With a high cosine similarity, we can assume the answer is likely.

 ;; is jack the brother of hansel?
  (cosine-sim
   (wv "jack")
   (xor-mul (mean-add (wv "brother") (wv "gretel"))
            facts)) ;=>0.8095270629815969

What about someone unrelated? Is Cinderella the brother of Gretel? – No

 ;; is cinderella the brother of gretel ?
  (cosine-sim
   (wv "cinderella")
   (xor-mul (mean-add (wv "brother") (wv "gretel"))
            facts)) ;=>0.1451799916656951

Is Jack the brother of Gretel? – Yes

 ;; is jack the brother of gretel ?
  (cosine-sim
   (wv "jack")
   (xor-mul (mean-add (wv "brother") (wv "gretel"))
            facts)) ;=> 0.8095270629815969

We can take this further by adding more facts and inventing a relation of our own.

Siblings in Hyperspace

Let’s invent a new word vector that is not in our nouns – siblings. We are going to create a new random hypervector to represent it.

(def siblings (rand-hv))

We will define it in terms of word vectors that we already have. That is, siblings will be the sum of brother + sister. We XOR multiply each of them by siblings to associate them with the siblings hypervector.

(def siblings-brother-sister
    (mean-add (xor-mul siblings (wv "brother")) (xor-mul siblings (wv "sister"))))

Now we can add some more facts. Gretel is a sister of Hansel.

 ;; gretel is the sister of hansel
  ;; S*G + S*H
  (def gretel-sister-of-hansel
    (mean-add
     (xor-mul (wv "sister") (wv "gretel"))
     (xor-mul (wv "sister") (wv "hansel"))))

Gretel is also a sister of Jack.

  ;; gretel is the sister of jack
  ; S*G + S*J
  (def gretel-sister-of-jack
    (mean-add
     (xor-mul (wv "sister") (wv "gretel"))
     (xor-mul (wv "sister") (wv "jack"))))

Collecting all of our facts into one hypervector (as a database).

 (def facts (mean-add hansel-brother-of-gretel
                       jack-brother-of-hansel
                       gretel-sister-of-jack
                       gretel-sister-of-hansel
                       siblings-brother-sister))

Now we can ask some questions.

Are Hansel and Gretel siblings? – Yes

;; are hansel and gretel siblings?
  (cosine-sim
   (mean-add (wv "hansel") (wv "gretel"))
   (xor-mul siblings facts)) ;=>0.627015379034067

Are John and Roland siblings? – No

;; are john and roland siblings?
  (cosine-sim
   (mean-add (wv "roland") (wv "john"))
   (xor-mul siblings facts)) ;=> 0.1984017637065277

Are Jack and Hansel siblings? – Yes

  (cosine-sim
    (mean-add (wv "jack") (wv "hansel"))
    (xor-mul siblings facts)) ;=>0.48003572523507465

It is interesting to think that nothing is stopping us at this point from retracting facts by simply subtracting the fact-encoded word vectors from our “database” value and making a new value from it.
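
As a rough sketch of that idea (my own, not from the post), we could subtract one fact’s encoded hypervector from the facts value and re-binarize it. Since mean-add averages and rounds, this is only an approximation of rebuilding the database without that fact.

;; Sketch (assumed approach): approximate retraction of the
;; gretel-sister-of-jack fact by subtracting it and re-binarizing.
(def facts-retracted
  (m/emap #(if (pos? (double %)) 1.0 0.0)
          (m/sub facts gretel-sister-of-jack)))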

Conclusions

In this fun but casual exploration of word vectors, we have seen the potential for reasoning about language in a way that uses nothing more complicated than addition and multiplication. The ability to store dense information in hypervectors, extract it with simple methods, and collect it flexibly and randomly shows their versatility and power. Hyperdimensional vectors might hold the key to unlocking a deeper understanding of cognitive computing or perhaps even true artificial intelligence.

It is interesting to note that this technique is not limited to words. Other applications can be done the same way – for example, video recommendation using hypervectors of movie titles, or perhaps even anomaly detection using sensor readings over a regular weekly time period.

Looking back over our journey with word vectors: at the beginning it seemed that word vectors were magical. Now, with an understanding of the basics, it still seems like magic.

If you are interested in exploring further, feel free to use my github hyperdimensional-playground as a starting point.

Why Hyperdimensional Socks Never Match

Published on:
Tags: All, Clojure

The nature of computing in hyperdimensions is a strange and wonderful place. I have only started to scratch the surface by reading a paper by Kanerva. Not only is it interesting from a computer science standpoint, it’s also interesting from a cognitive science point of view. In fact, it could hold the key to better models of AI and general reasoning. This blog is a casual stroll through some of the main points of Kanerva’s paper along with examples in Clojure to make it tangible. First things first, what is a hyperdimension?

What is a Hyperdimension and Where Are My Socks?

When we are talking about hyperdimensions, we are really talking about lots of dimensions. A vector has dimensions. A regular vector could have three dimensions [0 1 1], but a hyperdimensional vector has tons more, like 10,000 or 100,000. We call these big vectors hypervectors for short, which makes them sound really cool. Although the vectors could be made up of anything, we are going to use vectors made up of zeros and ones. To handle big computing with big vectors in a reasonable amount of time, we are also going to use sparse vectors. What makes them sparse is that most of the space is empty, (zeros). In fact, Clojure has a nice library to handle these sparse vectors. The core.matrix project from Mike Anderson is what we will use in our examples. Let’s go ahead and make some random hypervectors.

First we import the core.matrix libraries and set the implementation to vectorz which provides fast double precision vector math.

(ns hyperdimensional-playground.core
  (:require [clojure.core.matrix :as m]
            [clojure.core.matrix.linear :as ml]))

(m/set-current-implementation :vectorz)

Next we set the sz of our hypervectors to be 100,000. We also create a function to generate a random sparse hypervector by choosing to put ones in about 10% of the space.

(def sz 100000)

(defn rand-hv []
  (let [hv (m/new-sparse-array [sz])
        n (* 0.1 sz)]
    (dotimes [i n]
      (m/mset! hv (rand-int sz) 1))
    hv))

Now we can generate some.

(def a (rand-hv))
(def b (rand-hv))
(def c (rand-hv))
(def d (rand-hv))

a ;=> #vectorz/vector Large vector with shape: [100000]

You can think of each of these hypervectors as a random hyperdimensional sock, or hypersock, because that sounds cooler. These hypersocks have curious properties. One of them is that they will ~never match.

Hypersocks never match

Because we are dealing with a huge number of dimensions, a mathematically peculiar probability distribution occurs. We can take a random hypervector to represent something, then take another one, and they will differ from each other by about 100 standard deviations. We can take another one and it, too, will be about 100 standard deviations from the other ones. For practical purposes, we will run out of time before we run out of unrelated vectors. Because of this, any two hypersocks will never match each other.

How can we tell how similar two hypersocks are? The cosine between two vectors tells us their similarity, and it is determined by the dot product. We can construct a cosine similarity function to give us a value from -1 to 1, with 1 being the same and -1 being the complete opposite.

(defn cosine-sim [v1 v2]
  (/ (m/dot v1 v2)
     (* (ml/norm v1) (ml/norm v2))))

If we look at the similarity of a hypervector with itself, the result is ~1. With the other random hypervectors, it is ~0.

(cosine-sim a a) ;=>  1.0
(cosine-sim d d) ;=> 1.0

(cosine-sim a b) ;=>  0.0859992468320239
(cosine-sim b c) ;=> 0.09329186588790261
(cosine-sim a c) ;=> 0.08782018973001954

There are other cool things we can do with hypervectors, like do math with them.

The Hamming Distance of Two Hypersocks

We can add hypervectors together with a mean sum vector. We add the vectors of 1s and 0s together, then divide the resulting vector by the total number of vectors. Finally, to get back to our 1s and 0s, we round the result.

(defn mean-add [& hvs]
  (m/emap #(Math/round (double %))
   (m/div (apply m/add hvs) (count hvs))))

The interesting thing about addition is that the result is similar to all the vectors in it. For example, if we add a and b together to make x, x = a + b, then x will be similar to a and similar to b.

;; x = a + b
(def x (mean-add a b))
(cosine-sim x a) ;=> 0.7234734658023224
(cosine-sim x b) ;=> 0.7252586504505658

You can also do a very simple form of multiplication on vectors of 1s and 0s using XOR. We can do this by adding the two vectors together and then mapping mod 2 over each of the elements.

(defn xor-mul [v1 v2]
  (->> (m/add v1 v2)
      (m/emap #(mod % 2))))

We can actually use this xor-mul to calculate the Hamming distance, which is an important measure in error detection. The Hamming distance is simply the sum of all of the XOR multiplied elements.

(defn hamming-dist [v1 v2]
  (m/esum (xor-mul v1 v2)))

(hamming-dist [1 0 1] [1 0 1]) ;=> 0
(hamming-dist [1 0 1 1 1 0 1] [1 0 0 1 0 0 1]) ;=> 2
(hamming-dist a a) ;=> 0

This illustrates the point that XOR multiplication randomizes the hypervector but preserves distance. In the following example, we XOR multiply two random hypervectors by another and the Hamming distance stays the same.

; xa = x * a
; ya = y * a
; hamming distance of xa is the same as ya


;; multiplication randomizes but preserves the distance
(def x (rand-hv))
(def y (rand-hv))
(def xa (xor-mul x a))
(def ya (xor-mul y a))
(hamming-dist xa ya) ;=> 1740.0
(hamming-dist x y) ;=> 1740.0

So you can xor multiply your two hypersocks and move them to a different point in hyperspace, but they will still be the same distance apart.

Another great party trick in hyperspace is the ability to bind and unbind hypervectors for use as map-like pairs.

Using Hypervectors to Represent Maps

A map of pairs is a very important data structure. It gives the ability to bind symbols to values and then retrieve those values. We can do this with hypervectors too. Consider the following structure:

{:name "Gigasquid"
 :cute-animal "duck"
 :favorite-sock "red plaid"}

We can now create hypervectors to represent each of these symbols and values. Then we can XOR each hypervector symbol with its hypervector value and sum the results.

;; data records with bound pairs
(def x (rand-hv)) ;; favorite-sock
(def y (rand-hv)) ;; cute-animal
(def z (rand-hv)) ;; name
(def a (rand-hv)) ;; red-plaid
(def b (rand-hv)) ;; duck
(def c (rand-hv)) ;; gigasquid

;H = X * A + Y * B + Z * C
(def h (mean-add (xor-mul x a) (xor-mul y b) (xor-mul z c)))

Now we have a sum of all these things and we want to find the value of the favorite sock. We can unbind it from the sum by XOR multiplying by the favorite-sock hypervector x, because XOR multiplication both distributes over addition and cancels itself out.

(hamming-dist (xor-mul x (xor-mul x a)) a) ;=> 0

We can compare the result with the known values and find the closest match.

(hamming-dist a (xor-mul x h)) ;=> 1462.0  ;; closest to "red-plaid"
(hamming-dist b (xor-mul x h)) ;=> 1721.0
(hamming-dist c (xor-mul x h)) ;=> 1736.0

(cosine-sim a (xor-mul x h)) ;=> 0.3195059768353112 ;; closest to "red-plaid"
(cosine-sim b (xor-mul x h)) ;=> 0.1989075567830733
(cosine-sim c (xor-mul x h)) ;=> 0.18705233578983288

Conclusion

We have seen that the nature of higher dimensional representation leads to some very interesting properties with both representing data and computing with it. These properties and others form the foundation of exciting advancements in Cognitive Computing like word vectors. Future posts will delve further into these interesting areas. In the meantime, I encourage you to read Kanerva’s paper on your own and to find comfort in that when you can’t find one of your socks, it’s not your fault. It most likely has something to do with the curious nature of hyperspace.

Thanks to Ross Gayler for bringing the paper to my attention and to Joe Smith for the great conversations on SDM

Book of Software Miracles

Published on:
Tags: All

"Book of Software Miracles - Cover"

This 21st century illustrated manuscript was recently uncovered. It depicts miraculous phenomena in software engineering and represents one of the most spectacular new discoveries in the field of Renaissance Computer Science.

Some examples of the glorious illustrations and text translations are presented here.

"Book of Software Miracles Fol. 26"

On the day in 1958 that John McCarthy invented LISP, three suns appeared in the East in the morning sky which moved towards each other so that they merged into one.

"Book of Software Miracles Fol. 72"

In the year 1952, countless software bugs appeared in the land after being coined by Grace Hopper during the invention of the COBOL language.

"Book of Software Miracles Fol. 83"

In the year of 2007, a great and wonderful comet appeared in the sky. This was followed by the invention of the Clojure language by Rich Hickey.

"Book of Software Miracles Fol. 91"

A month after the NPM Node repository burst its banks in the year of 2015, a wondrous creature appeared from the JavaScript ecosystem. Found dead after the raging subsided, it was in this shape and form and is painted here.

For more information about this remarkable book visit Book of Software Miracles

Can a Programming Language Make You Smarter?

Published on:
Tags: All

All programming languages are not created equal. Some clearly excel at solving different problems. There are languages that are great for scalability and others that are great for proving correctness. But what about one that will make you smarter? To look for the answer, I am going to turn to a different kind of language, our human language, and research from the field of linguistics and cognitive development.

A new language is born

Up until the late 1970s, there was no school for the deaf in Nicaragua. Being deaf before then meant only developing crude signs with your family and friends. Beyond that there was nothing. Finally, the wife of the dictator set up a special school for them. This was the first time that the children were brought together in one place rather than being scattered about. For most of them, it was the first time that they had ever met another deaf person. There were about fifty kids in that first class, and they came together with fifty different ways of communicating with crude gestures. But after a while, something amazing happened. They started to converge on a common system – a language was born. For scientists, this was an incredible opportunity to watch a new language develop. One of the fascinating insights had to do with how this sign language evolved over the generations of children entering the school. One study in particular looked at how words in a language can actually increase your cognitive capacity and make people smarter.

The study involved a test that was performed on different generations of students from the school, including the very first students, who were now grown, and younger children. A comic strip was shown to the participants. In it there are two brothers. The big brother is playing with a train and the little brother wants to play with it. The big brother puts it under the bed and goes out to the kitchen to get a sandwich. While the big brother is gone, the little brother takes out the train and hides it in the toy box. The question for the test subjects is, “When the big brother comes back, where is he going to look for the train?”

Thinking about thinking

For most kids over the age of five, the answer will be that the big brother will look under the bed because he doesn’t know that it has been moved to the toy box. The interesting thing was that the first generation of kids that went through the school, now thirty five years old, failed this simple test. The younger ones all passed. What was going on?

To find out, the scientists looked at differences in the language over the generations and found that the older signers lacked words for the concept of thinking. The earlier generation just had one word for think, while the later generations had evolved ten or twelve – words that expressed concepts like understand and believe. It seems that having the words gave them access to a concept that otherwise would have been really hard to think about. It increased their cognitive capacity. It enabled them to think about thinking.

Thinking about programming programs

Programming languages allow us to translate our human thoughts and instructions to machines. We have words to express simple concepts, like adding two numbers together with + or finding how many elements are in a list with count, but what about harder concepts that are more difficult to access? Is there an analogy to having words to think about thinking that some programming languages have and others don’t? I believe there is. It’s having a way to think about a program programming itself with self-modifying code or macros.

In Clojure, macros are a way to do meta-programming. This involves the ability to treat the program as data, allowing a program to modify another program – even its own. It gives Clojure the capability to implement language features, make code more concise, and encapsulate patterns and repetitive code. It is a powerful feature and, more to the point of our current line of inquiry, it is a pretty friggin difficult concept to grok. In fact, before I was exposed to a language that had macros in it, I doubt that I would have been able to think about it. For the sake of argument, let’s go back to the comic strip story with the two brothers. But this time, the train is a function and the little brother uses a macro on it while the big brother gets a snack in the kitchen. What does the big brother expect the result of the program to be when he gets back in the room? Prior to my exposure to Clojure and macros, I am sure I would have failed the test.
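
To make the idea of code-as-data tangible, here is a tiny illustrative macro (my own example, not from the original post): it receives its arguments as unevaluated forms and builds a new piece of code out of them before anything runs.

;; A toy macro (illustrative only): rewrite the arguments into an `if`
;; form, swapping the branches, before evaluation happens.
(defmacro unless [test then else]
  (list 'if test else then))

(unless false :play-with-train :make-sandwich) ;=> :play-with-train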

Conclusion

So yes, I do think that a programming language can increase your cognitive capacity and make you smarter. I certainly feel that my exposure has expanded my world and given me access to programming concepts that I did not have with my other programming experience. I also feel certain that there are other languages out there, both existing and yet to be invented, that have words to express concepts that I am not even aware of. Languages that are just waiting for me to expand my horizons.

And that certainty makes me incredibly happy and grateful to be a programmer.

Listen to the whole fascinating story of Ann Senghas studying the sign language that was born in Nicaragua on Radiolab.

The Key to Moving Fast Is Feedback

Published on:
Tags: All

The last time I went to the dentist to get a cavity filled, I got a shot of Novocain as a local anesthetic to numb the area. After I emerged from the dental chair, I still had no feeling in my lip and mouth area. Trying to talk and smile was comical. Trying to drink tea resulted in dribbles down my face. Luckily, the loss of feeling only lasted a couple of hours. Soon, I was back to normal. Even in this small and common scenario, we can see the close link between motion and sensory perception. What would happen if you couldn’t feel your whole body?

This is exactly what happened in the tragic and amazing story of Ian Watermann. At age 19, he had what appeared to be the flu, but was actually a rare disease of the nervous system that left him unable to feel any sensation of touch below the neck. Unable to process this sensory information, he was totally unable to walk or move his body. This is despite the fact that his motor system was left almost untouched by the disease. He had the capacity to move, but without the sensory feedback he was paralyzed.

Ian was told that he would spend the rest of his life in a wheelchair, but in an inspiring show of determination he recovered mobility. With the help of physiotherapists, he retrained his brain to substitute visual feedback for touch feedback. He could move his hands and legs while he could see them. Finally, he could walk again with his sight and concentration. If the visual feedback is taken away, (lights turned off), the ability to move vanishes.

Feedback is the essential ingredient for human motion. But what about businesses, startups, and software projects? One of the challenges that all of these face is the ability to move fast. This ability to move is not unlike that in our own bodies. Movement and action are a result of a complex, interdependent system. In the case of businesses, they are made of people, technology, and processes. Like the human body, it is critically dependent on feedback. A business can have the capacity for action, but without the ability to process feedback it is essentially immobilized.

If you want to move fast, the single most important thing you can do is to improve your feedback at every level.

If you want to add new technology, ask how it improves your feedback.

  • Feedback from the market
  • Feedback from your users
  • Feedback from your operational systems
  • Feedback from your code
  • Feedback from your employees
  • Feedback from yourself

The best feedback systems are distributed and multi-layered. A single market feedback signal to the CEO is good, but much better is one that gets shared among the many facets of a company. A distributed and interconnected network of feedback will help companies and projects be responsive, fast, nimble, and resilient.

The next time you hear “How can we go faster?”, ask the question “How can we increase feedback?” You’ll be on the right track to fly.

Gigasquid’s Radar 2015

Published on:
Tags: All

It’s that time of the year again for radars. Since I made one last year, I decided to continue the tradition and make one this year.

Languages

No changes from last year for Clojure and Pixie.

  • Adopt: Clojure – It is a fantastic language.
  • Trial: Pixie – The language continues to grow and improve and has a great community – and it is a LISP.
  • Assess: Elixir – Another great functional language leveraging the Erlang VM. It has a lot of energy in the community.
  • Hold: Java – There are plenty of other great alternatives out there on the JVM.

Cute Animals

Alpacas have moved up from trial last year, and Llamas to hold. Alpacas are clearly more fluffy.

Artificial Intelligence

I haven’t been too into robots this year, so the category has been changed to AI.

  • Adopt: Weka – A collection of machine learning algorithms and explorers. I was very impressed with it when I tried it on a speech acts classifier project.
  • Trial: Word Vectors – A way to encode words and leverage training so that you can see the relation between them. For example, a dog is closer to the word wolf than cat.
  • Assess: Tensor Flow – I have only scratched the surface with this, but it is an exciting open source computational pipeline for machine learning (and other things). The tutorials are pretty amazing in quality alone.
  • Hold: Logical Inference – This is pretty bold, but I base it on Professor Geoff Hinton’s Deep Learning talk (at 33:00) where he explains that thought vectors, (made from word vectors), will be able to perform natural reasoning with only linear algebra. The grand dream of old fashioned strong AI, will then be realized in a completely different way.

Tasty Food

  • Adopt: Stilton Cheese – The king of cheeses.
  • Trial: Cocoa Nibs – I have to admit I am getting into crunching on some nibs in the afternoon. Best of all, no sugar!
  • Assess: Cold Sake – I was firmly in the warm sake camp, until I tried some very nice cold sakes. However, in the winter, warm is so nice …
  • Hold: Fruit Cake – Traditions are important, but sometimes you need to let it go.

Special Thanks

The radar this year was generated by a cool github project that uses Clojure and Quill to generate the image based off of a json config file. So go out there and make a radar of your own.

Happy Holidays!

Speech Act Classification for Text With Clojure

Published on:
Tags: All, Clojure

We humans are quite wonderful. We do amazing things every day without even realizing it. One of them, you are doing right now. You are reading text. Your brain is taking these jumbles of letters and spaces in this sentence, which in linguistic terms is called an utterance, and making sense out of it. The individual meanings of sentences might be quite complex.

Take for example the utterance, “I like cheese”. To understand it properly, you need to know the meanings of the individual words. In particular, you would need to know that cheese is a tasty food stuff that is made from milk. This would be a detailed and full understanding. But there is a higher level of understanding that we can look at called Speech Acts.

Speech Acts are a way of classifying our communication according to its purpose. We perform speech acts when we ask questions, make statements, requests, promises, or even just say thanks. These speech acts cross languages. When I ask a question in English, it is the same speech act as if I ask it in French. In fact, if we were to travel to another planet with aliens, we could assume that if they had a language, it would involve speech acts. It should be no surprise, then, that for a machine to communicate effectively with us, it will have to understand speech acts.

To explore this communication we are going to consider only three speech acts:

  • Statements – “I like cheese.”
  • Questions – “How do you make cheese?”
  • Expressives – “Thank you”

Our goal is to have our program be able to tell the difference between these three speech acts – without punctuation.

Why not use punctuation? If you are having a conversation with a human over Slack or some other chat channel, you may or may not put in a question mark or period. To have a computer be able to converse as naturally with a human as another human, it will have to understand the speech act without the aid of punctuation.

Generally, we want to have the computer:

  1. Read in an utterance/text that may or may not have punctuation.
  2. Classify whether the result is a statement, question, or expressive.

To tackle this problem, we are going to have to break this up into two main parts. The first is parsing the text and annotating it with data. The second is to classify the text based on the data from the parsing.

Parsing and Annotating Text with Stanford CoreNLP

The Stanford CoreNLP library is considered the state of the art for POS (Part of Speech) tagging and other linguistic annotations. It is also a Java library, so it is very easy to use from Clojure.

Here we are using a simple wrapper library called stanford-talk to take in some text and process it. The result is a list of tokens for each word in the :token-data map. Each token is annotated with its POS tag. There is a lot more data in the annotations that we could look at to give us insight into this text, but to keep things simple, we are just going to look at the POS tag for the moment.

(process-text "I like cheese")

;{:token-data
; ({:sent-num 0, :token "I", :pos "PRP", :net "O", :lemma "I", :sentiment "Neutral"}
;  {:sent-num 0, :token "like", :pos "VBP", :net "O", :lemma "like", :sentiment "Neutral"}
;  {:sent-num 0, :token "cheese", :pos "NN", :net "O", :lemma "cheese", :sentiment "Neutral"}),
; :refs [[{:sent-num 0, :token "I", :gender "UNKNOWN", :mention-type "PRONOMINAL", :number "SINGULAR", :animacy "ANIMATE"}]]}

So the text “I like cheese” has the following POS tags:

  • I = PRP Personal Pronoun
  • like = VBP Verb, non-3rd person singular present
  • cheese = NN Noun, singular or mass

This is great. We have some data about the text we can analyze. The next thing to do is to figure out how to classify the text based on this data.

Classification with Weka

Weka is a collection of machine learning algorithms. There is a program for interactive exploration of data sets, as well as a Java library so you can use it programmatically. Speaking of data sets, we need some. Just having one sentence about liking cheese is not going to get us very far with any machine learning.

So where can you go to get conversational questions and statements on the internet? Well, one place that is pretty good for that is answers.com. We can scrape some pages for questions and answers – enough so that we can collect and clean up some input files of:

  • ~ 200 statements
  • ~ 200 questions

The expressives were a bit more difficult. Let’s just make a list of about 80 of them.

Now, we have a list of text data. We need to decide on some features and generate some input files to train the classifiers on.

Choosing Features for Classification

First, what is a feature? A feature is some sort of encoding of the data that the computer is going to consider for classification. For example, the number of nouns in a sentence could be a feature. There is a whole field of study dedicated to figuring out what the best features for machine learning are. Again, to keep things simple, we can take an educated guess at some features based on a good paper (a rough sketch of computing them follows the list):

  • Sentence length
  • Number of nouns in the sentence (NN, NNS, NNP, NNPS)
  • If the sentence ends in a noun or adjective (NN, NNS, NNP, NNPS, JJ, JJR, JJS)
  • If the sentence begins with a verb (VB, VBD, VBG, VBP, VBZ)
  • The count of the wh, (like who, what) markers (WDT, WRB, WP, WP$)
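
Here is that rough sketch of turning parsed token data into the five features (my own illustration; the actual gen-stats function in the post’s repo takes the raw text and may differ in shape and naming):

;; Sketch: compute the five features from CoreNLP-style token maps
;; (assumed shape {:token "cheese" :pos "NN" ...}).
(def noun-tags #{"NN" "NNS" "NNP" "NNPS"})
(def end-tags  (into noun-tags #{"JJ" "JJR" "JJS"}))
(def verb-tags #{"VB" "VBD" "VBG" "VBP" "VBZ"})
(def wh-tags   #{"WDT" "WRB" "WP" "WP$"})

(defn gen-stats [tokens]
  (let [pos-tags (map :pos tokens)]
    {:sen-len  (count tokens)
     :nn-num   (count (filter noun-tags pos-tags))
     :end-in-n (if (end-tags (last pos-tags)) 1 0)
     :begin-v  (if (verb-tags (first pos-tags)) 1 0)
     :wh-num   (count (filter wh-tags pos-tags))}))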

We can now go through our data file, generate our feature data, and output an .arff file to use as a training file for Weka.

Raw question file example:

What is floater exe
Is bottled water or tap water better for your dog
How do you treat lie bumps on the tongue
Can caffeine be used in powder form

Arff file with features:

@relation speechacts

@attribute       sen_len            numeric
@attribute       nn_num             numeric
@attribute       end_in_n           numeric
@attribute       begin_v            numeric
@attribute       wh_num             numeric
@attribute       type               {assertion,question,expressive}

@data
4,2,1,0,1,question
10,4,1,0,0,question
9,3,1,0,1,question
7,3,1,0,0,question

Now that we have our input file for training our machine learning algorithms, we can start looking at classifiers.

Choosing the Classifier

Using the Weka explorer, we can try out different classification models. For this data, the best one seems to be the Random Forest. In the explorer, it beat out Naive Bayes and J48. It is also worth mentioning that we are not using a separate source of test data; we are cross-validating on the original training set. If we wanted to be more rigorous, we could collect more data and cut it in half, using one set for training and one set for testing.

Now that we have a classifier, we can create some Clojure code with the java library to use it.

Using the Weka Classifier from our Clojure Code

After importing the needed Java classes into our Clojure code, we can create the Random Forest classifier.

(def classifier (new RandomForest))

We then create a function that will load our arff input file as a datasource.

(defn get-datasource [fname]
  (new ConverterUtils$DataSource
       (.getResourceAsStream (clojure.lang.RT/baseLoader) fname)))

And another that uses it to train the classifier, returning a map of the evaluator and the data that we will need for our predictions.

(defn train-classifier [fname]
  (let [source (get-datasource fname)
        data (.getDataSet source)
        _ (.setClassIndex data (dec (.numAttributes data)))
        _  (.buildClassifier classifier data)
        e (new Evaluation data)]
    (.crossValidateModel e classifier data (.intValue (int 10)) (new Random 1) (into-array []))
    (println (.toSummaryString e))
    {:evaluator e
     :data data}))

Now we need to be able to ask for a classification of a particular instance of new data. This is where we parse new text and ask for an answer from our trained model. To do this, we need to generate an instance for the evaluator to look at. It is constructed from numbers in the same order as our arff file. The exception is that we are not going to provide a value for the final field, the speech act type. We will assign that to a missing value.

(defn gen-instance [dataset [val0 val1 val2 val3 val4]]
  (let [i (new Instance 6)]
    (doto i
      (.setValue 0 (double val0))
      (.setValue 1 (double val1))
      (.setValue 2 (double val2))
      (.setValue 3 (double val3))
      (.setValue 4 (double val4))
      (.setValue 5 (Instance/missingValue))
      (.setDataset dataset))))

Now we can use this function in a prediction function to get our answer back.

(defn predict [ev d vals]
  (let [v  (.evaluateModelOnce ev classifier (gen-instance d vals))]
    (case v
      0.0 :statement
      1.0 :question
      2.0 :expressive)))

Calling the predict function would look something like:

(def results (train-classifier "speech-acts-input-all.arff"))
(def predictor (:evaluator results))
(def data (:data results))
(predict predictor data [1 1 1 0 0])
;; -> :expressive

Now that we have the parsing piece and the classification piece, we can put everything together.

Putting it all together

We finally have all the details we need to write a classify-text function.

(defn classify-text [text]
  (let [stats (parser/gen-stats text)
        features [(:sen-len stats)
                  (:nn-num stats)
                  (:end-in-n stats)
                  (:begin-v stats)
                  (:wh-num stats)]]
    (weka/predict predictor data features)))

(classify-text "I like cheese")
;; -> :statement
(classify-text "How do you make cheese")
;; -> :question
(classify-text "Right on")
;; -> :expressive

Yea! It worked. We finally have something that will read in text and tell us its best guess of a speech act, all without punctuation. Let’s quickly review what we have done.

Summary

  • Gather data sets of statements, questions, and expressives
  • Parse text and annotate it with POS tags using Stanford CoreNLP
  • Choose features of the data to analyze and generate arff files
  • Use the Weka explorer to try out the best classification algorithms
  • Programmatically use Weka to train a classifier and predict a new instance
  • Write a program to tie it all together

It’s funny how a simple thing like asking whether something is a statement or question gets you knee deep in Natural Language Processing and Machine Learning pretty fast.

We’ve learned a lot; now let’s have a bit of fun. Now that we can classify speech acts, we can make a sort of proto chat bot with really limited responses.

Proto Chat Bot

Here we are going to be a bit loose and actually check if a question mark is used. If it is, we will automatically mark it as a question. Otherwise, we will classify it.

(defn respond [text]
  (let [question-mark? (re-find  #"\?$" text)
        type (if question-mark?
               :question
               (classify-text text))]
    (case type
      :question "That is an interesting question."
      :statement "Nice to know."
      :expressive ":)")))

We just need a quick repl and main function now:

(defn repl []
  (do
    (print ">> ")
    (flush))
  (let [input (read-line)]
    (if-not (= input "quit")
     (do
       (println (try (c/respond input)
                     (catch Exception e (str "Sorry: " e " - " (.getMessage e)))))
       (recur))
     (do (println "Bye!")
         (System/exit 0)))))

(defn -main [& args]
  (println "Hello.  Let's chat.")
  (flush)
  (repl))

Testing it out with lein run, we can have a little chat:

Hello.  Let's chat.
>> Hi
:)
>> Do you know where I can go to buy cheese
That is an interesting question.
>> I am a big cheese fan
Nice to know.
>> you are quite smart
Nice to know.
>> bye
:)

Want more? Check out the code https://github.com/gigasquid/speech-acts-classifier.

Special thanks to Paul deGrandis for allowing me to pick his awesome brain on AI things

Conversations With Datomic - Part 3

Published on:

This is a continuation of the first and second conversations in which topics such as creating databases, learning facts, querying, and time traveling were discussed. Today’s topics include architecture, caching, and scaling.

Human: Hello again Datomic. Ready to talk again?

Datomic: Sure. I think you wanted to ask me some questions about how I would fit in with your other systems.

Human: Yes. Like I was saying earlier, I think your abilities to learn facts, reason about them, and keep track of the history of all those facts is really great. I am interested in having you work with me every day, but first I want to understand your components so that I can make sure you are a good fit for us.

Datomic: I would be happy to explain my architecture to you. Perhaps showing you this picture is the best way to start.

I am made of three main parts: my transactors, my peers, and my storage.

Human: What is a peer?

Datomic: A peer is an application that is using the peer library. In our last conversations, we were talking through the Clojure api with datomic.api. The application, or process, that is running this api is called a peer. There can be many of these, all having conversations with me.

Human: The peers then talk to your transactor?

Datomic: Yes. The peers talk to my transactor whenever you call transact with the peer library. It is the central coordinator between all the peers; it processes the requests using ACID transactions and then sends the facts off to storage.

Human: Could you remind me what ACID stands for again? I always forget. The first one is Atomic right?

Datomic: That is right. I am Atomic in that every transaction you send to me is all or nothing. If for some reason, one part of it fails, I will reject the entire transaction and leave my database unchanged.

The C is for Consistency. This means that I provide every peer with a consistent view of facts over time and transactions. I provide a global ordering of transactions across all the peers with my transactor and peers will always see all the transactions up to their current time without any gaps.

Human: What if a peer is behind the global time? How do they catch up to know about the new facts that were transacted by a different peer?

Datomic: After one peer sends me a successful transaction with some new facts, I will notify all the peers about them.

Human: Cool. That takes care of the A and C in ACID. What about the I?

Datomic: It stands for Isolated. It makes sure that even though there are many peers having conversations with me, transactions are executed serially. This happens naturally with my transactor: since there is only one transactor, transactions are always executed serially.

Human: In the picture, why are there two transactors then?

Datomic: Oh, that is for High Availability. When I start up my system, I can launch two running transactors but hold one in reserve. Just on the off chance something happens to the main one, I will swap in the failover one to keep things running smoothly.

The final D in ACID is for Durability. Once a transaction has been committed by my transactor, it is shipped off to storage for safe keeping.

Human: What exactly is this storage?

Datomic: Instead of storing datoms, I send segments, which are closely related datoms, to storage. I have quite a few options for storage:

  • Dev mode – which just runs within my transactor and writes to the local file system.
  • SQL database
  • DynamoDB
  • Cassandra
  • Riak
  • Couchbase
  • Infinispan memory cluster

Human: Which one is the best to use?

Datomic: The best one to use is the one that you already have in place at work. This way, I can integrate seamlessly with your other systems.

Human: Oh, we didn’t really talk about caching. Can you explain how you do that?

Datomic: Certainly. It is even worth another picture.

Each peer has its own working set of recent datoms in memory, along with an index to all the rest of the datoms in storage. When the peer has a query for a datom, it first checks to see if it has it locally in its memory cache. If it can’t find it there, it will ask storage for the segment containing that datom. Storage will return that datom along with the other related datoms in the segment, so that the peer can cache them in memory and make the next query faster.

Human: That seems really different from other databases, where the client constantly requests queries from a server.

Datomic: Yes. When most questions can be answered from local memory, responses are really fast. You don’t need to hit storage unless you really need to. You can even add an extra layer of caching with memcached.

Human: That sounds great. I can’t wait to tell you about all of our data. We talked a bit about your querying ability earlier – can you do the same queries that our other standard relational databases do, like joins?

Datomic: Oh yes. In fact, with me, you don’t need to specify your joins explicitly. I use Datalog, which is based on logic, so my joins are implicit. I will figure out exactly what I need to put together to answer your query without you having to spell it out for me.

Human: Ok. I know that I can map some of my data that is already in other database tables to you. What about other types of irregular data, like graphs or sparse data?

Datomic: I am actually very proud of my data model. It is extremely flexible. Since I store things on such a granular datom level, you don’t need to map your logical data model to my physical model. I can handle rectangular table shaped data quite happily along with graph data, sparse data, or other non-rectangular data.

Human: That sounds great. What do I need to know about your scaling?

Datomic: I really excel at reads. All you have to do is elastically add another peer for querying. I am not really a good fit for write-scale workloads, like big data or log-file analysis. You will find me happiest with valuable data of record whose history is important, like transaction, medical, or inventory data. I am also very flexible for development and operations since I can use many different types of storage, and I have worked with many web and cloud apps.

Human: Thanks for answering all my questions. I think you might fit in quite well with our other systems.

Datomic: Great!

Human: One more thing, this conversation has been great, but do you have any training resources for me and my other human coworkers?

Datomic: Sure thing. There are a few really good resources on the Datomic Training Site. I would suggest watching the videos there and pairing them with:

Also, if you want to confirm that your data is a good fit for me, I suggest you describe your data to the Datomic Google Group. They are a nice and knowledgeable group of humans.

Human: Thanks again Datomic! I will grab another cookie and check it out!

Datomic: What is it with humans and cookies?…

Conversations With Datomic Part 2

Published on:

The following is a continuation of the first conversation, which touched on schema creation and querying. This conversation includes learning new facts, time, and the sometimes unfortunate reality of lawyers.

Human: Hi Datomic. I am back from my tea and cookies break. I really enjoyed talking with you, could we continue our conversation?

Datomic: Certainly. Let me see, where did we leave off?

Human: Let me check my notes. Oh yes, we had created a database and filled it with facts about some dogs and owners. You showed me how to ask you things about the facts, like which dogs liked cheese.

Datomic: Ah yes. We were just getting to the really interesting part about how I learn new facts without forgetting the past ones.

Human: Can you please explain a bit more about how you learn new facts?

Datomic: Sure. I learn facts when you tell me to add a new fact about some entity, or to retract a fact about it. It will probably make more sense with an example. Let’s take the dog with the name “Tiny” that I know about. Could you please ask me about the dog and all of the attributes?

Human: Alright.

(d/pull (d/db conn) '[*] [:dog/name "Tiny"])

Datomic:

{:db/id 17592186045423
 :dog/name "Tiny"
 :dog/breed "Great Dane"
 :dog/favorite-treat "Cheese"}

There is an entity (a dog) with the name “Tiny” who is a Great Dane and whose favorite-treat is Cheese. I am not really sure why the Cheese is capitalized, but I am sure it makes sense to a human.

Human: Oh right. Well, now that you mention it, I am not really sure about the cheese thing either. Can I just remove that fact?

Datomic: Well, you can tell me to retract that fact by sending me a transaction in the form of [:db/retract entity-id attribute value]. In the case of the dog, since the name attribute is unique, you can just give me the lookup ref for Tiny like this: [:dog/name "Tiny"].

Human: Like this?

(d/transact conn [[:db/retract [:dog/name "Tiny"] :dog/favorite-treat "Cheese"]])

Datomic: Exactly. I have retracted that fact about “Tiny”. Go ahead and ask me about all of Tiny’s attributes again.

Human: Ok. I use (d/db conn) for the current database value, right?

Datomic: Yes. But if you are going to be asking me multiple questions about this database value, you should not repeat the (d/db conn) all the time.

Human: Oh. What should I do instead?

Datomic: The connection with me is like a ref. Just like other refs, you should deref it once to get the database value and then use the value repeatedly. This single database value will provide consistency for all your queries.

Human: That makes sense.

(def db-tiny-no-cheese (d/db conn))

(d/pull db-tiny-no-cheese '[*] [:dog/name "Tiny"])

Datomic:

{:db/id 17592186045423, :dog/name "Tiny", :dog/breed "Great Dane"}

Tiny is a Great Dane.

Human: So you don’t know anything about the :dog/favorite-treat for “Tiny”?

Datomic: At this particular time, I do not have anything to assert about the favorite-treat of Tiny. However, I still remember all the facts that you have told me. For each transaction that you send me, I have a notion of a point in time, like t0, t1, t2, and I have a complete database value for each one of those points in time. In fact, you can look at all of the assertions and retractions I have learned by using the d/history function on the database value. This asks me to expose my history, which is normally hidden in favor of the present. I will return a special database value containing all the assertions and retractions across time. Any query you ask of it can use a fifth datom field to help you distinguish assertions from retractions.

Human: A fifth datom field?

Datomic: A datom consists of the following parts: the entity, the attribute, the value, the transaction, and an operation that tells you whether the fact was added or retracted (e a v tx op). Why don’t you try using the d/history function to ask me about all the facts having to do with Tiny? I suggest using the Datalog query

'[:find ?e ?a ?v ?tx ?op
  :in $
  :where [?e :dog/name "Tiny"]
         [?e ?a ?v ?tx ?op]]

which will return the entity, attribute, value, transaction, and operation for every fact I have ever known about Tiny.

Human: Ok. Here goes.

(d/q '[:find ?e ?a ?v ?tx ?op
       :in $
       :where [?e :dog/name "Tiny"]
       [?e ?a ?v ?tx ?op]]
  (d/history db-tiny-no-cheese))

Datomic:

#{[17592186045423 63 "Tiny"       13194139534314 true]
  [17592186045423 64 "Great Dane" 13194139534314 true]
  [17592186045423 65 "Cheese"     13194139534314 true]
  [17592186045423 65 "Cheese"     13194139534320 false]}

During one transaction, you told me to add three facts about an entity:

  • The :dog/name attribute, (which I refer to as 63), has the value of “Tiny”.
  • The :dog/breed attribute, (which I refer to as 64), has the value of “Great Dane”.
  • The :dog/favorite-treat attribute, (which I refer to as 65), has the value of “Cheese”.

During another transaction, you told me to retract a fact regarding the attribute :dog/favorite-treat about the same entity.

Human: Wow, that is really cool. Is there a way that I can travel back in time to see the world as it was during that first transaction?

Datomic: Yes. I am practically a Tardis. You can use the d/as-of function with a database value and a transaction number to time travel. Using that time-traveled database value, you can ask me about all the facts I knew as of that time.

Human: I can’t wait to try this. Ok, let’s go back to the time when I first asserted the fact that Tiny liked cheese.

(d/pull (d/as-of db-tiny-no-cheese 13194139534314) '[*] [:dog/name "Tiny"])

Datomic: Hold on. We are time traveling!

{:db/id 17592186045423
 :dog/name "Tiny"
 :dog/breed "Great Dane"
 :dog/favorite-treat "Cheese"}

Tiny is a Great Dane whose favorite treat is Cheese.

Human: Fantastic! Let’s go back to the future now, ummm I mean present. Time is a bit wibbly wobbly.

Datomic: Just take the as-of function off the database value and you will be back in the present.
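
(In other words, the same database value answers differently depending on whether it is wrapped in as-of. A quick side-by-side, reusing the values from above:)

;; Time-traveled view: includes the cheese fact.
(d/pull (d/as-of db-tiny-no-cheese 13194139534314) '[*] [:dog/name "Tiny"])

;; Present view: the plain database value, no as-of wrapper.
(d/pull db-tiny-no-cheese '[*] [:dog/name "Tiny"])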

Human: Hmmm… Do I have to do a retract every time I want to change a value? For example, the dog named Fido has a favorite treat of a Bone right now, right?

(d/pull db-tiny-no-cheese '[*] [:dog/name "Fido"])

Datomic:

{:db/id 17592186045421
 :dog/name "Fido"
 :dog/breed "Mix"
 :dog/favorite-treat "Bone"}

Yes, it is a “Bone”.

Human: So, if I want to change it to be “Eggs”, do I need to retract the current value of “Bone” first and then add the fact of “Eggs”?

Datomic: You certainly could do that and I would understand you perfectly. However, since :dog/favorite-treat is a cardinality-one attribute, if you simply assert a new value for it, I will automatically add the retraction of the old value for you.

Human: Cool.

(d/transact conn [{:db/id [:dog/name "Fido"]
                   :dog/favorite-treat "Eggs"}])

(d/pull (d/db conn) '[*] [:dog/name "Fido"])

Datomic:

{:db/id 17592186045421
 :dog/name "Fido"
 :dog/breed "Mix"
 :dog/favorite-treat "Eggs"}

Fido now has a favorite-treat of “Eggs”.
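
(If you are curious, you can confirm the automatic retraction with the same history trick from before. Here is a sketch of the query; the exact entity and transaction ids will differ on your machine.)

;; Ask the history database for every :dog/favorite-treat fact about Fido.
;; You should see "Bone" asserted, then "Bone" retracted and "Eggs" asserted,
;; with the retraction added automatically as part of the later transaction.
(d/q '[:find ?v ?tx ?op
       :where [?e :dog/name "Fido"]
              [?e :dog/favorite-treat ?v ?tx ?op]]
     (d/history (d/db conn)))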

Human: This is really neat. You never forget any facts?

Datomic: Nope. Well, except in really exceptional circumstances that usually involve lawyers.

Human: Lawyers?

Datomic: Sigh. Yes, well, in some unusual situations you might be under a legal obligation to really forget certain facts and remove them from the database entirely. There is a special tool you can use to excise the data. I will still record the fact that something was excised at that time; I just won’t be able to remember what.
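
(For the curious humans: excision is itself a transaction naming the entity, and optionally the attributes, to forget. A rough sketch with a made-up entity id; please read the Datomic excision docs before trying this for real.)

;; Excise everything ever asserted about one entity.
;; 17592186045421 stands in for the entity you are legally required to forget.
(d/transact conn [{:db/id (d/tempid :db.part/user)
                   :db/excise 17592186045421}])

;; Or excise only the values of particular attributes for that entity.
(d/transact conn [{:db/id (d/tempid :db.part/user)
                   :db/excise 17592186045421
                   :db.excise/attrs [:dog/favorite-treat]}])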

Human: That doesn’t sound fun.

Datomic: I prefer to keep all my facts intact.

Human: I can definitely see that. Well, on a happier subject, I have been very impressed with you during our conversations. Having a time-traveling database that can reason about facts seems like a really useful thing. Also, you are really nice.

Datomic: Awww shucks, thanks. For a human, you are really nice too.

Human: I was thinking about the possibility of you coming and working with me every day. Would you mind chatting with me some more about your architecture? I want to understand how you would fit with our other systems.

Datomic: Certainly. I would love that. Do you want to talk about it now, or have another cookie break first?

Human: Now that you mention cookies… Let’s take a short break and we will talk again soon.

(P.S. Humans, there are some great Datomic Training Videos if you want to learn more)