I’m delighted to share the news that the Clojure package for MXNet has now joined the main Apache MXNet project. A big thank you to everyone whose efforts made this possible. Being part of the main project is a great place for growth and collaboration that will benefit both MXNet and the Clojure community.
Invitation to Join and Contribute
The Clojure package has been brought in as contrib/clojure-package. It is still very new and will go through a period of feedback, stabilization, and improvement before it graduates out of contrib.
We welcome contributors and anyone who wants to get involved to make it better.
Are you interested in Deep Learning and Clojure? Great – Join us!
Join the MXNet Slack channel – you have to join the MXNet dev mailing list first, but after that, say you would like to join the Slack and someone will add you.
This is the beginning of a series of blog posts to get to know the Apache MXNet Deep Learning project and the new Clojure language binding, clojure-package.
MXNet is a first class, modern deep learning library that AWS has picked as its deep learning library of choice. It supports multiple languages on a first class basis and is incubating as an Apache project.
The motivation for creating a Clojure package is to open the deep learning library to the Clojure ecosystem and to build bridges for future development and innovation in the community. It provides all the needed tools, including low level and high level APIs, dynamic graphs, and things like GAN and natural language support.
So let’s get on with our introduction, starting with one of the basic building blocks of MXNet: the NDArray.
Meet NDArray
The NDArray is the tensor data structure in MXNet. Let’s start off by creating one. First we need to require the ndarray namespace.
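A minimal sketch of that setup (assuming the namespaces shipped in the contrib clojure-package):

(require '[org.apache.clojure-mxnet.ndarray :as ndarray])

;; create a 2x3 NDArray filled with zeros
(ndarray/zeros [2 3])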
There is also a quick way to create an ndarray of ones with the ones function:
(ndarray/ones [256 32 128 1])
Ones and zeros are nice, but what about an array with specific contents? There is an array function for that: specify the contents of the array first and the shape second.
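For example, a small sketch (ndarray/->vec, to pull the values back out, comes from the same namespace as far as I know):

;; a 2x3 NDArray containing the numbers 1 through 6
(def a (ndarray/array [1 2 3 4 5 6] [2 3]))

;; inspect the contents back in Clojure
(ndarray/->vec a) ;=> [1.0 2.0 3.0 4.0 5.0 6.0]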
Note: Operations among different contexts are currently not allowed, but there is a copy-to function that can help copy the content from one device to another and then continue on with the computation.
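As a rough sketch of how that might look (assuming the package’s context namespace; the gpu context requires a GPU build):

(require '[org.apache.clojure-mxnet.context :as context])

;; copy an ndarray created on the cpu over to the first gpu device
(def cpu-array (ndarray/ones [2 3]))
(def gpu-array (ndarray/copy-to cpu-array (context/gpu 0)))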
Wrap up
I hope you’ve enjoyed this brief introduction to the MXNet library; there is much more to explore in future posts. If you are interested in giving it a try, there are native jars for OSX CPU and Linux CPU/GPU available, and the code for the ndarray tutorial can be found here
Please remember that the library is in an experimental state, so if you encounter any problems or have any other feedback, please log an issue so bugs and rough edges can be fixed :).
I was 10 years into my career when I met her. I could count the number of other women programmers I had worked with on one hand and none of them had young children at home like me. She was not only incredibly experienced and competent, but also had a son in college. I was curious about her career path so I asked her one day at lunch why she was still programming and hadn’t become a manager instead.
She smiled at me kindly and replied, “I’ve worked very hard to stay exactly where I am”, and I was enlightened.
I wrote a blog post a while back about using a Clojure machine learning library called Cortex to do the Kaggle Cats and Dogs classification challenge.
I wanted to revisit it for a few reasons. The first one is that the Cortex library has progressed and improved considerably over the last year. It’s still not at version 1.0, but in my eyes, it’s really starting to shine. The second reason is that they recently published an example of using the RESNET50 model, (I’ll explain later on), to do fine-tuning or transfer learning. The third reason is that there is a great new plugin for Leiningen that supports using Jupyter notebooks with Clojure projects. These notebooks are a great way of doing walkthroughs and tutorials.
Putting all these things together, I felt like I was finally at a stage where I could somewhat replicate the first lesson in the Practical Deep Learning Course for Coders with Cats and Dogs – although this time all in Clojure!
Where to Start?
In the last blog post, we created our deep learning network and trained the data on scaled down images (like 50x50) from scratch. This time we are much smarter.
We are still of course going to have to get a hold of all the training data from the Kaggle Cats vs Dogs Challenge. The big difference is that this time, we only have to train our model for 1 epoch. What’s more, the results will be way better than before.
How is this possible? We are going to use an already trained model, RESNET50. This model has already been painstakingly trained with a gigantic network that is 50 layers deep on the ImageNet challenge. That’s a challenge that has models try to classify 1000 different categories. The theory is that the inner layers of the network have already learned about the features that make up cats and dogs; all we need to do is peel off the final layer of the network and graft on a new layer that just learns the final classification for our 2 categories of cats and dogs. This is called transfer learning or retraining.
Plan of Action
Get all the cats and dogs pictures in the right directory format for training
Train the model with all but the last layer in the RESNET model. The last layer we are going to replace with our own layer that will fine-tune it to classify only cats and dogs
Run the test data and come up with a spreadsheet of results to submit to Kaggle.
Getting all the data pictures in the right format
This is generally the most time-consuming step of most deep learning. I’ll spare you the gritty details, but we want to get all the pictures from train.zip into the right directory format.
The image sizes must also all be resized to match the input of the RESNET50. That means they all have to be 224x224.
Train the model
The cortex functions allow you to load the resnet50 model, remove the last layer, freeze all the other layers so that they will not be retrained, and add new layers.
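To make the shape of that workflow concrete, here is a tiny sketch using plain maps to stand in for a network description. This is not Cortex’s actual API; the :non-trainable? flag and the layer maps here are illustrative assumptions:

;; Sketch only: plain maps standing in for a Cortex network description.
(defn fine-tune
  "Drop the last layer, freeze the rest, and graft on a new classifier."
  [layers new-classifier]
  (conj (mapv #(assoc % :non-trainable? true) (butlast layers))
        new-classifier))

(fine-tune [{:type :convolutional}
            {:type :max-pooling}
            {:type :linear->softmax :output-size 1000}]
           {:type :linear->softmax :output-size 2})
;; => [{:type :convolutional, :non-trainable? true}
;;     {:type :max-pooling, :non-trainable? true}
;;     {:type :linear->softmax, :output-size 2}]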
I was surprised that I could actually train the model with all the images at 224x224 with the huge RESNET50 model. I built the uberjar and ran it, which helped the performance.
lein uberjar
java -jar target/cats-dogs-cortex-redux.jar
Training one epoch took me approximately 6 minutes. Not bad, especially considering that’s all the training I really needed to do.
Loss for epoch 1: (current) 0.05875186542016347 (best) null
Saving network to trained-network.nippy
The key point is that it saved the fine-tuned network to trained-network.nippy.
Run the Kaggle test results and submit the results
You will need to do a bit more setup for this. First, you need to get the Kaggle test images for classification. There are 12500 of these in the test.zip file from the site. Under the data directory, create a new directory called kaggle-test. Now unzip the contents of test.zip inside that folder. The full directory with all the test images should now be:
data/kaggle-test/test
This step takes a long time, and you might have to tweak the batch size again depending on your memory. There are 12,500 predictions to be made. The main logic for this is in a function called (kaggle-results batch-size). It will take a long time to run. It prints the results as it goes along to the kaggle-results.csv file. If you want to check progress, you can do wc -l kaggle-results.csv
For me, running (cats-dogs/kaggle-results 100) locally took 28 minutes.
Compare the results
My one epoch of fine-tuning beat my best results from going through the Practical Deep Learning exercise fine-tuning the VGG16 model. Not bad at all.
Summary
For those of you that are interested in checking out the code, it’s out there on github.
In my talk at Clojure Conj, I mentioned how a project from Oracle Labs named GraalVM might have the potential for Clojure to interop with Python on the same VM. At the time of the talk, I had just learned about it, so I didn’t have time to take a look at it. Over the last week, I’ve managed to take it for a test drive and I wanted to share what I found.
Are you ready?
In this example, we will be using an ordinary Leiningen project, and using the REPL we will interop with both R and Python.
But first, we will need a bit of setup.
We will download the Graal project and then configure our PATH to use Graal’s java instead of our own:
export PATH=/path/to/graalAndTruffle/bin:$PATH
Now, we can create a new lein project and run lein repl and begin the fun.
The Polyglot Context
In our new namespace, we just need to import the Polyglot Context to get started:
(ns graal-test.core
  (:import (org.graalvm.polyglot Context)))

;; note that it also supports Ruby, LLVM, and JS
(def context (Context/create (into-array ["python" "R"])))
Now, we are ready to actually try to run some R and Python code right in our REPL. Let’s start first with R.
Interoping with R
The main function we are going to use is the eval function in the context. Let’s start small with some basic math.
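As a minimal sketch (assuming the polyglot API of this Graal version, where eval takes a language id and a source string; result1 is just our name for the returned value):

(def result1 (.eval context "R" "1 + 1"))
;; result1 is an org.graalvm.polyglot.Value wrapping 2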
Again, it looks like it worked. Let’s try to get the result back into Clojure as a value we can work with. We could ask the result what sort of type it is with
(.isNumber result1) ;=> true
but let’s just use clojure.edn to read the string and save some time.
It would be nice to have an easier way to export and import symbols to and from the guest and host languages. In fact, Graal provides a way to do this, but to do it in Clojure, we would need something else called Truffle.
Truffle is part of the Graal project and is a framework for implementing languages with the Graal compiler.
There are quite a few languages implemented with the Truffle framework. R is one of them.
My understanding is that if Clojure were implemented as a Truffle language, then interop could be much more seamless, like this example in Ruby.
But let’s continue in our exploration. What about doing something more interesting, like importing a useful R library and using it. How about the numDeriv package that supports Accurate Numerical Derivatives?
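A sketch of what that might look like (the R snippet computes the numerical gradient of x^2 at x = 3; installing the package on first use is an assumption about how the R runtime fetches dependencies):

(.eval context "R"
       "install.packages('numDeriv'); library(numDeriv); grad(function(x) x^2, 3)")
;; => a polyglot Value wrapping something very close to 6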
If you are doing this at your REPL, you will see lots of text going by in your lein repl process at this point. It’s going out, figuring out what deps you need, and installing them in your /graalvm-0.28.2/jre/languages/R directory structure.
It is still a long way from import numpy or import tensorflow, but cPython compatibility is the goal, although the C extensions are the really tricky part.
So keep an eye on Graal and Truffle for the future and wish the Oracle Labs team the best on their mission to make the JVM Polyglot.
Footnotes
If you are interested in playing with the code, I have a github repo here: graal-test. If you are interested in watching a video, I really liked this one. There are also some really nice examples of running in polyglot mode with R and Java and JS here: https://github.com/graalvm/examples.
So, you have an idea for a fiction book. First, let me tell you that it’s a good idea and it’s a great thing that you are a coder. Quite a few successful authors have a background in software development. Arrival, (which is a fabulous movie), comes from the book Stories of Your Life, written by fellow programmer Ted Chiang. Charlie Stross is another fine example. One of my favorites is Daniel Suarez, the author of Daemon and more recently Change Agent. So yes, you can write a fiction book and you’re in good company. This post is dedicated to helping make it happen.
So how do you know about self publishing?
Two years ago, I had a semi-crazy idea for a tween/teen scifi book. At the time, my daughter was into all the popular books like Hunger Games and Divergent. The thing that I really liked about them was the strong female protagonist. The only thing that I thought was missing was a story that focused on a girl who could code. It would make it even better if she had coding super powers. The idea for Code Shifter was born. One of the things that I wanted to explore in writing the book was to have real programming code in the book but not have it be a learning-how-to-code book. The code would exist as part of the story, and if the reader picked up concepts along the way, great. Even if they didn’t, it would lay the positive groundwork to have them be more open to it later.
Books, like software, always take longer than you plan. My daughter and I enjoyed working on it together and over time it grew into a book that we could share with others. Along the way, I learned quite a bit about writing books, publishing, and other things that I wish I had known beforehand.
What did you use to write the book?
In the book world, your story is referred to as your manuscript. As a tool to produce it, I cannot speak highly enough of Leanpub. I found it easy and productive to work in as a programmer. For example, my setup was pretty painless.
In the repo, I had a manuscript directory, in which there was a Book.txt file that listed the chapter files.
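For illustration, Book.txt is just a plain list of chapter files, one per line (these filenames are examples):

chapter1.txt
chapter2.txt
chapter3.txt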
Each chapter file in turn was written in markdown. For example, chapter2.txt looked like this:
# 1
## Flicker
Twelve-year-old Eliza knew that the next words were the result of a computer program far more intelligent than any one of them. Standing next to her parents, she held her breath and watched as her brother touched his finger to the message on the wall screen.
From there, my process looked like:
Write a bit in my favorite editor, (Emacs of course), and make a commit.
Push the commit to github, which is registered with the Leanpub project
Log onto the Leanpub project and hit the preview button. This would generate a pdf and ebook that I could share with my daughter for feedback.
Advantages of Leanpub for development
As I said earlier, I’m a fan. Using git for revisions is incredibly useful. I shudder to think of people writing in Word without version control. The ability to easily create PDF and ebook formats was also very convenient. The markdown format has excellent code support. There is also the option of publishing your work as you go. However, I think that this is more useful with a technical book than with fiction.
Disadvantages of Leanpub for development
If you are going to work with a freelance editor or share your work with someone in the mainstream book world, they are not going to want a pdf. They usually want a doc version. It took me a bit of research to find a good converter: pandoc. With it, you can convert from markdown to Word with things looking pretty good. Don’t try to do pdf to Word. I found it a big recipe for fail.
Finally, the book was considered done enough to think about publishing.
There was much rejoicing.
Didn’t you want to get the book into bookstores?
Of course. That would be incredibly cool. However, that requires getting into what they call traditional publishing and it is a lot more work, time, and luck. If you go this route, you will need to budget at least six months to send out form letters to agencies to have them represent your work. Also, beware of predatory services that take advantage of unsuspecting and wide eyed authors. If you are interested in this, you’ll want to start looking for agents that are interested in your type of book. Query Tracker is a great place to start.
Traditional publishing sounds hard. What about self publishing?
Self publishing is certainly an easier, more direct way to bring your book to life. For me, it was the best way forward. Luckily, Leanpub made it pretty painless.
Publishing the book with Leanpub
Actually publishing the finished copy was really easy with Leanpub. All I had to do was fill in some fields on the project page and push Publish! The book was immediately ready to share with the world. Putting out an updated edition was as easy as pushing the button again. Leanpub provides online reading as well as all the ebook versions.
That was nice, but I really wanted a print copy too.
Publishing with CreateSpace
Amazon’s CreateSpace provides an excellent platform for on-demand print copies of books. This is where Leanpub comes in handy again. There is an Export option that provides an unbranded pdf copy of your manuscript with all the correct formatting and margins required for CreateSpace. I simply exported the file and then uploaded it to the project.
The other thing that you will want is a nice cover. There are services through CreateSpace for cover creation that you can buy, or you can upload your own file. I was lucky enough to have a talented graphic designer as a sister, Kristin Allmyer, who made me an awesome cover.
One of the confusing things was picking an ISBN for the print copy. You don’t need to worry about this for ebook versions, but you do for a physical copy. Your choices through CreateSpace are using a provided one for free or buying your own for $99. I chose my own so I could have the flexibility of working with a publisher other than Amazon if I want. If you choose that option, you can also make up your own publisher name. Mine is Gigasquid Books.
Once you have completed all the setup, they will send you a physical copy in the mail to approve. The moment you get to hold it in your hands is really magical.
If it looks alright, you hit the approve button and voilà – it’s for sale on Amazon!
Publishing with Direct Kindle Publishing
With CreateSpace, you have the option of porting your book to a Kindle format as well. It will do most of the heavy lifting of converting the files for you, and a few button clicks later you have both a Kindle and a print version available for your readers.
If you need to update the text of the Kindle version, Leanpub also has an export option available to produce unbranded ebook files. Just take this and upload it to your KDP account and you are good to go.
Did you run into any problems?
Of course I did. I was totally new to self publishing, and I underestimated how hard copy editing is. Some errors unfortunately made it into the first version. Luckily, I had some really nice people that helped me fix it for later versions. Many thanks to Martin Plumb, Michael Daines, Paul Henrich, and S. Le Callonnec for editing help.
This brings me to my next point. If I had to do it all over again, I would publish the book in a different order. Books are really no different than software. There are going to be bugs when you first release it in the wild. It is best to embrace this. In the future, I would publish the ebook versions first, which are much easier to update and then the print versions after that.
Did you make lots of money from the book sales?
Hahaha … that’s funny. If you are interested in making money, books are not the best way to go. The margins on Leanpub are definitely better than Amazon, but if I really was interested in making money, I would have been much better off using deep learning to make a stock market predictor or code up a startup that I could sell off.
Authors, in general, are much harder pressed to make a living than software developers. We should count our blessings.
Any last words of advice?
There is a great joy from creating a story and sharing it with others. Take your book idea, nurture it, and bring it to life. Then publish it and we can celebrate together.
Update: Cortex has moved along since I first wrote this blog post, so if you are looking to run the examples, please go and clone the Cortex repo and look for the cats and dogs code in the examples directory.
There is an awesome new Clojure-first machine learning library called Cortex that was open sourced recently. I’ve been exploring it lately and wanted to share my discoveries so far in this post. In our exploration, we are going to tackle one of the classic classification problems of the internet. How do you tell the difference between a cat and dog pic?
Where to Start?
For any machine learning problem, we’re going to need data. For this, we can use Kaggle’s data for the Cats vs Dogs Challenge. The training data consists of 25,000 images of cats and dogs. That should be more than enough to train our computer to recognize cats from doggies.
We also need some idea of how to train against the data. Luckily, the Cortex project has a very nice set of examples to help you get started. In particular, there is a suite classification example using the MNIST, (hand written digit), corpus. This example contains a number of cutting edge features that we’ll want to use:
Uses GPU for fast computation.
Uses a deep, multi-layered, convolutional layered network for feature recognition.
Has “forever” training by image augmentation.
Saves the network configuration as it trains to an external nippy file so that it can be imported later.
Has a really nice ClojureScript front end to visualize the training progress with a confusion matrix.
Has a way to import the saved nippy network configuration and perform inference on it to classify a new image.
Basically, it has everything we need to hit the ground running.
Data Wrangling
To use the example’s forever training, we need to get the data in the right form. We need all the images to be the same size, as well as in a directory structure that is split up into the training and test images. Furthermore, we want all the dog images to be under a “dog” directory and the cat images under the “cat” directory so that all the indexed images under them have the correct “label”. It will look like this:
- training
- cat
- 1.png
- 2.png
- dog
- 1.png
- 2.png
For this task, we are going to use a couple image libraries to help us out:
We can resize and rewrite the original images into the form we want. For an image size, we’re going to go with 52x52. The choice is arbitrary in that I wanted it bigger than the MNIST dataset, which is 28x28, so it would be easier to see, but not so big that it kills my CPU. This is even more important since we want to use RGB colors, which is 3 channels, as opposed to the MNIST grey scale of 1.
It uses a series of convolutional layers with max pooling for feature recognition. We’ll see if it works for color versions of cats and dogs as well as street numbers.
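For a concrete picture, a network description in the style of the Cortex MNIST example might look roughly like this (the layer functions follow cortex.nn.layers as used in that example; the specific kernel sizes and feature-map counts here are illustrative assumptions, not tuned values):

(require '[cortex.nn.layers :as layers])

;; roughly in the style of the MNIST example, adapted to 52x52 RGB
;; input and two output classes
(def network-description
  [(layers/input 52 52 3 :id :data)
   (layers/convolutional 5 0 1 20)   ;; 5x5 kernels, 20 feature maps
   (layers/max-pooling 2 0 2)
   (layers/relu)
   (layers/convolutional 5 0 1 50)
   (layers/max-pooling 2 0 2)
   (layers/relu)
   (layers/linear 1000)
   (layers/relu)
   (layers/linear->softmax 2)])      ;; two classes: cat and dog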
We’ll also keep the image augmentation the same as in the example.
This opens a port to a localhost webpage where we can view the progress: http://localhost:8091/
Below the confusion matrix is shown. This tracks the progress of the training in the classification. In particular, how many times it thought a cat was really a cat and how many times it got it wrong.
As we are training the data, the loss for each epoch is shown on the console as well as when it saves the network to the external file.
After only thirty minutes of training on my MacBook Pro, we get to some pretty good results, with the correct percentage in the 99s:
It’s time to do some inference on our trained network.
Inference
Firing up a REPL, we can connect to our namespace and use the label-one function from the cortex example to spot check our classification. It reads in the external nippy file that contains the trained network description, takes a random image from the testing directory, and classifies it.
(defn label-one
  "Take an arbitrary image and label it."
  []
  (let [file-label-pairs (shuffle (classification/directory->file-label-seq testing-dir false))
        [test-file test-label] (first file-label-pairs)
        test-img (imagez/load-image test-file)
        observation (png->observation dataset-datatype false test-img)]
    (imagez/show test-img)
    (infer/classify-one-observation
     (:network-description (suite-io/read-nippy-file "trained-network.nippy"))
     observation
     (ds/create-image-shape dataset-num-channels dataset-image-size dataset-image-size)
     dataset-datatype
     (classification/get-class-names-from-directory testing-dir))))
After much debate on the internet, I think that is the best answer the humans got too :)
Kaggle it
So it seems like we have a pretty good model, why don’t we submit our results to the Kaggle competition and see how it rates. All they need is to have us run the classification against their test data of 12,500 images and classify them as 1 = dog or 0 = cat in a csv format.
We will take each image and resize it, then feed it into cortex’s infer-n-observations function, to do all our classification as a batch.
After uploading the file to Kaggle, I was pleased that the answer got in the top 91%! It made it onto the Leaderboard.
Conclusion
Using an example setup from the Cortex project and 30 minutes of processing time on my laptop, we were able to crunch through some significant data and come up with a trained classification model that was good enough to make the charts in the Kaggle competition. On top of it all, it is in pure Clojure.
In my mind, this is truly impressive, and even though the Cortex library is in its early phases, it puts it on track to be as useful a tool as TensorFlow for Machine Learning.
Earlier this month, I watched an ACM Learning webcast with Peter Norvig speaking on AI. In it, he spoke of one of the next challenges of AI, which is to combine the symbolic with the neural. I can think of no better language than Clojure, with its simplicity, power, and rich LISP heritage, to take on the challenge for the future. With the Cortex library, it’s off to a great start.
Clojure.spec is a new library for Clojure that enables you to write specifications for your program. In an earlier post, I showed off some of its power to generate test data from your specifications. It’s a pretty cool feature. Given some clojure.spec code, it can generate sample data for you based off of the specifications. But what if you could write a program that would generate your clojure.spec program based off of data, so that you could then generate more test data?
Genetic programming
Here is where we embark for fun. We are going to use genetic programming to generate clojure.spec creatures that contain a program. Through successive generations, those creatures will breed, mutate, and evolve to fit the data that we are going to give them. Going with our creature theme, we can say that a creature eats a sequence of data like this:
["hi" true 5 10 "boo"]
Each creature will be represented by a map that has information about two key pieces, its program and the fitness score. Each program is going to start with a clojure.spec/cat, (which is the spec to describe a sequence). From here on out, I’m going to refer to the clojure.spec namespace as s/. So, a simple creature would look like this.
{:program (s/cat :0 int? :1 string?) :score 0}
How do we figure out a score from the creature’s spec? We run the spec and see how much of the data that it can successfully consume.
Scoring a creature
To score a creature, we’re going to use the clojure.spec explain-data function. It enables us to run a spec against some data and get back the problems in a data format that we can inspect. If there are no problems and the spec passes, the result is nil.
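For example, running a hand-written spec against our example data (in recent Clojure versions the namespace is clojure.spec.alpha; the exact shape of the problems map varies a bit between spec versions):

(require '[clojure.spec.alpha :as s])

(s/explain-data (s/cat :0 string? :1 int?) ["hi" true 5 10 "boo"])
;; => problems like {:path [:1], :pred clojure.core/int?, :val true, :in [1]}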
In the above example, the :in key tells us that it fails at index 1. This gives us all the information we need to write a score function for our creature.
This function tries to run the spec against the data. If there are no problems, the creature gets a 100 score. Otherwise, it records the farthest point in the sequence that it got. Creatures with a higher score are considered more fit.
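A minimal sketch of such a score function (assuming the creature’s program is stored as a quoted form that we eval, and reading the failure index from the first problem’s :in path):

(defn score
  "Sketch of the fitness function: 100 for a perfect fit, otherwise
   the farthest index in the sequence the spec reached before failing."
  [creature data]
  (if-let [problems (::s/problems (s/explain-data (eval (:program creature)) data))]
    (assoc creature :score (or (first (:in (first problems))) 0))
    (assoc creature :score 100)))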
Now that we have a fitness function to evaluate our creatures, we need a way to generate a random clojure.spec creature.
Create a random creature
This is where I really love Clojure. Code is data, so we can create the programs as lists and they are just themselves. To run the programs, we just need to call eval on them. We are going to constrain the creatures somewhat. They are all going to start out with s/cat and have a certain length of items in the sequence. Also, we are going to allow the parts of the spec to be created with certain predicates.
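For instance, a program is just a quoted list until we eval it into a real spec:

(def prog '(s/cat :0 int? :1 string?))

(s/valid? (eval prog) [5 "hi"]) ;=> true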
The seq-prob is the probability that a new spec sub-sequence will be constructed. The nest-prob is set to zero right now to keep things simple, but if turned up, it would increase the chance that a nested spec sequence would occur. We are going to be writing a recursive function for generation, so we’ll keep things to a limited depth with max-depth. Finally, we have the chance that when constructing a spec sub-sequence, it will be an and/or, with and-or-prob. Putting it all together, we get the code to construct a random arg.
Great! Now we have a way to make new random spec creatures. But, we need a way to alter them and let them evolve. The first way to do this is with mutation.
Mutating a creature
Mutation, in our case, means changing part of the code tree of the creature’s program. To keep the program runnable, we don’t want to be able to mutate every node, only specific ones. We’re going to control this by defining a mutable? function that will only change nodes that start with our sequences or predicates.
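A minimal sketch of that idea (the predicate set and the 10% mutation probability are illustrative assumptions):

(require '[clojure.walk :as walk])

(def preds ['int? 'string? 'boolean? 'keyword?])

(defn mutable? [node]
  (contains? (set preds) node))

(defn mutate
  "Walk the program tree and, with a small probability,
   swap a mutable predicate node for another random one."
  [creature]
  (update creature :program
          (fn [program]
            (walk/postwalk
             (fn [node]
               (if (and (mutable? node) (< (rand) 0.1))
                 (rand-nth preds)
                 node))
             program))))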
We can change our creatures via mutation, but what about breeding it with other creatures?
Crossovers with creatures
Crossover is another way to modify programs. It takes two creatures and swaps a node from one creature to another. To accomplish this, we’re going to use the walk function to select at a random probability the crossover node from the first node, then insert it into the second’s creatures program at another random spot.
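A sketch of crossover along those lines (again illustrative: random-node and the 30% insertion probability are assumptions, and it reuses mutable? from the mutation sketch):

(defn random-node
  "Pick a random mutable node from a program tree."
  [program]
  (rand-nth (vec (filter mutable? (flatten program)))))

(defn crossover
  "Take a random node from creature-a's program and splice it
   into creature-b's program at a random mutable spot."
  [creature-a creature-b]
  (let [donor (random-node (:program creature-a))
        done? (atom false)]
    (update creature-b :program
            (fn [program]
              (walk/postwalk
               (fn [node]
                 (if (and (mutable? node) (not @done?) (< (rand) 0.3))
                   (do (reset! done? true) donor)
                   node))
               program)))))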
We have our ways to change our creatures to let them evolve and we have a way to rank them. What we need now is to put it together in a way that will let them evolve to the solution.
Evolving creatures
The process will be in general terms:
Create initial population
Rank them
Take the top two best ones and carry them over (this is known as elitism)
Create the next generation by selecting creatures for crossover and mutation
Repeat!
So how do we select the best creatures for our next population? This is an interesting question, there are many approaches. The one that we’re going to use is called tournament selection. It involves picking n creatures from the whole population and then, among those, picking the best scored one. This will allow diversity in our population that is needed for proper evolution.
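A minimal sketch of tournament selection (assuming each creature carries the :score key from earlier):

(defn tournament-select
  "Pick n random creatures from the population and
   return the best-scoring one."
  [population n]
  (apply max-key :score (repeatedly n #(rand-nth population))))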
We’re now ready to write our evolve function. In it, we pass in the population size, how many generations we want, the tournament size, and of course, our test data that our creatures are going to feed on. The loop ends when it reaches a perfect fitting solution, (a creature with a score of 100), or the max generations.
Note that we have a chance for a completely random creature to appear in the generations, to further encourage diversity.
Of course, our clojure.spec creature can generate data on its own with the exercise function. Let’s have it generate 5 more examples of data that conform to its spec.
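For example, with a hand-written spec standing in for an evolved creature’s program:

(s/exercise (s/cat :0 int? :1 string?) 5)
;; => five pairs of [generated-value conformed-value]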
If we wanted to, we could adjust our evolve function and let it continue to evolve creatures, giving us lots of different solutions to choose from. We could even take the generated data from the exercise function and let it generate more creatures who generate more data……
The mind boggles.
We’ll leave with a quick summary of Genetic Programming.
Start with a way to generate random creatures
Have a way to evaluate their fitness
Create a way to change them for the next generations using
Mutation
Crossover
Have an evolution process
Create an initial population
Rank them
Create the next generation using selection techniques and mutation/crossovers
I sit next to my daughter, showing her programming for the first time.
(+ 1 1)
“Now press enter.”
2
“Pretty cool, huh?”
She looks unimpressed. I fear I’m losing her. How can I explain that this is just a small tip of something so much bigger?
You can make the code sing to you.
You can take these numbers, turn them into notes, and line them up with the beat of your heart. Bring in the melody and chorus and build them up to a crescendo. Let it crash in waves and then
You can make the code dance for you.
You can create delicate swirls and patterns with mathematical expressions. Have them pulse to the music in a never ending prism of fractals, flexing your control with confidence because
You can make the code lift you up.
It doesn’t matter if you don’t look like them. It doesn’t matter if they think you don’t belong. They can’t hold you back. You’re smart and strong and
You can make the code create your life.
You can solve problems for people. Make things work better and faster. Keep the data flowing. Make a company for yourself. Watch that company and your power and influence in the world grow until nothing feels out of reach and then, if you’re not careful
You can make the code hard and cruel.
You can automate hate. Use the latest AI to keep them in control. Watch them with never sleeping eyes. Steal their money and point guns at them with armed robots. Then, late at night, you can think how
You can let the code control you.
You can forget the important things in life. Turn away from family and friends. Lose yourself in some self created digital representation of yourself that never feels smart enough and leaves you grasping for more. Until that day, when you walk the streets with a deadened heart and you see the sad faces all around and you remember that
You can let the code make them smile.
You can use your skills to brighten dark days. Use your programs to make them laugh. When you have their attention, inspire them to dream with you of a better world and next
You can make the code save lives.
You can turn those algorithms to heal. Dive in and join the battle against death and disease. Make sense of all the data. Then lift your head to the sky and
You can make the code reach the stars.
You can see the surface of Mars. Pick up a rock from a planet that was unimaginable generations before. Look out at what is beyond our solar system and peer into the mysteries of the beginning of time.
You can.
All these things are yours now. The terrible and beautiful power of it.
I reach down to type the code that distills my hopes and fears for the next generation.
(println "Hello World")
Then I slide the keyboard over to her, a tear sliding down my cheek, and lean over to whisper the only advice that I can form into words,
So you want to write a book? Awesome. I’ve been working on one too for the last year.
No, it’s not really a programming book, but it does have code in it. It’s a sci-fi/fantasy book written for my ten-year-old daughter, but this post isn’t about that. It’s about sharing the tools and setup that I’ve found work best for me.
Tools for Writing
If the first thing you think of when you want to write a book is creating some really cool tools to help you, I can totally relate. It’s a programmer thing.
Hold on though, there’s another way.
Starting out with only my book idea, I spent some time looking at the best authoring tools out there. I knew that I wanted to be able to write in an editor that I was comfortable in and in a terse format like Markdown. I also wanted to be able to use git for revision management. After searching, I settled on Leanpub.
Leanpub is a free service for authoring that has Git integration in Markdown format. With it, I was able to write in my favorite text editor, (Emacs of course), commit and push my changes to my git repo, and then generate PDF and e-book formats. The multiple formats were important to me because it allowed me to share my chapters and get feedback.
Tools for Feedback
Since I was writing a book with my daughter in mind, the most important feedback was from her. After every chapter was done, I would either print her out a copy or download it to her Kindle for review. She actually really enjoyed reading it on her Kindle because it made it feel more real to her. My son also got interested in the story, and before long, I had them both getting in heated debates about which direction the story should go.
After my kids reviewed the chapters, I also sought some professional writing advice from a free-lance editor. I highly recommend getting this sort of feedback from an editor, writing group, or trusted friend to help you grow and improve. The one catch is that most of the writing world works with Microsoft Word, so I needed to convert my chapters to that format.
From my experience, all PDF to Word converters are full of fail. The formatting goes all over the place and your writing ends up looking like some horrible abstract text art experiment gone wrong. So far, the best converter I’ve found is pandoc. It allows you to take your Markdown files and turn them into quite presentable Word documents.
If you have a Mac, it’s as simple as brew install pandoc. Then, you can create a simple script to convert all your chapters, (or a selection), into a properly formatted Word Doc.
#!/bin/bash
rm ./all.md
for i in `cat ./Book.txt`; do cat $i >> all.md; echo " " >> all.md ; done
pandoc -o all.docx -f markdown -t docx ./all.md
Once you write your manuscript, (what the publishing world calls your book text), revise it, copy edit it, and walk backwards in a circle three times, you’re ready to publish.
Tools for Publishing
I don’t have any real firm advice in this area yet since I’m still in the midst of it, but I’ll share the two options that I’m looking at – traditional publishing and self publishing.
Self publishing is the more easily understood of the two. You can put your book up for sale at any time through Leanpub or Amazon. For better or worse, you have complete control of the content, display, marketing, and revenue of your book.
Traditional publishing involves finding a literary agent and/or publisher to work with. This route involves pitching your manuscript to someone to represent it through a query. The advantage of this is that, (if you find a good match), you will have a team of people helping you make your book the best it can be, and the possibility of getting it on the shelf in a bookstore. One of the downsides is that the traditional publishing world takes a lot longer than pushing the self publish button.
With any luck, I’ll have a clearer picture of this all in a bit and be able to share my experiences. In the meantime, I encourage you to grab your keyboard and bring your book ideas to life.