The keyboard that comes with the laptop I use for my main work machine, currently a 2011 MacBook Air, is pretty great. The low, square keys look sharp, and they feel good enough that I don't even notice them when I put in long hours at the keyboard. I like it so much that I now have three Apple keyboards in use at the three different computers in my life.
For the opening night of an art show in Kelowna’s Alternator Gallery, I recreated the classic Asteroids game in Processing. The game ran off a laptop and could be played using a Wii controller. This particular game let others participate too – via Twitter. The game watched the Twittersphere for hashtags related to the show. Those tweets were then displayed as text in the middle of the asteroid field, along with a new asteroid for the main player to shoot.
For the Vancouver 2010 Olympics, I worked with Makiko Yoshii on The Follower.
A pair of magnetic switches detected the robot’s vector across a pair of railroad tracks embedded in the road. An Arduino monitored the switches and told the pair of 12V motors how fast to spin forward. This simple robot followed the tracks in a very loose manner, appearing to wander down the Granville Island road at a walking pace.
This summer in Kelowna, the Keloha Volcano came to life. The chicken wire and painted foam volcano listened to the Twittersphere for any tweets marked #keloha and responded with a tweet and 30 seconds of bubbly fun. A Raspberry Pi running a Node.js program triggered a relay to turn on a bubble machine that I picked up from Walmart. It seemed like a hit.
Part of running a cluster means automating tasks across many computers. Putting executable scripts in a repo of some kind (I use a private github one) makes it simple to push changes for testing, and you definitely want a script of some kind to automatically check out and run your executables. Many folks will probably already know how to do this, so it may not be useful to everyone. I might just be posting this here for posterity.
In /etc/init.d/rc.local do something like:
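Here's a sketch of the kind of thing I mean; the repo path, script name, and log location are placeholders for whatever your project uses:

```shell
#!/bin/sh
# /etc/init.d/rc.local -- runs once at the end of boot

# Wait two minutes so networking and DNS are fully up
# before pulling code and launching jobs.
sleep 120

# Grab the latest copy of the worker scripts
# (directory and repo are hypothetical).
cd /home/ubuntu/worker || exit 1
git pull

# Kick off the long-running job in the background, logging its output.
./run-worker.sh >> /var/log/worker.log 2>&1 &
```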
The sleep makes the script wait for two minutes before running. I have found that without it, unexpected things can happen. There are 25th level Linux wizards out there who could tell me why this is necessary – if you are one and you feel like sharing some education, hit me up on Twitter.
Everything else is pretty much what you would run on the command line.
Test your script by running the rc.local file as you would any other bash script:
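Assuming the standard Debian/Ubuntu location, that just means invoking the file directly with root privileges, since boot-time scripts normally run as root:

```shell
# Run the boot script by hand and watch it work.
sudo bash /etc/init.d/rc.local
```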
Once the script is doing what you expect, you can do whatever you do to spin up your cluster. Each node should run your commands after it boots up and sleeps for two minutes.
These past months I’ve been working to gather large data sets from the internet using a cluster on AWS and Rackspace. (Just a quick note – I’ve been totally polite about wielding the cluster of 200 virtual machines. More about how I did this later.) The end result of this data gathering was 33 million HTML files. Since our team has been running pretty lean to start, the 800GB data set was stored on a 2006 Xserve, where it sat for a couple of weeks on an HFS+ formatted drive. With only 1GB of memory, that machine was useful for not much more than storing the files. This week I set up a new server that was more capable of running the tasks I wanted to run over the millions of files. I could have physically moved the drive, but I didn’t want to shift my work focus to learn the intricacies of either rebuilding the Linux kernel with HFS+ support, or removing journaling from the drive to allow it to mount in Linux and dealing with the caveats I’ve read that solution brings. Not my job. Plus, a more purely network-oriented solution felt more appropriate for the cloud space I’m working in. The obvious (to me) solution here is rsync, which I thought would be a snap. I first used this:
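It was something along these lines, with a shell wildcard naming every file in the directory (the paths and hostname here are made up):

```shell
# First attempt: the * is expanded by the shell into 33 million
# separate arguments, which blows past the kernel's argument-length limit.
rsync -avz /data/html/* user@newserver:/data/html/
```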
Considering I had 33 million files, I expected the incremental file list to take a long time to generate. After hitting enter, my terminal responded with a blank line, which I guessed was okay since rsync had to think about all those files. I left it overnight in a tmux session. When I came back the next day, I had an error: “Argument list too long”. A quick Google search showed me that I didn’t want to use the wildcard to specify every file in the directory – I could specify just the directory. This worked for me:
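The working version looked roughly like this (again, placeholder paths and hostname):

```shell
# Passing the directory itself lets rsync walk the tree internally,
# so the shell never has to expand millions of filenames.
rsync -avz /data/html user@newserver:/data/
```

Note that leaving the trailing slash off the source directory tells rsync to copy the directory itself, not just its contents.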
After entering my password for the remote server, rsync immediately reported that it was sending the incremental file list. Much better than the blank line I was getting before, and after about 20 minutes the files started to transfer. Hopefully this post helps others out.