git-svn, and thoughts on Subversion
We use Subversion for our revision control system, and it’s great. It’s certainly not the most advanced thing out there, but it has perhaps the best client support on every platform, and when you need to work with non-coders on Windows, Linux and Mac OS X, there are a lot of better things to do than explain how to use the command line to people who’ve never heard of it before.
However, I also really need to work offline. My usual modus operandi is working at a café without Internet access (thanks for still being in the stone-age when it comes to data access, Australia), which pretty much rules out using Subversion, because I can’t do commits when I do the majority of my work. So, I used svk for quite a long time, and everything was good.
Then, about a month ago, I got annoyed with svk screwing up relatively simple pushes and pulls for the last time. svk seems to work fine if you only track one branch and all you ever do is use its capabilities to commit offline, but the moment you start doing anything vaguely complicated like merges, or track both the trunk and a branch or two, it’ll explode. Workmates generally don’t like it when they see 20 commits arrive the next morning that totally FUBAR your repository.
So, I started using git-svn instead.
People who know me will understand that I have a hatred of crap user interfaces, and I have a special hatred of UIs that are different “just because”, which applies to git rather well. I absolutely refused to use tla for that reason—which thankfully never seems to be mentioned in distributed revision control circles anymore—and I stayed away from git for a long time because of its refusal to use conventional revision control terminology. git-svn in particular suffered much more from (ab)using different terminology than git, because you were intermixing Subversion jargon with git jargon. Sorry, you use checkout to revert a commit? And checkout also switches between branches? revert is like a merge? WTF? The five or ten tutorials that I found on the ‘net helped quite a lot, but since a lot of them told me to do things in different ways and I didn’t know what the subtle differences between the commands were, I went back to tolerating svk until it screwed up a commit for the very last time. I also tried really hard to use bzr-svn since I really like Bazaar (and the guys behind Bazaar), but it was clear that git-svn was stable and ready to use right now, whereas bzr-svn still had some very rough edges around it and wasn’t quite ready for production yet.
However, now that I’ve got my head wrapped around git’s jargon, I’m very happy to say that it was well worth the time for my brain to learn it. Linus elevated himself from “bloody good” to “true genius” in my eyes for writing that thing in a week, and I now have a very happy workflow using git to integrate with svn.
So, just to pollute the Intertubes more, here’s my own git-svn cheatsheet. I don’t know if this is the most correct way to do things (is there any “correct” way to do things in git?), but it certainly works for me:
* initial repository import (svk sync):
git-svn init https://foo.com/svn -T trunk -b branches -t tags
git-svn fetch
git checkout -b work trunk
* pulling from upstream (svk pull):
git-svn rebase
* pulling from upstream when there's new branches/tags/etc added:
git-svn fetch
* switching between branches:
git checkout branchname
* svk/svn diff -c NNNN:
git diff NNNN^!
* committing a change:
git add <files>
git commit
* reverting a change:
git checkout path
* pushing changes upstream (svk push):
git-svn dcommit
* importing svn:ignore:
(echo; git-svn show-ignore) >> .git/info/exclude
* uncommit:
git reset <SHA1 to reset to>
Drop me an email if you have suggestions to improve those. About the only thing that I miss from svk is the great feature of being able to delete a filename from the commit message, which would unstage it from the commit. That was tremendously useful; it meant that you could git commit -a all your changes except one little file, simply by deleting one line. It’s much easier than tediously trying to git add thirteen files in different directories just so you can omit one file.
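Plain git can get reasonably close to that svk trick by staging everything and then unstaging the one file you want to leave out. What follows is only a minimal sketch: the commit_all_except helper and its arguments are hypothetical names made up for illustration, not part of git or git-svn, and it assumes a reasonably recent git.

```shell
#!/bin/sh
# Hypothetical helper: stage every change, then unstage one path before committing.
# Usage: commit_all_except <path-to-leave-out> <commit message>
commit_all_except() {
    skip="$1"
    msg="$2"
    git add -A &&                  # stage all changes, including new files
    git reset -q -- "$skip" &&     # unstage just that path; the working copy is untouched
    git commit -q -m "$msg"        # commit whatever is still staged
}
```

The edits to the skipped file stay in the working copy, so they can still be committed later or thrown away with git checkout.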
One tip for git, if your repository has top-level trunk/branches/tags directories like this:
trunk/
  foo/
  bar/
branches/
  foo-experimental/
  bar-experimental/
tags/
  foo-1.0/
  bar-0.5/
That layout makes switching between the trunk and a branch of a project quite annoying, because while you can “switch” to (checkout) branches/foo-experimental/, git won’t let you checkout trunk/foo; it’ll only let you checkout trunk. This isn’t a big problem, but it does mean that your overall directory structure keeps changing, because switching to trunk means that you have foo/ and bar/ directories, while switching to foo-experimental or bar-experimental omits those directories. This ruins your git excludes and tends to cause general confusion with old files being left behind when switching branches.
Since many of us will only want to track one particular project in a Subversion repository rather than an entire tree (i.e. switch between trunk/foo and branches/foo-experimental), change your .git/config file from this:
[svn-remote "svn"]
	url = https://mysillyserver.com/svn
	fetch = trunk:refs/remotes/trunk
	branches = branches/*:refs/remotes/*
	tags = tags/*:refs/remotes/tags/*
to this:
[svn-remote "svn"]
	url = https://mysillyserver.com/svn
	fetch = trunk/foo:refs/remotes/trunk
	; ^ change "trunk" to "trunk/foo" as the first part of the fetch
	branches = branches/*:refs/remotes/*
	tags = tags/*:refs/remotes/tags/*
Doing that will make git’s “trunk” branch track trunk/foo/ on your server rather than just trunk/, which is probably what you want. If you want to track other projects in the tree, it’s probably better to git-svn init another directory.
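If you’d rather not open .git/config in an editor, the same change can probably be made with git config. This is only a sketch: it assumes the remote section is named "svn" as in the listing above, the track_project wrapper is a hypothetical name of my own, and I’d guess it’s best run right after git-svn init and before the first fetch.

```shell
#!/bin/sh
# Hypothetical wrapper: point the "svn" remote's fetch line at trunk/<project>.
# $1 = project directory under trunk/ (e.g. foo)
# $2 = config file to edit (defaults to .git/config in the current directory)
track_project() {
    project="$1"
    config="${2:-.git/config}"
    # Replaces the existing single-valued fetch setting; other keys are left alone.
    git config --file "$config" svn-remote.svn.fetch "trunk/${project}:refs/remotes/trunk"
}
```

Running track_project foo from the top of the working copy should leave the url, branches and tags lines exactly as they were.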
As an aside, while I believe that distributed version control systems look like a great future for open-source projects, it’s interesting that DVCS clients are now starting to support Subversion, which now serves as some form of lowest common denominator. (I’d call it the FAT32 of revision control systems, but that’d be a bit unkind… worse-is-better, perhaps?) Apart from the more “official” clients such as command-line svn and TortoiseSVN, it’s also supported by svk, Git, Bazaar, Mercurial, and some other great GUI clients on Mac OS X and Windows. Perhaps Subversion will become a de-facto repository format that everyone else can push and pull between, since it has the widest range of client choice.
Speaking at DevWorld 2008
For Mac developers in Australia, I’ll be speaking at the inaugural conference of DevWorld 2008, which will be held from September 29-30 this year in Melbourne. You can check out the full list of sessions; I’ll be giving the very last talk on that page: The Business of Development.
Coding is just one part of what makes a great product, but there’s always so much else to do and learn. So, what can you do to help ship a great product—besides coding—if you’re primarily a developer? In this talk, learn about important commercial and business issues that you, as a coder, can help to define and shape in your company, such as licensing and registration keys, adopting new technologies, software updates, handling support, your website, and crash reports.
Note that DevWorld 2008 is unfortunately only open to staff and students at an Australian university (“AUC member university”, to be exact), so unless you’re a student right now at one of those Unis, you’ll have to miss out on this incredibly exciting opportunity to hear me talk at you for an hour (snort). I hear the story behind this is that if this year’s DevWorld is successful, next year’s will be a more standard open conference. Anyway, hopefully catch some of you there in September!
Mac Developer Roundtable #11
The Mac Developer Network features an excellent series of podcasts aimed at both veteran Mac developers and those new to the platform who are interested in developing for the Mac. If you’re a current Mac coder and haven’t seen them yet, be sure to check them out. I’ve been listening to the podcasts for a long time, and they’re always both informative and entertaining. (Infotainment, baby.)
Well, in yet another case of “Wow, do I really sound like that?”, I became a guest on The Mac Developer Roundtable episode #11, along with Marcus Zarra, Jonathan Dann, Bill Dudney, and our always-eloquent and delightfully British host, Scotty. The primary topic was Xcode 3.1, but we also chatted about the iPhone NDA (c’mon Apple, lift it already!) and… Fortran. I think I even managed to sneak in the words “Haskell” and “Visual Studio” in there, which no doubt left the other show guests questioning my sanity. I do look forward to Fortran support in Xcode 4.0.
It was actually a small miracle that I managed to be on the show at all. Not only was the podcast recording scheduled at the ungodly time of 4am on a Saturday morning in Australian east-coast time, but I was also in transit from Sydney to the amazing alpine village of Dinner Plain the day before the recording took place. While Dinner Plain is a truly extraordinary village that boasts magnificent ski lodges and some of the best restaurants I’ve ever had the pleasure of eating at, it’s also rather… rural. The resident population is somewhere around 100, the supermarket doesn’t even sell a wine bottle opener that doesn’t suck, and Vodafone has zero phone reception there. So, it was to my great surprise that I could get ADSL hooked up to the lodge there, which was done an entire two days before the recording. Of course, since no ADSL installation ever goes smoothly, I was on the phone to iPrimus tech support[1] at 10pm on Friday night, 6 hours before the recording was due to start. All that effort for the privilege of being able to drag my sleepy ass out of bed a few hours later, for the joy of talking to other Mac geeks about our beloved profession. But, I gotta say, being able to hold an international conference call over the Intertubes from a tiny little village at 4am, when snow is falling all around you… I do love technology.
Of course, since I haven’t actually listened to the episode yet, maybe it’s all a load of bollocks and I sound like a retarded hobbit on speed. Hopefully not, though. Enjoy!
[1] Hey, I like Internode and Westnet as much as every other Australian tech geek, but they didn’t service that area, unfortunately.
Talks for 2008
I’ve given a few talks so far this year, which I’ve been kinda slack about and haven’t put up any slides for yet. So, if you’re one of the zero people who’ve been eagerly awaiting my incredibly astute and sexy opinions, I guess today’s your lucky day, punk!
Earlier this year, on January 2, 2008, Earth Time, I gave a talk at Kiwi Foo Camp in New Zealand, also known as Baa Camp. (Harhar, foo, baa, get it?) The talk was titled “Towards the Massive Media Matrix”, with the MMM in the title being a pun on the whole WWW three-letter acronym thing. (Credit for the MMM acronym should go to Silvia Pfeiffer and Conrad Parker, who coined the term about eight years ago :). The talk was about the importance of free and open standards on the Web, what’s wrong with the current status quo about Web video basically being Flash video, and the complications involved in trying to find a solution that satisfies everyone. I’m happy to announce that the slides for the talk are now available for download; you can also grab the details off my talks page.
A bit later this year in March, Manuel Chakravarty and I were invited to the fp-syd functional programming user group in Sydney, to give a talk about… monads! As in, that scary Haskell thing. We understand that writing a monad tutorial seems to be a rite of passage for all Haskell programmers, and that it was thus rather stereotypical for the “Haskell guys” in the group to give a talk about them, but the talk seemed to be well-received.
Manuel gave a general introduction to monads: what they are, how to use them, and why they’re actually a good thing rather than simply another hoop you have to jump through if you just want to do some simple I/O in Haskell. I focused on a practical use case of monads that didn’t involve I/O (OMG!), giving a walkthrough on how to use Haskell’s excellent Parsec library to perform parsing tasks, and why you’d want to use it instead of writing a recursive descent parser yourself, or resorting to the insanity of using lex and yacc. I was flattered to find out that after my talk, Ben Lippmeier rewrote the parser for DDC (the Disciplined Disciple Compiler) to use Parsec, rather than his old system of Alex and Happy (Haskell’s equivalents of lex and yacc). So, I guess I managed to make a good impression with at least one of our audience members, which gave me a nice warm fuzzy feeling.
You can find both Manuel’s and my slides online at the Google Groups files page for fp-syd, or you can download the slides directly from my own site. Enjoy.
Finally, during my three-week journey to the USA last month in June, I somehow got roped into giving a talk at Galois Inc. in Portland, about pretty much whatever I wanted. Since the audience was, once again, a Haskell and functional programming crowd, I of course chose to give a talk about an object-oriented language instead: Objective-C, the lingua franca of Mac OS X development.
If you’re a programming language geek and don’t know much about Objective-C, the talk should hopefully interest you. Objective-C is a very practical programming language that has a number of interesting features from a language point of view, such as opt-in garbage collection, and a hybrid of a dynamically typed runtime system with static type checking. If you’re a Mac OS X developer, there’s some stuff there about the internals of the Objective-C object and runtime system, and a few slides about higher-order messaging, which brings much of the expressive power of higher-order functions in other programming languages to Objective-C. Of course, if you’re a Mac OS X developer and a programming language geek, well, this should be right up your alley :). Once again, you can download the slides directly, or off my talks page.
The Long Road to RapidWeaver 4
Two years ago, I had a wonderful job working on a truly excellent piece of software named cineSync. It had the somewhat simple but cheery job of playing back movies in sync across different computers, letting people write notes about particular movie frames and scribbling drawings on them. (As you can imagine, many of the drawings that we produced when testing cineSync weren’t really fit for public consumption.) While it sounds like a simple idea, oh boy did it make some people’s lives a lot easier and a lot less stressful. People used to do crazy things like fly from city to city just to be in the same room with another guy for 30 minutes to talk about a video that they were producing; sometimes they’d be flying two or three times per week just to do this. Now, they just fire up cineSync instead and get stuff done in 30 minutes, instead of 30 minutes and an extra eight hours of travelling. cineSync made the time, cost and stress savings probably an order of magnitude or two better. As a result, I have immense pride and joy in saying that it’s being used on virtually every single Hollywood movie out there today (yep, even Iron Man). So, hell of a cool project to work on? Tick ✓.
Plus, it was practically a dream coding job when it came to programming languages and technologies. My day job consisted of programming with Mac OS X’s Cocoa, the most elegant framework I’ve ever had the pleasure of using, and working with one of the best C++ cross-platform code bases I’ve seen. I also did extensive hacking in Erlang for the server code, so I got paid to play with one of my favourite functional programming languages, which some people spend their entire life wishing for. And I got schooled in just so much stuff: wielding C++ right, designing network protocols, learning about software process, business practices… so, geek nirvana? Tick ✓.
The ticks go on: great workplace ✓; fantastic people to work with ✓; being privy to the latest movie gossip because we were co-located with one of Australia’s premier visual effects companies ✓; sane working hours ✓; being located in Surry Hills and sampling Crown St for lunch nearly every day ✓; having the luxury of working at home and at cafés far too often ✓. So, since it was all going so well, I had decided that it was obviously time to make life a lot harder, so I resigned, set up my own little software consulting company, and started working on Mac shareware full-time.
Outside of the day job on cineSync, I was doing some coding on a cute little program to build websites named RapidWeaver. RapidWeaver’s kinda like Dreamweaver, but a lot more simple (and hopefully just as powerful), and it’s not stupidly priced. Or, it’s kinda like iWeb, but a lot more powerful, with hopefully most of the simplicity. I first encountered RapidWeaver as a normal customer and paid my $40 for it since I thought it was a great little program, but after writing a little plugin for it, I took on some coding tasks.
And you know what? The code base sucked. The process sucked. Every task I had to do was a chore. When I started, there wasn’t even a revision control system in place: developers would commit their changes by emailing entire source code files or zip archives to each other. There was no formal bug tracker. Not a day went by when I didn’t shake my fist, lo, with great anger, and thunder and lightning appeared. RapidWeaver’s code base had evolved since version 1.0 from nearly a decade before, written by multiple contractors with nobody being an overall custodian of the code, and it showed. I saw methods that were over a thousand lines long, multithreaded bugs that would make Baby Jesus cry, method names that were prefixed with Java-style global package namespacing (yes, we have method names called com_rwrp_currentlySelectedPage), block nesting that got so bad that I once counted thirteen tabs before the actual line of code started, dozens of lines of commented-out code, classes that had more than a hundred and twenty instance variables, etc, etc. Definitely no tick ✗.
But the code—just like PHP—didn’t matter, because the product just plain rocked. (Hey, I did pay $40 for it, which surprised me quite a lot because I moved to the Mac from the Linux world, and sneered at most things at the time that cost more than $0.) Despite being a tangled maze of twisty paths, the code worked. I was determined to make the product rock more. After meeting the RapidWeaver folks at WWDC 2007, I decided to take the plunge and see how it’d go full-time. So, we worked, and we worked hard. RapidWeaver 3.5 was released two years ago, in June 2006, followed by 3.5.1. 3.6 followed in May 2007, followed by a slew of upgrades: 3.6.1, 3.6.2, 3.6.3… all the way up to 3.6.7. Slowly but surely, the product improved. On the 3rd of August 2007, we created the branch for RapidWeaver 3.7, which we didn’t realise yet was going to be such a major release that it eventually became 4.0.
And over time, it slowly dawned on me just how many users we had. A product that I initially thought had a few thousand users was much closer to about 100,000 users. I realised I was working on something that was going to affect a lot of people, so when we decided to call it version 4.0, I was a little nervous. I stared at the code base and it stared back at me; was it really possible to ship a major new revision of a product and add features to it, and maintain my sanity?
I decided in my naïvety to refactor a huge bunch of things. I held conference calls with other developers to talk about what needed to change in our plugin API, and how I was going to redo half of the internals so it wouldn’t suck anymore. Heads nodded; I was happy. After about two weeks of being pleased with myself and ripping up many of our central classes, reality set in as I realised that I was very far behind on implementing all the new features, because those two weeks were spent on nothing else but refactoring. After doing time estimation on all the tasks we had planned out for 4.0 and realising that we were within about one day of the target date, I realised we were completely screwed, because nobody sane does time estimation for software without multiplying the total estimate by about 1.5-2x. 4.0 was going to take twice as long as we thought it would, and since the feature list was not fixed, it was going to take even longer than that.
So, the refactoring work was dropped, and we concentrated on adding the new required features, and porting the bugfixes from the 3.6 versions to 4.0. So, now we ended up with half-refactored code, which is arguably just as bad as no refactored code. All the best-laid plans that I had to clean up the code base went south, as we soldiered on towards feature completion for 4.0, because we simply didn’t have the time. I ended up working literally up until the last hour to get 4.0 to code completion state, and made some executive decisions to pull some features that were just too unstable in their current state. Quick Look support was pulled an hour and a half before the release as we kept finding and fixing bugs with it that crashed RapidWeaver while saving a document, which was a sure-fire way to lose customers. Ultimately, pulling Quick Look was the correct decision. (Don’t worry guys, it’ll be back in 4.0.1, without any of those crashing-on-save shenanigans.)

So, last Thursday, it became reality: RapidWeaver 4.0 shipped out the door. While I was fighting against the code, Dan, Aron, Nik and Ben were revamping the website, which is now absolutely bloody gorgeous, all the while handling the litany of support requests and being their usual easygoing sociable selves on the Realmac forums. I was rather nervous about the release: did we, and our brave beta testers, catch all the show-stopper bugs? The good news is that it seems to be mostly OK so far, although no software is ever perfect, so there’s no doubt we’ll be releasing 4.0.1 soon (if only to re-add Quick Look support).
A day after the release, it slowly dawned on me that the code for 4.0 was basically my baby. Sure, I’d worked on RapidWeaver 3.5 and 3.6 and was the lead coder for that, but the 3.5 and 3.6 goals were much more modest than 4.0. We certainly had other developers work on 4.0 (kudos to Kevin and Josh), but if I had a bad coding day, the code basically didn’t move. So all the blood, sweat and tears that went into making 4.0 was more-or-less my pride and my responsibility. (Code-wise, at least.)
If there’s a point to this story, I guess that’d be it: take pride and responsibility in what you do, and love your work. The 4.0 code base still sucks, sitting there sniggering at me in its half-refactored state, but we’ve finally suffered the consequences of its legacy design for long enough that we have no choice but to give it a makeover with a vengeance for the next major release. Sooner or later, everyone pays the bad code debt.
So, it’s going to be a lot more hard work to 4.1, as 4.1 becomes the release that we all really wanted 4.0 to be. But I wouldn’t trade this job for pretty much anything else in this world right now, because it’s a great product loved by a lot of customers, and making RapidWeaver better isn’t just a job anymore, it’s a need. We love this program, and we wanna make it so good that you’ll just have to buy the thing if you own a Mac. One day, I’m sure I’ll move on from RapidWeaver to other hopefully great things, but right now, I can’t imagine doing anything else. We’ve come a long way from RapidWeaver 3.5 in the past two years, and I look forward to the long road ahead for RapidWeaver 5. Tick ✓.
Xcode Distributed Builds Performance
[Sorry if you get this post twice—let’s say that our internal builds of RapidWeaver 4.0 are still a little buggy, and I needed to re-post this ;)]
Xcode, Apple’s IDE for Mac OS X, has this neat ability to perform distributed compilations across multiple computers. The goal, of course, is to cut down on the build time. If you’re sitting at a desktop on a local network and have a Mac or two to spare, distributed builds obviously make a lot of sense: there’s a lot of untapped power that could be harnessed to speed up your build. However, there’s another scenario where distributed builds can help, and that’s if you work mainly off a laptop and occasionally join a network that has a few other Macs around. When your laptop’s offline, you can perform a distributed build with just your laptop; when your laptop’s connected to a few other Macs, they can join in the build and speed it up.
There’s one problem with this idea, though, which is that distributed builds add overhead. I had a strong suspicion that a distributed build with only the local machine was significantly slower than a simple individual build. Since it’s all talk unless you have benchmarks, lo and behold, a few benchmarks later, I proved my suspicion right.
- Individual build: 4:50.6 (first run), 4:51.7 (second run)
- Shared network build with local machine only: 6:16.3 (first run), 6:16.3 (second run)
This was a realistic benchmark: it was a full build of RapidWeaver including all its sub-project dependencies and core plugins. The host machine is a 2GHz MacBook with 2GB of RAM. The build process includes a typical number of non-compilation phases, such as running a shell script or two (which takes a few seconds), copying files to the final application bundle, etc. So, for a typical Mac desktop application project like RapidWeaver, turning on shared network builds without any extra hosts incurs a pretty hefty speed penalty: ~30% in my case. Ouch. You don’t want to leave shared network builds on when your laptop disconnects from the network. To add to the punishment, Xcode will recompile everything from scratch if you switch from individual builds to distributed builds (and vice versa), so flipping the switch when you disconnect from a network or reconnect to it is going to require a full rebuild.
Of course, there’s no point to using distributed builds if there’s only one machine participating. So, what happens when we add a 2.4GHz 20” Aluminium Intel iMac with 2GB of RAM, via Gigabit Ethernet? Unfortunately, not much:
- Individual build: 4:50.6 (first run), 4:51.7 (second run)
- Shared network build with local machine + 2.4GHz iMac: 4:46.6 (first run), 4:46.6 (second run)
You shave an entire four seconds off the build time by getting a 2.4GHz iMac to help out a 2GHz MacBook. A 1% speed increase isn’t very close to the 40% build time reduction that you’re probably hoping for. Sure, a 2.4GHz iMac is not exactly a build farm, but you’d hope for something a little better than a 1% speed improvement by doubling the horsepower, no? Gustafson’s Law strikes again: parallelism is hard, news at 11.
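For anyone who wants to check the percentages, here is the arithmetic as a small shell sketch. The timings are the first-run figures quoted above; the to_seconds and percent_change helpers are hypothetical names introduced just for this illustration.

```shell
#!/bin/sh
# Convert an m:ss.s timing (as reported above) into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.1f\n", $1 * 60 + $2 }'
}

# Percentage change of a build time relative to a baseline.
# Positive means slower than the baseline, negative means faster.
percent_change() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (t - base) / base * 100 }'
}

individual=$(to_seconds 4:50.6)   # individual build, first run
local_only=$(to_seconds 6:16.3)   # shared network build, local machine only
with_imac=$(to_seconds 4:46.6)    # shared network build, local machine + iMac

echo "local-only shared build: $(percent_change "$individual" "$local_only")%"
echo "with the iMac helping:   $(percent_change "$individual" "$with_imac")%"
```

That works out to roughly 29.5% slower for the local-only shared build, and roughly 1.4% faster with the iMac helping, which matches the rounded figures in the text.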
I also timed Xcode’s dedicated network builds (which are a little different from its shared network builds), but buggered if I know where I put the results for that. I vaguely remember that dedicated network builds were very similar to shared network builds with my two hosts, but my memory’s hazy.
So, lesson #1: there’s no point using distributed builds unless there’s usually at least one machine available to help out, otherwise your builds are just going to slow down. Lesson #2: you need to add significantly more CPUs to save a significant amount of time with distributed builds. A single 2.4GHz iMac doesn’t appear to help much. I’m guessing that adding a quad-core or eight-core Mac Pro to the build will help. Maybe 10 × 2GHz Intel Mac minis will help, but I’d run some benchmarks on that setup before buying a cute Mac mini build farm — perhaps the overhead of distributing the build to ten other machines is going to nullify any timing advantage you’d get from throwing another 20GHz of processors into the mix.
Dick Gabriel on A Lot More Than Lisp
If you love programming, and especially if you love programming languages, there’s an episode of the Software Engineering Radio podcast that has a fantastic interview with Dick Gabriel, titled “Dick Gabriel on Lisp”. If you don’t know who Gabriel is, he’s arguably one of the more important programming language people around, is one of the founding fathers of XEmacs (née Lucid Emacs), wrote the famous Worse is Better essay (along with a pretty cool book that I’ll personally recommend), and also gave one of the most surreal and brilliant keynotes that I’ve ever heard, which received a standing ovation at HoPL III and OOPSLA.
The episode’s about fifty minutes long, and Gabriel talks about a lot more than just Lisp in the interview: among other things, he gives some major insight into the essence of object-oriented message-passing, how functions are objects and objects are functions, what continuations are, metacircularity, and the relationship between XML and S-expressions (and why XML is just a glorified half-assed version of Lisp). There are also some great stories in the interview for computing historians: how the Common Lisp Object System was initially inspired by Scheme and the original Actor language (yep, “actors” as in “Erlang processes”), what AI research was like in the 1960s and ’70s, and the story of how John McCarthy and his students implemented the first Lisp interpreter in one night.
A wonderful interview, and well worth listening to if programming languages is your shindig.
Justice Kirby on Code as Law
I found this short article about law and code on builderAU rather interesting. The money quotes are the first and last paragraphs:
Technology has outpaced the legal system’s ability to regulate its use in matters of privacy and fair use rights, said Kirby… “We are moving to a point in the world where more and more law will be expressed in its effective way, not in terms of statutes solidly enacted by the parliament… but in the technology itself—code,” said Kirby.
I think that’s a great quote, and it shows that Justice Kirby has a pretty solid understanding of what code is, how it interacts with law, and that the USA’s Digital Millennium Copyright Act (DMCA)—and Australia’s Digital Agenda Act—are dangerous things. (I do think that the builderAU article’s emphasis on Google and Yahoo being the two culprits seems odd, although it’s hard to say this without listening to Kirby’s original speech.) I’ve always been a fan of Justice Kirby, and it’s nice to know that somebody-on-high understands that code-as-law is a problem, and it’s a complex one.
Yay, New Computing Books
I guess I'll be doing some bedtime reading for the next few weeks. (Note that I'm not actually a games programmer by trade—nor really a C++ programmer these days—but games coding tends to have interesting constraints such as high performance and memory management, which encourages a much better understanding of lower-level problems.) I'm a little of the way through Refactoring to Patterns, and it's great so far.
In other news, I think these three books in a row fit the definition of Alanic rather well:
Seriously, I didn't move 'em next to each other or anything. I especially love it how Java in a Nutshell looks like it's about 1,000 pages. Nutshell my arse.
Lines of Code
Haskell magician Don Stewart echoes my own opinion on lines of code in his Haskell Workshop demo on xmonad:
A couple of nice refactorings happened when we found data structures that fit better, and you dramatically drop down in the number of lines of code. So we use lines of code as a bit of a heuristic for working out when code sucks. If something gets really big, it probably needs to be rewritten.
I’m staring at a 60,000 line code base right now that I’m positive could be under 30,000 lines of code if it had a good rewri… erm, refactoring. Sometimes when you can’t figure out what’s going on in a function that’s a thousand lines long, the best solution is to rewrite the thing in a hundred lines instead. (And that time, I really did mean rewrite, not refactor.)
Update: Steve Yegge writes a good essay about code size, and believes that “the worst thing that can happen to a code base is size”. (If only he applied that principle to his blog posts as well…)
Raganwald on Geek Attitude
Reg Braithwaite has said very eloquently something I’ve been meaning to express for a long time:
When someone says something outrageous, like “f*ck compilers and their false sense of security”, it is not important whether I happen to think that programming languages with strong, expressive type systems are valuable (hint: I do). What is important is to look at this statement and ask yourself: Is there just one thing in there, one kernel of wisdom that I can extract and use to be a better programmer?
I wrote about geek culture and criticism earlier, but Braithwaite knocks it up a notch and hammers the point home in a single paragraph. To use an analogy, being a good geek is like being a good partner in a relationship… step one: listen. Step two: empathise. (Step three: profit!)
On D&D and C++
Erlang Interview in BuilderAU
It seems that I’ve been interviewed by BuilderAU about The Importance of Being Erlang. (And I quite like the pun, even if it wasn’t my idea.) Feedback on it is always welcome, of course—shoot me an email if you have any questions.
And yeah, I know I haven’t been blogging in nearly three months; I’ve been secretly working on an image editor in my spare time. (Sorry, that was a small Mac developer in-joke.) I’ve got nearly two dozen articles that are half-written here that I’ve been meaning to finish. However, there’s nothing like being sick to make you catch up on reading books and updating your blog. (… and play more Bioshock, even if it’s basically just a normal-mapped phong-shaded bastard child of System Shock 2. All hail SHODAN!)
Linux Audio and the Paradox of Choice
Mike Melanson—a primary author of the Linux Flash plugin, xine, ffmpeg, and a general crazy-good multimedia hacker—on the state of Linux Audio APIs:
There are 2 primary methods of sending audio data to a DAC under Linux: OSS and ALSA. OSS came first; ALSA supplanted OSS. Despite this, and as stated above, there are numerous different ways to do the DAC send. There are libraries and frameworks that live at a higher level than OSS and ALSA. In the end, they all just send the data out through OSS or ALSA.
The zaniest part is that some of these higher level libraries can call each other, sometimes in a circular manner. Library A supports sending audio through both OSS and ALSA, and library B does the same. But then library A also has a wrapper to send audio through library B and vice versa. For that matter, OSS and ALSA both have emulation layers for each other. I took the time to map out all of the various libraries that I know of that operate on Linux and are capable of nudging that PCM data out to a DAC:
Barry Schwartz would be shaking his head, methinks. And yes, I’m well aware of efforts to unify this mess. That doesn’t excuse that this jungle has been the state of Linux audio for the past ten years. I love the comments too: instead of admitting how dumbass this is, they give suggestions for using even more APIs (“try KDE4’s Phonon! That’ll fix everything!”)… totally missing the amusing irony, and also missing the point that Mike needs something that works on as many Linux distributions as possible.
Geek Culture and Criticism
What’s happened to Kathy Sierra, and what she wrote about angry and negative people, inspired me to write a bit, so let me indulge myself a little. I live in the computing community, with other like-minded geeks. Computing geeks have a (deserved) reputation for being a little negative. This is not without cause: there’s a lot of things wrong in our world. A lot of the technology we use and rely on every day is brittle and breaks often, and as Simon Peyton-Jones says, we’re quite often trying to build buildings out of bananas. Sure, you can do it, but it’s painful, and it’s downright depressing when the bricks are just over there, just out of reach. Our efforts at releasing software are often met with never-ending bug reports and crash reports, and it’s quite sobering looking at our task trackers.
It’s impossible to resist ragging on something or abusing something. This is part of geek computing culture. We have to work with a lot of crap, so it’s easy to be critical and complain about everything around you. However, from this day forward, I’m going to try to at least make any criticism not totally destructive. (I don’t think I’m vitriolic, mind you, but I’ll make a conscious effort to be more constructive now.) Wrap it up in some humour; offer some suggestions or alternatives. Resist using inflammatory language as much as you can when you’re personally attacked, or simply walk away from it. Re-read everything you write and think “Is what I’m writing simply making people more bitter? Is it actually worth somebody’s time to read this?”
Be more gentle with your language and kinder to your fellow netizens. Don’t participate in flamewars. Don’t join the mob mentality and rail on Microsoft or C++ or Ruby or Apple or Linux when everyone else does. (You’re meant to be a scientist after all, aren’t you?) Break away from that self-reinforcing sub-culture that often comes with being a geek.
Now that I’ve got that off my chest, back to work!
Movable Type's Export File Format
Here is a short list of things that possess more elegance than Movable Type’s export file format:
- XML,
- SMTP,
- the C string API,
- the C multibyte string API (mbsinit, wcrtomb, mbsnrtowcs, etc),
- the C++ grammar specification,
- C++ template error messages,
- the BIND zone file format,
- Bourne shell parameter expansion involving spaces,
- PHP,
- CSV,
- GNU libtool,
- wGetGUI,
- POSIX regular expressions,
- MPEG-7,
- the mplayer code base,
- the Cisco VPN client,
- the ld(1) manpage on the UNIX system of your choice,
- the sudoers(5) manpage,
- Makefiles generated by GNU autogoats,
- Eric S. Raymond,
- ICCCM,
- pretty much everything.
Feel free to extend this list in the comments.
UCS-2 vs UTF-16
I always used to get confused between UCS-2 and UTF-16. Which one’s the fixed-width encoding and which one’s the variable-length encoding that supports surrogate pairs?
Then, I learnt this simple little mnemonic: you know that UTF-8 is variable-length encoded1. UTF = variable-length. Therefore UTF-16 is variable-length encoded, and therefore UCS-2 is fixed-length encoded. (Just don’t extend this mnemonic to UTF-32.)
Just thought I’d pass that trick on.
1 I’m assuming you know what UTF-8 is, anyway. If you don’t, and you’re a programmer, you should probably learn sometime…
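If you'd like to see the mnemonic in action, here's a small Python sketch (my own illustration; the helper name utf16_units is made up for this example). It encodes one character inside the Basic Multilingual Plane and one outside it, and shows that UTF-16 needs two 16-bit code units for the latter, which fixed-width UCS-2 simply cannot represent.

```python
def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for a single character."""
    raw = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


# U+00E9 (e with acute accent) lives in the Basic Multilingual Plane:
# one code unit, which is all that UCS-2 can express.
bmp = utf16_units("\u00e9")

# U+1D11E (musical G clef) is outside the BMP: UTF-16 encodes it as a
# surrogate pair (a high surrogate followed by a low surrogate).
clef = utf16_units("\U0001D11E")

print([hex(u) for u in bmp])   # one code unit
print([hex(u) for u in clef])  # two code units
```

The clef comes out as 0xd834 followed by 0xdd1e, which is exactly the surrogate pair that makes UTF-16 variable-length.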
Computing Heroes
I was chatting to a mate of mine about a remarkable book that I found the other day:

One of the greatest intellectuals of our century writes about computing systems and fundamental aspects of the brain. What’s there not to like here? I’m only halfway through the book, and it’s already got so much worthy material in it that I will recommend it to any other computing folks. It’s worth it for the Foreword alone.
Alas, von Neumann passed on a while ago. Right after our discussion, I found out that John Backus passed away last Saturday. Phil Windley comments that “Computer Science has always been a discipline where the founders were still around. That’s changing.”
Arguably computing’s most famous face-person right now is Bill Gates. I don’t see Gates being famous as bad: after all, the guy is a multi-billionaire, which naturally gives him a little bit of a reputation, and his philanthropic acts are to be admired even if one despises his business tactics. However, what does the greater public know about our real heroes? Alan Turing? John von Neumann? Grace Hopper? Alan Kay? John Backus? Donald Knuth? Edsger Dijkstra? Doug Engelbart?
I remember when John Shepherd taught me CS2041 at university, he spent 5 minutes at the start of each lecture talking about “famous geeks” and what they did for our industry. We need to educate ourselves as an industry and learn and respect what these folks did; go back to our roots; respect our elders. I’d wager that a lot more mathematicians know about Bertrand Russell and Leonhard Euler than self-described programmers and computing geeks know about Alan Turing and Edsger Dijkstra.
If you’re a programmer (or even if you’re not), go to Wikipedia’s list of Turing Award winners sometime and just start reading about people you don’t know, starting with the man who the Turing award’s named after. (I’m ashamed to say that I only recognise a mere 22 out of 51 names of the Turing Award winners, and I’m scared to think that I’m probably doing a lot better than a lot of other folks.)
I understand that people such as Knuth and Dijkstra made specialised contributions to our field, and that the greater public won’t particularly care for them (in the same way that a lot of the general public won’t know about Bertrand Russell or even Euler, but they’re known by pretty much every single mathematician). However, there are lots of computing legends who we can talk about at dinner with all our non-geek friends and family. Go get Doug Engelbart’s Mother of All Demos or Ivan Sutherland’s Sketchpad demo and show it to your friends. Tell your family about the role that Turing played in World War II, and the amusing story of Grace Hopper finding an actual bug inside her computer.
As Dijkstra said, “in their capacity as a tool, computers will be but a ripple on the surface of our culture. In their capacity as intellectual challenge, they are without precedent in the cultural history of mankind.” Computing is one of the most important things to emerge from this entire century. I hope that in twenty years’ time, at least Alan Turing will be a household name alongside Bill Gates. Let’s do our part to contribute to that.
Objective-C Accessors
I like Objective-C. It’s a nice language. However, having to write accessor methods all day is boring, error-prone, and a pain in the ass:
    - (NSFoo*) foo
    {
        return foo;
    }

    - (void) setFoo:(NSFoo*)newFoo
    {
        [foo autorelease];
        foo = [newFoo retain];
    }

I mean, c’mon. This is Objective-C we’re talking about, not Java or C++. However, until Objective-C 2.0’s property support hits the streets (which, unfortunately, will only be supported on Mac OS X 10.5 and later as far as I know), you really have to write these dumb-ass accessors to, well, access properties in your objects correctly. You don’t need to write accessors thanks to the magic of Cocoa’s Key-Value Coding, but it just feels wrong to access instance variables using strings as keys. I mean, ugh: one typo in the string and you’ve got yourself a problem. Death to dynamic typing when it’s totally unnecessary.
As such, I got totally fed up with this and wrote a little script to generate accessor methods. I’m normally not a fan of code generation, but in this case, the code generation’s actually designed to be one-shot, and it doesn’t alter the ever-picky build process. It’s meant to be used in Xcode, although you can run it via the commandline too if you like.
Given the following input:
    int integerThing;
    NSString* _stringThing;
    IBOutlet NSWindow* window;

It will spit out the following:

    #pragma mark Accessors

    - (int) integerThing;
    - (void) setIntegerThing:(int)anIntegerThing;

    - (NSString*) stringThing;
    - (void) setStringThing:(NSString*)aStringThing;

    - (NSWindow*) window;
    - (void) setWindow:(NSWindow*)aWindow;

    %%%{PBXSelection}%%%
    #pragma mark Accessors

    - (int) integerThing
    {
        return integerThing;
    }

    - (void) setIntegerThing:(int)anIntegerThing
    {
        integerThing = anIntegerThing;
    }

    - (NSString*) stringThing
    {
        return _stringThing;
    }

    - (void) setStringThing:(NSString*)aStringThing
    {
        [_stringThing autorelease];
        _stringThing = [aStringThing copy];
    }

    - (NSWindow*) window
    {
        return window;
    }

    - (void) setWindow:(NSWindow*)aWindow
    {
        [window autorelease];
        window = [aWindow retain];
    }

There’s a couple of dandy features in the script that I find useful, all of which are demonstrated in the above output:
- It will detect whether your instance variables start with a vowel, and write out anInteger instead of aInteger as the parameter names for the methods.
- It will copy rather than retain value classes such as NSStrings and NSNumbers, as God intended.
- For all you gumbies who prefix your instance variables with a leading underscore, it will correctly recognise that and not prefix your accessor methods with an underscore.1
- IBOutlet and a few other type qualifiers (__weak, __strong, volatile etc) are ignored correctly.
- It will emit Xcode-specific #pragma mark places to make the method navigator control a little more useful.
- It will emit Xcode-specific %%%{PBXSelection}%%% markers so that the accessor methods meant to go into your .m implementation file are automatically selected, ready for a cut-and-paste.
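The naming rules in the list above can be sketched roughly as follows. This is a minimal Python illustration, not the actual objc-make-accessors script: the function names, the set of value classes and the list of ignored qualifiers are all assumptions, picked only so that the results line up with the sample output shown earlier.

```python
# Assumed for illustration: qualifiers to skip, and classes to copy rather than retain.
IGNORED_QUALIFIERS = {"IBOutlet", "__weak", "__strong", "volatile"}
VALUE_CLASSES = {"NSString", "NSNumber"}


def parse_ivar(declaration):
    """Split a declaration such as 'IBOutlet NSWindow* window;' into (type, name)."""
    text = declaration.strip().rstrip(";").replace("*", "* ")
    tokens = [t for t in text.split() if t not in IGNORED_QUALIFIERS]
    type_name = " ".join(tokens[:-1]).replace(" *", "*")
    return type_name, tokens[-1]


def accessor_names(ivar_name):
    """Return (getter, setter, parameter) names for an instance variable."""
    base = ivar_name.lstrip("_")  # a leading underscore never reaches the method names
    capitalised = base[0].upper() + base[1:]
    article = "an" if base[0].lower() in "aeiou" else "a"
    return base, "set" + capitalised, article + capitalised


def setter_body(type_name, ivar_name, parameter):
    """Return the statements that make up the body of a setter."""
    if not type_name.endswith("*"):
        return [f"{ivar_name} = {parameter};"]  # plain C type: simple assignment
    verb = "copy" if type_name.rstrip("*") in VALUE_CLASSES else "retain"
    return [f"[{ivar_name} autorelease];", f"{ivar_name} = [{parameter} {verb}];"]


for line in ["int integerThing;", "NSString* _stringThing;", "IBOutlet NSWindow* window;"]:
    type_name, ivar = parse_ivar(line)
    getter, setter, parameter = accessor_names(ivar)
    print(f"- ({type_name}) {getter};")
    print(f"- (void) {setter}:({type_name}){parameter};")
```

Note that this sketch only looks at the first letter for the a/an choice and only handles one declaration per line; a real script would need to cope with more of C's declaration syntax.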
Download the objc-make-accessors
script and throw it into your
“~/Library/Application Support/Apple/Developer
Tools/Scripts” folder. If you don’t have one
yet:
Done. You should now have a Scripts menu in Xcode with a new menu item named “IVars to Accessor Methods”. Have fun.
1 Note that older versions of the Cocoa Coding Guidelines specified that prefixing instance variables with underscores is an Apple-only convention and you should not do this in your own classes. Now the guidelines just don’t mention anything about this issue, but I still dislike it because putting underscores every time you access an instance variable really lowers code readability.
Cocoa Users Group in Sydney
To all the Mac users out there: would you be
interested in a Cocoa Users’ Group in Sydney? If so,
please drop me an email—my address is at the bottom
of the page—and if there’s enough numbers, perhaps we
can organise something. The idea’s to have a local
forum for meetups, random
presentations, mailing lists, and all that sort of
fun stuff.
Oh yeah, and please also let me know your self-described level of expertise: none, novice, intermediate, expert.
(For those who closely track the Cocoa scene in Australia: yep, this is the same call for interest that Duncan Campbell has initiated.)
Static and Dynamic Typing: Fight!
It’s rare that I find a good, balanced article on the (dis)advantages of static vs dynamic typing, mostly because people on each side are too religious (or perhaps just stubborn) to see the benefits of the other. Stevey’s blog rant comparing static vs dynamic typing is one of the most balanced ones that I’ve seen, even if I think half his other blog posts are on crack.
I lean pretty far toward the static typing
end of the spectrum, but I also think that dynamic
typing not only has its uses, but is absolutely
required in some applications. One of my favourite
programming languages is Objective-C, which seems to
be quite unique in its approach: the runtime system
is dynamically typed, but you get a reasonable amount
of static checking at compile-time by using type
annotations on variables. (Survey question: do you
know of any Objective-C programmers who simply use
id types everywhere, rather than the more
specific types such as NSWindow* and
NSArray*? Yeah, I didn’t think so.) Note
that I think Objective-C could do with a more
powerful type system: some sort of parameterised type
system similar in syntax to C++ templates/Java
generics/C# generics would be really useful
just for the purposes of compile-time
checking, even though it’s all dynamically typed at
runtime.
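For comparison, the same split (dynamic at runtime, optional static checks driven by annotations) can be sketched in Python. This is an analogy in a different language rather than a description of Objective-C itself: the function name total_length is made up, and the static checking assumes you run a separate checker such as mypy over the file, since the interpreter itself ignores the annotations.

```python
from typing import List


def total_length(names: List[str]) -> int:
    """Sum the lengths of the given strings."""
    return sum(len(name) for name in names)


# The annotations are not enforced at runtime: anything in the list that
# supports len() works, much like passing arbitrary objects around as id.
print(total_length(["NSWindow", "NSArray"]))

# This call runs fine too, but a static checker would flag it, because the
# elements are not strings.
print(total_length([[1, 2], (3, 4, 5)]))
```

The parameterised List[str] annotation is the kind of compile-time-only information I'd like to be able to attach to an NSArray.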
One common thread in both Stevey’s rant and what I’ve personally experienced is that dynamic typing is the way to go when your program really needs to be extensible: if you have any sort of plugin architecture or long-lived servers with network protocols that evolve (hello Erlang), it’s really a lot more productive to use a dynamic typing system. However, I get annoyed every time I do some coding in Python or Erlang: it seems that 50% of the errors I make are type errors. While I certainly don’t believe that static type systems guarantee that “if it compiles, it works”, it’s foolish to say that they don’t help catch a large class of errors (especially if your type system’s as powerful as Haskell’s or Ocaml’s), and it’s also foolish to say that unit tests are a replacement for a type system.
So, the question I want to ask is: why
are programming languages today so polarised into
either the static or the dynamic camp? The only
languages I know of that strive to accommodate for
the benefits of both are Objective-C, Perl (though
I’d say that writing Perl without use
strict is an exercise in pain, since its only
three types are scalars, arrays and hashes), and
(gasp)
Visual Basic. Programming languages and
programming language research should’ve
looked at integrating static and dynamic typing a
long time ago. C’mon guys, it’s obvious to
me that both approaches have good things to
offer, and I ain’t that smart. I think a big reason
they haven’t is largely for religious reasons,
because people on both sides are too blinded to even
attempt to see each other’s point of view. How many
academic papers have there been that address this
question?
I hope that in five years, we’ll at least have one mainstream programming language that we can write production desktop and server applications in, that offers the benefits of both static and dynamic typing. (Somebody shoot me, now I’m actually agreeing with Erik Meijer.) Perhaps a good start is for the current generation of programmers to actually admit that both approaches have their merit, rather than simply get defensive whenever one system is critiqued. It was proved a long time ago that dynamic typing is simply staged type inference and can be subsumed as part of a good-enough static type system: point to static typing. However, dynamic typing is also essential for distributed programming and extensibility. Point to dynamic typing. Get over it, type zealots.
P.S. Google Fight reckons that dynamic typing beats static typing. C’mon Haskell and C++ guys, unite! You’re on the same side! Down with those Pythonistas and Rubymongers! And, uhh, down with Smalltalk and LISP too, even though they rule! (Now I’m just confusing myself.)
All Hail the DeathStation 9000
While learning about the immaculate DeathStation 9000, I came across the homepage of Armed Response Technologies. That page nearly had me snort coffee out of my nose on my beautiful new 30” LCD monitor. Make very sure you see their bloopers page.
(This is quite possibly the best thing to ever come out of having so much stupidit… uhh, I mean, undefined behaviour, in C.)
The Problem with Threads
If you haven’t had much experience with the wonderful world of multithreading and don’t yet believe that threads are evil1, Edward A. Lee has an excellent essay named “The Problem with Threads”, which challenges you to solve a simple problem: write a thread-safe Observer design pattern in Java. Good luck. (Non-Java users who scoff at Java will often fare even worse, since Java is one of the few languages with some measure of in-built concurrency control primitives—even if those primitives still suck.)
His paper’s one of the best introductory essays I’ve read about the problems with shared state concurrency. (I call it an essay since it really reads a lot more like an essay than a research paper. If you’re afraid of academia and its usual jargon and formal style, don’t be: this paper’s an easy read.) For those who aren’t afraid of a bit of formal theory and maths, he presents a simple, convincing explanation of why multithreading is an inherently complex problem, using the good ol’ explanation of computational interleavings of sets of states.
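To make the Observer challenge a bit more concrete, here's a minimal Python sketch of one common attempt at it. It is not Lee's Java code, and the class and method names are my own. The idea is to guard the listener list with a lock, but call the listeners after releasing it, so that a listener which calls back into the object can't deadlock. As I read the paper, the equivalent Java version still isn't right: if two threads call set_value at about the same time, listeners can be notified in a different order from the order in which the values were stored.

```python
import threading


class ValueHolder:
    """Holds a value and notifies registered listeners when it changes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._listeners = []
        self._value = None

    def add_listener(self, listener):
        with self._lock:
            self._listeners.append(listener)

    def set_value(self, new_value):
        # Update the state and snapshot the listeners while holding the lock...
        with self._lock:
            self._value = new_value
            listeners = list(self._listeners)
        # ...but notify after releasing it, so a listener that calls
        # add_listener() or set_value() does not block on self._lock.
        for listener in listeners:
            listener(new_value)


holder = ValueHolder()
seen = []
holder.add_listener(seen.append)

threads = [threading.Thread(target=holder.set_value, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Each update is delivered once, but the delivery order is not guaranteed to
# match the order in which the values were stored.
print(sorted(seen))
```

The code looks harmless, which is rather the point: the remaining bug is in the interleavings, not in any single line you can point at.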
His essay covers far more than just the problem of inherent complexity, however: Lee then discusses how bad threading actually is in practice, along with some software engineering improvements such as OpenMP, Tony Hoare’s idea of Communicating Sequential Processes2, Software Transactional Memory, and Actor-style languages such as Erlang. Most interestingly, he discusses why programming languages aimed at concurrency, such as Erlang, won’t succeed in the main marketplace.
Of course, how can you refuse to read a paper that has quotes such as these?
- “… a folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs.”
- “I conjecture that most multi-threaded general-purpose applications are, in fact, so full of concurrency bugs that as multi-core architectures become commonplace, these bugs will begin to show up as system failures. This scenario is bleak for computer vendors: their next generation of machines will become widely known as the ones on which many programs crash.”
- “Syntactically, threads are either a minor extension to these languages (as in Java) or just an external library. Semantically, of course, they thoroughly disrupt the essential determinism of the languages. Regrettably, programmers seem to be more guided by syntax than semantics.”
- “… non-trivial multi-threaded programs are incomprehensible to humans. It is true that the programming model can be improved through the use of design patterns, better granularity of atomicity (e.g. transactions), improved languages, and formal methods. However, these techniques merely chip away at the unnecessarily enormous non-determinism of the threading model. The model remains intrinsically intractable.” (Does that “intractable” word remind you of anyone else?)
- “… adherents to… [a programming] language are viewed as traitors if they succumb to the use of another language. Language wars are religious wars, and few of these religions are polytheistic.”
If you’re a programmer and aren’t convinced yet that shared-state concurrency is evil, please, read the paper. Please? Think of the future. Think of your children.
1 Of course, any non-trivial exposure to multithreading automatically implies that you understand they are evil, so the latter part of that expression is somewhat superfluous.
2 Yep, that Tony Hoare—you know, the guy who invented Quicksort?