Categories: Innovation

Test driving Socrata for Open Data

The Queensland Police Service recently published some data about reported offences from July 1997 to June 2012. I was interested to see how Socrata would handle that data. Here are the results.

[Embedded Socrata visualisation of the offence data. Powered by Socrata.]

Note that the graph displays only the first few years’ worth of data. Also note that this experiment was conducted for personal enlightenment purposes only!

Categories: Innovation

Spork to Spin

In my last post, I documented the trials and tribulations of getting spork working properly with RSpec, Cucumber, SimpleCov and Mongoid, and the solution I devised. I also posted my solution to the SimpleCov issues list on GitHub.

Then Christoph Olszowka, the maintainer of SimpleCov, introduced me to spin, a lightweight alternative to spork for fast Rails testing, written by Jesse Storimer. It takes a different approach to DRb-based solutions like spork, which use remote method invocations. Essentially, spin loads up Rails (but does not initialize it) in one process (spin serve), and then you submit RSpec or Test::Unit jobs to it over a unix socket using another process (spin push). This causes spin serve to fork a child process (in much the same way as spork does), which loads up the relevant test framework (RSpec or Test::Unit), initializing Rails along the way. This splitting of loading and initializing Rails mirrors part of the spork solution in my previous post. Another nice thing about spin is that, unlike spork, it doesn’t require you to alter your spec_helper.rb or test_helper.rb file. It just works. It’s also tiny: one small executable Ruby file. Simple solutions appeal to me greatly.
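
To make the mechanism concrete, here is a minimal sketch of the serve/push split. This is my own illustration of the idea only, not spin’s actual source; the socket path and helper names are invented.

require "socket"

SOCKET_PATH = "/tmp/spin_sketch.sock" # made-up path

# Server: load Rails once, then fork a fresh child for each pushed job.
def serve
  require File.expand_path("config/application") # load Rails, but don't initialize it
  File.delete(SOCKET_PATH) if File.exist?(SOCKET_PATH)
  server = UNIXServer.open(SOCKET_PATH)
  loop do
    client = server.accept
    files = client.read.split("\n")
    fork do
      # Only the short-lived child initializes Rails and runs the tests.
      ARGV.replace(files)
      require File.expand_path("config/environment")
      require "rspec/autorun" # runs RSpec over ARGV when the child exits
    end
    Process.wait
    client.close
  end
end

# Client: hand a list of spec files to the server over the socket.
def push(files)
  UNIXSocket.open(SOCKET_PATH) { |socket| socket.write(files.join("\n")) }
end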

However (there’s always a however), it didn’t support Cucumber, and spin serve would load up only one test environment (either Test::Unit or RSpec, but not both). Also, the guard plugin for spin (guard-spin) is in its early days, and as such is a little immature. For instance, the Guardfile template it ships with doesn’t watch config/**/*.rb for changes, so it doesn’t know when to restart the spin serve process. And if you added that watcher in, it still wouldn’t work, because the plugin only expects changes to application and test/spec code (i.e., things that should trigger spin push). I guess you could always force a reload from the guard CLI, but I shouldn’t have to do that. The other thing with guard-spin is that the spin configuration block in the Guardfile needs to watch the relevant files for both RSpec and Test::Unit. And with me about to add Cucumber support, that was going to get long and messy.

So, yesterday, sick at home with the flu (which hasn’t yet let up, dammit!) and a nice fever, I set about modifying spin so that it would support Cucumber. But I wanted it so that one spin serve process would handle Cucumber and RSpec (and Test::Unit, but I don’t use it) at the same time. This is because managing one spin serve process through guard is much easier than managing several. Take a look at the guard-spork runner.rb, which needs to manage sporks and global sporks (?) and whatnot to enable multiple sporks to run at the same time listening on a priori agreed ports for the various testing frameworks.

Changing spin to enable this was fairly simple, because spin itself was already an elegant piece of code, IMHO. The other thing I needed to change in spin was the client side of things. Instead of issuing spin push <path1> <path2> ... to submit jobs to spin serve, I wanted to be able to issue spin rspec, spin cucumber or spin testunit with the paths as arguments (or with no paths at all, in which case spin serve looks in the default locations for specs/features/tests). This way the client selects which test framework to run. You just need to make sure you invoke the server with the right options to load the required test frameworks (the --rspec, --cucumber and --testunit options to spin serve). Interestingly, this all seems to work even when Rails is running in threadsafe mode, which I was not able to achieve with my spork solution. My version of spin is over on GitHub, and I’m going to have a chat with Jesse about whether this is something he might want to pull back into his version of spin.
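
To make the workflow concrete, a session with my fork looks something like this (the file paths are made-up examples):

spin serve --rspec --cucumber           # terminal 1: loads Rails once
spin rspec spec/models/widget_spec.rb   # terminal 2: run one spec file
spin rspec                              # or everything in the default spec locations
spin cucumber features/checkout.feature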

Of course, a simpler solution isn’t worth as much if it doesn’t retain the performance gains of the more complex solution. Recall that I was seeing a five- to six-fold improvement (in terms of user time) with spork over plain rspec. Here is the comparison between plain rspec and spin rspec on the same specs (and with SimpleCov enabled) that I tested my spork solution with:

Plain rspec:

real	0m9.499s
user	0m8.455s
sys 	0m0.990s

Now spin rspec:

real	0m4.915s
user	0m1.528s
sys 	0m0.106s

Awesome! About the same improvement as the DRb solution, but without the various bits of spork jujitsu and spec_helper.rb modification to make it work. Note that to conduct this very simple benchmark I used the --push-results switch with spin serve so that the output was written to STDOUT by the client rather than the server, which is how rspec --drb works.
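
For the record, the timings above came from something like the following (a sketch, assuming the shell’s time builtin, with spin serve --rspec --push-results already running in another terminal for the second case):

time rspec spec
time spin rspec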

That done, the next step was making it all work with guard. I modified guard-spin so that it only handles the spin serve side of things (just like guard-spork handles only starting/restarting/stopping spork), and created the rather ugly-named guard-spin_rspec and guard-spin_cucumber (just like guard-rspec and guard-cucumber). I haven’t done guard-spin_testunit. These guys watch the application and test code directories for changes, and then issue spin rspec|cucumber <path1> <path2> ... as necessary.

Here are the relevant bits of my Gemfile for getting this set up:

gem "spin", :github => "rickyrobinson/spin", :branch => "cucumber"
gem "guard", ">= 0.6.2"
gem "guard-spin", :github => "rickyrobinson/guard-spin", :branch => "cucumber"
gem "guard-spin_rspec", :github => "rickyrobinson/guard-spin_rspec"
gem "guard-spin_cucumber", :github => "rickyrobinson/guard-spin_cucumber"

And, this is the relevant Guardfile config, which you can include automatically with guard init spin|spin_rspec|spin_cucumber:

# Start the spin server with RSpec and Cucumber support, and report time for each run
guard 'spin', :cli => "--time --rspec --cucumber" do
  # Spin itself
  watch('config/application.rb')
  watch('config/environment.rb')
  watch(%r{^config/environments/.*\.rb$})
  watch(%r{^config/initializers/.*\.rb$})
  watch('Gemfile')
  watch('Gemfile.lock')
end

guard 'spin_rspec' do
  # RSpec
  # uses the .rspec file
  # --colour --fail-fast --format documentation --tag ~slow
  watch(%r{^spec/(.+)_spec\.rb$})
  watch('spec/spec_helper.rb')                        { "spec" }
  watch(%r{^app/(.+)\.rb$})                           { |m| "spec/#{m[1]}_spec.rb" }
  watch(%r{^app/(.+)\.haml$})                         { |m| "spec/#{m[1]}.haml_spec.rb" }
  watch(%r{^lib/(.+)\.rb$})                           { |m| "spec/lib/#{m[1]}_spec.rb" }
  watch(%r{^app/controllers/(.+)_(controller)\.rb$})  { |m| ["spec/routing/#{m[1]}_routing_spec.rb", "spec/#{m[2]}s/#{m[1]}_#{m[2]}_spec.rb", "spec/requests/#{m[1]}_spec.rb"] }
end

guard 'spin_cucumber' do
  # Cucumber
  watch(%r{^features/(.+)\.feature$})
  watch(%r{^features/support/.+$})          { 'features' }
  watch(%r{^features/step_definitions/(.+)_steps\.rb$}) { |m| Dir[File.join("**/#{m[1]}.feature")][0] || 'features' }
end

There are still a number of things that could be improved, so please go ahead and fork any of this stuff on GitHub. For instance, the client might try to push jobs to the server before the server has started up properly. This might be fixed by enabling either the client or the server to create the socket. Also, it would be nice if guard-spin_[rspec|cucumber] worked a bit more like guard-[rspec|cucumber], which are a little smarter about which specs/features to run when something changes. Ideally, it would be nice to modify cucumber and rspec so that they know about spin. Then we could just pass --spin instead of --drb and we could ditch guard-spin_[rspec|cucumber]. It might also be better if the spin client determined whether results should be pushed back or not (i.e., the --push-results switch should be given to spin rspec|cucumber|testunit instead of spin serve). Anyhow, please give it a try.

Now to medicate myself.

Categories: Innovation

When spork puts a fork in your cucumber and a spanner in your specs

TL;DR: Getting Rails, RSpec, Cucumber and SimpleCov to play nicely with spork is a pain. However, it is possible to get them all working together. Ensure that config.cache_classes = true and that Rails threadsafe mode (config.threadsafe!) is not enabled in the test environment, and then see my spec_helper.rb file below.

So, I’m hacking again. How I have missed it. It has been a long time since I’ve had a good solid run of a week to write some code, largely, but not completely, uninterrupted by things like grant writing and meetings. It’s been a lot of fun. But I did lose a day this week due to a not so fun problem in my testing setup…

With a colleague, I’m working on a Rails app that will serve as a platform for some research we want to do along with the boss, and will hopefully be useful in its own right, thereby enabling us to make lots of moolah, build an interplanetary spacecraft, and retire to a little ecodome on Mars. But we sort of just plunged in. No specs, no integration tests. Naughty. It was getting to a point where it was a pain in the butt to test things by opening a browser and refreshing web pages. It’s not always easy to track down the source of a problem when loading web pages manually is your only recourse. And it’s a damn slow way of operating. And since we hadn’t written our specs and features up front, it means I wasn’t being very disciplined in what code I was writing:

“Look, I just wrote a couple of hundred lines of code!”

“Great! Was it all necessary? What feature does it implement?”

“Oh.”

This may be a slight exaggeration of a situation that may or may not have actually occurred. The point is, it was time (well past time, actually) to say hello to my friends RSpec, Cucumber, Machinist and SimpleCov. Getting an initial set of specs and features written was easy enough. But then, because the specs and features were taking longer to run than I thought they should, I tried to improve the runtime of the tests with spork. Things went a bit belly up at this point.

The main symptom was getting “NameError: uninitialized constant” all over the place. My tests couldn’t find my models or controllers. I tried moving code from spork’s prefork block to the each_run block. No cigar. I tried explicitly requiring libraries in each_run. I added the app directory of the Rails project to the autoload path since I suspected the eager_load! hacks that spork-rails introduces might have been playing silly buggers with Mongoid. Nothing. And I tried requiring everything under the app directory explicitly in the each_run block, which got me part of the way, but bombed out on Devise-related stuff. I googled. There was lots to read. But mostly, it was about how to stop spork from preloading all your models when you’re using Mongoid in your persistence layer. At this point I’d have been happy if my models were being preloaded! I did find one trick that fixed my specs but made Cucumber complain: set config.cache_classes to false in test.rb.

Then I remembered having switched on config.threadsafe! in application.rb a week or two earlier to solve a problem with EventMachine, Thin and Cramp (yes, the Cramp stuff will be served from a different web server than the Rails stuff shortly, but let’s get back to the problem at hand, shall we?), and I wondered whether this was the source of my current problems. I took a look at what threadsafe mode actually does:

# Enable threaded mode. Allows concurrent requests to controller actions and
# multiple database connections. Also disables automatic dependency loading
# after boot, and disables reloading code on every request, as these are
# fundamentally incompatible with thread safety.
def threadsafe!
  @preload_frameworks = true
  @cache_classes = true
  @dependency_loading = false
  @allow_concurrency = true
  self
end

Hmm… That looked like a possible cause, since spork was probably relying on reloading code on each run. So, I shifted the call to config.threadsafe! from application.rb to production.rb and development.rb, so that threadsafe mode would be on for production and development environments but not for my test environment. Et voilà! RSpec was happy. Cucumber was (almost) happy. Spork was happy. And now that I’d figured out what to Google, it turns out others have been onto this issue, too, though nobody seems to have delved into it very deeply as far as I can tell.
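
In code form, the change boils down to something like this (a sketch; Statusdash is our app):

# Removed from config/application.rb; added to config/environments/production.rb
# and config/environments/development.rb:
Statusdash::Application.configure do
  config.threadsafe! # threadsafe mode everywhere except the test environment
end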

There was one more problem with my Cucumber features, though. It was complaining about a Devise helper that it didn’t seem to know about. But someone had been there before in the RSpec context. To fix the problem, I added the following code to the bottom of the spork prefork block in my Cucumber env.rb:

  # Reopen Devise's controller to mix in the helpers Cucumber couldn't find
  class DeviseController
    include DeviseHelper
    include ActionView::Helpers::TagHelper
    helper_method :devise_error_messages!
    helper_method :content_tag
  end

I thought my problems were all solved. But alas, I had forgotten to check whether SimpleCov was still working as expected. Nope. There are well-known issues when SimpleCov is used with spork. The most glaring one is that SimpleCov just doesn’t get run under spork. Starting SimpleCov in the each_run block, as suggested by a participant in a discussion about the issue on GitHub, at least enabled SimpleCov to run. However, I was getting vastly different coverage reports depending on whether I was running under spork or not. That wasn’t cool. SimpleCov needs to be started prior to loading any code that you want coverage reports for, but config/initializers/*.rb are run in prefork, before SimpleCov started (since I was now starting SimpleCov in each_run), and one of my initializers was executing custom code I had put in the lib directory of our Rails project. So, what to do?

The fix wasn’t too difficult to figure out. Instead of requiring environment.rb in the prefork block, I required application.rb. This meant that I could load the app without running the initializers. I could then call the initialize! method on my application explicitly in the each_run block after starting SimpleCov. A problem with this is that after the each_run block exits, spork calls initialize! again, blowing things up. But if I didn’t call initialize! explicitly in each_run, I ran into the original NameError problem. So, I just stubbed out the initialize! method after calling it. Ugly, but it works. Here’s the entire spec_helper.rb file, complete with some Spork.trap_method tricks I pinched from a blog article to prevent Mongoid from preloading the models:

require 'spork'
# Uncomment the following line to use spork with the debugger:
# require 'spork/ext/ruby-debug'

Spork.prefork do
  ENV["RAILS_ENV"] ||= 'test'
  unless ENV['DRB']
    require 'simplecov'
    SimpleCov.start 'rails'
  end

  require 'rails/application'

  # See https://github.com/sporkrb/spork/wiki/Spork.trap_method-Jujutsu
  Spork.trap_method(Rails::Application, :reload_routes!)
  Spork.trap_method(Rails::Application::RoutesReloader, :reload!)

  require 'rails/mongoid'
  Spork.trap_class_method(Rails::Mongoid, :load_models)

  # Prevent the main application from eager loading in the prefork block (do not load files in autoload_paths)
  Spork.trap_method(Rails::Application, :eager_load!)

  require Rails.root.join("config/application")

  # Load all railties files
  Rails.application.railties.all { |r| r.eager_load! }

  require 'rspec/rails'
  require 'email_spec'
  require 'rspec/autorun'

  RSpec.configure do |config|
    config.include(EmailSpec::Helpers)
    config.include(EmailSpec::Matchers)

    config.infer_base_class_for_anonymous_controllers = true

    # Clean up the database
    require 'database_cleaner'
    config.before(:suite) do
      DatabaseCleaner.strategy = :truncation
      DatabaseCleaner.orm = "mongoid"
    end

    config.before(:each) do
      DatabaseCleaner.clean
    end

  end
  ActiveSupport::DescendantsTracker.clear
  ActiveSupport::Dependencies.clear
end

Spork.each_run do
  if ENV['DRB']
    require 'simplecov'
    SimpleCov.start 'rails'
    Statusdash::Application.initialize!
    class Statusdash::Application
      def initialize!; end
    end
  end

  # Requires supporting ruby files with custom matchers and macros, etc,
  # in spec/support/ and its subdirectories.
  Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f}
end

Even after this, rspec spec and rspec --drb spec were giving slightly different results, but I was very close. It seemed one of the files from the aforementioned code in the lib directory was still being preloaded. Why? Well, I intend to eventually split the code in the lib directory out into its own repo/gem, and so I had structured it as a gem. In preparation for this eventuality, instead of loading it by adding it to the Rails autoload_paths in application.rb, I added it to the Gemfile using the :path option, which meant that the “root” file in the gem was being loaded by bundler in boot.rb. This meant, of course, it was loading in the prefork block. My fix for this was to add :require => false in the Gemfile for that “gem”, and then require it explicitly in the initializer file that used it (which was now not being run until after forking, due to the above hacks in spec_helper.rb).
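
As a sketch, the Gemfile entry and initializer ended up along these lines (the gem and file names here are invented for illustration):

# Gemfile: stop bundler from requiring the local "gem" at boot time
gem "my_custom_lib", :path => "lib/my_custom_lib", :require => false

# config/initializers/my_custom_lib.rb: require it here instead, which now
# happens after the fork, and after SimpleCov has started
require "my_custom_lib"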

Bingo! SimpleCov was now giving exactly the same coverage results for my specs with and without spork.

Finally, I have RSpec, Cucumber and SimpleCov all running nicely (and speedily) under spork. For what it’s worth, I’m seeing a five to six times speed up in user time for my specs with spork compared to no spork, and I have them automatically run with guard, which along with mongod and some other background processes is in turn managed by foreman. Was it worth all the hassle? It won’t get me my ecodome on Mars, but it was an interesting challenge, my tests run faster, and hopefully this post will help someone else.
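
For the curious, the foreman side of that is just a Procfile along these lines (the process list and flags are illustrative):

guard: bundle exec guard
mongod: mongod --dbpath ./db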

Note: this post was updated on 24 July, 2012 to correct the claim of a seven to eight times speed up to a five to six times speed up in terms of user time. Roughly 8.5 seconds compared to 1.5 seconds on a small suite of specs with SimpleCov enabled.

Categories: Innovation

ICT: An industry for all!

Today I joined Group X to speak to year 10 students at St John’s Anglican College in Forest Lake about why there’s never been a better time to launch a career in tech. This post first appeared on the Group X blog, and it’s reposted here with their kind permission.

Dr Ricky Robinson

One of our favourite activities here at Group X is heading out and visiting students to spruik our favourite subject, ICT. This week we visited St John’s Anglican College on the south side of Brisbane for their careers day.

Joining Group X and the I Choose Technology project to talk to students was Dr Ricky Robinson, a researcher at National ICT Australia’s (NICTA) Queensland Research Laboratory (QRL).

Responsible for the development of software prototypes, the author of numerous academic papers, and a veteran of a number of software engineering roles, Dr Robinson said technology, and in particular software, is reinventing the world.

“If you look at any industry, from mining to health to construction to tourism, computer technology is the pivotal reason for advancing and improving efficiencies in each and every one of those industries,” Dr Robinson said. “What other field can you say that about?”

This is great news for ICT professionals, and those aspiring to work in the industry, as it means software and electrical engineers will be in high demand for a long time to come.

As for how to inspire students about ICT, Dr Robinson believes it is a field where they need to be encouraged to play.

“We all learn better by doing, and of course, learning from our mistakes.

“ICT is about inventing and innovating through design, creative and logical thinking,” Dr Robinson explained.

When asked what inspired him to get into ICT, Dr Robinson said it’s the feeling that technology has the ability to reinvent entire industries.

“Take book retail for example,” he said. “Amazon has been able to completely redefine what it is to be a bookseller because it looked at the business of selling books through the lens of technology.

“In fact, it’s arguably a software company that happens to sell books.”

The research side of ICT, however, attracted Dr Robinson because of the many challenging problems facing the twenty-first century that, if solved, would make the world a better place.

“NICTA, for example, is working on technologies for greatly expediting the discovery of geothermal energy sources, enabling people with certain kinds of blindness to see again, and improving emergency communication networks in disaster response scenarios,” Dr Robinson explained.

Now that is inspiring.

Categories: Eco-philo-pol

Of Thanksgiving Turkeys and Black Swans

A couple of months ago I finished reading The Black Swan (TBS) by Nassim Nicholas Taleb. I suspect I’ll read it again sometime.

In a nutshell, TBS is about (un)predictability, uncertainty and knowledge. Karen and the kids bought me the second edition of TBS for Fathers’ Day. It’s the one with a lengthy postscript essay, which I thought was arguably the best part of the book. I was happy to read in the postscript (p. 333) that the author appreciates my rather slow reading of his book.

Uncertainty, TBS explains, is predominantly an epistemic problem, one that is subjective and one that the social sciences ought not model with conventional Gaussian methods. The propensity for Nobel prize-winning economists to wield bell curves is the target of much of Taleb’s disdain. Black Swans are those rare, unpredictable events that mathematics has no business in attempting to predict (i.e., because they’re unpredictable. Duh!).

Taleb contends that the concepts of probability and randomness as they are taught in universities by bow-tie-wearing academics, and used by all manner of practitioners, are wholly unsuitable for application in most non-physical domains, like economics, policy and risk management. These are typically domains dominated by, often cumulative, human action. Sometimes, Taleb explains, these systems can be more appropriately modelled with power laws or fractal mathematics, which can render Black Swans grey; but these models are not intended to provide the concreteness of the more commonplace methods with which we’re familiar. More often, these systems ought not be modelled at all, particularly not with sophisticated mathematics or equations taken from physics textbooks, as they are Black Swan prone and impervious to these approaches.
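
To put rough numbers on the Gaussian versus power-law contrast (my gloss, not Taleb’s): a Gaussian tail decays like P(X > x) ∝ e^(−x²/2), which makes a 10-sigma event all but impossible, while a power-law tail decays like P(X > x) ∝ x^(−α) for some small exponent α, so doubling the size of a deviation only cuts its probability by a factor of 2^α. Under the former, monster events are ruled out for all practical purposes; under the latter, they are merely uncommon, which is what renders the swan grey rather than black.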

With uncertainty’s epistemic roots, Taleb spends some time discussing some important aspects of knowledge. Knowledge is biased both in terms of its distribution and its verification. Consider the Thanksgiving turkey: it is fed day after day, given a place to roost (is that what turkeys do?) and generally cared for, until one day, chop! The turkey couldn’t have suspected this was coming. It’s an event lying totally outside its experience. A Black Swan. The butcher, on the other hand, knew what was coming all along. Not a Black Swan. There is an imbalance of knowledge here, highlighting the subjective nature of uncertainty. We don’t need to look far to find examples of this kind of uncertainty, and massively consequential historical events that illustrate the disproportionate impact that Black Swans have.

Knowledge is also governed by the confirmation bias: no matter how many pieces of evidence are collected in support of some theory or idea, only one piece of negative evidence is required to refute it. This is the basis for the Popperian notion of falsification, which is itself fundamental to the way science proceeds.

A related idea is that of silent evidence, which Taleb illustrates with what he cheekily terms the “Casanova problem”. This reflects the observation that we only remember confirmatory instances, the successes, and rarely the failures. Just think about startup companies. Look at Company A. They’re so successful because they did X, Y and Z, so we should do the same. Of course, there may be a graveyard full of companies that did X, Y and Z, too. The silent evidence. Likewise, Casanova didn’t live to tell his tale because he was particularly clever or immunised against misfortune; rather, probability tells us that a small number of playboy types from that era would survive their ordeals and thus feel indestructible, and perhaps go on to write a book or two about their experiences. But we don’t hear about those other Casanovas, who weren’t quite so lucky. This problem tends to make us blind to the real course of history.

Taleb moves on to describe how social systems are currently modelled by social scientists, and it is here that he is especially scathing. Economics, particularly academic economics, is full of phonies, says Taleb. Run from anyone who tells you that Brownian motion or Heisenberg’s uncertainty principle can model human behaviour. Or, if you’re not the running type, put a mouse down the back of their shirt when they least expect it. These things don’t model true randomness or uncertainty; they model a very tame version of it. This is, he says, evidenced by the fact that our coffee cups don’t jump off our coffee tables. Yet, the equivalent of jumping coffee cups happens with relatively high frequency in social systems (e.g., stock market crashes).

Rather than find false safety in econometrics and other phony methods, writes Taleb, we should heed the advice of that intuitive economic philosopher, Friedrich Hayek. In Hayek’s view, it is impossible for a central planner to aggregate all the pieces of data required to make a meaningful forecast of the economy and to plan a priori. Rather, the interactions between the individual agents in the system, who each hold knowledge, often tacit knowledge, of their own, result in a coherent, self-organised system — what we might call society (though Lady Thatcher mightn’t call it such). One way of looking at this idea is that locals can integrate local knowledge in a way that a central planner never could. The difficulty in central planning has been met by economists with increasingly “scientific” methods, but this creeping scientism, as it was called by Hayek, is just making matters worse, according to Taleb. It is the scandal of prediction. Medical empiricism (evidence-based or clinical medicine) is perhaps the field to which economists should look for inspiration, rather than to physics. Physics, funnily enough, is for the physical world, where its methods and models apply, and where the Gaussian and related distributions are observable in reality. But its models are often inappropriate for the social world.

TBS presents an idea born of a rich body of existing literature, but perhaps nobody in the relevant fields has articulated their ideas as colourfully and passionately as Taleb. I will say that while his narrative is colourful, and while it’s generally comprehensible by the amateur reader (like me), I did find his rambling style a bit hard to digest at times; the book doesn’t flow as well as it could have. Taleb can also be rather self-indulgent at times. Nevertheless, this is one of those books that any thinking person should get a hold of and read. Gift it to someone as a late Christmas present. In fact, my dad scored a copy of it today (New Year’s Day, 2011) for “Christmas”, as my parents just arrived in Brisbane from Cairns. I’ve lent my own copy out to someone, and I hope she remembers to bring it with her next time she travels to Brisbane so that I might lend it to another interested reader.

Categories: Eco-philo-pol

The Australian and the new Battle of Jericho

When the Israelites crossed the River Jordan into the land of Canaan, they came upon the city of Jericho. God spoke to the leader of the Israelites, Joshua, saying he and seven priests should walk around the city once a day with the Ark of the Covenant, until the seventh day, at which time they were to march around the city seven times and then sound their ram’s horns. This Joshua and the priests did. The walls of Jericho collapsed, being particularly susceptible to bad music, enabling the Israelites to storm into the city, slaying every man, woman and child (barring the Canaanite traitor Rahab and her family, who had provided Joshua’s spies with shelter, and possibly other services). Once Joshua ensured Jericho was completely burned to the ground, he declared that anyone who attempted to rebuild the city would pay the price of their firstborn son (at the time, it would seem, firstborn sons were routinely the subject of honour killings to “right” the apparent wrongs committed by their daddies, or were sacrificed to prove their daddies’ faith or allegiance to something, like a god for instance). A rather gruesome Biblical tale, but thankfully one that carbon dating and other methods have shown to be completely fictitious.

Fast forward a few thousand years to the present day, where a not altogether dissimilar battle is taking place. Sure, it doesn’t involve child sacrifices or indiscriminate killing. At least not yet. But bad music has been aired, walls have fallen down, people have been burned, and some sections of the Australian twittering class consider that threats have been made, if not against their firstborns, then against their “rights”.

For those few who don’t know, Grog’s Gamut is a blog written by a public servant, which came to prominence during the 2010 federal election campaign. The author of the blog remained anonymous until recently. He strongly criticised the news media’s coverage of the campaign, as many others were doing, but also made some suggestions for how to improve the coverage. The ABC decided to heed Grog’s advice, leading some sections of the media to conclude that discovering and revealing the true identity of the public servant behind Grog’s Gamut was in the public interest. On Monday, September 27, 2010, James Massola, the Joshua of our little story, wrote in The Australian that the public servant behind Grog’s Gamut was one Greg Jericho. So began the new Battle of Jericho, otherwise known as Groggate.

With the ram’s horns blown, and the Grog’s Gamut persona crumbling around him, Jericho has been exposed. Defending him and the right to anonymity are the Twitterati, armed with their virtual vuvuzelas loaded with 140 character bursts of noise, which make a worse racket than ram’s horns.

What’s got everyone so hot under the collar? Why shouldn’t Jericho have been unmasked? It’s an interesting question, but as we shall see, not the most important one.

The title of Massola’s (controversial) article is “Controversial political blogger unmasked as a federal public servant”. Besides the fact that we already knew beforehand that the Grog’s Gamut blog was written by a public servant, one must ask, what makes Jericho controversial, and in whose eyes is there a controversy? There is little evidence, as far as I can see, that anyone other than The Australian found any controversy whatever in the fact that Grog’s Gamut was written by a public servant. In fact, one is hard pressed to find evidence that The Australian itself considered Grog’s Gamut to be controversial. The only two references to Grog’s Gamut in The Australian I can find prior to September 27 are an article, written by Amanda Meade, detailing the events leading to the ABC’s change in the way it covered the 2010 election, and another written by Massola pointing out the increasing relevance of Twitter and blogs. Both articles paint Grog’s Gamut in a neutral-tending-towards-positive light. Not a hint of controversy anywhere. Tellingly, Massola writes of blogs and tweets:

And as Grog’s post shows they are increasingly relevant, whatever the identity of the poster.

(Emphasis mine.) This glaring absence of any prior mention of controversy in relation to Grog’s Gamut hints at mischief on the part of James Massola and The Australian. This post hoc rationalisation of the decision to out Jericho on the basis of public interest is a cloak woven of the finest sanctimony, designed to obscure the newspaper’s real reason (if one can call it a reason) for revealing Jericho’s name: opportunism. There are at least two casualties of this decision, and they are Greg Jericho and journalism.

If it wasn’t The Australian who initially labelled the Grog’s Gamut blog controversial, was it the public? Jericho made no attempt to hide the fact that he was a public servant. Yet, I do not recall any public outcry in regard to his employment in that role when his blog gained a little bit of fame. Further, when Massola blew Grog’s cover, nothing of consequence occurred insofar as the original story was concerned: that the author of the Grog’s Gamut blog was a public servant by the name of Greg Jericho. On the contrary, the great bulk of discussion was and still is whether The Australian newspaper had done the right thing by outing Jericho. Thus, regardless of whether you’re on the side that says Jericho was fair game or the side that says his desire to remain anonymous ought to have been respected, the fact of the matter is that The Australian created the news rather than reported it (for, as we will see, the outing of Jericho was inconsequential, except, perhaps, to Jericho himself, who may now be considering some significant life changes).

Consider that, like Joshua and his priests who marched around the ancient city for six days before striking on the seventh day, Massola and The Australian had known Grog’s true identity for months prior to publishing it. Why?

Consider also that the consequence (or lack thereof) of Massola’s story would have been the same if Grog had turned out to be, not Greg Jericho, but a public servant by the name of Bill Bloggs or Jane Jones. There was nothing to gain in putting a real name to an anonymous blogger in this case, unless the blogger turned out to be Kevin Rudd, or someone similar. Then you’ve got a news story. It seems, therefore, that Massola and the self-ordained high priests of the Church of the Public Interest in fact acted in their own interest.

So, while the issue of anonymity on the interwebs is an interesting one, arguably the more serious issue is whether our major news outlets are able to recognise what constitutes news and what does not, and importantly, whether in reporting a non-newsworthy item, they inadvertently or purposefully become the news story.

Although the main subject of this article is not the issue of anonymity, let’s examine it briefly. Unlike the view expressed by Annabel Crabb, that in her ideal world disclosure of identity would be a rebuttable presumption (that seems like a dubious use of the term, but we know what she’s getting at), in my perfect world people would play the ball and not the person. That is, it would not matter who is saying something; what matters is what is said. Anonymity is one of the cornerstones of peer review in many fields of science, for example, and some widely read news publications such as The Economist still observe the practice of publishing without by-lines. A strong argument does not resort to ad hominem attacks. Paul Graham’s article on “How to disagree” is an excellent resource to point your friends and enemies to, should you want to suggest to them that their argumentative skills are in need of some improvement. Note, however, that the sort of pseudonymity employed by Grog’s Gamut does not prevent ad hominem attacks. Although it prevented ad hominem arguments against Greg Jericho, whilst this pseudonymity lasted anyway, it did not prevent ad hominem attacks against Grog’s Gamut, an online identity built up over the lifetime of that blog. For instance, it’s still possible to level attacks of the “well, Grog would say that, because based on his/her previous posts he’s/she’s a raging lefty” kind. (Aside: It’s interesting to note how many of the attacks against The Economist focus on its practice of writer anonymity, rather than on its content. Take this quote from American author Michael Lewis, for example: The magazine [sic] is written by young people pretending to be old people. If American readers got a look at the pimply complexions of their economic gurus, they would cancel their subscriptions in droves.)

We may identify, then, several distinct reasons a person may want to unmask an anonymous blogger:

  1. I just want to know who it is, dammit!
  2. I want to find out in case the blogger has a (real) conflict of interest, in which case I will report it.
  3. I want to find out who it is so I can demonstrate my unmatched investigative skills or the scale of my professional network, thereby drawing attention to myself.
  4. I want to know so I can launch into an ad hominem attack on them, or cause them some other sort of grief.

Annabel Crabb’s desire would seem to fall into the first category. The Australian’s and Massola’s stated reasons are in the second, though, as I have argued, they probably align with the third. Clearly, the hullabaloo on Twitter shows that some people would argue their reasons are encroaching upon the fourth. I have not seen enough evidence to support the claim that The Australian’s motivations fall into the fourth category.

Nevertheless, many are choosing to interpret The Australian’s actions as a threat against bloggers, and anonymous ones critical of The Australian in particular: “look what we did to Jericho; watch it, or we’ll take your firstborn.”

Will I continue to read The Australian? Of course, because some sections of the paper are worth reading. I don’t always agree with the editorials, but many of them take a considered and principled view in my opinion (probably the ones written by Paul Kelly; I don’t know for sure, because I don’t really care about the by-lines). It’s not all bad (despite what some people might think). And, in any case, The Australian happened to be in a position to break the “story”, but there’s nothing to say that another news outlet wouldn’t have broken the story if they were in possession of the same information. As mentioned above, the ABC’s Annabel Crabb seems to think that what The Australian did was kosher, so presumably she would have reported Grog’s true identity if she had known it. Would I recommend The Australian to others as a credible primary news source? Not on current form, and certainly not in isolation. However, I’m seriously contemplating Nassim Nicholas Taleb’s advice of just not reading the news, period. I’ll let you know how that one goes.

Note: This post is late. Very late. Hopefully it’s still relevant to somebody.

Categories: Innovation

No startup culture in Australia

Occasionally I go back and read some of Paul Graham’s past essays. I find them to be a source of enlightenment, mostly on issues surrounding startups. Some gems are consigned to the footnotes:

There are two very different types of startup: one kind that evolves naturally, and one kind that’s called into being to “commercialize” a scientific discovery. Most computer/software startups are now the first type, and most pharmaceutical startups the second. When I talk about startups in this essay, I mean type I startups. There is no difficulty making type II startups spread: all you have to do is fund medical research labs; commercializing whatever new discoveries the boffins throw off is as straightforward as building a new airport. Type II startups neither require nor produce startup culture. But that means having type II startups won’t get you type I startups. Philadelphia is a case in point: lots of type II startups, but hardly any type I.

Incidentally, Google may appear to be an instance of a type II startup, but it wasn’t. Google is not pagerank commercialized. They could have used another algorithm and everything would have turned out the same. What made Google Google is that they cared about doing search well at a critical point in the evolution of the web.

In this footnote alone there is a sizeable nugget of wisdom for any government or other innovation funding body outside of the Valley that cares to listen. Whether you see it as a good thing or a bad thing, it’s clear there is no startup culture in this country. I’d guess that a disproportionate number of ventures in Australia fall into Graham’s type II category: commercialising the results of academic research with no startup culture required and none produced. Notwithstanding the regulatory risk that often accompanies startups formed around a scientific breakthrough (think biotech and pharmaceuticals), the VCs that fund these sorts of ventures would typically shoulder less financial risk than their type I-loving Valley counterparts; there’s a surer trajectory for type II ventures because there are fewer unknowns. Another way of saying this is that series A funding for type II ventures (probably the most common kind of startup in Australia) is more like a series B or C round in the Valley.

The last part of the footnote above is perhaps most important. I hope that governments here don’t think that by allocating tax-payer funded block grants to pseudo-commercial technology “incubators” with an academic bent that a Google will pop out the other end. It could happen, but not by design. What these “investments” are more likely to produce is a steady trickle of good science resulting in the occasional type II startup. If that’s what’s intended, it’s all good, but let’s be clear about it! The creation of a Google by this means would be due more to luck than careful planning, and our current funding models certainly won’t trigger a self-sustaining chain reaction of startups.

Categories: Innovation

RSpec: verifying model instance creation

UPDATE: I think this post may be a complete waste of time. Just stub out the valid? method on your model to return true or false depending upon what you’re testing. See Ryan Bates’ RailsCast on how he tests controllers. I’m a freaking idiot sometimes.

As a good little rspeccer, I try hard to write my specs to verify behaviour rather than any particular implementation of that behaviour, and, for the moment at least, I’m in the “isolate your controllers from the models” camp. If you’re not in that camp (i.e., you don’t mock and prefer to do functional testing alone), this post probably won’t interest you. One case I often had problems with was model instance creation. There are just so many darn ways to create a new model instance! Here are a few examples:

@order_item = OrderItem.new(:hi => "hem", :ho => "hum")
@order_item.save!

@order_item = order.order_items.create(:hi => "hem", :ho => "hum")

@order_item = OrderItem.some_custom_creation_method(:hi => "hem", :ho => "hum")

The Problem

When you’re writing your spec (up front, of course!), you don’t want to presume too much about how the implementation will unfold. So, do you stub the create method on the model class? But what if we implement using the new/save combo (as above)? Or, what if we create the model instance through an association?
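
For instance, the obvious first attempt pins the spec to one particular implementation (illustrative RSpec 1 syntax):

# Passes only if the controller happens to call OrderItem.create; breaks the
# moment the implementation switches to new/save or an association.
OrderItem.stub!(:create).and_return(@order_item)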

My Solution

My first-pass solution to this problem is the following, based on Matthew Heidemann’s association stubbing technique:

module Spec
  module Mocks
    module Methods
      # Stub every route to creating an instance of klass through the given
      # association, so that all of them funnel through save.
      def stub_creators!(association_name, klass, stubs = {}, valid = true)
        target_mock = Spec::Mocks::Mock.new(klass, {:save => valid, :valid? => valid}.merge!(stubs))
        # save! delegates to save, raising if the instance is invalid
        target_mock.stub!(:save!).and_return do
          target_mock.save
          valid || raise(ActiveRecord::RecordNotSaved)
        end
        # Class-level creation routes
        klass.stub!(:new).and_return(target_mock)
        klass.stub!(:create).and_return do
          target_mock.save
          target_mock
        end
        klass.stub!(:create!).and_return do
          target_mock.save!
          target_mock
        end
        # Association-level creation routes
        mock_association = Spec::Mocks::Mock.new(association_name.to_s)
        mock_association.stub!(:create).and_return do
          target_mock.save
          target_mock
        end
        mock_association.stub!(:create!).and_return do
          target_mock.save!
          target_mock
        end
        mock_association.stub!(:build).and_return(target_mock)
        self.stub!(association_name).and_return(mock_association)
        target_mock
      end
    end
  end
end

What this is doing

The thinking here is that, when I write my spec, I don’t want to be concerned with whether the implementation takes the create, new/save, build/save, or other route. In my spec I just want to know that at some point the controller asked for a model instance to be created and saved. The above code, which I put in spec_helper.rb, allows my specs to do just that. Essentially, I stub save so that it returns true or false, depending upon the optional valid parameter, and the other creation stubs derive from it: save! on the instance, create and create! on the model class, and create and create! on the association itself. I also stub out new and build for convenience. This code ensures that save is always called, even though we’ve stubbed out create and friends. Now if I need to check that a controller action has caused a model instance to be created, I need only ever check that save is called.

An example

In my specs I call stub_creators! on an instance of the association owner (in our example, the association owner is Order), passing it the name of the association I want to stub (order_items), the model class of the association target (OrderItem), optional stubs for instances of the association target, and whether or not we want the returned model instance to be valid (defaults to true). With this in place, I can do this:

describe OrdersController do
  before(:each) do
    @current_user = mock_model(User, :login => "me", :logged_in? => true)
    @order = mock_model(Order)
    @order_item = @order.stub_creators!(:order_items, OrderItem)
  end

  it "should create a new order_item" do
    @order_item.should_receive(:save).and_return(true)
    post 'create'
  end
end

And it doesn’t matter which route the implementation takes to create that model instance. As long as save is called at some point, I know the controller has triggered the creation of the instance somehow.

Thoughts

Now, while this seems to work for me, I don’t really know whether this is kosher. Is it a sensible approach to take? I haven’t tested this extensively; as I said, it’s a first pass. Also, there’s bound to be stuff missing from my solution (for example, it doesn’t handle find_or_create_by_). Can a similar approach be taken to the various ways to delete an object, too? I shall continue to experiment.
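
On that last question, I imagine a first cut would mirror the same trick, funnelling the various deletion routes through destroy. The following is completely untested and uses the same RSpec 1 idioms as above, so treat it as a sketch only:

module Spec
  module Mocks
    module Methods
      def stub_destroyers!(association_name, klass, stubs = {})
        target_mock = Spec::Mocks::Mock.new(klass, {:destroy => true}.merge!(stubs))
        # Class-level deletion route
        klass.stub!(:destroy).and_return do
          target_mock.destroy
          target_mock
        end
        # Association-level deletion route
        mock_association = Spec::Mocks::Mock.new(association_name.to_s)
        mock_association.stub!(:destroy).and_return do
          target_mock.destroy
          target_mock
        end
        self.stub!(association_name).and_return(mock_association)
        target_mock
      end
    end
  end
end

A spec would then assert only that destroy is called, no matter which route the implementation takes.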

Categories: Random observations

Installing nokogiri on Mac OS X

A quick search reveals that I’m not the only one who’s had difficulty installing the nokogiri Ruby gem on Mac OS X. The official docs recommend installing the fink or macports versions of libxml2, and so does this nokogiri tutorial over on the Engine Yard blog. I like macports. It’s a good way to stay up to date with the latest and greatest versions of everything, but I have this thing about trying to make things work with the libraries that come as standard on Mac OS X. I don’t know, maybe it’s that it reduces dependencies, or maybe I’m just strange.

Anyway, here’s how I got nokogiri to install under Snow Leopard without resorting to macports or fink:

sudo gem install nokogiri -- --with-xml2-include=/usr/include/libxml2 --with-xml2-lib=/usr/lib --with-xslt-dir=/usr

What’s weird is that, unless I’m mistaken, those paths are exactly where nokogiri should be looking for the relevant libxml2 files in the first place! I’m still to find out whether it all works as it’s supposed to. But installation is the first step! Let me know if it works for you.
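
A quick sanity check (assuming the gem built against the system libxml2 as intended) is to load it and parse something trivial:

ruby -e 'require "nokogiri"; puts Nokogiri::VERSION; puts Nokogiri::HTML("<p>hi</p>").at("p").text'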

Categories: My family and me

Introducing Claire

Our daughter, Claire Elise, was born on October 29, 2009. Here’s a photo of her, and another with her proud family.

Claire Elise Robinson

Big brother meets baby sister