Categories
Random observations

Rediscovering closures and nested functions

When you’ve spent years coding pretty much everything in Java, it’s hard to break out of the Java way of doing things. It means that you tend to forget that other languages might have things called closures, for example. Here’s how a closure looks in Python:


lambda x:dosomethingto(x,anothervariable)

The neat thing is that this closure can be passed around like a first class object. Also, anothervariable, bound in the outer scope, is accessible to the lambda expression when it is invoked later. You can do stuff like:


from math import log

# Choose the test once, based on what kind of value we were given.
if isinstance(somevalue, (int, float)):
  isfunny=lambda x:log(x)>somevalue
else:
  # isalpha and islower are string methods, not free functions.
  isfunny=lambda x:x.isalpha() and x.islower()

funnyitems=[item for item in mylist if isfunny(item)]
seriousitems=[item for item in mylist if not isfunny(item)]

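Sure, there are other ways to do the same thing. But this is arguably more elegant, because the type check happens once and the list comprehensions don't need to know which test was chosen.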
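For anyone who wants to run this, here is a minimal self-contained sketch of the same idea. The make_isfunny wrapper and the sample lists are my own hypothetical additions so the snippet has something to work on; the lambdas themselves follow the snippet above.

```python
from math import log


def make_isfunny(somevalue):
    """Return a predicate whose behaviour depends on the type of somevalue."""
    if isinstance(somevalue, (int, float)):
        # The lambda closes over somevalue from the enclosing scope.
        return lambda x: log(x) > somevalue
    # For anything else, fall back to a test on strings.
    return lambda x: x.isalpha() and x.islower()


# Hypothetical sample data, chosen only for illustration.
numeric_isfunny = make_isfunny(1.0)
numbers = [1, 2, 3, 10, 100]
funny_numbers = [n for n in numbers if numeric_isfunny(n)]
serious_numbers = [n for n in numbers if not numeric_isfunny(n)]

text_isfunny = make_isfunny("not a number")
words = ["hello", "World", "abc123", "python"]
funny_words = [w for w in words if text_isfunny(w)]
```

Each lambda still sees the somevalue it was created with, even though make_isfunny has already returned by the time the list comprehensions call it.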

There are also nested functions, which are kind of like closures. Rather than contriving an example, I’ll give one from the book I’m reading, Programming Collective Intelligence. In chapter 8, a crossvalidation function is defined. Forgetting the specifics and purpose of this function, just know that the crossvalidate function returns low numbers for good solutions and high numbers for bad solutions. Earlier in the book, we’d already been introduced to optimisation problems and cost functions. The cost functions take a single argument, which is a potential solution to be costed. The crossvalidate function takes a bunch of parameters, so for this and other reasons it can’t be used directly. But you can do something like this:


def createcostfunction(algf, data):
  def costf(scale):
    sdata=rescale(data,scale)
    return crossvalidate(algf,sdata,trials=10)
  return costf

So now your costf function knows about algf and data, despite not having them as parameters. That is, the values bound to algf and data are available to costf when it is called at some later time:


costf=createcostfunction(knearest,mydata)
annealingoptimise(wdomain,costf,step=2)

So when annealingoptimise eventually invokes costf, costf has access to knearest and mydata. That is, the bindings of algf to knearest and data to mydata live on after the execution of createcostfunction completes. Cool.
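If you don’t have the book’s code handy, the following sketch shows the same pattern end to end. Note that rescale, crossvalidate and toyerror here are simplified stand-ins I made up so the example runs on its own; they are not the book’s implementations. Only createcostfunction matches the code above.

```python
def rescale(data, scale):
    # Stand-in: multiply each column of each row by the matching scale factor.
    return [[value * s for value, s in zip(row, scale)] for row in data]


def crossvalidate(algf, data, trials=10):
    # Stand-in: average algf's error over the rows instead of running real trials.
    return sum(algf(row) for row in data) / len(data)


def createcostfunction(algf, data):
    def costf(scale):
        sdata = rescale(data, scale)
        return crossvalidate(algf, sdata, trials=10)
    return costf


def toyerror(row):
    # Hypothetical "algorithm": squared distance of the row sum from 10.
    return (sum(row) - 10) ** 2


mydata = [[1, 2], [3, 4]]
costf = createcostfunction(toyerror, mydata)

# createcostfunction has returned, yet costf still carries algf and data with it.
freevars = costf.__code__.co_freevars
captured = [cell.cell_contents for cell in costf.__closure__]

cost_a = costf([1, 1])
cost_b = costf([2, 2])
```

Python keeps the captured bindings in costf.__closure__, which is why costf only needs the single scale argument that an optimiser would pass it.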

Categories
Random observations

Mod_wsgi

Forget my previous entry about mod_python. Just use mod_wsgi. It compiles without issues on the Mac (unlike mod_python, which is why I made a binary available for it), and seems to be the future for Apache/Python integration. See Graham’s comment on my last entry.

Categories
Random observations

Mod_python for the Mac

In case anyone’s interested, I’m making available a pre-built universal binary of mod_python 3.3.1 for Apache 2.2.8. I’m running Mac OS X Leopard 10.5.2 and Python 2.5.1. The DMG contains an installer package, and it will try to install mod_python to /usr/libexec/apache2, which is where the other apache modules are usually located. Use at your own risk.

You can download it here.

Categories
Innovation

Programming Collective Intelligence

I’ve been reading a fantastic book written by Toby Segaran called Programming Collective Intelligence: Building Smart Web 2.0 Applications. I’m about two thirds of the way through, but it’s so good that I’m not going to wait until I finish reading it before blogging it. Essentially, it’s a recipe book for machine learning algorithms that you’re likely to find under the hood of many successful modern web sites: clustering, support vector machines, decision trees, simulated annealing, Bayesian classification and so on. The AI course at uni was a bit light on in terms of statistical machine learning techniques, but this book makes up for it. All the code in the book is written in Python and can be downloaded from the author’s website. The algorithms in the book may prove to be highly useful for my work in ubiquitous computing, too.

Coincidentally, according to the most recent entry in his blog, Toby will be giving a talk on a topic sort of related to one I’ve been thinking about as a possible project at NICTA: Creating Semantic mashups: Bridging Web 2.0 and the Semantic Web.

It turns out that Toby is also a fan of GTD, and he’s written his own web based GTD tool. It doesn’t look much, but it’s gained some favourable reviews.

Categories
Innovation

Startup: what you said

So it turns out that quite a few readers of this weblog use Bloglines. For some reason, Bloglines stopped sucking down the RSS feed for this weblog after March 19 until three days ago. Did other feed readers experience similar difficulties with my blog? I know Google Reader continued to work, as did the RSS screensaver on my Mac. Anyway, that partially explains why I had no feedback on my hypothetical question about startups.

Thanks to those who did end up responding. Here are some snippets of what you said, along with some feedback I got via e-mail, in no particular order:

  1. My 2c – I don’t believe there’s that much inherent difference between the major web platforms – in my opinion you’re best off going with what you have the most experience with (and what the people you can get have experience with). You could probably lose 2-3 months learning a new platform (primarily learning its idioms and gotchas) and it’s not clear that you’re ever going to get that back. Having said that, I would tend to recommend against the embedded scripting languages (PHP, ASP, etc) on any project of significant size – it’s not so much that they can’t scale, but they strongly encourage non-scalable design by their nature (and it can be harder to find developers who understand the difference). A ‘well-designed’ PHP application will often actually include its own sub-templating language (e.g. Smarty), and treat PHP as a pure programming language.
  2. Now you have the prototype you pretty much know the functionality – “recoding” is all about achieving the chrome and non-functional requirements. So, maybe Erlang – excellent for reliability/uptime (incl. for hot-upgrading and failover) and scalability – the two most important -ilities for webapps. Frontend – “yaws”, plus some javascript library, and/or maybe a web framework like “erlyweb”. Backend – hand-coded erlang apps. Database – “mnesia” or maybe “couchdb”.
  3. Thing is, there are big sites being run on all combinations of your options. If you just need *something*, write it in what you feel comfortable in, if it needs to scale later you can use your first iteration as a learning experience. Personally, I think you’d be crazy to write anything in java, you can’t be nimble in java. I also think php is a dead end, it’s just too bodgy. Python/Ruby is the only way to go imho. Funnily enough I saw no mention of javascript for your web 2.0 site :) I would suggest that none of the backend stuff matters at all, and that the only thing that matters is which javascript library you use in the front end.
  4. I assume that this is for a “friend” :-). But I shouldn’t assume. I’m afraid to say I can’t add that much. Just don’t have the knowledge. For this kind of thing, being able to easily tinker with and evolve the system seems an important criterion, if just one of the relevant criteria.
  5. it doesn’t matter what technology you use, because you’ll rewrite the whole thing several times anyway. pick what’s fastest to explore the idea and the market now. time to market, and reaction time once you’re there, cost far more than another 100 servers while you’re in the early stage. who are your partners (or your VC’s partners)? do they have an affinity with any particular technology? what are your friends best at? you’ll need a pool of expertise (and employees), so choose something to maximise that opportunity.
  6. I was going to answer but my initial answer seemed too stupid and it was going to take too long to come up with an intelligent well thought out answer for a hypothetical that did require me to stretch my imagination to the extreme. It’s sort of like what I am finding with some of my 1st years. In one of the subjects I am teaching they have to do a lot of hypothetical work and most of the time, the results are utter disasters because they simply, simply can’t stretch their brains into comprehending scenarios so outside their sphere of “being”. Since founding a startup is far, far outside my sphere of ‘being’ I decided I would much rather play pokemon. Now if you ever want to know which pokemon is best to use against a ground-type pokemon….

Mostly very useful feedback, and amusing otherwise. It’s interesting how similar most of the feedback was. Agility, ability to tinker and swiftness of development were common themes. For at least the alpha and beta, I think it would be best to go with tools/platforms where you can put something together fairly quickly, and make changes quickly if your users tell you they’re looking for something a bit different. I’m not sure Java fits that description (although I’ve always been a Java nut). There’s at least a compile step and possibly a deploy step, depending upon your development environment, between making a small change in the code and seeing the result in your browser. Ruby looks cool, but it is still lagging way behind the other serious contenders in terms of performance. PHP could be a contender, but if the system ever got really big and you had new graduates working on the application, I’d bet you’d soon end up with a mess, with business logic stuck in the presentation code and so forth – I really do agree with the first comment above on that point. So I’ve got to say that Python is looking good right now, despite its Makefile-like treatment of white space. Coupled with Django, it might be a winner. There’s also the fact that Google have provided a nice playpen for Python-based web applications.

Once again, thanks all for your input. Please keep the advice and opinions coming if you have more to add. It’s much appreciated.

Update (06:49 19/04/2008): I’m not sure my comment about Ruby performance is entirely fair. The performance difference between Ruby and Python is nothing (Python is a few times faster on most tests) compared to the difference between Python/Ruby and Java, for example (where Java is one and sometimes several orders of magnitude faster). By this reasoning, if one is happy to sacrifice some runtime performance and use Python instead of Java, one presumably wouldn’t be too worried that Ruby is slightly worse than Python. And Ruby doesn’t do too badly in terms of memory usage. Besides, if one was really worried about performance, one would use C.

Categories
My family and me

MacKaren

Karen has joined the Rebel Alliance by getting a MacBook. And she loves it, though she claims “Vista made me do it!”

Categories
Innovation

An underwhelming response

So, after waiting a few weeks, I still have no responses on this blog entry. Okay, I got one reply by e-mail, not including the advice I received from friends before posting the blog article. Was I silly to think people might actually respond? (Chorus: “Yes, Ricky, you’re very silly!”)

Categories
Innovation

Startup: a hypothetical scenario

Picture yourself in the following situation. You’ve come up with what you think is a cool idea for a so-called web 2.0 site. Furthermore, you’ve managed to convince some VC types to invest some (pre-)seed funding – enough to develop a public beta. You developed a quick and dirty proof-of-concept to show the VCs, but now it has to be thrown away. You have to start development on the real thing from scratch.

The question is, what technologies, programming languages, tools and platforms are you going to use to implement your idea? Language-wise, do you go for Python, Java, PHP, Ruby, or something else? If you take the PHP route, how do you ensure maintainability in the long term? If you decide on Java, do you use JSP, Velocity or Freemarker? Would you use Struts or Spring? Do you need any of these frameworks at all? Do you run on Linux, FreeBSD, Windows or Mac OS X Server? Why?

To make this question at least partly answerable, imagine for the moment we’re just considering the presentation tier, and not any of the back end magic. Also imagine that what you’re developing is similar to one of today’s social networking sites (Facebook, Bebo, MySpace or something), and that visualisations (e.g., of directed graphs) might need to be generated dynamically from data in the back end. You can assume that the beta version will have a small number of types of dynamically generated pages (fewer than 10, say) but later versions will end up with many more.

Answers along the lines of “It’s much of a muchness, so I would choose X, Y and Z because they’re what I know”, “I’d choose X, Y and Z because the newly graduated computer science students I’d have to hire are most likely to be comfortable with those” and “X, Y and Z are nice but too expensive for my startup, so I’d choose A, B and C instead” are completely acceptable.

I’ve already got some great input from my closest friends (at least the programmers among them), but I’d like to get some responses from a wider audience. I’m hoping some ex-DSTC engineers/researchers might have an opinion on this; you don’t need to have worked at a startup to give useful feedback!

I’m asking this question out of pure curiosity, nothing more, and I have my own feelings on this (represented by the sample answers above). Please leave your answer as a comment below.

Categories
Random observations

Python

Newsflash: Python would be okay if whitespace wasn’t meaningful beyond separating tokens. List comprehensions are kind of nice.

Categories
Innovation

Death by bigness

Big companies will slowly suck the life out of you. That’s one way of summarising Paul Graham’s latest essay. To maximise your freedom, he says, join a start-up or start one yourself. It’s a theory that I find very appealing.