Categories
Random observations

ACSC paper

It turns out that the reviews for the paper that we (Ryan, Jaga, Audun and myself) submitted to the Australian Computer Science Conference were lost, and that our paper was supposed to be accepted all along. This is quite unbelievable, but the paper got in, so I can’t complain too much. The organisers had to contact our reviewers to double-check that the paper was meant to be accepted. It must have been some major glitch: first we received an e-mail saying our paper had been rejected, and then the follow-up e-mail, which was supposed to contain our reviews, contained nothing of the sort. Anyway, at this stage Ryan will be presenting our paper in Newcastle in a few months’ time.

Technorati pinger

I’ve written an exceedingly simple and braindead shell script pinger for use with Technorati. It uses the POST (lwp-request) utility that comes with the libwww-perl library to deliver the ping message. Usage is as follows:

technorati <blogname> <blogurl>

The shell script is invoked whenever I add a new entry to my weblog. It’s been working for me, but I give no guarantees. Use at your own risk and do with it what you will.
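
For anyone who would rather not depend on lwp-request, here is a rough Python sketch of the same idea. It is not the shell script itself. It assumes the ping is a standard XML-RPC weblogUpdates.ping call, and both the endpoint address and the names build_ping and send_ping are assumptions made for illustration.

```python
import xmlrpc.client

# Assumed endpoint: the address historically used for Technorati pings.
PING_URL = "http://rpc.technorati.com/rpc/ping"


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(blog_name, blog_url, endpoint=PING_URL):
    """POST the ping to the endpoint and return whatever the server replies."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Print the message body only; call send_ping() to actually deliver it.
    print(build_ping("Example Blog", "http://example.org/"))
```

Printing the body first makes it easy to see what would be sent before any request goes out. The same caveat applies: no guarantees.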

Domain Daddy

So I’ve finally acquired my very own domain (rickyrobinson.id.au). I intend to care for it, and treat it with kindness and respect, and I promise it will never be neglected. It will be diligently tended to, and whenever it cries for help, I will come running. I want only the best for my domain so that it may bloom and grow… like Edelweiss.

PageRank

After initially being told that a paper I wrote with Ryan, Jaga and Audun had been rejected from the Australian Computer Science Conference, it now appears that there was a glitch in the system and that our paper, in fact, was supposed to be accepted. We became aware of this today after I sent off a missive to the conference organisers telling them that they’d forgotten to attach the reviews of our paper. Anyway, more on that matter as further details become available.

Now that it appears the paper will be accepted, like a good little researcher, I started updating the paper for the camera-ready version. I came across the following claim in the paper regarding Google’s PageRank algorithm:

The algorithm reduces the rank of web pages with outgoing hyperlinks and increases the rank of web pages with incoming hyperlinks. This means that a page with few outgoing links and many incoming links will have a high rating.

I had previously flagged this description of the PageRank algorithm as being in need of revision. So I reviewed Brin and Page’s The Anatomy of a Large-Scale Hypertextual Web Search Engine (pdf) in the hope of distilling its contents and deriving a more satisfactory, yet still concise, description of PageRank. Unless I am mistaken, which is a distinct possibility, the original description is not as wildly off target as I thought.

The key point to note is that the PageRanks form a probability distribution over web pages, so the sum of all web pages’ PageRanks will be one. In PageRank, a link from page A to page B is essentially a vote (the weight of this vote depends upon a number of factors, including the number of other links on page A and the PageRank of page A, but these factors are not directly relevant to this discussion). In linking to page B, then, page A has helped to increase page B’s PageRank. But since the sum of the PageRanks must equal one, the increase in B’s PageRank must necessarily come at the expense of the PageRanks of other pages. In other words, there’s no such thing as a free lunch: somebody has to pay.

The question is, does A alone incur the cost of linking to B? The answer is no, since the formula for A’s PageRank contains no term for the number of outgoing links from A, as shown by the PageRank algorithm (which you can find in the manuscript linked above). So where does the extra Googlejuice imbibed by B come from? One way to think about it is this. The addition of a link from A to B slightly reduces the weight of all the other links in the global collection of links. Therefore, it is not A alone that pays the price. Rather, the cost is shared by all existing pages.

Another way to think about this is using the "random surfer" model suggested by Page and Brin. The PageRank of A equates to the probability that a user who starts on a random page and clicks randomly on hyperlinks will visit A before boredom sets in, at which time the surfer will randomly select another page to start from. Notice that adding a link from A to B does not lower the probability that the surfer will visit A any more than it reduces the probability of the surfer visiting any other page. What is clear, however, is that the probability that the surfer will visit B has increased, since B now has more incident edges, and has therefore increased its proportion of the total number of incident edges in the graph.
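
To make the argument easier to poke at, here is a minimal Python sketch of the random surfer calculation. It is my own toy version, not Google’s implementation: it uses the normalised form of the formula so that the ranks sum to one, it assumes a damping factor of 0.85, and it spreads the rank of pages with no outgoing links evenly over all pages. The four-page web is made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank, normalised so that the ranks sum to one.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by every page,
        # as if the bored surfer jumped to a random page.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n
                     for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each link carries an equal share of the linking page's rank.
                new_ranks[target] += damping * ranks[page] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical four-page web in which B starts with no incoming links.
before = pagerank({"A": ["C"], "B": ["C"], "C": ["A"], "D": ["A"]})
# The same web after A adds a link to B.
after = pagerank({"A": ["C", "B"], "B": ["C"], "C": ["A"], "D": ["A"]})

for page in sorted(before):
    print(page, round(before[page], 4), round(after[page], 4))
```

In both runs the ranks still sum to one, and B’s rank goes up once A links to it. Comparing the other columns shows where the extra rank came from in this particular graph.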

I was not at all surprised to find a great many discussions and (sometimes heated) arguments about the way PageRank works. Just Google it! The really weird stuff shows up when the search is constrained to discussions talking about outgoing links.

This investigation started me thinking about alternative algorithms for ranking results. In Google, the PageRank of page A utilises the PageRanks of all the pages that link to A. It occurred to me that not all pages linking to A would be relevant to the current search. A possible improvement to the PageRank algorithm might be to use only those pages linking to A which also appear in the query results. For instance, if my query is about "fluffy white dogs", and page A is a hit for this query, then only pages which were also a hit for this query and which link to A should be used in the calculation of A’s PageRank for this particular query. Why should a page which links to A for some other reason, say because A also discusses "lazy ginger cats", be included in the ranking of A for this query? Surely this adjustment to the algorithm would improve the ordering of results in Google. The one reason I can think of not to do this is that PageRank would be calculated at the time of query rather than offline, meaning that results would be delayed slightly longer. Mind you, it’s entirely possible that Google already does something like this, because there’s no doubt that the PageRank algorithm must have been updated and modified since 1998!
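
The idea can be sketched in a few lines of Python. This is purely hypothetical: the function query_pagerank, the page names and the link data are all invented, and the PageRank routine is the same toy normalised version, not whatever Google actually runs. The only change is that links are thrown away unless both ends are hits for the query.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy power-iteration PageRank; the ranks sum to one."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages without outgoing links share their rank with every page.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        new_ranks = {p: (1.0 - damping) / n + damping * dangling / n
                     for p in pages}
        for page, targets in links.items():
            for target in targets:
                new_ranks[target] += damping * ranks[page] / len(targets)
        ranks = new_ranks
    return ranks


def query_pagerank(links, hits):
    """Rank only the query hits, counting only links between hits."""
    hits = set(hits)
    restricted = {page: [t for t in links.get(page, []) if t in hits]
                  for page in hits}
    return pagerank(restricted)


# Made-up web: A is about both dogs and cats, and the cat pages link to it.
web = {
    "A": ["dogs2"],
    "dogs1": ["dogs2"],
    "dogs2": ["dogs1"],
    "cats1": ["A"],
    "cats2": ["A"],
    "cats3": ["A"],
}
dog_hits = ["A", "dogs1", "dogs2"]  # hits for "fluffy white dogs"

global_ranks = pagerank(web)
query_ranks = query_pagerank(web, dog_hits)
for page in dog_hits:
    print(page, round(global_ranks[page], 4), round(query_ranks[page], 4))
```

In the restricted run the votes from the cat pages no longer count towards A, which is exactly the effect I was after. It also shows the cost: the ranking has to be recomputed for every query.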

The question still remains: how do I go about adjusting our paper?

The Thin Line

Welcome to The Thin Line: my new-look weblog. I hope you like it. The new name was chosen for a whole swag of reasons, which I won’t bother outlining here. If you’re interested in the reasons, I’m only an e-mail away.

Valid RSS

While I was at it, I decided I may as well add a link to show that my RSS generator conforms to the 2.0 standard. Click the RSS heart in the left margin. I’m not sure why I’m updating the blog template, because I’m about to give it a major overhaul.

Standards compliance

You will notice a few little images that I’ve added in the margin on the left-hand side. The first of these relates to copyright. Clicking on it will take you to the Creative Commons license under which the material appearing in this blog is licensed. The second image is a link that allows you to check whether the blog page you are viewing conforms to the XHTML 1.0 standard. The template from which this blog is created conforms to the XHTML 1.0 standard. However, from time to time, I may slip up when creating a blog entry (actually, this will happen regularly; try checking it right now). The third image is a link which validates the style file that this blog uses. These two validation links have been put here to encourage you to use a browser that conforms to these standards (like this one) if this blog is not being rendered properly by your current browser.

Johnny Warren, rest in peace

Australian soccer’s favourite son, Johnny Warren, has passed away at the age of 61 after a battle with cancer. Anybody who has any connection to the local soccer scene will know what a sad loss this is. Johnny Warren captained Australia and played in the only Australian team ever to reach the World Cup. For the past several years he has featured as a commentator on SBS. It’s hard to come to terms with the fact that The World Game on SBS will be without Johnny Warren from now on. There is nobody, nobody who’s done more for the game in this country than Johnny. He was a tireless champion for bringing about change to the domestic league, and the new national competition is the fruit of his labour. I hope that in the years to come, a successful A-League will be seen as Johnny Warren’s legacy. Johnny Warren lived to witness the launch of the new league. May he rest in peace.

DSTC update

Well, it looks like nobody will be fired from DSTC just yet. Funding doesn’t run out until around July next year.

School mistaken for target range

Friendly fire during actual combat is one thing, but this is just completely insane!