Yay, I’m officially not a nerd! A couple of years ago I’d have been devastated at this news. But I’m actually very comfortable with this at the moment. :) (Says he who has created his own automated blogging system. Hmm…)
I’m trying to get myself motivated about work again. I’m feeling less gloomy right now because I’m toying with an idea to make distributed hashtable protocols like Chord more scalable by using what we know about scale-free networks. If I could crystalise my idea, my service discovery protocol will benefit from this better scalability. The idea hinges on allowing very capable nodes (i.e. those nodes connected via high-bandwidth links and with abundant computational and storage resources) to be able to store more than one entry for each interval in the finger table. This enables these nodes to become highly connected, meaning that the average number of nodes contacted to deliver a message decreases, because there is a high chance that the highly connected nodes can deliver the message directly or to a node which is closer to the target than would otherwise have been possible. In other words, it is possible to more than halve the distance to the target on each hop. The problem then becomes a question of allowing these highly connected nodes to find nodes within a particular identifier interval in a scalable manner, and enabling other nodes to determine which are the highly connected nodes whose IDs are less than or equal to the target key/identifier. I understand that this probably all sounds like gibberish to anyone not familiar with DHTs, but it makes sense to me!
So is there a scalable solution to those two problems? Aside note: Holy crap! My ICON paper just got accepted! Do I seem surprised? I kind of am. I never expected my publication record to be 3 from 3! Yay! So, scalable DHTs. Yes. Um… Capable nodes can simply cache the addresses of nodes that it contacts during the routing process. Eventually, a node will cache a sizeable proportion of node addresses within each identifier interval. This is scalable in terms of bandwidth consumption because it requires no further messages than are already used. But how do other nodes find this highly connected node? Perhaps, in response to lookup operations, the responding node can return the address of the most highly connected node that it is aware of within the relevant interval as well as the node whose ID is closest to the target. If the node knows of no such peer, it returns only the address of the next hop node. It’s hard to get an idea of just how quickly this information will be propagated through the network. It might be very slow. On the upside though, this solution requires just a few more bytes in each lookup response message.