
MapReduce

Last week I read a 2004 paper called MapReduce: Simplified Data Processing on Large Clusters. It was written by a couple of Google researchers, and details a simple programming model and library for processing large datasets in parallel. MapReduce is used by Google under the hood for lots of different things, from indexing to machine learning to graph computation. Very handy indeed.
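Here is a minimal single-process Python sketch of the paper's word-count example, which shows what the "simple programming model" asks of the user. You write a map function that emits intermediate key/value pairs, and a reduce function that merges all the values for one key. The names `map_fn`, `reduce_fn` and `map_reduce` are hypothetical. The sequential driver stands in for the partitioning, parallelism and fault tolerance that the real library provides, so this is not Google's API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(doc_name: str, contents: str) -> Iterator[tuple[str, int]]:
    # User-supplied map: emit (word, 1) for every word in the document.
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # User-supplied reduce: sum all partial counts for a single word.
    return sum(counts)


def map_reduce(inputs: dict[str, str]) -> dict[str, int]:
    # Hypothetical sequential driver standing in for the distributed library.
    intermediate: dict[str, list[int]] = defaultdict(list)
    for name, contents in inputs.items():
        # Map phase, followed by grouping ("shuffle") of values by key.
        for key, value in map_fn(name, contents):
            intermediate[key].append(value)
    # Reduce phase: one reduce call per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


result = map_reduce({"a.txt": "the cat sat", "b.txt": "the dog sat"})
print(result)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

The user-supplied functions never touch one another's state. That independence is what lets the real system spread map calls across many machines and run reduce calls per key in parallel.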

So imagine my surprise to find in last Friday’s edition of ACM TechNews that this paper has been republished in this month’s Communications of the ACM, albeit in a slightly shorter form. Aside from a few cosmetic changes (an updated figure and table), the content of the two papers is the same. That is, you don’t gain any knowledge from reading one that you wouldn’t gain from reading the other. Although the newer article does cite the original, nothing in it indicates that so much of its content has been carried over from the earlier paper. In short, this is not new material: it was first published more than three years ago. Communications of the ACM seems to be trialling a new model, whereby the best articles from conferences are modified and republished for the ACM audience. But seriously, the modifications in the republished MapReduce article are negligible. What gives?

By ricky

Husband, dad, R&D manager and resident Lean Startup evangelist. I work at NICTA.

1 reply on “MapReduce”

To be fair, if you read Vardi’s piece (p44-48), he does explain where they’re trying to go with this section. I don’t think it’s necessarily a bad idea in itself (reprints of “important” papers for wider distribution), and it’s certainly a cut above CACM’s typical content of late, but I agree that a simple “An earlier version of this paper originally appeared in journal-name vol xx” or equivalent subheading would have been nice.

Although, you know, there’s a sad irony in CACM reprinting a 2004 paper in an attempt to be “cutting edge”, especially one that most practitioners have probably already seen…
