Common MapReduce misconceptions
MapReduce as a programming paradigm has no doubt attracted a lot of attention and has been mentioned throughout the blogosphere many times, in both good and bad ways. In fact, one of the largest database-related blogs criticized the very programming paradigm here, and then gently backed off from that critique here. When I first read the first of these posts, I thought: "MapReduce critiqued by database guys? What the heck?"
The main points of the argument in the posts mentioned above were:
- A giant step backward in the programming paradigm for large-scale data intensive applications
- A sub-optimal implementation, in that it uses brute force instead of indexing
- Not novel at all – it represents a specific implementation of well known techniques developed nearly 25 years ago
- Missing most of the features that are routinely included in current DBMS
- Incompatible with all of the tools DBMS users have come to depend on
My response to these arguments (or rather a clarification of misconceptions), after thinking them over for a while, closely matches the one posted here:
- It's one of the most fundamental ways of processing data used in functional programming and one of the most established ones in computer science.
- There are many computational methods that proceed by exhausting the solution space or by going through whole data sets, and they exist for a reason; e.g. branch and bound algorithms are a perfect candidate for running as MapReduce tasks.
- Being well established rather than novel is an advantage, not a drawback.
- MapReduce is not a database. It's a programming concept.
- MapReduce is not a database. It's a programming concept.
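To make the "programming concept" point concrete, here is a minimal single-process sketch of the map and reduce steps as they appear in functional programming, using a word count as the example. The function names (`map_phase`, `reduce_phase`, `word_count`) and the sample input are my own assumptions for illustration; this is not how any particular MapReduce framework is implemented, and it leaves out distribution, shuffling and fault tolerance entirely.

```python
from collections import Counter
from functools import reduce


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_phase(counts, pair):
    """Reduce step: fold one (word, n) pair into the running totals."""
    word, n = pair
    counts[word] += n
    return counts


def word_count(documents):
    # Apply the map step to each document independently.
    mapped = map(map_phase, documents)
    # Flatten the per-document lists into a single stream of pairs.
    pairs = (pair for doc_pairs in mapped for pair in doc_pairs)
    # Fold all pairs into one Counter, starting from an empty one.
    return reduce(reduce_phase, pairs, Counter())


# Hypothetical sample input, made up for this sketch.
result = word_count(["map reduce map", "reduce is not a database"])
```

Note that nothing here stores or indexes data: the map step touches every record, and the reduce step combines the results. Because each document is mapped independently, the map calls could in principle run on different machines, which is the part a real framework takes care of.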
Although many months have passed since I saw these posts for the first time, I am still amazed at how one can misjudge a great concept simply by not spending enough time reading and thinking about it.
If you think the original MapReduce paper is too hard to understand, then before making a final judgment maybe you could try reading this tutorial.
March 3rd, 2011 at 6:34 am
MapReduce is not a database. It's a programming concept.