in Essays

Hadoop Doesn’t Solve All Problems

Hadoop is a hot topic in Silicon Valley these days. Walk into any coffee shop and you’ll likely hear people discussing Hadoop. More companies are adopting it. Some companies, like Cloudera, are built solely on top of it.

Here at Zillabyte, we gratefully use Hadoop. Hands down, it’s helped us build our product quickly and efficiently. However, it is important to note that Hadoop is not the end-all-be-all for distributed processing. In fact, Hadoop only solves a small slice of the solvable-problem universe.

Map-reduce (and by extension, Hadoop) has limitations. Map-reduce performs poorly on algorithms that rely on intra-data relationships. For example, clustering algorithms are supposed to find geometric regions of data. To pull this off, the algorithm must effectively compare every data point with every other data point. These intra-data relationships are the death nail for Hadoop. Map-reduce fundamentally struggles to compare datapoints with other datapoints.

Consider another example: recommendations. A recommendation engine is an implementation of a clustering algorithm. Although it’s possible to run this on Hadoop, our experience has shown that it takes six times longer than a non-Hadoop implementation.

While we gratefully use Hadoop, we’re cognizant of its limits. Other solutions need to be found for different types of problems.