A quick timeline of Hadoop research

For readers interested in Hadoop's history, here is a quick timeline of how the project has progressed from 2004 to the present.


2004:

Initial versions of what is now HDFS (Hadoop Distributed File System) and MapReduce implemented by Doug Cutting and Mike Cafarella.

December, 2005:

Nutch, Apache's open-source search engine, is ported to the new framework; Hadoop runs reliably on 20 nodes.

January, 2006:

Doug Cutting joins the Yahoo! grid team.

February, 2006:

Hadoop becomes an Apache top-level project, supporting the standalone development of MapReduce and HDFS. The Yahoo! grid team adopts Hadoop.

April, 2006:

Sort benchmark (10 GB/node) run on 188 nodes in 47.9 hours.


May, 2006:

Sort benchmark run on 500 nodes in 42 hours (better hardware than the April benchmark).


October, 2006:

Research cluster reaches 600 nodes.

December, 2006:

Sort benchmark run on 20 nodes in 1.8 hours, 100 nodes in 3.3 hours.

January, 2007:

Research cluster reaches 900 nodes.


April, 2007:

Two research clusters, each with 1,000 nodes.


April, 2008:

Won the 1-terabyte sort benchmark in 209 seconds on 900 nodes.


October, 2008:

Loading 10 terabytes of data per day onto research clusters.


March, 2009:

17 clusters with a total of 24,000 nodes.

April, 2009:

Won the minute sort by sorting 500 GB in 59 seconds on 1,400 nodes.


