A quick timeline of Hadoop research

For research users, here is a quick timeline of how Hadoop has progressed from 2004 to the present.

2004:

Initial versions of what are now HDFS (the Hadoop Distributed File System) and MapReduce were implemented by Doug Cutting and Mike Cafarella.

December, 2005:

Nutch, Apache's open-source search engine, was ported to the new framework; Hadoop ran reliably on 20 nodes.

January, 2006:

Doug Cutting joins the Yahoo! grid team.

February, 2006:

Hadoop becomes an Apache top-level project, supporting the standalone development of MapReduce and HDFS. The Yahoo! grid team adopts Hadoop.

April, 2006:

Sort benchmark (10 GB/node) run on 188 nodes in 47.9 hours.

May, 2006:

Sort benchmark run on 500 nodes in 42 hours, with better hardware than the April benchmark.

October, 2006:

Research cluster reaches 600 nodes.

December, 2006:

Sort benchmark run on 20 nodes in 1.8 hours, 100 nodes in 3.3 hours.

January, 2007:

Research cluster reaches 900 nodes.

April, 2007:

Two research clusters, each reaching 1,000 nodes.

April, 2008:

Won the 1-terabyte sort benchmark in 209 seconds on 900 nodes.

October, 2008:

Loading 10 terabytes of data per day onto research clusters.

March, 2009:

17 clusters with a total of 24,000 nodes.

April, 2009:

Won the minute sort by sorting 500 GB in 59 seconds on 1,400 nodes.
