Definition

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

All of Hadoop's modules are designed and developed on the assumption that hardware failures are common and that the framework must detect and handle them automatically.

The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them across the nodes of a cluster.
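As a rough sketch of how that block structure is visible to applications, the snippet below uses the Hadoop FileSystem Java API to ask HDFS which blocks make up a file and which nodes hold them. The file path shown is hypothetical; any existing HDFS file would do.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS (e.g. hdfs://namenode:8020) from the
        // core-site.xml found on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical path used for illustration.
        Path file = new Path("/data/example/input.txt");
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode which blocks make up the file and which
        // DataNodes hold a replica of each block.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}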

To process the data, Hadoop ships packaged code to the nodes so that each node processes, in parallel, the portion of the data it holds. This approach takes advantage of data locality: nodes manipulate the data they already have local access to.

Data locality lets the cluster process data faster and more efficiently than a conventional supercomputer architecture, which relies on a parallel file system where computation and data are distributed over very high-speed networks.
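To make the programming model concrete, the following word-count sketch shows a minimal MapReduce job written against the Hadoop Java API; the class names and input/output paths are illustrative rather than prescribed. The mapper runs on the nodes that hold the input blocks, which is where the data-locality benefit comes from.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Runs on the nodes holding the input blocks: emits (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each word after the shuffle phase.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths are passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, a job of this kind is typically submitted with a command along the lines of hadoop jar wordcount.jar WordCount /input /output, where the paths refer to HDFS directories.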

History of Hadoop

The idea that initiated the development of Hadoop came from the Google File System paper, published in October 2003. It was followed by a second Google paper, "MapReduce: Simplified Data Processing on Large Clusters" (2004).

Development initially took place within the Apache Nutch project, and the work was moved to a new subproject named Hadoop in 2006. Doug Cutting, who was working at Yahoo! at the time, named the project after his son's toy elephant, and Hadoop's initial code was factored out of Nutch.

Owen O'Malley became the project's first committer in March 2006, and Hadoop 0.1.0 was released in April 2006. The project has continued to develop and evolve since then through the work of contributors to the Apache Hadoop project.

Job profiles that require this skill