Apache Hadoop
Apache Hadoop is an open-source software framework for the distributed storage and processing of very large datasets across clusters of commodity hardware. Its processing layer is based on the MapReduce programming model.
The system includes these modules:
- Hadoop Common: The common libraries and utilities that support the other Hadoop modules
- Hadoop Distributed File System (HDFS™): A distributed filesystem that stores data on commodity machines, providing high-throughput access across the cluster
- Hadoop YARN: A platform for job scheduling and cluster resource management
- Hadoop MapReduce: A YARN-based implementation of the MapReduce programming model for large-scale data processing
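The MapReduce model underlying the last module can be illustrated with a word count, its canonical example. The sketch below is a plain-Python simulation of the map, shuffle, and reduce phases for clarity only; Hadoop's real API is Java-based and runs these phases distributed across a cluster, and all function names here are illustrative, not Hadoop's.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result["the"])  # 3
```

Because each map call depends only on its own record and each reduce call only on one key's group, the framework can run both phases in parallel across many machines, which is what makes the model scale.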
Hadoop grew out of the Apache Nutch web-crawler project, and its design was inspired by Google's 2003 paper on the Google File System. Its co-creator, Doug Cutting, named it after his son's toy elephant. By 2006, Nutch's distributed filesystem had been spun out as HDFS, the Hadoop Distributed File System.
In April 2006, using MapReduce, Hadoop sorted 1.8 TB of data across 188 nodes in just under 48 hours. Two years later, it set a world record by sorting one terabyte of data in 209 seconds.