Common Hadoop frameworks
I’ve already mentioned a few frameworks, such as PySpark, Spark, and Apache Pig, but there are hundreds more that do just about everything under the sun. There are data storage and database frameworks such as the Hadoop Distributed File System (HDFS), NoSQL and SQL capabilities through HBase and Hive, and machine learning with Mahout, to name a few. My own experience learning Hadoop is a good example of how these services came to be. Over a decade ago, I got frustrated trying to run a MapReduce job and stumbled across Apache Pig. Apache Pig (whose scripting language is called Pig Latin) was built to give Hadoop a simple analytics language and to make it easier for users who didn’t know Java. Something similar happened with PySpark: users wanted a familiar language, Python, for working with Spark, so PySpark was born.
These frameworks are in the process of being disrupted by the release of Spark. Traditionally...