Setting up Hadoop on Amazon Web Services
There are many ways to set up a Hadoop cluster. We can install Hadoop on a single server in pseudo-distributed mode to simulate a cluster, or on an actual cluster of servers, or virtual machines in fully distributed mode. There are also several distributions of Hadoop available from the vanilla open source version provided by the Apache Foundation to commercial distributions such as Cloudera, Hortonworks, and MapR. Covering all the different ways of setting up Hadoop is beyond the scope of this book. We instead provide instructions for one way to set up Hadoop and other relevant tools for the purpose of the examples in this chapter. If you are using an existing Hadoop cluster or setting up one in a different way, you might have to modify some of the steps.
Note
Because Hadoop and its associated tools are mostly developed for Linux/Unix based operating systems, the code in this chapter will probably not work on Windows. If you are a Windows user, follow...