Monitoring your cluster
The convenience of using EMR to create fit-for-purpose, disposable clusters has significantly reduced the maintenance burden of Hadoop clusters compared with long-lived, multitenant on-premises clusters.
However, you still need to monitor the cluster in detail when you want to optimize its usage or troubleshoot an issue. For instance, you might want to know what is limiting performance: the CPU, memory, network, disk, or something else?
In this recipe, you will explore in depth the cluster metrics and the monitoring tools that EMR provides out of the box.
Getting ready
To carry out this recipe, you need to set up the SUBNET, S3_LOGS_URL, and KEYNAME shell environment variables (see the Technical requirements section at the beginning of this chapter to learn how to set them up).
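Before launching the cluster, it can help to confirm that the three variables are actually set in the current shell. The following is a minimal sketch, assuming Python 3 is available on your workstation; the helper name missing_env_vars is hypothetical and is not part of the chapter's tooling. It only checks that each variable is present and non-empty, not that its value is a valid subnet ID, S3 URL, or key pair name.

```python
import os
from typing import List, Mapping, Sequence

# Variables required by this recipe's Getting ready section.
REQUIRED_VARS = ("SUBNET", "S3_LOGS_URL", "KEYNAME")


def missing_env_vars(names: Sequence[str] = REQUIRED_VARS,
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return, in order, the names that are unset or empty in env.

    Hypothetical helper: env defaults to the live process environment,
    but any mapping can be passed in (useful for testing).
    """
    return [name for name in names if not env.get(name)]
```

For example, calling missing_env_vars() in a shell where only SUBNET has been exported would return ["S3_LOGS_URL", "KEYNAME"], telling you which variables still need to be set.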
To complete the recipe, you will need a SOCKS5 proxy in your browser to access the cluster. Follow the instructions depending...