Scaling your cluster based on workload
A major benefit of running in the cloud rather than on-premises is access to virtually unlimited capacity. When running EMR workloads, having resources available is not enough: to be cost-effective, you also want to pay for them only while they are needed.
In this recipe, you will see how EMR lets you scale your cluster capacity up and down based on the workload.
Getting ready
This recipe assumes that you have set up the SUBNET environment variable as indicated in the Technical requirements section at the beginning of this chapter.
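Before creating the cluster, it can help to confirm that the variable is actually set in the current shell. The following is a minimal sketch, not part of the original recipe; the function name check_subnet is hypothetical and the message wording is an assumption:

```shell
# Hypothetical helper (not from the recipe): fail early if SUBNET is missing.
check_subnet() {
  if [ -z "${SUBNET:-}" ]; then
    # Report the problem on stderr and signal failure to the caller.
    echo "SUBNET is not set; see the Technical requirements section" >&2
    return 1
  fi
  echo "Using subnet: ${SUBNET}"
}
```

Running check_subnet before the commands in this recipe should print the subnet in use, or return a non-zero status if the variable is empty or unset.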
How to do it...
- Create a cluster with autoscaling and an idle timeout (make sure you use \ only at the end of the lines indicated; the second command will print the cluster ID):
CLUSTER_ID=$(aws emr create-cluster --name AutoScale \
  --release-label emr-7.1.0 --use-default-roles \
  --ec2-attributes SubnetId=${SUBNET} \
  --auto-termination-policy IdleTimeout=900 \
  --applications Name=Spark --instance...