Running your AWS EMR cluster on EKS
EMR offers the option of running a fully managed Spark cluster leveraging Kubernetes.
Instead of running a full Hadoop cluster with YARN and HDFS, EMR on Elastic Kubernetes Service (EKS) is a lightweight way to run Spark applications with shorter startup times, better resource utilization, and higher availability. It lets you run a cluster with nodes in different Availability Zones, or even on AWS Outposts (in the customer's own data center).
Getting ready
You need both the AWS CLI and the EKS CLI (eksctl) installed. Follow the instructions for your operating system to install them on your machine:
- AWS CLI: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
- EKS CLI: https://github.com/weaveworks/eksctl
You can verify that both tools are installed by running the following commands:
aws --version
eksctl version
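If you prefer a single check over running each command by hand, the following is a minimal sketch of a shell snippet that reports whether each tool is on your PATH. The function name check_tool is a hypothetical helper introduced here for illustration; it only confirms that the commands can be found, not that their versions are suitable.

```shell
# check_tool is a hypothetical helper (not part of either CLI).
# It prints whether the named command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
    return 1
  fi
}

# Check both prerequisites and count the missing ones,
# rather than stopping at the first failure.
missing=0
for tool in aws eksctl; do
  check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

If the count is not zero, revisit the installation links above for the tool reported as missing.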
If you haven’t already set up the credentials and region to use...