Running jobs using AWS EMR Serverless
EMR Serverless was created for the common case where you just want to run Spark and Hive jobs without having to worry about node types, capacity, and configuration.
For such cases, EMR Serverless simplifies operations considerably, since there is no cluster to configure or maintain. You do not have to work out which EC2 instance type is the right one, or whether it is offered (with enough capacity) in your chosen region and Availability Zone. The main trade-off is that you can no longer SSH into nodes for low-level administration and troubleshooting.
In this recipe, you will see how simple it is to run a Spark application using EMR Serverless.
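To make the "no cluster to configure" point concrete, here is a minimal sketch of what a job submission might carry. The helper name build_job_run_request, the application ID, the role ARN, and the S3 path are all hypothetical placeholders rather than values from this recipe; the dictionary keys follow the parameters of the boto3 emr-serverless client's start_job_run call. Note that the request names a script and an execution role but no instance type, node count, or Availability Zone.

```python
def build_job_run_request(application_id, execution_role_arn, script_s3_uri,
                          executor_memory="4g"):
    """Build the keyword arguments for an EMR Serverless Spark job run.

    All argument values are supplied by the caller; nothing here talks to AWS.
    """
    # Spark settings are passed as a single spark-submit style string.
    spark_params = f"--conf spark.executor.memory={executor_memory}"
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_s3_uri,
                "sparkSubmitParameters": spark_params,
            }
        },
    }


# Hypothetical placeholder values, for illustration only.
request = build_job_run_request(
    application_id="00example1234567",
    execution_role_arn="arn:aws:iam::123456789012:role/ExampleEmrServerlessRole",
    script_s3_uri="s3://example-bucket/scripts/sample_job.py",
)

# Submitting would require boto3 and valid AWS credentials, for example:
#   import boto3
#   boto3.client("emr-serverless").start_job_run(**request)
```

Keeping the request as a plain dictionary lets you inspect or log it before anything is submitted.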
Getting ready
To test EMR Serverless, you will need a sample script to run. The following script is a basic example that accesses the Glue catalog from the EMR Serverless job. In the shell, run the following command to create a Python file with the...