Parameterizing jobs to make them more flexible and reusable
A job without parameters normally performs the same task on every run, with fixed data sources and destinations. Using parameters, you can reuse the same job on different data sources or destinations, either to run a recurring job on new data or to reuse the same logic, such as data transformation or cleaning, for different purposes.
For instance, the same type of data might arrive from various sources but need the same processing in a centralized data store.
Glue lets you define your own parameters, which you can then use in your script. You can set default values on the job and then override them as needed for each run when starting a job run manually using the console, the AWS CLI, or an SDK such as boto3 or the AWS SDK for Java.
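As a rough illustration of both sides of this mechanism, the following Python sketch shows how a script might read its parameters and how a caller might override them for a single run. The parameter names (source_path, target_path) and the helper function names are hypothetical and chosen only for this example; getResolvedOptions is only importable inside the Glue runtime, and the boto3 call assumes AWS credentials and an existing job.

```python
import sys


def to_glue_arguments(params):
    """Convert a plain dict into the form Glue expects for job arguments:
    keys prefixed with '--' and all values as strings."""
    return {
        (name if name.startswith("--") else "--" + name): str(value)
        for name, value in params.items()
    }


def read_job_parameters(argv=None):
    """Inside the Glue script: resolve the parameters passed to this run.
    Names are listed WITHOUT the '--' prefix (hypothetical names here)."""
    # Imported lazily because awsglue only exists in the Glue runtime.
    from awsglue.utils import getResolvedOptions

    return getResolvedOptions(argv or sys.argv, ["source_path", "target_path"])


def start_run_with_overrides(job_name, params):
    """From a client machine: start a run, overriding the job's default
    values for this run only."""
    # Imported lazily so the helper above works without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=to_glue_arguments(params),
    )
    return response["JobRunId"]
```

For example, calling start_run_with_overrides("my-job", {"source_path": "s3://example-bucket/input/"}) would replace only the source_path value for that run; any parameter not listed keeps the default value set on the job.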
Getting ready
For this recipe, you need to follow the instructions in the Technical requirements section at the beginning of the chapter to create a role for Glue.