Integrating AWS Glue and Git version control
AWS Glue is a serverless data integration service that offers several engines and tools for the different personas involved in data engineering. Git is the industry-standard version control system for source code. Integrating the two lets you keep Glue jobs under version control, which makes changes easier to track, review, and roll back.
In this recipe, you’ll learn how to save the state of a Glue job to a Git repository hosted on AWS CodeCommit and how to retrieve it from there. This feature is also supported for other kinds of jobs, including notebooks, and is available directly on the AWS console as well. See the There’s more… section for further details.
Getting ready
To complete this recipe, you need a Bash command line with the AWS CLI set up, as indicated in the Technical requirements section at the beginning of this chapter. The AWS user also needs permission to use AWS CodeCommit.
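As an optional sanity check, the prerequisites above can be verified from the shell before starting. The following is a minimal sketch, not part of the recipe itself: the function name check_prereqs is hypothetical, and listing repositories is used here only as a rough, read-only proxy for CodeCommit access (it confirms the codecommit:ListRepositories permission, not every permission the recipe may need).

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the recipe): checks that the
# AWS CLI is available, credentials work, and CodeCommit can be reached.
check_prereqs() {
  # The AWS CLI must be installed and on the PATH.
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found" >&2
    return 1
  fi
  # Credentials must be configured and valid for the current profile.
  if ! aws sts get-caller-identity >/dev/null; then
    echo "AWS credentials are not configured or not valid" >&2
    return 1
  fi
  # Read-only call used as a rough proxy for CodeCommit permissions.
  if ! aws codecommit list-repositories >/dev/null; then
    echo "The current user cannot list CodeCommit repositories" >&2
    return 1
  fi
  echo "Prerequisites OK"
}
```

Running check_prereqs after sourcing the snippet prints Prerequisites OK when all three checks pass; otherwise it reports the first failing check on standard error and returns a non-zero status.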
How to do it…
- Set up a placeholder Python script on S3 for the Glue job:
BUCKET_NAME...