Creating ETL jobs visually using AWS Glue Studio
A typical ETL task moves data from one data store to another, applying some simple transformations along the way. In such cases, building the job as a visual data diagram lets users without coding skills develop the pipeline using their knowledge of the business and the data. Jobs of this kind are also easier to maintain and update, which reduces the total cost of ownership (TCO).
A visual job is one of the job types that AWS Glue Studio lets you create. You define the pipeline as a graph of nodes, and Glue Studio automatically generates the code, so the job runs like a regular script job.
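To make the idea of a pipeline as a graph of nodes more concrete, the following is a purely conceptual sketch in plain Python. It is not how Glue Studio generates code. The node names (s3_csv_source, change_schema, s3_target) and the execution_order helper are hypothetical and chosen only for illustration. The sketch shows that once each node declares which nodes it reads from, the order of the steps in a script can be derived from the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical visual job: each node maps to the set of nodes it reads from.
# These names are illustrative only and are not Glue Studio identifiers.
job_graph = {
    "s3_csv_source": set(),              # source node, no inputs
    "change_schema": {"s3_csv_source"},  # transform node, reads the source
    "s3_target": {"change_schema"},      # target node, reads the transform
}


def execution_order(graph):
    """Return node names ordered so that every node comes after its inputs."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(job_graph)
print(order)
```

In this simple chain, the source comes first, then the transform, then the target, which is the same order in which a script would have to read, transform, and write the data.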
Getting ready
Create a bucket and a role as indicated in the Technical requirements section.
To follow this recipe, you need a CSV file with headers and some data. You can use your own data or the sales_sample.csv file provided in the code repository: https://github.com/PacktPublishing...
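If you prefer to use your own data and do not have a file at hand, a small CSV file with a header line can be produced with a few lines of Python. This is a minimal sketch: the file name my_sales_sample.csv, the column names, and the rows are all invented for illustration and do not reflect the contents of the sales_sample.csv file in the repository.

```python
import csv
from pathlib import Path

# Hypothetical columns and rows, invented for this example only.
HEADER = ["order_id", "product", "quantity", "unit_price"]
ROWS = [
    [1, "keyboard", 2, 35.50],
    [2, "monitor", 1, 189.99],
    [3, "mouse", 5, 12.00],
]


def write_sample_csv(path):
    """Write the header line followed by the sample rows; return the path."""
    path = Path(path)
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


csv_path = write_sample_csv("my_sales_sample.csv")
print(csv_path.read_text(encoding="utf-8"))
```

The resulting file can then be uploaded to the bucket created for this recipe, for example through the Amazon S3 console.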