Building data cleaning and profiling jobs with DataBrew
AWS Glue DataBrew is a no-code data preparation tool that simplifies data profiling, cleansing, and validation, making it an excellent choice for data engineers looking to automate data quality checks. In this recipe, we’ll use DataBrew to perform data profiling and PII detection.
Getting ready
This recipe assumes that you have a dataset in S3 for testing out DataBrew.
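If your S3 file is not yet registered as a DataBrew dataset, you can also register it programmatically. The following is a minimal sketch, not part of the recipe itself, assuming that boto3 is installed and AWS credentials are configured. The dataset name, bucket, and key are hypothetical placeholders, and `build_dataset_request` and `register_dataset` are helper names introduced here for illustration only.

```python
# Sketch: register an S3 object as a DataBrew dataset with boto3.
# All names below (dataset, bucket, key) are hypothetical placeholders.

ALLOWED_FORMATS = {"CSV", "JSON", "PARQUET", "EXCEL", "ORC"}


def build_dataset_request(name, bucket, key, file_format="CSV"):
    """Build the keyword arguments for the DataBrew create_dataset call."""
    fmt = file_format.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {file_format}")
    return {
        "Name": name,
        "Format": fmt,
        # Point the dataset at a single object (or prefix) in S3.
        "Input": {"S3InputDefinition": {"Bucket": bucket, "Key": key}},
    }


def register_dataset(request):
    """Send the request to DataBrew. Requires valid AWS credentials."""
    import boto3  # imported here so the builder works without boto3

    client = boto3.client("databrew")
    return client.create_dataset(**request)


if __name__ == "__main__":
    request = build_dataset_request(
        name="sales-raw",              # hypothetical dataset name
        bucket="my-example-bucket",    # hypothetical bucket
        key="input/sales.csv",         # hypothetical object key
    )
    print(request)
    # register_dataset(request)  # uncomment to call AWS
```

Keeping the request builder separate from the AWS call lets you inspect or validate the parameters before anything is created in your account.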
How to do it…
- Navigate to the AWS Glue DataBrew console, click on Projects | Create project, and then provide a project name.
Figure 7.15 – Clicking on Create Project
- Select the dataset that you want to use for the DataBrew project:
Figure 7.16 – Selecting a relevant dataset
- In the Permission section, select the appropriate name under Role name, then click on Create project. If you have not created an IAM role...