Data Quality Management
Unreliable data can lead to incorrect insights, misguided business decisions, and a significant loss of resources. As organizations increasingly treat data as a product and depend on fresh data, data engineers and analysts must implement robust data quality controls to ensure that data is accurate, complete, consistent, and reliable.
In this chapter, we will explore the methods and tools available on AWS for maintaining data quality. We’ll provide step-by-step recipes to help you implement these tools effectively in your data engineering workflows. The recipes walk through practical examples, starting with data quality control using AWS Glue DataBrew, Deequ, and AWS Glue. Before diving into the recipes, it is important to work with your stakeholders to build a data quality control framework and a service-level agreement (SLA) for your data quality. When you lead a data quality project, besides identifying the data...