Unit testing your data quality using Deequ
Amazon Deequ is an open source data quality library that was developed internally at Amazon. Its purpose is to let you unit test your data, so that quality problems are caught before the data is fed to analytics use cases. Several AWS analytics products, such as AWS Glue DataBrew and AWS Glue Data Quality, were built on top of the Deequ library to serve the needs of data engineers and data scientists. See the Deequ GitHub page (https://github.com/awslabs/deequ) for more information.
In the previous recipe, we learned about AWS Glue Data Quality. There are several key considerations when choosing between AWS Glue Data Quality and Deequ:
- Managed service versus open source library: AWS Glue Data Quality is a fully managed service built on top of the open source Deequ framework, whereas Deequ itself is a library that you use to implement data quality checks in your own applications. Also, because Deequ is open source, there are metrics that might be available in Deequ but are not (yet) available on AWS...