Data lineage and data publication
Data lineage and data publication are critical aspects of data management. Data lineage is the practice of tracking data as it moves from its origin to its point of consumption. AWS Glue DataBrew offers a data lineage feature, though it is still at an early stage of development. Compared to enterprise tools such as Talend and Collibra, it has limited integration with external services and shows only limited information about the datasets and recipes at each stage. Despite these limitations, the data lineage feature is a valuable addition to AWS Glue DataBrew, and future releases will hopefully extend it.
Figure 2.74: Data lineage in AWS Glue DataBrew
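To make the idea of lineage more concrete, the following minimal Python sketch models lineage as a directed graph and walks it backward from a consumed output to its origin. It is not how DataBrew implements lineage and it does not call any AWS API. The node names (`s3://raw/sales.csv`, `dataset:sales`, `recipe:clean-sales`, `job:sales-job`, `s3://curated/sales.parquet`) and the helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical lineage edges as (upstream, downstream) pairs.
# The names are illustrative only and are not produced by DataBrew.
EDGES = [
    ("s3://raw/sales.csv", "dataset:sales"),
    ("dataset:sales", "recipe:clean-sales"),
    ("recipe:clean-sales", "job:sales-job"),
    ("job:sales-job", "s3://curated/sales.parquet"),
]


def build_upstream_index(edges):
    """Map each node to the list of nodes that feed directly into it."""
    parents = defaultdict(list)
    for upstream, downstream in edges:
        parents[downstream].append(upstream)
    return parents


def trace_upstream(node, edges):
    """Return every ancestor of `node`, nearest first for a linear chain."""
    parents = build_upstream_index(edges)
    seen = []
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Walk back from the consumed output to the original source file.
    for step in trace_upstream("s3://curated/sales.parquet", EDGES):
        print(step)
```

Tracing from the curated output lists the job, the recipe, the dataset, and finally the raw source file, which is the same origin-to-consumption chain that a lineage view presents graphically.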