Dimensions of data quality
As emphasized earlier, superior data quality forms the foundation upon which informed decisions and strategic insights are built. With this in mind, let us now examine which Key Performance Indicators (KPIs) we could use to measure the data quality of our assets.
Completeness
Completeness measures the extent to which a dataset contains all the values it is expected to hold, with no missing values or fields. KPIs can include metrics such as the percentage of missing data or the number of missing data points per record.
The following code will output the completeness percentages for each column in your dataset. A higher percentage indicates a higher level of completeness, while a lower percentage suggests more missing values:
- We’ll start by importing the pandas library to work with the dataset:
  import pandas as pd
- Next, we create a sample dataset with the following columns: Name, Age, Gender, and City. Some values are intentionally missing (represented as None):
  data = { ...
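The steps above can be sketched end to end as follows. This is a minimal illustration, not the original listing: the sample values are hypothetical, and the variable names completeness and missing_per_record are assumptions introduced here. It computes both KPIs mentioned earlier, the completeness percentage per column and the number of missing data points per record.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration;
# None marks an intentionally missing entry.
data = {
    "Name": ["Alice", "Bob", None, "Diana"],
    "Age": [30, None, 25, 41],
    "Gender": ["F", "M", "F", None],
    "City": ["Paris", "Berlin", None, None],
}
df = pd.DataFrame(data)

# Completeness per column: share of non-missing values, as a percentage.
completeness = df.notna().mean() * 100

# Missing data points per record: count of missing values in each row.
missing_per_record = df.isna().sum(axis=1)

print("Completeness per column (%):")
print(completeness.round(1))
print("Missing values per record:")
print(missing_per_record)
```

With these sample values, City is 50% complete and the other three columns are 75% complete, while the last two records each have two missing values. Using notna().mean() relies on pandas treating True as 1 when averaging, which keeps the calculation to a single expression.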