Summary
In this chapter, we went through a dimensional data cleansing exercise and learned that Python can call Domo APIs directly to perform create, read, update, and delete (CRUD) operations on Domo datasets. We saw that a Python package called pydomo makes this easy. Then, we used standard packages such as pandas and fuzzywuzzy to clean up the LeadSource dimension. We even made a second, iterative pass after adjusting our matching criteria based on data profiles in Domo to further reduce the long tail of dimension values. It doesn't take much imagination to see how this process could be generalized across multiple dimensions and run on a schedule to scan new rows that have not yet been cleansed, dramatically improving data quality.
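As a rough illustration of how the cleansing step might be generalized across several dimensions, consider the minimal sketch below. It is not the chapter's code: it swaps in the standard library's difflib for fuzzywuzzy so that it runs without extra installs, works on plain dictionaries rather than a pandas DataFrame, and makes no Domo API calls. The canonical value list, the threshold of 80, and the function names similarity, cleanse, and cleanse_dimensions are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical canonical values for the LeadSource dimension (illustrative only).
CANONICAL = ["Web", "Partner Referral", "Trade Show", "Email Campaign"]


def similarity(a, b):
    """Return a 0-100 case-insensitive similarity score between two strings."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cleanse(value, canonical, threshold=80):
    """Map value to its closest canonical entry, or keep it if nothing is close enough."""
    best = max(canonical, key=lambda c: similarity(value, c))
    return best if similarity(value, best) >= threshold else value


def cleanse_dimensions(rows, canonical_by_column, threshold=80):
    """Cleanse every configured dimension column in a list of row dictionaries.

    canonical_by_column maps a column name to its list of canonical values.
    The input rows are left unmodified; cleaned copies are returned.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for column, canonical in canonical_by_column.items():
            if new_row.get(column):
                new_row[column] = cleanse(new_row[column], canonical, threshold)
        cleaned.append(new_row)
    return cleaned
```

Calling cleanse_dimensions(rows, {"LeadSource": CANONICAL}) on newly arrived rows would normalize close misspellings and leave unrecognized values untouched for later profiling. Lowering the threshold on a second pass mirrors the iterative adjustment of matching criteria described above.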
In the next chapter, we will explore one of the Domo platform's machine learning (ML) capabilities.