PROCs for understanding data
SAS warehouse developers routinely receive datasets from other systems and then run extract, transform, and load (ETL) procedures to load them into the warehouse. Before that ETL code can be developed, however, an analyst has to explore and understand the data: the values in each column need to be examined, along with other characteristics of the dataset.
This section introduces several helpful approaches for exploring these datasets. First, we talk about ways to use PROC CONTENTS to understand new datasets. Next, the role of codebooks in providing documentation about the variables is discussed. Variables can be annotated with SAS labels, and levels of categorical variables can be annotated with user-defined SAS formats. In addition, native SAS formats can be applied to improve the display of data. Different strategies for applying labels and formats are demonstrated.
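As a preview of how these pieces fit together, here is a minimal sketch rather than the approach developed later in this section. The dataset work.patients, its variables, and the format name sex_f are hypothetical and invented purely for illustration. The sketch creates a user-defined format for a categorical variable, attaches labels and formats (including the native DATE9. format) with PROC DATASETS, and then reviews the result with PROC CONTENTS.

```sas
/* Hypothetical example data: names and values are assumptions for illustration */
data work.patients;
    input id sex visit_date :date9.;
    datalines;
1 1 01JAN2020
2 2 15FEB2020
;
run;

/* User-defined format that annotates the levels of a categorical variable */
proc format;
    value sex_f
        1 = 'Male'
        2 = 'Female';
run;

/* Attach labels and formats to the variables without rewriting the data */
proc datasets library=work nolist;
    modify patients;
    label id         = 'Patient identifier'
          sex        = 'Sex'
          visit_date = 'Date of visit';
    format sex sex_f. visit_date date9.;   /* user-defined and native formats */
run;
quit;

/* Review variable names, types, lengths, labels, and formats */
proc contents data=work.patients;
run;
```

PROC DATASETS is only one way to apply labels and formats; LABEL and FORMAT statements in a DATA step are another option, at the cost of rewriting the dataset.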