Chapter 7
An analyst learns what each variable in a dataset means, and what each level of each categorical variable means, by reviewing the dataset's data dictionary. If the dataset is derived from survey responses, the survey's codebook should also be included in the warehouse data curation.
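The procedures in this chapter are SAS procedures. The sketch below uses Python only to illustrate, in a language-neutral way, how a data dictionary ties each variable and each stored categorical code to its meaning. The dictionary layout, variable names, labels, and codes are all hypothetical and were invented for illustration. They are not taken from any real dataset or from the author's method.

```python
# Hypothetical data dictionary: variable name -> label and, for categorical
# variables, a mapping from each stored code to the level it represents.
# Continuous or identifier variables have levels set to None.
DATA_DICTIONARY = {
    "resp_id": {"label": "Respondent identifier", "levels": None},
    "sex": {"label": "Self-reported sex", "levels": {1: "Male", 2: "Female", 9: "Refused"}},
    "smoker": {"label": "Current smoker", "levels": {0: "No", 1: "Yes"}},
}


def describe_value(variable, value):
    """Return a readable meaning for a stored value, using the dictionary."""
    entry = DATA_DICTIONARY[variable]
    levels = entry["levels"]
    if levels is None:
        # Non-categorical variable: the stored value is its own meaning.
        return f"{entry['label']}: {value}"
    meaning = levels.get(value)
    if meaning is None:
        # The code appears in the data but the dictionary does not define it.
        return f"{entry['label']}: undocumented code {value!r}"
    return f"{entry['label']}: {meaning}"


def undocumented_codes(variable, values):
    """List distinct codes in the data that the dictionary does not define."""
    levels = DATA_DICTIONARY[variable]["levels"]
    if levels is None:
        # Only categorical variables have a closed set of documented codes.
        return []
    return sorted({v for v in values if v not in levels})


if __name__ == "__main__":
    print(describe_value("sex", 2))
    print(undocumented_codes("sex", [1, 2, 2, 7, 9]))
```

Here `describe_value("sex", 2)` returns `"Self-reported sex: Female"`, and `undocumented_codes("sex", [1, 2, 2, 7, 9])` returns `[7]`, flagging a code the analyst would need to look up in the codebook or raise with the data provider.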
When preparing continuous data for processing into a data warehouse, the analyst needs to know every value that each continuous variable takes, and PROC UNIVARIATE output does not provide that information: it summarizes the distribution (moments, quantiles, and a short table of extreme observations) rather than listing every distinct value. Running PROC FREQ on a continuous variable lets the analyst see every distinct value in the variable and how often each one occurs. Although this output is very long, the top and bottom of it show the extreme values (by default, PROC FREQ lists values in ascending order), and these often indicate how best to process the variable into the warehouse.

It is helpful to plan transformed variables in a data dictionary before creating ETL code for them, for a few reasons. First...