Summary
This chapter provided a short history of SAS, focusing on how it has been used for data storage and analysis over the years. Initially, SAS data was stored on punch cards. Once data became electronic, the main challenge for SAS users working with big data was I/O. As SAS environments evolved from running on mainframes to being accessible from PCs, SAS developed new products and services to complement its core analytics and data management functions.
SAS data steps are procedural and give the programmer opportunities to greatly improve I/O through certain commands, features, and approaches to programming. When SQL became popular, PROC SQL was introduced, which allowed SAS users to choose between data steps and SQL commands when managing data in SAS.
Today, SAS is still used in data warehousing, but accessing data in the cloud brings new challenges. SAS data warehouses today can consist predominantly of SAS components, such as SAS VA and CAS. Alternatively, SAS can be part of a warehouse system that includes other components and applications, such as cloud storage in Snowflake and supplemental analytic functions provided by R.
Modern SAS data warehousing still seeks to improve I/O and to better serve warehouse users through the development of an efficient system that meets customer needs. Creativity is required in the design of modern SAS data warehouses so that the system can leverage the best SAS has to offer while avoiding its pitfalls.
This chapter covers the entire history of SAS for data storage because the way SAS runs today can often be explained by events in its history, which makes this background important for the new data scientist. Much of the terminology and many of the features unique to SAS arise from how it has evolved over time, and it is helpful to know this background when communicating with today's SAS data warehouse developers and data scientists.
The next chapter focuses on reading data into SAS and closes with strategies for importing difficult data.