Designing a partition strategy for Azure Synapse Analytics
We learned about Azure Synapse Analytics in Chapter 2, Designing a Data Storage Structure. Synapse Analytics contains two compute engines, outlined here:
- A Structured Query Language (SQL) engine, which consists of serverless SQL pools and dedicated SQL pools (the latter previously known as SQL Data Warehouse)
- A Spark engine, which consists of Synapse Spark pools
However, when people refer to Azure Synapse Analytics, they usually mean the dedicated SQL pool option. In this section, we will look at the partitioning strategies available for Synapse dedicated SQL pools.
Note
We have already briefly covered partitioning in Spark as part of the Data pruning section in the previous chapter. The same concepts apply to Synapse Spark, too.
Before we explore partitioning options, let's recap the data distribution techniques of a Synapse dedicated SQL pool from the previous chapter, as they will play an important role in our partition strategy...
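As a rough mental model of the three distribution techniques (hash, round-robin, and replicated), the following Python sketch shows where rows would land among the 60 distributions of a dedicated SQL pool. It is a simplified illustration, not how the service is implemented: CRC32 is an assumed stand-in for Synapse's internal hash function, round-robin is modeled as simple turn-taking (the service assigns rows evenly but not in a fixed order), and the `C000`-style customer keys and the helper function names are hypothetical.

```python
import zlib
from collections import Counter

# A dedicated SQL pool spreads every table across 60 distributions.
NUM_DISTRIBUTIONS = 60


def hash_distribution(key: str) -> int:
    """Map a distribution-column value to a distribution ID (0-59).

    Assumption: CRC32 stands in for Synapse's internal hash function.
    What matters is that equal keys always land in the same distribution.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_DISTRIBUTIONS


def round_robin_distribution(row_index: int) -> int:
    """Simplified round-robin: rows take turns, regardless of their contents."""
    return row_index % NUM_DISTRIBUTIONS


def replicated_copies(rows: list, num_compute_nodes: int) -> dict:
    """A replicated table keeps a full copy of every row on each compute node."""
    return {node: list(rows) for node in range(num_compute_nodes)}


# 600 hypothetical sales rows spread over only 10 distinct customer keys.
rows = [(f"C{i % 10:03d}", i) for i in range(600)]

# Count how many rows each distribution receives under each technique.
hash_counts = Counter(hash_distribution(customer) for customer, _ in rows)
rr_counts = Counter(round_robin_distribution(i) for i, _ in enumerate(rows))

print("HASH distributions used:", len(hash_counts))
print("ROUND_ROBIN distributions used:", len(rr_counts))
```

With only 10 distinct keys, hashing can fill at most 10 of the 60 distributions, so the data is skewed, while round-robin spreads the rows evenly across all 60. This is why the choice of distribution column matters before partitioning is layered on top of it.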