Understanding Redshift partitioning
Redshift supports four partitioning models or, as Redshift calls them, distribution styles, for a table: AUTO, EVEN, KEY, and ALL. Many first-time Redshift users simply ignore this optional parameter when they create their first tables, but it can be the single most important attribute in your entire cluster. If you do not set this parameter when you create a table, it defaults to AUTO, and Redshift tries to choose an appropriate style for you. So, let’s dive into these different distribution styles and their considerations.
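To make the parameter concrete, here is a minimal sketch of where the distribution style appears in a CREATE TABLE statement. The Python helper create_table_sql, the sales table, and its columns are hypothetical names invented for illustration; only the DISTSTYLE and DISTKEY clauses in the generated SQL are Redshift syntax. The helper only builds strings and does not connect to a cluster.

```python
# Hypothetical helper for illustration only; it is not part of any Redshift
# library. It builds CREATE TABLE statements so the DISTSTYLE clause is visible.
VALID_STYLES = {"AUTO", "EVEN", "KEY", "ALL"}


def create_table_sql(table, columns, diststyle=None, distkey=None):
    """Return a CREATE TABLE statement with an optional distribution style.

    columns is a list of (column_name, column_type) pairs.
    """
    col_defs = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    sql = f"CREATE TABLE {table} ({col_defs})"

    if diststyle is None:
        # No DISTSTYLE clause: Redshift falls back to AUTO.
        if distkey is not None:
            raise ValueError("distkey given without a distribution style")
        return sql + ";"

    style = diststyle.upper()
    if style not in VALID_STYLES:
        raise ValueError(f"unknown distribution style: {diststyle}")

    if style == "KEY":
        # KEY distribution needs a column to distribute on.
        if distkey is None:
            raise ValueError("DISTSTYLE KEY requires a distkey column")
        return sql + f" DISTSTYLE KEY DISTKEY ({distkey});"

    if distkey is not None:
        raise ValueError("distkey only applies to DISTSTYLE KEY")
    return sql + f" DISTSTYLE {style};"


# Assumed example schema.
columns = [("sale_id", "INTEGER"), ("customer_id", "INTEGER")]

for style, key in [(None, None), ("EVEN", None), ("KEY", "customer_id"), ("ALL", None)]:
    print(create_table_sql("sales", columns, style, key))
```

The first statement printed has no DISTSTYLE clause at all, which is the case most first-time users end up with; the other three set the style explicitly.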
One of the easiest distribution styles to understand, but potentially the most impactful, is the ALL option. Choosing ALL replicates the entire table to every node. A Redshift cluster can have up to 128 nodes. If you have a table that is one petabyte in size and you choose the ALL option, that table now occupies 128 PB of storage across the cluster. This is pretty...