Redshift AQUA
Advanced Query Accelerator (AQUA) is a feature of Redshift that lets you run analysis with massive parallelization. It’s also a good fit for our previous geohash scenario, particularly if we needed to read the entire planet, that is, all or most of the geohashes stored as folders in S3. AQUA works in combination with SQL LIKE queries. You would write a query along the lines of SELECT ID WHERE GEOHASH LIKE '%A%', and Redshift would spin up multiple smaller compute instances that go out and pull the data in parallel, combine it, and then return it. This is similar to how Hadoop clusters use parallelization on big datasets to improve performance: you take a massive amount of data, split it up, set individual nodes to pull and process the data in parallel, and then aggregate the results when they are finished. Having a capability such as this natively built into the data warehouse is incredibly powerful...
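The split, process-in-parallel, and aggregate pattern described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how AQUA or Redshift is implemented: the PARTITIONS dictionary is a hypothetical in-memory stand-in for geohash folders in S3, and scan_partition and parallel_like are made-up helper names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for geohash "folders" in S3: each key is a geohash
# prefix, each value a list of (id, geohash) rows stored under that prefix.
PARTITIONS = {
    "9q8": [(1, "9q8yyk"), (2, "9q8zn0")],
    "dr5": [(3, "dr5reg"), (4, "dr5ru7")],
    "u4p": [(5, "u4pruy")],
}


def scan_partition(rows, needle):
    """Worker step: filter one partition, mimicking LIKE '%needle%'."""
    return [row_id for row_id, geohash in rows if needle in geohash]


def parallel_like(partitions, needle, max_workers=4):
    """Scan every partition in parallel, then aggregate the matches."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Split: one task per partition.
        futures = [
            pool.submit(scan_partition, rows, needle)
            for rows in partitions.values()
        ]
        # Aggregate: combine the partial results once each worker finishes.
        matches = []
        for future in futures:
            matches.extend(future.result())
    return sorted(matches)


print(parallel_like(PARTITIONS, "r"))
```

Each worker only ever sees its own partition, so adding more partitions adds more parallel work without changing the filter or the final combine step.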