Summary
In this chapter, we learned about the power of Hadoop and distributed processing, and how EMR makes it easy to deploy, manage, and automate these clusters. We also learned how to create a cluster in EMR, load a simple geospatial library, and import some geospatially enriched JSON data. Loading every building on the planet into a single SQL query in a traditional database would be impractical, but it is theoretically possible in Hadoop. Geospatial analytics with Hadoop is still relatively young: GeoAnalytics by Esri was only recently released, and Sedona has only just reached its v1.0 release. Luckily, your timing is good, because both solutions have now been proven in industry and are ready for production workloads on EMR. A good rule of thumb for big data processing is that once you start having trouble storing all of your data in your database, it’s probably time to start looking into building a data lake. Once the data is outside the database, you can still query it as if it...