Using Geospatial Data with Amazon EMR
In the previous chapter, we learned about machine learning with Amazon SageMaker, a powerful service for creating, testing, and tuning machine learning algorithms. In this chapter, we will learn about Amazon EMR (originally Elastic MapReduce), which is essentially a managed Hadoop cluster. Hadoop is a powerful framework for the massively parallel processing of data. By distributing both storage and computation across many machines, it makes it practical to query petabytes of data using commodity hardware. Hadoop is not the only system that can do this, but it is one of the most widely adopted. Hadoop is also an interesting community project: it is really a large ecosystem of pluggable components that can be combined as needed. That ecosystem includes libraries for machine learning, such as Apache Mahout and Spark MLlib. In this chapter, we will walk through a quick overview of Hadoop and EMR, followed by a demo of launching an EMR cluster and visualizing geospatial data.
This chapter covers the following topics:
- Introducing Hadoop
- Common frameworks
- Geospatial with EMR