Geospatial Data Analytics on AWS

You're reading from Geospatial Data Analytics on AWS: Discover how to manage and analyze geospatial data in the cloud

Product type Paperback
Published in Jun 2023
Publisher Packt
ISBN-13 9781804613825
Length 276 pages
Edition 1st Edition
Authors (3): Scott Bateman, Jeff DeMuth, Janahan Gnanachandran


Storing geospatial data in the cloud

The number of options for storing geospatial data in the cloud can seem daunting at first. Many AWS customers experiment with Amazon Simple Storage Service (S3) for geospatial data storage as their first project. Relational databases, NoSQL databases, and caching options commonly follow in the evolution of geospatial technical architectures. General GIS data storage best practices still apply to the cloud, so much of the knowledge that practitioners have gained over the years directly applies to geospatial data management on AWS. Familiar GIS file formats that work well in S3 include the following:

  • Shapefiles (.shp, .shx, .dbf, .prj, and others)
  • File geodatabases (.gdb)
  • Keyhole Markup Language (.kml)
  • Comma-Separated Values (.csv)
  • GeoJSON, a JavaScript Object Notation format for geographic features (.geojson)
  • GeoTIFF, a georeferenced Tagged Image File Format (.tif, .tiff)
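
To make one of these formats concrete, here is a minimal sketch of a GeoJSON feature collection being serialized to the text that would be stored as a single object, then read back to compute a bounding box. The sample features and the helper name point_bbox are hypothetical and exist only for illustration; nothing here calls AWS.

```python
import json

# Hypothetical sample data: two Point features.
# GeoJSON stores coordinates in (longitude, latitude) order.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-122.3321, 47.6062]},
            "properties": {"name": "Seattle"},
        },
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-77.0369, 38.9072]},
            "properties": {"name": "Washington, DC"},
        },
    ],
}


def point_bbox(collection):
    """Return (min_lon, min_lat, max_lon, max_lat) over all Point features."""
    lons, lats = [], []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # this sketch only handles points
        lon, lat = geometry["coordinates"][:2]
        lons.append(lon)
        lats.append(lat)
    return (min(lons), min(lats), max(lons), max(lats))


# Serialize to the text body of a .geojson file, then parse it back.
body = json.dumps(feature_collection)
bbox = point_bbox(json.loads(body))
```

Because the file is plain JSON text, it can be stored and retrieved unchanged, which is one reason the format tends to migrate easily.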

The physical location of data is still important for latency-sensitive workloads. Formats and organization of data can usually remain unchanged when moving to S3, which limits the impact of migrations. Spatial indexes and storage layouts designed around how the data is actually accessed can substantially improve performance and your system's ability to deliver the desired capabilities to your users.
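
As a rough illustration of why a spatial index helps, the sketch below buckets points into a fixed-size grid so that a bounding-box query only inspects nearby cells instead of every record. This is a simplified stand-in for the index structures that real GIS systems use, not a description of any AWS feature; the cell size, sample cities, and function names are assumptions made for the example.

```python
import math
from collections import defaultdict

CELL_SIZE_DEG = 1.0  # assumed grid cell size in degrees


def cell_for(lon, lat, size=CELL_SIZE_DEG):
    """Map a coordinate to the integer grid cell that contains it."""
    return (math.floor(lon / size), math.floor(lat / size))


def build_index(points, size=CELL_SIZE_DEG):
    """Group (name, lon, lat) tuples by grid cell."""
    index = defaultdict(list)
    for name, lon, lat in points:
        index[cell_for(lon, lat, size)].append((name, lon, lat))
    return index


def query_bbox(index, min_lon, min_lat, max_lon, max_lat, size=CELL_SIZE_DEG):
    """Return sorted names of points inside the box, scanning only overlapping cells."""
    min_cx, min_cy = cell_for(min_lon, min_lat, size)
    max_cx, max_cy = cell_for(max_lon, max_lat, size)
    hits = []
    for cx in range(min_cx, max_cx + 1):
        for cy in range(min_cy, max_cy + 1):
            for name, lon, lat in index.get((cx, cy), []):
                # Exact check, since a cell may extend beyond the query box.
                if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                    hits.append(name)
    return sorted(hits)


# Hypothetical sample points.
cities = [
    ("Seattle", -122.3321, 47.6062),
    ("Portland", -122.6765, 45.5152),
    ("Denver", -104.9903, 39.7392),
]
grid = build_index(cities)
pacific_northwest = query_bbox(grid, -123.0, 45.0, -122.0, 48.0)
```

The same idea, grouping data by location so that a query touches only the relevant portion, is what the access-pattern advice above is getting at, whatever the actual storage engine.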

Relational databases have long been the cornerstone of most enterprise GIS environments. This is especially true for vector datasets. AWS offers the most comprehensive set of relational database options with flexible sizing and architecture to meet your specific requirements. For customers looking to migrate geodatabases to the cloud with the least amount of environmental change, Amazon Elastic Compute Cloud (EC2) virtual machine instances provide a similar capability to what is commonly used in on-premises data centers. Each database server can be instantiated on the specific operating system that is used by the source server. Using EC2 with Amazon Elastic Block Store (EBS) network-attached storage provides the highest level of control and flexibility. Each server is created by specifying the amount of CPU, memory, and network throughput desired. Relational database management system (RDBMS) software can be manually installed on the EC2 instance, or an Amazon Machine Image (AMI) for the particular use case can be selected from the AWS catalog to remove manual steps from the process. While this option provides the highest degree of flexibility, it also requires the most database configuration and administration knowledge.

Many customers find it useful to leverage Amazon Relational Database Service (RDS) to establish database clusters and instances for their GIS environments. RDS can create full-featured Microsoft SQL Server, Oracle, PostgreSQL, MySQL, or MariaDB databases. AWS allows the selection of specific instance types to focus on memory or compute optimization in a variety of configurations. Multi-AZ (multiple Availability Zone) deployments can be created to establish fault tolerance or improve performance. Using RDS dramatically simplifies database administration and decreases the time required to select, provision, and configure your geospatial database using the specific technical parameters that meet the business requirements.
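
The choices described above (engine, instance type, storage, Multi-AZ) map onto a handful of request parameters. The sketch below only assembles those parameters as a dictionary; the function name, the identifier, the user name, and the default instance class and storage size are placeholder assumptions, not recommendations. The commented-out lines show how such a dictionary could be passed to the boto3 RDS client's create_db_instance call, which requires the boto3 package and AWS credentials and is not executed here.

```python
def rds_postgres_params(identifier, instance_class="db.m6g.large",
                        storage_gib=100, multi_az=True):
    """Assemble request parameters for a PostgreSQL instance (illustrative values)."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "postgres",
        "DBInstanceClass": instance_class,   # memory- or compute-focused type
        "AllocatedStorage": storage_gib,     # storage size in GiB
        "MultiAZ": multi_az,                 # standby in another Availability Zone
        "MasterUsername": "gisadmin",        # hypothetical admin user name
        "ManageMasterUserPassword": True,    # let RDS manage the password secret
    }


params = rds_postgres_params("gis-geodatabase")

# Not run in this sketch (needs boto3 and AWS credentials):
# import boto3
# boto3.client("rds").create_db_instance(**params)
```

Keeping the parameters in one place like this makes it easier to review sizing and fault-tolerance decisions before anything is provisioned.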

Amazon Aurora provides an open source-compatible path to highly capable and performant relational databases. PostgreSQL-compatible or MySQL-compatible environments can be created with specific settings for the desired capabilities. Although this may mean converting data from a source format, such as Microsoft SQL Server or Oracle, the overall cost savings and simplified management make this an attractive option to modernize and right-size any geospatial database.

In addition to standard relational database options, AWS provides other services to manage and use geospatial data. Amazon Redshift is the fastest and most widely used cloud data warehouse and supports geospatial data through the geometry data type. Users can query spatial data with Redshift's built-in SQL functions to find the distance between two points, interrogate polygon relationships, and provide other location insights into their data. Amazon DynamoDB is a fully managed, key-value NoSQL database with an SLA of up to 99.999% availability. For organizations leveraging MongoDB, Amazon DocumentDB provides a fully managed, MongoDB-compatible option for simplified instantiation and management. Finally, AWS offers the Amazon OpenSearch Service for petabyte-scale data storage, search, and visualization.
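
To show what a "distance between two points" query computes, here is a minimal sketch of the haversine great-circle formula in plain Python. It is a local illustration only and does not call Redshift; in Redshift itself, a built-in function such as ST_DistanceSphere would do this kind of calculation in SQL. The Earth radius constant and the sample coordinates are assumptions, and a database function may use a slightly different radius, so results can differ by a small fraction.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Hypothetical sample: Seattle to Portland.
seattle_to_portland_m = haversine_m(-122.3321, 47.6062, -122.6765, 45.5152)
```

A spherical model like this is usually adequate for coarse location insights; workloads that need survey-grade accuracy would rely on an ellipsoidal calculation instead.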

The best part is that you don’t have to choose a single option for your geospatial environment. Often, companies find that different workloads benefit from having the ability to choose the most appropriate data landscape. Combining Infrastructure as a Service (IaaS) workloads with fully managed databases and modern databases is not only possible but a signature of a well-architected geospatial environment. Transactional systems may benefit from relational geodatabases, while mobile applications may be more aligned with NoSQL data stores. When you operate in a world of consumption-based resources, there is no downside to using the most appropriate data store for each workload. Having familiarity with the cloud options for storing geospatial data is crucial in strategic planning, which we will cover in the next topic.
