Geospatial Data Analytics on AWS

You're reading from Geospatial Data Analytics on AWS: Discover how to manage and analyze geospatial data in the cloud

Product type Paperback
Published in Jun 2023
Publisher Packt
ISBN-13 9781804613825
Length 276 pages
Edition 1st Edition
Authors (3): Scott Bateman, Jeff DeMuth, Janahan Gnanachandran

Cost management in the cloud

The easiest benefit to realize in a cloud-based geospatial environment is cost savings. While increased agility, hardened resiliency, improved performance, and reduced maintenance effort are also key benefits, they generally take some time to fully realize. Cost reductions and increased flexibility are immediately apparent when you leverage AWS for geospatial workloads. Charges for AWS resources are continuously visible in the AWS console, and the amount of expenditure can be adjusted in real time based on your business needs. In this section of the chapter, we will examine cost management tactics for the following areas:

  • Hardware provisioning
  • Geodatabase servers
  • File-based data
  • Geospatial application servers
  • End user compute services

Right-sizing, simplified

I recall many times in my career when I agonized for days over which server to buy. Buying a server is a big decision, and buying the wrong one can have real consequences. What if we’re more successful than projected and our user count doubles? What if we have more data than estimated? What if my processes consume more resources than expected? These are just a few of the questions that compel organizations to buy bigger servers than necessary. Of course, it makes sense to plan for the future, but what doesn’t make sense is paying for things you don’t use. This problem is only amplified when you bring resiliency and disaster recovery (DR) into the picture. I’ve designed enterprise GIS systems that have completely dormant standby instances for very expensive servers. In an on-premises data center, your "just-in-case" hardware has to be paid for even though it is not used. AWS provides a full range of DR capabilities without additional license or hardware overhead costs.

The elephant in the server room

One of the largest costs in a geospatial environment for many organizations is the relational geodatabase infrastructure. Hefty enterprise license costs, expensive hardware, and dedicated time from a specialist database administrator (DBA) and supporting resources add up quickly. Remember that having a cloned standby environment for critical systems may be required for production workloads. Power, cooling, and networking charges apply for on-premises environments.

A typical concern surrounding RDBMS migration to the cloud is performance, specifically as it relates to scale. The same week that I began working at AWS, a multi-year effort across all Amazon companies was wrapping up. All of Amazon’s internal databases were modernized, many of which were converted from Oracle. Alexa, Amazon Prime, Amazon Prime Video, Amazon Fresh, Kindle, Amazon Music, Audible, Shopbop, Twitch, and Zappos were the customer-facing brands that were part of the migration, resulting in Amazon turning off its final Oracle database in October 2019. The scale of internal databases at Amazon is mind-boggling, but the migration of 75 petabytes was realized with little or no downtime. Resulting reductions in database costs were over 60%, coupled with latency performance improvements of 40%. This project was enormous in scale, and the cost savings have been enormous as well.

Bird’s-eye view on savings

Raster data is being collected with increasing frequency and resolution. Sentinel-2 provides satellite imagery covering the globe, with most locations revisited about every 5 days. The quality of the images continues to improve, and file sizes grow along with it. Storing long histories of file-based data is commonplace, as historical images and data may someday be needed. Corporations may have legal obligations to retain the data. Whatever the reason, storing data generates costs. Those costs increase as the size and volume of data increase. Raster geospatial data is notoriously large and commonly stored in enterprise filesystems. When organizations have multiple copies of data for different purposes or departments, the sustained long-term expenses can be exorbitant.

The costs associated with storing large volumes of file-based data in AWS are completely under the customer’s control. Amazon S3 is simple to use and compatible with any geospatial data format. In fact, some formats that we’ll talk more about later in this book perform best in the cloud. Consolidating geospatial data to a platform with fine-grained life cycle options can be a cost game-changer. Data can be optimized for both performance and cost at the same time using an S3 Lifecycle configuration. These storage classification rules price data differently based on its usage pattern. A great example of geospatial data that benefits is Extract, Transform, and Load (ETL) staging datasets. Processing jobs may leave behind transient datasets as large as the source data, and possibly multiple copies of them for each process run. Historical data is another example: it may be accessed frequently for dates within the most recent month, but rarely for older data. A third use case for an S3 Lifecycle configuration is data that is meant to be archived from the start at the lowest long-term storage cost.
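The three usage patterns described above can be sketched as an S3 Lifecycle configuration. This is a minimal illustration rather than a recommended policy: the rule IDs, the prefixes (`etl-staging/`, `imagery/`, `archive/`), the day counts, and the bucket name in the comment are all hypothetical. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration` call, which is left commented out so the sketch runs without AWS credentials.

```python
import json


def build_lifecycle_configuration():
    """Return an illustrative S3 Lifecycle configuration with three rules.

    All prefixes, rule IDs, and day counts are hypothetical examples.
    """
    return {
        "Rules": [
            {
                # Transient ETL staging outputs: delete after a week.
                "ID": "expire-etl-staging",
                "Filter": {"Prefix": "etl-staging/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
            {
                # Historical imagery: frequently read for about a month,
                # then rarely, so step it down to cheaper storage classes.
                "ID": "tier-historical-imagery",
                "Filter": {"Prefix": "imagery/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            },
            {
                # Data meant for archival from the start.
                "ID": "archive-on-arrival",
                "Filter": {"Prefix": "archive/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 1, "StorageClass": "DEEP_ARCHIVE"}],
            },
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))

# To apply the rules (requires boto3, credentials, and an existing bucket;
# the bucket name here is a placeholder):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-geospatial-bucket",
#     LifecycleConfiguration=config,
# )
```

Keeping the rules keyed by prefix means each dataset type gets its own policy, and any rule can be changed later without touching the others.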

Amazon S3 provides automated rules that move files between various pricing models. The rules are customer-defined and can be changed at any time in the AWS console. With just a few clicks, it is possible to realize massive storage cost savings. Most geospatial data brought into AWS starts in the S3 Standard storage class. This feature-rich, general-purpose option provides 99.999999999% (11 9s) of durability for a few cents per GB per month. While this is affordable, the S3 Glacier Deep Archive storage class is designed for long-term archives that are accessed infrequently, and costs just $0.00099 per GB per month. IT backups of geospatial databases and filesystems are prime candidates for S3 Glacier Deep Archive. Details of each storage class in between, associated use cases, and pricing are available on the S3 pricing page. There is also an "easy button" to optimize your S3 Lifecycle using Intelligent-Tiering. The key takeaway is that file-based storage, mainly raster geospatial data, can be stored in the cloud for a fraction of on-premises costs. When it comes to geospatial cost management strategy, file-based data storage classification can yield tremendous cost savings.
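To put the storage class difference in concrete terms, the following sketch compares monthly storage charges for a hypothetical 50 TiB raster archive. The Deep Archive price is the figure quoted above; the S3 Standard price of $0.023 per GB per month is an assumption standing in for "a few cents" and varies by region and volume. Retrieval fees, request charges, and minimum storage durations are deliberately ignored, so treat the result as a rough upper bound on the savings.

```python
# Per-GB monthly prices in USD. DEEP_ARCHIVE is the figure quoted in the
# text; STANDARD is an assumed value and should be checked against the
# current S3 pricing page for your region.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only monthly cost; excludes retrieval and request fees."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


archive_gb = 50 * 1024  # hypothetical 50 TiB raster archive

standard_cost = monthly_storage_cost(archive_gb, "STANDARD")
deep_archive_cost = monthly_storage_cost(archive_gb, "DEEP_ARCHIVE")

print(f"S3 Standard:          ${standard_cost:,.2f} per month")
print(f"Glacier Deep Archive: ${deep_archive_cost:,.2f} per month")
print(f"Ratio:                {standard_cost / deep_archive_cost:.1f}x")
```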

Can’t we just add another server?

Application servers are the workhorse of a robust GIS architecture. Specialized servers deliver web map services, imagery, geoprocessing, and many other compute-intensive capabilities. While the number of application servers in an architecture generally outnumbers database servers, the storage and networking performance requirements tend to be lower. These are cheaper machines that perform specific tasks. Horizontal scaling is commonly used by providing multiple servers that can each execute independent, parallel tasks. Resource demands and traffic patterns tend to be erratic and spiky, resulting in underutilized CPU and GPU cores.

Launching geospatial application server capabilities on AWS can be done in a number of ways, but the most common is EC2 migration. If you have a geoprocessing server that supports horizontal scaling, it may be possible to add processing nodes in AWS to your existing server pool. Over time, you can adjust the utilization of servers based on the requirements and cost profile. Cloud servers can be deactivated when not in use to stop compute charges, and the EC2 spot pricing option provides a flexible way to get instances at a discount of up to 90% compared to on-demand prices. AWS Auto Scaling provides multiple options to control how and when servers start up and shut down based on demand requirements. If you have a server dedicated to a monthly process that takes 8 hours, roughly 99% of your server capacity goes unused (8 of about 730 hours in a month). Understanding the steady-state processing profile of your geospatial environment allows you to identify where cost-saving compute opportunities exist. By applying AWS to these compute profiles, you’ll be able to get more processing done in less time, and at a lower cost.
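The utilization argument is simple arithmetic, and a short sketch makes it easy to plug in your own numbers. Everything here is illustrative: the 30-day month, the $1.00 hourly rate, and the function names are assumptions, and the 90% figure is the best-case spot discount mentioned above rather than a typical one.

```python
HOURS_PER_MONTH = 30 * 24  # assumes a 30-day month (720 hours)


def idle_fraction(busy_hours, total_hours=HOURS_PER_MONTH):
    """Fraction of the period during which the server does no work."""
    return 1 - busy_hours / total_hours


def monthly_compute_cost(hours, hourly_rate, discount=0.0):
    """Cost of running for `hours` at `hourly_rate`, less a fractional discount."""
    return hours * hourly_rate * (1 - discount)


hourly_rate = 1.00  # hypothetical on-demand price in USD per hour
busy_hours = 8      # the monthly job from the text

always_on = monthly_compute_cost(HOURS_PER_MONTH, hourly_rate)
run_when_needed = monthly_compute_cost(busy_hours, hourly_rate)
spot_best_case = monthly_compute_cost(busy_hours, hourly_rate, discount=0.90)

print(f"Idle fraction:        {idle_fraction(busy_hours):.1%}")
print(f"Always on:            ${always_on:,.2f}")
print(f"Run only when needed: ${run_when_needed:,.2f}")
print(f"Spot, best case:      ${spot_best_case:,.2f}")
```

The same functions apply to any workload profile: replace `busy_hours` with the measured steady-state hours for a server to see how much of its cost is paying for idle time.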

Additional savings at every desk

I have worked in the energy technology industry since before Y2K was in the headlines. One interesting cultural phenomenon I’ve seen in the workplace occurs among geoscientists, engineers, technical SMEs, and others who do specialized compute-intensive work. The power of your work machine is a badge of honor, where the most highly regarded professionals are awarded more CPUs or an additional monitor. While this philosophy attracted the best petrophysicists, geologists, and other specialists, it generated significant costs. Infrequently utilized workstations sat idle, not returning any value for their purchase. This scenario is highly likely in volatile industries where layoffs are frequent and contractor usage is high. Imagine the waste if the machine is turned on 24/7, even if it is only used for a few hours a week.

Desktop as a Service (DaaS) provides a flexible cost-saving solution for underutilized workstations. By provisioning your workstation in the cloud, you can take advantage of larger amounts of processing power and only pay for the hours consumed. Windows license portability applies in some cases, and you can select configurations such as the GraphicsPro bundle, which packs a whopping 16 vCPUs with an additional GPU, 8 GiB of video memory, and 122 GiB of general memory. At the time of writing, that machine would cost around $100 per month if only used for a few hours of monthly geospatial analysis (including the Windows license). Additional savings can be realized through reduced IT administration. AWS handles the hardware and service management, leaving the customer in charge of machine images, applications, and security administration.
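Whether hourly billing beats a flat monthly price depends on how many hours the desktop is actually used. The sketch below computes that break-even point under an assumed pricing structure of a fixed monthly base fee plus an hourly rate. The structure and every dollar figure are hypothetical placeholders chosen for round numbers, not published prices for any bundle.

```python
def hourly_plan_cost(hours_used, base_fee, hourly_rate):
    """Monthly cost under a hypothetical base-fee-plus-hourly plan."""
    return base_fee + hours_used * hourly_rate


def break_even_hours(flat_monthly_price, base_fee, hourly_rate):
    """Hours of use at which the hourly plan costs the same as the flat plan."""
    return (flat_monthly_price - base_fee) / hourly_rate


# Hypothetical prices in USD, for illustration only.
flat_monthly_price = 500.0
base_fee = 50.0
hourly_rate = 5.0

light_use = hourly_plan_cost(4, base_fee, hourly_rate)
threshold = break_even_hours(flat_monthly_price, base_fee, hourly_rate)

print(f"4 hours of use on the hourly plan: ${light_use:,.2f}")
print(f"Hourly plan is cheaper below {threshold:.0f} hours per month")
```

For a workstation used a few hours a month, usage sits far below the break-even point, which is the situation the text describes.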

As described in the preceding sections, the AWS cloud provides powerful and flexible services that help costs align with your geospatial activities. It all starts by building the right cloud strategy and establishing an empowered team to discover and operationalize digital innovation. Endorsement or sponsorship from forward-looking executives has proven to be correlated with success in cloud technology projects. There are new ways to get things done in the cloud that can be a fraction of the cost of traditional methods. All of these concepts factor into your evergreen geospatial computing strategy and result in better geospatial data and insights delivered to your end users.
