Monitoring your S3 bucket

Enabling and monitoring S3 metrics allows you to proactively manage your S3 resources, optimize performance, confirm that appropriate security and compliance measures are in place, identify cost-saving opportunities, and maintain the operational readiness of your S3 infrastructure. S3 offers several methods for monitoring your buckets, including S3 server access logs, CloudTrail, CloudWatch metrics, and S3 event notifications. S3 server access logs can be enabled to record each request made to the bucket. CloudTrail captures the API calls made against the bucket, allowing you to monitor and audit actions, including object-level operations such as uploads, downloads, and deletions. CloudWatch metrics track specific measurements for your buckets and let you set up alarms so that you receive notifications when certain thresholds are crossed. S3 event notifications let you react to specific S3 events and configure actions in response to them. In this recipe, we will cover enabling CloudTrail for your S3 buckets and configuring a CloudWatch metric, based on these logs, to monitor high-volume data transfers.

Getting ready

To proceed with this recipe, you need to enable CloudTrail so that it can log S3 data events and insights. Follow these steps:

  1. Open the AWS Management Console (https://console.aws.amazon.com/console/home?nc2=h_ct&src=header-signin) and navigate to the CloudTrail service.
  2. Click on Trails in the left navigation pane and click on Create trail to create a new trail.
  3. Provide a name for the trail in the Trail name field.
  4. For Storage location, you need to provide an S3 bucket for storing CloudTrail logs. You can select Use existing S3 bucket or Create new S3 bucket.
  5. Optionally, you can enable Log file SSE-KMS encryption and choose the KMS key.
  6. Under CloudWatch Logs, choose Yes for Send CloudTrail events to CloudWatch Logs.
  7. Configure the CloudWatch Logs settings as per your requirements. For example, you can select an existing CloudWatch Logs group or create a new one:
Figure 1.12 – Enabling CloudWatch Logs

  8. For Role name, choose to create a new role and give it a name.
  9. Review the other trail settings, such as log file validation and tags, make adjustments if needed, and click on Next.
  10. Under the Events section, enable Data events and Insights events in addition to Management events, which is enabled by default.
  11. Under Management events, select Read and Write:
Figure 1.13 – Configuring Events

  12. Under Data events, choose S3 for Data event type and Log all events for the Log selector template.
  13. Under Insights events, select API call rate and API error rate.
  14. Click on Next and then click on Create trail to create the trail.

Once the trail has been created, CloudTrail will start capturing S3 data events and storing the logs in the specified S3 bucket. Simultaneously, the logs will be sent to the CloudWatch Logs group specified during trail creation.
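
If you prefer to script this setup rather than click through the console, the following boto3 sketch creates an equivalent trail. It is a minimal example: the trail name, the S3 bucket that stores the log files, the CloudWatch Logs group ARN, and the IAM role ARN are placeholders that you should replace with the resources you prepared for this recipe.

# Minimal boto3 sketch of the trail configuration described in the preceding steps.
# The trail name, bucket, log group ARN, and role ARN below are placeholders.
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="s3-data-events-trail",
    S3BucketName="my-cloudtrail-logs-bucket",
    CloudWatchLogsLogGroupArn="arn:aws:logs:us-east-1:123456789012:log-group:cloudtrail-logs:*",
    CloudWatchLogsRoleArn="arn:aws:iam::123456789012:role/CloudTrailToCloudWatchLogs",
    EnableLogFileValidation=True,
    IsMultiRegionTrail=True,
)

# Management events (Read and Write) plus S3 data events for all buckets.
cloudtrail.put_event_selectors(
    TrailName="s3-data-events-trail",
    EventSelectors=[
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,
            "DataResources": [{"Type": "AWS::S3::Object", "Values": ["arn:aws:s3"]}],
        }
    ],
)

# Enable the API call rate and API error rate Insights events.
cloudtrail.put_insight_selectors(
    TrailName="s3-data-events-trail",
    InsightSelectors=[
        {"InsightType": "ApiCallRateInsight"},
        {"InsightType": "ApiErrorRateInsight"},
    ],
)

# A new trail does not record anything until logging is started.
cloudtrail.start_logging(Name="s3-data-events-trail")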

How to do it…

  1. Open the AWS Management Console (https://console.aws.amazon.com/console/home?nc2=h_ct&src=header-signin) and navigate to the CloudWatch console.
  2. Go to Log groups from the navigation pane on the left and select the CloudTrail log group you just created.
  3. Click on Create Metric Filter from the Action drop-down list.
  4. Provide { (($.eventName = CopyObject) || ($.eventName = PutObject) || ($.eventName = CompleteMultipartUpload)) && ($.request.bytes_transferred > 500000000) } as the filter pattern. Grouping the three event names in parentheses ensures that the size condition applies to all of them, so the pattern captures copy and upload events for objects larger than 500 MB. The threshold value should be set based on your bucket access patterns.

    You can test your pattern by specifying one of the log files or providing a custom log in the Test pattern section. Then, you can click on Test pattern and validate the result:

Figure 1.14 – Filter pattern

  5. Click on Next.
  6. Under Filter name, specify a name for the filter.
  7. Under the Metric Details section, specify a Metric namespace value (for example, S3Metrics) and provide a name for the metric itself under Metric name (for example, HighVolumeTransfers).
  8. Set Unit to Count for your metric and set Metric value to 1 to indicate that a transfer event has occurred. Finally, set Default value to 0:
Figure 1.15 – Metric details

  9. Click on Create metric filter.
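
The metric filter can also be created programmatically. The following boto3 sketch calls put_metric_filter on the CloudWatch Logs client; the log group name and filter name are placeholders, while the namespace, metric name, and filter pattern are the example values used in this recipe.

# Sketch: create the metric filter from the preceding steps with boto3.
# The log group name and filter name are placeholders for your own resources.
import boto3

logs = boto3.client("logs")

logs.put_metric_filter(
    logGroupName="cloudtrail-logs",          # the log group receiving your CloudTrail events
    filterName="HighVolumeS3Transfers",
    filterPattern=(
        "{ (($.eventName = CopyObject) || ($.eventName = PutObject) || "
        "($.eventName = CompleteMultipartUpload)) && "
        "($.request.bytes_transferred > 500000000) }"
    ),
    metricTransformations=[
        {
            "metricName": "HighVolumeTransfers",
            "metricNamespace": "S3Metrics",
            "metricValue": "1",      # each matching event counts as one transfer
            "defaultValue": 0.0,     # publish 0 when no events match
            "unit": "Count",
        }
    ],
)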

How it works…

By enabling CloudTrail in your AWS account and ensuring that its logs are delivered to CloudWatch Logs, you make S3 API activity available for analysis within your AWS environment. Creating a metric filter with a filter pattern that matches S3 transfer events extracts the relevant information from the CloudTrail logs. Once the metric filter has been created, CloudWatch generates a custom metric based on the filter’s configuration; this metric represents occurrences of high-volume S3 transfers. You can then view the metric in the CloudWatch console, gain insight into your S3 transfer activity, and take any necessary actions.
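
If you want to check the metric outside the console, a short query such as the following boto3 sketch (which assumes the example namespace and metric name from this recipe) returns the number of matching transfer events per 15-minute period over the last few hours.

# Sketch: read back the custom metric produced by the metric filter.
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="S3Metrics",
    MetricName="HighVolumeTransfers",
    StartTime=end - timedelta(hours=6),
    EndTime=end,
    Period=900,              # 15-minute buckets, matching the recipe
    Statistics=["Sum"],      # total high-volume transfer events per period
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])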

There’s more…

Once your metric has been created, you can create alarms based on the metric’s value to notify you when a high volume of S3 transfers has been detected.

To create a CloudWatch alarm for the high-volume S3 transfer metric you derived from the CloudTrail logs, follow these steps:

  1. Go to the CloudWatch console and select the Alarms tab.
  2. Click on Create Alarm. In the Create Alarm wizard, select the metric you created; you can find it by navigating to the namespace and metric name you configured earlier.
  3. Under the Metric section, set Statistic to Sum and Period to 15 minutes. These values can be changed as per your needs:
Figure 1.16 – Metric statistics

  4. Under the Conditions section, set Threshold type to Static and choose Greater than for the alarm condition, then define a threshold value that represents a high volume of transfers on your bucket, based on your observations. Optionally, you can expand Additional configuration and choose how many data points within the evaluation period must be breached for the alarm to go into the alarm state. This will help you avoid false positives caused by transient spikes in the metric values:
Figure 1.17 – Metric conditions

  5. Click on Next.
  6. Under the Notification section, choose In alarm to send a notification when the metric is in the alarm state. Either choose Create a new topic, provide a name for it and the email endpoints that will receive the SNS notification, and click on Create topic, or select an existing SNS topic if you already have one configured. You can also configure other actions to be executed when the alarm state is triggered, such as invoking a Lambda function or performing automated scaling actions:
Figure 1.18 – Metric notification settings

  7. Provide a name for the alarm so that it can be identified with ease.
  8. Review the alarm settings and click on Create Alarm to create the alarm.

Once the alarm has been created, it will start monitoring the metric for high-volume S3 transfers based on the defined conditions. If the threshold is breached for the specified duration (for example, more than 150 data transfer requests of over 500 MB within 45 minutes), the alarm will enter the alarm state and an SNS notification will be sent. This allows you to receive timely notifications and take appropriate remedial action in case of high-volume S3 transfers, ensuring that any potential issues are addressed proactively.
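
As a scripted alternative to the console steps above, the following boto3 sketch creates an SNS topic, subscribes an email address, and creates the alarm. The topic name, email endpoint, and threshold values are assumptions: one way to approximate the example of 150 transfers within 45 minutes is a threshold of 50 events per 15-minute period, breached in all three periods, and you should adjust these numbers to your own access patterns.

# Sketch: SNS topic plus CloudWatch alarm for the high-volume transfer metric.
# Topic name, email address, and threshold values are illustrative placeholders.
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

topic_arn = sns.create_topic(Name="s3-high-volume-transfers")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="data-team@example.com")

cloudwatch.put_metric_alarm(
    AlarmName="HighVolumeS3TransfersAlarm",
    Namespace="S3Metrics",
    MetricName="HighVolumeTransfers",
    Statistic="Sum",
    Period=900,                      # 15-minute periods
    EvaluationPeriods=3,             # evaluate the last 45 minutes
    DatapointsToAlarm=3,             # all three periods must breach, reducing false positives
    Threshold=50,                    # roughly 150 high-volume transfers across the 45-minute window
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching", # periods with no matching events do not trigger the alarm
    AlarmActions=[topic_arn],        # notify the SNS topic when the alarm fires
)

Note that the email subscription must be confirmed from the endpoint's inbox before SNS delivers alarm notifications to it.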

See also
