Data Engineering with AWS Cookbook

A recipe-based approach to help you tackle data engineering problems with AWS services

Product type: Paperback
Published in: Nov 2024
Publisher: Packt
ISBN-13: 9781805127284
Length: 528 pages
Edition: 1st Edition
Authors (4): Viquar Khan, Gonzalo Herreros González, Huda Nofal, Trâm Ngọc Phạm
Table of Contents (16 chapters)

Preface
1. Chapter 1: Managing Data Lake Storage
2. Chapter 2: Sharing Your Data Across Environments and Accounts
3. Chapter 3: Ingesting and Transforming Your Data with AWS Glue
4. Chapter 4: A Deep Dive into AWS Orchestration Frameworks
5. Chapter 5: Running Big Data Workloads with Amazon EMR
6. Chapter 6: Governing Your Platform
7. Chapter 7: Data Quality Management
8. Chapter 8: DevOps – Defining IaC and Building CI/CD Pipelines
9. Chapter 9: Monitoring Data Lake Cloud Infrastructure
10. Chapter 10: Building a Serving Layer with AWS Analytics Services
11. Chapter 11: Migrating to AWS – Steps, Strategies, and Best Practices for Modernizing Your Analytics and Big Data Workloads
12. Chapter 12: Harnessing the Power of AWS for Seamless Data Warehouse Migration
13. Chapter 13: Strategizing Hadoop Migrations – Cost, Data, and Workflow Modernization with AWS
14. Index
15. Other Books You May Enjoy

Setting up retention policies for your objects

Amazon S3 Lifecycle lets you manage the objects in an S3 bucket through predefined rules. A lifecycle rule supports two kinds of actions: transitions and expiration. Transition actions automatically move objects to a different storage class after a defined period, which reduces costs by keeping less frequently accessed data in a cheaper storage class. Expiration actions automatically delete objects from the bucket after a specified period. You can combine both kinds of actions in the same rule, so that data is stored according to its relevance and access patterns while storage costs stay under control.

In this recipe, we will learn how to set up a lifecycle policy to archive objects in S3 Glacier after a certain period and then expire them.

Getting ready

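To complete this recipe, you need an S3 bucket that contains the objects you want to manage. Lifecycle transitions to the S3 Glacier storage classes do not require a Glacier vault: transitioned objects stay in the bucket and are managed through S3. A vault is a separate storage container for archives, independent of S3, and is only needed if you also want to use the standalone S3 Glacier service. In that case, you can create one by following these steps: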

  1. Open the AWS Management Console (https://console.aws.amazon.com/console/home?nc2=h_ct&src=header-signin) and navigate to the Glacier service.
  2. Click on Create vault to start creating a new Glacier vault.
  3. Provide a unique and descriptive name for your vault in the Vault name field.
  4. Optionally, you can choose to receive notifications for events by clicking Turn on notifications under the Event notifications section.
  5. Click on Create to create the vault.

How to do it…

  1. Open the AWS Management Console (https://console.aws.amazon.com/console/home?nc2=h_ct&src=header-signin) and navigate to the S3 service.
  2. Select the desired bucket for which you want to configure the lifecycle policy and navigate to the Management tab.
  3. In the left panel, select Lifecycle and click on Create lifecycle rule.
  4. Under Rule name, name the lifecycle rule to identify it.
  5. Under Choose a rule scope, you can choose Apply to all objects in the bucket or Limit the scope of this rule using one or more filters to specify the objects for which the rule will be applied. You can use one of the following filters or a combination of them:
    • Filter objects based on prefixes (for example, logs)
    • Filter objects based on tags; you can add multiple key-value pair tags to filter on
    • Filter objects based on object size by setting Specify minimum object size and/or Specify maximum object size and specifying the size value and unit

    The following screenshot shows a rule that’s been restricted to a set of objects based on a prefix:

Figure 1.4 – Lifecycle rule configuration


  6. Under Lifecycle rule actions, select the following options:
    • Move current versions of objects between storage classes. Then, choose one of the Glacier classes and set Days after object creation to the number of days after which the object will be transitioned (for example, 60).
    • Expire current versions of objects. Then, set Days after object creation to the number of days after which the object will expire. Choose a value higher than the one you set for transitioning the object to Glacier (for example, 100).

  7. Review the transition and expiration actions you have set and click on Create rule to apply the lifecycle policy to the bucket:

Figure 1.5 – Reviewing the lifecycle rule


Note

It may take some time for the lifecycle rule to be applied to all the selected objects, depending on the size of the bucket and the number of objects. The rule affects existing objects, not just new ones, so make sure that no application depends on objects that will be archived or deleted: depending on the Glacier class you chose, archived objects may need to be restored before they can be read, and expired objects are removed from the bucket.
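
The console settings chosen in this recipe can also be written down as a lifecycle configuration document. The following is a minimal sketch in Python, assuming the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`. The helper name `build_lifecycle_rule`, the rule ID, the `logs/` prefix, and the bucket name in the comment are illustrative assumptions and do not come from the recipe. The boto3 call appears only as a comment because it needs AWS credentials and a real bucket.

```python
import json


def build_lifecycle_rule(rule_id, prefix, transition_days, expiration_days,
                         storage_class="GLACIER"):
    """Return one rule that archives and later expires current object versions."""
    # Mirror the recipe's advice: the expiration value must be higher than
    # the transition value.
    if expiration_days <= transition_days:
        raise ValueError("expiration_days must be greater than transition_days")
    return {
        "ID": rule_id,
        "Status": "Enabled",
        # Limit the rule to keys that start with the given prefix.
        "Filter": {"Prefix": prefix},
        "Transitions": [{"Days": transition_days, "StorageClass": storage_class}],
        "Expiration": {"Days": expiration_days},
    }


# Example values from the recipe: transition after 60 days, expire after 100.
lifecycle_configuration = {
    "Rules": [build_lifecycle_rule("archive-then-expire-logs", "logs/", 60, 100)]
}
print(json.dumps(lifecycle_configuration, indent=2))

# Applying it would look roughly like this (not run here):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```

Keeping the day values as function arguments makes it harder to create a rule that expires objects before they are archived, which is the one ordering constraint the steps call out.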

How it works…

After you save the lifecycle rule, Amazon S3 periodically evaluates it to find objects that meet its criteria. In this recipe, each object remains in its original storage class for the specified period (for example, 60 days), after which it is automatically moved to the chosen Glacier storage class. The transition is handled transparently: the object keeps its key and metadata, and only its storage class changes. Transitioned objects remain in your S3 bucket and are managed through S3; they are not placed in a Glacier vault. They stay in the Glacier storage class for the remainder of the expiration period (for example, 40 days), after which they expire and are deleted from the bucket. If versioning is enabled on the bucket, expiring the current version adds a delete marker instead of permanently removing the data.
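
The timeline described above can be sketched as a small function. This is a simplified model, not how S3 implements lifecycle evaluation: it assumes the object starts in the `STANDARD` class, uses the recipe's example values of 60 and 100 days, and ignores the fact that S3 applies rules asynchronously, so real transitions and deletions can lag behind these dates. The function name `storage_state` and the creation date are made up for the illustration.

```python
from datetime import date, timedelta


def storage_state(created, today, transition_days=60, expiration_days=100):
    """Return the state a rule like the recipe's would leave an object in."""
    age_days = (today - created).days
    if age_days >= expiration_days:
        return "EXPIRED"   # queued for deletion
    if age_days >= transition_days:
        return "GLACIER"   # archived, still listed in the bucket
    return "STANDARD"      # untouched by the rule so far


created = date(2024, 1, 1)  # hypothetical creation date
for offset in (0, 59, 60, 99, 100):
    print(offset, storage_state(created, created + timedelta(days=offset)))
# 0 STANDARD
# 59 STANDARD
# 60 GLACIER
# 99 GLACIER
# 100 EXPIRED
```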

Note that expired objects are queued for deletion, so an object might only be deleted a few days after it reaches the end of its lifetime.

There’s more…

A lifecycle configuration can also be specified as an XML document when using the S3 REST API, or as JSON when using the AWS CLI, which is helpful if you plan to apply the same lifecycle rules to multiple buckets. You can read more on setting this up at https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html.
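
As a rough illustration of the XML form, the sketch below builds a lifecycle configuration document with Python's standard library. The element names follow the S3 lifecycle configuration schema; the function name `lifecycle_xml`, the rule ID, and the `logs/` prefix are assumptions made for the example, and the day values are the ones used in this recipe.

```python
import xml.etree.ElementTree as ET

S3_NAMESPACE = "http://s3.amazonaws.com/doc/2006-03-01/"


def lifecycle_xml(rule_id, prefix, transition_days, expiration_days,
                  storage_class="GLACIER"):
    """Render one transition-then-expire rule as a lifecycle XML document."""
    root = ET.Element("LifecycleConfiguration", xmlns=S3_NAMESPACE)
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = rule_id
    # Scope the rule to keys that start with the given prefix.
    ET.SubElement(ET.SubElement(rule, "Filter"), "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    transition = ET.SubElement(rule, "Transition")
    ET.SubElement(transition, "Days").text = str(transition_days)
    ET.SubElement(transition, "StorageClass").text = storage_class
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(expiration_days)
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


xml_document = lifecycle_xml("archive-then-expire-logs", "logs/", 60, 100)
print(xml_document)
```

Because the document is generated from a handful of arguments, the same string can be reused as the request body for each bucket that should share the rule.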

See also
