Data Engineering with AWS Cookbook

You're reading from Data Engineering with AWS Cookbook: A recipe-based approach to help you tackle data engineering problems with AWS services

Product type Paperback
Published in Nov 2024
Publisher Packt
ISBN-13 9781805127284
Length 528 pages
Edition 1st Edition
Authors (4): Viquar Khan, Gonzalo Herreros González, Huda Nofal, Trâm Ngọc Phạm

Versioning your data

Amazon S3 versioning refers to maintaining multiple variants of an object at the same time in the same bucket. Versioning provides you with an additional layer of protection by giving you a way to recover from unintended overwrites and accidental deletions as well as application failures.

S3 Object Versioning is not enabled by default and has to be explicitly enabled for each bucket. Once enabled, versioning cannot be disabled; it can only be suspended. With versioning enabled, you can preserve, retrieve, and restore any version of an object stored in the bucket by using its version ID. Every version is the whole object, not a delta from the previous version, and permissions can be set at the version level, so different versions of the same object can carry different permissions.
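
Retrieving a version by its ID can also be done programmatically. The following is a minimal sketch, assuming the boto3 SDK; the helper name read_version and the bucket, key, and version ID in the usage comment are hypothetical and not part of the recipe:

```python
def read_version(s3, bucket, key, version_id):
    """Return the full contents of one specific version of an object.

    `s3` is expected to be a boto3 S3 client. Each version is a complete
    object, so the bytes returned are the whole file, not a diff.
    """
    response = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    return response["Body"].read()


# Usage (assumes boto3 is installed and AWS credentials are configured;
# all names below are placeholders):
#   import boto3
#   data = read_version(boto3.client("s3"), "my-example-bucket",
#                       "reports/sales.csv", "<version-id>")
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real bucket.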

In this recipe, we’ll learn how to delete the current version of an object to make the previous one the current version.

Getting ready

For this recipe, you need to have a version-enabled bucket with an object that has at least two versions.

You can enable versioning for your bucket by going to the bucket’s Properties tab, editing the Bucket Versioning area, and setting it to Enable:

Figure 1.6 – Enabling bucket versioning
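
If you prefer to script this step instead of using the console, a sketch along the following lines may work. It assumes the boto3 SDK; the helper names and the bucket name in the usage comment are hypothetical:

```python
def versioning_status(s3, bucket):
    """Return "Enabled", "Suspended", or "Unversioned" for a bucket."""
    # The response has no "Status" key if versioning was never enabled.
    return s3.get_bucket_versioning(Bucket=bucket).get("Status", "Unversioned")


def enable_versioning(s3, bucket):
    """Turn on versioning for a bucket and return the reported status."""
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    return versioning_status(s3, bucket)


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(enable_versioning(boto3.client("s3"), "my-example-bucket"))
```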

You can create a new version of an object by simply uploading a file with the same name to the versioning-enabled bucket.
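
To prepare an object with at least two versions from code, one option is to upload different contents to the same key more than once. This is only a sketch, assuming the boto3 SDK; upload_versions and the names in the usage comment are hypothetical:

```python
def upload_versions(s3, bucket, key, bodies):
    """Upload each body to the same key and collect the version IDs.

    On a versioning-enabled bucket, every put_object call to an existing
    key creates a new version. If versioning is not enabled, the response
    carries no "VersionId", so None is recorded instead.
    """
    version_ids = []
    for body in bodies:
        response = s3.put_object(Bucket=bucket, Key=key, Body=body)
        version_ids.append(response.get("VersionId"))
    return version_ids


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   ids = upload_versions(boto3.client("s3"), "my-example-bucket",
#                         "notes.txt", [b"first draft", b"second draft"])
```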

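It’s important to note that enabling versioning for a bucket cannot be fully undone: a versioning-enabled bucket can be suspended, but it can never return to an unversioned state. Once enabled, versioning applies to all existing and future objects in that bucket; objects that were already there keep a version ID of null, and any later write to them creates a new version with its own ID. So, before enabling versioning, make sure that your application or workflow is compatible with object versioning.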

The first time you enable versioning on a bucket, the change can take a while to propagate, so we recommend waiting 15 minutes before performing any write operation on objects in the bucket.

How to do it…

  1. Sign in to the AWS Management Console (https://console.aws.amazon.com/console/home?nc2=h_ct&src=header-signin) and navigate to the S3 service.
  2. In the Buckets list, select the S3 bucket that contains the object for which you want to set the previous version as the current one.
  3. In the Objects tab, click on Show versions. Here, you can view all your object versions:
Figure 1.7 – Object versions

  4. Select the current version of the object that you want to delete. It’s the top-most version, with the most recent Last modified date.
  5. Click on the Delete button and type permanently delete when prompted on the next screen.

    After deleting the current version, the previous version will automatically become the latest version:

Figure 1.8 – Object versions after version deletion

  6. Verify that the previous version is now the latest version by checking its Last modified timestamp in the object listing, or by inspecting the object’s metadata or downloading it.
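
The console steps above can also be approximated in code. The following is a sketch under stated assumptions, not the recipe's own method: it assumes the boto3 SDK, reads only the first page of version results, and uses a hypothetical helper name, delete_current_version:

```python
def delete_current_version(s3, bucket, key):
    """Permanently delete the current version of `key` and return the ID
    of the version that S3 reports as current afterwards."""
    # Only the first page of results is inspected (up to 1,000 entries).
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix matching can return other keys, so keep exact matches only.
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]

    current = next((v for v in versions if v["IsLatest"]), None)
    if current is None:
        # No versions at all, or a delete marker is the current version.
        raise LookupError(f"no current object version found for {key!r}")
    if len(versions) < 2:
        raise ValueError(f"{key!r} has no previous version to fall back to")

    # Passing VersionId makes this a permanent delete of that version.
    s3.delete_object(Bucket=bucket, Key=key, VersionId=current["VersionId"])

    # Verify: head_object without a VersionId describes the current version.
    return s3.head_object(Bucket=bucket, Key=key)["VersionId"]


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   new_current = delete_current_version(boto3.client("s3"),
#                                        "my-example-bucket", "notes.txt")
```

The guard against deleting the only remaining version is a design choice for this sketch; without it, the call would remove the object entirely rather than roll it back.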

How it works…

Once you enable bucket versioning, every object written to the bucket gets a version ID that uniquely identifies that version of the object, whereas objects in buckets that have never had versioning enabled have a version ID of null. The older versions of an object become non-current but continue to exist and remain accessible. When you delete the current version by specifying its version ID, as in this recipe, that version is permanently removed and S3 automatically promotes the previous version to be the current one. If you delete an object without specifying a version ID, Amazon S3 doesn’t delete it permanently; instead, it inserts a delete marker, which becomes the current version of the object. However, you can still restore its previous versions:

Figure 1.9 – Object with a delete marker
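
The delete marker behavior can be sketched in code as follows. This assumes the boto3 SDK and reads only the first page of version results; soft_delete and remove_delete_marker are hypothetical helper names:

```python
def soft_delete(s3, bucket, key):
    """Delete without a version ID and return the delete marker's ID.

    On a versioning-enabled bucket this removes nothing permanently;
    S3 adds a delete marker that becomes the current version.
    """
    response = s3.delete_object(Bucket=bucket, Key=key)
    return response.get("VersionId")


def remove_delete_marker(s3, bucket, key):
    """Undo a soft delete by deleting the current delete marker.

    Returns the removed marker's version ID, or None if the current
    version of `key` is not a delete marker.
    """
    response = s3.list_object_versions(Bucket=bucket, Prefix=key)
    for marker in response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            s3.delete_object(
                Bucket=bucket, Key=key, VersionId=marker["VersionId"]
            )
            return marker["VersionId"]
    return None


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   s3 = boto3.client("s3")
#   soft_delete(s3, "my-example-bucket", "notes.txt")
#   remove_delete_marker(s3, "my-example-bucket", "notes.txt")
```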

There’s more…

S3 rates apply to every version of an object that’s stored and requested, so keeping non-current versions of objects can increase your storage cost. You can use lifecycle rules to archive the non-current versions or permanently delete them after a certain period, which keeps the bucket free of unnecessary object versions.
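
To get a rough idea of how much storage non-current versions occupy, you could total their sizes. The following sketch assumes the boto3 SDK and uses a hypothetical helper name; it reports bytes only and does not calculate cost:

```python
def noncurrent_storage_bytes(s3, bucket):
    """Sum the sizes of all non-current object versions in a bucket."""
    paginator = s3.get_paginator("list_object_versions")
    total = 0
    for page in paginator.paginate(Bucket=bucket):
        # "Versions" is absent from pages that contain no object versions.
        for version in page.get("Versions", []):
            if not version["IsLatest"]:
                total += version["Size"]
    return total


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(noncurrent_storage_bytes(boto3.client("s3"), "my-example-bucket"))
```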

Follow these steps to add a lifecycle rule to delete non-current versions after a certain period:

  1. Go to the bucket’s Management tab and click on Lifecycle configuration.
  2. Click on the Add lifecycle rule button to create a new rule.
  3. Provide a unique name for the rule.
  4. Under Apply rule to, select the appropriate resources (for example, the entire bucket or specific prefixes).
  5. Set the action to Permanently delete non-current versions.
  6. Under Days after objects become noncurrent, specify how many days a version must be non-current before it is deleted. Optionally, set Number of newer versions to retain: S3 keeps that many of the most recent non-current versions of each object and deletes the rest once they become eligible for deletion based on the specified period.
  7. Click on Save to save the lifecycle rule.
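
The same kind of rule can be expressed through the API. This is a sketch assuming the boto3 SDK; the helper names, the rule ID, and the values in the usage comment are hypothetical:

```python
def noncurrent_expiration_rule(rule_id, noncurrent_days,
                               newer_versions_to_retain=None, prefix=""):
    """Build a lifecycle rule that permanently deletes non-current versions.

    An empty prefix applies the rule to the entire bucket.
    """
    expiration = {"NoncurrentDays": noncurrent_days}
    if newer_versions_to_retain is not None:
        # Number of most recent non-current versions to keep per object.
        expiration["NewerNoncurrentVersions"] = newer_versions_to_retain
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "NoncurrentVersionExpiration": expiration,
    }


def apply_lifecycle_rules(s3, bucket, rules):
    """Save the rules. This call replaces the bucket's existing lifecycle
    configuration, so include any rules you want to keep."""
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": rules},
    )


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   rule = noncurrent_expiration_rule("expire-old-versions", 30,
#                                     newer_versions_to_retain=3)
#   apply_lifecycle_rules(boto3.client("s3"), "my-example-bucket", [rule])
```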

See also
