Using data lake formats to store your data
Historically, big data technologies in the Hadoop ecosystem have accepted trade-offs in order to scale to volumes that traditional databases cannot handle. In the case of Apache Hive, which became the de facto SQL layer on Hadoop, external tables simply point to files on a storage layer such as HDFS or Amazon S3, and jobs read and write those files without any central system coordinating access or transactions. This is still how standard tables work in the Glue catalog.
As a result, the atomicity, consistency, isolation, and durability (ACID) guarantees of traditional RDBMSs were relaxed in exchange for scalability, which is acceptable in use cases that don't require concurrent writes or transactions, such as historical append-only tables.
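To make the pattern concrete, here is a minimal sketch of a classic Hive-style external table created through PySpark with Hive support enabled; the bucket, table, and column names are hypothetical and assume S3 access is already configured. The catalog entry only records the schema and the S3 location, and the table's content is simply whatever files sit under that prefix:

from pyspark.sql import SparkSession

# Hive support is needed for the EXTERNAL TABLE syntax used below.
spark = (SparkSession.builder
         .appName("external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# The catalog only stores the schema and the location; no central service
# tracks or coordinates the files that live under that prefix.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (
        order_id  STRING,
        amount    DOUBLE,
        sale_date DATE
    )
    STORED AS PARQUET
    LOCATION 's3://my-example-bucket/sales_raw/'
""")

# Any job can drop more Parquet files under the same prefix; readers simply
# list and scan whatever files exist at query time, with no transactions or
# isolation between concurrent writers and readers.

Because nothing defines which set of files forms a consistent snapshot, a query that runs while another job is writing can observe a partially written table, which is precisely the gap the data lake formats covered in this section aim to close.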
In recent years, the goal has been to bring back those ACID properties while keeping the data on an object store that offers cheap and virtually unlimited storage, with many clients and engines accessing the data in a distributed way.