Data ingestion from JDBC data stores
For many organizations, hydrating data lakes by ingesting data from OLTP data stores is the primary use case for ETL tools and frameworks. Typically, these ETL jobs run periodically to keep the data lake up to date. As discussed in Chapter 1, Data Management - Introduction and Concepts, AWS offers several options for achieving this outcome. The most popular are AWS DMS and AWS Glue.
Users can set up AWS DMS replication instances to capture ongoing changes from the source data store, a capability known as change data capture (CDC). At the time of writing, this feature supports Microsoft SQL Server, PostgreSQL, Oracle, and MySQL databases. Please refer to the AWS DMS documentation at https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html for more information on this feature.
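To make the ongoing-change option more concrete, the following is a minimal sketch of how a replication task of this kind might be defined with the AWS SDK for Python (boto3). It assumes that a replication instance and the source and target endpoints already exist. The helper names build_cdc_task_params and create_cdc_task, the task identifier, and the schema name are hypothetical and used only for illustration. The keyword arguments correspond to the DMS CreateReplicationTask API, where a MigrationType of cdc requests replication of ongoing changes only.

```python
import json


def build_cdc_task_params(task_id, source_arn, target_arn, instance_arn,
                          schema="%", table="%"):
    """Build keyword arguments for the DMS CreateReplicationTask API.

    The ARNs must point at resources that already exist; the values
    passed in the usage note below are placeholders, not real resources.
    """
    # Selection rule: include every table matching the schema/table
    # patterns ("%" is the DMS wildcard).
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-tables",
                "object-locator": {
                    "schema-name": schema,
                    "table-name": table,
                },
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # "cdc" replicates ongoing changes only; "full-load-and-cdc"
        # would copy the existing data first and then keep replicating.
        "MigrationType": "cdc",
        # The API expects the table mappings as a JSON string.
        "TableMappings": json.dumps(table_mappings),
    }


def create_cdc_task(dms_client, params):
    """Submit the task definition using a boto3 DMS client."""
    return dms_client.create_replication_task(**params)
```

With boto3 installed and AWS credentials configured, a client created with boto3.client("dms") could be passed to create_cdc_task together with the output of build_cdc_task_params. Keeping the parameter construction separate from the API call makes it possible to inspect or unit test the task definition without touching an AWS account.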
Another option is to use AWS Glue Spark ETL to read JDBC data stores and move the data to Amazon S3 or other target data stores supported by Apache Spark. With this option...