Architecting a Real-Time Processing Pipeline
In the previous chapter, we learned how to architect a big data solution for a high-volume, batch-based data engineering problem. We then saw how big data can be profiled using Glue DataBrew, and finally how to reason through the choice between various technologies to build a complete Spark-based big data solution in the cloud.
In this chapter, we will discuss how to analyze, design, and implement a real-time data analytics solution to solve a business problem. We will learn how reliable, high-speed processing can be achieved with the help of a distributed messaging system such as Apache Kafka to stream and process data. We will then discuss how to write a Kafka Streams application to process and analyze the streamed data, and how to use Kafka connectors to store the results of the real-time processing engine in a NoSQL database such as MongoDB, DynamoDB, or DocumentDB.
By the end of this chapter, you will know how to build a real...