You're reading from Modern Data Architectures with Python A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python

Product type Paperback

Published in Sep 2023

Publisher Packt

ISBN-13 9781801070492

Length 318 pages

Edition 1st Edition

Languages

Python

Tools

MLflow

Concepts

Data Science

Author (1):

Brian Lipp

Table of Contents (19) Chapters

Preface

1. Part 1: Fundamental Data Knowledge

2. Chapter 1: Modern Data Processing Architecture

3. Chapter 2: Understanding Data Analytics

4. Part 2: Data Engineering Toolset

5. Chapter 3: Apache Spark Deep Dive

6. Chapter 4: Batch and Stream Data Processing Using PySpark

7. Chapter 5: Streaming Data with Kafka

8. Part 3: Modernizing the Data Platform

9. Chapter 6: MLOps

10. Chapter 7: Data and Information Visualization

11. Chapter 8: Integrating Continuous Integration into Your Workflow

12. Chapter 9: Orchestrating Your Data Workflows

13. Part 4: Hands-on Project

14. Chapter 10: Data Governance

15. Chapter 11: Building out the Groundwork

16. Chapter 12: Completing Our Project

17. Index

18. Other Books You May Enjoy

Schema Registry

Kafka guarantees the delivery of events sent from producers, but it does not attempt to guarantee their quality. Kafka assumes that your applications will coordinate data quality between producers and consumers. On the surface, this seems reasonable and easy to accomplish. In reality, even in ideal situations, this kind of assumed coordination is unrealistic. This problem is common among data producers and consumers; the solution is to enforce a data contract.
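To make the idea of a data contract concrete, here is a minimal sketch in plain Python. It assumes a contract can be expressed as a mapping from field name to required type; the `ORDER_CONTRACT` dictionary and the `violations` helper are hypothetical names used for illustration only and are not part of Kafka or any Kafka client.

```python
from typing import Dict, List

# Hypothetical contract for an "orders" event: field name -> required Python type.
ORDER_CONTRACT: Dict[str, type] = {
    "order_id": int,
    "amount": float,
    "currency": str,
}


def violations(event: dict, contract: Dict[str, type]) -> List[str]:
    """Return a list of human-readable contract violations (empty if the event conforms)."""
    problems: List[str] = []
    # Every field named in the contract must be present with the right type.
    for field, expected in contract.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            actual = type(event[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields the contract does not know about are also reported.
    for field in event:
        if field not in contract:
            problems.append(f"unexpected field: {field}")
    return problems


good_event = {"order_id": 1, "amount": 9.5, "currency": "USD"}
bad_event = {"order_id": "1", "amount": 9.5}  # wrong type and a missing field
```

Without a check like this on the producing side, `bad_event` would be delivered just as reliably as `good_event`, and the consumer would be the first to discover the problem.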

The general rule of thumb is garbage in, garbage out. Confluent Schema Registry is an attempt at building contracts for your data schemas in Kafka. It is a layer that sits in front of Kafka and acts as its gatekeeper. Events can’t be produced to a topic unless Confluent Schema Registry first gives its blessing. Consumers can know exactly what they will get by checking Confluent Schema Registry first.
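The gatekeeper role described above can be sketched with a toy, in-memory model. This is not the Confluent client API: `ToySchemaRegistry` and `GatedProducer` are hypothetical classes, schemas are simplified to field-to-type mappings (the real registry stores Avro, JSON Schema, or Protobuf definitions), and the "topic" is just a Python list. The `<topic>-value` subject name follows the registry's default naming convention.

```python
from typing import Dict, List


class ToySchemaRegistry:
    """In-memory stand-in for a schema registry (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._schemas: Dict[str, Dict[str, type]] = {}

    def register(self, subject: str, schema: Dict[str, type]) -> None:
        self._schemas[subject] = dict(schema)

    def lookup(self, subject: str) -> Dict[str, type]:
        if subject not in self._schemas:
            raise LookupError(f"no schema registered for {subject}")
        return dict(self._schemas[subject])


class GatedProducer:
    """Producer that refuses any event the registry's schema does not allow."""

    def __init__(self, registry: ToySchemaRegistry) -> None:
        self.registry = registry
        self.log: Dict[str, List[dict]] = {}  # topic -> accepted events

    def produce(self, topic: str, event: dict) -> None:
        # Ask the registry first; an unknown subject raises LookupError.
        schema = self.registry.lookup(f"{topic}-value")
        if set(event) != set(schema):
            raise ValueError(f"fields {sorted(event)} do not match schema {sorted(schema)}")
        for field, expected in schema.items():
            if not isinstance(event[field], expected):
                raise ValueError(f"{field}: expected {expected.__name__}")
        # Only events that passed the check reach the topic.
        self.log.setdefault(topic, []).append(event)


registry = ToySchemaRegistry()
registry.register("orders-value", {"order_id": int, "amount": float})

producer = GatedProducer(registry)
producer.produce("orders", {"order_id": 1, "amount": 9.5})

# A consumer can learn what to expect before reading a single event.
expected_fields = registry.lookup("orders-value")
```

In this sketch, calling `producer.produce("orders", {"order_id": "1", "amount": 9.5})` raises `ValueError`, and producing to a topic with no registered schema raises `LookupError`, so nothing malformed is appended to the log.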

This process happens behind the scenes, and the Confluent...
