Data Engineering with Databricks Cookbook

Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

Product type: Paperback
Published: May 2024
Publisher: Packt
ISBN-13: 9781837633357
Length: 438 pages
Edition: 1st Edition
Author: Pulkit Chadha

Table of Contents

Preface
Part 1 – Working with Apache Spark and Delta Lake
Chapter 1: Data Ingestion and Data Extraction with Apache Spark
Chapter 2: Data Transformation and Data Manipulation with Apache Spark
Chapter 3: Data Management with Delta Lake
Chapter 4: Ingesting Streaming Data
Chapter 5: Processing Streaming Data
Chapter 6: Performance Tuning with Apache Spark
Chapter 7: Performance Tuning in Delta Lake
Part 2 – Data Engineering Capabilities within Databricks
Chapter 8: Orchestration and Scheduling Data Pipeline with Databricks Workflows
Chapter 9: Building Data Pipelines with Delta Live Tables
Chapter 10: Data Governance with Unity Catalog
Chapter 11: Implementing DataOps and DevOps on Databricks
Index
Other Books You May Enjoy

What this book covers

Chapter 1, Data Ingestion and Data Extraction with Apache Spark, explores the fundamental processes of data ingestion and extraction using Apache Spark. From connecting to various data sources to efficiently extracting and loading data, you will gain hands-on experience in leveraging Apache Spark’s capabilities for seamless data integration.

Chapter 2, Data Transformation and Data Manipulation with Apache Spark, delves into the transformative power of Apache Spark, focusing on data transformation and manipulation techniques. You will learn how to harness Spark’s robust functionalities for reshaping and optimizing data, ensuring it aligns with specific business requirements and analytical needs.

Chapter 3, Data Management with Delta Lake, delves into Delta Lake, a critical component for effective data management. You will discover how to leverage Delta Lake’s ACID transactions and versioning capabilities to ensure data reliability, consistency, and efficient management within the Lakehouse architecture.

Chapter 4, Ingesting Streaming Data, begins the exploration of streaming data ingestion with Apache Spark. It covers the basics of ingesting streaming data, setting the stage for understanding real-time data processing and analysis.

Chapter 5, Processing Streaming Data, completes the exploration of streaming data by focusing on advanced techniques and best practices for processing real-time data with Apache Spark. You will gain insights into handling dynamic data streams and maintaining data integrity in fast-paced environments.

Chapter 6, Performance Tuning with Apache Spark, delves into the intricacies of performance tuning in Apache Spark. From optimizing code to fine-tuning configurations, you will learn practical strategies to enhance the efficiency and speed of Spark applications, ensuring optimal performance for large-scale data processing.

Chapter 7, Performance Tuning in Delta Lake, builds upon performance tuning principles and focuses specifically on optimizing Delta Lake workflows. You will gain insights into techniques for improving the speed and efficiency of data transactions, making data management within the Lakehouse architecture more performant.

Chapter 8, Orchestration and Scheduling Data Pipeline with Databricks Workflows, guides you through the orchestration and scheduling of workflows in Databricks. From designing automated data pipelines to scheduling tasks efficiently, you will learn how to streamline your data engineering processes and ensure the timely execution of critical workflows.

Chapter 9, Building Data Pipelines with Delta Live Tables, explores Delta Live Tables and shows how to build robust and dynamic data pipelines. The focus is on leveraging Delta Live Tables to simplify data pipeline development, enhance collaboration, and ensure data consistency in real time.

Chapter 10, Data Governance with Unity Catalog, introduces the concept of data governance using Unity Catalog in Databricks. You will discover how to implement effective data governance practices, including metadata management, data lineage tracking, and access control, to ensure data quality and compliance.

Chapter 11, Implementing DataOps and DevOps on Databricks, addresses the integration of DataOps and DevOps practices within the Databricks environment. You will learn how to implement collaborative and automated development and deployment processes, fostering a culture of continuous improvement and efficiency in data engineering workflows.
