Data Engineering with Google Cloud Platform
Data Engineering with Google Cloud Platform: A guide to leveling up as a data engineer by building a scalable data platform with Google Cloud, Second Edition

Fundamentals of Data Engineering

Years ago, when I first entered the world of data analytics, I used to think data was clean: ready to use and neatly organized. I was so excited to experiment with machine learning models, find unusual patterns, and play around with clean data. But after years of working with data, I realized that data analytics in big organizations isn’t straightforward.

Most of the effort goes into collecting, cleaning, and transforming the data. If you have had any experience in working with data, I am sure you’ve noticed something similar. But the good news is that all of these processes can be automated with proper planning, design, and engineering skills. That was the point where I realized that data engineering would be the most critical role in the future of the data science world.

To develop a successful data ecosystem in any organization, the most crucial part is how they design the...

Understanding the data life cycle

Understanding the data life cycle is the first principle in becoming a data engineer. If you’ve worked with data, you know that data doesn’t stay in one place; it moves from one storage system to another and from one database to another. Understanding the data life cycle means being able to answer these sorts of questions before you display information to your end user:

  • Who will consume the data?
  • What data sources should I use?
  • Where should I store the data?
  • When should the data arrive?
  • Why does the data need to be stored in this place?
  • How should the data be processed?

To answer all those questions, we’ll start by looking back a little bit at the history of data technologies.

Understanding the need for a data warehouse

The data warehouse is not a new concept; I believe you’ve at least heard of it. In fact, this terminology is no longer appealing. In my experience, no...

Start with knowing the roles of a data engineer

In the later chapters, we will spend much of our time doing practical exercises to understand data engineering concepts. But before that, let’s quickly take a look at the data engineer role.

The job role is getting more and more popular, but the term itself is relatively new compared to well-established job roles, such as accountant, lawyer, and doctor. As a result, there is sometimes still a debate about what a data engineer should and shouldn’t do.

For example, if you came to a hospital and met a doctor, you know for sure that the doctor would do the following:

  1. Examine your condition.
  2. Make a diagnosis of your health issues.
  3. Prescribe medicine.

The doctor wouldn’t do the following:

  1. Clean the hospital.
  2. Make the medicine.
  3. Manage hospital administration.

It’s clear, and it applies to most well-established job roles. But how about data engineers...

Going through the foundational concepts for data engineering

Even though there are many data engineering concepts that we will learn throughout the book by using Google Cloud Platform (GCP), there are some basic concepts that you need to know as a data engineer. In my experience of interviewing at data companies, I discovered that interviewers often ask about these foundational concepts to assess how much you know about data engineering. Take the following examples:

  • What is ETL?
  • What’s the difference between ETL and Extract, Load, and Transform (ELT)?
  • What is big data?
  • How do you handle large volumes of data?

These questions are quite common, yet it is particularly important to understand the underlying concepts deeply, since they may affect our decisions when architecting our data life cycles.

ETL concept in data engineering

ETL is the key foundation of data engineering. Everything in the data life cycle is ETL; any part that happens from upstream to downstream is ETL. Let’s...
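To make the three letters concrete, here is a minimal sketch of one ETL run, not the book's own implementation. It assumes a hypothetical in-memory list of sales records as the upstream source and an in-memory SQLite database as the downstream target; the names `RAW_SALES`, `revenue_by_book`, and `run_etl` are illustrative only.

```python
import sqlite3

# Hypothetical upstream records; a real pipeline would read from a source system.
RAW_SALES = [
    {"book_id": "b1", "amount": "10.50"},
    {"book_id": "b2", "amount": "7.00"},
    {"book_id": "b1", "amount": "4.50"},
]


def extract():
    """Extract: pull raw records from the upstream source."""
    return list(RAW_SALES)


def transform(rows):
    """Transform: cast amounts to numbers and total the revenue per book."""
    totals = {}
    for row in rows:
        book_id = row["book_id"]
        totals[book_id] = totals.get(book_id, 0.0) + float(row["amount"])
    return sorted(totals.items())


def load(conn, rows):
    """Load: write the transformed rows to the downstream table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_book "
        "(book_id TEXT PRIMARY KEY, revenue REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO revenue_by_book VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    print(conn.execute("SELECT * FROM revenue_by_book ORDER BY book_id").fetchall())
```

In an ELT variant, the order of the last two steps would be swapped: the raw records would be loaded into the target first, and the aggregation would then run inside the target system.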

Summary

To summarize the first chapter, we’ve learned the fundamental knowledge we need as data engineers. Here are some key takeaways from this chapter. First, data doesn’t stay in one place. Data moves from one place to another, and this movement is called the data life cycle. We also understand that data in a big organization is mostly in silos, and we can solve these data silos using the concepts of a data warehouse and data lake.

If you have just started looking into data engineer roles, you may feel a little lost, since the role of data engineers may vary. The key takeaway is not to be confused by the broad expectations in the market. Focus on the core first, and then expand as you gain more experience. In this chapter, we’ve learned what the core of a data engineer is. At the end of the chapter, we learned some of the key concepts. There are three key concepts that you need to be familiar with as a data engineer. These concepts are ETL, big data...

Exercise

You are a data engineer at a book publishing company, and your product manager has asked you to build a single dashboard that shows the total revenue and the customer satisfaction index.

Your company doesn’t have any data infrastructure yet, but you know that your company has these three applications that contain TBs of data:

  • The company website
  • A book sales application using MongoDB to store sales transactions, including transactions, book IDs, and author IDs
  • An author portal application using a MySQL Database to store authors’ personal information, including age

Do the following:

  1. List important follow-up questions for your manager
  2. List your technical thinking process for how to do it at a high level
  3. Draw a data pipeline architecture

There is no right or wrong answer to this practice. The important thing is that you can imagine how the data flows from upstream to downstream, how it should be processed...
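As one possible starting point for the architecture drawing, the sketch below models a pipeline as a dependency graph and derives a valid processing order from it. It is only an illustration under stated assumptions: the three source names come from the exercise, while `raw_storage`, `warehouse_tables`, and `dashboard` are hypothetical stages, and which source actually holds the customer satisfaction data is left as a follow-up question for the manager.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a pipeline stage; its value is the set of stages it depends on.
# Stage names other than the three source applications are hypothetical.
PIPELINE = {
    "website": set(),
    "sales_app_mongodb": set(),
    "author_portal_mysql": set(),
    "raw_storage": {"website", "sales_app_mongodb", "author_portal_mysql"},
    "warehouse_tables": {"raw_storage"},
    "dashboard": {"warehouse_tables"},
}


def execution_order(pipeline):
    """Return the stages in an order where every upstream stage comes first."""
    return list(TopologicalSorter(pipeline).static_order())


def upstream_of(pipeline, stage):
    """Return every stage that the given stage depends on, directly or not."""
    seen = set()
    stack = [stage]
    while stack:
        for dependency in pipeline[stack.pop()]:
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen


if __name__ == "__main__":
    print(" -> ".join(execution_order(PIPELINE)))
    print(sorted(upstream_of(PIPELINE, "dashboard")))
```

Writing the flow down as a graph makes it easy to check that nothing downstream runs before its inputs exist, and to see which sources a dashboard ultimately depends on.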

Further Reading

You can visit the following links to explore more about the topics discussed in this chapter:


Key benefits

  • Get up to speed with data governance on Google Cloud
  • Learn how to use various Google Cloud products like Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream
  • Boost your confidence by getting Google Cloud data engineering certification guidance from real exam experiences
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

The second edition of Data Engineering with Google Cloud builds upon the success of the first edition by offering enhanced clarity and depth to data professionals navigating the intricate landscape of data engineering. Beyond its foundational lessons, this new edition delves into the essential realm of data governance within Google Cloud, providing you with invaluable insights into managing and optimizing data resources effectively. Written by a Data Strategic Cloud Engineer at Google, this book helps you stay ahead of the curve by guiding you through the latest technological advancements in the Google Cloud ecosystem. You’ll cover essential aspects, from exploring Cloud Composer 2 to the evolution of Airflow 2.5. Additionally, you’ll explore how to work with cutting-edge tools like Dataform, DLP, Dataplex, Dataproc Serverless, and Datastream to perform data governance on datasets. By the end of this book, you'll be equipped to navigate the ever-evolving world of data engineering on Google Cloud, from foundational principles to cutting-edge practices.

Who is this book for?

Data analysts, IT practitioners, software engineers, or any data enthusiasts looking to have a successful data engineering career will find this book invaluable. Additionally, experienced data professionals who want to start using Google Cloud to build data platforms will get clear insights on how to navigate the path. Whether you're a beginner who wants to explore the fundamentals or a seasoned professional seeking to learn the latest data engineering concepts, this book is for you.

What you will learn

  • Load data into BigQuery and materialize its output
  • Focus on data pipeline orchestration using Cloud Composer
  • Formulate Airflow jobs to orchestrate and automate a data warehouse
  • Establish a Hadoop data lake, generate ephemeral clusters, and execute jobs on the Dataproc cluster
  • Harness Pub/Sub for messaging and ingestion for event-driven systems
  • Apply Dataflow to conduct ETL on streaming data
  • Implement data governance services on Google Cloud
Product Details

Publication date: Apr 30, 2024
Length: 476 pages
Edition: 2nd
Language: English
ISBN-13: 9781835080115
Vendor: Google

Table of Contents

18 Chapters
Part 1: Getting Started with Data Engineering with GCP
Chapter 1: Fundamentals of Data Engineering
Chapter 2: Big Data Capabilities on GCP
Part 2: Build Solutions with GCP Components
Chapter 3: Building a Data Warehouse in BigQuery
Chapter 4: Building Workflows for Batch Data Loading Using Cloud Composer
Chapter 5: Building a Data Lake Using Dataproc
Chapter 6: Processing Streaming Data with Pub/Sub and Dataflow
Chapter 7: Visualizing Data to Make Data-Driven Decisions with Looker Studio
Chapter 8: Building Machine Learning Solutions on GCP
Part 3: Key Strategies for Architecting Top-Notch Solutions
Chapter 9: User and Project Management in GCP
Chapter 10: Data Governance in GCP
Chapter 11: Cost Strategy in GCP
Chapter 12: CI/CD on GCP for Data Engineers
Chapter 13: Boosting Your Confidence as a Data Engineer
Index
Other Books You May Enjoy

Customer reviews

Rating distribution
4.5 out of 5 (6 Ratings)
5 star 66.7%
4 star 16.7%
3 star 16.7%
2 star 0%
1 star 0%

Steve Young Jun 21, 2024
5 out of 5 stars
Google Cloud Platform can be a very broad topic and contains many different products and services, yet this author was able to articulate on data engineering within the platform in a way that was much less dense than other books I’ve read. I am a data engineer and work with GCP on a daily basis. It was a pretty easy read while containing a lot of insightful and useful information about building data pipelines and other necessary activities of data engineering. The author also provided good color on how the platform is often used in particular industries which I found both useful and interesting. This book is a must-read if you have an interest in becoming a more functional and knowledgeable data engineer using GCP.
Amazon Verified review
SHASHI ANANTH Jun 12, 2024
5 out of 5 stars
One of the best books I have ever read, author is incredible in articulating Data warehousing, Data Engineering concepts, scenario are nicely explained , GCP data engineering tools are well explained with detailed steps. overall very nice book to learn Data engineering on Google cloud platform.
Amazon Verified review
Daniel J. Hampton III Jun 17, 2024
5 out of 5 stars
I have been struggling to find a book that covered comprehensive big picture concepts as well as technical details and I think this book balances it quite well.
Amazon Verified review
Johnnie Sep 15, 2024
5 out of 5 stars
Many concepts are covered (batch and streaming data pipeline creation, job orchestration, data governance and cost strategies) - as well as GCP cloud data storage options (with discussions in data warehouse design using BigQuery).

The book went into more complex data engineering concepts in GCP such as ephemeral clusters, Dataproc (examining Hadoop, Spark and Dataframe concepts) and CI/CD practices.

Note: For the curious minded engineer who ask, “dude … you mentioned ephemeral clusters. What’s the difference in ephemeral and persistent clusters???”

Good question! With Persistent clusters there always is some infrastructure running. But with ephemeral clusters the clusters are created, exist for the time it takes for jobs to complete, and then cease to exist when they are brought down.

How about transient clusters? I’ll leave that research up to you!

“Lots of examples and exercises” are provided that enable a “hands on experience” for the reader to engage for greater understanding.

This book provides data engineers with the concepts, hands on activities and guidance necessary to navigate the Google Cloud Platform (GCP).
Amazon Verified review
mayanktripathi4u Sep 27, 2024
4 out of 5 stars
This book offers a comprehensive exploration of data engineering principles, specifically in the context of Google Cloud Platform (GCP). Aimed at both beginners and intermediate data engineers, it serves as an excellent resource for those looking to understand the fundamentals of building scalable data pipelines using GCP services. The book is particularly well-suited for data engineers, cloud architects, and IT professionals seeking to build robust, scalable data pipelines using Google Cloud’s services.

What I liked:
One of the most valuable aspects of this book is its structured approach. Adi Wijaya begins by laying a solid foundation, introducing readers to essential tools such as BigQuery, Cloud Storage, and Cloud Dataflow. From there, he builds upon that knowledge with more advanced topics like real-time data processing and machine learning integration, making it accessible for readers with varying levels of experience.

The hands-on tutorials are another highlight, offering step-by-step instructions that allow readers to practice and implement what they've learned. This practical emphasis makes complex topics easier to grasp, particularly for those who prefer learning by doing. The author also includes command-line tools like gcloud and gsutil for interacting with Google Cloud services, providing readers with real-world experience in managing cloud resources. Additionally, the author does an excellent job showcasing real-world use cases, allowing readers to understand how these tools are applied in professional data engineering settings.

Things which are missing as per my opinion:
Although the book is packed with useful information, it may feel fast-paced for absolute beginners to cloud computing. Some prior understanding of cloud concepts would be beneficial to fully grasp the more advanced sections. Additionally, while the book provides a detailed look into GCP, readers looking for cross-platform comparisons (e.g., AWS or Azure) won’t find such insights here.

Final Thoughts:
Overall, "Data Engineering with Google Cloud Platform" is a highly valuable resource for anyone looking to master data engineering within GCP. Adi Wijaya delivers a balanced mix of theory and practical application, making it an ideal read for aspiring and practicing data engineers. Whether you're developing pipelines, optimizing workflows, or integrating machine learning, this book provides the knowledge you need to excel in GCP’s data ecosystem.
Amazon Verified review
FAQs

What is the delivery time and cost of a print book?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders; they are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duty or localized taxes may apply. The recipient country charges them, the customer must pay them, and they are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact [email protected] with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at [email protected] using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e., Packt Publishing agrees to replace your printed book because it arrives damaged or has a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on [email protected] with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on [email protected] within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on [email protected] who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on [email protected] within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal