Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Journey to Become a Google Cloud Machine Learning Engineer

You're reading from   Journey to Become a Google Cloud Machine Learning Engineer Build the mind and hand of a Google Certified ML professional

Arrow left icon
Product type Paperback
Published in Sep 2022
Publisher Packt
ISBN-13 9781803233727
Length 330 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Dr. Logan Song Dr. Logan Song
Author Profile Icon Dr. Logan Song
Dr. Logan Song
Arrow right icon
View More author details
Toc

Table of Contents (23) Chapters Close

Preface 1. Part 1: Starting with GCP and Python
2. Chapter 1: Comprehending Google Cloud Services FREE CHAPTER 3. Chapter 2: Mastering Python Programming 4. Part 2: Introducing Machine Learning
5. Chapter 3: Preparing for ML Development 6. Chapter 4: Developing and Deploying ML Models 7. Chapter 5: Understanding Neural Networks and Deep Learning 8. Part 3: Mastering ML in GCP
9. Chapter 6: Learning BQ/BQML, TensorFlow, and Keras 10. Chapter 7: Exploring Google Cloud Vertex AI 11. Chapter 8: Discovering Google Cloud ML API 12. Chapter 9: Using Google Cloud ML Best Practices 13. Part 4: Accomplishing GCP ML Certification
14. Chapter 10: Achieving the GCP ML Certification 15. Part 5: Appendices
16. Index 17. Other Books You May Enjoy Appendix 1: Practicing with Basic GCP Services 1. Appendix 2: Practicing Using the Python Data Libraries 2. Appendix 3: Practicing with Scikit-Learn 3. Appendix 4: Practicing with Google Vertex AI 4. Appendix 5: Practicing with Google Cloud ML API

GCP big data and analytics services

Distinguished from storage and database services, the big data and analytics services focus on the big data processing pipeline: from data ingestion, storing, and processing to visualization, it helps you create a complete cloud-based big data infrastructure:

Figure 1.6 – GCP big data and analytics services

Figure 1.6 – GCP big data and analytics services

As shown in the preceding diagram, the GCP big data and analytics services include Cloud Dataproc, Cloud Dataflow, BigQuery, and Cloud Pub/Sub.

Let’s examine each of them briefly.

Google Cloud Dataproc

Based on the concept of Map-Reduce and the architecture of Hadoop systems, Google Cloud Dataproc is a managed GCP service for processing large datasets. Dataproc provides organizations with the flexibility to provision and configure data processing clusters of varying sizes on demand. Dataproc integrates well with other GCP services. It can operate directly on Cloud Storage files or use Bigtable to analyze data, and it can be integrated with Vertex AI, BigQuery, Dataplex, and other GCP services.

Dataproc helps users process, transform, and understand vast quantities of data. You can use Dataproc to run Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks. You can also use Dataproc for data lake modernization, ETL processes, and more.

Google Cloud Dataflow

Cloud Dataflow is a GCP-managed service for developing and executing a wide variety of data processing patterns, including Extract, Transform, Load (ETL), batch, and streaming jobs. Cloud Dataflow is a serverless data processing service that runs jobs written with Apache Beam libraries. Cloud Dataflow executes jobs that consist of a pipeline – a sequence of steps that reads data, transforms it into different formats, and writes it out. A dataflow pipeline consists of a series of pipes, which is a way to connect components, where data moves from one component to the next via a pipe. When jobs are executed on Cloud Dataflow, the service spins up a cluster of VMs, distributes the job tasks to the VMs, and dynamically scales the cluster based on job loads and their performance.

Google Cloud BigQuery

BigQuery is a Google fully managed enterprise data warehouse service that is highly scalable, fast, and optimized for data analytics. It has the following features:

  • BigQuery supports ANSI-standard SQL queries, including joins, nested and repeated fields, analytic and aggregation functions, scripting, and a variety of spatial functions via geospatial analytics.
  • With BigQuery, you do not physically manage the infrastructure assets. BigQuery’s serverless architecture lets you use SQL queries to answer big business questions with zero infrastructure overhead. With BigQuery’s scalable, distributed analysis engine, you can query petabytes of data in minutes.
  • BigQuery integrates seamlessly with other GCP data services. You can query data stored in BigQuery or run queries on data where it lives using external tables or federated queries, including GCS, Bigtable, Spanner, or Google Sheets stored in Google Drive.
  • BigQuery helps you manage and analyze your data with built-in features such as ML, geospatial analysis, and business intelligence. We will discuss BigQuery ML later in this book.

Google BigQuery is used in many business cases due to it being SQL-friendly, having a serverless structure, and having built-in integration with other GCP services.

Google Cloud Pub/Sub

GCP Pub/Sub is a widely used cloud service for decoupling many GCP services – it implements an event/message queue pipe to integrate services and parallelize tasks. With the Pub/Sub service, you can create event producers, called publishers, and event consumers, called subscribers. Using Pub/Sub, the publishers communicate with subscribers asynchronously by broadcasting events – a publisher can have multiple subscribers and a subscriber can subscribe to multiple publishers:

Figure 1.7 – Google Cloud Pub/Sub services

Figure 1.7 – Google Cloud Pub/Sub services

The preceding diagram shows the example we discussed in the GCP Cloud Functions section: after an object is uploaded to a GCS bucket, a request/message can be generated and sent to GCP Pub/Sub, which can trigger an email notification and a cloud function to process the object. When the number of parallel object uploads is huge, Cloud Pub/Sub will help buffer/queue the requests/messages and decouple the GCS service from other cloud services such as Cloud Functions.

So far, we have covered various GCP services, including compute, storage, databases, and data analytics (big data). Now, let’s take a look at various GCP artificial intelligence (AI) services.

You have been reading a chapter from
Journey to Become a Google Cloud Machine Learning Engineer
Published in: Sep 2022
Publisher: Packt
ISBN-13: 9781803233727
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image