Cloud Dataflow is a service based on Apache Beam, an open source framework for creating data processing pipelines. A pipeline is essentially a piece of code that defines how we wish to process our data. Once a pipeline has been constructed and submitted to the service, it runs as a Dataflow job. This is where we can process the data ingested by Pub/Sub. The job performs steps that change our data from one format to another, and it can transform both real-time streaming data and historical batch data. Dataflow is completely serverless and fully managed: it spins up and destroys the resources needed to execute our Dataflow job. As an example, a pipeline job might be made up of several steps. If a specific step requires execution on 15 machines in parallel, then Dataflow will automatically scale to these 15 machines and remove them when the job is complete....
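
To make the idea of a pipeline as "a piece of code made up of several steps" more concrete, here is a minimal local sketch in plain Python. It does not use the Apache Beam SDK or any Dataflow API; it is only a toy model that assumes a hypothetical two-step pipeline (parse a JSON message, then convert it to a CSV line). The names `parse_message`, `to_csv_row`, `run_step`, `run_pipeline`, and `PIPELINE`, as well as the message fields `user` and `amount`, are invented for illustration. A thread pool stands in for the machines that would be added for a step and removed afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
import json


def parse_message(raw: str) -> dict:
    """Step 1: decode a JSON string (a stand-in for an ingested message)."""
    return json.loads(raw)


def to_csv_row(record: dict) -> str:
    """Step 2: change the format from a JSON object to a CSV line."""
    return f"{record['user']},{record['amount']}"


def run_step(step, items, workers):
    """Run one step over all items on a pool that exists only for this step."""
    # The pool is created on entry and shut down on exit, loosely mirroring
    # workers being added for a step and removed once it has finished.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(step, items))  # map() preserves input order


def run_pipeline(items, steps):
    """Apply each (step, workers) pair in order to the whole batch."""
    for step, workers in steps:
        items = run_step(step, items, workers)
    return items


# Hypothetical pipeline definition: the parsing step fans out to 15 workers.
PIPELINE = [(parse_message, 15), (to_csv_row, 2)]

if __name__ == "__main__":
    messages = ['{"user": "ana", "amount": 12}', '{"user": "bo", "amount": 7}']
    for row in run_pipeline(messages, PIPELINE):
        print(row)
```

In this sketch the worker counts are fixed by hand. The point of a managed service as described above is that such scaling decisions are made automatically. In Apache Beam itself, the steps would be expressed as transforms applied to a pipeline object rather than as a list of plain functions.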