Task 14 – Implementing SQLMaxWordLength
In Chapter 2, Implementing, Testing, and Deploying Basic Pipelines, we implemented a pipeline called MaxWordLength
. In this task, we will reimplement it by using SQL and schemas. Note that although we already know how to structure code better and use PTransforms
rather than using static methods to transform one PCollection
into another, we will keep the approach from the original chapter so that we can easily spot the differences and compare both versions more easily.
For clarity, let's restate the problem.
Problem definition
Given a stream of text lines in Apache Kafka, create a stream consisting of the longest word seen in the stream from the beginning to the present. Use triggering to output the result as frequently as possible. Use Apache Beam SQL to implement the task whenever possible.
Problem decomposition discussion
The interesting parts will be centered around several problems that we need to solve:
-
...