This chapter has been a whirlwind tour of the core concepts of Apache Beam and of how to run a basic WordCount pipeline using Apache Apex as a backend. Specifically, we looked at the following topics:
- The technical vision of Beam—any language on any data processing engine
- The main parallel processing patterns of Beam—ParDo and GroupByKey
- The features of the Beam model that support unbounded data—windowing, watermarks, and triggers
- A basic Beam pipeline to count occurrences of words
- Launching a Beam pipeline using Apache Apex on a YARN cluster
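As a compact recap of how ParDo and GroupByKey combine in WordCount, the following plain-Python sketch mimics the shape of that pipeline in memory. It does not use the Beam SDK: `par_do`, `group_by_key`, and `word_count` are hypothetical helper names chosen for illustration. Tokenization is simplified to whitespace splitting, and there is no parallelism, windowing, or runner involved.

```python
from collections import defaultdict


def par_do(elements, fn):
    """Hypothetical stand-in for ParDo: apply fn to each element
    independently; fn returns a list of zero or more outputs."""
    out = []
    for element in elements:
        out.extend(fn(element))
    return out


def group_by_key(pairs):
    """Hypothetical stand-in for GroupByKey: collect every value
    that shares a key into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def word_count(lines):
    # First "ParDo": split each line on whitespace (a simplification)
    # and emit a (word, 1) pair per word.
    pairs = par_do(lines, lambda line: [(word, 1) for word in line.split()])
    # "GroupByKey": gather all the 1s emitted for the same word.
    grouped = group_by_key(pairs)
    # Second "ParDo": sum the grouped values to get one count per word.
    return dict(par_do(grouped.items(), lambda kv: [(kv[0], sum(kv[1]))]))


counts = word_count(["the quick brown fox", "the lazy dog"])
print(counts["the"])  # prints 2
```

In a real Beam pipeline, the runner (Apex in this chapter) distributes the per-element work and the shuffle behind GroupByKey across the cluster. The sketch only shows the data flow between the two patterns.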
For more details on both Beam and the Apex runner for Beam, visit the Beam website. Also, follow @ApacheBeam on Twitter and join the Beam user mailing list by following the project's subscription instructions.