Workshop: Build a Unified Batch and Stream Processing Pipeline with Apache Beam on AWS

Speaker(s): Steffen Hausmann, Karthi Thyagarajan & Rajan Pattel

In this workshop, we explore an end-to-end example that combines batch and streaming aspects in one uniform Beam pipeline. We start by analyzing incoming taxi trip events in near real time with an Apache Beam pipeline. We then show how to archive the trip data to Amazon S3 for long-term storage. Next, we explain how to read the historical data from S3 and backfill new metrics by executing the same Beam pipeline in batch mode. Along the way, you also learn how to deploy and execute the Beam pipeline with Amazon Kinesis Data Analytics in a fully managed environment.
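To make the unification idea concrete, here is a minimal sketch in plain Python. It is not Beam code and not the workshop's pipeline: the event shape, the function name `trips_per_window`, and the sample data are all hypothetical. It only illustrates the principle that one piece of aggregation logic can serve both a live feed and an archive, which a Beam pipeline achieves through its own source and windowing abstractions.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (event time in epoch seconds, trip id).
Event = Tuple[int, str]


def trips_per_window(events: Iterable[Event], window_seconds: int = 60) -> Counter:
    """Count trips per fixed, non-overlapping time window.

    The function accepts any iterable, so the same logic runs over a
    live generator (stream-like) or a stored list (batch-like).
    """
    counts: Counter = Counter()
    for timestamp, _trip_id in events:
        # Align each event to the start of its window.
        window_start = timestamp - timestamp % window_seconds
        counts[window_start] += 1
    return counts


def live_events() -> Iterator[Event]:
    """Stand-in for a streaming source; yields events one at a time."""
    yield from [(0, "a"), (30, "b"), (65, "c")]


# Stand-in for historical data read back from an archive.
archived_events = [(0, "a"), (30, "b"), (65, "c")]

streaming_result = trips_per_window(live_events())
batch_result = trips_per_window(archived_events)
```

Because the aggregation never inspects where its input comes from, both calls produce the same per-window counts. A real streaming pipeline additionally has to handle unbounded input and late or out-of-order events, which this sketch deliberately leaves out.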

You will not only learn how to leverage Beam’s expressive programming model to unify batch and streaming; you will also learn how AWS can help you build and operate Beam-based streaming architectures with low operational overhead.

Running the workshop on your own

If you were not able to join this live workshop, you can still run it on your own. The instructions and all materials are available at https://streaming-analytics.workshop.aws/beam-on-kda/

Slack channel

If you have any questions about this workshop or need assistance, please join the #beam-summit-aws channel in the ASF Slack workspace.