Workshop: Implement a streaming data pipeline with Google Cloud Dataflow

Speaker(s): Reza Rokni, David Sabater Dinter & Wei Hsia

Join us for an exciting workshop to see Google Cloud Dataflow applied to a real-life application with a retail demo!

The first part of the workshop will provide an overview of Google Cloud Dataflow (Google’s fully managed scalable Apache Beam runner) followed by an in-depth simulation of a retail application that will showcase the powerful features of Dataflow.

The second part of the workshop will consist of hands-on labs where participants will interact with a real Google Cloud project and implement a Google Cloud Dataflow pipeline. You will build a batch Extract-Transform-Load pipeline in Apache Beam from scratch, which takes data from Google Cloud Storage and writes it to Google BigQuery. You will also build and run the pipeline on Google Cloud Dataflow. Using a weblog scenario, the pipeline will introduce the Beam concepts of ParDo, Beam Schemas, PCollections, and IO transforms.
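To give a flavour of what the lab covers, here is a minimal sketch (not the official lab code) of that kind of batch ETL pipeline: weblog lines are read from Google Cloud Storage, parsed with a ParDo into a schema-aware PCollection, and written to BigQuery. The bucket, table, and field names are illustrative assumptions.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class WeblogBatchPipeline {

  // A schema-aware POJO for one weblog entry; Beam infers its schema via @DefaultSchema.
  @DefaultSchema(JavaFieldSchema.class)
  public static class WeblogRow {
    public String userId;
    public String path;
    public String timestamp;
  }

  // A ParDo that parses a comma-separated weblog line into a WeblogRow.
  static class ParseLogFn extends DoFn<String, WeblogRow> {
    @ProcessElement
    public void processElement(@Element String line, OutputReceiver<WeblogRow> out) {
      String[] fields = line.split(",");
      if (fields.length >= 3) {
        WeblogRow row = new WeblogRow();
        row.userId = fields[0].trim();
        row.path = fields[1].trim();
        row.timestamp = fields[2].trim();
        out.output(row);
      }
    }
  }

  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    // Extract: read raw weblog lines from a (hypothetical) GCS bucket.
    PCollection<String> lines =
        pipeline.apply("ReadFromGCS", TextIO.read().from("gs://your-bucket/weblogs/*.csv"));

    // Transform: parse each line into a schema-aware WeblogRow.
    PCollection<WeblogRow> rows = lines.apply("ParseLogs", ParDo.of(new ParseLogFn()));

    // Load: write rows to a (hypothetical) BigQuery table, deriving the table
    // schema from the Beam schema of WeblogRow.
    rows.apply(
        "WriteToBigQuery",
        BigQueryIO.<WeblogRow>write()
            .to("your-project:your_dataset.weblogs")
            .useBeamSchema()
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    pipeline.run().waitUntilFinish();
  }
}
```

The same pipeline runs locally for testing or on Google Cloud Dataflow by passing the appropriate runner and project options on the command line, which is exactly the workflow the lab walks through.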

Prerequisites

  • Level: Beginner/Intermediate.
  • Labs will be written in Java; attendees should have rudimentary knowledge of Java (or a similar language), build tools such as Maven, and basic familiarity with cloud platforms.
  • We will be using Qwiklabs for running the labs. Please sign up at https://ce.qwiklabs.com.