Workshop: Building big data pipelines for deep learning

Speaker(s): Jenny Lu

Real-world machine learning systems comprise a small amount of actual ML code surrounded by a vast infrastructure, most of which revolves around manipulating and handling data.

In this hands-on workshop, we will work through an end-to-end example of using the Python Beam SDK to build maintainable, reliable, and scalable production pipelines for deep learning.

We will begin by reading time series data from a warehouse such as BigQuery. Using Beam, we will engineer features with common data science libraries such as NumPy and pandas, then transform the data into TFRecords. We will train an RNN on these TFRecords and finally reuse the same Beam pipeline to perform inference.
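As a flavor of the feature-engineering step, here is a minimal sketch of the kind of per-window transformation that could live inside a Beam `Map` or `DoFn`. The function name and the chosen features are hypothetical, not the workshop's actual code; it uses only pandas and NumPy-style operations on a single window of values.

```python
import pandas as pd


def engineer_features(window):
    """Turn one window of raw time series values into model features.

    Hypothetical helper for illustration: in a Beam pipeline this logic
    would typically be applied per element, e.g. via
    ``values | beam.Map(engineer_features)``.
    """
    s = pd.Series(window, dtype=float)
    return {
        "mean": float(s.mean()),              # window average
        "std": float(s.std(ddof=0)),          # population std dev
        "last": float(s.iloc[-1]),            # most recent value
        "delta": float(s.iloc[-1] - s.iloc[0]),  # change over the window
    }


feats = engineer_features([1.0, 2.0, 4.0, 8.0])
```

Because the function is plain Python operating on one element at a time, it can be unit-tested outside the pipeline and then dropped into Beam unchanged, which is a large part of what makes Beam pipelines maintainable.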

By the end, you will be ready to start using Beam for your own ML applications!