Creating art with Bigtable’s monitoring tools

Google Cloud Bigtable’s Key Visualizer is a monitoring tool built for petabyte-scale databases. I used Apache Beam to perform massive I/O operations in carefully chosen key patterns, turning its heatmaps into some of the world’s most recognizable paintings.
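
A minimal, hypothetical sketch of the idea: Key Visualizer renders write activity as a heatmap with row keys on one axis and time on the other, so scheduling heavy writes against chosen key ranges at chosen times "paints" pixels. The bitmap, key format, and write counts below are illustrative only; the actual project drove real Bigtable writes through Apache Beam.

```python
BITMAP = [  # 1 = hot key range (bright pixel), 0 = idle
    "01110",
    "01010",
    "01110",
]

def write_plan(bitmap, writes_per_pixel=1000):
    """Yield (time_step, row_key_prefix, n_writes) tuples for hot pixels."""
    plan = []
    for t, column in enumerate(zip(*bitmap)):  # each time step is one image column
        for y, pixel in enumerate(column):
            if pixel == "1":
                # Zero-padded prefixes keep key ranges sorted, so adjacent
                # pixels map to adjacent bands in the heatmap.
                plan.append((t, f"row{y:04d}#", writes_per_pixel))
    return plan

plan = write_plan(BITMAP)
print(plan[:3])
```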

Dataflow Flex Templates

Legacy Dataflow templates allow you to stage your pipelines on Cloud Storage and run them from a variety of environments. This talk covers the new Flex Templates feature, in which staging and execution are separate steps. This separation provides additional flexibility in deciding who can run jobs, where jobs are run from, and how job execution is customized based on input and output parameters.

DBeam: exporting SQL tables into Avro records using Beam SDK

At Spotify we built and open-sourced DBeam: a connector to export SQL tables into Avro records using the Beam SDK. This session tells the story of this piece of Spotify data engineering: how and why we built it using the Beam SDK and Google Dataflow. https://github.com/spotify/dbeam
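
A stand-in sketch of the core idea, not DBeam's actual code (DBeam is a JVM Beam pipeline reading over JDBC): read a SQL table via a cursor and map each row into an Avro-style record whose schema is derived from the table's columns. Here sqlite3 substitutes for a production JDBC source, and every field is simplified to a string type.

```python
import sqlite3

def export_table(conn, table):
    """Return (schema, records): an Avro-like schema plus one dict per row."""
    cur = conn.execute(f"SELECT * FROM {table}")
    fields = [d[0] for d in cur.description]
    schema = {"type": "record", "name": table,
              "fields": [{"name": f, "type": "string"} for f in fields]}
    records = [dict(zip(fields, (str(v) for v in row))) for row in cur]
    return schema, records

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "linus")])
schema, records = export_table(conn, "users")
print(records)
```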

Event-driven Movie Magic

Learn how Luma Pictures uses Apache Beam to automate the creation of visual effects on films like Marvel’s Spider-Man: Far From Home. Pulling off effects at this scale requires hundreds of artists working across multiple time zones on sophisticated simulations of natural phenomena and massive data sets. Beam reacts to events within our studio in real time, acting as the conveyor belt between departments in our digital assembly line.

Feature Powered by Apache Beam - Beyond Lambda

To unify feature extraction and selection across online and offline environments, speed up end-to-end iteration for model training, evaluation, and serving, and support different types of features (streaming, runtime, batch), eBay leverages Apache Beam as the foundation of its streaming feature SDK, integrating with Kafka, Hadoop, Flink, Airflow, and other systems at eBay.

Four Apache Technologies Combined for Fun and Profit

When enriching one stream of data with another, one easy approach on GCP is to ingest both via Apache Beam into a zero-maintenance SQL database like BigQuery and do the join there. However, since BigQuery is column-oriented and bills by the number of bytes scanned, this is not the optimal solution. In some reference architectures, Beam feeds both BigQuery and Bigtable, a wide-column NoSQL database that allows fast key-based lookups, at the same time to provide data for different use cases.
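
An illustrative comparison in plain Python (not an actual Beam pipeline, and the field names are invented) of the two enrichment strategies described above: a BigQuery-style join that scans the whole reference table versus a Bigtable-style point lookup keyed on the join field.

```python
reference = [  # stand-in for the second stream, landed in storage
    {"user_id": "u1", "country": "SE"},
    {"user_id": "u2", "country": "US"},
]

def enrich_by_scan(events, table):
    """BigQuery-style: every event scans the reference rows (bytes billed)."""
    out = []
    for e in events:
        match = next((r for r in table if r["user_id"] == e["user_id"]), None)
        out.append({**e, **(match or {})})
    return out

def enrich_by_lookup(events, table):
    """Bigtable-style: one key-based point read per event."""
    index = {r["user_id"]: r for r in table}  # keyed like a wide-column row key
    return [{**e, **index.get(e["user_id"], {})} for e in events]

events = [{"user_id": "u1", "clicks": 3}]
print(enrich_by_lookup(events, reference))
```

Both functions return the same enriched rows; the difference is that the lookup version touches only the rows it needs, which is the property the Bigtable side of the architecture exploits.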

From pipeline to execution: What happens when you run() your pipeline?

Apache Beam is a powerful framework: unified batch and stream processing, support for multiple execution engines, and support for writing pipelines in multiple languages. It can be hard to wrap your head around how all of this works. Fortunately, Beam’s architecture can be broken down into several components which are easy to understand. Let’s look at what happens when you run your Beam pipeline, and how it gets translated, submitted, and executed.
