Spark magic: How high-level pipelines become distributed hardcore

Day 2


Spark is the most popular tool for building data pipelines. Every data engineer knows Spark, blah-blah-blah… OK, but Spark is just distributed Java Streams, right? So how does it actually work? Well, it turns out you can't just call "flatMap" or "groupBy" on a remote machine. Codegen! Interested? Come and find out more!

  • #bigdata
  • #codegen
  • #kotlin


Invited experts