Live online training

Real-time data foundations: Flink

Getting started with Flink

Ted Malaska

Ted Malaska leads a deep dive into Apache Flink for streaming use cases. You'll discover why Flink's architecture makes it special and how Flink directly compares to Apache Spark as you cover core concepts like windowing, state management, configurations, deployment, and performance.

Special note: This is the third course in a four-part series focused on building a foundation in near-real-time processing of IoT data. Although these courses are designed to be taken in any order, we suggest you take this course after Real-time data foundations: Spark for best results.

  1. Real-time data foundations: Kafka
  2. Real-time data foundations: Spark
  3. Real-time data foundations: Flink
  4. Real-time data foundations: Time series architectures

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • The architectural design behind Flink streaming
  • How to use Flink as a streaming engine

And you’ll be able to:

  • Work with core APIs
  • Use Flink for deployment, state management, and monitoring

This training course is for you because...

  • You're a data engineer who wants to bridge the gap from batch to streaming.
  • You're a product manager who wants to understand which use cases and functionality stream processing supports.

Prerequisites

  • A basic understanding of working with data
  • Familiarity with Java or Scala (useful but not required)

Materials or downloads needed in advance:

  • A machine with Docker and the IDE of your choice installed
  • A GitHub account

Recommended preparation:

About your instructor

  • Ted Malaska is the director of enterprise architecture at Capital One. Previously, he worked on the Battle.net team at Blizzard Entertainment. He was also a principal solutions architect at Cloudera, where he helped clients succeed with Hadoop and the Hadoop ecosystem, and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is the coauthor of Hadoop Application Architectures, a frequent conference speaker, and a blogger on data architectures.

Schedule

The timeframes are only estimates and may vary depending on how the class progresses.

  • Apache Flink architecture (component rundown) (25 minutes)
  • Windowing and state management (15 minutes)
  • Flink use cases (10 minutes)
  • Break (10 minutes)
  • Deployment (10 minutes)
  • Failure and recovery (10 minutes)
  • Configuration (10 minutes)