Live online training

Real-Time Streaming Analytics and Algorithms for AI Applications

Surfacing actionable insights from real-time streaming data and triggering actions

Arun Kejariwal
Karthik Ramasamy

Across industry segments such as e-commerce, digital assistants, autonomous driving, real-time marketing, and health care, the focus has shifted from big data to fast data, driven by both the deluge of high-velocity data streams and the need for instant data-driven insights. It's critical to mine business insights from data streams in a robust and timely fashion. Doing so requires knowing which set of algorithms best powers the application at hand, as well as a highly available, reliable, and performant end-to-end stream processing system.

In this live training, we walk you through the types of analytics commonly supported in modern real-time streaming applications. In particular, we take a deep dive into the underlying algorithms (also referred to as data sketches) that power these analytics and discuss the trade-off between speed and accuracy. We also survey state-of-the-art streaming systems and their deployment in production at internet scale. You'll discover the typical challenges in modern real-time big data platforms and learn how to address them. Along the way, we explain how advances in technology might affect the streaming applications of the future and speculate about developments to come.
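
To give a flavor of the speed-versus-accuracy trade-off, here is a minimal sketch of one well-known data sketch, the Count-Min Sketch, written in Python. It is an illustration under stated assumptions, not material taken from the course: the class name, the `width` and `depth` parameters, the hashing scheme, and the sample events are all hypothetical choices made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter that uses a fixed width x depth table."""

    def __init__(self, width=256, depth=4):
        # Assumed defaults: more columns and rows cost memory but reduce error.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash function per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        # Each update touches exactly one counter per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest estimate and never falls below the true count.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Hypothetical event stream used only for illustration.
stream = ["checkout"] * 50 + ["search"] * 30 + ["login"] * 20

sketch = CountMinSketch()
for event in stream:
    sketch.add(event)

print(sketch.estimate("checkout"))
```

Memory use and per-event work stay constant no matter how many distinct events arrive. The price is that an estimate may overshoot the true count when items collide, and shrinking `width` or `depth` makes that more likely.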

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Different types of streaming analytics you can leverage in your application(s)
  • Which algorithms to use for a given streaming problem and why, particularly the trade-offs between speed and accuracy
  • The various facets of a stream processing pipeline and which technologies are best suited to which context

And you’ll be able to:

  • Integrate streaming analytics into your applications
  • Determine which streaming technologies - for processing, messaging and storage - are best suited to your needs

This training course is for you because...

  • You're an application developer who has been tasked with building a stream processing pipeline with rich support for analytics
  • You're an analyst who needs to mine inbound data streams to guide decision making

Prerequisites

  • A basic understanding of working with data and awareness of the challenges of working with data
  • Familiarity with Java or Scala (useful but not required)
  • Familiarity with Hadoop (useful but not required)

About your instructor

  • Arun Kejariwal is a statistical learning principal at Machine Zone (MZ), where he leads a team of top-tier researchers and works on research and development of novel techniques for install and click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. In addition, his team is building novel methods for bot detection, intrusion detection, and real-time anomaly detection. Previously, Arun worked at Twitter, where he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.

  • Karthik Ramasamy is the engineering manager and technical lead for Real Time Analytics at Twitter. He has two decades of experience working in parallel databases, big data infrastructure, and networking. He cofounded Locomatix, a company specializing in real-time stream processing on Hadoop and Cassandra using SQL, which was acquired by Twitter. Before Locomatix, he had a brief stint with Greenplum, where he worked on parallel query scheduling. Greenplum was eventually acquired by EMC for more than $300M. Prior to Greenplum, Karthik was at Juniper Networks, where he designed and delivered platforms, protocols, databases, and high-availability solutions for network routers that are widely deployed in the Internet. Before joining Juniper, he worked extensively at the University of Wisconsin on parallel database systems, query processing, scale-out technologies, storage engines, and online analytical systems. Several of these research projects were spun out as a company later acquired by Teradata. He is the author of several publications and patents and of the best-selling book “Network Routing: Algorithms, Protocols and Architectures.” He has a Ph.D. in Computer Science from UW Madison with a focus on databases.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Presentation: 40-45 mins

  • Discussion: We encourage the audience to ask questions on the fly
  • Q&A: 5-10 mins. This gives the audience another opportunity to ask questions on the subject covered thus far
  • Break: 5 mins

Introduction to streaming (30 minutes)

  • Lecture: How is streaming related to today's AI applications?
  • Lecture: Landscape of Platforms for Streaming Data
  • Discussion
  • Q&A

Messaging (45 minutes)

  • Lecture: Properties
  • Lecture: Apache Pulsar
  • Lecture: Deployment in the real world
  • Lecture: Serverless and Streaming
      • Pulsar Functions
      • Deployment in the real world
  • Discussion
  • Q&A

Processing (45 minutes)

  • Lecture: Properties
  • Lecture: Heron
  • Lecture: Deployment in the real world
  • Dhalion
  • Discussion
  • Q&A

Unification (45 minutes)

  • Lecture: Messaging
  • Lecture: Processing
  • Lecture: Storage
  • Discussion
  • Q&A

Algorithms (45 minutes)

  • Lecture: Properties
  • Lecture: Family of Data Sketches
  • Lecture: Application Domains for Real-time AI
      • Real-time Business Intelligence
      • Real-time Trending
      • Real-time Monitoring
      • Real-time Marketing
      • Real-time Bidding
      • Real-time Security
      • IoT
  • Discussion
  • Q&A