Live online training

Inside unsupervised learning: Anomaly detection using dimensionality reduction

Build systems to detect rare events such as fraud, cyberattacks, and more

Ankur Patel

Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to artificial general intelligence. Conventional supervised learning cannot be applied to unlabeled data, which makes up the majority of the world's data. In these cases, unsupervised learning can help discover meaningful patterns buried deep in unlabeled datasets, patterns that would otherwise be nearly impossible for humans to uncover.

Join Ankur Patel to explore one of the core concepts in unsupervised learning: dimensionality reduction. Dimensionality reduction serves two main purposes. First, it reduces the computational complexity of working with very large datasets. Second, it removes irrelevant information in a dataset, surfacing the information that matters most. In just 90 minutes, you'll learn how to use dimensionality reduction algorithms to build an anomaly detection system to detect credit card fraud without using any labels—knowledge you'll be able to apply to create your own anomaly detection systems for fraud, crime, or other adverse events.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to reduce the computational complexity of working with very large datasets
  • How to remove irrelevant information in datasets and surface the most salient information
  • How to use dimensionality reduction to perform anomaly detection

And you’ll be able to:

  • Perform linear and nonlinear dimensionality reduction
  • Build a credit card fraud detection system
  • Leverage your knowledge to build other types of anomaly detection systems

This training course is for you because...

  • You're a data scientist or engineer who wants to work with unlabeled data.
  • You want to perform anomaly detection to solve a business use case.

Prerequisites

  • A working knowledge of Python
  • A basic understanding of machine learning

Recommended preparation:

  • Read "Dimensionality Reduction" and "Anomaly Detection" (chapters 3 and 4 in Hands-On Unsupervised Learning Using Python)

Recommended follow-up:

About your instructor

  • Ankur A. Patel is the Vice President of Data Science at 7Park Data, a Vista Equity Partners portfolio company. At 7Park Data, Ankur and his data science team use alternative data to build data products for hedge funds and corporations and develop machine learning as a service (MLaaS) for enterprise clients. MLaaS includes natural language processing (NLP), anomaly detection, clustering, and time series prediction. Prior to 7Park Data, Ankur led data science efforts in New York City for Israeli artificial intelligence firm ThetaRay, one of the world's pioneers in applied unsupervised learning.

    Ankur began his career as an analyst at J.P. Morgan and then became the lead emerging markets sovereign credit trader for Bridgewater Associates, the world's largest global macro hedge fund. He later founded R-Squared Macro, a machine learning-based hedge fund, and managed it for five years. A graduate of the Woodrow Wilson School at Princeton University, Ankur is the recipient of the Lieutenant John A. Larkin Memorial Prize.

    He currently resides in Tribeca in New York City but travels extensively internationally.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction to unsupervised learning (15 minutes)

  • Lecture: How unsupervised learning fits into the machine learning ecosystem; common problems in machine learning—insufficient labeled data, the curse of dimensionality, and outliers

Motivation for dimensionality reduction (15 minutes)

  • Lecture and hands-on exercises: Reduce computational complexity of large data; remove irrelevant information and surface salient information; perform anomaly detection; perform clustering; an introduction to the credit card fraud detection problem
  • Q&A (5 minutes)
  • Break (5 minutes)

Data preparation (10 minutes)

  • Lecture and hands-on exercises: Explore data in a Jupyter notebook; prepare the credit card dataset

Linear dimensionality reduction (15 minutes)

  • Lecture and hands-on exercises: The evaluation function; apply PCA and evaluate results; apply random projection and evaluate results
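To give a feel for the PCA step before class, here is a minimal sketch of one common way to turn dimensionality reduction into an anomaly detector: project the data onto a few principal components, reconstruct it, and treat rows with a large reconstruction error as suspicious. This is not the course's code. The synthetic data, the helper names (pca_fit, anomaly_scores, precision_at_k), and the choice of precision-at-k as the evaluation function are all assumptions made for illustration, and NumPy's SVD stands in for a library PCA so the example has no other dependencies.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and the top principal axes of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def anomaly_scores(X, mean, components):
    """Squared reconstruction error per row, min-max scaled to [0, 1]."""
    centered = X - mean
    # Project onto the retained axes, then map back to the original space.
    reconstructed = centered @ components.T @ components
    errors = np.sum((centered - reconstructed) ** 2, axis=1)
    return (errors - errors.min()) / (errors.max() - errors.min())


def precision_at_k(scores, labels, k):
    """Fraction of the k highest-scoring rows that are true anomalies."""
    top_k = np.argsort(scores)[::-1][:k]
    return labels[top_k].mean()


# Hypothetical stand-in for the credit card data: normal rows lie close to
# a 2-D subspace of a 10-D feature space, anomalous rows do not.
rng = np.random.default_rng(0)
n_normal, n_anomalies, n_features = 1000, 10, 10
latent = rng.normal(size=(n_normal, 2))
mixing = rng.normal(size=(2, n_features))
normal = latent @ mixing + 0.05 * rng.normal(size=(n_normal, n_features))
anomalies = rng.normal(scale=3.0, size=(n_anomalies, n_features))
X = np.vstack([normal, anomalies])

# Labels are used only to evaluate the detector, never to fit it.
labels = np.concatenate([np.zeros(n_normal), np.ones(n_anomalies)])

mean, components = pca_fit(X, n_components=2)
scores = anomaly_scores(X, mean, components)
print("precision at k:", precision_at_k(scores, labels, k=n_anomalies))
```

The number of retained components is the main knob: keep too many and anomalies are reconstructed almost as well as normal rows, keep too few and normal rows start to look anomalous too.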

Nonlinear dimensionality reduction (15 minutes)

  • Lecture and hands-on exercises: Apply dictionary learning and evaluate results; apply ICA and evaluate results
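The same reconstruct-and-score pattern can be sketched for the ICA step. The example below assumes scikit-learn is installed and uses its FastICA class. The synthetic data, the component count, and the top-10 hit rate used as a quick check are illustrative assumptions, not the course's dataset or evaluation function.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical data: three non-Gaussian sources mixed into eight features,
# plus ten rows that do not follow that structure.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(1000, 3))
mixing = rng.normal(size=(3, 8))
normal = sources @ mixing + 0.05 * rng.normal(size=(1000, 8))
anomalies = rng.normal(scale=3.0, size=(10, 8))
X = np.vstack([normal, anomalies])

# Labels are used only to check the result, never to fit the model.
labels = np.concatenate([np.zeros(1000), np.ones(10)])

# Reduce to three independent components, then map back to feature space.
ica = FastICA(n_components=3, random_state=0, max_iter=1000)
X_reduced = ica.fit_transform(X)
X_restored = ica.inverse_transform(X_reduced)

# Rows that the reduced representation cannot reconstruct get a high error.
errors = np.sum((X - X_restored) ** 2, axis=1)
top_10 = np.argsort(errors)[::-1][:10]
hit_rate = labels[top_10].mean()
print("share of true anomalies among the 10 highest errors:", hit_rate)
```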

Wrap-up and Q&A (10 minutes)