Live online training

Inside Unsupervised Learning: Feature Extraction using Autoencoders and Semi-Supervised Learning

Explore automatic feature engineering using autoencoders and build semi-supervised solutions

Ankur Patel

Many industry experts consider unsupervised learning the next frontier in artificial intelligence, one that may hold the key to general artificial intelligence. Since the majority of the world's data is unlabeled, conventional supervised learning cannot be applied to most of it; unsupervised learning is necessary. Unsupervised learning can be applied to unlabeled datasets to discover meaningful patterns buried deep in the data, patterns that would otherwise be nearly impossible for humans to uncover.

In this 90-minute course, O’Reilly author Ankur Patel will explore one of the core concepts in unsupervised learning, autoencoders, and introduce semi-supervised learning. Autoencoders are shallow neural networks that learn representations of the original input data and output the newly learned representations. In other words, autoencoders perform automatic feature engineering, limiting the need for manual feature engineering and speeding up the development of machine learning systems. Autoencoders are also a means to leverage information in a partially labeled dataset. With autoencoders, we are able to turn unsupervised machine learning problems into semi-supervised ones.
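
To make the idea of learning a representation concrete, here is a minimal sketch of a linear autoencoder written in plain Python. It is only an illustration under stated assumptions: the course itself works with TensorFlow and Keras, and the function names, toy data, and hyperparameters below are hypothetical. The encoder compresses three correlated features into one learned feature, and the decoder tries to rebuild the original input from it.

```python
import random


def encode(x, w_enc):
    """Map one input row to its learned (compressed) representation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]


def train_linear_autoencoder(rows, hidden_size=1, epochs=300, lr=0.1, seed=0):
    """Fit a one-hidden-layer linear autoencoder with plain SGD.

    Returns (w_enc, w_dec, losses); losses[i] is the mean squared
    reconstruction error observed during epoch i.
    """
    rng = random.Random(seed)
    n = len(rows[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden_size)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(n)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in rows:
            h = encode(x, w_enc)
            x_hat = [sum(w_dec[i][j] * h[j] for j in range(hidden_size)) for i in range(n)]
            err = [x_hat[i] - x[i] for i in range(n)]
            total += sum(e * e for e in err) / n
            # Propagate the error back to the hidden layer before w_dec changes.
            grad_h = [sum(err[i] * w_dec[i][j] for i in range(n)) for j in range(hidden_size)]
            for i in range(n):
                for j in range(hidden_size):
                    w_dec[i][j] -= lr * err[i] * h[j]
            for j in range(hidden_size):
                for k in range(n):
                    w_enc[j][k] -= lr * grad_h[j] * x[k]
        losses.append(total / len(rows))
    return w_enc, w_dec, losses


# Toy data (assumption): three correlated features driven by one hidden factor.
data = [[t / 10, 0.5 * t / 10, -0.5 * t / 10] for t in range(-10, 11)]
w_enc, w_dec, losses = train_linear_autoencoder(data)
features = [encode(x, w_enc) for x in data]  # the newly learned representation
```

The values in `features` are what a downstream model would consume in place of, or alongside, the original columns. A real autoencoder would typically add nonlinear activations and more units.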

In this course, we build unsupervised, supervised, and semi-supervised (using autoencoders) credit card fraud detection systems. First, we will employ a purely unsupervised approach, without the use of any labels. Next, we will employ a supervised approach on a partially labeled dataset. Finally, we will apply autoencoders (an unsupervised learning technique) to the partially labeled dataset and combine this with a supervised approach, building a semi-supervised solution. To conclude, we will compare and contrast the results of all three approaches.
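
The description does not say how the label-free system scores transactions. One common option, assumed here purely for illustration, is to treat reconstruction error as an anomaly score: rows that a trained model rebuilds poorly are flagged as more suspicious. The function name and the sample values below are hypothetical.

```python
def anomaly_scores(original_rows, reconstructed_rows):
    """Sum of squared reconstruction errors per row, min-max scaled to [0, 1]."""
    raw = [
        sum((o - r) ** 2 for o, r in zip(orig, recon))
        for orig, recon in zip(original_rows, reconstructed_rows)
    ]
    lo, hi = min(raw), max(raw)
    if hi == lo:  # every row is reconstructed equally well; nothing stands out
        return [0.0 for _ in raw]
    return [(s - lo) / (hi - lo) for s in raw]


# Hypothetical transactions and their reconstructions from some trained model.
originals = [[1.0, 2.0], [1.1, 2.1], [9.0, -4.0]]
reconstructions = [[1.0, 2.1], [1.0, 2.0], [2.0, 1.0]]
scores = anomaly_scores(originals, reconstructions)
most_suspicious = scores.index(max(scores))  # index of the worst-reconstructed row
```

No labels are needed to compute these scores, which is what makes the approach unsupervised; labels are only required later to evaluate how well the ranking separates fraud from normal activity.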

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • How to work with TensorFlow and Keras
  • Why neural networks are so powerful
  • How to learn representations using autoencoders
  • How to turn unsupervised learning problems into semi-supervised ones

And you’ll be able to:

  • Build shallow neural networks (e.g., autoencoders)
  • Apply autoencoders to a partially labeled dataset and feed newly learned representations into a supervised model, developing a semi-supervised solution
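
As a rough sketch of the last point, one possible way to feed learned representations into a supervised model is to append the encoder's output to each row's original features. This is an assumption for illustration, not necessarily the course's exact recipe, and `toy_encoder` is a hypothetical stand-in for the encoder half of a trained autoencoder.

```python
def augment_with_learned_features(rows, encoder):
    """Append the encoder's output to each row's original features."""
    return [list(row) + list(encoder(row)) for row in rows]


def toy_encoder(row):
    """Hypothetical stand-in for a trained autoencoder's encoder half."""
    return [sum(row) / len(row)]


labeled_rows = [[0.2, 0.4], [3.0, 1.0]]
labels = [0, 1]  # available only for the labeled subset of the data
augmented = augment_with_learned_features(labeled_rows, toy_encoder)
```

The rows in `augmented`, paired with `labels`, could then be passed to any supervised classifier. The encoder itself can be trained on all rows, labeled or not, which is how the unlabeled data contributes.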

This training course is for you because...

  • You are a data scientist or engineer and want to work with unlabeled data
  • You want to perform semi-supervised learning to solve a business use case

Prerequisites

  • Working knowledge of Python
  • Understanding of Machine Learning

About your instructor

  • Ankur A. Patel is the Vice President of Data Science at 7Park Data, a Vista Equity Partners portfolio company. At 7Park Data, Ankur and his data science team use alternative data to build data products for hedge funds and corporations and develop machine learning as a service (MLaaS) for enterprise clients. MLaaS includes natural language processing (NLP), anomaly detection, clustering, and time series prediction. Prior to 7Park Data, Ankur led data science efforts in New York City for Israeli artificial intelligence firm ThetaRay, one of the world's pioneers in applied unsupervised learning.

    Ankur began his career as an analyst at J.P. Morgan and then became the lead emerging markets sovereign credit trader for Bridgewater Associates, the world's largest global macro hedge fund. He later founded and managed R-Squared Macro, a machine learning-based hedge fund, for five years. A graduate of the Woodrow Wilson School at Princeton University, Ankur is the recipient of the Lieutenant John A. Larkin Memorial Prize.

    He currently resides in Tribeca in New York City but travels extensively internationally.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction to Unsupervised Learning - 10 minutes

  • How unsupervised learning fits into the machine learning ecosystem
  • Common problems in machine learning
  • Finding patterns without use of labels
  • Leveraging partially labeled datasets to build good machine learning solutions

Motivation for Representation Learning - 10 minutes

  • Why neural networks are powerful
  • Why automatic feature engineering is important
  • How representation learning improves machine learning performance

Motivation for Semi-Supervised Learning - 10 minutes

  • How supervised and unsupervised learning complement each other
  • How to capture information embedded in a partially labeled dataset and use the embedded information to improve machine learning performance

Q&A - 5 minutes

Break - 5 minutes

Data Preparation - 10 minutes

  • Explore data in Jupyter notebook
  • Prepare the credit card dataset

Autoencoders - 15 minutes

  • Introduce autoencoders
  • Train autoencoders

Semi-supervised Learning - 15 minutes

  • Build unsupervised learning fraud detection system
  • Build supervised learning fraud detection system
  • Build semi-supervised learning fraud detection system
  • Compare and contrast results
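
Comparing the three systems requires a shared metric. The description does not name one, so the sketch below assumes average precision, a common choice for heavily imbalanced problems such as fraud detection. The labels and scores are made-up placeholders, not course results.

```python
def average_precision(labels, scores):
    """Average of the precision values taken at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical held-out labels (1 = fraud) and scores from two systems.
y_true = [1, 0, 1, 0]
results = {
    "system_a": average_precision(y_true, [0.9, 0.8, 0.7, 0.1]),
    "system_b": average_precision(y_true, [0.9, 0.2, 0.8, 0.1]),
}
best = max(results, key=results.get)  # name of the higher-scoring system
```

Evaluating every system on the same held-out labels keeps the comparison fair, even though the unsupervised system never sees labels during training.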

Conclusion / Q&A - 10 minutes