Live Online Training

Security for machine learning

Protecting your data and models from harm

Katharine Jarmul

As we deploy more models into live settings and train models online, there's a growing need to secure our data preparation code and the models it produces. Protecting models against malicious input, poisoning attacks, and attempts to extract data is an essential part of deploying machine learning today.

Join expert Katharine Jarmul for a hands-on, in-depth exploration of security best practices for machine learning. You'll get an introduction to key concepts for deploying and securing your machine learning models, along with practical tools for testing their robustness. Along the way, Katharine walks you through approaches for securing models at the source, including obfuscation and anonymization strategies. Don't miss this chance to explore key topics, tools, and research related to security best practices for machine learning systems and learn how to implement them in your current workflows.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • The vulnerabilities of current machine learning training and deployment practices
  • How to safeguard your models against attacks
  • Methods to increase data security and privacy in your ML system

And you’ll be able to:

  • Utilize open source Python or shell-based tools for testing model robustness
  • Determine how to integrate privacy and security into your data workflows
  • Evaluate potential data security issues in your current data extraction and management workflows

This training course is for you because...

  • You’re a data scientist or data engineer with at least one year of experience, and you need to secure your machine learning models and deployments and create a more privacy-aware approach for your data science workflows.

Prerequisites

  • An intermediate knowledge of Python
  • Experience working with machine learning tools in Python

About your instructor

  • Katharine Jarmul is a data analyst based in Berlin, Germany. She has been wrangling data with Python since 2008 for both small and large companies. Her passions include automated data workflows, natural language processing, and data testing. She is coauthor of Data Wrangling with Python and has authored several O'Reilly video courses focused on data analysis with Python.

Schedule

The timeframes below are estimates and may vary according to how the class is progressing.

Machine learning and security (10 minutes)

  • Lecture: Real-world machine learning security problems; how increased machine learning use has affected security best practices
  • Hands-on exercise: Discuss current practices you follow for securing your data science pipelines and models

Adversarial ML: Deep learning (30 minutes)

  • Lecture: Open source tools for evaluating attacks on machine learning models, focusing on adversarial examples against deep learning models
  • Hands-on exercise: Create adversarial computer vision examples for a deep learning model in Jupyter notebooks on JupyterLab (a minimal sketch follows below)
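
For reference, here's a minimal sketch of one common way to craft such examples, the fast gradient sign method (FGSM), written in PyTorch. The model, images, and labels names are placeholders, not the course's actual notebook code:

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, images, labels, epsilon=0.03):
        """Perturb images in the direction that most increases the loss."""
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        # Step epsilon along the sign of the input gradient, then clip
        # back to the valid pixel range.
        adversarial = images + epsilon * images.grad.sign()
        return adversarial.clamp(0.0, 1.0).detach()

    # Hypothetical usage: adv = fgsm_attack(model, images, labels, epsilon=8/255)

A small epsilon keeps the perturbation nearly imperceptible to humans while often flipping the model's prediction.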

Adversarial ML: Poisoning (30 minutes)

  • Lecture: Open source tools for evaluating attacks on machine learning models, focusing on data poisoning attacks
  • Hands-on exercise: Implement a poisoning attack on a spam detection model in Jupyter notebooks on JupyterHub and as an open source repository (a minimal sketch follows below)
  • Break (5 minutes)
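
For reference, a toy label-flipping poisoning sketch using scikit-learn; the corpus and flip rate here are illustrative, not the course's dataset:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Toy corpus; 1 = spam, 0 = ham.
    texts = ["win money now", "meeting at noon", "free prize inside",
             "lunch tomorrow", "claim your free money", "project status update"]
    labels = np.array([1, 0, 1, 0, 1, 0])

    # The attacker flips some spam labels to ham before training.
    rng = np.random.default_rng(0)
    poisoned = labels.copy()
    flipped = rng.choice(np.where(labels == 1)[0], size=2, replace=False)
    poisoned[flipped] = 0

    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    clean = MultinomialNB().fit(X, labels)
    dirty = MultinomialNB().fit(X, poisoned)

    query = vec.transform(["free money prize"])
    print("clean:", clean.predict(query), "poisoned:", dirty.predict(query))

In a real pipeline the flipped labels would arrive through a compromised labeling or feedback loop rather than direct access to the training arrays.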

Protecting your data (10 minutes)

  • Lecture: Real-world data security issues relating to machine learning
  • Hands-on exercise: Discuss best practices for information and data security (see the pseudonymization sketch below)
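
One widely used safeguard the discussion can build on is pseudonymizing direct identifiers before they enter the pipeline. A minimal sketch using Python's standard library; the field names and environment variable are hypothetical:

    import hashlib
    import hmac
    import os

    # The key must live outside the dataset (e.g., in a secrets manager);
    # "PSEUDO_KEY" is an illustrative name, not a standard.
    SECRET_KEY = os.environ.get("PSEUDO_KEY", "change-me").encode()

    def pseudonymize(identifier: str) -> str:
        """Replace a direct identifier with a keyed (HMAC-SHA256) hash."""
        return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

    record = {"email": "user@example.com", "label": "spam"}
    record["email"] = pseudonymize(record["email"])

Unlike a plain hash, the keyed construction resists dictionary attacks as long as the key stays secret.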

Data and model extraction (30 minutes)

  • Lecture: Ways adversaries can extract data and model design from model APIs
  • Hands-on exercise: Experiment with example model extraction attacks against a classifier (a minimal sketch follows below)
  • Break (5 minutes)
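
For reference, a self-contained sketch of the core extraction idea: train a surrogate on a victim model's predictions. Everything here (the victim, the data) is synthetic; a real attack would query a remote model API instead:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    # The "victim": a model the attacker can query but not inspect.
    X_secret = rng.normal(size=(200, 5))
    y_secret = (X_secret[:, 0] + X_secret[:, 1] > 0).astype(int)
    black_box = LogisticRegression(max_iter=1000).fit(X_secret, y_secret)

    # The attacker only sees predictions on inputs of their choosing.
    X_queries = rng.normal(size=(500, 5))
    y_stolen = black_box.predict(X_queries)

    # Fit a local surrogate to the stolen input/output pairs.
    surrogate = DecisionTreeClassifier().fit(X_queries, y_stolen)

    X_test = rng.normal(size=(200, 5))
    agreement = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
    print(f"surrogate matches the victim on {agreement:.0%} of test inputs")

High agreement means the attacker has effectively copied the model's decision boundary without ever seeing its parameters or training data.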

Model defenses (40 minutes)

  • Lecture: The latest research on adversarial and extraction defenses and how to apply it; building a model from sensitive data using current approaches to adversarial training and data protection
  • Hands-on exercise: Build a more secure model in a Jupyter notebook using input validation, feature squeezing, or adversarial regularization (a feature-squeezing sketch follows below)
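
As one concrete example, here's a sketch of feature squeezing (Xu et al.) used for adversarial-input detection; model is a placeholder for any classifier exposing predict_proba over flattened pixel features, and the threshold is illustrative:

    import numpy as np

    def squeeze_bit_depth(x, bits=4):
        """Reduce [0, 1] float pixels to 2**bits discrete levels."""
        levels = 2 ** bits - 1
        return np.round(x * levels) / levels

    def looks_adversarial(model, x, threshold=0.5):
        """Flag inputs whose predictions change sharply under squeezing."""
        p_raw = model.predict_proba(x)
        p_squeezed = model.predict_proba(squeeze_bit_depth(x))
        # A large L1 distance between the two probability vectors suggests
        # the input sits in a brittle, possibly adversarial region.
        return np.abs(p_raw - p_squeezed).sum(axis=1) > threshold

Legitimate inputs usually survive squeezing with near-identical predictions, while gradient-based perturbations often do not.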

Wrap-up and Q&A (20 minutes)

  • Lecture: Future research and new pursuits in data protection