O'Reilly Live Online training

Mastering pandas

Powered by Jupyter

Visualization, missing data, and pivoting

Matt Harrison

The pandas library allows you to perform data ingestion, exporting, transformation, and visualization with ease. As a result, it's very popular among data scientists, quants, Excel junkies, and Python developers.

Matt Harrison leads a deep dive into some advanced features of pandas, such as plotting, the integration with matplotlib, and filtering data. Using the Jupyter Notebook, you'll load data, inspect it, tweak it, visualize it, and do some analysis with only a few lines of code. By the end of this three-hour hands-on training, you’ll be able to use the split-apply-combine paradigm with GroupBy and pivot and be familiar with stacking and unstacking data.

Special Note: This course is paired with Getting started with pandas: Data ingesting, tweaking, and summarizing. Although these courses are designed to be taken in either order, we suggest you take that course first for best results.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • How to use Jupyter to interact with Python scripts
  • How pandas can make life easier for data scientists and programmers

And you’ll be able to:

  • Load, inspect, tweak, visualize, and analyze data with pandas
  • Use the split-apply-combine paradigm with GroupBy and pivot
  • Get help when you get stuck
  • Debug problems that come up while doing analytics with pandas

This training course is for you because...

  • You're a data scientist with experience in R or SAS who wants to learn about pandas and the Python ecosystem.
  • You're a developer with programming experience in Python who wants to use some of the more advanced features in pandas.

Prerequisites

  • All of the coding exercises in the course will be hosted on JupyterHub, and we'll send the URL out at the start of class. The environment is purely browser-based, so no installation is required.

  • Alternatively, if you would like to set up your machine locally, you'll need Anaconda (https://www.continuum.io/downloads) for Python 3.6 or above and the Jupyter Notebook (https://jupyter.org/install.html) installed.

Recommended preparation:

About your instructor

  • Matt runs MetaSnake, a Python and Data Science training and consulting company. He has over 15 years of experience using Python across a breadth of domains: Data Science, BI, Storage, Testing and Automation, Open Source Stack Management, and Search.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Plotting (25 minutes)

  • Lecture: The integrated plotting functionality in pandas; using it to visually inspect your data
  • Hands-on exercise: Use plotting to inspect your data
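
As a rough illustration of the kind of plotting this segment covers (not taken from the course materials), the sketch below draws a line chart straight from a DataFrame. The `sales` data and its column names are invented for the example, and the `Agg` backend is selected only so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd

# Hypothetical data, invented for illustration
sales = pd.DataFrame(
    {
        "month": [1, 2, 3, 4, 5, 6],
        "units": [12, 15, 11, 20, 18, 25],
    }
)

# DataFrame.plot delegates to matplotlib and returns a matplotlib Axes
ax = sales.plot(x="month", y="units", kind="line", title="Units sold per month")

# The returned Axes can be adjusted with ordinary matplotlib calls
ax.set_ylabel("units")
```

In a notebook the chart typically renders inline; the returned `ax` is a regular matplotlib object, so further matplotlib customization applies to it.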

Filtering (25 minutes)

  • Lecture: Digging into your data (slicing and dicing as needed)
  • Hands-on exercise: Filter your data
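
A minimal sketch of common filtering idioms, assuming a small made-up DataFrame (`df` and its columns are hypothetical, not course data). It shows a boolean mask, `.loc` with a mask and a column label, and `.query`.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune", "Lima"],
        "temp_c": [4.0, 19.5, 31.0, 21.0],
        "rainy": [True, False, False, True],
    }
)

# Boolean mask: combine conditions with & or |, parenthesizing each one
warm_and_dry = df[(df["temp_c"] > 15) & (~df["rainy"])]

# .loc selects rows by mask and a column by label in one step
lima_temps = df.loc[df["city"] == "Lima", "temp_c"]

# .query expresses a similar filter as a string
warm = df.query("temp_c > 15")
```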

Break (10 minutes)

Dealing with NaN (25 minutes)

  • Lecture: Examining and dealing with missing values
  • Hands-on exercise: Deal with NaN
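
One possible way to examine and handle missing values is sketched below. The `sensor` and `reading` columns are hypothetical, and filling with the column mean is only one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing readings
df = pd.DataFrame(
    {
        "sensor": ["a", "b", "c", "d"],
        "reading": [1.0, np.nan, 3.0, np.nan],
    }
)

# Count missing values in each column
missing_per_column = df.isna().sum()

# Option 1: drop rows where "reading" is missing
dropped = df.dropna(subset=["reading"])

# Option 2: fill missing readings with the mean of the observed ones
filled = df.assign(reading=df["reading"].fillna(df["reading"].mean()))
```

Dropping keeps only observed rows, while filling keeps every row at the cost of inventing values, so the right choice depends on the analysis.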

Grouping (30 minutes)

  • Lecture: pandas advanced features for grouping; grouping your data by various features; aggregating and returning the results
  • Hands-on exercise: Group your data
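
The split-apply-combine idea can be sketched with `groupby` as follows. The `orders` data is invented for illustration: rows are split by `region`, aggregations are applied to `amount`, and the results are combined into one summary table.

```python
import pandas as pd

# Hypothetical data, invented for illustration
orders = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west", "east"],
        "amount": [10, 20, 30, 40, 50],
    }
)

# Split by region, apply several aggregations, combine into one DataFrame
summary = orders.groupby("region")["amount"].agg(["sum", "mean", "count"])
```

Here `summary` is indexed by region, with one column per aggregation.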

Break (10 minutes)

Pivoting (30 minutes)

  • Lecture: Programmatically creating pivot tables; combining them with grouping to easily summarize your data
  • Hands-on exercise: Use pivoting
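
A minimal sketch of building a pivot table programmatically, assuming a made-up `sales` DataFrame (all names are hypothetical). It also shows how a comparable table can be produced with grouping followed by `unstack`.

```python
import pandas as pd

# Hypothetical data, invented for illustration
sales = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "east"],
        "product": ["pen", "ink", "pen", "ink", "pen"],
        "amount": [10, 20, 30, 40, 50],
    }
)

# Rows are regions, columns are products, cells hold summed amounts
table = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)

# A comparable summary via groupby, moving "product" into the columns
via_groupby = sales.groupby(["region", "product"])["amount"].sum().unstack()
```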

Stacking (25 minutes)

  • Lecture: Using stacking to enable easy plotting of multiple variables
  • Hands-on exercise: Use stacking
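
Stacking and unstacking can be sketched as below, using an invented `wide` table of daily measurements (the names are hypothetical). `stack` moves the column labels into an inner index level, giving a long form, and `unstack` reverses the move.

```python
import pandas as pd

# Hypothetical wide data: one column per variable
wide = pd.DataFrame(
    {"temp": [20, 22], "humidity": [30, 35]},
    index=pd.Index(["mon", "tue"], name="day"),
)

# Long form: a Series indexed by (day, variable)
long_form = wide.stack()

# Back to wide form: the inner index level becomes columns again
back = long_form.unstack()
```

Switching between the two shapes is often what makes it convenient to plot several variables together, since plotting methods treat each column as its own series.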