Live Online Training

Programming with Data: Advanced Python and Pandas

Daniel Gerlanc

Do you use Pandas in your daily workflow but wonder if the advanced features of the library could accelerate your analyses? In this training, you will learn how to solve complex data manipulation problems using Python and advanced features of Pandas.

We will focus on two classes of problems and learn how to solve each with Pandas. First, we review data manipulations that would be challenging to achieve without a SQL execution engine or a significant investment in custom tooling. Second, we review problems that are difficult to solve with SQL. Together, these include merging and joining datasets with appropriate handling of missing values, reshaping data from wide to long format, and manipulating time series.

Having completed this workshop, you will be ready to use advanced Pandas functionality in your own analyses.

What you'll learn and how you can apply it

  • Perform advanced merges, including combining daily data with irregular-frequency data, e.g. one-time events
  • Transform data between “wide” and “long” formats and generate pivot tables
  • Filter, upsample, downsample, and compute on time series data
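
To give a flavor of the first item, here is a minimal sketch (not taken from the course notebooks) of combining daily data with irregular, one-time events. The `daily` and `events` frames and their columns are made up for illustration. An exact left merge keeps an event only on the day it occurred, while `pd.merge_asof` carries the most recent event forward to later days.

```python
import pandas as pd

# Hypothetical daily data: one row per calendar day.
daily = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=5, freq="D"),
    "price": [10.0, 10.5, 10.2, 10.8, 11.0],
})

# Hypothetical irregular data: two one-time events.
events = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-04"]),
    "rating": ["hold", "buy"],
})

# Exact-match left join: rating is missing on days without an event.
exact = daily.merge(events, on="date", how="left")

# As-of join: each day gets the latest event on or before that day.
# Both frames must be sorted by the key column.
asof = pd.merge_asof(daily, events, on="date")

print(exact)
print(asof)
```

Which variant is appropriate depends on whether an event should apply only to its own day or persist until the next event.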

This training course is for you because...

  • You have a solid understanding of Python programming and experience using series and data frames in Pandas
  • You want to apply advanced Pandas data manipulation techniques to your own analyses
  • You are an experienced Pandas user but want to refresh your knowledge and keep up-to-date on new features added in recent Pandas versions

Prerequisites

  • Intermediate-level understanding of Pandas data structures, equivalent to Programming with Data: Python and Pandas (live online training course with Daniel Gerlanc)
  • Intermediate-level programming ability in Python. Attendees should know the difference between a dict, list, and tuple. Familiarity with control flow (if/else/for/while) and error handling (try/except) is required.
  • No statistics background is required.

Course Set-up:

  • Step-by-step instructions for setting up a working Python environment using Anaconda are available here. You will need a working environment to complete the exercises in a Jupyter notebook. Alternatively, you may view the notebooks here.

Recommended Preparation:

Recommended Follow-up:

About your instructor

  • Daniel Gerlanc is the Founder and President of EnPlus Advisors, a consultancy specializing in data science and custom software development. He started EnPlus in 2011 after working as a hedge fund quant for 5 years. At EnPlus, he focuses on projects that require expertise in both data analysis and software engineering. He has coauthored several open source R packages, published in peer-reviewed journals, and been an invited speaker at conferences including ODSC and PGConf. He is a graduate of Williams College.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Segment 1: Merge, Join, & Combine (45 min)

  • Training Overview (5 min)
  • Introduction to Types of Joins (5 min)
  • Using merge for General Purpose Joins (10 min)
  • Concatenation (10 min)
  • Advanced Merging (15 min)
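
As a rough illustration of the topics in this segment (an assumed example, not the instructor's material), the sketch below contrasts an inner and an outer join with `DataFrame.merge` and stacks rows with `pd.concat`. The frames `left` and `right` and the column names are hypothetical.

```python
import pandas as pd

# Two small hypothetical tables sharing a "key" column.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# Inner join: only keys present in both tables.
inner = left.merge(right, on="key", how="inner")

# Outer join: all keys from either table; indicator=True adds a
# "_merge" column recording where each row came from.
outer = left.merge(right, on="key", how="outer", indicator=True)

# Concatenation: stack rows and renumber the index.
stacked = pd.concat([left, left], ignore_index=True)

print(inner)
print(outer)
print(stacked)
```

The `_merge` column is a convenient way to audit which rows failed to match before deciding how to handle the resulting missing values.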

Segment 1 Exercises (15 min)

  • Instructor demonstrates solving exercises (10 min)
  • Break (length: 5 min)

Segment 2: Advanced Merging & Reshaping (45 min)

  • Advanced Merging: Grouped and Ordered Data (15 min)
  • Reshaping Overview (5 min)
  • Wide to Long and Long to Wide Reshaping (7.5 min)
  • Convenience methods for reshaping (7.5 min)
  • Pivot Tables (10 min)
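
The reshaping topics above can be sketched as follows. This is a minimal example under assumed data (the `wide` table of yearly sales per city is invented): `melt` goes from wide to long, `pivot` goes back, and `pivot_table` aggregates while reshaping.

```python
import pandas as pd

# Hypothetical wide table: one row per city, one column per year.
wide = pd.DataFrame({
    "city": ["Boston", "Denver"],
    "2022": [100, 200],
    "2023": [110, 190],
})

# Wide to long: one row per (city, year) pair.
long = wide.melt(id_vars="city", var_name="year", value_name="sales")

# Long to wide: years become columns again.
back = long.pivot(index="city", columns="year", values="sales")

# Pivot table: aggregate sales per city across years.
totals = long.pivot_table(index="city", values="sales", aggfunc="sum")

print(long)
print(back)
print(totals)
```

`pivot` raises an error when an index/column pair appears more than once, whereas `pivot_table` resolves duplicates with the aggregation function.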

Segment 2 Exercises (15 min)

  • Instructor demonstrates solving exercises (10 min)
  • Break (length: 10 min)

Segment 3: Time Series (35 min)

  • Creating Time Series (5 min)
  • Selecting from Time Series (5 min)
  • Lead/Lag Operations (5 min)
  • Resampling (5 min)
  • Filling in Missing Data (5 min)
  • Aligning Time Series (5 min)
  • Rolling Calculations (5 min)
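
For orientation, here is a small sketch of several operations named in this segment, using an invented daily series. It is one possible illustration under assumed data, not the course's own code.

```python
import pandas as pd

# Creating a time series: six hypothetical daily observations.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Selecting: label slices on a DatetimeIndex include both endpoints.
subset = s.loc["2024-01-02":"2024-01-03"]

# Lag: shift values forward by one period (first value becomes NaN).
lagged = s.shift(1)

# Downsampling: sum over consecutive two-day bins.
two_day = s.resample("2D").sum()

# Filling in missing data: a series with a missing day is put on a
# regular daily grid, then the gap is forward-filled.
gappy = s.drop(pd.Timestamp("2024-01-03"))
filled = gappy.asfreq("D").ffill()

# Rolling calculation: three-day moving average.
rolling = s.rolling(window=3).mean()

print(two_day)
print(filled)
print(rolling)
```

Forward-filling is only one strategy for gaps; interpolation or leaving values missing may be more appropriate depending on the analysis.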

Segment 3 Exercises (15 min)

  • Instructor demonstrates solving exercises (10 min)