Live Online training

Data Visualization with matplotlib and seaborn

Fundamental and advanced techniques

Bruno Gonçalves

In this tutorial we will discuss how our eyes and visual cortex process colors and shapes, and how we can use that knowledge to our advantage. Ideas and concepts will be presented in an intuitive and practical way, with references to the more technical descriptions and explanations available in the relevant scientific literature.

Matplotlib is the workhorse of visualization in Python. It underlies many of the other major Python visualization packages and is particularly well integrated into the Jupyter ecosystem. Mastering it is a fundamental requirement for proficiency in Python data visualization.

Seaborn, on the other hand, is a more recent package that builds on top of matplotlib and simplifies some of the most common use cases, making you more productive. We will cover both tools through practical examples and highlight the main differences and advantages of each one.

My approach is to first explain what makes a good visualization and how to choose the correct way to display the data, and then show how to implement these ideas in practice.

What you'll learn and how you can apply it

  • Fundamentals of human perception and data visualization
  • Effective use of matplotlib and seaborn
  • How to use seaborn and matplotlib together for best results

This training course is for you because...

  • You want to efficiently communicate your analysis results to decision makers
  • You want to develop attractive and useful graphs and charts
  • You want to correctly use data visualization techniques to better explore your datasets
  • You’re an academic or industry data scientist in charge of generating visualizations for stakeholders

Prerequisites

  • Familiarity with Python

Course Set-up

  • Python 3.5+ with matplotlib, numpy, scipy and seaborn installed
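
One way to confirm the set-up before class is sketched below. It is a minimal check that uses only the standard library; the helper names `missing_packages` and `python_ok` are made up for this illustration and are not part of the course materials.

```python
import importlib.util
import sys

# Packages listed in the course set-up
REQUIRED = ["matplotlib", "numpy", "scipy", "seaborn"]


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 5)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info >= minimum


print("Python version OK:", python_ok())
print("Missing packages:", missing_packages(REQUIRED) or "none")
```

Any package reported as missing can usually be installed with `pip` or `conda`, depending on how your Python environment is managed.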

Recommended Preparation

If you need to brush up on Python:

About your instructor

  • Bruno Gonçalves is currently a Senior Data Scientist working at the intersection of Data Science and Finance. Previously, he was a Data Science fellow at NYU's Center for Data Science while on leave from a tenured faculty position at Aix-Marseille Université. Since completing his PhD in the Physics of Complex Systems in 2008 he has been pursuing the use of Data Science and Machine Learning to study Human Behavior. Using large datasets from Twitter, Wikipedia, web access logs, and Yahoo! Meme, he studied how we can observe both large scale and individual human behavior in an unobtrusive and widespread manner. The main applications have been to the study of Computational Linguistics, Information Diffusion, Behavioral Change and Epidemic Spreading. In 2015 he was awarded the Complex Systems Society's Junior Scientific Award for "outstanding contributions in Complex Systems Science" and in 2018 he was named a Science Fellow of the Institute for Scientific Interchange in Turin, Italy.

Schedule

The timeframes are only estimates and may vary depending on how the class progresses.

Segment 1 - Human Perception (Length: 30 min)

  • Understanding Color Theory
  • Overview of Human Vision
  • Examples and Explanations of Optical Illusions

Break (10 min)

Segment 2 - Analytical Design (Length: 40 min)

  • Understand the fundamental principles of analytical design
  • Describe the fundamental tools of visualization
  • Explore the advantages and disadvantages of different chart types

Break (10 min)

Segment 3 - Matplotlib (Length: 60 min)

  • Understand the fundamental components of a matplotlib plot
  • Explore the matplotlib API
  • Implement different chart types
  • Combine charts using subplots
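
As a rough preview of the topics in this segment, the sketch below builds one Figure holding two Axes with `plt.subplots` and draws a different chart type on each. The data, titles, and labels are invented for illustration and are not taken from the course notebooks. The `Agg` backend line is only there so the snippet runs without a display; inside Jupyter it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data only
x = np.linspace(0, 2 * np.pi, 100)

# One Figure containing a 1x2 grid of Axes
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

# Left Axes: a line chart with labels and a legend
axes[0].plot(x, np.sin(x), label="sin(x)")
axes[0].set_title("Line chart")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].legend()

# Right Axes: a bar chart over three categories
axes[1].bar(["a", "b", "c"], [3, 5, 2])
axes[1].set_title("Bar chart")

# Adjust spacing so titles and labels do not overlap
fig.tight_layout()
```

The Figure is the overall canvas and each Axes is an individual plotting area with its own title, labels, and artists. Calling methods on the Axes objects directly, rather than relying on the implicit `plt.*` state, tends to make multi-panel figures easier to reason about.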

Break (10 min)

Segment 4 - Seaborn (Length: 50 min)

  • Understand the structure of seaborn
  • Understand the differences from matplotlib
  • Explore the seaborn API
  • Combine seaborn and matplotlib