O'Reilly logo
live online training icon Live Online training

Python-Powered Spreadsheets Beyond the Basics

enter image description here

Next Steps in Reading, Writing, and Formatting Data

Jason Graham

Excel is ubiquitous in the business world, and whether we admit it or not, we all have spreadsheets that are “critical” to the organization. When it comes to analyzing and reporting on data, it’s the tool of choice for a majority of business and data professionals. We use it for a range of projects--from personal budgets to mini-databases to complex data analysis tools.

But Excel, by itself, doesn’t solve every problem, like weekly reports that take too long to build, format, and check for errors. Perhaps you’ve tried templates, complex formulas, even VBA. None of those options provide the ideal solution: Quick, dynamic, automated, nearly error-free reports that can be built in a timely fashion. That’s where Python comes in.

This course goes beyond the basics of reading, writing, and formatting spreadsheets using Python, Pandas, and XLSX Writer. In today’s business world, source data doesn’t always come from spreadsheets, so we will cover reading data from various outside sources like text files, PDFs, and web pages. We will dive into cleaning, editing, and formatting bad data. Then we will explore more charting and graphing capabilities in Excel, all of which will be powered by Python and Pandas. Finally, we will review using Jupyter Notebooks with Plotly as an alternative to Excel reports.

What you'll learn-and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Reading data from text files, PDFs, HTML tables, and web APIs
  • Intermediate data cleaning and editing in Pandas
  • Intermediate spreadsheet formatting with Python
  • Intermediate spreadsheet graphs and charts with Python
  • Using Jupyter Notebooks and Plotly as an alternative to Excel

And you’ll be able to:

  • Read data from a variety of external sources
  • Clean and edit complex data and turn it into usable information
  • Build more complex graphs and charts

This training course is for you because...

  • You are a business professional who relies on spreadsheets for reporting and analytics
  • You need to convert data from text files, PDFs, or web pages to spreadsheets
  • You have spreadsheets that are more “formula” than data
  • You need the power of formulas and VBA without the complexity
  • You’re interested in automating reports but don’t know how
  • You combine data from multiple sources for your spreadsheets
  • You want to reduce the amount of time you spend creating and editing spreadsheets

Prerequisites

  • Basic familiarity with Python and Pandas
  • Basic familiarity with reading and writing spreadsheets using Pandas and XLSX Writer

Recommended preparation:

Required Setup:

  • Materials will be delivered using O’Reilly’s JupyterHub service, so no advance setup is needed

Recommended follow-up:

About your instructor

  • Jason Graham is Software Engineer at GoMoto, Inc. He's been using Excel for more than a decade to perform Data Analysis and Information Management and builds web-based Data Analysis tools for end-users with Python. When he isn't building analysis tools, he trains others in Data Analysis and Python. Jason is an eight-year veteran of the U.S. Marines, 12-year veteran of the U.S. Army, and a graduate of the Marines' Formal School Instructor Course. Prior to coming to GoMoto, he was a Senior System Engineer at Cerner, and four years teaching Civil Information Management in the Army Reserves.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing

Course Overview (10 minutes)

  • Introductions and welcome

Reading Old Data: Flat Files (45 minutes)

  • Reading data from text files and PDFs
  • Handling non-standard text files and PDFs
  • Writing more than just DataFrames to spreadsheets
  • Bar graphs
  • Exercise: Building a bar graph
  • Q&A
  • Break (5 minutes)

Getting Online: Reading Data from the Web (45 minutes)

  • Converting HTML tables to Pandas DataFrames
  • Getting data from web APIs
  • Date, time, and timezone cleaning and formatting
  • Line graphs
  • Exercise: Building a line graph
  • Q&A
  • Break (5 minutes)

Into the Cloud: An introduction to Databases in Python and Pandas (45 minutes)

  • SQL: Retrieving data from relational databases
  • NoSQL: Retrieving data from non-relational databases
  • Writing multiple tables to one sheet
  • Multi-line graphs
  • Labeling your graphs
  • Exercise: Building a multi-line graph
  • Q&A
  • Break (5 minutes)

Beyond Excel: Interactive Graphs and Charts in Jupyter Notebooks (70 minutes)

  • Jupyter Notebooks as a Presentation Device
  • Customizing your notebook
  • Formatting tables in Jupyter
  • Break (5 minutes)
  • Interactive charts and graphs with Python and Plotly

Final Q&A (10 minutes)