Live Online Training

Data Analysis Paradigms in the Tidyverse

Case studies in identifying and implementing data analysis paradigms.

Rick Scavetta

This course shows that data analysis follows recurring paradigms that the tidyverse ecosystem is well suited to address. After taking this course, participants will be able to identify these recurring patterns and trace the path from raw data to results using real-world examples and a small collection of useful functions. Participants will understand the close link between data structure and these functions, and how that link supports efficient data analysis they can apply directly to their own data.

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Essential tidyverse functions and data analysis paradigms
  • The purpose of different packages and how they work in concert

And you’ll be able to:

  • Identify recurring themes in data analysis workflows, moving from raw data to completed results in an efficient manner
  • Dig deeper into the tidyverse packages not covered in the course and understand how they all fit together to build an ecosystem

This training course is for you because...

  • You have existing R scripts that you need to modify.
  • You have been using only base R and now see the need to expand your knowledge to include the tidyverse.

Prerequisites

  • Basic knowledge of RStudio
  • Fundamentals of base R: functions, objects, indexing, and logical expressions
  • Most common data structures in R: vectors, lists and data frames

Materials or downloads required in advance of the course:

  • An RStudio account is needed. RStudio Cloud projects will be provided. At the moment this service is free and undergoing alpha testing. Participants will log into a web-based RStudio Cloud instance so that no additional software needs to be installed (Link to be provided).
  • Datasets used will be built-in datasets available in R or provided via a GitHub repository.
  • Quiz questions implemented with the learnr package will be used as supplementary material and can be hosted on the instructor's Shiny server (with public access).

About your instructor

  • Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Session 0 - Intro (20 minutes)

  • Discussion: Take up pre-work exercises
  • Presentation: Limitations of poor data structure and base R: the impetus for the tidyverse
  • Q & A

Session 1 - tidyr, tibble & readr (20 minutes)

  • Presentation: The structure of messy and tidy data
  • Discussion: Identifying problems with data structure
  • Presentation: Tidyverse solutions
  • Exercise: Identify various paradigms in data analysis workflows
  • Q & A
  • {break, 5 mins}
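
As a rough illustration of the messy-versus-tidy distinction covered in this session, the sketch below reshapes a small wide table into a long one. The data and column names are made up for illustration, and it assumes tidyr 1.0.0 or later (for `pivot_longer()`); the course itself may use different functions or datasets.

```r
library(tibble)
library(tidyr)

# Hypothetical "messy" data: one column per measurement day
messy <- tibble(
  subject = c("a", "b"),
  day1    = c(5.1, 4.8),
  day2    = c(5.6, 5.0)
)

# Tidy form: one row per subject-day observation
tidy <- pivot_longer(
  messy,
  cols      = c(day1, day2),
  names_to  = "day",
  values_to = "score"
)

print(tidy)
```

In the tidy version, each variable (subject, day, score) has its own column, which is the structure most tidyverse functions expect.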

Session 2 - dplyr, stringr & (bare minimum) ggplot2 (60 minutes)

  • Presentation: The five verbs, one adjective and punctuation of dplyr
  • Discussion: Building functional sequences with dplyr grammar
  • Exercise: Implementing dplyr
  • Presentation: Select helper functions and variants of summarise and mutate
  • Exercise: Using dplyr helper functions and special variants
  • Presentation: Pattern matching with stringr and plotting as part of the tidyverse
  • Exercise: Cleaning up data and plotting
  • Q & A
  • {break, 5 mins}
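
A minimal sketch of what a dplyr functional sequence can look like, using the built-in `mtcars` dataset. It assumes that the "five verbs" are `filter()`, `select()`, `mutate()`, `summarise()` and `arrange()`, that the "adjective" is `group_by()`, and that the "punctuation" is the pipe `%>%`; the outline does not spell these out, and the column choices here are purely illustrative.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                  # verb: keep rows matching a condition
  select(mpg, cyl, wt) %>%              # verb: keep only these columns
  mutate(wt_kg = wt * 453.592) %>%      # verb: add a column (wt is in 1000 lbs)
  group_by(cyl) %>%                     # "adjective": describe the grouping
  summarise(mean_mpg = mean(mpg),       # verb: aggregate within each group
            n = n()) %>%
  arrange(desc(mean_mpg))               # verb: order rows by mean_mpg, descending

print(result)
```

Each step takes a data frame and returns a data frame, which is what allows the steps to be chained with the pipe.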

Session 3 - purrr & forcats (40 minutes)

  • Presentation: Introduction to iteration
  • Discussion: Use case scenarios for map vs walk
  • Exercise: Coding challenge exercise
  • Discussion: Results and alternative approaches
  • Q & A
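
As a small, hedged illustration of the map versus walk distinction discussed in this session: `map()` and its typed variants are used when the return values matter, while `walk()` is used when a function is called for its side effects. The inputs and the anonymous functions below are made up for illustration.

```r
library(purrr)

# map() applies a function to each element and returns a list
squares <- map(1:3, function(x) x^2)

# map_dbl() does the same but returns a numeric vector
squares_dbl <- map_dbl(1:3, function(x) x^2)

# walk() calls the function for its side effect (here, printing a message)
# and returns its input invisibly
out <- walk(1:3, function(x) message("value: ", x))
```

Because `walk()` returns its input, it can sit in the middle of a pipeline (for example, to save plots to disk) without interrupting the flow of data.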

Wrap-up (10 mins)