O'Reilly Live Online Training

Data analysis paradigms in the R-tidyverse

Understanding and exploiting the tidyverse ecosystem for data science

Rick Scavetta

The relationship between tidyverse packages and base R packages can be complicated. And it’s sometimes difficult to see how tidyverse packages relate to each other—let alone use them in concert.

Join expert Rick Scavetta to explore the recurring paradigms in data analysis that the tidyverse ecosystem is well suited to address. Over three hours, you’ll learn to identify these small recurring patterns as you trace the path from raw data to results using real-world examples and a small collection of useful functions. By the time you’re through, you’ll understand the close link between data structure and these functions as well as how they facilitate efficient data analysis that you can apply directly to your own projects.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Tidyverse functions and data analysis paradigms
  • The purpose of different packages and how they work in concert

And you’ll be able to:

  • Identify recurring themes in data analysis workflows, moving from raw data to completed results in an efficient manner
  • Dig deeper into tidyverse packages not covered in the course and understand how they all fit together to build an ecosystem
  • Determine what each package is useful for and implement the core functions within each

This training course is for you because...

  • You are familiar with base R but lack an overview of the tidyverse.
  • You have existing R scripts that you need to modify.
  • You need a more efficient and human-readable way of dealing with complex data problems.

Prerequisites

  • A basic knowledge of RStudio
  • A working knowledge of base R package fundamentals (functions, objects, indexing, and logical expressions)
  • Familiarity with the most common data structures in R (vectors, lists, and data frames)
  • An RStudio account (you’ll be provided with a web-based RStudio Cloud instance for the course)

Recommended preparation:

About your instructor

  • Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (20 minutes)

  • Lecture: Poor data structure and base R—the impetus for the tidyverse
  • Group discussion: Prework exercises
  • Q&A

tidyr, tibble, and readr (20 minutes)

  • Lecture: The structure of messy and tidy data; tidyverse solutions
  • Hands-on exercises: Identify various paradigms in data analysis workflows
  • Group discussion: Problems with data structure
  • Break (10 minutes)

dplyr, stringr, and (bare minimum) ggplot2 (60 minutes)

  • Lecture: Five verbs, one adjective, and punctuation of dplyr; helper functions and variants of summarize and mutate; pattern matching with stringr and plotting as part of the tidyverse
  • Hands-on exercises: Implement dplyr; use dplyr helper functions and special variants; clean up and plot data
  • Group discussion: Building functional sequences with dplyr grammar
  • Q&A
  • Break (10 minutes)

purrr and forcats (45 minutes)

  • Lecture: Introduction to reiteration
  • Hands-on exercise: Complete a coding challenge
  • Group discussion: Use case scenarios for map versus walk; results and alternative approaches

Wrap-up and Q&A (15 minutes)