Live online training

First steps in data analysis

Revealing the logic and strengths of R for working with data

Rick Scavetta

Are you interested in data science but intimidated by R? Maybe you want to start working with data but have no idea where to begin. If that sounds familiar, this is the course for you.

Expert Rick Scavetta offers an introduction to programming and data science concepts, aimed at absolute beginners. You'll discover the power of R as you explore the language's three core strengths: data manipulation, statistics, and data visualization. Join in to learn how to use R to make your data analysis steps efficient, transparent, and reproducible—the hallmarks of all good scientific practice.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Base R and tidyverse syntax
  • How to use basic functions in the tidyverse to process raw data for typical data analysis questions
  • The most common data structures in R (classes and types) and how they relate to each other
  • How to use logical expressions and indexing to ask specific questions of data
  • Common pitfalls with vectorization and indexing

And you’ll be able to:

  • Complete a basic data analysis workflow
  • Calculate groupwise descriptive statistics
  • Define basic linear models and calculate ANOVAs
  • Draw appropriate and typical visualizations of bivariate data
  • Apply the case study workflow to a different dataset to build on the examples

This training course is for you because...

  • You have a dataset to analyze but have only used GUI-based software so far.
  • You're interested in learning data science but are intimidated by R and have no idea where to start.
  • You want to improve your skills to better your future career prospects.
  • You're a business professional who wants insight into the work of your data science team.

Prerequisites

  • Basic knowledge of data analysis questions and scenarios (e.g., given a dataset, what questions would you ask as either the generator or the recipient of the data?)
  • An RStudio account (you'll be provided with RStudio Cloud projects during the course)

About your instructor

  • Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (20 minutes)

  • Group discussion: Common questions and tasks in data analysis
  • Lecture: What is R?
  • Q&A

R fundamentals (20 minutes)

  • Lecture: Basic R syntax; navigating RStudio
  • Group discussion: Approaching our first dataset
  • Hands-on exercise: Apply knowledge in R
  • Q&A
  • Break (5 minutes)

Case study I (60 minutes)

  • Lecture: Descriptive and inferential statistics; data visualization
  • Hands-on exercise: Apply the class’s solutions
  • Q&A
  • Break (5 minutes)

Case study II (60 minutes)

  • Group discussion: The second dataset; analytical questions
  • Hands-on exercise: Apply previous knowledge to the second dataset
  • Lecture: Further analysis using the second dataset
  • Q&A

Wrap-up and Q&A (10 minutes)