Live online training

Inferential Statistics using R

Reveal the underlying concepts of inferential statistics from the ground up

Rick Scavetta

The big idea of this course is to understand what inferential statistics is, how it operates, and what it can tell us about our data. Key terms such as estimation, confidence intervals, and the p-value will be explored.

At the end of this course, you will understand what the signal-to-noise ratio is and how it functions as a core concept that unites the diverse tests used in inferential statistics. You will have a better understanding of the statistical tests used on a regular basis and be better able to critique published results.

What you'll learn, and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • The role of the Central Limit Theorem
  • What a p-value means and how to interpret it
  • What influences the results of inferential statistics
  • The common themes (such as the signal-to-noise ratio) that unite seemingly disparate tests
  • The key terms in inferential statistics (e.g. error, bias, power, p-values, confidence intervals, normal and t distributions)

And you’ll be able to:

  • Judge the credibility of reported results
  • Identify the common theme underlying all inferential statistics, so that you can advance your skills independently

This training course is for you because...

  • You encounter published reports that use inferential statistics, including p-values and confidence intervals, and are not clear what they mean.
  • You don’t understand the importance of the Central Limit Theorem as the foundation of estimation and hypothesis testing.
  • You have to apply inferential statistics but are unclear as to how to interpret the results or what the various tests are actually doing.

Prerequisites

  • Basic knowledge of R and RStudio
  • An understanding of fundamental concepts in data collection and descriptive statistics:
    • Sampling
    • Randomization
    • Systematic vs. random error, bias
    • Measures of location and spread

Materials or downloads required in advance of the course:

  • An RStudio account is needed. RStudio Cloud projects will be provided. At the moment this service is free and undergoing alpha testing. Participants will log into a web-based RStudio Cloud instance so that no additional software needs to be installed (Link to be provided).
  • Datasets used will be built-in datasets available in R or provided via a GitHub repository.

About your instructor

  • Rick Scavetta has worked as an independent data science trainer since 2012. Operating as Scavetta Academy, Rick has a close and recurring presence at primary research institutes all over Germany, including many Max Planck Institutes and Excellence Clusters, in fields as varied as primatology, earth sciences, marine biology, molecular genetics, and behavioral psychology.

Schedule

The timeframes are only estimates and may vary according to how the class progresses.

Session 0 - Intro (20 minutes)

  • Exercise: Survey results using learnr modules
  • Discussion: Review of key terms from pre-work exercises
  • Exercise: Fundamentals of random sampling and descriptive statistics
  • Q & A

Session 1 - Theoretical Probability Distributions (30 minutes)

  • Presentation: Binomial and Normal distributions
  • Exercise: Exploring distributions and related functions
  • Presentation: Z-scores as a signal:noise ratio
  • Exercise: Calculating z-scores
  • Exercise: Using Q-Q plots to explore distributions
  • Q & A
  • {break, 5 mins}
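
As a rough illustration of the kind of exercise this session covers (a minimal sketch, not the course's own material), the following R code computes z-scores for a simulated sample, calls two of the built-in Normal distribution functions, and draws a Q-Q plot. The sample, its mean of 100, and its standard deviation of 15 are assumptions made up for the example.

```r
# Hypothetical sample: 50 simulated draws from a Normal distribution
set.seed(1)
x <- rnorm(50, mean = 100, sd = 15)

# z-score: distance from the mean (signal) in units of standard deviation (noise)
z <- (x - mean(x)) / sd(x)

# Distribution functions for the standard Normal
p_below <- pnorm(1.96)   # cumulative probability below z = 1.96
q_upper <- qnorm(0.975)  # quantile that leaves 2.5% in the upper tail

# Q-Q plot: points close to the line suggest an approximately Normal sample
qqnorm(x)
qqline(x)
```

By construction, the z-scores have a mean of 0 and a standard deviation of 1, which is what makes them comparable across variables measured on different scales.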

Session 2 - Estimation (30 minutes)

  • Presentation: From the Normal distribution to the Central Limit Theorem (CLT)
  • Exercise: Simulating the CLT
  • Presentation: From the CLT to confidence intervals
  • Exercise: Calculating confidence intervals
  • Q & A
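
The path from the CLT to a confidence interval can be sketched in a few lines of R. This is an illustrative example under assumed settings (an exponential population with mean 1, samples of size 30, 5000 repetitions), not the exercise used in the course.

```r
set.seed(42)

# Draw 5000 samples of size n from a skewed (exponential) population with mean 1
n <- 30
sample_means <- replicate(5000, mean(rexp(n, rate = 1)))

# CLT: the sample means centre on the population mean,
# with a spread close to sigma / sqrt(n)
mean(sample_means)
sd(sample_means)      # compare with 1 / sqrt(n)
hist(sample_means)    # roughly bell-shaped despite the skewed population

# 95% confidence interval for the mean of one sample, using the t-distribution
x  <- rexp(n, rate = 1)
se <- sd(x) / sqrt(n)                               # standard error of the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se
ci
```

The hand-computed interval should agree with the one reported by `t.test(x)$conf.int`, which is a useful check when working through the calculation step by step.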

Session 3 - Hypothesis testing (45 minutes)

  • Presentation: The signal:noise ratio using the CLT and the t-distribution
  • Exercise: Calculating the signal:noise ratio from scratch
  • Presentation: What is a p-value?
  • Exercise: Calculating p-values
  • Presentation: Factors influencing the p-value
  • Q & A
  • {break, 5 mins}
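
One way to compute the signal:noise ratio and its p-value from scratch is sketched below for a one-sample test. The simulated data and the null value `mu0 = 0` are assumptions chosen for illustration only.

```r
set.seed(7)

# Hypothetical sample and null hypothesis value
x   <- rnorm(20, mean = 0.5, sd = 1)
mu0 <- 0

# Signal: how far the sample mean is from the null value
signal <- mean(x) - mu0

# Noise: the standard error of the mean
noise <- sd(x) / sqrt(length(x))

# The t statistic is the signal:noise ratio
t_stat <- signal / noise

# Two-sided p-value: probability of a ratio at least this extreme
# under the null hypothesis, from the t-distribution with n - 1 df
p_value <- 2 * pt(abs(t_stat), df = length(x) - 1, lower.tail = FALSE)
```

The formula shows directly what influences the p-value: a larger difference from the null value, a smaller standard deviation, or a larger sample size each increase the ratio and so reduce the p-value.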

Session 4 - Hypothesis testing in action: t-tests (35 minutes)

  • Presentation: Putting it all together: one-sample, two-sample and paired t-tests
  • Exercise: Calculating t-tests in R
  • Discussion: Understanding how to interpret results
  • Q & A
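
For orientation, the three t-test variants can all be run with base R's `t.test()`. The sketch below uses the built-in `sleep` dataset as an assumed example; the course may use different data.

```r
# Built-in dataset: extra hours of sleep for 10 subjects under two drugs
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# One-sample t-test: is the mean of group 1 different from 0?
one_sample <- t.test(g1, mu = 0)

# Two-sample t-test (Welch, the default): treats the groups as independent
two_sample <- t.test(g1, g2)

# Paired t-test: uses the within-subject differences
paired <- t.test(g1, g2, paired = TRUE)

# A paired t-test is a one-sample t-test on the differences
on_diffs <- t.test(g1 - g2, mu = 0)

one_sample$p.value
two_sample$p.value
paired$p.value
```

In this dataset the paired test gives a much smaller p-value than the two-sample test, because pairing removes the between-subject variation from the noise term.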

Wrap-up (10 minutes)