O'Reilly Live Online Training

Beginning R Programming

Jared Lander

This class is designed to get you started using R from a tidyverse perspective. We start at the very beginning with assigning variables and storing data in a spreadsheet-like manner. Then we read in data from CSV files and Excel spreadsheets. With that foundation, we use the dplyr package for data manipulation, including grouped summaries similar to pivot tables. From there we get a bit more advanced and use purrr to perform computations on lists. At the end we learn about plotting with the legendary ggplot2 package.

What you'll learn and how you can apply it

  • Assigning Variables
  • Reading Data
    • CSV
    • Excel
    • Databases
    • JSON
  • Writing Functions
  • Working with Strings
  • Data Manipulation with dplyr
    • select
    • filter
    • mutate
    • group_by
    • summarize
    • joins
  • Iterating Over Lists with purrr
  • Transforming Data with tidyr
  • Plotting with ggplot2
    • Scatterplots
    • Histograms
    • Violin Plots
    • Faceting

This training course is for:

  • Beginner Data Scientists
  • Recovering Excel Jockeys
  • People New to R

Prerequisites

  • A little prior experience with R, but not much
  • Some familiarity with basic programming concepts
    • Variables
    • Functions

Materials, downloads, or supplemental content needed in advance:

  • R
  • RStudio
  • The following R packages
    • dplyr
    • tidyr
    • purrr
    • readr
    • readxl
    • odbc
    • RSQLite
    • ggplot2
  • Sample Diamonds Database from https://data.world/landeranalytics/diamonds
  • The files at https://data.world/landeranalytics/rforeveryone

About your instructor

  • Jared P. Lander is the Chief Data Scientist of Lander Analytics, a data science and artificial intelligence consulting and training firm based in New York City; the organizer of the New York Open Statistical Programming Meetup (the world’s largest R meetup) and the New York R Conference; the author of R for Everyone; and an adjunct professor at Columbia University. With an M.A. in statistics from Columbia University and a B.S. in mathematics from Muhlenberg College, he has experience in both academic research and industry. Very active in the data community, Jared is a frequent speaker at conferences, universities, and meetups around the world. His writings on statistics can be found at jaredlander.com, and his work has been featured in publications such as Forbes and the Wall Street Journal.

Schedule

The timeframes are only estimates and may vary depending on how the class progresses.

  • Variables
  • Putting tabular data in tibbles
  • Reading data
    • CSV
    • Excel
  • Chaining functions together with pipes
  • Group Manipulation with dplyr
    • select
    • filter
    • slice
    • mutate
    • group_by
    • summarize
    • arrange
  • Plotting with ggplot2
    • scatterplots
    • point shape
    • point size
    • color coding
    • histograms
    • violin plots
    • facets
    • themes