Live online training

Introduction to Statistics for Data Analysis with Python

Learn the fundamentals of statistics by answering real-world questions

Harshit Tyagi

This training session focuses on implementing the fundamental concepts of statistics that every data scientist needs. We'll see how statistics lets us derive insights from raw data to answer real-world questions. For aspiring data scientists, statistics opens the door to every major domain that makes use of data science.

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • Data exploration and visualization
  • Fundamentals of descriptive statistics: mean, median, mode, measures of spread (variance, standard deviation), percentiles, skewness, correlation, etc.
  • Inferential statistics: the basic principles behind using data for estimation and for assessing theories

And you’ll be able to:

  • Explore data using statistics.
  • Build statistical models.

This training course is for you because...

  • You are a programmer or an aspiring data analyst/scientist.
  • You are a beginner in data, ML, or AI with some familiarity with elementary mathematics and Python programming.

Prerequisites

  • Python programming, including pandas and Matplotlib
  • Basic mathematics
  • No prior experience with statistics is necessary

About your instructor

  • Harshit Tyagi is a Full Stack Developer and Data Engineer at Elucidata, a Cambridge-based biotech company. He develops algorithms for research scientists at some of the world’s best medical schools, such as Yale, UCLA, and MIT. Before Elucidata, he worked as a Systems Development Engineer at an investment management firm called Tradelogic, where he designed a framework that analyzes financial news from all prominent sources to produce accurate trading signals. He is a Python evangelist and loves to contribute to tech communities such as Google Developers Groups, Python Delhi User Groups, and other e-learning platforms. Having spent more than three years as a mentor and reviewer on e-learning platforms, he aims to share enterprise-grade practices that produce more skillful data scientists and quantitative traders.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction to Data Visualisation (50 mins)

  • Presentation (15 mins): Learning how to extract and explore data, and understanding what different plots and charts represent.
  • Discussion (5 mins): Which libraries can we use in Python for plotting?
  • Presentation (15 mins): Overview of Python libraries for data handling and plotting, including NumPy, pandas, statsmodels, Matplotlib, and seaborn.
  • Exercise (15 mins): Practice plotting and exploratory data analysis
  • Q&A (5 mins)

Introduction to Descriptive Statistics (50 mins)

  • Presentation (20 mins): Basics of descriptive statistics: measures of central tendency (mean, median, mode), variance, standard deviation, etc.
  • Discussion (10 mins): How can we answer real-world questions using statistics? For example: Who is the best football player in the world?
  • Presentation (15 mins): How does Netflix know what we like? - Percentile, variance, skewness, correlation.
  • Exercise (15 mins): Problem: Should we buy an extended warranty on electrical appliances?
  • Q&A (5 mins)
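
As a rough illustration of the descriptive measures named in this section, the sketch below computes them for a small, made-up data set. It is not course material: the `describe` helper and the `goals` sample are hypothetical, and it uses only Python's standard `statistics` module so that it runs without extra installs. In the session itself, pandas would be the more likely tool.

```python
import statistics


def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    pvar = statistics.pvariance(values)  # population variance (divides by n)
    # Population skewness: third central moment divided by std_dev cubed.
    m3 = sum((x - mean) ** 3 for x in values) / n
    skew = m3 / pvar ** 1.5 if pvar > 0 else 0.0
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": skew,
    }


# Hypothetical data: goals scored by one player in nine matches.
goals = [0, 1, 1, 2, 1, 0, 3, 1, 5]
summary = describe(goals)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A positive skewness here reflects the single high-scoring match pulling the mean above the median, which is one reason the mean alone can mislead when comparing players.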

Basics of Inferential Statistics (60 mins)

  • Presentation (20 mins): Basic principles behind inferential statistics - analyzing categorical and qualitative data, constructing confidence intervals and sampling.
  • Codelab walkthrough (15 mins): Use NumPy, pandas, statsmodels, and seaborn to analyse case studies.
  • Exercise (15 mins): Use the concepts to work on an industry problem.
  • Q&A (10 mins)
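
To give a feel for what constructing a confidence interval involves, here is a minimal sketch that estimates a population mean from a sample using the normal approximation. The function name `mean_confidence_interval` and the sample values are assumptions made for illustration, not part of the course. For small samples a t-distribution is usually the better choice, and that needs a library such as SciPy or statsmodels.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided z critical value (about 1.96 for 95% confidence).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


# Hypothetical sample: daily viewing time in minutes for twelve users.
sample = [42, 55, 38, 61, 47, 50, 44, 58, 49, 53, 40, 57]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```

Raising `confidence` (for example to 0.99) widens the interval, which is the usual trade-off between certainty and precision.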

Take-home exercise:

  1. Exercise: Create a statistical model to recommend the type of insurance to individuals based on their location, occupation, marital status, and many other features.