Live online training

Practical Data Science with Python

Deep dive into multiple aspects of data science with Machine learning and Python

Ahmed Sherif

The data science process is quite broad, so this course breaks it down into categories and areas of specialization that can be adopted and implemented immediately. This training will teach you to improve your ability to solve data problems using visualizations, data wrangling, and machine learning, all under the umbrella of data science, with Python as the medium.

We will begin by identifying a data set with potential for hidden gems that can be extracted through various data science techniques. These techniques will be applied using the following:

  • Data analysis and numerical computing libraries in Python
  • Data visualization libraries in Python
  • Methods for applying machine learning models on data sets, to make future predictions and decisions
  • Learning how to deploy the model for future use

What you'll learn and how you can apply it

In this course, you’ll learn how to:

  • Develop a data science work environment with Jupyter notebooks
  • Perform data wrangling with the pandas library
  • Utilize the numpy library to perform matrix and array manipulation
  • Create machine learning models with the scikit-learn library
  • Interpret the mathematics behind machine learning modeling, and the statistics behind data science
  • Deploy models to a Docker container

This training course is for you because...

The course is aimed at data analysts, data scientists, and data engineers who are looking to enhance their existing skills or develop new data skills with Python.

Prerequisites

  • Python Machine Learning by Packt
  • Experience with data analysis
  • Cursory knowledge of a programming language like Python or R
  • Introductory statistics

Materials, downloads, or Supplemental Content needed in advance

  • An individual Docker account
  • An Anaconda account, or another Python tool that lets you use Jupyter (even a pip install of Jupyter is sufficient on a Mac or Linux machine)

About your instructor

  • Ahmed Sherif is a data scientist who has been working with data in various roles since 2005. He started off with BI solutions and transitioned to data science in 2013. In 2016, he obtained a master's in Predictive Analytics from Northwestern University, where he studied the science and application of machine learning and predictive modeling using both Python and R. As a data scientist, he strives to architect predictive capabilities with big data solutions for companies to better leverage their data and make more informed decisions. Lately, he has been developing machine learning and deep learning solutions on the cloud using Azure. In 2016, he published his first book, Practical Business Intelligence. In 2018, he published his second book, Apache Spark Deep Learning Cookbook. He currently works as a Technology Solution Professional in Data and AI for Microsoft.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

DAY 1

Section 1: Setting up Jupyter Notebook on Anaconda (30 mins)

  • Install Anaconda on your machine
  • Get familiar with all of the features available for performing interactive Python scripting using Anaconda
  • Start your first Jupyter Notebook and Project in Python 3

Section 2: Installing Dependencies for Data Science Libraries (30 mins)

  • Create a virtual Python environment to manage dependencies so that they do not conflict with your system Python environment
  • Install the numpy, requests, Beautiful Soup, pandas, matplotlib, and scikit-learn libraries
  • Check versions and help for libraries to confirm installations
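The version check in Section 2 could be sketched as follows. This is a minimal sketch rather than the course's own material: it assumes the usual PyPI distribution names (for example `beautifulsoup4` for Beautiful Soup and `scikit-learn` for scikit-learn), and the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the libraries listed in Section 2.
PACKAGES = [
    "numpy",
    "requests",
    "beautifulsoup4",
    "pandas",
    "matplotlib",
    "scikit-learn",
]


def installed_versions(packages=PACKAGES):
    """Return {package: version string}, or None where a package is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```

Inside a notebook, `help(module)` or `module?` shows the built-in documentation for any library that imported successfully.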

Break: 10 mins

Section 3: Data Wrangling with pandas (30 mins)

  • Identify an online data source that will make for a good data analysis project
  • Scrape data from the web
  • Import the dataset into a DataFrame using pandas

Lab 1: Data Visualization with matplotlib (30 mins)

  • Create a visualization to identify correlation between fields in the data set
  • Create a visualization to identify outliers

Break: 10 mins

Section 4: Data Visualization with matplotlib (30 mins)

  • Plotting a simple bar and line chart inline within a Jupyter notebook
  • Identifying x and y-axes and labeling them
  • Styling matplotlib chart with comments and color
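As an illustration of the charting topics in this section, here is a minimal sketch of a combined bar and line chart with labeled axes, color, and an annotation. The sales figures are made up for the example, and the `Agg` backend line is only there so the sketch runs as a plain script.

```python
import matplotlib

# Non-interactive backend so this runs outside a notebook;
# omit this line inside Jupyter, where charts render inline.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Made-up illustrative data.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 150, 110]
target = [125, 125, 125, 125]

fig, ax = plt.subplots(figsize=(6, 4))

# Bar chart for the observed values, line chart for the reference level.
ax.bar(months, sales, color="steelblue", label="Sales")
ax.plot(months, target, color="darkorange", marker="o", label="Target")

# Label both axes and give the chart a title.
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.set_title("Monthly sales vs. target")

# Add a text comment pointing at the lowest bar (x position 3 is "Apr").
ax.annotate(
    "Lowest month",
    xy=(3, 110),
    xytext=(1.5, 145),
    arrowprops={"arrowstyle": "->"},
)
ax.legend()
```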

Lab 2: Data Wrangling with pandas (30 mins)

  • Data analysis on a DataFrame using pandas
  • Identifying erroneous data
  • Replacing and Imputing Erroneous data
  • Discuss the concept of ‘Tyranny of the Mean’
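The identify-and-impute steps in this lab might look like the sketch below. It assumes a made-up table in which a negative age marks an erroneous entry, and it fills the gap with the median rather than the mean, since the median is less sensitive to extreme values.

```python
import pandas as pd

# Made-up data: an age of -1 stands in for an erroneous entry.
df = pd.DataFrame(
    {
        "age": [25, 31, -1, 42, 38],
        "city": ["Cairo", "Chicago", "Toronto", "Chicago", "Cairo"],
    }
)

cleaned = df.copy()

# Identify erroneous values: keep valid ages, turn invalid ones into NaN.
cleaned["age"] = cleaned["age"].where(cleaned["age"] >= 0)
n_bad = int(cleaned["age"].isna().sum())

# Impute the missing values with the median of the valid ages.
median_age = cleaned["age"].median()
cleaned["age"] = cleaned["age"].fillna(median_age)

print(f"Replaced {n_bad} erroneous value(s) with median {median_age}")
```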

Break: 10 mins

Section 4: Encoding with numpy (30 mins)

  • Explaining how feature engineering is performed with encoding
  • Converting a DataFrame into a numpy array or matrix
  • Encoding a DataFrame
  • Identifying predictors and labels in dataset
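One possible way to carry out the encoding steps above is one-hot encoding with `pandas.get_dummies`, followed by conversion to numpy arrays. This is a sketch under assumptions: the column names and values are invented, and the course may use a different encoding technique.

```python
import pandas as pd

# Made-up data: "color" is categorical, "sold" is the label to predict.
df = pd.DataFrame(
    {
        "color": ["red", "green", "blue", "green"],
        "size_cm": [10.0, 12.5, 9.0, 11.0],
        "sold": [1, 0, 1, 0],
    }
)

# One-hot encode the categorical predictor; numeric columns pass through.
features = pd.get_dummies(df[["color", "size_cm"]], columns=["color"], dtype=float)

# Predictors (X) and labels (y) as numpy arrays.
X = features.to_numpy()
y = df["sold"].to_numpy()

print(features.columns.tolist())
print(X.shape, y.shape)
```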

Lab 3: Feature Engineering with numpy (30 mins)

  • Optimize encoded arrays with normalization
  • Replace unnormalized arrays with normalized arrays
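Normalization in this lab could take several forms. The sketch below assumes min-max scaling, which rescales each column to the range [0, 1]; the array values are made up.

```python
import numpy as np

# Made-up feature matrix: the two columns are on very different scales.
X = np.array(
    [
        [1.0, 200.0],
        [2.0, 400.0],
        [3.0, 600.0],
    ]
)

# Column-wise min-max scaling: (x - min) / (max - min).
# Note: a constant column would make the denominator zero.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_norm = (X - col_min) / (col_max - col_min)

print(X_norm)
```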

DAY 2

Section 5: Machine Learning Concepts (1 hour)

  • Getting familiar with the machine learning models available in scikit-learn
  • Supervised vs. Unsupervised models
  • Classification vs. Regression vs. Clustering models

Lab 4: Determining Supervised vs. Unsupervised approach (30 mins)

  • Develop a scenario with the current dataset to build out a classification supervised approach
  • Develop a scenario with the current dataset to build out a regression supervised approach
  • Develop a scenario with the current dataset to build out an unsupervised approach

Break: 10 mins

Section 6: Apply Machine Learning model (1.5 hours)

  • Split data into training and test datasets
  • Create a linear regression model in scikit-learn
  • Create a logistic regression model in scikit-learn
  • Create a clustering model in scikit-learn
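The four steps above can be sketched with scikit-learn on synthetic data. Everything about the data here is an assumption made for illustration (random features, a linear target, a simple class rule), and k-means is chosen as one example of a clustering model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: two features, one numeric target, one binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y_clf = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the rows for testing; all arrays are split consistently.
(X_train, X_test,
 y_reg_train, y_reg_test,
 y_clf_train, y_clf_test) = train_test_split(
    X, y_reg, y_clf, test_size=0.25, random_state=0
)

# Regression: predict a continuous value.
lin = LinearRegression().fit(X_train, y_reg_train)

# Classification: predict a class label.
log = LogisticRegression().fit(X_train, y_clf_train)

# Clustering: group rows without using any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

print("Linear regression coefficients:", lin.coef_)
print("Logistic regression test accuracy:", log.score(X_test, y_clf_test))
print("Cluster sizes:", np.bincount(km.labels_))
```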

Lab 5: Evaluate Machine Learning model (30 mins)

  • Evaluate accuracy, precision, recall, and F1 scores of the trained model against the test dataset
  • Interpret model output and results
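To make the four metrics concrete, here is a minimal sketch using scikit-learn's metric functions on hand-written labels. The label arrays are made up; in the lab they would come from the test dataset and the trained model's predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up true labels and model predictions for a binary problem.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Confusion matrix layout for labels [0, 1]: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / all
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy} precision={precision} recall={recall} f1={f1}")
```

Precision and recall diverge when false positives and false negatives are unbalanced, which is why both are reported alongside accuracy.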

Break: 10 mins

Section 8: Deploying Machine Learning model to Docker Container (1 hour)

  • Export model as pickle file
  • Deploy model to a Docker container
  • Generate predictions for new data against the Docker container

Wrap-up: Summary, Discussions (30 mins)

  • Interactive discussion
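The model export step in Section 8 can be illustrated with Python's standard `pickle` module. This sketch covers only the serialization round trip, using a tiny made-up dataset; how the pickled model is packaged into and served from a Docker container is not shown here.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set, just enough to fit a model.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Serialize the trained model to bytes, then restore it.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

# The restored model scores new data exactly like the original.
new_data = np.array([[0.5], [2.5]])
predictions = restored.predict(new_data)
print(predictions)
```

To write a file instead of an in-memory payload, `pickle.dump(model, f)` with a file opened in `"wb"` mode does the same job; the file can then be copied into a container image.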