Practical Data Science with Python
Deep dive into multiple aspects of data science with machine learning and Python
The data science process is broad, so this course breaks it down into categories and areas of specialization that you can adopt and apply immediately. You will learn to solve data problems using visualization, data wrangling, and machine learning, all under the umbrella of data science, with Python as the medium.
We will begin by identifying a data set with potential for hidden gems that can be extracted through various data science techniques. These techniques will be applied using the following:
 Data analysis and numerical computing libraries in Python
 Data visualization libraries in Python
 Methods for applying machine learning models on data sets, to make future predictions and decisions
 Learning how to deploy the model for future use
What you'll learn and how you can apply it
In this course, you’ll learn how to:
 Develop a data science work environment with Jupyter notebooks
 Perform data wrangling with the pandas library
 Use the NumPy library to perform matrix and array manipulation
 Create machine learning models with the scikit-learn library
 Interpret the mathematics behind machine learning modeling and the statistics behind data science
 Deploy models to a Docker container
This training course is for you because...
This course is aimed at data analysts, data scientists, and data engineers who are looking to enhance their existing skills or develop new data skills with Python.
Prerequisites
 Python Machine Learning by Packt
 Experience with data analysis
 Cursory knowledge of a programming language like Python or R
 Introductory statistics
Materials, downloads, or Supplemental Content needed in advance
 An individual Docker Account
 An Anaconda account, or some type of Python IDE that will allow you to use Jupyter (even a pip install of Jupyter is sufficient on a Mac or Linux machine)
About your instructor

Ahmed Sherif is a data scientist who has been working with data in various roles since 2005. He started off with BI solutions and transitioned to data science in 2013. In 2016, he obtained a master's in Predictive Analytics from Northwestern University, where he studied the science and application of machine learning and predictive modeling using both Python and R. As a data scientist, he strives to architect predictive capabilities with big data solutions for companies to better leverage their data and make more informed decisions. Lately, he has been developing machine learning and deep learning solutions on the cloud using Azure. In 2016, he published his first book, Practical Business Intelligence. In 2018, he published his second book, Apache Spark Deep Learning Cookbook. He currently works as a Technology Solution Professional in Data and AI for Microsoft.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
DAY 1
Section 1: Setting up Jupyter Notebook on Anaconda (30 mins)
 Install Anaconda on your machine
 Get familiar with all of the features available for performing interactive Python scripting using Anaconda
 Start your first Jupyter Notebook and Project in Python 3
Section 2: Installing Dependencies for Data Science Libraries (30 mins)
 Create a virtual Python environment to manage dependencies so that they do not conflict with your system Python environment
 Install the NumPy, requests, Beautiful Soup, pandas, matplotlib, and scikit-learn libraries
 Check versions and help for libraries to confirm installations
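The version check in the last step could be sketched as follows, using only the standard library. The distribution names are the usual PyPI names and are an assumption about how the packages were installed; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# PyPI distribution names (assumed); Beautiful Soup is published as "beautifulsoup4".
PACKAGES = ["numpy", "requests", "beautifulsoup4", "pandas", "matplotlib", "scikit-learn"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Inside a notebook, calling `help()` on an imported module or function, for example `help(pandas.read_csv)`, displays its documentation.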
Break: 10 mins
Section 3: Data Wrangling with pandas (30 mins)
 Identify an online data source that will make for a good data analysis project
 Scrape data from the web
 Import the dataset into a DataFrame using pandas
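A minimal sketch of the scrape-then-load flow is shown here. To stay runnable offline it parses a made-up HTML snippet with the standard library's `html.parser` instead of downloading a page; in class, requests and Beautiful Soup would typically take the place of the inline string and the hand-written parser. The table contents, `TableParser`, and `html_table_to_dataframe` are all hypothetical.

```python
from html.parser import HTMLParser

import pandas as pd

# Made-up page content; in practice this might come from requests.get(url).text.
HTML = """
<table>
  <tr><th>item</th><th>price</th></tr>
  <tr><td>A</td><td>1.5</td></tr>
  <tr><td>B</td><td>2.0</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def html_table_to_dataframe(html):
    """Treat the first table row as the header and the rest as data rows."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return pd.DataFrame(body, columns=header)


df = html_table_to_dataframe(HTML)
df["price"] = pd.to_numeric(df["price"])  # scraped cells arrive as strings
print(df)
```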
Lab 1: Data Visualization with matplotlib (30 mins)
 Create a visualization to identify correlation between fields in the data set
 Create a visualization to identify outliers
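As a rough illustration of both lab tasks, the sketch below draws a scatter plot to show correlation and a box plot to expose an outlier. The data is synthetic and the outlier is injected on purpose; none of it comes from the course dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (made up): y roughly follows 2 * x, plus one injected outlier.
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + rng.normal(scale=1.0, size=x.size)
y[10] = 60.0  # deliberately extreme value

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient

fig, (ax_scatter, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: a visible trend suggests the two fields are correlated.
ax_scatter.scatter(x, y)
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_title(f"Scatter plot (r = {r:.2f})")

# Box plot: points beyond the whiskers are drawn individually as fliers.
box = ax_box.boxplot(y)
ax_box.set_ylabel("y")
ax_box.set_title("Box plot of y")

fig.tight_layout()
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped, since figures render inline there.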
Break: 10 mins
Section 4: Data Visualization with matplotlib (30 mins)
 Plotting a simple bar and line chart inline within a Jupyter notebook
 Identifying the x- and y-axes and labeling them
 Styling a matplotlib chart with comments and color
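The three steps above might look like the following sketch, which draws a bar chart and a line chart, labels both axes, sets colors, and adds an annotation. The weekly values are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this also runs outside a notebook
import matplotlib.pyplot as plt

# Made-up values, purely for illustration.
weeks = [1, 2, 3, 4]
values = [10, 14, 9, 17]

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

ax_bar.bar(weeks, values, color="steelblue")
ax_bar.set_xlabel("Week")
ax_bar.set_ylabel("Value")
ax_bar.set_title("Bar chart")

ax_line.plot(weeks, values, color="darkorange", marker="o")
ax_line.set_xlabel("Week")
ax_line.set_ylabel("Value")
ax_line.set_title("Line chart")

# Annotate the highest point as an example of adding a comment to a chart.
peak = values.index(max(values))
ax_line.annotate(
    "peak",
    xy=(weeks[peak], values[peak]),
    xytext=(weeks[peak] - 1.5, values[peak]),
    arrowprops={"arrowstyle": "->"},
)

fig.tight_layout()
```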
Lab 2: Data Wrangling with pandas (30 mins)
 Data analysis on a DataFrame using pandas
 Identifying erroneous data
 Replacing and imputing erroneous data
 Discuss the concept of ‘Tyranny of the Mean’
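One way to picture these steps is the sketch below, which flags an impossible value, compares the mean with the median, and imputes the gaps. It reads the "Tyranny of the Mean" as the mean being dragged by a single extreme value, which is an assumption about how the lab frames the idea; the salary figures are invented.

```python
import numpy as np
import pandas as pd

# Made-up salaries: one missing value, one impossible negative entry, one extreme value.
df = pd.DataFrame({"salary": [40_000, 42_000, np.nan, -1, 45_000, 1_000_000]})

# Treat impossible values as missing so they are imputed together with the real gap.
df["salary"] = df["salary"].mask(df["salary"] < 0)

mean_salary = df["salary"].mean()      # pulled upward by the single extreme value
median_salary = df["salary"].median()  # robust to the extreme value

# Impute with the median so the filled-in rows resemble a typical record.
df["salary_imputed"] = df["salary"].fillna(median_salary)

print(f"mean = {mean_salary:.0f}, median = {median_salary:.0f}")
print(df)
```

Here the mean of the four valid salaries is 281,750 while the median is 43,500, so imputing with the mean would assign the missing rows a value that no typical record comes close to.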
Break: 10 mins
Section 4: Encoding with NumPy (30 mins)
 Explaining how feature engineering is performed with encoding
 Converting the DataFrame into a NumPy array or matrix
 Encoding the DataFrame
 Identifying predictors and labels in the dataset
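A small sketch of these steps follows, assuming one-hot encoding is the encoding meant here (the outline does not name a specific scheme). The column names and values are hypothetical.

```python
import pandas as pd

# Made-up data: "color" is categorical, "size_cm" is numeric, "purchased" is the label.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size_cm": [10.0, 12.5, 9.0, 11.0],
    "purchased": [1, 0, 0, 1],
})

# One-hot encode the categorical column: one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=float)

# Predictors (X) are every column except the label; the label (y) is what we predict.
X = encoded.drop(columns=["purchased"]).to_numpy()
y = encoded["purchased"].to_numpy()

print(encoded.columns.tolist())
print(X.shape, y.shape)
```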
Lab 3: Feature Engineering with NumPy (30 mins)
 Optimize encoded arrays with normalization
 Replace unnormalized arrays with normalized arrays
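Normalization can take several forms; the sketch below assumes min-max scaling, which rescales each column to the range 0 to 1. The function name `min_max_normalize` and the sample values are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero; constant columns become 0
    return (X - col_min) / col_range


# Made-up array whose columns are on very different scales.
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 800.0]])

# Replace the original array with its normalized version.
X = min_max_normalize(X)
print(X)
```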
DAY 2
Section 5: Machine Learning Concepts (1 hour)
 Getting familiar with the machine learning models available in scikit-learn
 Supervised vs. Unsupervised models
 Classification vs. regression vs. clustering models
Lab 4: Determining Supervised vs. Unsupervised approach (30 mins)
 Develop a scenario with the current dataset to build out a supervised classification approach
 Develop a scenario with the current dataset to build out a supervised regression approach
 Develop a scenario with the current dataset to build out an unsupervised approach
Break: 10 mins
Section 6: Apply a Machine Learning Model (1.5 hours)
 Split the data into training and test datasets
 Create a linear regression model in scikit-learn
 Create a logistic regression model in scikit-learn
 Create a clustering model in scikit-learn
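The four steps above can be sketched on synthetic data as follows. The data, the choice of k-means as the clustering model, and all parameter values are assumptions made for illustration; the course dataset and settings may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic predictors with one continuous and one binary target (all made up).
rng = np.random.default_rng(seed=42)
X = rng.normal(size=(200, 2))
y_continuous = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
y_binary = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the rows for testing; the same split is applied to every array.
X_train, X_test, yc_train, yc_test, yb_train, yb_test = train_test_split(
    X, y_continuous, y_binary, test_size=0.25, random_state=0
)

# Supervised regression: predict a continuous value.
regressor = LinearRegression().fit(X_train, yc_train)

# Supervised classification: predict a class label.
classifier = LogisticRegression().fit(X_train, yb_train)

# Unsupervised clustering: group rows without using any label.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

print("R^2 on test data:", regressor.score(X_test, yc_test))
print("Accuracy on test data:", classifier.score(X_test, yb_test))
print("Cluster labels for the first five training rows:", clusterer.labels_[:5])
```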
Lab 5: Evaluate a Machine Learning Model (30 mins)
 Evaluate accuracy, precision, recall, and F1 scores of the trained model against the test dataset
 Interpret model output and results
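For a classifier, the four metrics can be computed with `sklearn.metrics`. The sketch below uses small hand-written label lists (not course data) so each number can be checked by hand against the confusion counts in the comments.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up true labels and predictions for eight test rows.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Confusion counts for this example: TP = 3, FP = 1, FN = 1, TN = 3.
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)   = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)   = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "of the rows predicted positive, how many were right?", while recall answers "of the truly positive rows, how many were found?". The F1 score balances the two.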
Break: 10 mins
Section 8: Deploying a Machine Learning Model to a Docker Container (1 hour)
 Export the model as a pickle file
 Deploy the model to a Docker container
 Generate new predictions by running new data against the model in the Docker container
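The export and reload steps could look like the sketch below, which trains a tiny model, writes it to a pickle file, loads it back, and predicts on new data. The file name `model.pkl` and the toy data are hypothetical, and the Docker image itself is not shown; inside a container, a script would typically perform only the load and predict half.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data following y = 2x + 1 (made up for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"  # hypothetical file name

    # Export: serialize the trained model to disk.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Deploy side: load the model back, as a script inside the container might.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# New data goes to the restored model, not the original object.
prediction = restored.predict(np.array([[4.0]]))
print(prediction)
```

Only unpickle files from a trusted source, and keep the scikit-learn version in the container the same as the one used for training, since pickled models are not guaranteed to load across versions.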
Wrap-up: Summary, Discussions (30 min)
 Interactive Discussion