Live Online Training

Developing a Data Science Project

Three Key Phases to Successful Data Products

Dr. Philip Winder

If you’ve taken my introductory course, you understand the process used to develop a Data Science product, and you know that a range of analytic techniques is used along the way. But we haven’t yet delved into the techniques used in each of its phases. This training introduces the algorithms used to prepare data, model it, and evaluate the resulting models. The goal is to make you aware of the main categories of algorithms; we will not go into depth.

You will walk away with an appreciation of many of the algorithms used in industry today. Use this course to review the three key phases of Data Science development; it provides a broad overview of many of the specialised techniques in use today. Don’t waste your time learning a specific piece of software: use this course to find out which techniques you really need.

What you'll learn and how you can apply it

Participants will be able to:

  • Describe the three key phases of Data Science development
  • Demonstrate why data preparation is important and how to implement it
  • Develop models that are appropriate for the type and complexity of the data
  • Evaluate and present models effectively, without bias or errors

This training course is for you because...

  • Learning about hyper-specific tools or technologies doesn’t provide the breadth required to select an optimal solution to a problem
  • You want to learn the three key phases of Data Science development
  • You need to establish which algorithms to focus on for your specific problem set

Prerequisites

  • Python experience for workshops
  • Introductory Data Science experience to understand terminology

Recommended Preparation

  • Some Python experience is required to follow the exercises. Useful resources include:

https://sunburn.in/?page=library/view/python-in-a/9781491913833/

https://sunburn.in/?page=library/view/the-hitchhikers-guide/9781491933213/

https://sunburn.in/?page=library/view/head-first-python/9781491919521/

Recommended Follow-up

About your instructor

  • Dr. Philip Winder is a multidisciplinary engineer who creates data-driven software products. His work spans Data Science, cloud-native and traditional software development, using a range of languages and tools.

    Phil is the CEO of Winder, a Data Science consultancy in the UK that operates throughout Europe, delivering training, development and consultancy services. He has a Ph.D. and a master's degree in Electronics from the University of Hull, UK.

Schedule

The timeframes are only estimates and may vary depending on how the class progresses.

Introduction and Recap: Doing Data Science

Length: 10 min

  • A brief introduction to, and recap of, typical Data Science workflows.

Data Preprocessing: The Fruits of Labour

Length: 60 min

  • Instructors will introduce data preprocessing: what it is, and why, when and how you should do it. We will also introduce dimensionality reduction and explain why it is important. Real-world examples accompany the material, but without going into depth. (30 min)
  • Participants will see how fruitful data preprocessing can be, using a notebook that demonstrates preprocessing and dimensionality reduction. Includes Q&A. (25 min)
  • Assignment: a guided walkthrough of preprocessing and dimensionality reduction, in a Python notebook.
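The course notebook is not reproduced here. As a rough, hypothetical sketch of the preprocessing and dimensionality reduction steps described above, the Python below (standard library only) fills missing values with each column's median, standardises every column, then reduces two features to one by projecting onto the first principal component, estimated by power iteration. The function names and toy data are illustrative assumptions; in practice a library such as scikit-learn offers tested versions of these steps.

```python
from statistics import mean, median, pstdev


def impute_median(rows):
    """Replace missing values (None) with the median of the observed values in each column."""
    n_cols = len(rows[0])
    medians = [median(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def standardise(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def first_principal_component(rows, iters=200):
    """Estimate the leading principal component of centred data by power iteration.

    Returns the unit direction vector and the fraction of total variance it explains.
    """
    n, d = len(rows), len(rows[0])
    # Covariance matrix; rows are assumed centred (e.g. by standardise()).
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary, non-symmetric starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Hypothetical toy data with two missing values.
raw = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [4.0, 8.0], [None, 10.0]]
clean = standardise(impute_median(raw))
direction, explained = first_principal_component(clean)
# Project each row onto the leading component: two features reduced to one.
reduced = [sum(a * b for a, b in zip(row, direction)) for row in clean]
print(f"direction={direction}, explained variance ratio={explained:.2f}")
```

Standardising before extracting components matters because principal components chase variance, so an unscaled feature with large units would otherwise dominate the projection.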

Break: 5 min

Modelling: Not Just for Scientists

Length: 60 min

  • Instructors will introduce the main types of models used in Data Science. Rather than going into depth on specific implementations, we will survey the commonly used algorithms. A key theme is the trade-off between generalisation and overfitting, which we will discuss at length.
  • Participants will continue the guided walkthrough of a real-life example, developing a model and considering the potential problems of overfitting. Includes Q&A. (25 min)
  • Assignment: a guided walkthrough of modelling and overfitting, in a Python notebook.
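To make the trade-off between generalisation and overfitting concrete, here is a minimal sketch under stated assumptions: the data are hypothetical noisy samples of y = x², and the model is k-nearest-neighbour regression, chosen only because a single parameter, k, controls its flexibility. This is not the course's example. With k = 1 the model memorises the training set, so its training error is exactly zero, yet that says little about its error on unseen validation data; a very large k tends to smooth away the signal instead.

```python
import random


def make_data(n, seed):
    """Hypothetical toy problem: noisy samples of y = x^2 on [-1, 1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, 0.1)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model (fitted on train) over data."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


train = make_data(60, seed=1)
valid = make_data(60, seed=2)  # held out: never used to make predictions
for k in (1, 5, 30):
    print(f"k={k:2d}  train MSE={mse(train, train, k):.4f}  validation MSE={mse(train, valid, k):.4f}")
```

Comparing the training and validation columns as k changes is the essence of the check: a large gap between them is a typical symptom of overfitting, while high error on both suggests underfitting.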

Break: 5 min

Evaluation: How to Prove it Works

Length: 60 min

  • Instructors will demonstrate how to show that models are robust and performant. We will introduce a range of numerical and visual evaluation methods, along with the art of visualisation.
  • Participants will finish the guided walkthrough by measuring performance and visualising their results for presentation to stakeholders, then critique the visuals produced. Includes Q&A. (25 min)
  • Assignment: a guided walkthrough of evaluation and visualisation, in a Python notebook.
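As a small illustration of the numerical side of evaluation (visualisation is not covered here), the sketch below computes confusion-matrix counts and the usual summary metrics for a binary classifier on hypothetical held-out labels. The function names and data are illustrative assumptions, not taken from the course material.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical held-out labels and predictions (not course data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
baseline = [0] * len(y_true)  # always predicts the majority (negative) class

print("model:   ", evaluate(y_true, model))
print("baseline:", evaluate(y_true, baseline))
```

Here the model scores 0.8 accuracy with precision, recall and F1 all at 0.75, while the always-negative baseline still reaches 0.6 accuracy with zero recall. Comparing against such a baseline is one simple guard against a misleadingly good headline number.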

Break: 5 min