By Ajit Jaokar and Dan Howarth. With contributions from Ayse Mutlu.
Exclusively for Data Science Central members, with free access.
You can download this book (PDF) here. This tutorial began as a series of weekend workshops created by Ajit Jaokar and Dan Howarth.
The idea was to work through one specific, longish program and explore as much of it as possible in a single weekend. This book is an attempt to take that idea online. The best way to use this book is to work with the Python code as much as you can. The code is commented, and you can extend those comments with the concepts explained here.
Content
1. Introduction and approach
2. Background, tools and philosophy
What will you learn from this book?
Components for book
Big Picture Diagram
3. Code outline
Regression code outline
Classification Code Outline
4. Exploratory data analysis and graphics
Numeric descriptive statistics
Interpreting descriptive statistics
Understanding the distribution
Histograms
Boxplots and IQR
Correlation
Heatmaps for correlation
Analysing the target variable
5. Pre-processing data
Dealing with missing values
Treatment of categorical values
Normalise the data
Split the data
6. Choose a Baseline algorithm
Defining / instantiating the baseline model
Fitting the model we have developed to our training set
Define the evaluation metric
Predict scores against our test set and assess how good it is
7. Evaluation metrics for classification
Improving a model – from baseline models to final models
Understanding cross validation
Feature engineering
Regularization to prevent overfitting
Ensembles – typically for classification
Test alternative models
Hyperparameter tuning
8. Conclusion
A1. Regression Code
A2. Classification Code
To access the book, if you are not yet a DSC member, you can register by following this link.
Some opinions expressed in this article may be those of a guest author and not necessarily those of Analytikus. Staff authors are listed in https://www.datasciencecentral.com/profiles/blogs/free-book-classification-and-regression-in-a-weekend