Ethan Wicker

Cross-validation #1

Validation sets, leave-one-out cross-validation, and k-fold cross-validation

February 15, 2021

Resampling methods are a crucial and commonly used tool in modern statistics and data science. These methods involve repeatedly drawing samples from a training dataset and refitting a model of interest on each sample to obtain additional information about the fitted model. These methods allow us to learn new information...
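As a rough illustration of the "resample and refit" idea described above, here is a minimal sketch of k-fold cross-validation in plain Python. The helper names (`k_fold_indices`, `cross_val_mse`), the toy data, and the mean-only "model" are assumptions made for this example and are not taken from the post itself.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(y, k=5, seed=0):
    """Estimate the test MSE of a mean-only model with k-fold cross-validation."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Refit the (hypothetical) model on every observation outside the fold.
        train = [y[i] for i in range(len(y)) if i not in held]
        prediction = mean(train)
        # Score the refitted model on the held-out fold only.
        scores.append(mean((y[i] - prediction) ** 2 for i in held_out))
    return mean(scores)


# Toy response values, purely illustrative.
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
folds = k_fold_indices(len(y), k=3)
cv_mse = cross_val_mse(y, k=3)
print(folds, cv_mse)
```

Setting `k` equal to the number of observations gives leave-one-out cross-validation, while a single split into two parts corresponds to the validation set approach.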

Quadratic discriminant analysis

An introduction, the bias-variance trade-off, and a comparison to linear discriminant analysis using scikit-learn

February 10, 2021

In this post, I’ll be exploring quadratic discriminant analysis. I’ll compare and contrast this method with linear discriminant analysis, and work through an example using scikit-learn and the slimmed-down Titanic dataset from one of my prior posts on logistic regression.

Linear discriminant analysis #2

scikit-learn, precision, recall, F-scores, ROC curves, and a comparison to logistic regression

February 7, 2021

This post is the second in a series on linear discriminant analysis (LDA) for classification. In the first post, I introduced much of the theory behind linear discriminant analysis. In this post, I’ll explore the method using scikit-learn. I’ll also discuss classification metrics such as precision and recall, and compare...

Linear discriminant analysis #1

A brief introduction

February 3, 2021

This post is the first in a series on the linear discriminant analysis method. In this series, I’ll discuss the underlying theory of linear discriminant analysis, as well as applications in Python.

Exploring a pandas to scikit-learn workflow

Using scikit-learn's ColumnTransformer and Pipeline for encoding, imputing and scaling features

February 1, 2021

I recently read through this excellent Medium article about the ColumnTransformer estimator in scikit-learn and how it can be used in tandem with Pipelines and the OneHotEncoder estimator. To strengthen my own understanding of the concept, I decided to follow the post with my own working example, and summarize the...

Logistic regression #2

scikit-learn, statsmodels, plotly, one-hot encoding & multiclass logistic regression

January 27, 2021

This post is the second in a series on the logistic regression model. In this post, I’ll work through an example using the well-known Titanic dataset, scikit-learn, and statsmodels. I’ll discuss one-hot encoding, create a 3D logistic regression plot using Plotly, and demonstrate multiclass logistic regression with scikit-learn.

Logistic regression #1

A brief introduction, maximum likelihood estimation, multiclass logistic regression, and more

January 27, 2021

This is the first post in a series on the logistic regression model. The structure of this post was influenced by the fourth chapter of An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

Multiple linear regression #5

Using scikit-learn, statsmodels, seaborn, and plotly

January 20, 2021

This is the fifth post in a series on the multiple linear regression model. In previous posts, I introduced the theory behind the model, explored it using Python’s scikit-learn and statsmodels libraries, and discussed potential problems with the model, such as collinearity and correlation of the error terms.

Multiple linear regression #4

Potential problems: Heteroscedasticity, collinearity, and more

January 19, 2021

This post is the fourth in a series on the multiple linear regression model. In previous posts, I explored the topic, including methods of relaxing various assumptions made by the model. I also performed a comparison of Python’s scikit-learn and statsmodels libraries for multiple linear regression.

Multiple linear regression #3

Qualitative predictors, interaction terms, and non-linear relationships

January 15, 2021

This post is the third in a series on the multiple linear regression model. In previous posts, I introduced the multiple linear regression model and explored a comparison of Python’s scikit-learn and statsmodels libraries. However, both of these previous posts exclusively explored quantitative predictors.