This is a second post detailing 11 more takeaways I had after recently reading Chip Huyen’s Designing Machine Learning Systems. You can find my first post here.
[Read More]
I recently read through Chip Huyen’s Designing Machine Learning Systems, first published in 2022. I found it a highly useful overview of a rapidly evolving field. As I expected from her other writings, Chip provides clear explanations and strikes a nice balance between technical depth and high-level summary. Machine learning engineering...
[Read More]
Streaming analytics refers to processing and analyzing data continuously as it arrives, as opposed to in regular batches. Stream processing is triggered by specific events that result from an action or set of actions. Examples of these triggering events include financial transactions, thermostat readings, student responses, or website purchases. Streaming analytics...
[Read More]
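To make the event-driven idea concrete, here is a minimal Python sketch (not code from the post itself): each event updates a running aggregate as soon as it arrives rather than waiting for a batch job. The event type and field names are hypothetical placeholders.

```python
# Minimal sketch of event-driven stream processing: each incoming event
# updates a running aggregate immediately, rather than waiting for a batch.
# The Transaction event type and its fields are illustrative assumptions.
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Transaction:
    user_id: str
    amount: float

def running_totals(events: Iterable[Transaction]) -> dict[str, float]:
    """Update per-user spending totals as each transaction event arrives."""
    totals: dict[str, float] = {}
    for event in events:
        totals[event.user_id] = totals.get(event.user_id, 0.0) + event.amount
        # In a real system, this is where each event might trigger an alert,
        # update a dashboard, or be written to a downstream sink.
    return totals

if __name__ == "__main__":
    stream = [Transaction("alice", 12.5), Transaction("bob", 3.0), Transaction("alice", 7.5)]
    print(running_totals(stream))  # {'alice': 20.0, 'bob': 3.0}
```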
Principal components analysis (PCA) is a popular technique for deriving a low-dimensional set of features from a large set of variables. For more information on PCA, please refer to my earlier post on the technique. In this post, I’ll explore using PCA as a dimension reduction technique for...
[Read More]
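As a rough illustration of the pattern this post covers (PCA followed by a downstream model), here is a minimal sketch using scikit-learn on synthetic data; the dataset, sizes, and number of components are assumptions for illustration only.

```python
# Minimal sketch of PCA as a dimension reduction step before a downstream
# model, using scikit-learn on synthetic data (sizes are illustrative).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                               # 200 observations, 30 variables
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=200)   # response driven by a few variables

# Project the 30 original variables onto 5 principal components,
# then fit the regression on those components instead.
model = make_pipeline(PCA(n_components=5), LinearRegression())
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```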
Principal components analysis (PCA) is a technique that computes the principal components of a dataset and then uses those components to understand the data. PCA is an unsupervised approach. In a future post, I’ll explore principal components regression, a related supervised technique that makes use of the principal components...
[Read More]
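For a quick sense of what the unsupervised output looks like, here is a minimal scikit-learn sketch (not from the post): no response variable is involved; we only inspect the components and the share of variance each one explains. The iris data is used purely as a stand-in example.

```python
# Minimal sketch of PCA as an unsupervised technique: fit on the features
# alone, then inspect the loadings and explained variance. Illustrative only.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)  # scale variables before PCA
pca = PCA(n_components=2).fit(X)

print(pca.explained_variance_ratio_)  # share of total variance per component
print(pca.components_)                # loadings: how each variable contributes
```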
This is the second post in a short series discussing the common regularization methods of ridge regression and the lasso. In an earlier post, I introduced much of the theory surrounding these methods; for a more detailed overview of regularization, please see that post.
[Read More]
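For readers who just want the applied starting point, a minimal scikit-learn sketch of fitting both methods might look like the following; the synthetic data and the alpha values are assumptions, not settings from the post.

```python
# Minimal sketch of fitting ridge regression and the lasso with scikit-learn.
# Data and penalty strengths (alpha) are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can set some coefficients exactly to zero

print(ridge.coef_)
print(lasso.coef_)
```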
Regularization is a method of fitting a model containing all $p$ predictors while shrinking the coefficient estimates towards zero. Also known as constraining or shrinking the coefficient estimates, regularization can significantly reduce the model’s variance and thus improve test error and model performance. The two most commonly used...
[Read More]
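For concreteness, the two penalized least squares objectives (in the standard notation, with $\lambda \ge 0$ as the tuning parameter) are:

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2,
\qquad
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|.
$$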
The bootstrap is a widely used resampling technique, first introduced by Bradley Efron in 1979, that quantifies the uncertainty associated with a given estimator or statistical learning method. The bootstrap can be applied to many problems and methods, and is commonly used to estimate the standard errors of...
[Read More]
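The core mechanic is simple enough to sketch in a few lines of Python: resample the data with replacement many times, recompute the statistic on each resample, and use the spread of those replicates as an estimate of its standard error. The data below are synthetic and purely illustrative.

```python
# Minimal sketch of the bootstrap: estimate the standard error of the sample
# mean by resampling the observed data with replacement. Illustrative only.
import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=100)   # the observed sample
B = 1000                                      # number of bootstrap replicates

boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()  # statistic on one resample
    for _ in range(B)
])

print(boot_means.std(ddof=1))   # bootstrap estimate of the standard error of the mean
```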