13 min read · Oct 2, 2020
If you are a beginner in data science or statistics with some background in linear regression and are looking for ways to evaluate your models, then this guide might be for you.
This article will discuss the following metrics for choosing the ‘best’ linear regression model: R-Squared (R²), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Akaike Information Criterion (AIC), and corrected variants of these that account for bias. Knowledge of linear regression is assumed. I hope you enjoy reading this article, find it useful and learn something new :)
R-Squared (R²)
The R² value, also known as the coefficient of determination, measures the proportion of the variance in the actual data, denoted by y, that is explained by the predicted data, denoted by y_hat. In other words, it represents the strength of the fit. However, it says nothing about the model itself: it does not tell you whether the model is good, whether the data you have chosen is biased, or even whether you have chosen the correct modelling method¹. I will show this using examples below.
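As a minimal sketch, R² can be computed as 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between y and y_hat and SS_tot is the sum of squared differences between y and its mean. The snippet below uses only NumPy; the helper name `r_squared` and the example data (a quadratic relationship, y = x², fitted with a straight line) are illustrative assumptions, not taken from the original article.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean of y
    return 1.0 - ss_res / ss_tot


# Illustrative data (an assumption): the true relationship is quadratic, y = x^2
x = np.arange(11, dtype=float)
y = x ** 2

# Fit a straight line anyway, using ordinary least squares with an intercept.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

r2 = r_squared(y, y_hat)
print(f"R^2 of a straight line fitted to y = x^2: {r2:.3f}")
```

The straight line misses the curvature entirely, yet R² still comes out at roughly 0.93. A high R² on its own therefore does not show that the modelling method is the right one.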