Linear Regression is inaccurate and misleading! - Super Heuristics (2024)

There is something about predictions that fascinates us. Those who have even a little bit of familiarity with statistics would know that Linear Regression is probably the first thing you learn in the context of prediction.

In this article, I discuss Linear Regression as a method for sales forecasting (or for that matter, any forecasting).

And if you thought that Linear Regression is great, I will show you how Linear Regression is inaccurate and misleading in most circumstances.

In the first article in this series on How to do (accurate) Sales Forecasting, I gave you an overview of what sales forecasting is.

I also highlighted why, as a marketer, it is more your responsibility to do it (accurately) than it is any salesperson's.

Now, this article is about applying Linear Regression to the dataset that I shared with you in the last post.

But, before that, let me begin by quickly refreshing your memory about what Linear Regression is.

What is Linear Regression?

Linear Regression is a predictive analysis tool.

Definition

Linear Regression is used on datasets that have one or more independent variables (predictors) and one dependent variable (dependent on the predictors).

When applied to a given dataset the tool tells us two things:

  1. How well do the predictors explain the dependent variable?
  2. With what magnitude does each of the predictor variables impact the dependent variable?

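The simplest linear regression equation is this:

y = b0 + b1*x1 + b2*x2 + ...

Here y is the dependent variable, x1, x2, ... are the predictors, b0 is the intercept and b1, b2, ... are the coefficients of the predictors.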

How to perform Linear Regression

If you haven’t read the previous post in the series where I had shared the sample dataset, you may download the file from here.

I would again like to credit my Marketing Analytics professor, Prof. Prantosh Banerjee, for this dataset.

He has spent years as an Analytics and Consumer Behaviour consultant in the industry. He holds a PGDM from IIM Calcutta and has been teaching at a few IIMs across the country, including mine.

In the image below, you can see the snippet of the dataset with columns Ad_Exp and Sales_Prom_Exp as the predictor variables and the column Sales as the dependent variable.

To perform Linear Regression in Excel go to the Data tab and then go to Data Analysis.

Select Regression from the menu that pops up.

Now, remember that we wish to see the relationship of both the predictors with the Sales.

Set up the Ad_Exp and Sales_Prom_Exp as the input X range and Sales as the input Y range as I did it in the image below.

This is in accordance with the linear regression equation that I showed you above where y was the dependent variable and x were the predictors.

Now, have faith in God and in yourself and click ‘OK’

What you will come across on your screen is the regression output. The interpretation of what each element of this output means is the subject matter for another blog post on what are the basics of regression.

Here, in the output, focus on the table of coefficients (including the intercept) that appears just below the ANOVA table.

Here, remember that X Variable 1 is Ad_Exp and X Variable 2 is Sales_Prom_Exp (because that is the order in which they were in our dataset)

I copied and pasted those coefficient and intercept values below the data table and removed everything else that I got in the output.

This leaves us with the linear model which can be described by the equation:

Sales = 631.12 + 6.04*(Advertisement Exp.) – 1.67*(Sales Promotion)

Compare this with the standard linear regression equation and things will be clearer to you.

Let’s take a moment or two to understand what this means. Here are a few inferences that you can draw from this.

  1. The situation seems to be such that even with zero advertisement and zero sales promotion expenditure, the brand manager would see sales of ₹6,31,120.
  2. While advertisement expenditure has a positive impact on sales (positive coefficient), sales promotion happens to have a negative impact on sales.

Point 2 is a little difficult to digest, but that is where data and its correct analysis come in handy.

We can see that the coefficient of Sales Promotion is negative, which means that, according to the model, as the expenditure on Sales Promotion increases (with advertisement expenditure held constant), I am losing out on sales: the sales are decreasing.
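For readers who would rather reproduce the Excel regression in code, here is a minimal sketch of a two-predictor least-squares fit in plain Python. The function name `fit_two_predictors` and the spend figures are made up for illustration and are not the article's dataset. The sales values are generated from the fitted equation above, so the fit should simply hand back the same intercept and coefficients.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with an intercept column
    # Augmented matrix [X^T X | X^T y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(3):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return tuple(aug[i][3] for i in range(3))


# Hypothetical spends (NOT the article's dataset). Sales are generated from the
# article's fitted equation, so the fit should recover its coefficients.
ad_exp = [10, 20, 30, 40, 50, 60]
sales_prom_exp = [5, 3, 8, 2, 9, 4]
sales = [631.12 + 6.04 * a - 1.67 * p for a, p in zip(ad_exp, sales_prom_exp)]

b0, b1, b2 = fit_two_predictors(ad_exp, sales_prom_exp, sales)
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

With real data the recovered numbers would of course not match any preset equation; the point is only to show what Excel's Regression tool computes behind the dialog box.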

Re-checking the Linear model

Now, we reached the point where we got the formula of our regression model which was this:

Sales = 631.12 + 6.04*(Advertisement Exp.) – 1.67*(Sales Promotion)

I will now apply the same formula in the column next to the column with the actual sales data.

My intention is to input the Ad_Exp and Sales_Prom_Exp of each of the given quarters to sort of cross-check if the model is formed correctly or not.

Note: Please be careful of where you put the $ sign in the formula.

Further, drag the formula down the entire column E to compare the predicted and the actual sales. I rounded off the values to two decimal places.
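The column E formula can be sketched in Python as a small function. The name `predict_sales` and the sample inputs are assumptions for illustration. The constants are the intercept and coefficients from the regression output, and the result is in the same units as the Sales column.

```python
INTERCEPT = 631.12
AD_COEF = 6.04        # coefficient of Ad_Exp
PROMO_COEF = -1.67    # coefficient of Sales_Prom_Exp


def predict_sales(ad_exp, sales_prom_exp):
    """Apply the fitted linear model and round to two decimals, as in column E."""
    return round(INTERCEPT + AD_COEF * ad_exp + PROMO_COEF * sales_prom_exp, 2)


# Hypothetical quarter: 10 units of advertising and 5 units of sales promotion
print(predict_sales(10, 5))
# With no spend at all, the model returns just the intercept
print(predict_sales(0, 0))
```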

Each of our figures seems close to the actual sales figure given in column D, so we can at least be sure that the model has been created correctly.

How accurate is our model?

If you observe closely, you would realize that the accuracy is horrible.

To make the accuracy clearer to you, I calculated the difference between the sales calculated from the linear model (E) and the actual sales (D) in column F.

The difference is stark.

Right in the first quarter, you can see a difference of 20.36 which is ₹20,360!
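The column F comparison can be sketched as below. The function name `forecast_errors` and the numbers are hypothetical and do not come from the dataset. Besides the per-quarter differences, the sketch also returns the mean absolute error, an extra summary that the spreadsheet does not show but that puts a single number on how far off the model is on average.

```python
def forecast_errors(predicted, actual):
    """Per-period difference (predicted minus actual) and the mean absolute error."""
    differences = [round(p - a, 2) for p, a in zip(predicted, actual)]
    mean_abs_error = sum(abs(d) for d in differences) / len(differences)
    return differences, mean_abs_error


# Hypothetical values standing in for columns E (predicted) and D (actual)
predicted = [110.0, 95.0, 102.5]
actual = [100.0, 100.0, 100.0]

diffs, mae = forecast_errors(predicted, actual)
print(diffs, round(mae, 2))
```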

This makes us wonder: was this whole exercise of creating a predictive model worth it?

In fact, these results make us doubt whether any prediction that this model makes beyond the 12th quarter can be trusted.

What if the error is even greater in that case?

These concerns are valid.

And chances are that future predictions will depart even further from the actual sales. This is because the model does not capture the true nature of the relationship.

And because it does not represent the relationship correctly, what are the chances that it will predict anything correctly?

Finally, what is wrong with Linear Regression?

The basic assumption in a linear regression model is – as the name suggests – linearity.

If we are performing a linear regression, we are implying that each unit increase in advertisement expenditure leads to the same fixed increase (or decrease) in sales, no matter how much is already being spent.

But, is that always true? Is that always the best representation of the reality?

Let’s have a look.

Here is a scatter plot of the Ad_Exp with Sales

I added a trendline in the plot (the line that you see passing through the dots).

This trendline is a linear trendline, therefore, representing our linear regression model.

From the looks of it, it seems that it is close to all of our dots and represents the model perfectly.

But that impression lasts only until you try out other model types.

Before trying any other trendline, I checked the R² for the linear model.

Definition

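R-squared is a statistical measure of how close the data are to the fitted regression line: it is the share of the variation in the dependent variable that the model explains. The higher the R², the better the model fits the data.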
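As a rough illustration of the definition, R² can be computed as one minus the ratio of the unexplained variation to the total variation. The function name `r_squared` and the toy numbers below are assumptions for the sketch, not values from the dataset.

```python
def r_squared(actual, predicted):
    """R² = 1 - (sum of squared residuals) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0]
perfect_fit = r_squared(actual, [1.0, 2.0, 3.0])    # predictions match the data exactly
mean_only_fit = r_squared(actual, [2.0, 2.0, 2.0])  # always predicting the average
print(perfect_fit, mean_only_fit)
```

A model that reproduces the data exactly scores 1, while a model that does no better than always predicting the average scores 0.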

As we can see, the R² of the linear model is 0.975. Now, unless we find an R² higher than this, we can be sure that the linear model represents our data most accurately.

But as it turns out, as soon as I tried another model, the logarithmic model, I got an R² higher than 0.975.

This is the logarithmic trendline (slightly curved if you can see), which gives us an R² of 0.9994. This is a near perfect R².

And this shows that a logarithmic model (which is non-linear) represents the Ad_Exp and Sales relationship much better than a linear model.

This is why Linear Regression is a highly limited and inaccurate model in this case.

And that’s because it assumes that the relationship is linear.
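The trendline comparison above can be sketched in Python. For a trendline with a single predictor, R² equals the squared correlation between the (possibly transformed) predictor and the outcome, so the sketch compares the correlation of Sales with Ad_Exp against its correlation with ln(Ad_Exp). The helper names and the data are hypothetical: the sales figures are generated from a logarithmic curve purely to show the effect and are not the article's numbers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def trendline_r2(xs, ys, transform=lambda x: x):
    """R² of a least-squares line fitted to ys against transform(xs)."""
    return pearson_r([transform(x) for x in xs], ys) ** 2


# Hypothetical data: sales follow a logarithmic curve in ad spend
ad_exp = [float(v) for v in range(1, 13)]
sales = [100 + 50 * math.log(x) for x in ad_exp]

linear_r2 = trendline_r2(ad_exp, sales)         # straight-line trendline
log_r2 = trendline_r2(ad_exp, sales, math.log)  # logarithmic trendline
print(round(linear_r2, 4), round(log_r2, 4))
```

On data like this the straight line still scores a high R², which is exactly why it looks convincing on the scatter plot until the logarithmic fit is tried.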

What’s next?

We did say that the Logarithmic model better explains the relationship than the linear model. But are we sure that the logarithmic model explains it the best?

No. We can’t be sure of that unless and until we try each of the other models and check their R² value.

So far in these two blog posts of the series I have shared with you what is not to be done.

In the subsequent posts of this series, we will see how to find the model that best represents the relationship between the predictors and the dependent variable.

Further, we will see how to create a final model that describes the relationship between all the predictors and the dependent variable. Accurately.

Let me know in the comments how you liked this post and if you have any doubts.


FAQs

Why is linear regression inaccurate?

Linear regression models are sensitive to outliers. An outlier is a data point that differs significantly from other data points. Outliers can influence the slope and intercept of the regression line, leading to inaccurate predictions.

How accurate is linear regression?

It depends on the data. On simple datasets where the relationship is close to linear, a linear regression can score very high accuracy, so try the model on other datasets as well before trusting it.

Why is a linear regression not appropriate?

There are two things that explain why Linear Regression is not suitable for classification. The first one is that Linear Regression deals with continuous values whereas classification problems mandate discrete values. The second problem is regarding the shift in threshold value when new data points are added.

What are the weaknesses of linear regression?

Limitations of linear regression
  • Linearity: The assumption of linearity between variables restricts linear regressions. ...
  • Overfit: It's not recommended to use linear regressions when the observations aren't proportional to the features. ...
  • Outliers: Linear regressions are sensitive to outliers.

What can be a major problem with linear regression?

One of the main disadvantages of using linear regression for predictive analytics is that it is sensitive to outliers and noise. Outliers are data points that deviate significantly from the rest of the data, and noise is random variation or error in the data.

What are the errors in linear regression?

Within a linear regression model tracking a stock's price over time, the error term is the difference between the expected price at a particular time and the price that was actually observed.

How do you know if linear regression is correct?

How to Test the Assumptions of Linear Regression?
  1. Assumption One: Linearity of the Data.
  2. Assumption Two: Predictors (x) are Independent & Observed with Negligible Error.
  3. Assumption Three: Residual Errors have a Mean Value of Zero.
  4. Assumption Four: Residual Errors have Constant Variance.

How to check the accuracy of linear regression?

  1. You can predict on the test data and store the predictions in x, and store the actual test values in y. Then make a scatter plot of x against y. ...
  2. You can also measure the accuracy with the function "mean_squared_error(x,y)". The value should be as low as possible.
  3. You can also find the r2_score value of x and y.

How do you make linear regression more accurate?

A few effective ways to improve the accuracy of your regression models are:
  1. Regularization.
  2. Handling missing and null values: deleting missing values, imputing missing values, or imputing by model-based prediction.
  3. Categorical feature encoding: label encoding or one-hot encoding.
  4. Feature engineering.

What is the main problem with using the regression line?

The main problem with using a single regression line is that it is limited to linear relationships. Linear regression only models relationships between dependent and independent variables that are linear. It assumes there is a straight-line relationship between them, which is sometimes incorrect.

What are common regression mistakes?

Common regression pitfalls include overfitting, excluding important predictor variables, extrapolation, missing data, and inadequate power and sample size.

How do you know if your linear regression model is good?

To determine if your regression model is valid, you must test if the coefficients are statistically significant, or different from zero. If a coefficient is significant, it means that its corresponding independent variable has a meaningful and reliable influence on the dependent variable.

Why does linear regression fail?

Linear regression can fail when the predictors are perfectly collinear. This can be caused by accidentally duplicating a variable in the data, using a linear transformation of a variable along with the original (e.g., the same temperature measurements expressed in Fahrenheit and Celsius), or including a linear combination of multiple variables in the model, such as their mean.

When should you not use regression?

Do not use the regression equation to predict values of the response variable (y) for explanatory variable (x) values that are outside the range found with the original data.

Why is linear regression not suitable for prediction?

Because linear regression makes predictions based upon linear and continuous data. Classification involves nonlinear (usually) and discrete data. In regression, we are typically predicting the next number (or set of numbers) in a sequence.

What are the mistakes in regression analysis?

Avoid These Common Regression Analysis Mistakes

Using linear regression instead of nonlinear regression. Confusing linear regression with correlation. Fitting a model to smoothed data. Incorrectly removing outliers.

Article information

Author: Rev. Leonie Wyman
