Linear Regression - Data Science Discovery (2024)

Linear Regression

The idea of fitting a line as closely as possible to a set of points is known as linear regression. The most common technique is to choose the line that minimizes the sum of the squared vertical distances between the points and the line. This is called OLS, or Ordinary Least Squares, regression.

We can find the equation of this line and use it to make predictions. Since our regression estimates form a straight line, we can describe them using an equation in slope-intercept form:

Regression Equation

$$\hat{y} = b_0 + b_1 x_1$$

When we have one x-variable (x1) and one y-variable (y, whose predicted value is written y-hat), this is called simple linear regression: we use a single independent variable to predict the y-variable. Using several independent variables to predict the y-variable is called multiple regression. For now, we are going to focus on simple linear regression because its results are easy to interpret.

The Slope and Y Intercept of the Regression Line

In our regression equation, b0 is the y-intercept and b1 is the slope. Here's how you calculate the slope and y-intercept:

[Image: formulas for the slope and y-intercept]
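
For reference, the standard closed-form least-squares expressions can be written as follows. The notation is an assumption and may differ from the original image: r is the correlation between x and y, SD_x and SD_y are their standard deviations, and x-bar and y-bar are their averages.

```latex
b_1 = r \cdot \frac{\mathrm{SD}_y}{\mathrm{SD}_x},
\qquad
b_0 = \bar{y} - b_1 \, \bar{x}
```

The second expression says that the regression line always passes through the point of averages.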

Here's how you interpret them:

  • SLOPE = The average change in the predicted value of Y associated with a 1-unit increase in X.
  • Y-INTERCEPT = The predicted value of Y when X is equal to 0.

In order to make predictions using the equation of the regression line, first find the slope and y-intercept. Next, you can plug in values of x to get predicted values of y.
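
The two steps above (find the slope and y-intercept, then plug in x) can be sketched in Python. This is a minimal illustration, not the course's own code: the function names `fit_line` and `predict` and the five data points are made up for the example, and the slope is computed with the standard least-squares form r × SD_y / SD_x.

```python
import statistics

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through paired data."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    # Correlation r: the average product of the standardized x and y values.
    r = statistics.fmean(
        [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y) for x, y in zip(xs, ys)]
    )
    slope = r * sd_y / sd_x               # b1
    intercept = mean_y - slope * mean_x   # b0
    return slope, intercept

def predict(x, slope, intercept):
    """Plug x into the regression equation y-hat = b0 + b1 * x."""
    return intercept + slope * x

# Made-up illustrative data, not taken from the course materials.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
prediction = predict(4, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, y-hat at x=4: {prediction:.2f}")
```

For this data the slope works out to 0.6 and the y-intercept to 2.2, so the predicted y at x = 4 is 4.6.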

Warning About Regression

When making predictions using regression, it's important to be aware of the following:

  • Predicting y at values of x beyond the range of x in the data is called extrapolation.
  • This is risky because we have no evidence that the association between x and y remains linear for values of x outside the observed range.
  • Extrapolated predictions can be badly wrong.
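
To see how far off an extrapolated prediction can be, here is a small sketch with made-up data that actually follows a curve (y = x squared) but is observed only for x from 0 to 5. The helper names `fit_line` and `is_extrapolation` are hypothetical, and the slope uses the equivalent sums-of-products form of the least-squares formula.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept (equivalent sums-of-products form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def is_extrapolation(x_new, xs):
    """True when x_new lies outside the range of x-values used to fit the line."""
    return x_new < min(xs) or x_new > max(xs)

# Made-up data that follows a curve, observed only for x from 0 to 5.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
slope, intercept = fit_line(xs, ys)

x_new = 10                       # well beyond the largest observed x
predicted = intercept + slope * x_new
actual = x_new ** 2
print(f"extrapolating: {is_extrapolation(x_new, xs)}")
print(f"predicted {predicted:.1f} vs actual {actual}")
```

The fitted line predicts about 46.7 at x = 10, while the true value on the curve is 100. Inside the observed range the line is a reasonable summary, but nothing in the data warned that the pattern bends.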

Residuals and RMSE

Unless there is a perfect correlation, our predictions are not going to be perfect. When thinking about this graphically, this means that for most of the points in any scatter plot, the actual y-values and the predicted y-values are different. The distance between the actual value and the predicted value from the line is called the residual or prediction error.

The residual is calculated as the actual value of y minus the predicted value of y.

The residuals are the vertical distances between the points and the line.

  • If the point is above the regression line, the residual is positive.
  • If the point is below the regression line, the residual is negative.
  • If the point is exactly on the regression line, the residual is 0.
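
The sign rules above can be checked with a short sketch. The line y-hat = 1 + 2x and the three points are hypothetical values chosen so that each case appears once, and `residual` and `position` are illustrative helper names.

```python
def residual(actual_y, predicted_y):
    """Residual = actual y minus predicted y."""
    return actual_y - predicted_y

def position(res, tol=1e-9):
    """Describe where a point sits relative to the line, based on its residual."""
    if res > tol:
        return "above"
    if res < -tol:
        return "below"
    return "on"

# Hypothetical line y-hat = 1 + 2x and three made-up points.
slope, intercept = 2, 1
points = [(1, 5), (2, 4), (3, 7)]
results = [(x, y, residual(y, intercept + slope * x)) for x, y in points]
for x, y, res in results:
    print(f"point ({x}, {y}): residual {res:+d}, {position(res)} the line")
```

The point (1, 5) has residual +2 and lies above the line, (2, 4) has residual -1 and lies below it, and (3, 7) has residual 0 and lies exactly on it.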

Two Key Features of the Regression Line:

  1. For any least-squares regression line, the average (and the sum) of the errors is always zero because the positives and negatives cancel out.
  2. The SD of the errors (also called the Root Mean Square Error or RMSE) is a measure of the typical spread of the data around the regression line.

RMSE = SD of the errors: The SD of the prediction errors is a measure of how accurate our predictions are. The better the predictions, the smaller the errors and the smaller the RMSE.
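
Both features can be verified numerically. The sketch below uses made-up data together with its least-squares line y-hat = 2.2 + 0.6x (the slope and intercept were computed from these five points) and finds the RMSE the long way, directly from the errors. Because the errors average to zero, their SD and their root mean square are the same number.

```python
import math

# Made-up data with its least-squares line y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = 0.6, 2.2

# Error (residual) for each point: actual y minus predicted y.
errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_error = sum(errors) / len(errors)

# RMSE: square each error, average the squares, then take the square root.
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))

print(f"mean error is zero (up to rounding): {math.isclose(mean_error, 0, abs_tol=1e-9)}")
print(f"RMSE: {rmse:.4f}")
```

The errors here are roughly -0.8, 0.6, 1.0, -0.6 and -0.2, which sum to zero, and the RMSE is about 0.69 in the units of y.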

Rather than finding all the errors and then taking their root mean square, it's much easier to use the formula below. The RMSE is in the same units as your y-variable.

[Image: shortcut formula for the RMSE]
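
For reference, the standard shortcut for the RMSE of a least-squares line is given here. The notation is an assumption and may differ from the original image: r is the correlation between x and y, and SD_y is the standard deviation of y computed by dividing by n.

```latex
\mathrm{RMSE} = \sqrt{1 - r^2} \cdot \mathrm{SD}_y
```

When r is close to 1 or -1, the factor under the square root is close to 0 and the predictions are tight around the line. When r is 0, the RMSE equals SD_y, so the line predicts no better than always guessing the average of y.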

Q1: Which one is better?

Q2: What is the Y-INTERCEPT for the given straight line?

Q3: Suppose we have clinical data for 400 patients and the task is to predict if a patient has cancer from the given data. Should we use linear regression in this situation?
