Choosing the Correct Type of Regression Analysis - DataScienceCentral.com (2024)

Guest blog by Jim Frost.

Regression analysismathematically describes the relationship between a set ofindependent variablesand adependent variable. There are numerous types of regression models that you can use. This choice often depends on the kind of data you have for the dependent variable and the type of model that provides the best fit. In this post, I cover the more common types of regression analyses and how to decide which one is right for your data.

I’ll provide an overview along with information to help you choose. I organize the types of regression by the different kinds of dependent variable. If you’re not sure which procedure to use, determine which type of dependent variable you have, and then focus on that section in this post. This process should help narrow the choices! I’ll cover regression models that are appropriate for dependent variables that measure continuous, categorical, and count data.

Regression Analysis with Continuous Dependent Variables

Regression analysis with a continuous dependent variable is probably the first type that comes to mind. While this is the primary case, you still need to decide which one to use.

Continuous variablesare a measurement on a continuous scale, such as weight, time, and length.

Linear regression

OLS produces the fitted line that minimizes the sum of the squared differences between the data points and the line.

Linear regression, also known asordinary least squares(OLS) and linear least squares, is the real workhorse of the regression world. Use linear regression to understand themeanchange in a dependent variable given a one-unit change in each independent variable. You can also use polynomials to model curvature and include interaction effects. Despite the term “linear model,” this type can model curvature.

This analysisestimatesparametersby minimizing the sum of the squared errors (SSE). Linear models are the most common and most straightforward to use. If you have a continuous dependent variable, linear regression is probably the first type you should consider.

There are some special options available for linear regression.

Fitted line plots: If you have one independent variable and the dependent variable, use a fitted line plot to display the data along with the fitted regression line and essential regression output. These graphs make understanding the model more intuitive.

Advanced types of linear regression

Linear models are the oldest type of regression. It was designed so thatstatisticianscan do the calculations by hand. However, OLS has several weaknesses, including a sensitivity to bothoutliersandmulticollinearity, and it is prone tooverfitting.To address these problems, statisticians have developed several advanced variants:

  • Ridge regressionallows you to analyze data even when severe multicollinearity is present and helps prevent overfitting. This type of model reduces the large, problematic variance that multicollinearity causes by introducing a slight bias in the estimates. The procedure trades away much of the variance in exchange for a little bias, which produces more usefulcoefficientestimates when multicollinearity is present.
  • Lasso regression(least absolute shrinkage and selection operator) performs variable selection that aims to increase prediction accuracy by identifying a simpler model. It is similar to Ridge regression but with variable selection.
  • Partial least squares (PLS) regressionis useful when you have very few observations compared to the number of independent variables or when your independent variables are highly correlated. PLS decreases the independent variables down to a smaller number of uncorrelated components, similar to Principal Components Analysis. Then, the procedure performs linear regression on these components rather the original data. PLS emphasizes developing predictive models and is not used for screening variables. Unlike OLS, you can include multiple continuousdependentvariables. PLS uses thecorrelationstructure to identify smaller effects and model multivariate patterns in the dependent variables.

Nonlinear regression

Nonlinear regression also requires a continuous dependent variable, but it provides a greater flexibility to fit curves than linear regression.

Like OLS, nonlinear regression estimates the parameters by minimizing the SSE. However, nonlinear models use an iterative algorithm rather than the linear approach of solving them directly with matrix equations. What this means for you is that you need to worry about which algorithm to use, specifying good starting values, and the possibility of either not converging on a solution or converging on a local minimum rather than a global minimum SSE. And, that’s in addition to specifying the correct functional form!

Nonlinear model of electron mobility by density.

Most nonlinear models have one continuous independent variable, but it is possible to have more than one. When you have one independent variable, you can graph the results using a fitted line plot.

My advice is to fit a model using linear regression first and then determine whether the linear model provides an adequate fit bychecking the residual plots. If you can’t obtain a good fit using linear regression, then try a nonlinear model because it can fit a wider variety of curves. I always recommend that you try OLS first because it is easier to perform and interpret.

I’ve written quite a bit about the differences between linear and nonlinear models. Read the following posts to learn the differences between these two types, how to choose which one is best for your data, and how to interpret the results.

Regression Analysis with Categorical Dependent Variables

So far, we’ve looked at models that require a continuous dependent variable. Next, let’s move on to categorical independent variables. Acategorical variablehas values that you can put into a countable number of distinct groups based on a characteristic.Logistic regression transforms the dependent variable and then uses Maximum Likelihood Estimation, rather than least squares, to estimate the parameters.

Logistic regression describes the relationship between a set of independent variables and a categorical dependent variable. Choose the type of logistic model based on the type of categorical dependent variable you have.

Binary Logistic Regression

Usebinary logistic regressionto understand how changes in the independent variables are associated with changes in the probability of an event occurring. This type of model requires a binary dependent variable. Abinary variablehas only two possible values, such as pass and fail.

Example:Political scientists assess the odds of the incumbent U.S. President winning reelection based on stock market performance.

Read my post about a binary logistic model thatestimates the probability of House Republicans belonging to the Fre….

Ordinal Logistic Regression

Ordinal logistic regressionmodels the relationship between a set of predictors and anordinal responsevariable. An ordinal response has at least three groups which have a natural order, such as hot, medium, and cold.

Example:Market analysts want to determine which variables influence the decision to buy large, medium, or small popcorn at the movie theater.

Nominal Logistic Regression

Nominal logistic regressionmodels the relationship between a set of independent variables and a nominal dependent variable. Anominal variablehas at least three groups which do not have a natural order, such as scratch, dent, and tear.

Example: A quality analyst studies the variables that affect the odds of the type of product defects: scratches, dents, and tears.

Regression Analysis with Count Dependent Variables

If your dependent variable is a count of items, events, results, or activities, you might need to use a different type of regression model. Counts are nonnegative integers (0, 1, 2, etc.). Count data with higher means tend to be normally distributed and you can often use OLS. However, count data with smaller means can beskewed, and linear regression might have a hard time fitting these data. For these cases, there are several types of models you can use.

Poisson regression

Count data frequently follow the Poisson distribution, which makes Poisson Regression a good possibility.Poisson variablesare a count of something over a constant amount of time, area, or another consistent length of observation. With a Poisson variable, you can calculate and assess a rate of occurrence. A classic example of a Poisson dataset is provided by Ladislaus Bortkiewicz, a Russian economist, who analyzed annual deaths caused by horse kicks in the Prussian Army from 1875-1984.

Use Poisson regression to model how changes in the independent variables are associated with changes in the counts. Poisson models are similar to logistic models because they use Maximum Likelihood Estimation and transform the dependent variable using the natural log. Poisson models can be suitable for rate data, where the rate is a count of events divided by a measure of that unit’sexposure(a consistent unit of observation). For example, homicides per month.

Example: An analyst uses Poisson regression to model the number of calls that a call center receives daily.

Alternatives to Poisson regression for count data

Not all count data follow the Poisson distribution because this distribution has some stringent restrictions. Fortunately, there are alternative analyses you can perform when you have count data.

Negative binomial regression: Poisson regression assumes that the variance equals the mean. When the variance is greater than the mean, your model has overdispersion. A negative binomial model, also known as NB2, can be more appropriate when overdispersion is present.

Zero-inflated models: Your count data might have too many zeros to follow the Poisson distribution. In other words, there are more zeros than the Poisson regression predicts. Zero-inflated models assume that two separate processes work together to produce the excessive zeros. One process determines whether there are zero events or more than zero events. The other is the Poisson process that determines how many events occur, some of which some can be zero. An example makes this clearer!

Suppose park rangers count the number of fish caught by each park visitor as they exit the park. A zero-inflated model might be appropriate for this scenario because there are two processes for catching zero fish:

  • Some park visitors catch zero fish because they did not go fishing.
  • Other visitors went fishing, and some of these people caught zero fish.

Whew! That’s many different types of regression analysis! If you’re trying to figure out which one to choose, I hope you will use this information to point yourself in the right direction!

If you’re learning regression, check out myRegression Tutorial!

Originally posted here.

Choosing the Correct Type of Regression Analysis - DataScienceCentral.com (2024)

FAQs

How can I determine which type of regression analysis to use? ›

My advice is to fit a model using linear regression first and then determine whether the linear model provides an adequate fit by checking the residual plots. If you can't obtain a good fit using linear regression, then try a nonlinear model because it can fit a wider variety of curves.

How do you select a regression type? ›

If the relationship between the independent and dependent variables appears to be linear, consider linear regression. If the relationship is not linear, you might need a non-linear regression model, such as polynomial regression, exponential regression, or logarithmic regression.

Which type of regression analysis is used when the dependent variable is continuous and normally distributed? ›

Linear regression analysis rests on the assumption that the dependent variable is continuous and that the distribution of the dependent variable (Y) at each value of the independent variable (X) is approximately normally distributed.

Which study variable is used to determine the appropriate type of regression analysis to be done? ›

The type of regression analysis that should be used, depends on the number of independent variables and the scale of measurement of the dependent variable. If you only want to use one variable for prediction, a simple regression is used. If you use more than one variable, you need to perform a multiple regression.

How to select the appropriate regression model? ›

You need to choose the model that has the highest performance metrics, the lowest complexity, and the best interpretability. You also need to consider the trade-offs between these factors, as well as your goal and context.

What are the four types of regression analysis? ›

Regression analysis is essential for predicting and understanding relationships between dependent and independent variables. There are various regression models, including linear regression, logistic regression, polynomial regression, ridge regression, and lasso regression, each suited for different data scenarios.

How do you choose regression or classification? ›

The most significant difference between regression vs classification is that while regression helps predict a continuous quantity, classification predicts discrete class labels. There are also some overlaps between the two types of machine learning algorithms.

What are the three types of regression? ›

Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.

Which type of regression analysis is often used to model? ›

Linear Regression Analysis

It is one of the most widely known modeling techniques, as it is amongst the first elite regression analysis methods picked up by people at the time of learning predictive modeling.

How to choose dependent and independent variables in regression analysis? ›

In regression, the order of the variables is very important. The explanatory variable (or the independent variable) always belongs on the x-axis. The response variable (or the dependent variable) always belongs on the y-axis.

How to know if a linear regression model is appropriate? ›

The adequacy of a linear regression model can be determined through four checks.
  1. Check if the data and corresponding regression line look visually acceptable.
  2. Check how many scaled residuals are in the [−2,2] range.
  3. Check the coefficient of determination.
  4. Check the assumption of the inherent randomness of the residuals.
Oct 5, 2023

How do you determine which type of regression model to use? ›

The determinant of the type of regression analysis to be used is the nature of the outcome variable. Linear regression is used for continuous outcome variables (e.g., days of hospitalization or FEV1), and logistic regression is used for categorical outcome variables, such as death.

How do you select variables for regression analysis? ›

The selection is based on two criteria: relevance to the research context and statistical significance. This methodical entry of variables allows for a detailed examination of confounding factors and the grouping of highly correlated variables, enhancing the clarity and effectiveness of the regression analysis.

How do you determine regression analysis? ›

Linear Regression Analysis consists of more than just fitting a linear line through a cloud of data points. It consists of 3 stages – (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model.

How to know when to use linear regression or logistic regression? ›

You can use linear regression when you want to predict a continuous dependent variable from a scale of values. Use logistic regression when you expect a binary outcome (for example, yes or no). Here are examples of linear regression: Predicting the height of an adult based on the mother's and father's height.

How do you know that linear regression is suitable for any given data? ›

Your data should have no significant outliers. Check for hom*oscedasticity — a statistical concept in which the variances along the best-fit linear-regression line remain similar all through that line. The residuals (errors) of the best-fit regression line follow normal distribution.

How to know which regression model to use in SPSS? ›

If you are working with continuous dependent variables, then you should consider the linear regression model first. Linear regression has some special options available: Fitted line plots: Use them when you have one independent and one dependent variable.

Top Articles
What Is Value-Added Tax (VAT)?
Management Liability Insurance | Insureon
Truist Bank Near Here
Po Box 7250 Sioux Falls Sd
Http://N14.Ultipro.com
Exam With A Social Studies Section Crossword
Z-Track Injection | Definition and Patient Education
Ventura Craigs List
CA Kapil 🇦🇪 Talreja Dubai on LinkedIn: #businessethics #audit #pwc #evergrande #talrejaandtalreja #businesssetup…
J Prince Steps Over Takeoff
Autozone Locations Near Me
Derpixon Kemono
Anki Fsrs
Globe Position Fault Litter Robot
Bme Flowchart Psu
Craigslist Labor Gigs Albuquerque
Cvs Learnet Modules
Insidekp.kp.org Hrconnect
Eka Vore Portal
Shannon Dacombe
Missed Connections Dayton Ohio
Q33 Bus Schedule Pdf
Hermitcraft Texture Pack
A Biomass Pyramid Of An Ecosystem Is Shown.Tertiary ConsumersSecondary ConsumersPrimary ConsumersProducersWhich
Rqi.1Stop
Panolian Batesville Ms Obituaries 2022
Form F-1 - Registration statement for certain foreign private issuers
Conscious Cloud Dispensary Photos
MyCase Pricing | Start Your 10-Day Free Trial Today
Jayah And Kimora Phone Number
Bill Remini Obituary
Best Boston Pizza Places
Gs Dental Associates
Inter Miami Vs Fc Dallas Total Sportek
Jamielizzz Leaked
Ordensfrau: Der Tod ist die Geburt in ein Leben bei Gott
Federal Express Drop Off Center Near Me
Obsidian Guard's Skullsplitter
Bfri Forum
What does wym mean?
Dubois County Barter Page
Utexas Baseball Schedule 2023
Moxfield Deck Builder
Citibank Branch Locations In Orlando Florida
Gamestop Store Manager Pay
White County
Paradise leaked: An analysis of offshore data leaks
Clock Batteries Perhaps Crossword Clue
Mail2World Sign Up
Barber Gym Quantico Hours
Twizzlers Strawberry - 6 x 70 gram | bol
Ranking 134 college football teams after Week 1, from Georgia to Temple
Latest Posts
Article information

Author: Jonah Leffler

Last Updated:

Views: 6303

Rating: 4.4 / 5 (45 voted)

Reviews: 84% of readers found this page helpful

Author information

Name: Jonah Leffler

Birthday: 1997-10-27

Address: 8987 Kieth Ports, Luettgenland, CT 54657-9808

Phone: +2611128251586

Job: Mining Supervisor

Hobby: Worldbuilding, Electronics, Amateur radio, Skiing, Cycling, Jogging, Taxidermy

Introduction: My name is Jonah Leffler, I am a determined, faithful, outstanding, inexpensive, cheerful, determined, smiling person who loves writing and wants to share my knowledge and understanding with you.