- All
- Business Administration
- Data Analysis
Powered by AI and the LinkedIn community
1
Linear regression
2
Logistic regression
3
Polynomial regression
4
Multiple regression
5
Other types of regression
6
Here’s what else to consider
Regression analysis is a powerful technique for exploring the relationship between a dependent variable and one or more independent variables. But how do you choose the best regression model for your data analysis? In this article, you'll learn about some common types of regression models, their assumptions, advantages, and limitations, and how to apply them to different kinds of data.
Top experts in this article
Selected by the community from 20 contributions. Learn more
Earn a Community Top Voice badge
Add to collaborative articles to get recognized for your expertise on your profile. Learn more
- António Júnior Data Science Leader @ Grupo Boticário
10
- Jackson Walters Mathematician, Programmer
2
-
2
1 Linear regression
Linear regression is the simplest and most widely used form of regression analysis. It assumes that the dependent variable is a linear function of the independent variables, plus some random error. Linear regression can be used to estimate the slope and intercept of the relationship, test hypotheses, and measure the strength and direction of the correlation. However, linear regression also has some drawbacks, such as sensitivity to outliers, multicollinearity, heteroscedasticity, and non-normality of residuals. To use linear regression, you need to check and satisfy these assumptions, or use appropriate transformations or corrections.
Help others by sharing more (125 characters min.)
- Jackson Walters Mathematician, Programmer
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
- Linear regression is quick and dirty and a good way to obtain a baseline for other regression models. It can be trained in sklearn quickly, usually a minute or two for millions of rows. Random Forest Regression - By aggregating many decision trees, they are more accurate than linear models at the cost of training time. n_estimators is the key parameter, ranging from one to a thousand or so. Trained in sklearn, no GPU acceleration.Deep Learning Regressor - A less common approach, there are fully connected general neural networks with a single output. GPUs accelerate training.ARIMA - Autoregressive integrated moving average. Adapted for time series data, these are powerful but are more difficult to set up, requiring removing trends.
LikeLike
Celebrate
Support
Love
Insightful
Funny
2
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
For any regression problem, it's advisable to initially apply a linear regression model to your data. Despite its simplicity, the linear regression model is highly interpretable. Even if it doesn't yield significance, it can provide valuable preliminary insights worth considering.
LikeLike
Celebrate
Support
Love
Insightful
Funny
1
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Linear regression can help us understand how variables are related, but we should not confuse correlation with causation. Two variables that change together are not necessarily connected by a cause-and-effect relationship. Moreover, linear regression works best when the variables have a linear relationship, but this is not always true for real-world data. Sometimes, we might need to use other kinds of regression models, such as polynomial regression or non-parametric regression, that can handle non-linear patterns.
LikeLike
Celebrate
Support
Love
Insightful
Funny
- Iria Amado Irago Operational Efficiency and Data Analyst| Data Science 👩💻| Python 🐍| SQL📊 | Matplotlib 📈 | Seaborn 📊 | pandas 🐼
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
El "mejor" modelo de regresión para un análisis de datos depende de factores como la naturaleza de los datos, la relación entre las variables, y los objetivos del análisis. Aquí tienes un resumen de algunos modelos comunes y sus aplicaciones típicas:Regresión Lineal: Buena para relaciones lineales simples; fácil de implementar y de interpretar, pero limitada a relaciones lineales.
Translated
- Manasa B R Data Analyst
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Linear Regression is often a suitable choice as the best regression model for data analysis when the relationship between the dependent variable and independent variables can be adequately represented by a linear equation. It assumes a linear association between the predictors and the target variable, making it a straightforward and interpretable model. Linear Regression aims to minimize the sum of squared differences between the observed and predicted values, allowing for the estimation of coefficients that quantify the impact of each predictor on the target variable.
LikeLike
Celebrate
Support
Love
Insightful
Funny
Load more contributions
2 Logistic regression
Logistic regression is a type of regression analysis that is suitable for binary or categorical dependent variables, such as yes/no, success/failure, or 0/1 outcomes. It assumes that the log-odds of the dependent variable is a linear function of the independent variables, plus some random error. Logistic regression can be used to estimate the odds ratio and the probability of the outcome, test hypotheses, and measure the goodness of fit and the predictive power of the model. However, logistic regression also has some challenges, such as non-linearity, multicollinearity, overfitting, and separation of data. To use logistic regression, you need to check and satisfy these assumptions, or use appropriate regularization or penalization techniques.
Help others by sharing more (125 characters min.)
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Many industries from marketing, telecom, and other service providers rely on logistic regression to calculate the Churn rate based on individual customer parameters and services used. Not only that, but it offers the companies the basis for customer behaviour predictability to devise the strategy to retain the customers or offer other incentives.
LikeLike
Celebrate
Support
Love
Insightful
Funny
2
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Logistic regression is a suitable choice for binary outcomes, but it has some limitations. One of them is that the coefficients are expressed in log-odds, which are not easy to understand intuitively. Another issue is that logistic regression requires a large sample size to produce reliable and meaningful results. If you have a small dataset, you may consider other methods, such as decision trees or naive Bayes classifiers. The best model depends on your data and your specific needs.
LikeLike
Celebrate
Support
Love
Insightful
Funny
1
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Similar to the linear regression model, logistic regression is a straightforward approach for classification problems. Despite its numerous assumptions and limitations, it remains highly interpretable, making it a valuable tool in understanding and explaining classification outcomes.
LikeLike
Celebrate
Support
Love
Insightful
Funny
1
- Iria Amado Irago Operational Efficiency and Data Analyst| Data Science 👩💻| Python 🐍| SQL📊 | Matplotlib 📈 | Seaborn 📊 | pandas 🐼
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Regresión logística, se utiliza para problemas de clasificación binaria; no para predecir valores numéricos continuos, sino categorías.
Translated
LikeLike
Celebrate
Support
Love
Insightful
Funny
- Manasa B R Data Analyst
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Logistic Regression is a valuable regression model when the target variable is binary or categorical. Unlike Linear Regression, Logistic Regression predicts the probability of an event occurring, making it particularly useful for classification tasks. It models the log-odds of the probability as a linear combination of independent variables, mapping the output to a range between 0 and 1 using the logistic function. This makes Logistic Regression adept at handling scenarios where the outcome is dichotomous, such as predicting whether an email is spam or not. Despite the name, Logistic Regression is a classification algorithm widely utilized for its simplicity and effectiveness in situations where linear separation is not suitable.
Like
3 Polynomial regression
Polynomial regression is a type of regression analysis that allows for non-linear relationships between the dependent variable and the independent variables. It assumes that the dependent variable is a polynomial function of the independent variables, plus some random error. Polynomial regression can be used to capture the curvature and complexity of the relationship, test hypotheses, and measure the strength and direction of the correlation. However, polynomial regression also has some pitfalls, such as overfitting, underfitting, multicollinearity, and high variance. To use polynomial regression, you need to choose the optimal degree of the polynomial, avoid extrapolation, and use cross-validation or other methods to evaluate the model.
Help others by sharing more (125 characters min.)
- Manasa B R Data Analyst
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Polynomial Regression is a flexible regression model that accommodates non-linear relationships between the dependent and independent variables. By introducing polynomial terms, such as quadratic or cubic features, this model can capture more complex patterns within the data. While powerful in capturing intricate curves or bends, Polynomial Regression runs the risk of overfitting, especially with higher-degree polynomials. The choice of the degree requires careful consideration to balance model complexity and generalizability. Polynomial Regression finds utility in scenarios where relationships exhibit non-linear behavior, allowing for a more nuanced representation of the underlying data patterns.
LikeLike
Celebrate
Support
Love
Insightful
Funny
4 Multiple regression
Multiple regression is a type of regression analysis that involves more than one independent variable. It assumes that the dependent variable is a linear or non-linear function of the independent variables, plus some random error. Multiple regression can be used to examine the effect and significance of each independent variable, test hypotheses, and measure the overall fit and the explanatory power of the model. However, multiple regression also has some complications, such as multicollinearity, interaction effects, confounding factors, and model selection. To use multiple regression, you need to check and satisfy the assumptions of the specific type of regression, such as linear, logistic, or polynomial, and use appropriate criteria or methods to select the best model.
Help others by sharing more (125 characters min.)
- Manasa B R Data Analyst
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Multiple Regression is a versatile and widely used regression model that extends the principles of simple linear regression to account for multiple independent variables influencing a single dependent variable. It is a valuable tool when dealing with complex real-world scenarios where multiple factors contribute to the outcome. By estimating coefficients for each predictor, Multiple Regression allows for the simultaneous analysis of the impact of multiple variables, while controlling for each other. This model provides valuable insights into the nuanced relationships within the data, making it a fundamental choice in various fields such as economics, social sciences, and business analytics.
LikeLike
Celebrate
Support
Love
Insightful
Funny
1
5 Other types of regression
There are many other types of regression analysis that can be used for different purposes and data types, such as ridge regression, lasso regression, elastic net regression, robust regression, quantile regression, poisson regression, negative binomial regression, Cox proportional hazards regression, and so on. Each type of regression has its own assumptions, advantages, and limitations, and requires specific tools and skills to apply. To learn more about these types of regression, you can consult online resources, books, or courses on data analysis.
Choosing the best regression model for your data analysis depends on several factors, such as the nature and distribution of your data, the research question and hypothesis, the available tools and software, and the desired outcome and interpretation. There is no one-size-fits-all solution, but rather a process of exploration, comparison, and evaluation. By understanding the basics and characteristics of different types of regression models, you can make informed and effective decisions for your data analysis.
Help others by sharing more (125 characters min.)
-
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Another most notable regression is Time Series regression which incorporates time as the independent variable allowing the companies to forecast growth, revenues, and other projections. However, caution is necessary while dealing with time series data as excessive data cleaning is required to remove the outliers and other seasonalities.
LikeLike
Celebrate
Support
Love
Insightful
Funny
1
- Faransina Olivia Rumere MS Applied Data Science at @Uchicago | Co-Director @saperempuanpapua
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
My undergraduate thesis was about "Restricted Ridge Regression Estimator as a parameter estimation in multiple linear regression model for multicollinearity case", I chose this method rather than the Ridge Regression because:(1) There is a high multicollinearity among the independent variables;(2) In any way, I have prior knowledge (from the expert domain or external resources) that some coefficient in my variables is known or has a fixed or specific value. For example, in a policy-making process context, there might be some regulatory requirements that need to be adjusted in the modeling process.
LikeLike
Celebrate
Support
Love
Insightful
Funny
- Manasa B R Data Analyst
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
Beyond the commonly used regression models like Linear, Logistic, Polynomial, and Multiple Regression, there exists a diverse array of specialized regression techniques tailored to specific data scenarios. Ridge and Lasso Regression, for instance, address multicollinearity and feature selection by introducing regularization terms. Support Vector Regression is effective in capturing non-linear relationships by mapping data into higher-dimensional spaces. Bayesian Regression incorporates Bayesian principles to quantify uncertainties in parameter estimates. Time Series Regression models, such as Autoregressive Integrated Moving Average (ARIMA), focus on temporal dependencies in sequential data.
LikeLike
Celebrate
Support
Love
Insightful
Funny
6 Here’s what else to consider
This is a space to share examples, stories, or insights that don’t fit into any of the previous sections. What else would you like to add?
Help others by sharing more (125 characters min.)
- António Júnior Data Science Leader @ Grupo Boticário
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
A escolha da regressão depende da variável resposta (VR). Por exemplo, se a VR é contínua e segue aproximadamente uma distribuição normal, podemos aplicar a Regressão Linear. Se VR é binária geralmente aplicamos a Regressão Logística, mas também temos a Regressão Probit. Se a VR é de contagem, é possível aplicar a Regressão Poisson ou, se houver superdispersão, Regressão Binomial Negativa. E por aí vai...Para melhor compreensão, pesquise sobre Modelos Lineares Generalizados. Basicamente trata-se de uma extensão dos modelos lineares "tradicionais" para modelagem de VRs pertencentes a outras distribuições da família exponencial, além da normal.
Translated
LikeLike
Celebrate
Support
Love
Insightful
Funny
10
- Faransina Olivia Rumere MS Applied Data Science at @Uchicago | Co-Director @saperempuanpapua
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
When choosing which type of regression works better, there are steps that need to be done first:(1) Define the objectives of the project which include defining the research questions and hypothesis.(2) Understand the existing data and decide whether we expect the relationship of the variable would be linear or non-linear (exponential, polynomial, etc.);(3) Checking on the key assumption of regression (linear or non-linear regression). Take linear regression as an example, then we should check on:(a) Linearity: the relation between independent and dependent variable(s) is assumed to be linear (b) The error should be independent (c) Variance of error should be constant(d) Error is normally distributed(e) No multicollinearity
LikeLike
Celebrate
Support
Love
Insightful
Funny
- Iria Amado Irago Operational Efficiency and Data Analyst| Data Science 👩💻| Python 🐍| SQL📊 | Matplotlib 📈 | Seaborn 📊 | pandas 🐼
- Report contribution
Thanks for letting us know! You'll no longer see this contribution
La elección del modelo adecuado requiere entender los datos, probar diferentes modelos y evaluar su rendimiento mediante métricas adecuadas. No hay un modelo "único" que sea el mejor para todos los escenarios; la elección depende de las particularidades del problema y del conjunto de datos
Translated
LikeLike
Celebrate
Support
Love
Insightful
Funny
Data Analysis
Data Analysis
+ Follow
Rate this article
We created this article with the help of AI. What do you think of it?
It’s great It’s not so great
Thanks for your feedback
Your feedback is private. Like or react to bring the conversation to your network.
Tell us more
Tell us why you didn’t like this article.
If you think something in this article goes against our Professional Community Policies, please let us know.
We appreciate you letting us know. Though we’re unable to respond directly, your feedback helps us improve this experience for everyone.
If you think this goes against our Professional Community Policies, please let us know.
More articles on Data Analysis
No more previous content
- Here's how you can navigate conflicts with stakeholders as a data analyst during a project. 2 contributions
- Here's how you can stay updated on emerging trends in data analysis technology. 1 contribution
- What do you do if you're tasked with simplifying data analysis for non-technical interviewees? 7 contributions
- You're striving for impeccable data analysis. How do you guarantee precise and dependable results? 10 contributions
- What do you do if your boss challenges your data analysis insights? 7 contributions
- Here's how you can acquire the top technical skills for entry-level Data Analysts. 28 contributions
- You're eager to connect with Data Analysts. How can you leverage online platforms like GitHub for networking? 12 contributions
No more next content
Explore Other Skills
- Business Communications
- Business Strategy
- Business Management
- Product Management
- Business Development
- Business Intelligence (BI)
- Project Management
- Consulting
- Business Analysis
- Entrepreneurship
More relevant reading
- Data Analytics What are the advantages and disadvantages of different regression models?
- Data Management What are the key steps in preparing data for regression analysis?
- Business Intelligence What are the most effective regression models for different types of data?
- Data Science How can you manage heterogeneity in regression analysis?