Multiple linear regression expands on simple linear regression by using multiple predictors to estimate a single outcome. This powerful tool allows us to model complex relationships between variables, accounting for various factors that influence the outcome.
In this section, we'll learn how to set up and interpret multiple regression models. We'll explore key concepts like coefficient estimation, model fit, and diagnostics, equipping us to tackle real-world prediction problems with multiple variables.
Multiple Linear Regression
Extending Simple Linear Regression
Multiple linear regression incorporates two or more independent variables to predict a single dependent variable
General form of a multiple linear regression model: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
Y represents dependent variable
X₁, X₂, ..., Xₖ represent independent variables
β₀ represents y-intercept
β₁, β₂, ..., βₖ represent regression coefficients
ε represents error term
Method of least squares estimates regression coefficients by minimizing sum of squared residuals
Assumptions of multiple linear regression
Linearity of relationship between dependent and independent variables
Independence of errors
Homoscedasticity (constant variance of residuals)
Normality of residuals
Absence of multicollinearity among independent variables
Coefficient of determination (R²) measures proportion of variance in dependent variable explained by independent variables collectively
Adjusted R² accounts for number of predictors in model, providing more accurate measure of model fit
Partial regression plots visualize relationship between dependent variable and each predictor while controlling for effects of other predictors (house price vs. square footage, controlling for number of bedrooms)
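As a concrete illustration, here is a minimal sketch of fitting such a model in Python with statsmodels. The house-price data is synthetic, and the variable names (sqft, beds, price) and coefficient values are invented purely for this example:

```python
# Minimal sketch: fitting a multiple linear regression with statsmodels.
# All data here is synthetic, generated only to illustrate the workflow.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 200
sqft = rng.uniform(800, 3500, n)            # square footage
beds = rng.integers(1, 6, n).astype(float)  # number of bedrooms
# Assumed true model: price = 50000 + 120*sqft + 8000*beds + noise
price = 50_000 + 120 * sqft + 8_000 * beds + rng.normal(0, 20_000, n)

X = sm.add_constant(np.column_stack([sqft, beds]))  # prepend intercept column
model = sm.OLS(price, X).fit()                      # least-squares estimation
print(model.summary())   # coefficients, standard errors, R², F-statistic
```

The `summary()` table reports each estimated coefficient alongside its standard error, t-statistic, and p-value, which the subsections below interpret. Later sketches reuse `model`, `X`, and `price` from this example.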
Model Estimation and Visualization
Ordinary least squares (OLS) method estimates regression coefficients
Matrix algebra used for efficient computation of coefficient estimates: β̂ = (XᵀX)⁻¹XᵀY
Residual analysis helps assess model assumptions and identify potential outliers
Scatter plot matrix visualizes relationships between all pairs of variables in the model
3D scatter plots can be used for models with two independent variables (house price vs. square footage and number of bedrooms)
Added-variable plots show the effect of adding a new predictor to an existing model
Leverage plots identify influential observations that have a large impact on the regression results
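A sketch of the diagnostic visualizations described above, reusing the fitted `model` from the earlier example (matplotlib is assumed available; the plot choices are illustrative, not prescriptive):

```python
# Diagnostic plots for the fitted model from the previous sketch.
import matplotlib.pyplot as plt
import statsmodels.api as sm

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Residuals vs. fitted values: checks linearity and constant variance
axes[0].scatter(model.fittedvalues, model.resid, alpha=0.5)
axes[0].axhline(0, color="red", linestyle="--")
axes[0].set(xlabel="Fitted values", ylabel="Residuals")

# Q-Q plot: checks normality of residuals
sm.qqplot(model.resid, line="45", fit=True, ax=axes[1])
plt.tight_layout()
plt.show()

# Added-variable (partial regression) plots, one panel per predictor
sm.graphics.plot_partregress_grid(model)
plt.show()
```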
Interpreting Coefficients and Significance
Understanding Regression Coefficients
Each regression coefficient (β₁, β₂, ..., βₖ) represents change in dependent variable for one-unit change in corresponding independent variable, holding all other variables constant
Intercept (β₀) represents expected value of dependent variable when all independent variables equal zero
Standard errors of coefficients used to construct confidence intervals and perform hypothesis tests for individual predictors
t-statistic for each coefficient calculated by dividing coefficient by its standard error: tᵢ = β̂ᵢ / SE(β̂ᵢ)
P-value associated with each t-statistic indicates probability of obtaining such a result if null hypothesis were true (coefficient equals zero)
Standardized coefficients (beta coefficients) allow comparison of relative importance of predictors measured on different scales
Squared partial correlations measure unique contribution of each predictor to explained variance in dependent variable, controlling for other predictors
Squared semi-partial (part) correlations measure unique contribution of each predictor to total variance in dependent variable
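A sketch of these quantities in code, using the `model`, `X`, and `price` from the first example; the z-scoring step shown is one common way to obtain standardized coefficients:

```python
# Coefficient estimates, standard errors, and t-statistics for the fitted model.
import statsmodels.api as sm

print(model.params)    # estimates: β̂₀, β̂₁, β̂₂
print(model.bse)       # standard errors SE(β̂ᵢ)
print(model.tvalues)   # tᵢ = β̂ᵢ / SE(β̂ᵢ)
print(model.pvalues)   # two-tailed p-values

# Standardized (beta) coefficients: refit after z-scoring the response and
# each predictor, so slopes are comparable across different scales.
preds = X[:, 1:]                                   # drop the constant column
Xz = (preds - preds.mean(axis=0)) / preds.std(axis=0)
yz = (price - price.mean()) / price.std()
betas = sm.OLS(yz, sm.add_constant(Xz)).fit().params[1:]
print(betas)           # relative importance of sqft vs. beds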
Hypothesis Testing and Confidence Intervals
Null hypothesis for each coefficient: H₀: βᵢ = 0
Alternative hypothesis for each coefficient: H₁: βᵢ ≠ 0 (two-tailed test)
Confidence intervals for coefficients provide range of plausible values for true population parameters
Interpretation of confidence intervals (95% CI for β₁: 0.5 to 1.2 indicates we are 95% confident true population value lies between 0.5 and 1.2)
Significance level (α) determines threshold for rejecting null hypothesis (typically 0.05)
Type I error occurs when rejecting true null hypothesis
Type II error occurs when failing to reject false null hypothesis
Power of the test represents probability of correctly rejecting false null hypothesis
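A short sketch of these tests with the same fitted `model`; the predictor names in the loop are the synthetic ones from the first example:

```python
# 95% confidence intervals and coefficient tests at α = 0.05.
alpha = 0.05
print(model.conf_int(alpha=alpha))   # one [lower, upper] row per coefficient

# Reject H₀: βᵢ = 0 whenever the two-tailed p-value falls below α
for name, p in zip(["const", "sqft", "beds"], model.pvalues):
    verdict = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4g} -> {verdict}")
```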
Evaluating Model Fit and Power
Overall Model Significance and Explanatory Power
F-statistic tests overall significance of regression model by comparing explained variance to unexplained variance
P-value associated with F-statistic indicates probability of obtaining such a result if all regression coefficients were zero
R² provides measure of model's explanatory power, ranging from 0 to 1
Adjusted R² accounts for number of predictors in model, penalizing overly complex models
Root mean squared error (RMSE) quantifies average prediction error in original units of dependent variable: RMSE = √( (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² )
Mean absolute error (MAE) provides alternative measure of average prediction error: MAE = (1/n) Σᵢ₌₁ⁿ |yᵢ − ŷᵢ|
Cross-validation techniques assess model's predictive performance on unseen data (k-fold cross-validation)
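A sketch of these fit measures on the synthetic data, with scikit-learn (assumed installed) handling the k-fold cross-validation step:

```python
# In-sample fit measures plus a cross-validated estimate of prediction error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rmse = np.sqrt(np.mean(model.resid ** 2))   # RMSE, in price units
mae = np.mean(np.abs(model.resid))          # MAE, in price units
print(f"R² = {model.rsquared:.3f}, adjusted R² = {model.rsquared_adj:.3f}")
print(f"RMSE = {rmse:,.0f}, MAE = {mae:,.0f}")

# 5-fold cross-validation estimates out-of-sample RMSE
features = X[:, 1:]       # drop the constant; sklearn adds its own intercept
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), features, price,
                         cv=cv, scoring="neg_root_mean_squared_error")
print(f"cross-validated RMSE = {-scores.mean():,.0f}")
```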
Model Selection and Diagnostics
Akaike Information Criterion (AIC) balances model fit with model complexity for model selection
Bayesian Information Criterion (BIC) provides alternative to AIC, penalizing model complexity more heavily
Residual plots help assess model assumptions and identify potential issues
Residuals vs. fitted values plot checks linearity and homoscedasticity assumptions
Q-Q plot assesses normality of residuals
Influence diagnostics identify observations with disproportionate impact on model results
Cook's distance measures overall influence of each observation
DFBETAS measure influence of each observation on individual coefficient estimates
Partial F-tests compare nested models to determine if adding predictors significantly improves model fit
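A sketch of these diagnostics with statsmodels, again reusing the earlier fit; the reduced model (dropping the bedrooms predictor) is invented here only to demonstrate the partial F-test:

```python
# Information criteria, influence diagnostics, and a partial F-test.
import statsmodels.api as sm

print(f"AIC = {model.aic:.1f}, BIC = {model.bic:.1f}")

influence = model.get_influence()
cooks_d, _ = influence.cooks_distance   # one Cook's distance per observation
dfbetas = influence.dfbetas             # rows: observations, columns: coefficients
print("largest Cook's distance:", cooks_d.max())

# Partial F-test: does adding `beds` significantly improve on `sqft` alone?
reduced = sm.OLS(price, sm.add_constant(X[:, [1]])).fit()
f_value, p_value, df_diff = model.compare_f_test(reduced)
print(f"partial F = {f_value:.2f}, p = {p_value:.4g}")
```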
Addressing Issues in Multiple Regression
Multicollinearity and Variable Selection
Multicollinearity occurs when independent variables are highly correlated, leading to unstable and unreliable coefficient estimates
Variance inflation factor (VIF) detects multicollinearity: VIFᵢ = 1 / (1 − Rᵢ²), where Rᵢ² is the R² from regressing predictor i on all other predictors
VIF values greater than 5 or 10 indicate potential issues
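A minimal sketch of the VIF check with statsmodels, applied to the design matrix `X` from the first example (the 5-or-greater flag mirrors the rule of thumb above):

```python
# Variance inflation factors for each predictor (column 0 is the constant).
from statsmodels.stats.outliers_influence import variance_inflation_factor

for i, name in enumerate(["sqft", "beds"], start=1):
    vif = variance_inflation_factor(X, i)
    flag = "  <- possible multicollinearity" if vif > 5 else ""
    print(f"VIF({name}) = {vif:.2f}{flag}")
```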