
Multiple linear regression expands on simple linear regression by using multiple predictors to estimate a single outcome. This powerful tool allows us to model complex relationships between variables, accounting for the various factors that influence the dependent variable.

In this section, we'll learn how to set up and interpret multiple regression models. We'll explore key concepts like coefficient estimation, model fit, and diagnostics, equipping us to tackle real-world prediction problems with multiple variables.

Multiple Linear Regression

Extending Simple Linear Regression

  • Multiple linear regression incorporates two or more independent variables to predict a single dependent variable
  • General form of a multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
    • Y represents dependent variable
    • X₁, X₂, ..., Xₖ represent independent variables
    • β₀ represents the y-intercept
    • β₁, β₂, ..., βₖ represent regression coefficients
    • ε represents error term
  • Method of least squares estimates regression coefficients by minimizing sum of squared residuals
  • Assumptions of multiple linear regression
    • Linearity between dependent and independent variables
    • Independence of errors
    • Homoscedasticity (constant variance of residuals)
    • Normality of residuals
    • Absence of multicollinearity among independent variables
  • Coefficient of determination (R²) measures proportion of variance in dependent variable explained by independent variables collectively
  • Adjusted R² accounts for number of predictors in model, providing more accurate measure of model fit (see the fitting sketch after this list)
  • Partial regression plots visualize relationship between dependent variable and each predictor while controlling for effects of other predictors (house price vs. square footage, controlling for number of bedrooms)
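To make these ideas concrete, here is a minimal fitting sketch using statsmodels; the house-price data, variable names, and coefficient values are all invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

# simulated house-price data (all values invented for illustration)
rng = np.random.default_rng(42)
n = 200
sqft = rng.uniform(800, 3500, n)           # X1: square footage
beds = rng.integers(1, 6, n)               # X2: number of bedrooms
price = 50_000 + 120 * sqft + 8_000 * beds + rng.normal(0, 20_000, n)

# add_constant appends the column of ones that carries the intercept β0
X = sm.add_constant(np.column_stack([sqft, beds]))
fit = sm.OLS(price, X).fit()

print(fit.params)         # estimates of β0, β1, β2
print(fit.rsquared)       # R²: variance explained collectively
print(fit.rsquared_adj)   # adjusted R²: penalized for number of predictors
```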

Model Estimation and Visualization

  • Ordinary least squares (OLS) method estimates regression coefficients
  • Matrix algebra used for efficient computation of coefficient estimates (see the normal-equations sketch after this list)
  • Residual analysis helps assess model assumptions and identify potential outliers
  • Scatter plot matrix visualizes relationships between all pairs of variables in the model
  • 3D scatter plots can be used for models with two independent variables (house price vs. square footage and number of bedrooms)
  • Added-variable plots show the effect of adding a new predictor to an existing model
  • Leverage plots identify influential observations that have a large impact on the regression results
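The matrix-algebra route can be sketched directly: the OLS estimate solves the normal equations $(X^T X)\hat{\beta} = X^T y$. A minimal NumPy version, with toy data and invented coefficients:

```python
import numpy as np

def ols_coefficients(X, y):
    """Solve the normal equations (X'X) beta = X'y for beta_hat.

    X is the n x (k+1) design matrix whose first column is ones for the
    intercept; np.linalg.solve avoids forming an explicit matrix inverse.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

print(ols_coefficients(X, y))  # should land close to [1.0, 2.0, -0.5]
```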

Interpreting Coefficients and Significance

Understanding Regression Coefficients

  • Each regression coefficient (β₁, β₂, ..., βₖ) represents change in dependent variable for one-unit change in corresponding independent variable, holding all other variables constant
  • Intercept (β₀) represents expected value of dependent variable when all independent variables equal zero
  • Standard errors of coefficients used to construct confidence intervals and perform hypothesis tests for individual predictors
  • t-statistic for each coefficient calculated by dividing coefficient by its standard error: $t = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$ (see the computation sketch after this list)
  • P-value associated with each t-statistic indicates probability of obtaining such a result if the null hypothesis were true (coefficient equals zero)
  • Standardized coefficients (beta coefficients) allow comparison of relative importance of predictors measured on different scales
  • Partial correlations measure unique contribution of each predictor to explained variance in dependent variable, controlling for other predictors
  • Semi-partial (part) correlations measure unique contribution of each predictor to total variance in dependent variable
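As referenced above, the t-statistics and two-tailed p-values can be computed from first principles. A sketch with NumPy and SciPy, on simulated data with arbitrary coefficient values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, 0.8, 0.0]) + rng.normal(size=n)   # third slope truly zero

p = X.shape[1]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat

sigma2 = residuals @ residuals / (n - p)                 # residual variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))   # SE(beta_hat_i)
t_stats = beta_hat / se                                  # t = beta_hat / SE
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)     # two-tailed p-values

print(t_stats, p_values)  # the truly-zero coefficient should get a large p-value
```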

Hypothesis Testing and Confidence Intervals

  • Null hypothesis for each coefficient: $H_0: \beta_i = 0$
  • Alternative hypothesis for each coefficient: $H_1: \beta_i \neq 0$ (two-tailed test)
  • Confidence intervals for coefficients provide range of plausible values for true population parameters
  • Interpretation of confidence intervals (95% CI for β₁: 0.5 to 1.2 indicates we are 95% confident true population value lies between 0.5 and 1.2)
  • Significance level (α) determines threshold for rejecting null hypothesis (typically 0.05)
  • Type I error occurs when rejecting a true null hypothesis
  • Type II error occurs when failing to reject a false null hypothesis
  • Power of the test represents probability of correctly rejecting false null hypothesis
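Confidence intervals and coefficient tests are available directly from statsmodels' fitted results; a short sketch on simulated data with invented coefficients:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(150, 2)))
y = X @ np.array([2.0, 0.8, -1.5]) + rng.normal(size=150)

fit = sm.OLS(y, X).fit()
print(fit.conf_int(alpha=0.05))  # 95% CI for each coefficient
print(fit.pvalues)               # two-tailed p-values for H0: beta_i = 0
print(fit.pvalues < 0.05)        # reject H0 at significance level α = 0.05?
```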

Evaluating Model Fit and Power

Overall Model Significance and Explanatory Power

  • F-test evaluates overall significance of regression model by comparing explained variance to unexplained variance
  • P-value associated with F-statistic indicates probability of obtaining such a result if all regression coefficients were zero
  • R² provides measure of model's explanatory power, ranging from 0 to 1
  • Adjusted R² accounts for number of predictors in model, penalizing overly complex models
  • Root mean squared error (RMSE) quantifies average prediction error in original units of dependent variable: $RMSE = \sqrt{\frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{n}}$
  • Mean absolute error (MAE) provides alternative measure of average prediction error: $MAE = \frac{\sum_{i=1}^n |y_i - \hat{y}_i|}{n}$
  • Cross-validation techniques assess model's predictive performance on unseen data (k-fold cross-validation), as shown in the sketch below
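One way to compute RMSE, MAE, and a k-fold cross-validated error with scikit-learn; the data, fold count, and seeds are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=300)

model = LinearRegression().fit(X, y)
pred = model.predict(X)

rmse = np.sqrt(mean_squared_error(y, pred))  # in original units of y
mae = mean_absolute_error(y, pred)

# 5-fold CV estimates prediction error on held-out (unseen) folds
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_root_mean_squared_error").mean()
print(rmse, mae, cv_rmse)
```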

Model Selection and Diagnostics

  • Akaike Information Criterion (AIC) balances model fit with model complexity for model selection
  • Bayesian Information Criterion (BIC) provides alternative to AIC, penalizing model complexity more heavily
  • Residual plots help assess model assumptions and identify potential issues
    • Residuals vs. fitted values plot checks linearity and homoscedasticity assumptions
    • Q-Q plot assesses normality of residuals
  • Influence diagnostics identify observations with disproportionate impact on model results
    • Cook's distance measures overall influence of each observation (see the diagnostics sketch after this list)
    • DFBETAS measure influence of each observation on individual coefficient estimates
  • Partial F-tests compare nested models to determine if adding predictors significantly improves model fit
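These diagnostics are exposed directly by statsmodels' fitted results; a minimal sketch on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=100)

fit = sm.OLS(y, X).fit()
print(fit.aic, fit.bic)                 # information criteria for model selection

influence = fit.get_influence()
cooks_d = influence.cooks_distance[0]   # Cook's distance per observation
dfbetas = influence.dfbetas             # per-observation influence on each coefficient

print(np.argsort(cooks_d)[-3:])         # indices of the three most influential points
```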

Addressing Issues in Multiple Regression

Multicollinearity and Variable Selection

  • Multicollinearity occurs when independent variables are highly correlated, leading to unstable and unreliable coefficient estimates
  • Variance inflation factor (VIF) detects multicollinearity: $VIF_i = \frac{1}{1-R_i^2}$ (see the sketch after this list)
    • VIF values greater than 5 or 10 indicate potential issues
  • Strategies to address multicollinearity
    • Remove redundant variables
    • Combine correlated predictors (principal component analysis)
    • Use regularization techniques (ridge regression, LASSO)
  • Overfitting occurs when too many predictors included in model, leading to poor generalization to new data
  • Stepwise regression methods for automated variable selection
    • Forward selection starts with no predictors and adds variables
    • Backward elimination starts with all predictors and removes variables
    • Stepwise selection combines forward and backward approaches
  • Bias-variance tradeoff crucial in model selection, balancing model complexity with predictive accuracy
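A VIF check with statsmodels, using deliberately near-collinear predictors; the data and the degree of collinearity are contrived for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_i = 1 / (1 - R_i^2), where R_i^2 regresses predictor i on the others;
# start at column 1 to skip the intercept column
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print(vifs)  # x1 and x2 should show VIFs far above the 5-10 warning range
```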

Advanced Modeling Techniques

  • Interaction terms model non-additive relationships between predictors and dependent variable
  • Polynomial regression captures non-linear relationships between predictors and dependent variable
  • Weighted least squares addresses heteroscedasticity by giving less weight to observations with higher variance
  • Robust regression methods (M-estimation, least trimmed squares) reduce influence of outliers on model estimates
  • Generalized linear models extend multiple regression to non-normal response variables (logistic regression for binary outcomes)
  • Regularization techniques (ridge regression, LASSO) shrink coefficient estimates to prevent overfitting
  • Ensemble methods (random forests, gradient boosting) combine multiple models to improve predictive performance
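For instance, an interaction term can be added with statsmodels' formula interface; the variable names below mirror the earlier house-price example and are invented:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
df = pd.DataFrame({
    "sqft": rng.uniform(800, 3500, 200),
    "beds": rng.integers(1, 6, 200).astype(float),
})
# non-additive effect: the impact of square footage depends on bedroom count
df["price"] = (40_000 + 100 * df["sqft"] + 5_000 * df["beds"]
               + 3 * df["sqft"] * df["beds"] + rng.normal(0, 15_000, 200))

# "sqft * beds" expands to sqft + beds + sqft:beds (the interaction term)
fit = smf.ols("price ~ sqft * beds", data=df).fit()
print(fit.params)
```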