12.4 Multicollinearity and Variable Transformation
3 min read • July 23, 2024
Multicollinearity in regression can mess up your results. It happens when your predictor variables are too closely related, making it hard to figure out which ones are really important. This can lead to unstable coefficient estimates and unreliable predictions.
There are ways to spot and fix multicollinearity. You can use correlation matrices, VIF, or condition numbers to detect it. If you find it, try transforming your variables through centering, standardization, or more advanced techniques like PCA or PLS regression.
Multicollinearity in regression analysis
High correlation among independent variables in a multiple regression model
Occurs when two or more predictor variables are linearly related (income and education level)
Leads to unstable and unreliable estimates of regression coefficients
Standard errors of the coefficients may be inflated, making it difficult to assess the significance of individual predictors (price and quality ratings for products)
Reduces the model's predictive power and interpretability
Can cause the coefficients to have unexpected signs or magnitudes (negative coefficient for a positive relationship)
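The instability described above is easy to see in a small simulation. The following sketch (NumPy, with made-up data) fits a regression on two nearly collinear predictors: the individual coefficients are poorly identified, but their sum, the joint effect of the shared direction, is estimated reliably.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
y = 2 * x1 + 1 * x2 + rng.normal(size=n)   # true joint effect is 3

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# x1 and x2 are almost perfectly correlated, so beta[1] and beta[2]
# individually are unstable, while their sum is well identified.
print(np.corrcoef(x1, x2)[0, 1])   # very close to 1
print(beta[1] + beta[2])           # close to 3
```

This is why multicollinearity hurts interpretation more than overall fit: the model predicts well along the shared direction, but cannot attribute the effect to one predictor or the other.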
Diagnostic measures for multicollinearity
A correlation matrix examines pairwise correlations between independent variables
High correlations (above 0.8 or 0.9) indicate potential multicollinearity (age and years of experience)
The variance inflation factor (VIF) measures the extent to which the variance of a regression coefficient is inflated due to multicollinearity
VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared value obtained by regressing the jth predictor on the remaining predictors
VIF value greater than 5 or 10 suggests the presence of multicollinearity (VIF of 8 for a predictor variable)
The condition number is the ratio of the largest to the smallest eigenvalue of the correlation matrix of the independent variables
Condition number greater than 30 indicates severe multicollinearity (condition number of 50)
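The diagnostics above can be computed directly. Here is a minimal sketch using NumPy on simulated data (the variable names `age`, `experience`, and `income` are illustrative); the VIF function implements VIF_j = 1 / (1 - R_j^2), and the condition number follows the eigenvalue-ratio definition given above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
age = rng.normal(40, 10, n)
experience = age - 22 + rng.normal(0, 2, n)   # strongly tied to age
income = rng.normal(50, 15, n)                # unrelated predictor
X = np.column_stack([age, experience, income])

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])
        resid = X[:, j] - A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1 / (1 - r2))
    return np.array(out)

# Condition number: ratio of largest to smallest eigenvalue
# of the correlation matrix of the predictors
eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
cond_number = eigvals.max() / eigvals.min()

print(vif(X))        # age and experience get large VIFs; income stays near 1
print(cond_number)   # well above the rule-of-thumb threshold of 30
```

Both diagnostics flag the same problem here: the age/experience pair inflates the VIFs and produces a near-zero eigenvalue in the correlation matrix.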
Variable transformation for multicollinearity
Centering subtracts the mean value of each independent variable from its respective values
Reduces multicollinearity caused by interaction terms in the model (centering age and income variables)
Standardization (Z-score normalization) subtracts the mean and divides by the standard deviation for each independent variable
Scales the variables to have a mean of 0 and a standard deviation of 1 (standardizing test scores)
Principal Component Analysis (PCA) transforms the original variables into a new set of uncorrelated variables called principal components
Principal components are linear combinations of the original variables and can be used as predictors in the regression model (PCA on a set of correlated financial ratios)
Partial Least Squares (PLS) regression combines features of PCA and multiple regression
Constructs new predictor variables (latent variables) that maximize the covariance between the predictors and the response variable (PLS regression for customer satisfaction analysis)
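To make the transformations concrete, here is a sketch in NumPy on simulated data: standardization rescales each column to mean 0 and standard deviation 1, and PCA, implemented here via an eigendecomposition of the correlation matrix, produces components that are mutually uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
X[:, 1] = X[:, 0] + rng.normal(scale=0.1, size=300)  # induce collinearity

# Standardization (Z-score normalization): subtract the mean,
# divide by the standard deviation, column by column
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via the eigendecomposition of the correlation matrix:
# each principal component is a linear combination of the columns
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
components = Z @ eigvecs

# The components are uncorrelated, so they can replace the
# collinear predictors in a regression
print(np.round(np.corrcoef(components, rowvar=False), 6))
```

The trade-off, discussed in the interpretation section below, is that a coefficient on a principal component describes a mixture of the original variables rather than any single one.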
Interpretation after variable transformation
Assess the significance of the transformed variables by examining the p-values associated with the coefficients
P-value less than the chosen significance level (0.05) indicates that the transformed variable has a significant impact on the response variable (p-value of 0.02 for a transformed predictor)
Interpret the coefficients of the transformed variables
Coefficients represent the change in the response variable for a one-unit change in the transformed predictor variable, holding other variables constant (a one-unit increase in the standardized income leads to a 0.5 unit increase in the response)
Interpretation depends on the specific transformation applied (centering, standardization, PCA)
Evaluate the model's goodness of fit using the R-squared value
R-squared measures the proportion of variance in the response variable explained by the transformed predictor variables
Higher R-squared value indicates a better fit of the model to the data (R-squared of 0.8 suggests a good fit)
Assess the model's predictive power using techniques such as cross-validation or holdout sample
Compare the predicted values with the actual values to assess the model's predictive accuracy (mean absolute error of 0.1 indicates high predictive accuracy)
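A holdout evaluation like the one described can be sketched as follows (NumPy, simulated data; the 70/30 split is an arbitrary illustrative choice): fit on the training portion, then compare predictions with actual values on the held-out portion via mean absolute error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=n)

# Holdout split: fit on 70% of the rows, evaluate on the other 30%
idx = rng.permutation(n)
train, test = idx[:280], idx[280:]
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Mean absolute error on the holdout sample
mae = np.abs(X[test] @ beta - y[test]).mean()
print(mae)   # small relative to the spread of y, so predictions are accurate
```

Repeating the split several times (or using k-fold cross-validation) gives a more stable estimate of out-of-sample accuracy than a single holdout.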