2.3 Measures of Model Fit: R-squared and Adjusted R-squared
4 min read • July 30, 2024
Measures of model fit help us gauge how well our model explains the data. R-squared tells us what percentage of variation in the dependent variable our model accounts for, ranging from 0 to 1.
While R-squared is useful, it has limitations. Enter adjusted R-squared, which penalizes adding unnecessary variables. This helps us avoid overfitting and compare models with different numbers of predictors more accurately.
Coefficient of determination (R-squared)
Definition and interpretation
R-squared is a statistical measure representing the proportion of variance in the dependent variable predictable from the independent variable(s) in a linear regression model
Ranges from 0 to 1, with higher values indicating a better fit of the model to the data
An R-squared of 1 means the model explains all the variability of the response data around its mean
Interpreted as the percentage of variation in the dependent variable that is explainable by the independent variable(s) in the model
Also known as the coefficient of determination, commonly used to assess the goodness of fit of a linear regression model
Formula for R-squared: R-squared = 1 − (SSR / SST), where SSR is the sum of squared residuals and SST is the total sum of squares
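As an illustrative example with made-up numbers: if SSR = 25 and SST = 100, then R-squared = 1 − (25 / 100) = 0.75, so the model explains 75% of the variation in the dependent variable.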
Importance and usage
R-squared provides a quantitative measure of how well the linear regression model fits the observed data
Helps evaluate the strength of the relationship between the dependent and independent variables
Allows comparison of different models to determine which one better explains the variability in the data
Widely used in various fields (economics, social sciences, engineering) to assess the explanatory power of linear regression models
Calculating R-squared
Required components
To calculate R-squared, you need the sum of squared residuals (SSR) and the total sum of squares (SST) from the linear regression model
SSR is the sum of the squared differences between the predicted values and the actual values of the dependent variable
Represents the amount of variation in the dependent variable not explained by the model
SST is the sum of the squared differences between the actual values of the dependent variable and its mean
Represents the total variation in the dependent variable
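To make these two components concrete, here is a minimal NumPy sketch; the arrays y and y_hat are hypothetical actual and predicted values, not taken from the text:
import numpy as np

# Hypothetical actual values of the dependent variable and model predictions
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.2])

ssr = np.sum((y - y_hat) ** 2)     # sum of squared residuals: variation the model leaves unexplained
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares: total variation around the mean
r_squared = 1 - ssr / sst          # R-squared = 1 - (SSR / SST)
print(ssr, sst, r_squared)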
Calculation methods
Once you have SSR and SST, use the formula R-squared = 1 − (SSR / SST) to calculate R-squared
Alternatively, most statistical software packages (SPSS, R) and programming languages (Python) provide functions to directly compute R-squared for a given linear regression model
Example in R: summary(lm_model)$r.squared returns the R-squared value for the fitted linear model lm_model
Example in Python with scikit-learn: from sklearn.metrics import r2_score; r2_score(y_true, y_pred) calculates R-squared given the true values (y_true) and predicted values (y_pred)
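Putting the pieces together, a minimal end-to-end sketch in Python with scikit-learn (the data below is hypothetical; note that a fitted LinearRegression's score method also returns R-squared directly):
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical data: one independent variable X and a dependent variable y
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

print(r2_score(y, y_pred))  # R-squared from true and predicted values
print(model.score(X, y))    # equivalent: score() reports R-squared for the fitted model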
R-squared limitations vs adjusted R-squared
Limitations of R-squared
R-squared increases as more independent variables are added to the model, even if those variables do not have a significant impact on the dependent variable
This can lead to the inclusion of irrelevant variables and overfitting (see the sketch after this list)
Does not indicate whether the independent variables are statistically significant or if the model is appropriate for the data
Only measures the goodness of fit without considering the model's validity
Does not consider the number of independent variables in the model, potentially leading to overfitting if too many variables are included
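The sketch below illustrates this behavior under simple simulated conditions: the dependent variable depends only on x1, yet adding a second, purely random predictor still does not lower R-squared (all names and data here are hypothetical):
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)  # y truly depends only on x1
noise = rng.normal(size=n)                    # irrelevant predictor

X_one = x1.reshape(-1, 1)
X_two = np.column_stack([x1, noise])

r2_one = LinearRegression().fit(X_one, y).score(X_one, y)
r2_two = LinearRegression().fit(X_two, y).score(X_two, y)
print(r2_one, r2_two)  # r2_two >= r2_one even though the extra predictor is pure noise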
Adjusted R-squared as an alternative
Adjusted R-squared addresses the limitations of R-squared by adjusting for the number of independent variables in the model
Penalizes the addition of unnecessary independent variables, providing a more reliable measure of the model's goodness of fit
Particularly useful when comparing models with different numbers of independent variables
Helps determine if adding more variables truly improves the model's explanatory power
Adjusted R-squared interpretation
Calculation and formula
Adjusted R-squared is calculated using the formula: Adjusted R-squared = 1 − [(1 − R-squared) × (n − 1) / (n − k − 1)], where n is the number of observations and k is the number of independent variables in the model
The adjusted R-squared value will always be less than or equal to the R-squared value
Decreases when the number of independent variables increases without a corresponding improvement in the model's fit
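A minimal sketch of this formula in Python; adjusted_r_squared is a hypothetical helper for illustration, not a library function:
def adjusted_r_squared(r_squared, n, k):
    # n: number of observations, k: number of independent variables
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Illustrative values: R-squared of 0.75 with 50 observations and 3 predictors
print(adjusted_r_squared(0.75, 50, 3))  # about 0.734, slightly below 0.75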
Interpretation and comparison
The interpretation of adjusted R-squared is similar to that of R-squared
Represents the proportion of variance in the dependent variable predictable from the independent variable(s), adjusted for the number of variables in the model
A higher adjusted R-squared value indicates a better fit of the model to the data, considering the number of independent variables used
When comparing models with different numbers of independent variables, adjusted R-squared is a more appropriate measure than R-squared
Helps identify the model that strikes a balance between explanatory power and parsimony (using fewer variables)
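As a concrete, purely hypothetical comparison: suppose Model A uses 2 predictors and reaches R-squared = 0.70, while Model B uses 6 predictors and reaches R-squared = 0.72, both fit on n = 30 observations. Applying the adjusted R-squared formula:
def adjusted_r_squared(r_squared, n, k):
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(0.70, 30, 2))  # Model A: about 0.678
print(adjusted_r_squared(0.72, 30, 6))  # Model B: about 0.647, lower despite the higher R-squared
Despite its higher R-squared, Model B scores lower once its extra predictors are penalized, so the more parsimonious Model A would be preferred.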