
Measures of model fit help us gauge how well our model explains the data. R-squared tells us what percentage of variation in the dependent variable our model accounts for, ranging from 0 to 1.

While R-squared is useful, it has limitations. Enter adjusted R-squared, which penalizes adding unnecessary variables. This helps us avoid overfitting and compare models with different numbers of predictors more accurately.

Coefficient of determination (R-squared)

Definition and interpretation

  • R-squared is a statistical measure representing the proportion of variance in the dependent variable predictable from the independent variable(s) in a linear regression model
  • Ranges from 0 to 1, with higher values indicating a better fit of the model to the data
    • An R-squared of 1 means the model explains all the variability of the response data around its mean
  • Interprets the percentage of variation in the dependent variable explainable by the independent variable(s) in the model
  • Also known as the coefficient of determination, commonly used to assess the goodness of fit of a linear regression model
  • Formula for R-squared: R-squared = 1 - (SSR / SST), where SSR is the sum of squared residuals and SST is the total sum of squares

Importance and usage

  • R-squared provides a quantitative measure of how well the linear regression model fits the observed data
  • Helps evaluate the strength of the relationship between the dependent and independent variables
  • Allows comparison of different models to determine which one better explains the variability in the data
  • Widely used in various fields (economics, social sciences, engineering) to assess the explanatory power of linear regression models

Calculating R-squared

Required components

  • To calculate R-squared, you need the sum of squared residuals (SSR) and the total sum of squares (SST) from the linear regression model
  • SSR is the sum of the squared differences between the predicted values and the actual values of the dependent variable
    • Represents the amount of variation in the dependent variable not explained by the model
  • SST is the sum of the squared differences between the actual values of the dependent variable and its mean
    • Represents the total variation in the dependent variable
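
The two components above can be computed directly; the sketch below does this by hand with made-up numbers, purely for illustration:

```python
# A minimal sketch computing SSR, SST, and R-squared by hand.
# The data points below are made up for illustration.
y_actual = [3.0, 5.0, 7.0, 9.0]  # observed values of the dependent variable
y_pred = [2.8, 5.1, 7.2, 8.9]    # values predicted by the model

mean_y = sum(y_actual) / len(y_actual)

# SSR: squared differences between actual values and predictions
ssr = sum((a - p) ** 2 for a, p in zip(y_actual, y_pred))
# SST: squared differences between actual values and their mean
sst = sum((a - mean_y) ** 2 for a in y_actual)

r_squared = 1 - ssr / sst
print(ssr, sst, r_squared)  # SSR ≈ 0.1, SST = 20.0, R-squared ≈ 0.995
```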

Calculation methods

  • Once you have SSR and SST, use the formula R-squared = 1 - (SSR / SST) to calculate R-squared
  • Alternatively, most statistical software packages (SPSS, R) and programming languages (Python) provide functions to directly compute R-squared for a given linear regression model
    • Example in R: summary(lm_model)$r.squared returns the R-squared value for the linear model lm_model
    • Example in Python with scikit-learn: from sklearn.metrics import r2_score; r2_score(y_true, y_pred) calculates R-squared given the true values (y_true) and predicted values (y_pred)
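
A short runnable sketch, assuming scikit-learn and NumPy are installed and using made-up data, showing that the library's R-squared matches the manual formula:

```python
# Fit a simple linear model and compare scikit-learn's R-squared with the
# manual formula R-squared = 1 - SSR / SST. The data are invented.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # roughly linear with small noise

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ssr = np.sum((y - y_pred) ** 2)       # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)     # total sum of squares
manual_r2 = 1 - ssr / sst

print(manual_r2, r2_score(y, y_pred))  # the two values agree
```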

R-squared limitations vs adjusted R-squared

Limitations of R-squared

  • R-squared increases as more independent variables are added to the model, even if those variables do not have a significant impact on the dependent variable
    • This can lead to the inclusion of irrelevant variables and overfitting
  • Does not indicate whether the independent variables are statistically significant or if the model is appropriate for the data
    • Only measures the goodness of fit without considering the model's validity
  • Does not consider the number of independent variables in the model, potentially leading to overfitting if too many variables are included
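
The first limitation above is easy to demonstrate: on the training data, ordinary least squares R-squared can never decrease when a predictor is added, even a purely random one. A sketch, assuming scikit-learn and NumPy and using simulated data:

```python
# Demonstrate the overfitting pitfall: appending an irrelevant noise column
# never lowers the training R-squared of an OLS fit. Data are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(scale=0.5, size=n)  # y depends only on x

r2_base = LinearRegression().fit(x, y).score(x, y)

# Add a predictor that has nothing to do with y, then refit
x_noise = np.hstack([x, rng.normal(size=(n, 1))])
r2_noise = LinearRegression().fit(x_noise, y).score(x_noise, y)

print(r2_base, r2_noise)  # r2_noise >= r2_base despite the junk predictor
```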

Adjusted R-squared as an alternative

  • Adjusted R-squared addresses the limitations of R-squared by adjusting for the number of independent variables in the model
  • Penalizes the addition of unnecessary independent variables, providing a more reliable measure of the model's goodness of fit
  • Particularly useful when comparing models with different numbers of independent variables
    • Helps determine if adding more variables truly improves the model's explanatory power

Adjusted R-squared interpretation

Calculation and formula

  • Adjusted R-squared is calculated using the formula: Adjusted R-squared = 1 - [(1 - R-squared) * (n - 1) / (n - k - 1)], where n is the number of observations and k is the number of independent variables in the model
  • The adjusted R-squared value will always be less than or equal to the R-squared value
    • Decreases when the number of independent variables increases without a corresponding improvement in the model's fit
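
The formula above is straightforward to implement; in this sketch the example numbers (R-squared = 0.9, n = 50, k = 3) are made up for illustration:

```python
# A small helper implementing the adjusted R-squared formula.
def adjusted_r_squared(r_squared, n, k):
    """Adjust R-squared for k predictors given n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

adj = adjusted_r_squared(0.9, n=50, k=3)
print(adj)  # slightly below 0.9, as expected
```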

Interpretation and comparison

  • The interpretation of adjusted R-squared is similar to R-squared
    • Represents the proportion of variance in the dependent variable predictable from the independent variable(s), adjusted for the number of variables in the model
  • A higher adjusted R-squared value indicates a better fit of the model to the data, considering the number of independent variables used
  • When comparing models with different numbers of independent variables, adjusted R-squared is a more appropriate measure than R-squared
    • Helps identify the model that strikes a balance between explanatory power and parsimony (using fewer variables)
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

