
Regression models are powerful tools for analyzing relationships between variables in epidemiological studies. Linear regression helps us understand continuous outcomes, while logistic regression tackles binary outcomes. Survival analysis digs into time-to-event data, which is crucial for studying disease progression and treatment effects.

These models allow researchers to control for confounding factors and estimate the impact of specific variables on health outcomes. By interpreting coefficients, odds ratios, and hazard ratios, we can quantify the strength of associations and make evidence-based decisions in public health.

Linear Regression for Continuous Outcomes

Linear Regression Modeling

  • Linear regression is a statistical method used to model the linear relationship between a continuous dependent variable (outcome) and one or more independent variables (predictors)
  • The general form of a simple linear regression model is $Y = \beta_0 + \beta_1 X + \varepsilon$, where $Y$ is the dependent variable, $X$ is the independent variable, $\beta_0$ is the intercept, $\beta_1$ is the slope (regression coefficient), and $\varepsilon$ is the error term
  • Multiple linear regression extends the simple linear regression model to include two or more independent variables: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$
  • Examples of continuous outcomes modeled using linear regression include body mass index (BMI), blood pressure, and income
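As a minimal sketch in pure Python, the function below estimates $\beta_0, \beta_1, \ldots, \beta_p$ of a multiple linear regression by solving the least-squares normal equations $(X^\top X)\beta = X^\top Y$. The helper names `solve` and `fit_ols` and the toy data are hypothetical, chosen only for illustration; statistical packages typically use more numerically stable decompositions (such as QR) instead of forming $X^\top X$ directly.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so each row operation updates a and b together
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    # Prepend a column of ones so the first coefficient is the intercept b0
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data generated from Y = 2 + 3*X1 - 1*X2
X = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
print(fit_ols(X, y))  # approximately [2.0, 3.0, -1.0]
```

The toy data omit the error term $\varepsilon$ so the recovered coefficients are easy to check; with real data the estimates scatter around the true values.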

Assumptions and Estimation

  • Assumptions of linear regression include linearity, independence, homoscedasticity, and normality of residuals
    • Linearity assumes a linear relationship between the dependent and independent variables
    • Independence assumes that observations are independent of each other
    • Homoscedasticity assumes constant variance of the residuals across all levels of the independent variables
    • Normality assumes that the residuals follow a normal distribution
  • The method of least squares is used to estimate the regression coefficients by minimizing the sum of squared residuals
  • The coefficient of determination ($R^2$) measures the proportion of variance in the dependent variable explained by the independent variable(s)
    • For a least-squares fit that includes an intercept, $R^2$ ranges from 0 to 1, with higher values indicating a better fit of the model to the data
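For a single predictor, least squares has a closed form: $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$, and $R^2 = 1 - SS_{res}/SS_{tot}$. The sketch below computes these and exposes the residuals, which are what the homoscedasticity and normality checks are applied to. Function names and data are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1*X."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x: List[float], y: List[float], b0: float, b1: float) -> List[float]:
    """Observed minus fitted values; inspect these to check model assumptions."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def r_squared(x: List[float], y: List[float], b0: float, b1: float) -> float:
    """Proportion of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum(e ** 2 for e in residuals(x, y, b0, b1))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = simple_ols(x, y)
print(b0, b1, r_squared(x, y, b0, b1))  # roughly 2.2, 0.6, 0.6
```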

Logistic Regression for Binary Outcomes

Logistic Regression Modeling

  • Logistic regression is a statistical method used to model the relationship between a binary dependent variable (outcome) and one or more independent variables (predictors)
  • The logistic regression model estimates the probability of the outcome occurring ($Y = 1$) given the values of the independent variables: $P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p)}}$
  • Examples of binary outcomes modeled using logistic regression include disease status (present or absent), mortality (alive or dead), and customer churn (churned or retained)
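The model equation above can be evaluated directly. This minimal sketch computes $P(Y=1 \mid X)$ from a coefficient list whose first entry is the intercept $\beta_0$; the function name and the coefficients in the example are hypothetical, not estimates from any study. Branching on the sign of the linear predictor keeps `math.exp` from overflowing for extreme values.

```python
import math
from typing import Sequence


def logistic_probability(beta: Sequence[float], x: Sequence[float]) -> float:
    """P(Y=1 | X) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bp*xp)))."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry (the intercept) than x")
    # Linear predictor: the log odds of the outcome
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    # Only ever exponentiate a non-positive number to avoid overflow
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical model: log odds = -3 + 0.05 * age
print(logistic_probability([-3.0, 0.05], [60]))  # 0.5, since the log odds are 0 at age 60
print(logistic_probability([-3.0, 0.05], [80]))  # about 0.73
```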

Odds Ratios and Model Fit

  • The odds ratio (OR) is a measure of association between an exposure and an outcome: the odds of the outcome among those with the exposure divided by the odds of the outcome among those without it
  • The logistic regression coefficients ($\beta$) are interpreted as the change in the log odds of the outcome for a one-unit increase in the predictor variable, holding other variables constant
  • The exponentiated coefficients ($e^{\beta}$) are the odds ratios for a one-unit increase in each predictor variable
  • Model fit can be assessed using the likelihood ratio test, Wald test, and deviance test, as well as the Hosmer-Lemeshow test and pseudo-$R^2$ values (e.g., McFadden's $R^2$, Cox and Snell $R^2$)
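To connect the 2×2 definition of the odds ratio with $e^{\beta}$, the sketch below fits a logistic regression with one binary exposure by Newton-Raphson and compares $e^{\hat\beta_1}$ with the cross-product ratio $ad/bc$ from the matching table. For a single binary predictor the two agree (up to numerical error), because the model has one parameter per exposure group and reproduces the observed proportions. The counts and the names `odds_ratio_2x2` and `fit_logistic_1x` are hypothetical; the fit runs a fixed number of iterations with no convergence check, for brevity.

```python
import math
from typing import List, Tuple


def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> float:
    """OR = (a/b) / (c/d): a, b = exposed cases/non-cases; c, d = unexposed cases/non-cases."""
    return (a * d) / (b * c)


def fit_logistic_1x(x: List[float], y: List[int], iters: int = 25) -> Tuple[float, float]:
    """Maximum-likelihood (b0, b1) for logit P(Y=1) = b0 + b1*x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Information matrix (negative Hessian) entries
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step b += I^{-1} g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical cohort: 30/100 exposed and 10/100 unexposed subjects develop the disease
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90
b0, b1 = fit_logistic_1x(x, y)
print(odds_ratio_2x2(30, 70, 10, 90))  # about 3.857
print(math.exp(b1))                     # about 3.857 as well
```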

Survival Analysis for Time-to-Event Data

Kaplan-Meier Method and Log-Rank Test

  • Survival analysis is a statistical method used to analyze time-to-event data, where the outcome variable is the time until an event of interest occurs (e.g., death, disease recurrence, or mechanical failure)
  • The Kaplan-Meier method is a non-parametric approach to estimating the survival function, $S(t)$, which represents the probability of surviving beyond time $t$
  • The log-rank test is used to compare the survival curves of two or more groups to determine if there is a statistically significant difference in survival between the groups
  • Examples of time-to-event data analyzed using survival analysis include time to death in cancer patients, time to disease recurrence after treatment, and time to mechanical failure in engineering systems
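A minimal, unoptimized sketch of both procedures follows. `kaplan_meier` implements the product-limit estimator $\hat S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$, where $d_i$ is the number of events at time $t_i$ and $n_i$ the number still at risk just before it. `log_rank` compares two groups by summing observed minus expected events in group 1 over the event times and refers the standardized statistic to a chi-square distribution with 1 degree of freedom. The input format (parallel lists of times, 0/1 event indicators with 0 meaning right-censored, and 0/1 group labels), the function names, and the data are assumptions for illustration.

```python
import math
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of S(t) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Count everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Subjects with an event or censoring at t are at risk at t, then leave
        n_at_risk -= removed
    return curve


def log_rank(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test (groups are 0/1). Returns (chi-square statistic, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # total at risk just before t
        n1 = sum(g for ti, g in zip(times, groups) if ti >= t)  # at risk in group 1
        d = sum(e for ti, e in zip(times, events) if ti == t)  # events at t
        d1 = sum(e for ti, e, g in zip(times, events, groups) if ti == t and g == 1)
        # Observed minus expected events in group 1 under equal hazards
        o_minus_e += d1 - d * n1 / n
        # Hypergeometric variance of d1 (zero when only one subject is at risk)
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical follow-up times (months); 0 marks a censored observation
times = [2, 3, 3, 5, 6, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))  # prints 2 0.857, 3 0.714, 5 0.536, 8 0.268 (one pair per line)
```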

Cox Proportional Hazards Model and Censoring

  • The Cox proportional hazards model is a semi-parametric regression model used to investigate the relationship between survival time and one or more predictor variables
  • The hazard ratio (HR) is a measure of the effect of a predictor variable on the hazard (instantaneous risk) of the event occurring, assuming that the proportional hazards assumption holds
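  • Censoring occurs when the exact survival time is unknown. Right-censoring, the most common form, occurs when the individual has not experienced the event by the end of the study or is lost to follow-up before the event, so only a lower bound on the survival time is known. Left-censoring occurs when the event happened before observation began, so only an upper bound is known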
  • Examples of predictor variables in a Cox proportional hazards model include age, gender, and treatment group
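The sketch below estimates the coefficient of a Cox model with a single covariate by Newton-Raphson on the partial log-likelihood $\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[\beta x_i - \log \sum_{j:\,t_j \ge t_i} e^{\beta x_j}\right]$, where $\delta_i = 1$ marks an observed event. Censored subjects contribute only through the risk sets, and the hazard ratio for a one-unit increase in $x$ is $e^{\beta}$. It treats tied event times with the Breslow approach; software may default to other methods (R's `coxph`, for example, uses Efron's). The function name and data are hypothetical, and the loop runs a fixed number of iterations without a convergence check.

```python
import math
from typing import List


def cox_fit_1x(times: List[float], events: List[int], x: List[float], iters: int = 30) -> float:
    """Estimate beta in a one-covariate Cox model by maximizing the partial likelihood."""
    beta = 0.0
    for _ in range(iters):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (negative second derivative)
        for ti, ei, xi in zip(times, events, x):
            if not ei:
                continue  # censored subjects enter only through the risk sets below
            # Risk set: subjects still under observation at ti (Breslow handling of ties)
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            score += xi - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        beta += score / info  # Newton-Raphson update
    return beta


# Hypothetical data: x = 1 for treated, 0 for control; all three events observed
beta = cox_fit_1x(times=[1, 2, 3], events=[1, 1, 1], x=[1, 0, 1])
print(round(beta, 3), round(math.exp(beta), 3))  # -0.347 0.707
```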

Regression Model Interpretation and Diagnostics

Coefficient Interpretation and Significance

  • Regression coefficients represent the change in the dependent variable associated with a one-unit change in the independent variable, holding other variables constant
  • In linear regression, the coefficients directly represent the change in the dependent variable, while in logistic regression, the coefficients represent the change in the log odds of the outcome
  • Statistical significance of regression coefficients can be assessed using t-tests (linear regression) or Wald tests (logistic and Cox regression). The p-value is the probability of observing an estimate at least as extreme as the one obtained if the null hypothesis ($\beta = 0$) were true
  • Confidence intervals for regression coefficients provide a range of plausible values for the true population parameter
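As a sketch of the Wald approach, the function below turns a coefficient estimate and its standard error into a z statistic, a two-sided p-value from the standard normal distribution, and a 95% confidence interval $\hat\beta \pm 1.96\,\mathrm{SE}$. The function name, estimate, and standard error are hypothetical. For linear regression, exact inference uses the t distribution with $n - p - 1$ degrees of freedom, so this normal approximation suits large samples; for logistic and Cox models, exponentiating the interval endpoints gives an interval for the odds ratio or hazard ratio.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_summary(beta: float, se: float) -> Tuple[float, float, Tuple[float, float]]:
    """Return (z, two-sided p-value, 95% CI) for a coefficient estimate."""
    z = beta / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written with the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (beta - Z_975 * se, beta + Z_975 * se)
    return z, p, ci


# Hypothetical logistic-regression coefficient (log odds ratio) and its standard error
beta, se = 0.69, 0.25
z, p, (lo, hi) = wald_summary(beta, se)
print(f"z = {z:.2f}, p = {p:.4f}")
print(f"OR = {math.exp(beta):.2f}, 95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f}")
```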

Model Fit and Diagnostic Tests

  • Model fit can be assessed using the coefficient of determination ($R^2$) for linear regression, and likelihood ratio tests, Wald tests, and deviance tests for logistic regression
  • Diagnostic tests for regression models include checking for multicollinearity (variance inflation factor), influential observations (Cook's distance, leverage), and residual plots to assess model assumptions (linearity, homoscedasticity, normality)
  • Cross-validation techniques, such as k-fold cross-validation or leave-one-out cross-validation, can be used to assess the model's predictive performance on unseen data and to detect overfitting
  • Examples of diagnostic tests include examining the variance inflation factor (VIF) to detect multicollinearity, using Cook's distance to identify influential observations, and creating residual plots to assess the linearity assumption in linear regression
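Two of these diagnostics have simple closed forms in small cases, sketched below. With exactly two predictors, the VIF of either one is $1/(1 - r^2)$, where $r$ is their Pearson correlation (in general, the $R^2$ comes from regressing that predictor on all the others). For a simple linear regression, Cook's distance is $D_i = \frac{e_i^2}{p\,\mathrm{MSE}} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}$, with leverage $h_{ii} = 1/n + (x_i - \bar x)^2 / \sum_j (x_j - \bar x)^2$ and $p = 2$ estimated coefficients. Function names and data are hypothetical; commonly cited rules of thumb flag VIF values above 5 or 10 and Cook's distances above 4/n or 1, but these cutoffs are conventions rather than formal tests.

```python
from statistics import mean
from typing import List


def vif_two_predictors(x1: List[float], x2: List[float]) -> float:
    """VIF for either predictor when the model has exactly two: 1 / (1 - r^2)."""
    m1, m2 = mean(x1), mean(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r2 = s12 ** 2 / (s11 * s22)  # squared Pearson correlation
    return 1.0 / (1.0 - r2)


def cooks_distance(x: List[float], y: List[float]) -> List[float]:
    """Cook's distance for each observation in a simple linear regression."""
    n, p = len(x), 2  # p = number of estimated coefficients (intercept + slope)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # leverage h_ii
    return [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


# Hypothetical toy data
print(cooks_distance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # first point: D = 1.5
print(vif_two_predictors([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # r^2 = 0.64, so VIF ≈ 2.78
```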
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.