11.3 Regression models: linear, logistic, and survival analysis
5 min read • August 14, 2024
Regression models are powerful tools for analyzing relationships between variables in epidemiological studies. Linear regression helps us understand continuous outcomes, while logistic regression tackles binary outcomes. Survival analysis digs into time-to-event data, crucial for studying disease progression and treatment effects.
These models allow researchers to control for confounding factors and estimate the impact of specific variables on health outcomes. By interpreting regression coefficients, odds ratios, and hazard ratios, we can quantify the strength of associations and make evidence-based decisions in public health.
Linear Regression for Continuous Outcomes
Linear Regression Modeling
Linear regression is a statistical method used to model the linear relationship between a continuous dependent variable (outcome) and one or more independent variables (predictors)
The general form of a simple linear regression model is Y = β₀ + β₁X + ε, where Y is the dependent variable, X is the independent variable, β₀ is the intercept, β₁ is the slope (regression coefficient), and ε is the error term
Multiple linear regression extends the simple linear regression model to include two or more independent variables: Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + ε
Examples of continuous outcomes modeled using linear regression include body mass index (BMI), blood pressure, and income
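The simple model above can be fitted directly from data with the closed-form least-squares solution. The sketch below uses hypothetical data (age predicting systolic blood pressure); the variable names and values are illustrative, not from the text.

```python
# Minimal sketch of simple linear regression via closed-form least
# squares; the data are hypothetical (age vs. systolic blood pressure).
def fit_simple_linear(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar  # intercept passes through the means
    return b0, b1

ages = [30, 40, 50, 60, 70]
sbp  = [118, 124, 131, 139, 144]  # systolic blood pressure (mmHg)
b0, b1 = fit_simple_linear(ages, sbp)
print(b0, b1)  # slope ≈ 0.67 mmHg per additional year of age
```

Each coefficient then reads off directly: β₁ is the expected change in blood pressure per one-year increase in age.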
Assumptions and Estimation
Assumptions of linear regression include linearity, independence, homoscedasticity, and normality of residuals
Linearity assumes a linear relationship between the dependent and independent variables
Independence assumes that observations are independent of each other
Homoscedasticity assumes constant variance of the residuals across all levels of the independent variables
Normality assumes that the residuals follow a normal distribution
The method of least squares is used to estimate the regression coefficients by minimizing the sum of squared residuals
The coefficient of determination (R²) measures the proportion of variance in the dependent variable explained by the independent variable(s)
R² ranges from 0 to 1, with higher values indicating a better fit of the model to the data
Logistic Regression for Binary Outcomes
Logistic Regression Modeling
Logistic regression is a statistical method used to model the relationship between a binary dependent variable (outcome) and one or more independent variables (predictors)
The logistic regression model estimates the probability of the outcome occurring given the values of the independent variables: P(Y=1|X) = 1 / (1 + e^−(β₀ + β₁X₁ + ... + βₚXₚ))
Examples of binary outcomes modeled using logistic regression include disease status (present or absent), mortality (alive or dead), and customer churn (churned or retained)
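The probability formula above is just a sigmoid applied to the linear predictor. A minimal sketch, with hypothetical coefficients (e.g., log-odds of disease as a function of one exposure score):

```python
import math

# Sketch of the logistic model's predicted probability; the intercept,
# coefficient, and predictor value below are all hypothetical.
def predicted_probability(intercept, coefs, xs):
    log_odds = intercept + sum(b * x for b, x in zip(coefs, xs))
    return 1 / (1 + math.exp(-log_odds))  # inverse-logit (sigmoid)

p = predicted_probability(-2.0, [0.8], [1.5])  # β0 = -2.0, β1 = 0.8, X = 1.5
print(round(p, 3))  # ≈ 0.310
```

Because the sigmoid is bounded by 0 and 1, the output is always a valid probability, which is why the linear predictor models the log odds rather than the probability itself.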
Odds Ratios and Model Fit
The odds ratio (OR) is a measure of association between an exposure and an outcome, representing the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure
The logistic regression coefficients (β) are interpreted as the change in the log odds of the outcome for a one-unit increase in the predictor variable, holding other variables constant
The exponentiated coefficients (e^β) represent the odds ratios for each predictor variable
Model fit can be assessed using the likelihood ratio test, Wald test, and deviance test, as well as measures such as the Hosmer-Lemeshow test and pseudo-R² values (e.g., McFadden's R², Cox and Snell R²)
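Converting a fitted coefficient into an odds ratio is a one-line exponentiation. A sketch, using a hypothetical coefficient for smoking status:

```python
import math

# Sketch of converting a logistic regression coefficient to an odds
# ratio; β = 0.47 is a hypothetical coefficient for smoking status.
beta_smoking = 0.47
odds_ratio = math.exp(beta_smoking)
print(round(odds_ratio, 2))  # ≈ 1.60: odds of the outcome are ~60% higher
```

A coefficient of 0 corresponds to an OR of 1 (no association), which is why significance tests for logistic coefficients test β = 0.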
Survival Analysis for Time-to-Event Data
Kaplan-Meier Method and Log-Rank Test
Survival analysis is a statistical method used to analyze time-to-event data, where the outcome variable is the time until an event of interest occurs (e.g., death, disease recurrence, or mechanical failure)
The Kaplan-Meier method is a non-parametric approach to estimate the survival function, S(t), which represents the probability of surviving beyond time t
The log-rank test is used to compare the survival curves of two or more groups to determine if there is a statistically significant difference in survival between the groups
Examples of time-to-event data analyzed using survival analysis include time to death in cancer patients, time to disease recurrence after treatment, and time to mechanical failure in engineering systems
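The Kaplan-Meier estimator multiplies, at each event time, the conditional probability of surviving that time given survival so far. A minimal sketch on hypothetical follow-up data (months to event or censoring):

```python
# Sketch of the Kaplan-Meier estimator; `times` are hypothetical
# months of follow-up and `events` flags event (1) vs. censoring (0).
def kaplan_meier(times, events):
    n_at_risk = len(times)
    s = 1.0
    curve = []
    for t in sorted(set(times)):
        # Events and total removals (events + censored) at this time
        deaths = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        removed = sum(1 for tt in times if tt == t)
        if deaths:
            s *= (n_at_risk - deaths) / n_at_risk  # conditional survival
            curve.append((t, round(s, 3)))
        n_at_risk -= removed  # censored subjects leave the risk set
    return curve

times  = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
print(curve)  # [(2, 0.857), (3, 0.714), (5, 0.536), (8, 0.357)]
```

Note that censored subjects still contribute to the risk set up to their censoring time, which is how the estimator uses incomplete follow-up without biasing the curve.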
Cox Proportional Hazards Model and Censoring
The Cox proportional hazards model is a semi-parametric regression model used to investigate the relationship between survival time and one or more predictor variables
The hazard ratio (HR) is a measure of the effect of a predictor variable on the hazard (instantaneous risk) of the event occurring, assuming that the proportional hazards assumption holds
Censoring occurs when the exact survival time is unknown; right-censoring arises when an individual has not experienced the event by the end of the study or is lost to follow-up, while left-censoring arises when the event occurred before observation began
Examples of predictor variables in a Cox proportional hazards model include age, gender, and treatment group
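As with odds ratios, a Cox coefficient is exponentiated to obtain the hazard ratio. A sketch with a hypothetical treatment-group coefficient:

```python
import math

# Sketch of interpreting a Cox model coefficient as a hazard ratio;
# β = -0.69 is a hypothetical coefficient for treatment vs. control.
beta_treatment = -0.69
hr = math.exp(beta_treatment)
print(round(hr, 2))  # ≈ 0.50: treatment roughly halves the instantaneous risk
```

An HR below 1 indicates a protective effect, above 1 an increased hazard, and the interpretation is only valid if the proportional hazards assumption holds.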
Regression Model Interpretation and Diagnostics
Coefficient Interpretation and Significance
Regression coefficients represent the change in the dependent variable associated with a one-unit change in the independent variable, holding other variables constant
In linear regression, the coefficients directly represent the change in the dependent variable, while in logistic regression, the coefficients represent the change in the log odds of the outcome
Statistical significance of regression coefficients can be assessed using t-tests (linear regression) or Wald tests (logistic regression), with p-values indicating the probability of observing the estimated coefficient if the null hypothesis (β=0) is true
Confidence intervals for regression coefficients provide a range of plausible values for the true population parameter
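A Wald test and its companion confidence interval follow directly from the coefficient estimate and its standard error. A sketch with hypothetical values:

```python
import math

# Sketch of a Wald test and 95% CI for a regression coefficient;
# the estimate (0.47) and standard error (0.12) are hypothetical.
def wald_test(beta, se):
    z = beta / se  # Wald z-statistic
    # Two-sided p-value from the standard normal CDF, via math.erf
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    ci = (beta - 1.96 * se, beta + 1.96 * se)  # 95% confidence interval
    return z, p, ci

z, p, ci = wald_test(0.47, 0.12)
print(round(z, 2), p, [round(c, 3) for c in ci])
```

Here the interval excludes 0 and the p-value is well below 0.05, so the null hypothesis β = 0 would be rejected; in logistic regression the same interval, exponentiated, gives the CI for the odds ratio.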
Model Fit and Diagnostic Tests
Model fit can be assessed using the coefficient of determination (R²) for linear regression and likelihood ratio tests, Wald tests, and deviance tests for logistic regression
Diagnostic tests for regression models include checking for multicollinearity (variance inflation factor), influential observations (Cook's distance, leverage), and residual plots to assess model assumptions (linearity, homoscedasticity, normality)
Cross-validation techniques, such as k-fold cross-validation or leave-one-out cross-validation, can be used to assess the model's predictive performance on unseen data and to detect overfitting
Examples of diagnostic tests include examining the variance inflation factor (VIF) to detect multicollinearity, using Cook's distance to identify influential observations, and creating residual plots to assess the linearity assumption in linear regression
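K-fold cross-validation, mentioned above, can be sketched in a few lines. The data and fold scheme below are hypothetical, and the folds are taken by simple striding for brevity (in practice the data would be shuffled first):

```python
# Sketch of k-fold cross-validation of a simple linear model on
# hypothetical data; folds are strided rather than shuffled for brevity.
def fit_simple_linear(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / \
         sum((a - x_bar) ** 2 for a in x)
    return y_bar - b1 * x_bar, b1

def kfold_mse(x, y, k=3):
    folds = [list(range(i, len(x), k)) for i in range(k)]
    mses = []
    for test_idx in folds:
        train = [i for i in range(len(x)) if i not in test_idx]
        # Fit on the training folds, evaluate on the held-out fold
        b0, b1 = fit_simple_linear([x[i] for i in train],
                                   [y[i] for i in train])
        mse = sum((y[i] - (b0 + b1 * x[i])) ** 2
                  for i in test_idx) / len(test_idx)
        mses.append(mse)
    return sum(mses) / k

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2.1, 4.0, 6.2, 7.9, 10.1, 12.0, 14.2, 15.9, 18.1]
cv_mse = kfold_mse(x, y)
print(round(cv_mse, 3))
```

A held-out MSE much larger than the in-sample MSE would suggest overfitting; here the relationship is nearly linear, so the cross-validated error stays small.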