Intro to Biostatistics Unit 4 – Hypothesis Testing

Hypothesis testing is a core tool in biostatistics for drawing conclusions about populations from sample data. It involves formulating null and alternative hypotheses, choosing an appropriate test, and interpreting the result. The process helps researchers judge whether an observed effect is larger than sampling variability alone would plausibly produce, so that decisions rest on statistical evidence. Key concepts include the two types of error, test statistics, and p-values. Different tests suit different scenarios, such as comparing means or comparing proportions. Interpreting results correctly, with attention to both statistical and practical significance, is essential for drawing meaningful conclusions in biomedical research.

Key Concepts

  • Null hypothesis ($H_0$) represents the default or status quo, typically stating that there is no effect or no difference
  • Alternative hypothesis ($H_A$ or $H_1$) challenges the null hypothesis, proposing that an effect or difference exists
  • Type I error (false positive) occurs when a true null hypothesis is rejected; the significance level $\alpha$ sets its probability
  • Type II error (false negative) occurs when a false null hypothesis is not rejected; $\beta$ denotes its probability
    • Power ($1-\beta$) is the probability of correctly rejecting a false null hypothesis
  • Test statistic is a value calculated from the sample data, used to determine whether to reject the null hypothesis
  • Critical value is a threshold on the test statistic that determines the rejection region for the null hypothesis
  • Rejection region is the range of test statistic values that lead to rejecting the null hypothesis
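To make these definitions concrete, here is a minimal Python sketch for a two-sided one-sample z-test. It assumes a normally distributed outcome whose population standard deviation is known. The function names (`critical_value`, `in_rejection_region`, `power`) are illustrative and do not come from any library; only `statistics.NormalDist` from the standard library is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def critical_value(alpha: float) -> float:
    """Two-sided critical value z_{1 - alpha/2}."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def in_rejection_region(z: float, alpha: float) -> bool:
    """True if the z statistic falls in the two-sided rejection region |z| >= critical value."""
    return abs(z) >= critical_value(alpha)


def power(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect: true difference mu - mu0 under the alternative hypothesis
    sigma:  known population standard deviation
    """
    shift = effect * n ** 0.5 / sigma  # mean of the z statistic when H_A is true
    c = critical_value(alpha)
    # Under H_A, z ~ N(shift, 1); power = P(z >= c) + P(z <= -c)
    return (1 - STD_NORMAL.cdf(c - shift)) + STD_NORMAL.cdf(-c - shift)


alpha = 0.05
print("critical value:", round(critical_value(alpha), 3))
for n in (10, 30, 100):
    p = power(effect=0.5, sigma=1.0, n=n, alpha=alpha)
    print(f"n={n:>3}  power={p:.3f}  beta={1 - p:.3f}")
```

With `effect=0` the `power` function returns $\alpha$, because every rejection is then a Type I error. For a fixed true effect, power rises and $\beta$ falls as $n$ grows.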

Types of Hypothesis Tests

  • One-sample tests compare a single sample mean or proportion to a hypothesized population value
    • One-sample t-test for comparing a sample mean to a hypothesized population mean when the population variance is unknown
    • One-sample z-test for comparing a sample mean to a hypothesized population mean when the population variance is known
    • One-sample proportion test for comparing a sample proportion to a hypothesized population proportion
  • Two-sample tests compare means or proportions between two independent groups
    • Two-sample t-test for comparing means between two groups when the population variances are unknown
    • Two-sample z-test for comparing means between two groups when the population variances are known
    • Two-sample proportion test for comparing proportions between two groups
  • Paired tests compare means or proportions between two related or matched groups
    • Paired t-test for comparing means between two related groups
    • McNemar's test for comparing proportions between two related groups
  • Analysis of Variance (ANOVA) tests compare means among three or more groups
    • One-way ANOVA for comparing means among groups defined by a single factor
    • Two-way ANOVA for comparing means among groups defined by two factors, including their interaction
  • Chi-square tests compare observed counts of categorical data with the counts expected under the null hypothesis
    • Chi-square test of independence for testing the association between two categorical variables
    • Chi-square goodness-of-fit test for comparing observed frequencies to expected frequencies
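The two-sample proportion test and the chi-square test of independence listed above are closely linked. For a 2×2 table, the Pearson chi-square statistic (without continuity correction) equals the square of the pooled two-proportion z statistic, so both tests give the same two-sided p-value. The Python sketch below checks this on hypothetical counts using only the standard library; the function names are illustrative. It applies to independent groups only. Matched pairs would call for McNemar's test instead.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample proportion z-test, two-sided. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0: p1 == p2
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p_value). With 1 degree of freedom, P(chi2 >= x) equals
    P(|Z| >= sqrt(x)) for a standard normal Z, so no chi-square table is needed.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    return stat, 2 * (1 - NormalDist().cdf(sqrt(stat)))


# Hypothetical counts: 30 of 100 patients improve in group A, 45 of 100 in group B
z, p_z = two_proportion_z_test(30, 100, 45, 100)
chi2, p_chi2 = chi_square_2x2([[30, 70], [45, 55]])  # rows: groups; columns: improved / not
print(f"z = {z:.3f}, z^2 = {z * z:.3f}, p = {p_z:.4f}")
print(f"chi2 = {chi2:.3f}, p = {p_chi2:.4f}")
```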

Steps in Hypothesis Testing

  1. State the null and alternative hypotheses clearly, specifying the parameter of interest and the direction of the alternative hypothesis (one-tailed or two-tailed)
  2. Choose an appropriate test statistic and significance level ($\alpha$) based on the type of data and the research question
  3. Calculate the test statistic from the sample data using the appropriate formula for the selected hypothesis test
  4. Determine the critical value(s) or p-value associated with the test statistic, using the sampling distribution of the test statistic under the null hypothesis
  5. Compare the test statistic with the critical value(s), or equivalently the p-value with the significance level, and decide whether to reject or fail to reject the null hypothesis
  6. Interpret the results in the context of the research question, considering the practical significance and potential limitations of the study
  7. Report the findings, including the test statistic, p-value, confidence interval (if applicable), and a clear conclusion based on the hypothesis test
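The steps above can be traced in a short Python sketch of a two-sided one-sample z-test. The sketch makes several assumptions:

- The population standard deviation is known. Otherwise a one-sample t-test would be used, and that needs a t distribution, which the standard library does not provide.
- The blood-pressure readings, the null value of 120 mmHg and σ = 15 mmHg are hypothetical numbers chosen only for illustration.
- The function name `one_sample_z_test` is illustrative.

Step 6, interpreting the result, remains a human judgment.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test of H0: mu == mu0, with known population SD sigma."""
    n = len(sample)
    xbar = fmean(sample)
    se = sigma / sqrt(n)                       # standard error of the sample mean

    z = (xbar - mu0) / se                      # step 3: test statistic

    std_normal = NormalDist()                  # step 4: null distribution of z is N(0, 1)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    reject = abs(z) >= z_crit                  # step 5: same decision as p_value <= alpha

    ci = (xbar - z_crit * se, xbar + z_crit * se)  # step 7: (1 - alpha) confidence interval
    return {"mean": xbar, "z": z, "z_crit": z_crit, "p_value": p_value,
            "reject_H0": reject, "ci": ci}


# Steps 1-2 (hypothetical): H0: mu = 120 mmHg vs HA: mu != 120, alpha = 0.05,
# assuming the population SD of systolic blood pressure is known to be 15 mmHg
readings = [128, 135, 121, 140, 118, 131, 126, 138, 124, 133]
result = one_sample_z_test(readings, mu0=120, sigma=15)
print(f"mean = {result['mean']:.1f}, z = {result['z']:.3f}, p = {result['p_value']:.4f}")
print("reject H0" if result["reject_H0"] else "fail to reject H0")
print("95% CI: ({:.1f}, {:.1f})".format(*result["ci"]))
```

Because the interval and the test are built from the same standard error and critical value, the 95% interval excludes 120 exactly when the test rejects $H_0$ at $\alpha = 0.05$.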

Statistical Significance and p-values

  • A result is statistically significant when its p-value is at or below the significance level $\alpha$, meaning data this extreme would be unlikely if the null hypothesis were true
  • p-value is the probability of obtaining the observed sample results or more extreme results, given that the null hypothesis is true
    • A p-value below $\alpha$ (commonly 0.05) is taken as evidence against the null hypothesis and leads to its rejection; smaller p-values indicate stronger evidence
    • A p-value above $\alpha$ indicates insufficient evidence against the null hypothesis, leading to a failure to reject it (not to acceptance of it)
  • Significance level ($\alpha$) is the probability threshold for rejecting the null hypothesis, fixed before the data are analyzed and commonly set at 0.05
  • Confidence interval provides a range of plausible values for the population parameter, with a specified level of confidence (e.g., 95%)
    • If a $(1-\alpha)$ confidence interval does not contain the null-hypothesis value, the corresponding two-sided test at level $\alpha$ rejects $H_0$ (when the interval and the test are based on the same method)
  • Multiple testing correction adjusts the significance level (or the p-values) when several hypothesis tests are conducted together, to control the familywise error rate or the false discovery rate
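Two widely used corrections can be sketched in a few lines of Python:

- The Bonferroni correction controls the familywise error rate.
- The Benjamini-Hochberg procedure controls the false discovery rate.

This is a minimal illustration with hypothetical p-values. In practice a library routine such as statsmodels' `multipletests` would normally be used.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests m (capped at 1).

    Rejecting H0 when the adjusted p <= alpha controls the familywise error rate at alpha.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up procedure).

    Rejecting H0 when the adjusted p <= q controls the false discovery rate at q
    for independent (or positively dependent) tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests, alpha = 0.05
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", [round(p, 3) for p in bonferroni(raw)])
print("BH:        ", [round(p, 3) for p in benjamini_hochberg(raw)])
```

Here Bonferroni keeps only two of the four results below 0.05, while Benjamini-Hochberg keeps all four. This reflects the trade-off between strict familywise control and the more permissive false discovery rate.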

Common Errors and Pitfalls

  • Misinterpretation of p-values as the probability of the null hypothesis being true or the probability of the results occurring by chance alone
  • Confusing statistical significance with practical or clinical significance, as large sample sizes can lead to statistically significant but practically unimportant differences
  • Failing to check assumptions of the hypothesis test, such as normality, homogeneity of variance, or independence of observations
  • Choosing an inappropriate hypothesis test for the data type or research question, leading to invalid conclusions
  • Overinterpreting non-significant results as evidence of no effect, as a lack of statistical significance may be due to insufficient power or sample size
  • Engaging in data dredging or p-hacking, where multiple analyses are conducted until a significant result is found, without proper adjustment for multiple testing
  • Neglecting to consider potential confounding variables or alternative explanations for the observed results
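The pitfall of confusing statistical with practical significance can be shown numerically. Keep a negligible observed difference fixed and let the sample size grow: the p-value eventually drops below 0.05. The Python sketch below assumes, for simplicity, a one-sample z-test with known standard deviation and a hypothetical difference of 0.01 SD. It uses `math.erfc` because the two-sided p-value equals `erfc(|z| / sqrt(2))`, and this form stays accurate for very large z.

```python
from math import erfc, sqrt


def z_test_p_value(mean_diff, sigma, n):
    """Two-sided p-value of a one-sample z-test for an observed mean difference."""
    z = mean_diff / (sigma / sqrt(n))
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|)), but stays accurate in the far tail


# Hypothetical: the observed difference is fixed at 0.01 SD (clinically negligible);
# only the sample size changes
for n in (100, 10_000, 100_000, 1_000_000):
    p = z_test_p_value(mean_diff=0.01, sigma=1.0, n=n)
    print(f"n = {n:>9,}   difference = 0.01 SD   p = {p:.3g}")
```

The effect size never changes; only the precision of the estimate does. This is why reports should pair p-values with effect sizes and confidence intervals.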

Real-world Applications in Biostatistics

  • Clinical trials comparing the efficacy of a new drug to a placebo or standard treatment using hypothesis tests to assess treatment differences
  • Epidemiological studies investigating the association between risk factors and disease outcomes using hypothesis tests to identify significant relationships
  • Genetic studies testing for associations between genetic variants and phenotypic traits using hypothesis tests to detect significant genetic effects
  • Public health research evaluating the effectiveness of interventions or policies using hypothesis tests to compare outcomes between groups
  • Diagnostic test validation assessing the performance of a new diagnostic test compared to a gold standard using hypothesis tests to evaluate sensitivity and specificity
  • Survival analysis comparing survival rates between different treatment groups using hypothesis tests to detect significant differences in survival curves
  • Meta-analyses combining results from multiple studies using hypothesis tests to assess the overall effect size and heterogeneity across studies

Interpreting and Reporting Results

  • Report the null and alternative hypotheses, the chosen hypothesis test, and the significance level
  • Present the test statistic, degrees of freedom (if applicable), and the corresponding p-value
  • Interpret the p-value in the context of the research question, stating whether the null hypothesis is rejected or not rejected based on the significance level
  • Provide a confidence interval for the parameter of interest, if applicable, to indicate the precision of the estimate
  • Discuss the practical or clinical significance of the findings, considering the magnitude of the effect and its relevance to the field
  • Address potential limitations of the study, such as sample size, generalizability, or potential confounding factors
  • Suggest future research directions based on the findings and any unanswered questions or new hypotheses generated by the study

Advanced Topics and Extensions

  • Non-parametric tests, such as the Wilcoxon rank-sum test or Kruskal-Wallis test, for data that violate assumptions of parametric tests
  • Bayesian hypothesis testing, which incorporates prior information and calculates the posterior probability of the null and alternative hypotheses
  • Equivalence and non-inferiority testing, which aim to demonstrate that two treatments are similar or that a new treatment is not worse than a standard treatment by a specified margin
  • Multiple comparison procedures, such as Bonferroni correction or Tukey's HSD, to control the familywise error rate when conducting multiple pairwise comparisons
  • Multivariate hypothesis tests, such as MANOVA or Hotelling's T-squared test, for comparing means across multiple dependent variables simultaneously
  • Mixed-effects models and repeated measures designs, which account for correlated data structures and random effects in hypothesis testing
  • Sequential analysis and adaptive designs, which allow for interim analyses and modifications to the study design based on accumulating data while controlling the Type I error rate
  • Power analysis and sample size determination, which help ensure that a study has sufficient power to detect a meaningful effect size given the desired significance level and variability in the data
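As an example of sample size determination, the sketch below uses the common normal-approximation formula for comparing two means with a two-sided test: $n = 2\left((z_{1-\alpha/2} + z_{1-\beta})\,\sigma/\delta\right)^2$ per group. It assumes equal group sizes and a common, known standard deviation $\sigma$, and the function name `n_per_group` is illustrative. Dedicated power software that uses the t distribution typically returns a slightly larger $n$.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample comparison of means.

    Normal-approximation formula, assuming equal group sizes and a common SD sigma:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)  # round up: a fractional participant is not possible


# Hypothetical planning scenario: detect a difference of half an SD
print(n_per_group(delta=0.5, sigma=1.0))                # 63
print(n_per_group(delta=0.5, sigma=1.0, power=0.90))    # more power needs more participants
print(n_per_group(delta=0.25, sigma=1.0))               # halving delta roughly quadruples n
```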


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
