Fiveable

15.3 Genome-Wide Association Studies (GWAS)

Last Updated on July 23, 2024

Genome-Wide Association Studies (GWAS) are powerful tools for uncovering genetic links to complex traits and diseases. By analyzing thousands of DNA samples, scientists can pinpoint specific genetic variations associated with everything from height to diabetes risk.

GWAS use single nucleotide polymorphisms (SNPs) as genetic markers. These single-base DNA differences are abundant and easy to detect, making them ideal for scanning the entire genome. By comparing SNP allele frequencies between groups (for example, people with and without a disease), researchers can identify genomic regions associated with specific traits or conditions.

Genome-Wide Association Studies (GWAS)

Purpose and methodology of GWAS

  • GWAS aim to identify genetic variants (usually SNPs) associated with complex traits (such as height) or diseases (such as type 2 diabetes), giving insight into the genetic basis of human health and disease
  • Methodology involves genotyping a large number of individuals (thousands to hundreds of thousands) for a set of genetic markers (usually SNPs) across the genome
    • Collect phenotypic data (traits or disease status) for the same individuals
    • Perform statistical tests (chi-square or logistic regression) to identify SNPs significantly associated with the phenotype of interest
    • Adjust for multiple testing (Bonferroni correction) to control for false-positive associations
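
The testing and correction steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production pipeline: the SNP names and allele counts are invented, only three SNPs are tested, and the Bonferroni threshold is therefore 0.05 / 3 rather than a genome-wide value. The function `allelic_chi_square` is a hypothetical helper that runs a Pearson chi-square test on a 2x2 table of allele counts.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
snps = {
    "rs_demo_1": (600, 1400, 500, 1500),
    "rs_demo_2": (510, 1490, 500, 1500),
    "rs_demo_3": (800, 1200, 500, 1500),
}

alpha = 0.05
bonferroni_threshold = alpha / len(snps)  # one test per SNP

results = {}
for name, counts in snps.items():
    chi2, p = allelic_chi_square(*counts)
    results[name] = (chi2, p, p < bonferroni_threshold)

for name, (chi2, p, significant) in results.items():
    print(f"{name}: chi2={chi2:.2f}, p={p:.3g}, significant={significant}")
```

In a real study the same loop would run over hundreds of thousands to millions of SNPs, and logistic regression would usually replace the chi-square test so that covariates can be included.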

SNPs as genetic markers

  • SNPs are the most common type of genetic variation in the human genome; a SNP occurs when a single nucleotide differs between individuals at a specific position (for example, A in some individuals and G in others)
  • SNPs are used as markers because they are abundant (millions), stable, and easy to genotype (microarrays or sequencing)
  • Using SNPs in GWAS involves selecting a set of SNPs that covers the entire genome at a chosen density (for example, roughly one SNP every 5,000 base pairs)
    • The selected SNPs are genotyped in the study population
    • The genotypes are then tested for association with the phenotype of interest
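
Before genotypes can be tested for association, they are commonly turned into numbers. A minimal sketch, assuming an additive coding in which each genotype is replaced by the count (0, 1, or 2) of one chosen allele; the genotype list and the helper name `allele_dosage` are hypothetical:

```python
def allele_dosage(genotype, counted_allele):
    """Count copies (0, 1, or 2) of counted_allele in a two-letter genotype."""
    return sum(1 for allele in genotype if allele == counted_allele)

# Hypothetical genotypes at one A/G SNP for eight individuals.
genotypes = ["AA", "AG", "GG", "AG", "AA", "AA", "AG", "AA"]

# Additive coding: number of G alleles carried by each individual.
dosages = [allele_dosage(g, "G") for g in genotypes]

# Each individual carries two alleles, so divide by 2 * sample size.
g_frequency = sum(dosages) / (2 * len(genotypes))

print(dosages)
print(g_frequency)
```

The resulting dosage vector is what a regression-based association test would take as its predictor for this SNP.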

Interpretation of GWAS results

  • Identifying significant SNPs involves testing each SNP individually, comparing allele frequencies between cases and controls or, for a quantitative trait, across trait values
    • SNPs with p-values below a genome-wide threshold ($p < 5 \times 10^{-8}$) are considered significant
  • Interpreting p-values:
    • The p-value is the probability of observing an association at least as strong as the one seen in the data, assuming the null hypothesis of no association is true
    • Lower p-values indicate stronger evidence for an association between the SNP and the phenotype
    • The significance threshold is set stringently to account for multiple testing (Bonferroni correction)
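
The conventional genome-wide threshold can be read as a Bonferroni correction. Assuming a family-wise error rate of 0.05 and roughly one million independent tests across the genome (both are widely used conventions rather than figures stated above), the per-SNP threshold works out as:

```latex
\alpha_{\text{per SNP}} = \frac{\alpha_{\text{family-wise}}}{m} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```

Here $m$ is the assumed number of independent tests; a study with a different effective number of tests would arrive at a different threshold.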

Challenges and limitations of GWAS

  • Population stratification occurs when the study population consists of subgroups with different genetic ancestries (for example, European vs. African ancestry)
    • Can lead to spurious associations if the subgroups also differ in their phenotype frequencies
    • Addressed by adjusting for principal components of genetic variation or using family-based designs
  • Need for large sample sizes:
    • Complex traits are influenced by many genetic variants with small effect sizes
    • Large sample sizes (thousands to hundreds of thousands of individuals) are required to detect these small effects
    • Insufficient sample size limits statistical power, so true associations go undetected (false-negative results)
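
How stratification produces a spurious association can be shown with a small invented example. In the sketch below, the allele has no effect within either subgroup (odds ratio of 1 in each), but one subgroup has both a higher allele frequency and more cases, so pooling the data suggests a strong association. For simplicity the adjustment shown is a Mantel-Haenszel stratified odds ratio with known subgroup labels; this is a stand-in for illustration, not the principal-component adjustment described above. All counts and names are hypothetical.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio of carrying the alt allele in cases versus controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables, one per stratum."""
    numerator = 0.0
    denominator = 0.0
    for case_alt, case_ref, ctrl_alt, ctrl_ref in strata:
        n = case_alt + case_ref + ctrl_alt + ctrl_ref
        numerator += case_alt * ctrl_ref / n
        denominator += case_ref * ctrl_alt / n
    return numerator / denominator

# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref).
# Subgroup A: allele frequency 0.5, mostly cases.
# Subgroup B: allele frequency 0.1, mostly controls.
strata = {
    "subgroup_A": (800, 800, 200, 200),
    "subgroup_B": (40, 360, 160, 1440),
}

# Ignoring ancestry: add the tables together, then compute one odds ratio.
pooled_counts = [sum(column) for column in zip(*strata.values())]
pooled_or = odds_ratio(*pooled_counts)

# Respecting ancestry: per-subgroup odds ratios and the stratified estimate.
within_or = {name: odds_ratio(*counts) for name, counts in strata.items()}
mh_or = mantel_haenszel_or(strata.values())

print(f"pooled OR: {pooled_or:.2f}")
print(f"within-subgroup ORs: {within_or}")
print(f"Mantel-Haenszel OR: {mh_or:.2f}")
```

The pooled odds ratio is well above 1 even though neither subgroup shows any association, which is exactly the kind of false positive that ancestry adjustment is meant to remove.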

Applications of GWAS findings

  • Identifying candidate genes:
    • Significant SNPs are mapped to nearby genes based on their genomic location
    • These genes are considered potential candidates for involvement in the trait or disease (TCF7L2 for type 2 diabetes)
    • The function and biological relevance of the candidate genes are then investigated
  • Identifying biological pathways:
    • The candidate genes identified from GWAS are analyzed for their roles in known biological pathways (insulin signaling)
    • Enrichment analyses are performed to determine if certain pathways are overrepresented among the candidate genes
    • These analyses can provide insights into the biological mechanisms underlying the complex trait or disease
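
One common way to test whether a pathway is overrepresented among candidate genes is a hypergeometric (one-sided Fisher) test. The sketch below is a minimal version under stated assumptions: the gene counts are invented, the function name `hypergeom_enrichment_p` is hypothetical, and real enrichment tools add further steps such as correcting for the many pathways tested.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, pathway_genes, candidate_genes, overlap):
    """P(X >= overlap) if candidate_genes were drawn at random from total_genes,
    where X is the number of drawn genes that belong to the pathway."""
    max_overlap = min(pathway_genes, candidate_genes)
    denominator = comb(total_genes, candidate_genes)
    numerator = sum(
        comb(pathway_genes, k)
        * comb(total_genes - pathway_genes, candidate_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator

# Hypothetical numbers: 20,000 genes in the background, 100 annotated to the
# pathway, 50 GWAS candidate genes, 5 of which fall in the pathway.
p_enrichment = hypergeom_enrichment_p(
    total_genes=20_000, pathway_genes=100, candidate_genes=50, overlap=5
)

# By chance alone, about 50 * 100 / 20,000 = 0.25 candidate genes would be
# expected in the pathway, so an overlap of 5 is far more than expected.
print(f"enrichment p-value: {p_enrichment:.3g}")
```

A small p-value here suggests that the pathway contains more candidate genes than random sampling would explain, which is the sense of "overrepresented" used above.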
© 2025 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.