Linkage disequilibrium (LD) is a key concept in genetics, describing how alleles at different loci are associated more often than expected by chance. It's crucial for understanding genetic variation and plays a vital role in mapping genes associated with diseases and traits.
LD is influenced by factors like genetic drift, population bottlenecks, and natural selection. Measuring LD helps researchers conduct genome-wide association studies, fine-map disease loci, and infer population history. Understanding LD patterns is essential for interpreting genetic data and designing effective genomic studies.
Definition of linkage disequilibrium
Linkage disequilibrium (LD) is the non-random association of alleles at different loci in a given population
Alleles that are in LD are found together on the same chromosome more often than would be expected by chance
LD is a crucial concept in population genetics and is widely used in genetic mapping and association studies
Causes of linkage disequilibrium
Genetic drift
Genetic drift is the random fluctuation of allele frequencies in a population over time
In small populations, genetic drift randomly shifts haplotype frequencies, so some allele combinations become over-represented by chance, resulting in increased LD
Genetic drift is more pronounced in smaller populations due to the greater impact of random sampling effects
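To make the link between drift and LD concrete, here is a minimal two-locus Wright-Fisher sketch. It assumes random mating, no selection or mutation, and a haploid sampling step of n_haplotypes chromosomes per generation. The function names and the ordering of the frequency tuple (AB, Ab, aB, ab) are illustrative choices, not taken from the text.

```python
import random


def d_of(freqs):
    """D for haplotype frequencies ordered (AB, Ab, aB, ab)."""
    x1, x2, x3, x4 = freqs
    return x1 * x4 - x2 * x3


def simulate_two_locus_drift(n_haplotypes, c, generations, freqs, seed=None):
    """Return haplotype frequencies after drift and recombination.

    c is the recombination fraction between the two loci (0 to 0.5).
    """
    rng = random.Random(seed)
    x1, x2, x3, x4 = freqs
    for _ in range(generations):
        d = x1 * x4 - x2 * x3
        # Recombination moves each haplotype frequency toward equilibrium by c * D.
        expected = [x1 - c * d, x2 + c * d, x3 + c * d, x4 - c * d]
        # Drift: the next generation is a random sample of n_haplotypes chromosomes.
        draws = rng.choices([0, 1, 2, 3], weights=expected, k=n_haplotypes)
        x1, x2, x3, x4 = (draws.count(i) / n_haplotypes for i in range(4))
    return x1, x2, x3, x4


# Start in linkage equilibrium (D = 0) and let a small population drift.
start = (0.25, 0.25, 0.25, 0.25)
end = simulate_two_locus_drift(n_haplotypes=50, c=0.01, generations=20, freqs=start, seed=1)
print("D before:", d_of(start), "D after:", d_of(end))
```

Rerunning with a larger n_haplotypes should usually leave D closer to zero, which is the sampling effect described above.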
Population bottlenecks
Population bottlenecks occur when a population undergoes a severe reduction in size, often due to environmental factors or demographic events
During a bottleneck, rare alleles may be lost and the few surviving haplotypes make up a larger share of the population, leading to increased LD
Drift is also intensified while the population is small, so chance associations between alleles build up and persist until recombination breaks them down
Founder effects
Founder effects occur when a new population is established by a small number of individuals from a larger population
The limited genetic diversity in the founding population can lead to increased LD, as the alleles present in the founders become more frequent
Founder effects are often observed in isolated populations or those that have undergone rapid expansion from a small initial population
Admixture
Admixture occurs when two or more previously isolated populations interbreed
Admixture can create new combinations of alleles, leading to LD between loci whose alleles were not associated within either parental population, including loci on different chromosomes
The extent of LD generated by admixture depends on factors such as the allele-frequency differences between the parental populations, the mixing proportions, and the time since admixture occurred
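As a small worked example, the sketch below computes the LD created in the first generation of a mixture. It assumes each parental population is itself in linkage equilibrium, in which case D in the mixture works out to m(1 − m)(pA1 − pA2)(pB1 − pB2), where m is the fraction drawn from population 1. The function name and arguments are illustrative.

```python
def admixture_d(m, pa1, pb1, pa2, pb2):
    """D in a fresh mixture of two populations.

    m          : fraction of the mixture drawn from population 1
    pa1, pb1   : frequencies of alleles A and B in population 1
    pa2, pb2   : frequencies of alleles A and B in population 2
    Assumes both source populations are in linkage equilibrium.
    """
    # Haplotype AB frequency is a weighted average of the within-population products.
    p_ab = m * pa1 * pb1 + (1 - m) * pa2 * pb2
    p_a = m * pa1 + (1 - m) * pa2
    p_b = m * pb1 + (1 - m) * pb2
    return p_ab - p_a * p_b


# Strongly differentiated populations mixed 50:50.
print(admixture_d(0.5, 0.9, 0.8, 0.1, 0.2))   # about 0.12
# No frequency difference at locus A means no admixture LD.
print(admixture_d(0.5, 0.5, 0.8, 0.5, 0.2))   # about 0.0
```

The result is largest when the populations differ strongly in allele frequency at both loci and the mixing proportions are near equal.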
Natural selection
Natural selection can create LD when it favors certain combinations of alleles at different loci
If two alleles at different loci confer a selective advantage when present together, they will tend to be inherited together, resulting in LD
Selective sweeps, where a beneficial allele rapidly increases in frequency and "sweeps" nearby linked alleles along with it, can also generate LD
Measures of linkage disequilibrium
D and D'
D is the basic measure of LD, calculated as the difference between the observed frequency of a haplotype and the frequency expected under random association: D = p(AB) − p(A)p(B)
D' is a normalized version of D, obtained by dividing D by its maximum possible magnitude given the allele frequencies; it ranges from -1 to 1, with |D'| = 1 indicating complete LD and D' = 0 indicating no LD
D' is useful for comparing the strength of LD between different pairs of loci because its range does not depend on allele frequencies, although it tends to be inflated when samples are small or alleles are rare
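The two definitions above can be sketched in a few lines. This assumes two biallelic loci with known haplotype frequency p(AB) and allele frequencies p(A) and p(B), and uses Lewontin's normalization for D'. The function name is illustrative.

```python
def ld_d_and_dprime(p_ab, p_a, p_b):
    """Return (D, D') for alleles A and B at two biallelic loci."""
    d = p_ab - p_a * p_b
    # The largest |D| the allele frequencies allow depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Partial LD: D is about 0.15 and D' about 0.6.
print(ld_d_and_dprime(0.4, 0.5, 0.5))
# Haplotype Ab is absent (p(AB) equals p(A)), so D' is about 1.
print(ld_d_and_dprime(0.3, 0.3, 0.6))
```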
r and r²
r is the correlation coefficient between alleles at two loci, ranging from -1 to 1, and equals D divided by the square root of p(A)(1 − p(A))p(B)(1 − p(B))
r² is the square of r and represents the proportion of variance in allele state at one locus that can be explained by the allele present at the other locus
r² is commonly used in association studies, as it directly relates to the power to detect associations between markers and traits
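A short sketch of the calculation, assuming two biallelic loci and known haplotype and allele frequencies (the function name is illustrative). The second call uses frequencies where one of the four haplotypes is absent, so |D'| would be 1, yet r² stays well below 1 because the allele frequencies at the two loci differ.

```python
import math


def ld_r(p_ab, p_a, p_b):
    """Correlation r between allele A at locus 1 and allele B at locus 2."""
    d = p_ab - p_a * p_b
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d / denom


print(ld_r(0.4, 0.5, 0.5) ** 2)   # about 0.36
print(ld_r(0.3, 0.3, 0.6) ** 2)   # about 0.29
```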
Factors affecting linkage disequilibrium
Recombination rates
Recombination breaks down LD by shuffling alleles between haplotypes
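LD between two loci decays faster the higher the recombination rate between them: each generation of random mating reduces D by a factor of (1 − c), where c is the recombination fraction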
Regions of the genome with high recombination rates (hotspots) tend to have lower levels of LD, while regions with low recombination rates (coldspots) tend to have higher levels of LD
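The decay of LD under recombination can be sketched with the standard deterministic result D_t = D_0 (1 − c)^t, which assumes a large, randomly mating population with no drift, selection, or migration. Here c is the recombination fraction and t the number of generations; the function names are illustrative.

```python
import math


def d_after(d0, c, t):
    """D remaining after t generations, starting from d0."""
    return d0 * (1 - c) ** t


def generations_to_halve(c):
    """Generations needed for D to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - c)


print(generations_to_halve(0.5))    # unlinked loci: 1 generation
print(generations_to_halve(0.01))   # tightly linked loci: about 69 generations
print(d_after(0.2, 0.01, 100))      # D left after 100 generations at c = 0.01
```

The contrast between the first two lines is why LD persists over short physical distances but vanishes quickly between distant or unlinked loci.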
Mutation rates
Mutations create new alleles, each arising on a single haplotype background; recurrent mutation at the same site can later place an allele on other backgrounds, reducing LD
The impact of mutation on LD depends on the mutation rate and the age of the mutation
Recent mutations will be in complete LD with nearby alleles, while older mutations will have had more time to recombine and break down LD
Population size
Larger populations tend to have lower levels of LD because drift is weaker and recombinant haplotypes are less likely to be lost by chance
In smaller populations, genetic drift randomly shifts haplotype frequencies each generation, creating and maintaining LD
Population size also sets the balance point between drift, which generates LD, and recombination, which removes it, so larger populations settle at lower equilibrium LD for a given recombination rate
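One widely used approximation for this relationship is Sved's formula, E[r²] ≈ 1 / (1 + 4·Ne·c), where Ne is the effective population size and c the recombination fraction. The sketch below assumes a constant-size, randomly mating population at drift-recombination equilibrium and ignores sampling error; the function names are illustrative. Inverting the formula gives a rough Ne estimate from observed r².

```python
def expected_r2(ne, c):
    """Approximate equilibrium E[r^2] for effective size ne and recombination fraction c."""
    return 1.0 / (1.0 + 4.0 * ne * c)


def ne_from_r2(r2, c):
    """Rough effective population size implied by a mean r^2 at recombination fraction c."""
    return (1.0 / r2 - 1.0) / (4.0 * c)


print(expected_r2(100, 0.001))      # small population: about 0.71
print(expected_r2(10000, 0.001))    # large population: about 0.024
print(ne_from_r2(0.2, 0.001))       # about 1000
```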
Mating patterns
Non-random mating, such as inbreeding or assortative mating, can increase LD by favoring the transmission of certain haplotypes
Inbreeding leads to an increase in homozygosity and can maintain LD by reducing the effective recombination rate
Assortative mating, where individuals with similar phenotypes mate more frequently, can create LD between loci that influence the phenotype
Applications of linkage disequilibrium
Genome-wide association studies (GWAS)
GWAS utilize LD to identify genetic variants associated with traits or diseases
By genotyping a set of markers across the genome and testing for associations with the phenotype of interest, GWAS can identify loci that harbor causal variants
The power of GWAS to detect associations depends on the strength of LD between the causal variant and the genotyped markers
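A common rule of thumb makes this dependence concrete: if a study would need N samples to detect the causal variant directly, then detecting it through a marker with LD r² to that variant needs roughly N / r² samples for the same power. The sketch below applies that approximation; the function name is illustrative and the result is a planning estimate, not an exact power calculation.

```python
def required_sample_size(n_direct, r2):
    """Approximate sample size needed when testing a proxy marker.

    n_direct : samples needed if the causal variant were genotyped directly
    r2       : LD (r^2) between the marker and the causal variant
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


for r2 in (1.0, 0.8, 0.5, 0.2):
    print(r2, round(required_sample_size(1000, r2)))
```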
Fine-mapping of disease loci
Once a locus has been identified through GWAS, fine-mapping can be used to pinpoint the causal variant(s) responsible for the association
Fine-mapping involves genotyping additional markers in the region of interest and analyzing patterns of LD to identify the most likely causal variant(s)
The resolution of fine-mapping depends on the strength of LD in the region and the density of markers genotyped
Inferring population history
Patterns of LD can provide insights into a population's demographic history, such as population bottlenecks, expansions, and admixture events
The extent and distribution of LD across the genome can be used to estimate parameters such as effective population size and the timing of demographic events
Comparing patterns of LD between populations can also reveal differences in their demographic histories and help identify regions of the genome that have been subject to population-specific selection
Detecting natural selection
LD can be used to detect signatures of natural selection in the genome
Regions of the genome that have undergone recent positive selection will exhibit elevated levels of LD and reduced genetic diversity
Various statistical tests, such as the extended haplotype homozygosity (EHH) test and the integrated haplotype score (iHS), have been developed to identify such regions based on patterns of LD
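As an illustration of the first of these statistics, EHH at a given distance from a core allele is the probability that two randomly chosen chromosomes carrying that allele are identical over the whole interval. The sketch below assumes phased haplotypes stored as equal-length strings that all carry the same core allele; the function name and the toy data are illustrative.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """EHH over the interval between marker index core and marker index end (inclusive)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, end))
    # Group chromosomes that are identical across the whole interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of chromosome pairs that fall in the same group.
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


carriers = ["1010", "1010", "1011", "1110"]   # core allele "1" at index 0
print([ehh(carriers, 0, end) for end in range(4)])
```

EHH starts at 1 at the core and falls as recombination and mutation break up the shared haplotype; a slow fall relative to other alleles of similar frequency is the pattern these tests look for.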
Linkage disequilibrium vs linkage analysis
Linkage disequilibrium and linkage analysis are two distinct but related concepts in genetics
Linkage analysis is a family-based method that uses the co-segregation of markers and traits within pedigrees to identify regions of the genome that contain causal variants
In contrast, LD is a population-based measure of the non-random association of alleles at different loci
While linkage analysis relies on the direct observation of recombination events within families, LD reflects the cumulative effects of recombination, mutation, drift, and selection over many generations in a population
Patterns of linkage disequilibrium
Variation across the genome
The extent of LD varies widely across the genome, with some regions exhibiting strong LD and others showing little to no LD
Factors such as recombination rates, mutation rates, and the action of selection can all contribute to this variation
Recombination hotspots, which are regions of the genome with elevated recombination rates, tend to have lower levels of LD compared to surrounding regions
Differences between populations
The patterns of LD can differ substantially between populations due to differences in their demographic histories and the action of population-specific selective pressures
Populations that have undergone recent bottlenecks or founder events tend to have higher levels of LD compared to those with more stable demographic histories
Admixture between populations can also create distinct patterns of LD, with the extent of LD depending on the genetic distance between the parental populations and the time since admixture occurred
Limitations of linkage disequilibrium
Indirect association
LD-based methods, such as GWAS, rely on the indirect association between markers and causal variants
Because the causal variant is often not directly genotyped, the associated marker may be mistaken for it, and when the two are only in partial LD the marker's estimated effect understates the true effect
Conversely, false negatives can occur if the causal variant is not in strong LD with any of the genotyped markers
Confounding factors
Various confounding factors can influence the patterns of LD observed in a population and lead to spurious associations
Population stratification, where subgroups within a population have different allele frequencies due to differences in ancestry, can create LD between unlinked loci and result in false positive associations
Cryptic relatedness, where individuals in a study are more closely related than expected by chance, can also inflate LD estimates and lead to false positives
Methods for estimating linkage disequilibrium
Pairwise LD measures
Pairwise LD measures, such as D, D', r, and r², are used to quantify the strength of association between alleles at two loci
These measures can be calculated from genotype data using various statistical software packages
Pairwise LD measures are often used to visualize patterns of LD across the genome and to identify regions of high or low LD
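A minimal sketch of the calculation for a whole region, assuming phased haplotypes coded as rows of 0/1 alleles (unphased genotype data would need a haplotype-frequency estimation step that is not shown). The function name is illustrative, and dedicated software would normally be used for real data sets.

```python
def r2_matrix(haps):
    """Pairwise r^2 between all markers; haps is a list of 0/1 rows, one per chromosome."""
    n = len(haps)
    m = len(haps[0])
    freqs = [sum(h[j] for h in haps) / n for j in range(m)]
    mat = [[float("nan")] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            p_ab = sum(h[i] * h[j] for h in haps) / n
            d = p_ab - freqs[i] * freqs[j]
            denom = freqs[i] * (1 - freqs[i]) * freqs[j] * (1 - freqs[j])
            if denom > 0:   # r^2 is undefined when either marker is monomorphic
                mat[i][j] = mat[j][i] = d * d / denom
    return mat


haps = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]
for row in r2_matrix(haps):
    print(row)
```

In this toy data the first two markers always appear together (r² = 1), while the third is independent of both (r² = 0).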
Haplotype-based methods
Haplotype-based methods consider the associations between alleles at multiple loci simultaneously
These methods can provide a more comprehensive view of LD patterns and can be more powerful for detecting associations than pairwise measures
Examples of haplotype-based methods include the estimation of haplotype frequencies, the calculation of haplotype diversity, and the identification of haplotype blocks
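The first two of these can be sketched directly from a list of phased haplotypes. The diversity estimate below uses the usual sample-size-corrected form H = n/(n − 1) · (1 − Σ pᵢ²), where pᵢ is the frequency of the i-th distinct haplotype; the function names and the string encoding of haplotypes are illustrative assumptions.

```python
from collections import Counter


def haplotype_frequencies(haps):
    """Map each distinct haplotype to its sample frequency."""
    n = len(haps)
    return {h: k / n for h, k in Counter(haps).items()}


def haplotype_diversity(haps):
    """Chance that two chromosomes drawn without replacement carry different haplotypes."""
    n = len(haps)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    freqs = haplotype_frequencies(haps)
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))


sample = ["AC", "AC", "GT", "GT"]
print(haplotype_frequencies(sample))
print(haplotype_diversity(sample))
```

Strong LD shows up as few distinct haplotypes and low diversity relative to what the individual allele frequencies would allow.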
Visualization of linkage disequilibrium
LD plots
LD plots are used to visualize the strength of pairwise LD between markers in a region of the genome
These plots typically display the values of D' or r² for all pairs of markers, with the strength of LD indicated by the color or shading of the plot
LD plots can be used to identify regions of high LD, which may reflect low recombination, recent selection, or demographic history
Heatmaps
Heatmaps are another common method for visualizing LD patterns
In an LD heatmap, the strength of pairwise LD is represented by the color or intensity of each cell in the matrix, with darker colors indicating stronger LD
Heatmaps can be used to identify patterns of LD across larger regions of the genome and to compare LD patterns between different populations or subgroups
Impact of linkage disequilibrium on genomic studies
Study design considerations
The extent and distribution of LD in a population can have significant implications for the design of genetic studies
In populations with high levels of LD, fewer markers may be needed to capture the majority of the genetic variation, reducing the cost and complexity of the study
Conversely, in populations with lower levels of LD, a higher density of markers may be required to achieve adequate coverage of the genome
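The trade-off between LD and marker density can be illustrated with a simple greedy tag-marker selection. This is a simplified sketch, not a specific published tool: it assumes a precomputed symmetric r² matrix with 1.0 on the diagonal, considers only still-untagged markers as candidates, and uses an r² threshold of 0.8 as an illustrative default.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick markers so every marker has r^2 >= threshold with at least one pick."""
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the marker that covers the most still-untagged markers
        # (ties go to the lowest index).
        best = max(
            sorted(untagged),
            key=lambda i: sum(1 for j in untagged if r2[i][j] >= threshold),
        )
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
        untagged.discard(best)
    return tags


r2 = [
    [1.00, 0.90, 0.10, 0.00],
    [0.90, 1.00, 0.20, 0.10],
    [0.10, 0.20, 1.00, 0.85],
    [0.00, 0.10, 0.85, 1.00],
]
print(greedy_tag_snps(r2))          # two blocks of high LD: [0, 2]
print(greedy_tag_snps(r2, 0.95))    # stricter threshold: [0, 1, 2, 3]
```

With strong LD two markers cover all four; raising the threshold, which mimics a population with weaker LD, forces every marker to be genotyped.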
Interpretation of results
The presence of LD can complicate the interpretation of results from genetic studies
Associated markers identified through GWAS may not be the causal variants themselves, but rather may be in LD with the causal variant(s)
Fine-mapping and functional studies may be necessary to distinguish causal variants from those that are merely associated due to LD
The extent of LD in a region can also affect the resolution of genetic mapping, with higher levels of LD resulting in larger regions of association and reduced ability to pinpoint causal variants