
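Structural variations are genomic alterations that can significantly impact gene function and expression. Genomic variation ranges from single nucleotide changes to large-scale rearrangements. In the strict sense, however, "structural variation" usually refers to changes of roughly 50 base pairs or more. These variations play crucial roles in genetic diversity, evolution, and disease susceptibility.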

Understanding structural variations is essential for comprehending genome dynamics and their biological consequences. This topic explores different types of structural variations, mechanisms of formation, detection methods, and their impacts on gene function and phenotypes.

Types of structural variations

Single nucleotide variations

  • Single nucleotide variations (SNVs) are changes in individual bases of DNA, including substitutions, insertions, and deletions of a single nucleotide
  • SNVs can be classified as transitions (purine-to-purine or pyrimidine-to-pyrimidine changes) or transversions (purine-to-pyrimidine or pyrimidine-to-purine changes)
  • SNVs can have various effects on gene function, such as altering protein structure, affecting splicing sites, or modifying regulatory elements
  • Examples of SNVs include single nucleotide polymorphisms (SNPs) and point mutations associated with genetic diseases (sickle cell anemia)
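The transition/transversion distinction can be written as a small rule. The sketch below assumes upper- or lower-case single bases and uses a hypothetical helper name. It classifies a substitution by checking whether the reference and alternate bases belong to the same chemical class (purine or pyrimidine).

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID_BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError("expected single bases from A, C, G, T")
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a substitution")
    # Transition: both bases are purines, or both are pyrimidines
    if (ref in PURINES) == (alt in PURINES):
        return "transition"
    # Otherwise one is a purine and the other a pyrimidine
    return "transversion"


# The sickle cell mutation changes codon GAG to GTG in HBB (an A>T change)
print(classify_substitution("A", "T"))  # transversion
print(classify_substitution("C", "T"))  # transition
```

Because the check only compares class membership, it is symmetric: A>G and G>A are both transitions.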

Insertions and deletions

  • Insertions involve the addition of one or more nucleotides into a DNA sequence, while deletions involve the removal of one or more nucleotides
  • Small insertions and deletions (indels) are typically less than 50 base pairs in length and cause frameshift mutations if they occur within coding regions and their length is not a multiple of three
  • Larger insertions and deletions can involve entire genes or chromosomal segments and may have more profound effects on genome structure and function
  • Indels can be caused by replication slippage, DNA repair errors, or the activity of transposable elements (Alu elements)
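Whether a coding indel causes a frameshift depends only on its net length change modulo 3. The sketch below assumes VCF-style REF/ALT alleles (which share an anchor base) and assumes that the whole change lies inside a coding exon. Splice-site and stop-codon effects are deliberately ignored, and the function name is hypothetical.

```python
def coding_indel_effect(ref: str, alt: str) -> str:
    """Classify a coding-region indel from VCF-style REF and ALT alleles.

    Assumes the entire length change falls inside a coding exon; splice-site
    and stop-codon effects are not modeled.
    """
    length_change = len(alt) - len(ref)
    if length_change == 0:
        return "no length change"
    # Only changes that are a multiple of 3 preserve the downstream reading frame.
    # Python's % returns a non-negative result here, so deletions work too.
    if length_change % 3 == 0:
        return "in-frame"
    return "frameshift"


print(coding_indel_effect("ATG", "A"))     # frameshift (2 bp deleted)
print(coding_indel_effect("A", "ACTG"))    # in-frame (3 bp inserted)
print(coding_indel_effect("ACTTG", "AG"))  # in-frame (3 bp deleted)
```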

Copy number variations

  • Copy number variations (CNVs) are changes in the number of copies of a particular DNA segment, ranging from a few hundred base pairs to entire genes or chromosomal regions
  • CNVs can be classified as deletions (fewer copies than the reference genome) or duplications (more copies than the reference genome)
  • CNVs can affect gene dosage and expression levels, leading to phenotypic consequences and disease susceptibility (22q11.2 deletion syndrome)
  • CNVs can be detected using various methods, such as array comparative genomic hybridization (aCGH) and next-generation sequencing (NGS) approaches

Duplications and deletions

  • Duplications involve the replication of a DNA segment, resulting in an increased number of copies within the genome
  • Deletions involve the loss of a DNA segment, leading to a reduced number of copies or complete absence of the segment
  • Duplications and deletions can range in size from a few base pairs to entire genes or chromosomal regions (1p36 deletion syndrome)
  • Duplications can lead to increased gene dosage and potential gain-of-function effects, while deletions can result in reduced gene dosage and loss-of-function effects
  • Duplications and deletions can be mediated by various mechanisms, such as non-allelic homologous recombination (NAHR) and non-homologous end joining (NHEJ)

Inversions and translocations

  • Inversions involve the reversal of a DNA segment within a chromosome, resulting in a change in orientation but no net gain or loss of genetic material
  • Translocations involve the exchange of genetic material between non-homologous chromosomes or different regions of the same chromosome
  • Inversions can disrupt gene function or create fusion genes if the breakpoints fall within genes (inversion 16, inv(16), which creates the CBFB-MYH11 fusion in acute myeloid leukemia)
  • Translocations can create novel fusion genes with altered functions or place genes under the control of different regulatory elements (Philadelphia chromosome in chronic myeloid leukemia)
  • Inversions and translocations can be balanced (no net gain or loss of genetic material) or unbalanced (associated with duplications or deletions)
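The balanced/unbalanced distinction can be made concrete by modeling a chromosome as an ordered list of labeled segments with a strand, and then checking whether each segment's copy number is unchanged after a rearrangement. This is a toy model: segment names and helper functions are hypothetical, and breakpoints are assumed to fall exactly between segments.

```python
from collections import Counter
from typing import List, Tuple

Segment = Tuple[str, str]  # (segment name, strand "+" or "-")


def invert(segs: List[Segment], start: int, end: int) -> List[Segment]:
    """Reverse segs[start:end] and flip the strand of each segment."""
    flipped = [(name, "-" if strand == "+" else "+")
               for name, strand in reversed(segs[start:end])]
    return segs[:start] + flipped + segs[end:]


def delete(segs: List[Segment], start: int, end: int) -> List[Segment]:
    """Remove segs[start:end]."""
    return segs[:start] + segs[end:]


def tandem_duplicate(segs: List[Segment], start: int, end: int) -> List[Segment]:
    """Insert a second copy of segs[start:end] directly after the original."""
    return segs[:end] + segs[start:end] + segs[end:]


def copy_numbers(segs: List[Segment]) -> Counter:
    """Count copies of each segment, ignoring orientation."""
    return Counter(name for name, _ in segs)


def is_balanced(before: List[Segment], after: List[Segment]) -> bool:
    """Balanced means no net gain or loss of any segment."""
    return copy_numbers(before) == copy_numbers(after)


reference = [("A", "+"), ("B", "+"), ("C", "+"), ("D", "+")]
inverted = invert(reference, 1, 3)
print(inverted)                                         # [('A', '+'), ('C', '-'), ('B', '-'), ('D', '+')]
print(is_balanced(reference, inverted))                 # True
print(is_balanced(reference, delete(reference, 1, 2)))  # False
```

For a translocation, the same check can be applied by pooling the segments of both participating chromosomes before and after the exchange.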

Mechanisms of structural variations

Recombination and repair

  • Recombination is a process that generates new combinations of alleles through the exchange of genetic material between homologous chromosomes during meiosis
  • Non-allelic homologous recombination (NAHR) can occur between highly similar DNA sequences (segmental duplications) and lead to deletions, duplications, or inversions
  • Non-homologous end joining (NHEJ) is a repair mechanism that joins broken DNA ends without requiring extensive sequence homology, potentially resulting in small insertions, deletions, or translocations
  • Microhomology-mediated end joining (MMEJ) is another repair mechanism that uses short homologous sequences to align and join broken DNA ends, which can lead to indels or complex rearrangements
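A few bases of identical sequence at both ends of a deletion (microhomology) are often read as a signature of MMEJ. The sketch below reports that shared sequence for a deletion given in 0-based, half-open coordinates. The function name and example sequence are hypothetical. The returned string is the stretch over which the breakpoint cannot be placed uniquely, because the same deleted product can be produced at any shift within it.

```python
def junction_microhomology(ref: str, start: int, end: int) -> str:
    """Return the sequence shared by both ends of the deletion ref[start:end].

    Coordinates are 0-based and half-open. The result is the span over which
    the deletion can be shifted without changing the resulting sequence.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("invalid deletion coordinates")
    # Extend leftwards: the base before the deletion equals the last deleted base
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    # Extend rightwards: the first deleted base equals the first base after the deletion
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    return ref[start - left:start + right]


reference = "GGACTGCCCACTGTT"
# Deleting "ACTGCCC" (positions 2..8) joins two copies of "ACTG"
print(junction_microhomology(reference, 2, 9))  # ACTG
# A junction with no shared sequence returns an empty string (blunt join)
print(repr(junction_microhomology("AAAACCCCGGGG", 4, 8)))  # ''
```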

Replication errors and slippage

  • Replication errors can occur during DNA synthesis, leading to the incorporation of incorrect nucleotides or the formation of secondary structures that cause insertions or deletions
  • Replication slippage is a common mechanism for generating small indels, particularly in regions with repetitive sequences (microsatellites)
  • Slippage occurs when the nascent and template strands transiently separate within a repeat and re-anneal out of register, forming a loop or hairpin; a loop in the nascent strand adds repeat units (insertion), whereas a loop in the template strand causes units to be skipped (deletion)
  • Replication fork stalling and collapse can also contribute to the formation of structural variations, particularly in regions with complex or repetitive sequences
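The outcome of slippage at a microsatellite can be illustrated by locating the longest perfect run of a repeat unit and producing the one-unit-longer and one-unit-shorter alleles. This is a sketch under simplifying assumptions: only perfect repeats are considered, a single unit is gained or lost, and the helper names are hypothetical.

```python
from typing import Tuple


def longest_repeat_run(seq: str, unit: str) -> Tuple[int, int]:
    """Return (start, copies) for the longest perfect tandem run of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best_start, best_copies = -1, 0
    for i in range(len(seq) - k + 1):
        copies = 0
        # Count consecutive copies of the unit starting at position i
        while seq[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies


def slippage_products(seq: str, unit: str) -> Tuple[str, str]:
    """Return the alleles with one repeat unit gained and one unit lost."""
    start, copies = longest_repeat_run(seq, unit)
    if copies == 0:
        raise ValueError("repeat unit not found")
    end = start + copies * len(unit)
    gained = seq[:end] + unit + seq[end:]     # loop in the nascent strand
    lost = seq[:end - len(unit)] + seq[end:]  # loop in the template strand
    return gained, lost


print(longest_repeat_run("GGCACACACATT", "CA"))  # (2, 4)
print(slippage_products("GGCACACACATT", "CA"))   # ('GGCACACACACATT', 'GGCACACATT')
```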

Transposable elements and repeats

  • Transposable elements (TEs) are mobile genetic elements that can move and insert themselves into different locations within the genome
  • TEs can be classified as DNA transposons (move via a cut-and-paste mechanism) or retrotransposons (move via an RNA intermediate and reverse transcription)
  • Alu elements are the most abundant short interspersed nuclear elements (SINEs) in the human genome and can mediate deletions, duplications, or inversions through NAHR
  • Long interspersed nuclear elements (LINEs) are autonomous retrotransposons that can cause insertional mutagenesis and contribute to genome instability
  • Repeat sequences, such as segmental duplications and tandem repeats, can also facilitate the formation of structural variations through recombination or replication-based mechanisms

Detection of structural variations

Sequencing technologies for SVs

  • Next-generation sequencing (NGS) technologies have revolutionized the detection of structural variations by providing high-throughput and high-resolution data
  • Short-read sequencing (Illumina) is widely used for SV detection and relies on the mapping of reads to a reference genome to identify discordant or anomalous alignment patterns
  • Long-read sequencing (PacBio, Oxford Nanopore) generates reads that span larger genomic regions, enabling the detection of complex or repetitive SVs that are difficult to resolve with short reads
  • Linked-read sequencing (10x Genomics) incorporates barcodes to link reads originating from the same DNA molecule, facilitating phasing and the detection of large-scale SVs

Read depth and coverage analysis

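  • Read depth analysis involves comparing the observed number of reads mapped to a genomic region to the expected number based on the average sequencing depth
  • Deletions are characterized by a decrease in read depth, while duplications show an increase relative to the expected depth; in a diploid genome, a heterozygous deletion gives roughly half the expected depth and a single extra copy gives roughly 1.5 times the expected depth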
  • Coverage analysis can also identify copy number variations by segmenting the genome into regions with consistent read depth and comparing them to a reference or control sample
  • Challenges in read depth analysis include GC content biases, mappability issues, and the presence of repetitive or low-complexity regions
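The read-depth logic above can be sketched in a few lines. The example normalizes fixed-size bin counts by the sample median, converts the ratio to an integer copy number for a diploid genome, and merges consecutive bins with the same call. Bin counts are hypothetical. The sketch assumes most bins are at normal ploidy and applies no GC or mappability correction. Real tools typically use more robust segmentation methods, such as circular binary segmentation or hidden Markov models, instead of this naive run merging.

```python
from statistics import median
from typing import List, Tuple


def call_copy_number(bin_counts: List[int], ploidy: int = 2) -> List[int]:
    """Estimate an integer copy number for each bin from its read count."""
    baseline = median(bin_counts)  # assumes most bins are at normal ploidy
    if baseline <= 0:
        raise ValueError("median bin count must be positive")
    return [round(ploidy * count / baseline) for count in bin_counts]


def merge_runs(copy_numbers: List[int]) -> List[Tuple[int, int, int]]:
    """Merge consecutive bins with equal copy number into (first_bin, end_bin, cn)."""
    segments: List[List[int]] = []
    for i, cn in enumerate(copy_numbers):
        if segments and segments[-1][2] == cn:
            segments[-1][1] = i + 1  # extend the current segment
        else:
            segments.append([i, i + 1, cn])  # start a new segment
    return [tuple(s) for s in segments]


counts = [100, 98, 103, 51, 49, 150, 101]
cns = call_copy_number(counts)
print(cns)              # [2, 2, 2, 1, 1, 3, 2]
print(merge_runs(cns))  # [(0, 3, 2), (3, 5, 1), (5, 6, 3), (6, 7, 2)]
```

Here bins 3 and 4 look like a heterozygous deletion, and bin 5 looks like a single-copy gain.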

Split reads and discordant pairs

  • Split reads are reads that span the breakpoints of an SV, with parts of the read mapping to different genomic locations
  • Discordant read pairs are paired-end reads that map to the reference genome with an unexpected orientation or distance between the reads
  • Split reads and discordant read pairs can be used to identify the precise breakpoints and types of SVs, such as deletions, insertions, inversions, and translocations
  • Challenges in split read and discordant pair analysis include the presence of repetitive sequences, mapping ambiguities, and the limited sensitivity for detecting small or complex SVs
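The discordant-pair signatures can be summarized as a classification rule. This sketch assumes an Illumina-style forward-reverse (FR) paired-end library. As a simplification, it uses the distance between the mates' leftmost mapping positions as a proxy for fragment length. Real callers use the alignment's template length (the SAM `TLEN` field) and cluster many pairs before calling an SV. The class, function, and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # leftmost mapped position of mate 1
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Classify a read pair, assuming a forward-reverse (FR) library."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal: translocation candidate"
    # Order the mates by position so orientation can be read left to right
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "same strand: inversion candidate"
    if left_strand == "-":
        return "reverse-forward: tandem duplication candidate"
    distance = right_pos - left_pos
    if distance > mean_insert + n_sd * sd_insert:
        return "too far apart: deletion candidate"
    if distance < mean_insert - n_sd * sd_insert:
        return "too close: insertion candidate"
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1300, "-"), 350, 50))   # concordant
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 6000, "-"), 350, 50))   # too far apart: deletion candidate
print(classify_pair(ReadPair("chr9", 1000, "+", "chr22", 5000, "-"), 350, 50))  # interchromosomal: translocation candidate
```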

De novo assembly vs reference-based

  • De novo assembly involves reconstructing the genome sequence from the sequencing reads without relying on a reference genome
  • De novo assembly can identify novel or population-specific SVs that are not present in the reference genome and can resolve complex or repetitive regions
  • Reference-based approaches map the sequencing reads to a reference genome and identify SVs based on discrepancies in the alignment patterns
  • Reference-based methods are computationally more efficient and can leverage existing annotations and resources, but may miss or misinterpret SVs in regions that differ significantly from the reference
  • Hybrid approaches that combine de novo assembly and reference-based analysis can provide a more comprehensive and accurate detection of SVs

Biological impact of structural variations

Gene dosage and expression changes

  • Structural variations can alter gene dosage by changing the number of copies of a gene or by disrupting its coding sequence
  • Deletions can lead to a reduction in gene dosage and potentially cause haploinsufficiency, where a single functional copy of the gene is insufficient to maintain normal function
  • Duplications can increase gene dosage and potentially lead to overexpression or gain-of-function effects
  • Gene dosage changes can have significant impacts on cellular processes, developmental pathways, and disease susceptibility (PMP22 duplication in Charcot-Marie-Tooth disease type 1A)

Fusion genes and novel transcripts

  • Structural variations, particularly translocations and inversions, can create fusion genes by joining parts of two different genes
  • Fusion genes can produce novel transcripts with altered functions, such as constitutive activation of signaling pathways or disruption of normal regulatory mechanisms
  • Fusion genes are commonly associated with various cancers and can serve as diagnostic markers or therapeutic targets (BCR-ABL1 fusion in chronic myeloid leukemia)
  • Structural variations can also create novel transcripts by placing genes under the control of different regulatory elements or by altering splicing patterns

Regulatory elements and TADs

  • Structural variations can affect the integrity and function of regulatory elements, such as promoters, enhancers, and insulators
  • Deletions or duplications of regulatory elements can lead to changes in gene expression patterns and potentially contribute to disease pathogenesis
  • Structural variations can also disrupt topologically associating domains (TADs), which are higher-order chromatin structures that facilitate interactions between genes and their regulatory elements
  • Disruption of TADs can lead to ectopic interactions between genes and regulatory elements, resulting in aberrant gene expression and developmental disorders (limb malformations associated with SVs in the WNT6/IHH/EPHA4/PAX3 locus)
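A first-pass check for TAD disruption is whether a deletion removes the boundary between two adjacent domains, which would merge them. In this sketch, the gap between consecutive TADs is treated as the boundary region, and a boundary is reported as lost only if the deletion covers all of it. Both are simplifying assumptions, since partial loss of a boundary can also weaken insulation. All coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def fused_tad_pairs(tads: List[Interval], deletion: Interval) -> List[Tuple[int, int]]:
    """Return index pairs of adjacent TADs whose boundary region is fully deleted.

    Assumes non-overlapping TADs; the gap between consecutive TADs is taken as
    the boundary region.
    """
    ordered = sorted(tads)
    del_start, del_end = deletion
    fused = []
    for i in range(len(ordered) - 1):
        boundary_start, boundary_end = ordered[i][1], ordered[i + 1][0]
        if del_start <= boundary_start and del_end >= boundary_end:
            fused.append((i, i + 1))
    return fused


tads = [(0, 390_000), (410_000, 900_000), (920_000, 1_500_000)]
print(fused_tad_pairs(tads, (350_000, 450_000)))  # [(0, 1)]
print(fused_tad_pairs(tads, (100_000, 200_000)))  # [] (deletion lies inside one TAD)
```

A merged domain is where ectopic contacts can arise: enhancers from one former TAD may then reach genes in the other.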

Phenotypic effects and disease associations

  • Structural variations can have a wide range of phenotypic effects, depending on the genes and regulatory elements involved and the nature of the genomic alteration
  • SVs can contribute to various genetic disorders, including neurodevelopmental disorders (autism spectrum disorder), cardiovascular diseases (familial hypercholesterolemia), and cancer predisposition syndromes (BRCA1/2 deletions in hereditary breast and ovarian cancer)
  • SVs can also influence complex traits and common diseases by modifying gene expression, altering protein function, or interacting with other genetic and environmental factors
  • The phenotypic impact of SVs can be modulated by factors such as the size and location of the variant, the genetic background, and the tissue-specific expression patterns of the affected genes

Computational methods for SV analysis

SV callers and algorithms

  • SV callers are computational tools designed to identify and characterize structural variations from sequencing data
  • Different SV callers employ various algorithms and strategies to detect SVs, such as read depth analysis, split read mapping, discordant read pair analysis, and de novo assembly
  • Many SV callers are available, each with its own strengths and limitations in terms of sensitivity, specificity, and computational efficiency
  • Ensemble approaches that integrate multiple SV callers and evidence types can improve the accuracy and reliability of SV detection
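A common way to build an ensemble call set is to treat calls from different tools as the same event when they share chromosome and SV type and overlap reciprocally by at least 50%. Only events supported by a minimum number of tools are kept. The sketch below follows that idea. The caller names, the 50% threshold, and the two-caller support requirement are illustrative assumptions, not settings from any specific pipeline.

```python
from typing import Dict, List, Tuple

SV = Tuple[str, int, int, str]  # (chrom, start, end, svtype), 0-based half-open


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap as a fraction of the longer call (0 if chrom or type differ)."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def consensus(callsets: Dict[str, List[SV]], min_support: int = 2,
              min_ro: float = 0.5) -> List[Tuple[SV, List[str]]]:
    """Keep one representative call per event supported by >= min_support callers."""
    merged: List[Tuple[SV, List[str]]] = []
    for calls in callsets.values():
        for call in calls:
            if any(reciprocal_overlap(call, kept) >= min_ro for kept, _ in merged):
                continue  # event already represented by an accepted call
            support = sorted(
                name for name, other_calls in callsets.items()
                if any(reciprocal_overlap(call, o) >= min_ro for o in other_calls)
            )
            if len(support) >= min_support:
                merged.append((call, support))
    return merged


callsets = {
    "caller_a": [("chr1", 10_000, 20_000, "DEL"), ("chr2", 5_000, 6_000, "DUP")],
    "caller_b": [("chr1", 10_500, 19_800, "DEL")],
    "caller_c": [("chr1", 11_000, 21_000, "DEL"), ("chr3", 1_000, 2_000, "INV")],
}
print(consensus(callsets))
# [(('chr1', 10000, 20000, 'DEL'), ['caller_a', 'caller_b', 'caller_c'])]
```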

Filtering and quality control

  • Filtering and quality control steps are essential to minimize false positives and prioritize high-confidence SVs for further analysis
  • Filtering criteria can include read depth, variant allele frequency, number of supporting reads, mapping quality, and presence in databases of known variants
  • Quality control measures, such as visual inspection of read alignments and PCR validation of selected SVs, can help assess the accuracy of SV calls
  • Establishing appropriate thresholds and benchmarking SV callers using simulated or well-characterized datasets can guide the optimization of filtering and quality control strategies
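Filtering and benchmarking interact. Stricter thresholds usually raise precision but can lower recall. The sketch below filters calls on supporting-read count and mapping quality and then scores them against a truth set, counting a call as a match when both breakpoints fall within a fixed tolerance. The field names, threshold values, tolerance, and truth set are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Truth = Tuple[str, int, int, str]  # (chrom, start, end, svtype)


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str
    supporting_reads: int  # split reads plus discordant pairs backing the call
    mapq: float            # mean mapping quality of the supporting reads


def passes_filters(call: SVCall, min_support: int = 5, min_mapq: float = 20.0) -> bool:
    return call.supporting_reads >= min_support and call.mapq >= min_mapq


def matches(call: SVCall, truth: Truth, tolerance: int = 500) -> bool:
    chrom, start, end, svtype = truth
    return (call.chrom == chrom and call.svtype == svtype
            and abs(call.start - start) <= tolerance
            and abs(call.end - end) <= tolerance)


def precision_recall(calls: List[SVCall], truth_set: List[Truth],
                     tolerance: int = 500) -> Tuple[float, float]:
    true_calls = sum(any(matches(c, t, tolerance) for t in truth_set) for c in calls)
    found_truth = sum(any(matches(c, t, tolerance) for c in calls) for t in truth_set)
    precision = true_calls / len(calls) if calls else 0.0
    recall = found_truth / len(truth_set) if truth_set else 0.0
    return precision, recall


truth_set = [("chr1", 10_000, 20_000, "DEL"), ("chr2", 50_000, 52_000, "DUP")]
raw_calls = [
    SVCall("chr1", 10_100, 19_950, "DEL", supporting_reads=12, mapq=45.0),
    SVCall("chr2", 50_200, 51_900, "DUP", supporting_reads=3, mapq=50.0),
    SVCall("chr4", 7_000, 9_000, "DEL", supporting_reads=2, mapq=8.0),
]
filtered = [c for c in raw_calls if passes_filters(c)]
print(precision_recall(raw_calls, truth_set))  # precision 2/3, recall 1.0
print(precision_recall(filtered, truth_set))   # (1.0, 0.5)
```

In this toy case, the filter removes the false positive but also discards a true duplication with weak read support.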

Genotyping and phasing of SVs

  • Genotyping involves determining the presence and zygosity of SVs in individual samples or populations
  • Phasing refers to the assignment of SVs to specific haplotypes or alleles, which is important for understanding their inheritance patterns and potential interactions
  • Genotyping and phasing of SVs can be challenging due to their size, complexity, and the presence of repetitive or low-complexity regions
  • Specialized algorithms and tools have been developed to facilitate the genotyping and phasing of SVs from sequencing data
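A basic genotyping model compares the number of reads supporting the reference allele (r) and the SV allele (a) under each diploid genotype. Each genotype implies an expected alt-read fraction p: a small error rate for 0/0, 0.5 for 0/1, and one minus the error rate for 1/1. The likelihood is then proportional to p^a (1 - p)^r. The 2% error rate and function names below are assumptions. Practical SV genotypers also have to model reference bias and reads that are ambiguous between alleles.

```python
from math import log
from typing import Dict


def genotype_log_likelihoods(ref_reads: int, alt_reads: int,
                             error: float = 0.02) -> Dict[str, float]:
    """Binomial log-likelihood of the read counts under each diploid genotype.

    The binomial coefficient is omitted because it is the same for all genotypes.
    """
    if ref_reads + alt_reads == 0:
        raise ValueError("no informative reads")
    expected_alt_fraction = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: alt_reads * log(p) + ref_reads * log(1.0 - p)
        for gt, p in expected_alt_fraction.items()
    }


def call_genotype(ref_reads: int, alt_reads: int, error: float = 0.02) -> str:
    """Return the maximum-likelihood genotype."""
    lls = genotype_log_likelihoods(ref_reads, alt_reads, error)
    return max(lls, key=lls.get)


print(call_genotype(20, 0))   # 0/0
print(call_genotype(11, 9))   # 0/1
print(call_genotype(1, 19))   # 1/1
```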

Annotation and functional interpretation

  • Annotation of SVs involves characterizing their genomic context, such as the genes, regulatory elements, and functional domains affected by the variant
  • Functional interpretation aims to predict the potential impact of SVs on gene function, expression, and biological processes
  • Dedicated SV annotation tools can integrate information from various databases (RefSeq, Ensembl, ClinVar) to provide comprehensive annotations for SVs
  • Functional interpretation can leverage data from expression studies, epigenetic profiles, and model organisms to infer the potential consequences of SVs
  • Network and pathway analysis can help identify the broader biological processes and systems that may be perturbed by SVs
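The core of SV annotation is interval overlap against gene models. A useful first distinction is between genes that are fully contained in the SV (a dosage change for a deletion or duplication) and genes that contain a breakpoint (possible disruption or fusion). The sketch below makes that distinction. The gene names and coordinates are hypothetical stand-ins for records that would normally come from a source such as RefSeq or Ensembl.

```python
from typing import Dict, List, Tuple


def annotate_sv(sv_start: int, sv_end: int,
                genes: Dict[str, Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Report genes overlapped by an SV (0-based, half-open coordinates)."""
    hits = []
    for name, (g_start, g_end) in sorted(genes.items(), key=lambda kv: kv[1]):
        if sv_end <= g_start or sv_start >= g_end:
            continue  # no overlap
        if sv_start <= g_start and sv_end >= g_end:
            hits.append((name, "whole gene"))
        else:
            hits.append((name, "partial (breakpoint inside gene)"))
    return hits


genes = {"GENE_A": (1_000, 5_000), "GENE_B": (8_000, 12_000), "GENE_C": (20_000, 25_000)}
print(annotate_sv(4_000, 13_000, genes))
# [('GENE_A', 'partial (breakpoint inside gene)'), ('GENE_B', 'whole gene')]
```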

Population genetics of structural variations

SV frequencies and distributions

  • Structural variations exhibit varying frequencies and distributions across different populations and ancestral groups
  • Common SVs (>5% frequency) are often shared across populations, while rare SVs (<1% frequency) may be population-specific or private to certain individuals or families
  • The distribution of SVs across the genome is non-random, with hotspots of increased SV activity often associated with repetitive or complex regions (segmental duplications)
  • Population-scale sequencing studies, such as the 1000 Genomes Project and the Genome Aggregation Database (gnomAD), have provided valuable insights into the landscape of SVs in diverse human populations
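SV allele frequency follows directly from genotype counts. Each individual carries two alleles, so the frequency is (heterozygotes + 2 × homozygous-alternate) divided by twice the number of individuals. The sketch applies the frequency cut-offs used above. The "low-frequency" label for the 1 to 5% band is an added convenience, not a term from the text, and the genotype counts are hypothetical.

```python
def allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """SV allele frequency from diploid genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    return (n_het + 2 * n_hom_alt) / (2 * n)


def frequency_class(af: float) -> str:
    if af > 0.05:
        return "common"
    if af < 0.01:
        return "rare"
    return "low-frequency"  # 1 to 5% band; label is an assumption


for counts in [(900, 90, 10), (995, 5, 0), (960, 40, 0)]:
    af = allele_frequency(*counts)
    print(counts, round(af, 4), frequency_class(af))
# (900, 90, 10) 0.055 common
# (995, 5, 0) 0.0025 rare
# (960, 40, 0) 0.02 low-frequency
```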

Ancestry and population-specific SVs

  • Different ancestral groups and populations harbor unique sets of SVs that have arisen through distinct evolutionary histories and demographic events
  • Population-specific SVs can contribute to phenotypic diversity and adaptation to local environments (increased copy number of the salivary amylase gene AMY1 in populations with high-starch diets)
  • Ancestry-informative SVs can be used as markers for population structure and admixture analysis
  • Understanding the distribution of SVs across populations is important for interpreting their potential impact on disease risk and for designing population-specific screening or diagnostic strategies

Selection and evolutionary dynamics

  • Structural variations can be subject to various forms of natural selection, depending on their functional consequences and fitness effects
  • Positive selection can favor the spread of adaptive SVs that confer benefits in specific environments or contexts (the common inversion at 17q21.31, whose H2 haplotype has been reported to be associated with increased fertility and to show signatures of selection in Europeans)
  • Negative selection can eliminate deleterious SVs that disrupt essential genes or regulatory elements, leading to a depletion of such variants in the population
  • Balancing selection can maintain multiple SV alleles in the population if they confer heterozygote advantage or are involved in frequency-dependent selection
  • The evolutionary dynamics of SVs can be influenced by factors such as mutation rates, recombination hotspots, and demographic history (population bottlenecks, expansions, and migrations)

SVs in genome-wide association studies

  • Genome-wide association studies (GWAS) aim to identify genetic variants that are associated with complex traits or diseases by comparing allele frequencies between cases and controls
  • While GWAS have primarily focused on single nucleotide polymorphisms (SNPs), recent studies have begun to incorporate structural variations as potential contributors to phenotypic variation
  • SVs can be directly associated with complex traits or can modify the effects of other genetic variants through epistatic interactions or by altering gene regulation
  • Incorporating SVs into GWAS is challenging because many SVs are rare, they are poorly represented on genotyping arrays, and complex SVs are difficult to genotype and phase accurately
  • Integrating SVs into GWAS can provide a more comprehensive understanding of the genetic architecture of complex traits and may help identify novel loci or mechanisms underlying disease susceptibility
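Comparing allele frequencies between cases and controls, as described above, reduces in its simplest form to a 2×2 table of allele counts. That table gives an allelic odds ratio and a 1-degree-of-freedom chi-square test, whose p-value equals erfc(√(χ²/2)). The counts below are hypothetical (1,000 cases and 1,000 controls, so 2,000 alleles each). Real GWAS pipelines usually fit regression models with covariates, such as ancestry principal components, rather than this bare test. Zero cells would make the odds ratio undefined here.

```python
from math import erfc, sqrt
from typing import Tuple


def allelic_association(case_alt: int, case_ref: int,
                        control_alt: int, control_ref: int) -> Tuple[float, float, float]:
    """Allelic odds ratio and 1-df chi-square test on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return odds_ratio, chi2, p_value


odds_ratio, chi2, p = allelic_association(300, 1700, 200, 1800)
print(round(odds_ratio, 3), round(chi2, 2))  # 1.588 22.86
print(p < 1e-5)                              # True
```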

Challenges and future directions

Complex and nested SVs

  • Complex structural variations involve multiple breakpoints or rearrangements that cannot be easily classified into simple categories (deletions, duplications, inversions, or translocations)
  • Nested SVs refer to the presence of multiple overlapping or interacting SVs within the same genomic region, which can complicate their detection and interpretation
  • Resolving complex and nested SVs requires advanced sequencing technologies (long-read, linked-read) and specialized computational methods that can handle the increased complexity of the data
  • Characterizing the functional impact of complex and nested SVs may require integrative approaches that combine genomic, epigenomic, and transcriptomic data to unravel their effects on gene regulation and expression

Long-read sequencing for SV detection

  • Long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), generate reads that can span tens to hundreds of kilobases
  • Long reads are particularly advantageous for detecting and resolving complex or repetitive SVs that are difficult to characterize with short-read sequencing
  • Long-read sequencing can provide more accurate breakpoint resolution and can identify novel or rare SVs that are missed by short-read approaches
  • Challenges in long-read sequencing include higher per-read error rates than short reads (a gap that recent chemistries, such as PacBio HiFi reads, have greatly narrowed), lower throughput, and increased computational complexity for data analysis and interpretation
  • Advances in long-read sequencing, such as improved accuracy, throughput, and cost-effectiveness, will facilitate their broader adoption for SV detection
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

