Structural variations are genomic alterations that can significantly impact gene function and expression. From single nucleotide changes to large-scale rearrangements, these variations play crucial roles in genetic diversity, evolution, and disease susceptibility.
Understanding structural variations is essential for comprehending genome dynamics and their biological consequences. This topic explores different types of structural variations, mechanisms of formation, detection methods, and their impacts on gene function and phenotypes.
Types of structural variations
Single nucleotide variations
Single nucleotide variations (SNVs) are changes in individual bases of DNA, including substitutions, insertions, and deletions of a single nucleotide
SNVs can be classified as transitions (purine-to-purine or pyrimidine-to-pyrimidine changes) or transversions (purine-to-pyrimidine or pyrimidine-to-purine changes)
SNVs can have various effects on gene function, such as altering protein structure, affecting splicing sites, or modifying regulatory elements
Examples of SNVs include single nucleotide polymorphisms (SNPs) and point mutations associated with genetic diseases (sickle cell anemia)
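The transition/transversion rule above can be sketched in a few lines. This is a minimal illustration, not a production classifier; the function name `classify_snv` is hypothetical and only the four standard DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # A transition stays within one chemical class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_snv("A", "G"))  # purine to purine
print(classify_snv("A", "T"))  # purine to pyrimidine
```

The sickle cell mutation, an A-to-T change in the beta-globin gene, is a transversion under this rule.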
Insertions and deletions
Insertions involve the addition of one or more nucleotides into a DNA sequence, while deletions involve the removal of one or more nucleotides
Small insertions and deletions (indels) are typically less than 50 base pairs in length and can cause frameshift mutations if they occur within coding regions and their length is not a multiple of three
Larger insertions and deletions can involve entire genes or chromosomal segments and may have more profound effects on genome structure and function
Indels can be caused by replication slippage, DNA repair errors, or the activity of transposable elements (Alu elements)
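Whether a coding-region indel shifts the reading frame depends on its net length change: a change that is not a multiple of three alters every downstream codon. The sketch below assumes the variant lies entirely within coding sequence and is given as reference and alternate allele strings; `indel_effect` is a hypothetical helper name.

```python
def indel_effect(ref: str, alt: str) -> str:
    """Describe an indel by type and by its effect on the reading frame."""
    delta = len(alt) - len(ref)  # net number of bases gained (+) or lost (-)
    if delta == 0:
        return "not an indel"
    kind = "insertion" if delta > 0 else "deletion"
    # Codons are three bases long, so only multiples of three keep the frame.
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


print(indel_effect("A", "ATG"))   # two bases inserted
print(indel_effect("ACTT", "A"))  # three bases deleted
```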
Copy number variations
Copy number variations (CNVs) are changes in the number of copies of a particular DNA segment, ranging from a few hundred base pairs to entire genes or chromosomal regions
CNVs can be classified as deletions (fewer copies than the reference genome) or duplications (more copies than the reference genome)
CNVs can affect gene dosage and expression levels, leading to phenotypic consequences and disease susceptibility (22q11.2 deletion syndrome)
CNVs can be detected using various methods, such as array comparative genomic hybridization (aCGH) and next-generation sequencing (NGS) approaches
Duplications and deletions
Duplications involve the replication of a DNA segment, resulting in an increased number of copies within the genome
Deletions involve the loss of a DNA segment, leading to a reduced number of copies or complete absence of the segment
Duplications and deletions can range in size from a few base pairs to entire genes or chromosomal regions (1p36 deletion syndrome)
Duplications can lead to increased gene dosage and potential gain-of-function effects, while deletions can result in reduced gene dosage and loss-of-function effects
Duplications and deletions can be mediated by various mechanisms, such as non-allelic homologous recombination (NAHR) and non-homologous end joining (NHEJ)
Inversions and translocations
Inversions involve the reversal of a DNA segment within a chromosome, resulting in a change in orientation but no net gain or loss of genetic material
Translocations involve the exchange of genetic material between non-homologous chromosomes or different regions of the same chromosome
Inversions can disrupt gene function or create fusion genes if the breakpoints occur within coding regions (inversion of chromosome 16 in acute myeloid leukemia)
Translocations can create novel fusion genes with altered functions or place genes under the control of different regulatory elements (Philadelphia chromosome in chronic myeloid leukemia)
Inversions and translocations can be balanced (no net gain or loss of genetic material) or unbalanced (associated with duplications or deletions)
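To make the SV types above concrete, the following toy sketch applies a deletion, a tandem duplication, and an inversion to a short sequence string. Coordinates are 0-based and half-open, the duplication is assumed to be tandem, and all function names are illustrative only. Note that an inversion is written as the reverse complement of the segment, because the segment is read from the opposite strand after flipping.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def apply_deletion(seq: str, start: int, end: int) -> str:
    """Remove seq[start:end]; genetic material is lost."""
    return seq[:start] + seq[end:]


def apply_tandem_duplication(seq: str, start: int, end: int) -> str:
    """Insert a second copy of seq[start:end] directly after the first."""
    return seq[:end] + seq[start:end] + seq[end:]


def apply_inversion(seq: str, start: int, end: int) -> str:
    """Flip seq[start:end] in place; length is unchanged (balanced)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


reference = "AACAGTT"
print(apply_deletion(reference, 2, 5))
print(apply_tandem_duplication(reference, 2, 5))
print(apply_inversion(reference, 2, 5))
```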
Mechanisms of structural variations
Recombination and repair
Recombination is a process that generates new combinations of alleles through the exchange of genetic material between homologous chromosomes during meiosis
Non-allelic homologous recombination (NAHR) can occur between highly similar DNA sequences (segmental duplications) and lead to deletions, duplications, or inversions
Non-homologous end joining (NHEJ) is a repair mechanism that joins broken DNA ends without requiring extensive sequence homology, potentially resulting in small insertions, deletions, or translocations
Microhomology-mediated end joining (MMEJ) is another repair mechanism that uses short homologous sequences to align and join broken DNA ends, which can lead to indels or complex rearrangements
Replication errors and slippage
Replication errors can occur during DNA synthesis, leading to the incorporation of incorrect nucleotides or the formation of secondary structures that cause insertions or deletions
Replication slippage is a common mechanism for generating small indels, particularly in regions with repetitive sequences (microsatellites)
Slippage occurs when the replication machinery dissociates and then re-associates with the template strand, resulting in the formation of loops or hairpins that are subsequently incorporated or excised
Replication fork stalling and collapse can also contribute to the formation of structural variations, particularly in regions with complex or repetitive sequences
Transposable elements and repeats
Transposable elements (TEs) are mobile genetic elements that can move and insert themselves into different locations within the genome
TEs can be classified as DNA transposons (move via a cut-and-paste mechanism) or retrotransposons (move via an RNA intermediate and reverse transcription)
Alu elements are the most abundant short interspersed nuclear elements (SINEs) in the human genome and can mediate deletions, duplications, or inversions through NAHR
Long interspersed nuclear elements (LINEs) are autonomous retrotransposons that can cause insertional mutagenesis and contribute to genome instability
Repeat sequences, such as segmental duplications and tandem repeats, can also facilitate the formation of structural variations through recombination or replication-based mechanisms
Detection of structural variations
Sequencing technologies for SVs
Next-generation sequencing (NGS) technologies have revolutionized the detection of structural variations by providing high-throughput and high-resolution data
Short-read sequencing (Illumina) is widely used for SV detection and relies on the mapping of reads to a reference genome to identify discordant or anomalous alignment patterns
Long-read sequencing (PacBio, Oxford Nanopore) generates reads that span larger genomic regions, enabling the detection of complex or repetitive SVs that are difficult to resolve with short reads
Linked-read sequencing (10x Genomics) incorporates barcodes to link reads originating from the same DNA molecule, facilitating phasing and the detection of large-scale SVs
Read depth and coverage analysis
Read depth analysis involves comparing the observed number of reads mapped to a genomic region to the expected number based on the average sequencing depth
Deletions are characterized by a decrease in read depth, while duplications show an increase in read depth relative to the reference genome
Coverage analysis can also identify copy number variations by segmenting the genome into regions with consistent read depth and comparing them to a reference or control sample
Challenges in read depth analysis include GC content biases, mappability issues, and the presence of repetitive or low-complexity regions
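The read depth comparison described above can be sketched as a ratio of observed to expected depth, scaled by ploidy. This is a simplified illustration that assumes a diploid genome, depth values that have already been corrected for GC content and mappability, and a hypothetical function name; real tools segment the genome and model noise statistically.

```python
import math


def copy_number_from_depth(observed: float, expected: float, ploidy: int = 2):
    """Estimate copy number for one region from its read depth.

    Returns (log2 ratio, rounded copy number, call).
    """
    if expected <= 0:
        raise ValueError("expected depth must be positive")
    ratio = observed / expected
    log2_ratio = math.log2(ratio) if ratio > 0 else float("-inf")
    copy_number = round(ratio * ploidy)
    if copy_number < ploidy:
        call = "deletion"
    elif copy_number > ploidy:
        call = "duplication"
    else:
        call = "neutral"
    return log2_ratio, copy_number, call


# Half the expected depth suggests loss of one of two copies.
print(copy_number_from_depth(15, 30))
# One and a half times the expected depth suggests one extra copy.
print(copy_number_from_depth(45, 30))
```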
Split reads and discordant pairs
Split reads are reads that span the breakpoints of an SV, with parts of the read mapping to different genomic locations
Discordant read pairs are paired-end reads that map to the reference genome with an unexpected orientation or distance between the reads
Split reads and discordant pairs can be used to identify the precise breakpoints and types of SVs, such as deletions, insertions, inversions, and translocations
Challenges in split read and discordant pair analysis include the presence of repetitive sequences, mapping ambiguities, and the limited sensitivity for detecting small or complex SVs
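How a discordant pair hints at an SV type can be sketched with a few rules on chromosome, strand, and distance. The example assumes a standard paired-end library in which concordant mates face each other (forward mate on the left, reverse mate on the right); the `ReadPair` fields, the insert-size bounds, and the use of leftmost mapping positions as a proxy for insert size are all simplifying assumptions, and a single pair is only a hint, since callers require clusters of supporting pairs.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair: ReadPair, min_insert: int = 200, max_insert: int = 600) -> str:
    """Suggest the SV type that a single read pair is consistent with."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    if pair.strand1 == pair.strand2:
        return "inversion"  # mates point the same way
    # Order the mates by position to inspect their relative orientation.
    (left_pos, left_strand), (right_pos, _) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == "-":
        return "tandem duplication"  # mates point away from each other
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion"  # mates map farther apart than the library allows
    if span < min_insert:
        return "insertion"  # mates map closer together than expected
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-")))
```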
De novo assembly vs reference-based
De novo assembly involves reconstructing the genome sequence from the sequencing reads without relying on a reference genome
De novo assembly can identify novel or population-specific SVs that are not present in the reference genome and can resolve complex or repetitive regions
Reference-based approaches map the sequencing reads to a reference genome and identify SVs based on discrepancies in the alignment patterns
Reference-based methods are computationally more efficient and can leverage existing annotations and resources, but may miss or misinterpret SVs in regions that differ significantly from the reference
Hybrid approaches that combine de novo assembly and reference-based mapping can provide a more comprehensive and accurate detection of SVs
Biological impact of structural variations
Gene dosage and expression changes
Structural variations can alter gene dosage by changing the number of copies of a gene or by disrupting its coding sequence
Deletions can lead to a reduction in gene dosage and potentially cause haploinsufficiency, where a single functional copy of the gene is insufficient to maintain normal function
Duplications can increase gene dosage and potentially lead to overexpression or gain-of-function effects
Gene dosage changes can have significant impacts on cellular processes, developmental pathways, and disease susceptibility (Charcot-Marie-Tooth disease type 1A)
Fusion genes and novel transcripts
Structural variations, particularly translocations and inversions, can create fusion genes by joining parts of two different genes
Fusion genes can produce novel transcripts with altered functions, such as constitutive activation of signaling pathways or disruption of normal regulatory mechanisms
Fusion genes are commonly associated with various cancers and can serve as diagnostic markers or therapeutic targets (BCR-ABL1 fusion in chronic myeloid leukemia)
Structural variations can also create novel transcripts by placing genes under the control of different regulatory elements or by altering splicing patterns
Regulatory elements and TADs
Structural variations can affect the integrity and function of regulatory elements, such as promoters, enhancers, and insulators
Deletions or duplications of regulatory elements can lead to changes in gene expression patterns and potentially contribute to disease pathogenesis
Structural variations can also disrupt topologically associating domains (TADs), which are higher-order chromatin structures that facilitate interactions between genes and their regulatory elements
Disruption of TADs can lead to ectopic interactions between genes and regulatory elements, resulting in aberrant gene expression and developmental disorders (limb malformations associated with SVs in the WNT6/IHH/EPHA4/PAX3 locus)
Phenotypic effects and disease associations
Structural variations can have a wide range of phenotypic effects, depending on the genes and regulatory elements involved and the nature of the genomic alteration
SVs can contribute to various diseases, including neurodevelopmental disorders (autism spectrum disorder), cardiovascular diseases (familial hypercholesterolemia), and cancer predisposition syndromes (BRCA1/2 deletions in hereditary breast and ovarian cancer)
SVs can also influence complex traits and common diseases by modifying gene expression, altering protein function, or interacting with other genetic and environmental factors
The phenotypic impact of SVs can be modulated by factors such as the size and location of the variant, the genetic background, and the tissue-specific expression patterns of the affected genes
Computational methods for SV analysis
SV callers and algorithms
SV callers are computational tools designed to identify and characterize structural variations from sequencing data
Different SV callers employ various algorithms and strategies to detect SVs, such as read depth analysis, split read mapping, discordant read pair analysis, and de novo assembly
Popular SV callers each have their own strengths and limitations in terms of sensitivity, specificity, and computational efficiency
Ensemble approaches that integrate multiple SV callers and evidence types can improve the accuracy and reliability of SV detection
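One common way to decide whether two callers report the same SV is reciprocal overlap: the shared interval must cover at least a chosen fraction of both calls. The sketch below assumes calls are `(start, end)` tuples on the same chromosome and of the same SV type; the 50% threshold and the function names are illustrative choices, not a fixed standard.

```python
def reciprocal_overlap(a, b) -> float:
    """Return the smaller of the two overlap fractions between intervals a and b."""
    a_start, a_end = a
    b_start, b_end = b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def consensus_calls(caller_a, caller_b, min_ro: float = 0.5):
    """Keep calls from caller_a that are supported by some call in caller_b."""
    return [
        a for a in caller_a
        if any(reciprocal_overlap(a, b) >= min_ro for b in caller_b)
    ]


calls_a = [(100, 200), (1000, 2000)]
calls_b = [(110, 210), (5000, 6000)]
print(consensus_calls(calls_a, calls_b))
```

Requiring the overlap to be reciprocal prevents a very large call from "confirming" a small one that merely sits inside it.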
Filtering and quality control
Filtering and quality control steps are essential to minimize false positives and prioritize high-confidence SVs for further analysis
Filtering criteria can include read depth, variant allele frequency, number of supporting reads, mapping quality, and presence in databases of known variants
Quality control measures, such as visual inspection of read alignments and PCR validation of selected SVs, can help assess the accuracy of SV calls
Establishing appropriate thresholds and benchmarking SV callers using simulated or well-characterized datasets can guide the optimization of filtering and quality control strategies
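A threshold-based filter of the kind described above might look like the following sketch. The dictionary keys (`support`, `mapq`, `vaf`) and the default thresholds are hypothetical; appropriate values depend on sequencing depth, the caller, and benchmarking results.

```python
def passes_filters(call: dict, min_support: int = 5,
                   min_mapq: int = 20, min_vaf: float = 0.2) -> bool:
    """Return True if an SV call meets every quality threshold."""
    return (
        call["support"] >= min_support   # number of supporting reads
        and call["mapq"] >= min_mapq     # mean mapping quality of those reads
        and call["vaf"] >= min_vaf       # variant allele frequency
    )


def filter_calls(calls, **thresholds):
    """Keep only the calls that pass all filters."""
    return [call for call in calls if passes_filters(call, **thresholds)]


calls = [
    {"id": "sv1", "support": 12, "mapq": 55, "vaf": 0.48},
    {"id": "sv2", "support": 2, "mapq": 60, "vaf": 0.05},
    {"id": "sv3", "support": 9, "mapq": 8, "vaf": 0.40},
]
print([call["id"] for call in filter_calls(calls)])
```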
Genotyping and phasing of SVs
Genotyping involves determining the presence and zygosity of SVs in individual samples or populations
Phasing refers to the assignment of SVs to specific haplotypes or alleles, which is important for understanding their inheritance patterns and potential interactions
Genotyping and phasing of SVs can be challenging due to their size, complexity, and the presence of repetitive or low-complexity regions
Specialized algorithms and tools have been developed to facilitate the genotyping and phasing of SVs from sequencing data
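Zygosity can be roughly inferred from the fraction of reads at a breakpoint that support the variant allele. The sketch below uses fixed cut-offs (0.2 and 0.8) purely as an illustrative assumption; dedicated genotypers typically use likelihood models instead, and the function name is hypothetical.

```python
def genotype_from_reads(ref_reads: int, alt_reads: int,
                        het_min: float = 0.2, hom_min: float = 0.8) -> str:
    """Return a VCF-style genotype from reference and variant read counts."""
    total = ref_reads + alt_reads
    if total == 0:
        return "./."  # no coverage, genotype unknown
    alt_fraction = alt_reads / total
    if alt_fraction < het_min:
        return "0/0"  # homozygous reference
    if alt_fraction < hom_min:
        return "0/1"  # heterozygous
    return "1/1"      # homozygous variant


print(genotype_from_reads(10, 10))
print(genotype_from_reads(1, 19))
```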
Annotation and functional interpretation
Annotation of SVs involves characterizing their genomic context, such as the genes, regulatory elements, and functional domains affected by the variant
Functional interpretation aims to predict the potential impact of SVs on gene function, expression, and biological processes
Annotation tools can integrate information from various databases (RefSeq, Ensembl, ClinVar) to provide comprehensive annotations for SVs
Functional interpretation can leverage data from expression studies, epigenetic profiles, and model organisms to infer the potential consequences of SVs
Network and pathway analysis can help identify the broader biological processes and systems that may be perturbed by SVs
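At its simplest, annotating an SV means finding which known features its interval overlaps. The sketch below uses made-up gene names and coordinates (`GENE_A`, `GENE_B`, `GENE_C`) and half-open intervals; real annotation tools draw on curated databases and report far more detail, such as which exons or regulatory elements are hit.

```python
def annotate_sv(sv, genes):
    """Return the names of genes whose interval overlaps the SV.

    sv is (chrom, start, end); genes maps name -> (chrom, start, end).
    """
    chrom, start, end = sv
    return [
        name
        for name, (g_chrom, g_start, g_end) in genes.items()
        # Two half-open intervals overlap when each starts before the other ends.
        if g_chrom == chrom and g_start < end and start < g_end
    ]


# Hypothetical gene coordinates for illustration only.
toy_genes = {
    "GENE_A": ("chr1", 1000, 5000),
    "GENE_B": ("chr1", 8000, 12000),
    "GENE_C": ("chr2", 1000, 5000),
}
print(annotate_sv(("chr1", 4000, 9000), toy_genes))
```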
Population genetics of structural variations
SV frequencies and distributions
Structural variations exhibit varying frequencies and distributions across different populations and ancestral groups
Common SVs (>5% frequency) are often shared across populations, while rare SVs (<1% frequency) may be population-specific or private to certain individuals or families
The distribution of SVs across the genome is non-random, with hotspots of increased SV activity often associated with repetitive or complex regions (segmental duplications)
Population-scale sequencing studies, such as the 1000 Genomes Project and the Genome Aggregation Database (gnomAD), have provided valuable insights into the landscape of SVs in diverse human populations
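The frequency classes above follow directly from allele counts. The sketch below assumes diploid samples and uses the thresholds stated in this section (common above 5%, rare below 1%); the label "low-frequency" for the range in between is an added convention, not one taken from the text.

```python
def allele_frequency(alt_allele_count: int, n_samples: int) -> float:
    """Frequency of the variant allele among 2 * n_samples chromosomes."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return alt_allele_count / (2 * n_samples)


def frequency_class(freq: float) -> str:
    """Bin an allele frequency into common, low-frequency, or rare."""
    if freq > 0.05:
        return "common"
    if freq < 0.01:
        return "rare"
    return "low-frequency"


# 30 variant alleles observed among 100 diploid individuals.
freq = allele_frequency(30, 100)
print(freq, frequency_class(freq))
```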
Ancestry and population-specific SVs
Different ancestral groups and populations harbor unique sets of SVs that have arisen through distinct evolutionary histories and demographic events
Population-specific SVs can contribute to phenotypic diversity and adaptation to local environments (higher copy number of the salivary amylase gene AMY1 in populations with high-starch diets)
Ancestry-informative SVs can be used as markers for population structure and admixture analysis
Understanding the distribution of SVs across populations is important for interpreting their potential impact on disease risk and for designing population-specific screening or diagnostic strategies
Selection and evolutionary dynamics
Structural variations can be subject to various forms of natural selection, depending on their functional consequences and fitness effects
Positive selection can favor the spread of adaptive SVs that confer benefits in specific environments or contexts (the APOBEC3B deletion has been investigated for effects on susceptibility to HIV infection, though reported associations are inconsistent)
Negative selection can eliminate deleterious SVs that disrupt essential genes or regulatory elements, leading to a depletion of such variants in the population
Balancing selection can maintain multiple SV alleles in the population if they confer heterozygote advantage or are involved in frequency-dependent selection
The evolutionary dynamics of SVs can be influenced by factors such as mutation rates, recombination hotspots, and demographic history (population bottlenecks, expansions, and migrations)
SVs in genome-wide association studies
Genome-wide association studies (GWAS) aim to identify genetic variants that are associated with complex traits or diseases by comparing allele frequencies between cases and controls
While GWAS have primarily focused on single nucleotide polymorphisms (SNPs), recent studies have begun to incorporate structural variations as potential contributors to phenotypic variation
SVs can be directly associated with complex traits or can modify the effects of other genetic variants through epistatic interactions or by altering gene regulation
Challenges in including SVs in GWAS include their lower frequencies, limited representation on genotyping arrays, and the difficulty in accurately genotyping and phasing complex SVs
Integrating SVs into GWAS can provide a more comprehensive understanding of the genetic architecture of complex traits and may help identify novel loci or mechanisms underlying disease susceptibility
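The case-control comparison of allele frequencies can be illustrated with a basic allelic test on a 2x2 table of allele counts. This sketch computes an odds ratio and a Pearson chi-square statistic (1 degree of freedom, no continuity correction); the counts are invented, the function name is hypothetical, and real GWAS analyses additionally adjust for covariates, population structure, and multiple testing.

```python
def allelic_association(case_alt: int, case_ref: int,
                        control_alt: int, control_ref: int):
    """Return (odds ratio, chi-square statistic) for a 2x2 allele count table."""
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0 or b * c == 0:
        raise ValueError("table too sparse for this simple calculation")
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / margins
    return odds_ratio, chi_square


# Invented counts: the SV allele is seen on 30 of 100 case chromosomes
# and on 15 of 100 control chromosomes.
odds_ratio, chi_square = allelic_association(30, 70, 15, 85)
print(round(odds_ratio, 2), round(chi_square, 2))
```

An odds ratio above 1 indicates the SV allele is more frequent in cases; the chi-square statistic is then compared against a genome-wide significance threshold rather than the usual 0.05 level.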
Challenges and future directions
Complex and nested SVs
Complex structural variations involve multiple breakpoints or rearrangements that cannot be easily classified into simple categories (deletions, duplications, inversions, or translocations)
Nested SVs refer to the presence of multiple overlapping or interacting SVs within the same genomic region, which can complicate their detection and interpretation
Resolving complex and nested SVs requires advanced sequencing technologies (long-read, linked-read) and specialized computational methods that can handle the increased complexity of the data
Characterizing the functional impact of complex and nested SVs may require integrative approaches that combine genomic, epigenomic, and transcriptomic data to unravel their effects on gene regulation and expression
Long-read sequencing for SV detection
Long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), generate reads that can span tens to hundreds of kilobases
Long reads are particularly advantageous for detecting and resolving complex or repetitive SVs that are difficult to characterize with short-read sequencing
Long-read sequencing can provide more accurate breakpoint resolution and can identify novel or rare SVs that are missed by short-read approaches
Challenges in long-read sequencing include higher error rates, lower throughput, and increased computational complexity for data analysis and interpretation
Advances in long-read sequencing, such as improved accuracy, throughput, and cost-effectiveness, will facilitate their broader adoption for S