Genes are the fundamental units of heredity, encoding instructions for functional products like proteins and RNAs. Their structure and organization play a crucial role in regulating expression and determining an organism's characteristics.
Gene components include promoters, exons, introns, and UTRs. Gene architecture varies, with monocistronic and polycistronic arrangements. Regulatory elements, chromatin structure, and epigenetic modifications further influence gene expression and genome organization.
Components of a gene
Genes are the fundamental units of heredity that encode instructions for the synthesis of functional gene products (proteins or RNAs)
The structure and organization of genes play a crucial role in regulating gene expression and ultimately determining the phenotypic characteristics of an organism
Promoter region
Plays a key role in initiating and regulating gene transcription
Examples:
TATA box (core element)
GC box (binding site for Sp1 transcription factor)
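As a rough illustration of how a core promoter element might be located computationally, the sketch below scans a DNA string for a simplified TATA box pattern. The pattern TATA[AT]A[AT], the function name find_tata_boxes, and the example sequence are illustrative assumptions; real promoters are more variable, and many lack a TATA box entirely.

```python
import re

# Simplified TATA box consensus (an assumption for illustration only).
TATA_PATTERN = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(sequence):
    """Return 0-based start positions of TATA-like motifs in a DNA string."""
    return [match.start() for match in TATA_PATTERN.finditer(sequence.upper())]

promoter = "GGCGCCTATAAAAGGCTGACC"  # hypothetical promoter fragment
print(find_tata_boxes(promoter))  # [6]
```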
Transcription start site
The specific nucleotide position where RNA polymerase begins transcribing the gene
Marks the 5' end of the nascent RNA transcript
Often located within or near the promoter region
Examples:
+1 position (first transcribed nucleotide)
Initiator element (Inr)
Exons and introns
Exons are coding sequences that are retained in the mature mRNA after splicing
Introns are non-coding sequences that are removed from the pre-mRNA during splicing
The arrangement of exons and introns varies among genes and can contribute to protein diversity through alternative splicing
Examples:
Constitutive exons (always included in the mature mRNA)
Cassette exons (alternatively spliced)
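The difference between constitutive and cassette exons can be sketched with a toy splicing function. The sequences, coordinates, and the function name splice below are hypothetical; the sketch only joins exon segments and does not model splice-site recognition.

```python
def splice(pre_mrna, exon_coords):
    """Join exon segments given as 0-based, half-open (start, end) coordinates."""
    return "".join(pre_mrna[start:end] for start, end in exon_coords)

# Hypothetical pre-mRNA: three exons separated by two introns (GU...AG).
exon1, intron1 = "AUGGCU", "GUAAGUUUCAG"
exon2, intron2 = "GAUUUC", "GUGAGUCCCAG"
exon3 = "AAAUAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Exons 1 and 3 are treated as constitutive; exon 2 is the cassette exon.
with_cassette = splice(pre_mrna, [(0, 6), (17, 23), (34, 40)])
without_cassette = splice(pre_mrna, [(0, 6), (34, 40)])

print(with_cassette)     # AUGGCUGAUUUCAAAUAA
print(without_cassette)  # AUGGCUAAAUAA
```

Both mature mRNAs come from the same pre-mRNA, which is how alternative splicing can yield more than one protein isoform from a single gene.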
UTRs
Untranslated regions located at the 5' and 3' ends of the mature mRNA
Not translated into protein but play important roles in mRNA stability, localization, and translation efficiency
5' UTR may contain regulatory elements that influence translation initiation
3' UTR often contains binding sites for microRNAs and RNA-binding proteins
Examples:
Iron-responsive element (IRE) in the 5' UTR of ferritin mRNA
AU-rich elements (AREs) in the 3' UTR of many unstable mRNAs
Transcription termination site
The position where RNA polymerase stops transcribing and dissociates from the DNA template
Typically located downstream of the coding sequence and 3' UTR
Transcription termination is mediated by specific sequences and protein factors
Examples:
Polyadenylation signal (AAUAAA) for RNA polymerase II transcripts
Rho-dependent termination sites for bacterial genes
Gene architecture
The arrangement and organization of genes within a genome can vary significantly among different organisms and gene families
Gene architecture influences gene expression, regulation, and evolution
Monocistronic vs polycistronic
Monocistronic genes contain a single open reading frame (ORF) and produce a single protein product
Polycistronic genes contain multiple ORFs and produce multiple protein products from a single mRNA transcript
Monocistronic gene organization is common in eukaryotes, while polycistronic genes are more prevalent in prokaryotes and some eukaryotic organelles (mitochondria and chloroplasts)
Examples:
Lac operon in E. coli (polycistronic)
Globin genes in humans (monocistronic)
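The monocistronic/polycistronic distinction can be sketched by counting open reading frames on an mRNA. The scanner below is a deliberately naive assumption: it reads left to right, starts at each AUG, stops at the first in-frame stop codon, and then resumes after that stop. It ignores ribosome-binding sites and other signals that real translation depends on, and the sequences are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(mrna):
    """Return non-overlapping (start, end) half-open ORF coordinates on an mRNA."""
    orfs = []
    i = 0
    while i <= len(mrna) - 3:
        if mrna[i:i + 3] != "AUG":
            i += 1
            continue
        end = None
        # Walk codon by codon from the start codon to the first in-frame stop.
        for j in range(i + 3, len(mrna) - 2, 3):
            if mrna[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        if end is None:
            i += 1  # no in-frame stop; keep scanning
        else:
            orfs.append((i, end))
            i = end  # resume after this ORF
    return orfs

monocistronic = "GGAUGGCUUAAGG"                       # hypothetical
polycistronic = "AUGGCUUAA" + "GGAGG" + "AUGAAAUGA"   # hypothetical

print(find_orfs(monocistronic))  # [(2, 11)]
print(find_orfs(polycistronic))  # [(0, 9), (14, 23)]
```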
Overlapping genes
Genes whose coding sequences partially or completely overlap on the same or opposite DNA strands
Overlapping genes can be in the same or different reading frames and may have functional relationships or regulatory interactions
Examples:
Collagen genes COL4A1 and COL4A2 in humans (overlapping on opposite strands)
Genes in compact genomes of viruses and bacteria
Nested genes
Genes that are entirely contained within the introns of another gene
Nested genes may have independent functions or may regulate the expression of the host gene
Examples:
Intronic microRNA genes (miR-126 within an intron of EGFL7)
Intronic snoRNA genes (box C/D snoRNAs encoded within the introns of GAS5)
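Given genomic coordinates, overlapping and nested arrangements reduce to simple interval tests. The sketch below assumes 0-based, half-open (start, end) coordinates on a single chromosome and ignores strand; the coordinates and function names are hypothetical.

```python
def overlaps(gene_a, gene_b):
    """True if two (start, end) half-open intervals share at least one base."""
    return gene_a[0] < gene_b[1] and gene_b[0] < gene_a[1]

def is_nested(inner_gene, host_introns):
    """True if the inner gene lies entirely within one intron of the host gene."""
    return any(start <= inner_gene[0] and inner_gene[1] <= end
               for start, end in host_introns)

host_introns = [(200, 400), (600, 800)]  # hypothetical intron coordinates

print(overlaps((100, 500), (400, 900)))     # True
print(overlaps((100, 500), (500, 900)))     # False (adjacent, no shared base)
print(is_nested((250, 320), host_introns))  # True
print(is_nested((350, 450), host_introns))  # False (extends into an exon)
```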
Bidirectional promoters
Promoters that regulate the expression of two genes located on opposite strands of DNA
Bidirectional promoters often have shared regulatory elements and may facilitate co-expression or coordinated regulation of gene pairs
Examples:
BRCA1 and NBR2 genes in humans
Histone gene pairs (H2A-H2B and H3-H4)
Regulatory elements
Non-coding DNA sequences that control gene expression by modulating transcription, RNA processing, or chromatin structure
Regulatory elements play critical roles in development, cell differentiation, and responses to environmental stimuli
Enhancers and silencers
Enhancers are distal regulatory elements that increase gene transcription by recruiting transcription factors and promoting chromatin accessibility
Silencers are distal regulatory elements that decrease gene transcription by recruiting repressive factors or promoting chromatin condensation
Enhancers and silencers can act independently of their orientation and distance from the target gene
Examples:
Locus control region (LCR) of the β-globin gene cluster
Silencer elements in the CD4 gene
Insulators
DNA sequences that function as boundaries to prevent inappropriate interactions between neighboring chromatin domains
Insulators can block the spread of heterochromatin or prevent enhancer-promoter communication when located between them
Examples:
cHS4 insulator in the chicken β-globin locus
CTCF-binding sites in mammalian genomes
Locus control regions
Distal regulatory elements that coordinate the expression of multiple genes within a chromatin domain
LCRs contain multiple enhancers and insulators and can regulate gene expression over long distances
Examples:
Human growth hormone (hGH) LCR
T-cell receptor α (TCRα) LCR
Chromatin structure
The organization of DNA and associated proteins (histones) into chromatin influences gene expression, DNA replication, and repair
Chromatin structure is dynamic and regulated by various epigenetic modifications
Euchromatin vs heterochromatin
Euchromatin is a loosely packed, transcriptionally active form of chromatin
Heterochromatin is a tightly packed, transcriptionally repressive form of chromatin
The balance between euchromatin and heterochromatin is important for maintaining proper gene expression patterns and genome stability
Examples:
Barr body (inactive X chromosome) in female mammals
Centromeric and telomeric regions of chromosomes
Histone modifications
Post-translational modifications of histones (acetylation, methylation, phosphorylation, etc.) alter chromatin structure and regulate gene expression
Specific histone modifications are associated with active or repressive chromatin states
Examples:
H3K4me3 (trimethylation of histone H3 lysine 4) associated with active promoters
H3K27me3 (trimethylation of histone H3 lysine 27) associated with repressed genes
Chromatin accessibility
The degree to which chromatin is accessible to transcription factors and other regulatory proteins
Chromatin accessibility is influenced by histone modifications, chromatin remodeling complexes, and DNA methylation
Techniques like DNase-seq and ATAC-seq can map genome-wide chromatin accessibility
Examples:
Open chromatin regions at active promoters and enhancers
Closed chromatin regions at repressed genes and heterochromatin
Gene families
Groups of genes that share sequence similarity and often have related functions
Gene families arise through duplication events and subsequent divergence during evolution
Paralogous genes
Genes within the same species that originated from a common ancestral gene through duplication
Paralogous genes may have similar or divergent functions and can contribute to genetic redundancy or specialization
Examples:
Hox gene clusters in animals
Olfactory receptor genes in mammals
Orthologous genes
Genes in different species that originated from a common ancestral gene through speciation
Orthologous genes often have conserved functions and can be used to infer evolutionary relationships
Examples:
Pax6 gene in eye development across diverse animal phyla
FOXP2 gene in speech and language development in humans and vocal learning in birds
Pseudogenes
Non-functional gene copies that have lost their protein-coding ability due to mutations or lack of transcription
Pseudogenes can arise through duplication or retrotransposition events
Some pseudogenes may have regulatory roles or produce non-coding RNAs
Examples:
Olfactory receptor pseudogenes in humans
GULOP pseudogene in primates
Mobile genetic elements
DNA sequences that can move within genomes and contribute to genomic diversity and evolution
Mobile genetic elements can influence gene expression, genome structure, and the emergence of novel functions
Transposons
DNA transposons that move through a cut-and-paste mechanism mediated by transposase enzymes
Transposons are flanked by inverted terminal repeats (ITRs) and can insert into new genomic locations
Examples:
Ac/Ds elements in maize
P elements in Drosophila
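The inverted terminal repeats that flank a DNA transposon can be sketched as a reverse-complement check: the right end of the element should read as the reverse complement of the left end. The element sequence, the repeat length, and the function names below are hypothetical, and real repeats are often imperfect.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def has_inverted_terminal_repeats(element, repeat_len):
    """True if the last repeat_len bases are the reverse complement of the first.

    repeat_len must be a positive integer no longer than half the element.
    """
    left = element[:repeat_len]
    right = element[-repeat_len:]
    return right == reverse_complement(left)

left_repeat = "CAGGGATG"  # hypothetical terminal repeat
element = left_repeat + "TTTTACGCAAAA" + reverse_complement(left_repeat)

print(reverse_complement(left_repeat))            # CATCCCTG
print(has_inverted_terminal_repeats(element, 8))  # True
```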
Retrotransposons
Mobile genetic elements that move through an RNA intermediate and reverse transcription
Retrotransposons include long terminal repeat (LTR) elements and non-LTR elements (LINEs and SINEs)
Retrotransposons can create new gene copies and contribute to genome expansion
Examples:
Alu elements (SINEs) in primates
L1 elements (LINEs) in mammals
Insertion sites and effects
Mobile genetic elements can insert into various genomic locations, including exons, introns, and regulatory regions
Insertions can disrupt gene function, alter gene expression, or create new regulatory elements
Mobile element insertions can also cause genomic rearrangements and contribute to disease
Examples:
Retrotransposon insertion in the human FVIII gene causing hemophilia A
Transposon-mediated duplication of the Hox gene cluster in vertebrates
Genome organization
The arrangement and distribution of genes and other functional elements within a genome
Genome organization influences gene expression, recombination, and evolutionary processes
Gene density
The number of genes per unit of genomic distance (e.g., genes per megabase)
Gene density varies across different genomic regions and among different organisms
Gene-rich regions often have higher levels of transcription and chromatin accessibility
Examples:
High gene density in the major histocompatibility complex (MHC) region
Low gene density in heterochromatic regions
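Gene density as defined above is a simple ratio. The sketch below expresses it in genes per megabase; the gene count and region length are made-up numbers for illustration, not measurements.

```python
def gene_density(gene_count, region_length_bp):
    """Return genes per megabase for a region of the given length in base pairs."""
    return gene_count / (region_length_bp / 1_000_000)

# Hypothetical region: 150 genes in 3 Mb.
print(gene_density(150, 3_000_000))  # 50.0
```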
Isochores
Large-scale regions of the genome with relatively homogeneous GC content
Isochores can be classified into GC-rich (H3, H2, H1) and GC-poor (L1, L2) families
GC-rich isochores are associated with higher gene density, earlier replication timing, and more open chromatin
Examples:
H3 isochores in the human genome
L1 isochores in avian genomes
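GC content along a sequence can be profiled with fixed, non-overlapping windows, which is the basic measurement behind isochore maps. The sketch below is a minimal version under stated assumptions: the toy sequence and the 8 bp window are hypothetical, whereas real isochore analyses use windows on the order of hundreds of kilobases.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window):
    """Return GC content for consecutive non-overlapping windows of a given size."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]

toy_sequence = "GGCCGGCCAATTAATT"  # hypothetical: GC-rich half, then AT-rich half
print(windowed_gc(toy_sequence, 8))  # [1.0, 0.0]
```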
Synteny and conservation
The conservation of gene order and orientation across different species
Syntenic regions often contain functionally related genes or regulatory elements
Synteny can be used to identify orthologous genes and study genome evolution
Examples:
Hox gene clusters in vertebrates
MHC region in mammals
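One simple way to quantify conserved gene order is to count neighboring gene pairs that are adjacent in both species. The sketch below assumes each list holds orthologous genes under shared labels, in chromosomal order, and it treats a pair as conserved in either orientation; the gene names are placeholders and the approach is far cruder than dedicated synteny tools.

```python
def conserved_adjacencies(order_a, order_b):
    """Return adjacent gene pairs from order_a that are also adjacent in order_b."""
    pairs_b = set(zip(order_b, order_b[1:]))
    # Accept a pair in either orientation so that local inversions still count.
    pairs_b |= {(second, first) for first, second in pairs_b}
    return [pair for pair in zip(order_a, order_a[1:]) if pair in pairs_b]

species_a = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical gene order
species_b = ["g1", "g2", "g3", "g5", "g4"]  # g4 and g5 swapped

print(conserved_adjacencies(species_a, species_b))
# [('g1', 'g2'), ('g2', 'g3'), ('g4', 'g5')]
```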
Epigenetic modifications
Heritable changes in gene expression that do not involve alterations in the DNA sequence
Epigenetic modifications play crucial roles in development, cell differentiation, and environmental responses
DNA methylation
The addition of methyl groups to cytosine residues, primarily in the context of CpG dinucleotides
DNA methylation is associated with transcriptional repression and heterochromatin formation
DNA methylation patterns are established and maintained by DNA methyltransferases (DNMTs)
Examples:
Genomic imprinting (parent-of-origin specific gene expression)
X chromosome inactivation in female mammals
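Because methylation occurs mainly at CpG dinucleotides, a common first step is to measure how CpG-rich a sequence is. The sketch below computes the observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G), a widely used statistic for flagging candidate CpG islands. The sequences and function name are hypothetical, and no island threshold is applied.

```python
def cpg_observed_expected(seq):
    """Return the observed/expected CpG ratio for a DNA string."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * len(seq) / (c_count * g_count)

print(cpg_observed_expected("CGCGCGCG"))  # 2.0
print(cpg_observed_expected("CCGG"))      # 1.0
```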
Histone variants
Non-canonical histone proteins that replace canonical histones in specific genomic regions
Histone variants can alter chromatin structure and dynamics, influencing gene expression and genome stability
Examples:
H2A.Z variant associated with active promoters and enhancers
CENP-A variant at centromeres
Non-coding RNAs
RNA molecules that do not encode proteins but have regulatory or structural functions
Non-coding RNAs can modulate gene expression at the transcriptional and post-transcriptional levels
Examples:
microRNAs (miRNAs) that repress translation or induce mRNA degradation
long non-coding RNAs (lncRNAs) that recruit chromatin-modifying complexes or act as scaffolds
Gene expression regulation
The control of gene expression at various stages, from transcription to post-translational modifications
Gene expression regulation ensures proper development, homeostasis, and responses to environmental cues
Transcription factors
Proteins that bind specific DNA sequences and regulate transcription
Transcription factors can act as activators or repressors, recruiting co-factors and modulating chromatin accessibility
Examples:
p53 tumor suppressor protein
NF-κB in immune and inflammatory responses
Post-transcriptional modifications
Modifications to the RNA molecule that influence its stability, localization, and translation efficiency
Post-transcriptional modifications include splicing, polyadenylation, and RNA editing
Examples:
Alternative splicing of the Dscam gene in Drosophila
A-to-I RNA editing in mammalian transcripts
Translational control
Regulation of protein synthesis at the level of translation initiation, elongation, or termination
Translational control can be mediated by RNA-binding proteins, microRNAs, or upstream open reading frames (uORFs)
Examples:
Iron-responsive element-binding protein (IRP) in iron homeostasis
Regulation of ATF4 translation by uORFs in stress responses
Genetic variation
Differences in the DNA sequence among individuals within a population or between populations
Genetic variation is the basis for phenotypic diversity and is shaped by evolutionary forces such as mutation, selection, and genetic drift
Single nucleotide polymorphisms
Single base pair changes in the DNA sequence that occur at a frequency of >1% in a population
SNPs can be located in coding regions (synonymous or non-synonymous), regulatory elements, or non-coding regions
SNPs can influence gene expression, protein function, or susceptibility to diseases
Examples:
Sickle cell anemia caused by a single nucleotide substitution in the β-globin gene that changes one amino acid (glutamic acid to valine)
SNPs associated with risk for Alzheimer's disease (APOE ε4 allele)
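Whether a coding SNP is synonymous or non-synonymous can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code from DNA codons and compares the two amino acids; the function name classify_snp and its arguments are assumptions for illustration, and a change to a stop codon is simply reported as non-synonymous.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_snp(codon, position, alt_base):
    """Classify a single-base change at a 0-based position within a DNA codon."""
    mutated = codon[:position] + alt_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"

# GAG (Glu) -> GTG (Val): the kind of change underlying sickle cell anemia.
print(classify_snp("GAG", 1, "T"))  # non-synonymous
# GAG (Glu) -> GAA (Glu): same amino acid.
print(classify_snp("GAG", 2, "A"))  # synonymous
```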
Copy number variations
Deletions or duplications of genomic regions ranging from a few kilobases to several megabases
CNVs can encompass one or more genes and influence gene dosage and expression
CNVs are associated with various developmental disorders and complex traits
Examples:
22q11.2 deletion syndrome (DiGeorge syndrome)
Salivary amylase gene (AMY1) copy number variation in human populations
Structural variations
Large-scale genomic rearrangements, including inversions, translocations, and complex multi-site variants
Structural variations can disrupt genes, create fusion genes, or alter regulatory landscapes
Structural variations are often associated with developmental disorders and cancer
Examples:
Chromosomal translocations in leukemias (BCR-ABL fusion)
Inversion polymorphisms in the human genome (8p23.1)