
3.2 Functional annotation and gene ontology

4 min read, July 30, 2024

Functional annotation and gene ontology are crucial for understanding the roles of genes in organisms. By assigning biological functions to genes and gene products, they help researchers make sense of genomic data, identify drug targets, and unravel disease mechanisms.

Gene Ontology (GO) provides a standardized framework for describing gene functions across species. It uses three main ontologies, Biological Process, Molecular Function, and Cellular Component, organized in a hierarchical structure that supports the annotation and analysis of gene sets.

Functional annotation in genomics

Definition and importance

  • Functional annotation is the process of assigning biological functions, processes, and pathways to genes or gene products based on experimental evidence or computational predictions
  • Crucial for understanding the roles and interactions of genes within an organism
    • Enables researchers to make sense of the vast amount of genomic data generated by high-throughput sequencing technologies (RNA-seq, ChIP-seq)
  • Helps in identifying potential drug targets, understanding disease mechanisms (cancer, neurodegenerative disorders), and guiding further experimental studies
  • Relies on various sources of information
    • Sequence homology (BLAST)
    • Protein domains (Pfam, InterPro)
    • Expression patterns (tissue-specific expression)
    • Literature mining (PubMed)

Methods and approaches

  • Manual annotation by expert curators involves reviewing the literature and experimental data to assign functions
    • Ensures high-quality and reliable annotations but is time-consuming and labor-intensive
  • Automated annotation methods can be used to assign functions to large datasets
    • Sequence similarity-based approaches (orthology, paralogy)
    • Domain-based approaches (presence of conserved protein domains)
    • These annotations may require additional validation
  • Integration of multiple lines of evidence (sequence, structure, expression, interactions) improves the confidence and accuracy of functional annotations
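Sequence similarity-based transfer can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `Hit` records stand in for parsed BLAST tabular output, the identity and e-value thresholds are illustrative rather than recommended values, and `transfer_annotations` is a hypothetical helper. Transferred terms are tagged IEA because no curator has reviewed them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit (assumed to come from parsed BLAST output)."""
    query: str
    subject: str
    percent_identity: float
    evalue: float


def transfer_annotations(hits, subject_go, min_identity=40.0, max_evalue=1e-10):
    """Copy GO terms from the best qualifying hit of each query.

    subject_go maps an annotated subject protein to its set of GO IDs.
    Thresholds are illustrative assumptions, not community standards.
    Returns {query: {(go_id, evidence_code), ...}}.
    """
    best = {}
    for h in hits:
        # Discard weak hits before choosing a best hit.
        if h.percent_identity < min_identity or h.evalue > max_evalue:
            continue
        current = best.get(h.query)
        # Prefer the lowest e-value; break ties on higher identity.
        if current is None or (h.evalue, -h.percent_identity) < (
            current.evalue,
            -current.percent_identity,
        ):
            best[h.query] = h
    # Electronic transfers without curator review are tagged IEA.
    return {
        query: {(go_id, "IEA") for go_id in subject_go.get(h.subject, set())}
        for query, h in best.items()
    }
```

Queries with no hit passing both thresholds get no annotation at all, which mirrors the point above that automated annotations trade coverage for reliability and may need further validation.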

Gene ontology structure

Standardized vocabulary and framework

  • Gene Ontology (GO) is a standardized vocabulary and framework for describing the functions of genes and gene products across different species
  • Consists of three main ontologies
    • Biological Process (BP): describes the larger biological programs or objectives in which a gene or gene product is involved (cell cycle, signal transduction)
    • Molecular Function (MF): describes the specific molecular activities or tasks performed by a gene or gene product (DNA binding, catalytic activity)
    • Cellular Component (CC): describes the subcellular locations or macromolecular complexes where a gene or gene product is found (nucleus, ribosome)
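To make the three ontologies concrete, the sketch below groups a gene's annotations by namespace. It assumes annotations arrive as `(go_id, aspect)` pairs using the single-letter aspect codes (P, F, C) found in GO annotation (GAF) files; `group_by_namespace` is a hypothetical helper. The root IDs are the top-level terms of each ontology.

```python
from collections import defaultdict

# Root term of each GO ontology (namespace).
ROOTS = {
    "biological_process": "GO:0008150",
    "molecular_function": "GO:0003674",
    "cellular_component": "GO:0005575",
}

# Single-letter aspect codes used in GAF annotation files.
ASPECT = {"P": "biological_process", "F": "molecular_function", "C": "cellular_component"}


def group_by_namespace(annotations):
    """Group (go_id, aspect_letter) pairs into {namespace: set of GO IDs}."""
    groups = defaultdict(set)
    for go_id, aspect in annotations:
        groups[ASPECT[aspect]].add(go_id)  # unknown aspect letters raise KeyError
    return dict(groups)
```

For example, a gene annotated to cell cycle (GO:0007049, P), DNA binding (GO:0003677, F), and nucleus (GO:0005634, C) ends up with one term in each of the three namespaces.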

Hierarchical organization and properties

  • GO terms are organized in a hierarchical structure
    • More specific terms are child terms of more general parent terms
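    • Forms a directed acyclic graph (DAG) rather than a strict tree, so a term can have more than one parent
  • Each GO term has a unique identifier (e.g., GO:0008150), a name, a definition, and a namespace (BP, MF, or CC); the evidence supporting a particular gene's association with a term is recorded in the annotation, not in the term itself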
  • Relationships between GO terms include
    • is_a: indicates that a child term is a subtype (subclass) of a parent term
    • part_of: indicates that a child term is a component of a parent term
    • regulates: indicates that a child term modulates the occurrence or rate of a parent term
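Because the hierarchy is a DAG, an annotation to a specific term implies annotation to that term's ancestors along is_a and part_of edges. The sketch below computes that closure with a breadth-first search. The ontology here is a simplified toy, not copied from GO, and it deliberately does not follow regulates edges; which relation types to propagate over is a per-tool choice.

```python
from collections import deque

# Toy ontology (simplified, NOT real GO structure): child -> [(relation, parent), ...]
EDGES = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "mitochondrion": [("is_a", "organelle")],
    # Two parents: possible because the ontology is a DAG, not a tree.
    "mitochondrial_ribosome": [("is_a", "ribosome"), ("part_of", "mitochondrion")],
    "ribosome": [("is_a", "cellular_component")],
    "organelle": [("is_a", "cellular_component")],
    "cellular_component": [],
}


def ancestors(term, edges, follow=("is_a", "part_of")):
    """Return all ancestors of term reachable via the given relation types."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(direct_terms, edges):
    """Extend a gene's direct annotations with every ancestor term."""
    closed = set(direct_terms)
    for term in direct_terms:
        closed |= ancestors(term, edges)
    return closed
```

Here `ancestors("mitochondrial_ribosome", EDGES)` reaches `cellular_component` through both parents, and the shared node is visited only once.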

GO term application

Annotation process

  • GO annotation involves assigning the most appropriate and specific GO terms to a gene or gene product based on the available evidence
  • Evidence codes indicate what kind of evidence supports each annotation; they describe how the annotation was made rather than grading its quality
    • Experimental evidence (IDA: Inferred from Direct Assay, IPI: Inferred from Physical Interaction)
    • Computational predictions (IEA: Inferred from Electronic Annotation, ISS: Inferred from Sequence or Structural Similarity)
  • Annotations can be made at different levels of granularity, depending on the specificity of the available evidence
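In practice, annotations are often filtered by evidence code, for example keeping only experimental ones. The sketch below reads tab-separated GAF 2.x lines, uses the symbol (column 3), qualifier (4), GO ID (5), evidence code (7), and aspect (9), and skips negative (NOT) annotations. The `parse_gaf` helper is hypothetical, and the `EXPERIMENTAL` set is one reasonable choice of codes, not an official filter; GO also defines high-throughput experimental codes that are left out here.

```python
# Experimental evidence codes kept by this sketch (an assumed subset).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def parse_gaf(lines, allowed_evidence=None):
    """Yield (symbol, go_id, evidence, aspect) tuples from GAF 2.x lines.

    Header lines start with '!'. If allowed_evidence is given, rows whose
    evidence code is not in that set are skipped.
    """
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id = cols[2], cols[3], cols[4]
        evidence, aspect = cols[6], cols[8]
        # A NOT qualifier states that the gene product is NOT associated with the term.
        if "NOT" in qualifier.split("|"):
            continue
        if allowed_evidence is not None and evidence not in allowed_evidence:
            continue
        yield symbol, go_id, evidence, aspect
```

Calling `parse_gaf(open(path), EXPERIMENTAL)` on a GAF file would drop IEA and ISS rows, which trades coverage for annotations backed by direct experiments.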

Enrichment analysis

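  • Enrichment analysis, such as over-representation analysis or gene set enrichment analysis (GSEA), can be performed using GO annotations to identify overrepresented or underrepresented functional categories within a set of genes of interest
  • Over-representation analysis compares the frequency of each GO term in the gene set with its frequency in a background set (e.g., the entire genome or all genes measured in the experiment)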
  • Helps identify biological processes, molecular functions, or cellular components that are significantly associated with the gene set
  • Can provide insights into the functional themes or pathways involved in a particular biological condition or experimental treatment
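Comparing a term's frequency in the gene set against the background is commonly done with a one-sided hypergeometric test. The sketch below assumes `gene_terms` already includes propagated ancestor terms and tests only for over-representation (the upper tail); `go_enrichment` and `hypergeom_sf` are hypothetical helpers written with the standard library rather than any particular tool's API.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def go_enrichment(study, background, gene_terms):
    """Return {term: (k, n, K, N, p_value)} for every term seen in the background.

    k: study genes with the term, n: study size,
    K: background genes with the term, N: background size.
    """
    background = set(background)
    study = set(study) & background  # the study set must be drawn from the background
    N, n = len(background), len(study)
    counts = {}  # term -> [count in study, count in background]
    for gene in background:
        for term in gene_terms.get(gene, ()):
            pair = counts.setdefault(term, [0, 0])
            pair[1] += 1
            if gene in study:
                pair[0] += 1
    return {term: (k, n, K, N, hypergeom_sf(k, N, K, n)) for term, (k, K) in counts.items()}
```

With a background of 10 genes, 3 of them annotated to a term, and a study set made of exactly those 3 genes, the p-value is 1/120: only one of the C(10, 3) = 120 equally likely 3-gene draws contains all three annotated genes.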

Functional enrichment analysis interpretation

Statistical significance and biological relevance

  • Functional enrichment analysis using GO annotations helps identify statistically overrepresented or underrepresented GO terms within a gene set compared to a background set
  • Enrichment analysis tools (e.g., PANTHER, topGO) calculate p-values, usually corrected for multiple testing or reported as false discovery rates (FDR), to assess the significance of the enrichment
  • Overrepresented GO terms suggest that the gene set is enriched for specific functions or processes
    • Potentially indicates the biological mechanisms or pathways involved in the studied condition (disease, treatment response)
  • Underrepresented GO terms suggest that the gene set is depleted for specific functions or processes
    • Potentially indicates the biological mechanisms or pathways that are suppressed or not involved in the studied condition
  • Interpreting the results requires considering the biological context, the statistical significance of the enriched terms, and the potential biases or limitations of the annotation and analysis methods
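Because thousands of GO terms are tested at once, raw p-values must be corrected for multiple testing. The Benjamini-Hochberg procedure is a common way to control the false discovery rate, although tools differ in which correction they apply by default. A minimal standard-library sketch:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each adjusted value is min over j >= i of p_(j) * m / j (ranks ascending),
    capped at 1, which keeps the adjusted values monotone in the raw p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p-values `[0.001, 0.5]`, the adjusted values are about `[0.002, 0.5]`. Terms are then usually reported as enriched when their adjusted value falls below a chosen threshold such as 0.05.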

Visualization and exploration

  • Visualization tools can help in exploring and interpreting the relationships between the enriched GO terms and their associated genes
  • GO term networks display the hierarchical relationships and connections between the enriched terms
    • Allows identification of broader functional themes and specific subprocesses
  • Treemaps or bar charts can be used to visualize the relative significance and overlap of the enriched terms
  • Interactive browsers (e.g., QuickGO) enable users to navigate the GO hierarchy, view term definitions, and explore the evidence supporting the annotations
  • Integration with other biological databases (e.g., Reactome) can provide additional context and insights into the biological pathways and processes associated with the enriched terms
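A bar chart of enriched terms usually plots -log10 of the adjusted p-value, so that more significant terms get longer bars. The sketch below renders such a chart as plain text with no plotting library; `text_bar_chart` is a hypothetical helper, and it assumes every adjusted value lies in (0, 1].

```python
import math


def text_bar_chart(term_q, width=40):
    """Render {term: adjusted p-value} as a text bar chart of -log10(q).

    Terms are sorted from most to least significant, and bar lengths are
    scaled so that the most significant term gets `width` characters.
    """
    scores = {term: -math.log10(q) for term, q in term_q.items()}
    top = max(scores.values(), default=0.0)
    lines = []
    for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * score / top) if top > 0 else ""
        lines.append(f"{term:<30} {bar} {score:.2f}")
    return "\n".join(lines)
```

A term with q = 0.001 gets a full-width bar, while one with q = 0.1 gets roughly a third of it. The same scores can feed a real plotting library or a treemap once the ranking looks sensible.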
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.