
12.2 Biological databases and sequence analysis

3 min read • August 7, 2024

Biological databases are the backbone of bioinformatics. They store vast amounts of nucleotide and protein data and make it easy for scientists to access and analyze. From GenBank's nucleotide sequences to UniProt's protein information, these databases are crucial for research.

Sequence analysis tools help scientists make sense of all this data. BLAST finds similar sequences, while multiple sequence alignment reveals conserved regions and evolutionary relationships. These techniques are essential for understanding genes, proteins, and how organisms evolve.

Biological Databases

Nucleotide and Protein Sequence Databases

  • GenBank is a comprehensive database maintained by the National Center for Biotechnology Information (NCBI) that stores nucleotide sequences and their protein translations
    • Includes sequences from various sources such as genomic DNA, cDNA, and RNA
    • Provides information about the function, structure, and evolution of the sequences
  • UniProt (Universal Protein Resource) is a central repository for protein sequence and functional information
    • Consists of Swiss-Prot (manually annotated and reviewed) and TrEMBL (automatically annotated and not reviewed)
    • Provides information on protein sequences, functions, domains, post-translational modifications, and interactions
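
The link between a nucleotide record and its protein translation can be illustrated with a minimal sketch. It assumes the standard genetic code and a coding sequence that starts in frame; the function name `translate` and the example sequence are made up for illustration and are not taken from any database record.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # "X" marks a codon that is not in the table (for example, one containing N).
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":  # "*" denotes a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base coding sequence: ATG GCC TGG TAA
print(translate("ATGGCCTGGTAA"))  # prints "MAW"
```

Real records may use alternative genetic codes (for example, mitochondrial ones), so this table is only one possible choice.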

Gene Annotation

  • Gene annotation is the process of identifying and assigning biological information to gene sequences
  • Involves the identification of coding regions, regulatory elements, and non-coding RNAs
  • Utilizes various computational tools and databases to predict gene functions and structures
  • Helps in understanding the biological role of genes and their products (proteins and RNAs)
  • Essential for genome interpretation and comparative genomics studies
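
One of the simplest steps in identifying coding regions is scanning for open reading frames (ORFs). The sketch below is a simplified illustration, not how production annotation pipelines work: it assumes the forward strand only, an ATG start codon, and the standard stop codons. The function name `find_orfs` and the `min_codons` threshold are hypothetical choices for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon; `min_codons` counts
    the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical sequence with one ORF in frame 2: ATG AAA CCC GGG TAG
sequence = "CCATGAAACCCGGGTAGTT"
for start, end, frame in find_orfs(sequence):
    print(frame, sequence[start:end])  # prints "2 ATGAAACCCGGGTAG"
```

Real gene prediction also has to handle the reverse strand, introns, and regulatory signals, which is why annotation combines several computational tools and databases.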

Sequence Analysis

  • BLAST (Basic Local Alignment Search Tool) is a widely used algorithm for comparing biological sequences
    • Allows researchers to find regions of local similarity between sequences
    • Helps in identifying homologous sequences, which are sequences that share a common evolutionary ancestor
  • Different types of BLAST exist for various purposes (nucleotide-nucleotide, protein-protein, translated searches)
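  • BLAST results provide statistical significance scores (E-values), the number of matches with an equal or better score expected by chance in a database of that size, so lower E-values indicate more reliable matches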
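
To make "regions of local similarity" concrete, here is a minimal sketch of Smith-Waterman local alignment scoring. This is not BLAST itself: BLAST uses faster heuristics to approximate this kind of search, while the sketch fills in the full dynamic-programming table. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0, which lets an alignment start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The two hypothetical sequences share only the local region "ACGT".
print(smith_waterman_score("TTTACGTAAA", "GGACGTCC"))  # prints 8
```

The score of 8 comes from four matching positions at 2 points each; the unrelated flanking regions do not lower it, which is the defining property of a local alignment.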

Multiple Sequence Alignment and Phylogenetic Analysis

  • Multiple sequence alignment (MSA) is the process of aligning three or more biological sequences to identify conserved regions and sequence variations
    • Allows for the identification of conserved functional domains, motifs, and residues
    • Helps in understanding evolutionary relationships among sequences
  • Phylogenetic analysis uses MSAs to infer evolutionary relationships and construct phylogenetic trees
    • Phylogenetic trees represent the evolutionary history and divergence of sequences
    • Different methods exist for phylogenetic tree construction (maximum parsimony, maximum likelihood, Bayesian inference)
  • Phylogenetic analysis helps in understanding species evolution, gene family evolution, and the identification of orthologs and paralogs
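
Once sequences are aligned, conserved regions can be read off column by column. The sketch below assumes the alignment has already been produced by an MSA tool and uses "-" for gaps; the function name `column_conservation` and the three short sequences are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, the fraction of sequences sharing the most common non-gap residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        top_count = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top_count / len(column))
    return scores


# Hypothetical pre-aligned protein fragments.
alignment = ["MKT-LV", "MKTALV", "MRT-LI"]
scores = column_conservation(alignment)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)  # prints [0, 2, 4]
```

Columns 0, 2, and 4 are identical in every sequence, while the others contain substitutions or gaps. Dedicated tools use more refined conservation measures, so this fraction is only a rough indicator.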

Protein Structure Prediction

Computational Methods for Protein Structure Prediction

  • Protein structure prediction aims to determine the three-dimensional structure of a protein from its amino acid sequence
  • Ab initio (or de novo) methods predict protein structures based on physical and chemical principles without relying on known structures
    • Involves energy minimization and conformational search algorithms
    • Computationally intensive and limited to small proteins
  • Fold recognition (or threading) methods predict protein structures by fitting the target sequence to known protein folds
    • Relies on the observation that many proteins adopt similar folds despite having different sequences
    • Helps in identifying distant evolutionary relationships; a target that fits none of the known folds may point to a novel fold

Homology Modeling

  • Homology modeling (or comparative modeling) predicts the structure of a protein based on its sequence similarity to one or more known structures (templates)
    • Relies on the principle that evolutionarily related proteins often have similar structures
    • Involves sequence alignment, template selection, model building, and refinement
  • Homology modeling is the most reliable method for protein structure prediction when suitable templates are available
  • Widely used in drug design, protein engineering, and understanding protein-ligand interactions
  • Examples of homology modeling software include MODELLER and SWISS-MODEL
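
The template selection step can be sketched as ranking candidate templates by percent identity to the target. This is a simplified illustration and not the procedure used by MODELLER or SWISS-MODEL: it assumes each target-template pair is already aligned (with "-" for gaps), and the template names and sequences are made up.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identical residues over columns where both sequences have a residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(t, s) for t, s in zip(aligned_target, aligned_template)
             if t != "-" and s != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(t == s for t, s in pairs) / len(pairs)


def rank_templates(alignments: dict) -> list:
    """Sort template IDs by percent identity to the target, best first."""
    scored = [(template_id, percent_identity(target, template))
              for template_id, (target, template) in alignments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical (aligned target, aligned template) pairs.
candidates = {
    "template_A": ("MKTAYIAK", "MKTAYLAK"),
    "template_B": ("MKTAYIAK", "MRS-YLGK"),
}
best_id, best_identity = rank_templates(candidates)[0]
print(best_id, best_identity)  # prints "template_A 87.5"
```

In practice, template choice also weighs alignment coverage and the quality of the template structure, not sequence identity alone.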
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

