
Substitution matrices are essential tools in sequence alignment, helping us understand how amino acids change over time. They assign scores to different amino acid swaps, showing which ones are more likely to happen as proteins evolve.

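PAM (Point Accepted Mutation) and BLOSUM (BLOcks SUbstitution Matrix) are the two main families of substitution matrices. As a rule of thumb, PAM is often associated with closely related sequences and BLOSUM with more distant relatives, but each family comes in versions tuned to different evolutionary distances. Choosing the right matrix is key to getting accurate alignments and uncovering evolutionary relationships.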

Substitution Matrices for Alignment

Role of Substitution Matrices

  • Substitution matrices quantify the likelihood of amino acid substitutions occurring during evolution
  • Provide a score for each possible amino acid pair, reflecting the probability of one amino acid being replaced by another over evolutionary time
    • Higher scores indicate more frequent and favorable substitutions (e.g., substitution of amino acids with similar properties like isoleucine and valine)
    • Lower or negative scores suggest less likely or unfavorable substitutions (e.g., substitution of amino acids with distinct properties like proline and tryptophan)
  • Supply the match and mismatch scores that alignment algorithms use for aligned residue pairs (gaps are scored separately, with gap penalties)
    • Allow the identification of conserved regions and potential homology between sequences
  • Choice of an appropriate substitution matrix is crucial for accurate sequence alignment and homology detection
    • Directly influences the alignment quality and the inferred evolutionary relationships
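To make the scoring concrete, here is a minimal sketch of how a substitution matrix scores an ungapped alignment: each aligned column is looked up in the matrix and the scores are summed. The dictionary holds only a small excerpt of BLOSUM62 (residues I, L, V, P, W); the names `BLOSUM62_EXCERPT`, `pair_score`, and `ungapped_score` are hypothetical helpers for illustration, not part of any library.

```python
# Excerpt of BLOSUM62 covering only five residues (I, L, V, P, W).
# The full matrix covers all 20 standard amino acids plus ambiguity codes.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3, ("I", "P"): -3, ("I", "W"): -3,
    ("L", "L"): 4, ("L", "V"): 1, ("L", "P"): -3, ("L", "W"): -2,
    ("V", "V"): 4, ("V", "P"): -2, ("V", "W"): -3,
    ("P", "P"): 7, ("P", "W"): -4,
    ("W", "W"): 11,
}


def pair_score(a: str, b: str) -> int:
    """Look up a symmetric substitution score; (a, b) and (b, a) score the same."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]  # raises KeyError for residues outside the excerpt


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the per-column scores of two equal-length, gap-free aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


print(ungapped_score("IVLP", "VILP"))  # 3 + 3 + 4 + 7 = 17
```

Storing each unordered pair once and checking both orders mirrors the fact that these matrices are symmetric.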

Importance of Substitution Matrices

  • Essential tools used in sequence alignment algorithms
  • Crucial for identifying conserved regions and potential homology between sequences
  • Enable the comparison of sequences from different species or organisms
    • Help infer evolutionary relationships and functional similarities
  • Facilitate the detection of remote homologs and the identification of novel protein families
  • Play a key role in various bioinformatics applications, such as
    • Functional annotation of genes and proteins

PAM vs BLOSUM Matrices

PAM (Point Accepted Mutation) Matrices

  • Derived from closely related protein sequences
  • Model the evolutionary changes that occur over a specified number of accepted point mutations per 100 amino acids (e.g., PAM1, PAM250)
  • Based on the assumption that the evolutionary process is Markovian
    • Probability of an amino acid substitution depends only on the current amino acid and not on the previous substitutions
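  • PAM1 is estimated from a relatively small number of observed mutations; higher-numbered matrices (e.g., PAM250) are extrapolated from it by repeated matrix multiplication
  • Low-numbered PAM matrices (e.g., PAM30) suit closely related sequences (e.g., sequences within the same species or genus); high-numbered ones (e.g., PAM250) are intended for more distant relationships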
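The Markov assumption is what makes PAM extrapolation possible: if PAM1 holds the one-step mutation probabilities, then PAM-n is PAM1 multiplied by itself n times. The sketch below shows this with a made-up three-residue "PAM1-like" matrix; the numbers are hypothetical, not Dayhoff's data, and `pam_n` is an illustrative helper. Real PAM score matrices are obtained afterwards by converting these probabilities to log-odds scores.

```python
Matrix = list[list[float]]

# Hypothetical PAM1-like mutation probability matrix for a 3-letter alphabet.
# Row i gives P(residue i -> residue j) over an interval of ~1 accepted mutation per 100 residues.
PAM1: Matrix = [
    [0.990, 0.006, 0.004],
    [0.005, 0.991, 0.004],
    [0.003, 0.004, 0.993],
]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def pam_n(pam1: Matrix, n: int) -> Matrix:
    """Extrapolate PAM-n as PAM1 raised to the n-th power (valid under the Markov assumption)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]  # PAM0 = identity
    for _ in range(n):
        result = matmul(result, pam1)
    return result


PAM250 = pam_n(PAM1, 250)
for row in PAM250:
    print([round(x, 3) for x in row])
```

Each row still sums to 1, but the diagonal (probability of no net change) shrinks as n grows, which is why high-numbered PAM matrices model more divergent sequences.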

BLOSUM (BLOcks SUbstitution Matrix) Matrices

  • Derived from conserved sequence blocks in aligned protein families
  • Reflect the observed amino acid substitutions in conserved regions
  • Empirically derived and do not assume a specific evolutionary model
  • Numbered by the identity threshold used to cluster sequences within the blocks: for BLOSUM62, sequences sharing at least 62% identity are clustered and weighted as a single sequence (e.g., BLOSUM45, BLOSUM62, BLOSUM80)
    • Lower numbers (e.g., BLOSUM45) are more suitable for aligning distantly related sequences
    • Higher numbers (e.g., BLOSUM80) are more appropriate for aligning closely related sequences
  • Often perform better than PAM matrices at aligning distantly related sequences and detecting remote homologs
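A simplified sketch of how BLOSUM-style scores can be derived from a block: count residue pairs in each column, compare observed pair frequencies with those expected from background composition, and express the ratio as a half-bit log-odds score. The sketch assumes a single tiny, made-up block and skips the clustering step that defines BLOSUM-n, so its numbers are illustrative only; `blosum_like_scores` is a hypothetical name.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block: list[str]) -> dict[tuple[str, str], int]:
    """Half-bit log-odds scores from one ungapped block of equal-length aligned sequences.

    Simplification: no clustering of highly similar sequences, which real BLOSUM-n uses.
    """
    # 1. Count every residue pair within each column (unordered, so I-V == V-I).
    pair_counts: Counter = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}  # observed pair frequencies

    # 2. Background frequency p_i: q_ii plus half of every q_ij with j != i.
    p: Counter = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2

    # 3. Expected e_ij = p_i^2 (i == j) or 2 p_i p_j (i != j); score = round(2 * log2(q / e)).
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


block = ["AIV", "AIV", "AVV", "AIV"]  # made-up block; each column is one aligned position
print(blosum_like_scores(block))
# {('A', 'A'): 3, ('I', 'I'): 3, ('I', 'V'): -1, ('V', 'V'): 2}
```

With so little data the I-V score comes out negative, unlike the +3 in BLOSUM62; the real matrix is computed from thousands of blocks, which is what makes its scores meaningful.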

Comparison and Applications

  • Low-numbered PAM matrices are commonly used for aligning closely related sequences
    • Example: Aligning mammalian hemoglobin sequences to study recent evolutionary changes
  • BLOSUM matrices (BLOSUM62 and lower-numbered variants) are widely preferred for aligning more divergent sequences and identifying distant evolutionary relationships
    • Example: Comparing bacterial and human protein sequences to identify conserved functional domains
  • The choice between PAM and BLOSUM matrices depends on the specific research question and the evolutionary distance between the sequences being analyzed

Choosing Substitution Matrices

Factors to Consider

  • Evolutionary distance between the sequences being aligned
    • Closely related sequences (e.g., within the same species or genus) → low-numbered PAM matrices (e.g., PAM30, PAM70) or high-numbered BLOSUM matrices (e.g., BLOSUM80)
    • Distantly related sequences (e.g., between different phyla or kingdoms) → high-numbered PAM matrices (e.g., PAM250) or low-numbered BLOSUM matrices (e.g., BLOSUM45); BLOSUM62 is a common default for intermediate distances
  • Type of sequences being aligned
    • Protein-coding DNA sequences → Consider codon structure and the potential for synonymous substitutions
      • Use codon substitution models (e.g., the Goldman-Yang and Muse-Gaut models)
    • Non-coding DNA sequences → Use nucleotide substitution models that capture patterns such as transition/transversion bias (e.g., the Kimura 2-parameter and Tamura-Nei models)
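One way to put the evolutionary-distance factor into practice is to estimate percent identity from a preliminary alignment and map it to a matrix family. The sketch below is a rule of thumb only: the 70% and 40% cutoffs are illustrative assumptions, not an official recommendation from any tool, and `percent_identity` and `suggest_matrix` are hypothetical helpers. Percent identity here ignores gap columns; other definitions exist.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def suggest_matrix(pid: float) -> str:
    """Map percent identity to a matrix family (illustrative cutoffs; an assumption)."""
    if pid >= 70:
        return "PAM30 or BLOSUM80"  # closely related
    if pid >= 40:
        return "BLOSUM62"  # intermediate distance
    return "BLOSUM45 or PAM250"  # distantly related


pid = percent_identity("ACDEFGHIK", "ACDEYGHLK")
print(round(pid, 1), suggest_matrix(pid))
```

In practice the suggestion is only a starting point; the next step is to try several matrices and compare the resulting alignments.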

Testing and Assessing Alignment Quality

  • Test multiple substitution matrices
  • Assess the alignment quality and biological relevance of the results
    • Check whether known functional motifs or structural elements are correctly aligned
    • Compare the alignment with existing biological knowledge or experimental data
  • Determine the most suitable matrix for a given analysis based on the alignment quality and biological insights gained
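Testing multiple matrices amounts to running the same alignment algorithm with different scoring schemes. The sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty and runs it under two schemes: a simple identity scheme (+1/-1) and a small BLOSUM62 excerpt. The gap penalty of -4, the identity scheme, and all function names are assumptions for illustration; real tools typically use affine gap penalties.

```python
from typing import Callable

Scorer = Callable[[str, str], int]

# BLOSUM62 values for residues I, L, V, P only (excerpt).
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3, ("I", "P"): -3,
    ("L", "L"): 4, ("L", "V"): 1, ("L", "P"): -3,
    ("V", "V"): 4, ("V", "P"): -2,
    ("P", "P"): 7,
}


def blosum62_excerpt(a: str, b: str) -> int:
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]


def identity(a: str, b: str) -> int:
    """Hypothetical baseline scheme: +1 for identical residues, -1 otherwise."""
    return 1 if a == b else -1


def nw_score(a: str, b: str, score: Scorer, gap: int = -4) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,  # a[i-1] against a gap
                cur[j - 1] + gap,  # b[j-1] against a gap
            )
        prev = cur
    return prev[-1]


for name, scheme in {"identity": identity, "BLOSUM62 excerpt": blosum62_excerpt}.items():
    print(name, nw_score("IVLP", "VILP", scheme))
```

The identity scheme treats I-V like any mismatch, while BLOSUM62 rewards it. Raw scores from different schemes are on different scales, so compare the resulting alignments (or normalized scores) rather than the raw numbers themselves.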

Interpreting Substitution Matrix Scores

Score Interpretation

  • Scores represent the log-odds ratios of the observed frequency of an amino acid substitution relative to the expected frequency based on amino acid composition
  • Positive scores → Observed substitution frequency is higher than expected by chance
    • Substitution is more likely to be tolerated and conserved during evolution
    • Higher positive scores indicate a stronger preference: the substitution is seen in related proteins much more often than chance predicts, typically between residues with similar properties
      • Example: Score of +3 for the substitution of isoleucine (I) and valine (V) in the BLOSUM62 matrix
  • Negative scores → Observed substitution frequency is lower than expected by chance
    • Substitution is less likely to occur and may be detrimental to protein structure or function
    • More negative scores indicate stronger avoidance of the substitution, typically between residues with very different properties
      • Example: Score of -4 for the substitution of proline (P) and tryptophan (W) in the BLOSUM62 matrix
  • Scores close to zero → Observed substitution frequency is similar to the expected frequency
    • The substitution is neither favored nor avoided during protein evolution
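The log-odds idea can be written out explicitly. In the BLOSUM convention, scores are expressed in half-bit units and rounded to integers, where q_ij is the observed frequency of the pair (i, j) and p_i is the background frequency of residue i:

```latex
S_{ij} = \operatorname{round}\!\left( 2 \log_2 \frac{q_{ij}}{e_{ij}} \right),
\qquad
e_{ij} =
\begin{cases}
p_i^{2} & i = j \\
2\,p_i\,p_j & i \neq j
\end{cases}
```

Worked cases: if a pair is observed twice as often as expected (q_ij = 2e_ij), the score is 2 log2 2 = +2; if it is observed exactly as often as expected, the score is 2 log2 1 = 0; if it is observed a quarter as often (q_ij = e_ij / 4), the score is 2 log2(1/4) = -4.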

Inferring Evolutionary Relationships

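  • Compare alignment scores between sequence pairs, ideally as normalized bit scores or E-values, since raw scores from different substitution matrices are on different scales
    • Higher scores suggest closer evolutionary ties
    • Lower scores indicate more distant relationships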
  • Combine the interpretation of substitution matrix scores with other sources of information
    • Structural data
    • Functional annotations
    • Phylogenetic analyses
  • Gain a comprehensive understanding of the evolutionary history and functional implications of the aligned sequences
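Raw alignment scores depend on the matrix and gap penalties, so search tools usually report a normalized bit score S' = (λS - ln K) / ln 2 and an E-value E = m·n·2^(-S'), where m and n are the query and database lengths. The sketch below implements these two formulas. The values λ = 0.267 and K = 0.041 are an assumption (they are the parameters commonly reported by NCBI BLAST for BLOSUM62 with gap costs 11/1); check your own tool's output, and treat the sequence lengths as hypothetical.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value(raw: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance alignments scoring >= raw: E = m * n * 2**(-S')."""
    return m * n * 2 ** (-bit_score(raw, lam, k))


# Assumed Karlin-Altschul parameters for BLOSUM62 with gap costs 11/1.
LAM, K = 0.267, 0.041
print(round(bit_score(100, LAM, K), 1))  # 43.1
print(e_value(100, LAM, K, m=300, n=1_000_000))  # hypothetical query/database lengths
```

Because bit scores already account for λ and K, they can be compared across searches run with different matrices, which raw scores cannot.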
© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.

© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
Glossary
Glossary