Substitution matrices are essential tools in sequence alignment, helping us understand how amino acids change over time. They assign scores to different amino acid swaps, showing which ones are more likely to happen as proteins evolve.
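PAM and BLOSUM are the two main families of substitution matrices. As a rule of thumb, PAM matrices are often used for closely related sequences, while BLOSUM matrices are often preferred for more distant relatives, although each family includes matrices tuned to different levels of divergence. Choosing the right matrix is key to getting accurate alignments and uncovering evolutionary relationships.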
Substitution Matrices for Alignment
Role of Substitution Matrices
Substitution matrices quantify the likelihood of amino acid substitutions occurring during evolution
Provide scores for each possible amino acid pair, reflecting the probability of one amino acid being replaced by another over evolutionary time
Higher scores indicate more frequent and favorable substitutions (e.g., substitution of amino acids with similar properties like isoleucine and valine)
Lower or negative scores suggest less likely or unfavorable substitutions (e.g., substitution of amino acids with distinct properties like proline and tryptophan)
Enable the alignment of sequences by assigning scores to matches and mismatches; gaps are scored separately with gap penalties chosen alongside the matrix
Allow the identification of conserved regions and potential homology between sequences
Choice of an appropriate substitution matrix is crucial for accurate sequence alignment and homology detection
Directly influences the alignment quality and the inferred evolutionary relationships
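To make the scoring idea concrete, here is a minimal sketch of how an existing pairwise alignment could be scored column by column. The dictionary holds only a handful of BLOSUM62 entries, and `GAP_PENALTY`, `pair_score`, and `alignment_score` are hypothetical names and values chosen for illustration, not part of any particular library.

```python
# A few BLOSUM62 entries (half-bit units); only the pairs used below.
BLOSUM62_SUBSET = {
    ("I", "I"): 4, ("V", "V"): 4, ("L", "L"): 4,
    ("I", "V"): 3, ("I", "L"): 2, ("L", "V"): 1,
    ("P", "P"): 7, ("W", "W"): 11, ("P", "W"): -4,
}

GAP_PENALTY = -4  # assumed linear gap penalty; it is not part of the matrix


def pair_score(a, b):
    """Look up the symmetric substitution score for residues a and b."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def alignment_score(aligned_a, aligned_b):
    """Sum the column scores of two equal-length aligned strings ('-' is a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            total += GAP_PENALTY
        else:
            total += pair_score(a, b)
    return total


print(alignment_score("IVLP", "VVIP"))  # 3 + 4 + 2 + 7 = 16
print(alignment_score("IW-P", "IPLP"))  # 4 - 4 - 4 + 7 = 3
```

The second alignment scores far lower because it contains an unfavorable proline/tryptophan column and a gap, which is the kind of signal an alignment algorithm uses when choosing between candidate alignments.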
Importance of Substitution Matrices
Essential tools used in sequence alignment algorithms
Crucial for identifying conserved regions and potential homology between sequences
Enable the comparison of sequences from different species or organisms
Help infer evolutionary relationships and functional similarities
Facilitate the detection of remote homologs and the identification of novel protein families
Play a key role in various bioinformatics applications, such as functional annotation of genes and proteins
PAM vs BLOSUM Matrices
PAM (Point Accepted Mutation) Matrices
Derived from closely related protein sequences
Model the evolutionary changes that occur over a specified number of accepted point mutations per 100 amino acids (e.g., PAM1, PAM250)
Based on the assumption that the evolutionary process is Markovian
Probability of an amino acid substitution depends only on the current amino acid and not on the previous substitutions
Higher-numbered matrices (e.g., PAM250) are extrapolated from PAM1, which was itself estimated from a small number of observed mutations
Low-numbered PAM matrices are suitable for aligning closely related sequences (e.g., sequences within the same species or genus); higher-numbered ones target more divergent sequences
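The Markov assumption is what allows matrices for larger distances to be obtained by repeatedly multiplying a one-step mutation probability matrix by itself. The sketch below illustrates this on a hypothetical 3-letter alphabet with made-up probabilities; `TOY_PAM1`, `mat_mul`, and `mat_pow` are illustrative names, and the numbers are not real PAM values.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    result = m
    for _ in range(power - 1):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step mutation probabilities for a 3-letter alphabet.
# Row i, column j: probability that residue i is replaced by residue j.
TOY_PAM1 = [
    [0.990, 0.008, 0.002],
    [0.008, 0.990, 0.002],
    [0.002, 0.002, 0.996],
]

# 250 steps of the same Markov process, analogous to going from PAM1 to PAM250.
toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

Each row still sums to 1 after extrapolation, but the diagonal entries shrink as more substitutions accumulate. Real PAM scoring matrices are then obtained by converting such probabilities into log-odds scores.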
BLOSUM (BLOcks SUbstitution Matrix) Matrices
Derived from conserved sequence blocks in aligned protein families
Reflect the observed amino acid substitutions in conserved regions
Empirically derived and do not assume a specific evolutionary model
Different BLOSUM matrices are constructed by clustering the sequences within each block at a given percent-identity threshold before counting substitutions (e.g., 45% for BLOSUM45, 62% for BLOSUM62, 80% for BLOSUM80)
Lower numbers (e.g., BLOSUM45) are more suitable for aligning distantly related sequences
Higher numbers (e.g., BLOSUM80) are more appropriate for aligning closely related sequences
Often preferred for aligning distantly related sequences and detecting remote homologs
Comparison and Applications
PAM matrices are generally used for aligning closely related sequences
Example: Aligning mammalian hemoglobin sequences to study recent evolutionary changes
BLOSUM matrices are preferred for aligning more divergent sequences and identifying distant evolutionary relationships
Example: Comparing bacterial and human protein sequences to identify conserved functional domains
The choice between PAM and BLOSUM matrices depends on the specific research question and the evolutionary distance between the sequences being analyzed
Choosing Substitution Matrices
Factors to Consider
Evolutionary distance between the sequences being aligned
Closely related sequences (e.g., within the same species or genus) → PAM matrices with lower numbers (e.g., PAM30, PAM70)
Distantly related sequences (e.g., between different phyla or kingdoms) → BLOSUM matrices with lower numbers (e.g., BLOSUM45, BLOSUM50)
Type of sequences being aligned
Protein-coding DNA sequences → Consider codon structure and potential for synonymous substitutions
Use codon substitution models designed for codon alignments (e.g., Goldman-Yang model, Muse-Gaut model)
Non-coding DNA sequences → Use nucleotide substitution models that account for specific evolutionary patterns and constraints of these regions (e.g., Tamura-Nei model, Kimura 2-parameter model)
Testing and Assessing Alignment Quality
Test multiple substitution matrices
Assess the alignment quality and biological relevance of the results
Evaluate the conservation of known functional motifs or structural elements
Compare the alignment with existing biological knowledge or experimental data
Determine the most suitable matrix for a given analysis based on the alignment quality and biological insights gained
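One simple way to check whether a known motif survives in a candidate alignment is to measure how many of its residues end up aligned to identical residues in the other sequence. The sketch below assumes the alignments have already been produced (for example, with different matrices); the function name `motif_conservation`, the sequences, and the motif coordinates are all hypothetical.

```python
def motif_conservation(aligned_a, aligned_b, motif_start, motif_end):
    """Fraction of motif residues in sequence A aligned to an identical residue in B.

    motif_start and motif_end are 0-based, end-exclusive positions in the
    ungapped sequence A.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if motif_end <= motif_start:
        raise ValueError("motif must contain at least one residue")
    position = 0  # index into the ungapped sequence A
    conserved = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-":
            continue  # a gap in A does not consume a residue of A
        if motif_start <= position < motif_end and a == b:
            conserved += 1
        position += 1
    return conserved / (motif_end - motif_start)


# Two hypothetical alignments of the same pair of sequences;
# the motif GSGKT occupies positions 1..5 of ungapped sequence A.
candidate_x = ("AGSGKTLV", "AGSGKTIV")
candidate_y = ("AGSGKT-LV", "A-GSGKTIV")

print(motif_conservation(*candidate_x, 1, 6))  # 1.0: motif fully aligned
print(motif_conservation(*candidate_y, 1, 6))  # 0.0: motif shifted by a gap
```

A check like this complements the raw alignment score: an alignment that scores well but misaligns a known motif is a reason to try a different matrix or different gap penalties.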
Interpreting Substitution Matrix Scores
Score Interpretation
Scores represent the log-odds ratios of the observed frequency of an amino acid substitution relative to the expected frequency based on amino acid composition
Positive scores → Observed substitution frequency is higher than expected by chance
Substitution is more likely to be tolerated and conserved during evolution
Higher positive scores imply a stronger preference for the substitution and a higher degree of sequence similarity and evolutionary conservation
Example: Score of +3 for the substitution of isoleucine (I) and valine (V) in the BLOSUM62 matrix
Negative scores → Observed substitution frequency is lower than expected by chance
Substitution is less likely to occur and may be detrimental to protein structure or function
More negative scores imply a stronger avoidance of the substitution and a lower degree of sequence similarity and evolutionary relatedness
Example: Score of -4 for the substitution of proline (P) and tryptophan (W) in the BLOSUM62 matrix
Scores close to zero → Observed substitution frequency is similar to the expected frequency
Neutral effect on protein evolution
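The log-odds idea can be written as S(a, b) = scale × log2(q_ab / (p_a × p_b)), where q_ab is the observed frequency of the aligned pair and p_a, p_b are the background frequencies of the two residues. The sketch below is a minimal illustration with made-up frequencies; it assumes q_ab is an ordered-pair frequency so that the expected value is simply p_a × p_b, and it uses scale = 2 to mimic the half-bit units of BLOSUM62. The function name `log_odds_score` is hypothetical.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=2.0):
    """Return scale * log2(observed / expected), rounded to the nearest integer.

    expected is freq_a * freq_b, the chance of seeing the pair if residues
    were paired independently of each other.
    """
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


# Hypothetical frequencies, chosen only to show the three cases.
print(log_odds_score(0.012, 0.06, 0.05))    # observed 4x expected   -> 4
print(log_odds_score(0.003, 0.06, 0.05))    # observed = expected    -> 0
print(log_odds_score(0.00075, 0.06, 0.05))  # observed 1/4 expected  -> -4
```

Because the score is a logarithm of a ratio, summing scores along an alignment corresponds to multiplying the per-column odds, which is why alignment scores are simple sums.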
Inferring Evolutionary Relationships
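Compare alignment scores obtained with different substitution matrices, keeping in mind that raw scores from different matrices are not on the same scale
Higher alignment scores under a given matrix suggest closer evolutionary ties
Lower alignment scores indicate more distant relationships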
Combine the interpretation of substitution matrix scores with other sources of information
Structural data
Functional annotations
Phylogenetic analyses
Gain a comprehensive understanding of the evolutionary history and functional implications of the aligned sequences