🎲 Data Science Statistics Unit 2 – Probability Axioms and Bayes' Theorem

Probability axioms and Bayes' Theorem form the foundation of statistical reasoning in data science. These concepts provide a framework for quantifying uncertainty, updating beliefs based on evidence, and making informed decisions in various fields. Understanding these principles is crucial for data scientists: they enable the development of powerful machine learning algorithms, statistical models, and predictive tools that extract meaningful insights from complex datasets and drive data-driven decision-making.

Key Concepts

  • Probability quantifies the likelihood of an event occurring and ranges from 0 (impossible) to 1 (certain)
  • Sample space represents all possible outcomes of an experiment or random process
  • Events are subsets of the sample space and can be combined using set operations (union, intersection, complement)
  • Probability axioms provide the foundation for calculating probabilities and ensure consistency and validity
  • Conditional probability measures the probability of an event occurring given that another event has already occurred
    • Denoted as $P(A|B)$, read as "the probability of A given B"
    • Calculated using the formula $P(A|B) = \frac{P(A \cap B)}{P(B)}$
  • Independence: two events are independent if the occurrence of one does not affect the probability of the other
  • Bayes' Theorem allows updating probabilities based on new evidence or information
    • Relates the conditional probabilities $P(A|B)$ and $P(B|A)$
    • Formula: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$

Probability Basics

  • Probability is a measure of the likelihood that an event will occur
  • Expressed as a number between 0 and 1
    • 0 indicates an impossible event
    • 1 indicates a certain event
  • Sample space (usually denoted as $\Omega$) is the set of all possible outcomes of an experiment or random process
  • An event is a subset of the sample space
    • Simple event consists of a single outcome (rolling a 6 on a die)
    • Compound event consists of multiple outcomes (rolling an even number on a die)
  • Probability of an event A is denoted as $P(A)$
  • Calculated by dividing the number of favorable outcomes by the total number of possible outcomes (assuming equally likely outcomes)
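
To make the counting rule concrete, here is a minimal Python sketch (the names `omega` and `A` are illustrative) that computes the probability of rolling an even number on a fair die:

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
omega = {1, 2, 3, 4, 5, 6}

# Event A: rolling an even number (a compound event)
A = {outcome for outcome in omega if outcome % 2 == 0}

# With equally likely outcomes, P(A) = |A| / |Omega|
p_A = Fraction(len(A), len(omega))
print(p_A)  # 1/2
```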

Probability Axioms

  • Axiom 1 (Non-negativity): The probability of any event A is greater than or equal to 0: $P(A) \geq 0$
  • Axiom 2 (Normalization): The probability of the entire sample space is equal to 1: $P(\Omega) = 1$
  • Axiom 3 (Additivity): For any two mutually exclusive events A and B, the probability of their union is the sum of their individual probabilities: $P(A \cup B) = P(A) + P(B)$
  • Consequences of the axioms:
    • The probability of an impossible event (empty set) is 0: $P(\emptyset) = 0$
    • The probability of the complement of an event A is $P(A^c) = 1 - P(A)$
    • For any two events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • The axioms ensure consistency and validity of probability calculations and provide a foundation for deriving other probability rules and theorems
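
The consequences listed above can be verified by brute-force counting. A minimal Python sketch, assuming a single fair die roll and illustrative events `A`, `B`, and `C`:

```python
from fractions import Fraction

omega = set(range(1, 7))  # sample space for one fair die roll

def prob(E):
    return Fraction(len(E), len(omega))  # equally likely outcomes

A = {2, 4, 6}   # even number
B = {4, 5, 6}   # greater than 3
C = {1}         # rolling a 1 (mutually exclusive with A)

# Complement rule: P(A^c) = 1 - P(A)
assert prob(omega - A) == 1 - prob(A)

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Axiom 3 for mutually exclusive events: P(A ∪ C) = P(A) + P(C)
assert A & C == set() and prob(A | C) == prob(A) + prob(C)
print("all axiom consequences check out")
```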

Set Theory and Probability

  • Set theory provides a framework for describing and manipulating events in probability
  • Union of two events A and B (denoted as $A \cup B$) is the event that occurs when either A or B or both occur
  • Intersection of two events A and B (denoted as $A \cap B$) is the event that occurs when both A and B occur simultaneously
  • Complement of an event A (denoted as $A^c$) is the event that occurs when A does not occur
  • Mutually exclusive events cannot occur simultaneously: $P(A \cap B) = 0$
  • Exhaustive events collectively cover the entire sample space: $P(A \cup B) = 1$
  • Venn diagrams visually represent relationships between events using overlapping circles
    • Overlapping regions represent intersections
    • Non-overlapping regions represent mutually exclusive events
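
Python's built-in set type mirrors these event operations directly; a short sketch using an illustrative two-coin sample space:

```python
omega = {"HH", "HT", "TH", "TT"}   # sample space: two coin tosses

A = {"HH", "HT"}                   # first toss is heads
B = {"HH", "TH"}                   # second toss is heads

print(A | B)        # union: at least one of the two conditions holds
print(A & B)        # intersection: both tosses are heads -> {'HH'}
print(omega - A)    # complement of A: first toss is tails

print(A & B == set())             # False: A and B are not mutually exclusive
print(A | (omega - A) == omega)   # True: A and its complement are exhaustive
```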

Conditional Probability

  • Conditional probability measures the probability of an event A occurring given that another event B has already occurred
  • Denoted as $P(A|B)$, read as "the probability of A given B"
  • Calculated using the formula $P(A|B) = \frac{P(A \cap B)}{P(B)}$
    • Numerator is the probability of both A and B occurring
    • Denominator is the probability of B occurring
  • Multiplication rule: $P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)$
  • Independence: two events A and B are independent if $P(A|B) = P(A)$ or $P(B|A) = P(B)$
    • Occurrence of one event does not affect the probability of the other
    • For independent events, $P(A \cap B) = P(A)P(B)$
  • Conditional probability is used to update probabilities based on new information or evidence
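
A minimal Python sketch of the conditional probability formula and the independence check, assuming an illustrative two-dice sample space:

```python
from fractions import Fraction

omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}  # two dice

def prob(E):
    return Fraction(len(E), len(omega))  # equally likely outcomes

A = {(i, j) for (i, j) in omega if i + j == 7}  # the dice sum to 7
B = {(i, j) for (i, j) in omega if i == 3}      # first die shows 3

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_A_given_B = prob(A & B) / prob(B)
print(p_A_given_B)                        # 1/6

# A and B happen to be independent here:
print(p_A_given_B == prob(A))             # True: P(A|B) = P(A)
print(prob(A & B) == prob(A) * prob(B))   # True: P(A ∩ B) = P(A)P(B)
```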

Bayes' Theorem

  • Bayes' Theorem relates the conditional probabilities $P(A|B)$ and $P(B|A)$
  • Allows updating probabilities based on new evidence or information
  • Formula: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
    • $P(A)$ is the prior probability of A before considering evidence B
    • $P(B|A)$ is the likelihood of observing evidence B given that A is true
    • $P(B)$ is the marginal probability of observing evidence B
    • $P(A|B)$ is the posterior probability of A after considering evidence B
  • Bayes' Theorem is derived from the multiplication rule and the law of total probability
  • Useful for updating beliefs or probabilities in light of new data (medical diagnosis, spam email filtering)
  • Requires specifying prior probabilities and likelihoods, which can be subjective or based on historical data
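
A small Python sketch of this update, with $P(B)$ expanded by the law of total probability; the spam-filter numbers are hypothetical and chosen only for illustration:

```python
def posterior(prior, likelihood, likelihood_complement):
    """P(A|B) by Bayes' Theorem, expanding P(B) with the law of total
    probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)."""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical spam-filter numbers (illustrative only):
# P(spam) = 0.2, P(word|spam) = 0.6, P(word|not spam) = 0.05
print(posterior(0.2, 0.6, 0.05))  # ≈ 0.75
```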

Applications in Data Science

  • Probability theory is fundamental to many aspects of data science and machine learning
  • Bayesian inference uses Bayes' Theorem to update probabilities or beliefs based on data
    • Prior probabilities represent initial beliefs about parameters or hypotheses
    • Likelihoods quantify the probability of observing the data given the parameters or hypotheses
    • Posterior probabilities represent updated beliefs after considering the data
  • Naive Bayes classifiers apply Bayes' Theorem to classify instances based on feature probabilities
    • Assumes independence between features given the class label
    • Efficient and effective for text classification and spam filtering
  • Probabilistic graphical models (Bayesian networks, Markov random fields) represent joint probability distributions over sets of random variables
    • Encode conditional independence assumptions using graph structures
    • Enable efficient inference and learning from data
  • Probability distributions (Gaussian, Bernoulli, Poisson) model the likelihood of different outcomes or values
    • Used for modeling data generating processes and making probabilistic predictions
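
To make the naive Bayes idea concrete, here is a minimal from-scratch Python sketch on a hypothetical toy dataset (a real application would use a library implementation such as scikit-learn's):

```python
from collections import Counter

# Toy training set (hypothetical): features are (contains "free", contains "meeting")
data = [((1, 0), "spam"), ((1, 1), "spam"), ((1, 0), "spam"),
        ((0, 1), "ham"),  ((0, 1), "ham"),  ((0, 0), "ham")]

# Prior P(class) estimated from class frequencies
prior = {c: n / len(data) for c, n in Counter(y for _, y in data).items()}

def feature_prob(c, j, value):
    # P(feature j = value | class c) with add-one (Laplace) smoothing
    rows = [x for x, y in data if y == c]
    matches = sum(1 for x in rows if x[j] == value)
    return (matches + 1) / (len(rows) + 2)

def predict(x):
    # "Naive" independence: the joint likelihood factors over features,
    # so each class is scored by P(class) * prod_j P(x_j | class)
    scores = {c: prior[c] * feature_prob(c, 0, x[0]) * feature_prob(c, 1, x[1])
              for c in prior}
    return max(scores, key=scores.get)

print(predict((1, 0)))  # "spam" on this toy data
```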

Practice Problems and Examples

  1. A fair die is rolled. What is the probability of getting an even number?

    • Sample space: $\Omega = \{1, 2, 3, 4, 5, 6\}$
    • Event A: getting an even number, $A = \{2, 4, 6\}$
    • $P(A) = \frac{3}{6} = \frac{1}{2}$
  2. Two fair coins are tossed. What is the probability of getting at least one head?

    • Sample space: $\Omega = \{HH, HT, TH, TT\}$
    • Event A: getting at least one head, $A = \{HH, HT, TH\}$
    • $P(A) = \frac{3}{4}$
  3. A bag contains 4 red balls and 6 blue balls. If two balls are drawn at random without replacement, what is the probability that both balls are red?

    • Total balls: 10
    • $P(\text{1st red}) = \frac{4}{10}$
    • $P(\text{2nd red} \mid \text{1st red}) = \frac{3}{9}$
    • $P(\text{both red}) = P(\text{1st red}) \times P(\text{2nd red} \mid \text{1st red}) = \frac{4}{10} \times \frac{3}{9} = \frac{2}{15}$
  4. A medical test has a 95% accuracy rate for detecting a disease when it is present and a 90% accuracy rate for correctly identifying the absence of the disease. If 1% of the population has the disease, what is the probability that a person has the disease given that they test positive?

    • Let D be the event that a person has the disease and T be the event that they test positive
    • $P(D) = 0.01$, $P(T|D) = 0.95$, $P(T|D^c) = 0.10$
    • Using Bayes' Theorem: $P(D|T) = \frac{P(T|D)P(D)}{P(T|D)P(D) + P(T|D^c)P(D^c)} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.10 \times 0.99} \approx 0.087$ (verified numerically in the sketch below)
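
As a sanity check on problem 4, a short Python sketch (variable names are our own) computes the posterior directly and confirms it with a simple Monte Carlo simulation:

```python
import random

# Problem 4, computed directly from Bayes' Theorem
p_d, p_t_given_d, p_t_given_not_d = 0.01, 0.95, 0.10
p_d_given_t = (p_t_given_d * p_d) / (p_t_given_d * p_d
                                     + p_t_given_not_d * (1 - p_d))
print(round(p_d_given_t, 3))  # 0.088

# Monte Carlo sanity check: simulate a large population and count
random.seed(0)
positives = diseased_and_positive = 0
for _ in range(1_000_000):
    has_disease = random.random() < p_d
    tests_positive = random.random() < (p_t_given_d if has_disease
                                        else p_t_given_not_d)
    if tests_positive:
        positives += 1
        diseased_and_positive += has_disease
print(diseased_and_positive / positives)  # ≈ 0.087
```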


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
