Bayesian Statistics

📊 Bayesian Statistics Unit 1 – Probability Theory Foundations

Probability theory forms the foundation of Bayesian statistics. It provides tools to measure and analyze uncertainty, from basic concepts like sample spaces and events to more complex ideas like probability distributions and conditional probabilities. Key concepts include probability axioms, random variables, and distributions. These are essential for understanding Bayesian inference, which uses prior knowledge and observed data to update probabilities and make informed decisions in various fields.

Key Concepts and Terminology

  • Probability measures the likelihood of an event occurring; it ranges from 0 (impossible) to 1 (certain)
  • Sample space ($\Omega$) set of all possible outcomes of a random experiment
  • Event (A) subset of the sample space represents a specific outcome or set of outcomes
  • Probability density function (PDF) describes the probability distribution of a continuous random variable
    • Integrating the PDF over a specific range yields the probability of the random variable falling within that range
  • Probability mass function (PMF) describes the probability distribution of a discrete random variable
    • Summing the PMF over all possible values equals 1
  • Cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to a given value
  • Independence two events are independent if the occurrence of one does not affect the probability of the other

Probability Axioms and Rules

  • Axiom 1 (Non-negativity) probability of any event A is greater than or equal to 0 ($P(A) \geq 0$)
  • Axiom 2 (Normalization) probability of the entire sample space is equal to 1 ($P(\Omega) = 1$)
  • Axiom 3 (Additivity) if A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$
  • Complement Rule probability of an event A not occurring is 1 minus the probability of A occurring ($P(A^c) = 1 - P(A)$)
  • Multiplication Rule for independent events A and B, $P(A \cap B) = P(A) \times P(B)$
    • For dependent events, $P(A \cap B) = P(A) \times P(B|A)$, where $P(B|A)$ is the conditional probability of B given A
  • Addition Rule for non-mutually exclusive events A and B, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • Law of Total Probability if $B_1, B_2, \ldots, B_n$ form a partition of the sample space, then for any event A, $P(A) = \sum_{i=1}^n P(A \cap B_i) = \sum_{i=1}^n P(A|B_i)P(B_i)$ (applied in the sketch below)
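
These rules are easy to sanity-check numerically. Below is a minimal Python sketch of the law of total probability and the complement rule, assuming a made-up factory where three machines (a partition of the sample space) produce items at different defect rates:

```python
# Law of total probability for a hypothetical factory (all numbers illustrative).
# Machines B1, B2, B3 produce 50%, 30%, 20% of all items (a partition),
# with defect rates 1%, 2%, 3% respectively.
p_machine = [0.50, 0.30, 0.20]        # P(B_i), sums to 1
p_defect_given = [0.01, 0.02, 0.03]   # P(A | B_i)

# P(A) = sum_i P(A | B_i) * P(B_i)
p_defect = sum(pa * pb for pa, pb in zip(p_defect_given, p_machine))
print(f"P(defective) = {p_defect:.4f}")      # 0.0170

# Complement rule: P(A^c) = 1 - P(A)
print(f"P(not defective) = {1 - p_defect:.4f}")
```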

Random Variables and Distributions

  • Random variable (X) function that assigns a numerical value to each outcome in a sample space
  • Discrete random variable can take on a countable number of distinct values (number of defective items in a batch)
  • Continuous random variable can take on any value within a specified range or interval (height of a randomly selected person)
  • Bernoulli distribution models a single trial with two possible outcomes (success or failure) with probability of success p
  • Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials with probability of success p
  • Poisson distribution models the number of events occurring in a fixed interval of time or space, given the average rate of occurrence
  • Normal (Gaussian) distribution continuous probability distribution characterized by its mean ($\mu$) and standard deviation ($\sigma$)
    • 68-95-99.7 rule approximately 68%, 95%, and 99.7% of the data falls within 1, 2, and 3 standard deviations of the mean, respectively (checked by simulation in the sketch below)
  • Exponential distribution models the time between events in a Poisson process, with a constant average rate of occurrence
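
The 68-95-99.7 rule can be verified by simulation. A minimal NumPy sketch (the mean, standard deviation, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 10.0, 2.0                      # arbitrary parameters for illustration
samples = rng.normal(mu, sigma, size=1_000_000)

# Fraction of draws within k standard deviations of the mean
for k in (1, 2, 3):
    frac = np.mean(np.abs(samples - mu) < k * sigma)
    print(f"within {k} sd: {frac:.4f}")    # ~0.6827, 0.9545, 0.9973
```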

Conditional Probability and Bayes' Theorem

  • Conditional probability $P(A|B)$ probability of event A occurring given that event B has occurred
    • Calculated as $P(A|B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) > 0$
  • Bayes' Theorem relates conditional probabilities: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
    • Useful for updating probabilities based on new information or evidence
  • Prior probability initial probability of an event before considering any additional information (prevalence of a disease in a population)
  • Likelihood probability of observing the data given a specific hypothesis (probability of a positive test result given that a person has the disease)
  • Posterior probability updated probability of an event after considering new information (probability of having the disease given a positive test result)
  • Bayes' Theorem in terms of prior, likelihood, and evidence $P(H|E) = \frac{P(E|H)P(H)}{P(E)}$, where H is the hypothesis and E is the evidence (worked through in the sketch below)
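
The disease-testing example running through the bullets above can be made concrete. A minimal sketch, assuming illustrative numbers: 1% prevalence, 95% sensitivity, and a 5% false-positive rate:

```python
# Bayes' Theorem for the disease-testing example (all rates are assumed).
p_disease = 0.01            # prior P(H): prevalence of the disease
p_pos_given_disease = 0.95  # likelihood P(E|H): sensitivity of the test
p_pos_given_healthy = 0.05  # P(E|not H): false-positive rate

# Evidence P(E) via the law of total probability
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior P(H|E) = P(E|H) P(H) / P(E)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # ~0.161
```

Even with an accurate test, the low prior (1% prevalence) keeps the posterior modest, which is exactly the kind of update Bayes' Theorem formalizes.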

Expectation and Variance

  • Expectation (mean) of a discrete random variable X $E[X] = \sum_{x} x \cdot P(X=x)$
    • For a continuous random variable, replace the sum with an integral
  • Expectation is a linear operator: for constants a and b and random variables X and Y, $E[aX + bY] = aE[X] + bE[Y]$
  • Variance measures the spread of a random variable X around its mean $Var(X) = E[(X - E[X])^2]$
    • Can also be calculated as $Var(X) = E[X^2] - (E[X])^2$ (both forms are computed in the sketch below)
  • Standard deviation square root of the variance $\sigma_X = \sqrt{Var(X)}$
  • Covariance measures the linear relationship between two random variables X and Y $Cov(X, Y) = E[(X - E[X])(Y - E[Y])]$
  • Correlation normalized version of covariance; ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation) $Corr(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$
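
Both variance formulas give the same answer, which is easy to confirm in code. A minimal sketch for a fair six-sided die (an illustrative distribution, not from the text):

```python
import numpy as np

# PMF of a fair six-sided die: P(X = x) = 1/6 for x in 1..6
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

mean = np.sum(values * pmf)                      # E[X] = 3.5
var = np.sum((values - mean) ** 2 * pmf)         # Var(X) = E[(X - E[X])^2]
var_alt = np.sum(values ** 2 * pmf) - mean ** 2  # E[X^2] - (E[X])^2
print(mean, var, var_alt)                        # 3.5 2.9167 2.9167
print(np.sqrt(var))                              # standard deviation ~1.708
```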

Joint and Marginal Distributions

  • Joint probability distribution $P(X, Y)$ describes the probability of two random variables X and Y taking on specific values simultaneously
    • For discrete random variables, joint PMF $P(X=x, Y=y)$ gives the probability of X=x and Y=y occurring together
    • For continuous random variables, joint PDF $f(x, y)$ describes the probability density at (x, y)
  • Marginal probability distribution probability distribution of a single random variable, ignoring the others
    • For discrete random variables, marginal PMF $P(X=x) = \sum_y P(X=x, Y=y)$
    • For continuous random variables, marginal PDF $f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$
  • Conditional probability distribution probability distribution of one random variable given the value of another
    • For discrete random variables, conditional PMF $P(Y=y|X=x) = \frac{P(X=x, Y=y)}{P(X=x)}$
    • For continuous random variables, conditional PDF $f_{Y|X}(y|x) = \frac{f(x, y)}{f_X(x)}$
  • Independence random variables X and Y are independent if and only if their joint probability distribution is the product of their marginal distributions: $P(X, Y) = P(X)P(Y)$ or $f(x, y) = f_X(x)f_Y(y)$ (checked numerically in the sketch below)
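
For discrete variables, all of these operations reduce to summing over rows and columns of a joint PMF table. A minimal sketch with a made-up 2×3 joint PMF (chosen so that the independence check happens to succeed):

```python
import numpy as np

# Made-up joint PMF P(X = x, Y = y): rows index x, columns index y
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])
assert np.isclose(joint.sum(), 1.0)   # a valid PMF sums to 1

p_x = joint.sum(axis=1)               # marginal P(X = x): sum over y
p_y = joint.sum(axis=0)               # marginal P(Y = y): sum over x

# Conditional P(Y = y | X = 0) = P(X = 0, Y = y) / P(X = 0)
p_y_given_x0 = joint[0] / p_x[0]

# Independence check: does P(X, Y) = P(X) P(Y) for every cell?
independent = np.allclose(joint, np.outer(p_x, p_y))
print(p_x, p_y, p_y_given_x0, independent)   # this table factorizes: True
```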

Probability in Bayesian Context

  • Bayesian inference updating beliefs or probabilities based on new data or evidence
    • Combines prior knowledge with observed data to obtain a posterior distribution
  • Prior distribution $P(\theta)$ represents the initial beliefs about a parameter $\theta$ before observing any data
    • Can be informative (based on previous studies or expert knowledge) or non-informative (uniform or vague priors)
  • Likelihood function $P(D|\theta)$ probability of observing the data D given the parameter $\theta$
    • Describes how likely the observed data is for different values of $\theta$
  • Posterior distribution $P(\theta|D)$ updated beliefs about the parameter $\theta$ after observing the data D
    • Obtained by combining the prior and likelihood using Bayes' Theorem $P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}$
  • Marginal likelihood (evidence) $P(D)$ probability of observing the data D, averaged over all possible values of $\theta$
    • Acts as a normalizing constant in Bayes' Theorem $P(D) = \int P(D|\theta)P(\theta)\, d\theta$ (computed as a discrete sum in the grid sketch below)
  • Bayesian model comparison selecting among competing models based on their posterior probabilities
    • Bayes factor $BF_{12} = \frac{P(D|M_1)}{P(D|M_2)}$ compares the evidence for two models $M_1$ and $M_2$
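
A grid approximation shows all of these pieces at once: the evidence is computed as a discrete sum, and the posterior is the normalized product of prior and likelihood. A minimal sketch estimating a coin's bias, assuming a uniform prior and illustrative data of 7 heads in 10 flips:

```python
import numpy as np
from scipy.stats import binom

# Grid of candidate values for theta = P(heads)
theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)              # uniform (non-informative) prior
prior /= prior.sum()

# Likelihood P(D | theta): 7 heads in 10 flips (illustrative data)
likelihood = binom.pmf(7, 10, theta)

# Evidence P(D): prior-weighted average of the likelihood over the grid
evidence = np.sum(likelihood * prior)

# Posterior P(theta | D) = P(D | theta) P(theta) / P(D)
posterior = likelihood * prior / evidence
print("posterior mean:", np.sum(theta * posterior))   # ~0.667 (= 8/12)
```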

Common Applications and Examples

  • Bayesian A/B testing comparing two versions of a website or app to determine which performs better
    • Prior distribution represents initial beliefs about the conversion rates
    • Likelihood function based on the observed number of conversions and visitors for each version
    • Posterior distribution updates the beliefs about the conversion rates after observing the data (sketched in code after this list)
  • Bayesian parameter estimation inferring the values of model parameters from observed data
    • Prior distribution represents initial beliefs about the parameters (mean and standard deviation of a normal distribution)
    • Likelihood function based on the observed data points
    • Posterior distribution provides updated estimates of the parameters
  • Bayesian classification assigning an object to one of several classes based on its features
    • Prior distribution represents the initial probabilities of each class
    • Likelihood function describes the probability of observing the features given each class
    • Posterior distribution gives the updated probabilities of each class after observing the features
  • Bayesian regression fitting a linear or nonlinear model to observed data points
    • Prior distribution represents initial beliefs about the regression coefficients
    • Likelihood function based on the observed data points and the assumed noise distribution
    • Posterior distribution provides updated estimates of the regression coefficients
  • Bayesian networks graphical models representing the probabilistic relationships among a set of variables
    • Nodes represent variables, and edges represent conditional dependencies
    • Joint probability distribution factorizes according to the graph structure
    • Inference and learning algorithms used to update probabilities and learn the structure from data
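
The A/B-testing recipe above has a compact implementation when the prior is a Beta distribution: Beta-Binomial conjugacy makes the posterior another Beta distribution, and the two versions can then be compared by Monte Carlo sampling. A minimal sketch with made-up conversion counts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: (conversions, visitors) for each version
conv_a, n_a = 120, 1000
conv_b, n_b = 150, 1000

# Beta(1, 1) prior (uniform); conjugacy gives the posterior
# Beta(1 + conversions, 1 + non-conversions) for each version.
post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

# Monte Carlo estimate of P(version B has the higher conversion rate)
print("P(rate_B > rate_A) =", np.mean(post_b > post_a))   # ~0.97
```

Sampling from the two posteriors and comparing draws sidesteps any closed-form calculation of P(rate_B > rate_A), which is the usual practical approach.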

