📊 Bayesian Statistics Unit 1 – Probability Theory Foundations
Probability theory forms the foundation of Bayesian statistics. It provides tools to measure and analyze uncertainty, from basic concepts like sample spaces and events to more complex ideas like probability distributions and conditional probabilities.
Key concepts include probability axioms, random variables, and distributions. These are essential for understanding Bayesian inference, which uses prior knowledge and observed data to update probabilities and make informed decisions in various fields.
Bayes' Theorem
Useful for updating probabilities based on new information or evidence
Prior probability initial probability of an event before considering any additional information (prevalence of a disease in a population)
Likelihood probability of observing the data given a specific hypothesis (probability of a positive test result given that a person has the disease)
Posterior probability updated probability of an event after considering new information (probability of having the disease given a positive test result)
Bayes' Theorem in terms of prior, likelihood, and evidence: P(H∣E) = P(E∣H)·P(H) / P(E), where H is the hypothesis and E is the evidence
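As a quick illustration of the disease-testing example above, here is a minimal Python sketch; the prevalence and test-accuracy numbers are made up purely for illustration:

```python
# Hypothetical disease-screening numbers, chosen only to illustrate Bayes' Theorem
prevalence = 0.01        # prior P(H): disease prevalence in the population
sensitivity = 0.95       # likelihood P(E|H): positive test given disease
false_positive = 0.05    # P(E|not H): positive test given no disease

# Evidence P(E): total probability of a positive test
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Posterior P(H|E): probability of disease given a positive test
p_disease_given_positive = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")  # ≈ 0.161
```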
Expectation and Variance
Expectation (mean) of a discrete random variable X: E[X] = ∑_x x·P(X=x)
For a continuous random variable, replace the sum with an integral
Expectation is a linear operator for constants a and b and random variables X and Y, E[aX+bY]=aE[X]+bE[Y]
Variance measures the spread of a random variable X around its mean: Var(X) = E[(X − E[X])²]
Can also be calculated as Var(X) = E[X²] − (E[X])²
Standard deviation square root of the variance: σ_X = √Var(X)
Covariance measures the linear relationship between two random variables X and Y Cov(X,Y)=E[(X−E[X])(Y−E[Y])]
Correlation normalized version of covariance, ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation): Corr(X,Y) = Cov(X,Y) / (σ_X·σ_Y)
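A short numerical check of the expectation and variance identities above on a made-up discrete PMF (the values and probabilities are arbitrary):

```python
import numpy as np

# A made-up discrete random variable X with values and probabilities P(X = x)
x_vals = np.array([0, 1, 2, 3])
p_x = np.array([0.1, 0.4, 0.3, 0.2])

mean_x = np.sum(x_vals * p_x)                        # E[X] = sum_x x * P(X = x)
var_x = np.sum((x_vals - mean_x) ** 2 * p_x)         # Var(X) = E[(X - E[X])^2]
var_x_alt = np.sum(x_vals ** 2 * p_x) - mean_x ** 2  # E[X^2] - (E[X])^2
std_x = np.sqrt(var_x)                               # sigma_X

print(mean_x, var_x, var_x_alt, std_x)               # the two variance formulas agree
```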
Joint and Marginal Distributions
Joint probability distribution P(X,Y) describes the probability of two random variables X and Y taking on specific values simultaneously
For discrete random variables, joint PMF P(X=x,Y=y) gives the probability of X=x and Y=y occurring together
For continuous random variables, joint PDF f(x,y) describes the probability density at (x, y)
Marginal probability distribution probability distribution of a single random variable, ignoring the others
For discrete random variables, marginal PMF P(X=x) = ∑_y P(X=x, Y=y)
For continuous random variables, marginal PDF f_X(x) = ∫ f(x,y) dy, integrating y from −∞ to ∞
Conditional probability distribution probability distribution of one random variable given the value of another
For discrete random variables, conditional PMF P(Y=y∣X=x) = P(X=x, Y=y) / P(X=x)
For continuous random variables, conditional PDF f_{Y∣X}(y∣x) = f(x,y) / f_X(x)
Independence: random variables X and Y are independent if and only if their joint probability distribution is the product of their marginal distributions, P(X,Y) = P(X)·P(Y) or f(x,y) = f_X(x)·f_Y(y)
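The marginal, conditional, and independence definitions above can be checked directly on a small joint PMF table; the numbers here are invented for illustration:

```python
import numpy as np

# Made-up joint PMF P(X = x, Y = y) as a table: rows index x, columns index y
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)          # marginal P(X = x) = sum_y P(X = x, Y = y)
p_y = joint.sum(axis=0)          # marginal P(Y = y) = sum_x P(X = x, Y = y)

# Conditional PMF P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)
cond_y_given_x = joint / p_x[:, None]

# Independence check: the joint equals the outer product of the marginals
independent = np.allclose(joint, np.outer(p_x, p_y))
print(p_x, p_y, cond_y_given_x, independent, sep="\n")
```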
Probability in Bayesian Context
Bayesian inference updating beliefs or probabilities based on new data or evidence
Combines prior knowledge with observed data to obtain a posterior distribution
Prior distribution P(θ) represents the initial beliefs about a parameter θ before observing any data
Can be informative (based on previous studies or expert knowledge) or non-informative (uniform or vague priors)
Likelihood function P(D∣θ) probability of observing the data D given the parameter θ
Describes how likely the observed data is for different values of θ
Posterior distribution P(θ∣D) updated beliefs about the parameter θ after observing the data D
Obtained by combining the prior and likelihood using Bayes' Theorem: P(θ∣D) = P(D∣θ)·P(θ) / P(D)
Marginal likelihood (evidence) P(D) probability of observing the data D, averaged over all possible values of θ
Acts as a normalizing constant in Bayes' Theorem P(D)=∫P(D∣θ)P(θ)dθ
Bayesian model comparison selecting among competing models based on their posterior probabilities
Bayes factor BF12 = P(D∣M1) / P(D∣M2) compares the evidence for two models M1 and M2
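A sketch of how prior, likelihood, evidence, and Bayes factors fit together, using a grid approximation for a binomial success probability θ; the data counts and the two priors (the only difference between the "models") are made up:

```python
import numpy as np
from scipy.stats import binom, beta

k, n = 7, 20                              # hypothetical successes and trials
theta = np.linspace(0.001, 0.999, 999)    # grid over the parameter theta
dtheta = theta[1] - theta[0]

def posterior_and_evidence(prior_pdf):
    prior = prior_pdf(theta)                     # prior P(theta) on the grid
    like = binom.pmf(k, n, theta)                # likelihood P(D | theta)
    unnorm = like * prior
    evidence = np.sum(unnorm) * dtheta           # P(D) = ∫ P(D|theta) P(theta) dtheta
    return unnorm / evidence, evidence           # posterior P(theta | D), evidence

post1, ev1 = posterior_and_evidence(beta(1, 1).pdf)    # M1: flat prior
post2, ev2 = posterior_and_evidence(beta(10, 2).pdf)   # M2: prior favoring large theta

print("Bayes factor BF12 =", ev1 / ev2)                # evidence for M1 relative to M2
```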
Common Applications and Examples
Bayesian A/B testing comparing two versions of a website or app to determine which performs better
Prior distribution represents initial beliefs about the conversion rates
Likelihood function based on the observed number of conversions and visitors for each version
Posterior distribution updates the beliefs about the conversion rates after observing the data
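A minimal Beta-Binomial sketch of Bayesian A/B testing, assuming flat Beta(1, 1) priors and hypothetical conversion counts:

```python
import numpy as np

conversions_a, visitors_a = 120, 2400     # hypothetical counts for version A
conversions_b, visitors_b = 145, 2380     # hypothetical counts for version B
alpha0, beta0 = 1, 1                      # flat Beta(1, 1) prior on each conversion rate

# With a Beta prior and a binomial likelihood, each posterior is again a Beta
# distribution: Beta(alpha0 + successes, beta0 + failures) (conjugacy).
rng = np.random.default_rng(0)
rate_a = rng.beta(alpha0 + conversions_a, beta0 + visitors_a - conversions_a, 100_000)
rate_b = rng.beta(alpha0 + conversions_b, beta0 + visitors_b - conversions_b, 100_000)

# Monte Carlo estimate of the posterior probability that B converts better than A
print("P(B beats A) ≈", np.mean(rate_b > rate_a))
```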
Bayesian parameter estimation inferring the values of model parameters from observed data
Prior distribution represents initial beliefs about the parameters (mean and standard deviation of a normal distribution)
Likelihood function based on the observed data points
Posterior distribution provides updated estimates of the parameters
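A sketch of conjugate parameter estimation for a normal mean, assuming the noise standard deviation is known; the data and prior values are invented:

```python
import numpy as np

data = np.array([4.8, 5.3, 5.1, 4.6, 5.0, 5.4])   # made-up observations
sigma = 0.5                 # assumed known noise standard deviation
mu0, tau0 = 4.0, 2.0        # prior: theta ~ Normal(mu0, tau0^2)

# Standard conjugate update for a normal mean with known noise variance
n = len(data)
post_var = 1.0 / (1.0 / tau0**2 + n / sigma**2)                 # posterior variance
post_mean = post_var * (mu0 / tau0**2 + data.sum() / sigma**2)  # posterior mean

print(f"posterior: Normal({post_mean:.3f}, sd={np.sqrt(post_var):.3f})")
```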
Bayesian classification assigning an object to one of several classes based on its features
Prior distribution represents the initial probabilities of each class
Likelihood function describes the probability of observing the features given each class
Posterior distribution gives the updated probabilities of each class after observing the features
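A toy two-class example with made-up class priors and Gaussian feature likelihoods, showing how the posterior class probabilities are formed:

```python
from scipy.stats import norm

# Invented class priors P(class) and a Gaussian likelihood P(feature | class) per class
priors = {"spam": 0.3, "ham": 0.7}
likelihoods = {"spam": norm(8.0, 2.0), "ham": norm(3.0, 1.5)}

def classify(feature_value):
    # Unnormalized posteriors: prior times likelihood for each class
    unnorm = {c: priors[c] * likelihoods[c].pdf(feature_value) for c in priors}
    evidence = sum(unnorm.values())                       # P(feature)
    return {c: p / evidence for c, p in unnorm.items()}   # posterior P(class | feature)

print(classify(6.0))   # posterior class probabilities for a feature value of 6.0
```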
Bayesian regression fitting a linear or nonlinear model to observed data points
Prior distribution represents initial beliefs about the regression coefficients
Likelihood function based on the observed data points and the assumed noise distribution
Posterior distribution provides updated estimates of the regression coefficients
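A sketch of Bayesian linear regression under a Gaussian prior on the coefficients and a known noise variance (assumptions made only to keep the posterior in closed form); the data are synthetic:

```python
import numpy as np

# Synthetic data from a line with Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=x.size)

X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept + slope
sigma2 = 1.0                                # assumed known noise variance
tau2 = 10.0                                 # prior: coefficients ~ Normal(0, tau2 * I)

# Standard conjugate Gaussian result: the posterior over the coefficients is Gaussian
post_cov = np.linalg.inv(X.T @ X / sigma2 + np.eye(2) / tau2)
post_mean = post_cov @ (X.T @ y / sigma2)

print("posterior mean of [intercept, slope]:", post_mean)
```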
Bayesian networks graphical models representing the probabilistic relationships among a set of variables
Nodes represent variables, and edges represent conditional dependencies
Joint probability distribution factorizes according to the graph structure
Inference and learning algorithms used to update probabilities and learn the structure from data
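A tiny sketch of a Bayesian network using the familiar rain/sprinkler/wet-grass structure; the conditional probability tables are invented, and inference is done by brute-force enumeration over the factorized joint:

```python
# Network: Rain -> Sprinkler, Rain -> WetGrass, Sprinkler -> WetGrass.
# The joint factorizes as P(R, S, W) = P(R) * P(S | R) * P(W | R, S).
p_rain = {True: 0.2, False: 0.8}                                               # P(R)
p_sprinkler = {True: {True: 0.01, False: 0.99},
               False: {True: 0.4, False: 0.6}}                                 # P(S | R)
p_wet = {(True, True): 0.99, (True, False): 0.8,
         (False, True): 0.9, (False, False): 0.0}                              # P(W=True | R, S)

def joint(r, s, w):
    pw = p_wet[(r, s)]
    return p_rain[r] * p_sprinkler[r][s] * (pw if w else 1 - pw)

# Inference by enumeration: P(Rain = True | WetGrass = True)
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print("P(rain | wet grass) =", num / den)
```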