
Probability theory forms the backbone of causal inference, providing tools to quantify uncertainty and make informed decisions. It introduces key concepts like probability distributions, independence, and conditional probability, which are essential for understanding cause-and-effect relationships.

Mastering probability theory enables researchers to model complex scenarios, estimate causal effects, and assess the strength of evidence. From basic axioms to advanced concepts like Bayes' theorem and limit theorems, probability theory equips us with the necessary framework to tackle causal inference challenges.

Basics of probability

  • Probability is a fundamental concept in statistics and causal inference that quantifies the likelihood of an event occurring
  • Understanding probability is crucial for making inferences about cause-and-effect relationships and assessing the strength of evidence for causal claims

Probability axioms

  • Non-negativity: Probability of an event is always greater than or equal to 0, $P(A) \geq 0$
  • Normalization: Probability of the entire sample space is equal to 1, $P(S) = 1$
  • Additivity: If events A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$
  • Complementary events: The probability of an event A and its complement A' sum to 1, $P(A) + P(A') = 1$
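As a quick illustration, here is a minimal sketch (assuming NumPy is available) that checks the axioms for a fair six-sided die; the example is purely illustrative.

```python
import numpy as np

# Hypothetical example: PMF of a fair six-sided die
pmf = np.full(6, 1 / 6)

print(np.all(pmf >= 0))           # non-negativity: every P(outcome) >= 0
print(np.isclose(pmf.sum(), 1))   # normalization: P(S) = 1
# additivity for mutually exclusive events, e.g. A = {roll 1}, B = {roll 6}
print(np.isclose(pmf[0] + pmf[5], 2 / 6))
```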

Sample spaces and events

  • Sample space (S) is the set of all possible outcomes of a random experiment (coin toss, rolling a die)
  • An event (A) is a subset of the sample space, representing a specific outcome or group of outcomes (getting heads, rolling an even number)
  • Events can be simple (a single outcome) or compound (a combination of outcomes)
  • Mutually exclusive events cannot occur simultaneously (rolling a 1 and rolling a 6 on a single die roll)

Conditional probability

  • Conditional probability $P(A|B)$ is the probability of event A occurring given that event B has already occurred
  • Calculated as $P(A|B) = \frac{P(A \cap B)}{P(B)}$, where $P(A \cap B)$ is the probability of both A and B occurring
  • Allows for updating probabilities based on new information or evidence
  • Helps in understanding the dependence between events and is crucial for causal inference
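To make the definition concrete, here is a small Monte Carlo sketch (assuming NumPy; events A and B are made-up examples) that estimates $P(A|B)$ by counting, matching the formula $P(A \cap B)/P(B)$.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # simulate fair die rolls

B = rolls % 2 == 0    # event B: the roll is even
A = rolls > 3         # event A: the roll is greater than 3

p_b = B.mean()                 # estimate of P(B)
p_a_and_b = (A & B).mean()     # estimate of P(A and B)
print(p_a_and_b / p_b)         # estimate of P(A|B); exact value is 2/3
```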

Probability distributions

  • A probability distribution is a function that describes the likelihood of different outcomes in a random experiment
  • Probability distributions are essential for modeling uncertainty and variability in causal inference

Discrete probability distributions

  • Discrete random variables have a countable number of possible outcomes (number of defective items in a batch)
  • Probability mass function (PMF) assigns probabilities to each possible outcome
  • Examples include Bernoulli, binomial, and Poisson distributions

Continuous probability distributions

  • Continuous random variables can take on any value within a specified range (height, weight)
  • Probability density function (PDF) describes the relative likelihood of different values
  • Examples include normal, exponential, and uniform distributions
  • Probabilities are calculated using integrals of the PDF over a given range

Joint probability distributions

  • Joint probability distribution describes the probabilities of two or more random variables occurring together
  • Denoted as $P(X, Y)$ for random variables X and Y
  • Allows for modeling the dependence between multiple variables
  • Marginal and conditional probabilities can be derived from the joint distribution

Marginal probability distributions

  • Marginal probability distribution is the probability distribution of a single random variable, ignoring the others
  • Obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variables
  • Provides information about the individual behavior of a random variable
  • Useful for simplifying complex joint distributions and focusing on specific variables of interest
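The sketch below (a hypothetical 2×2 joint PMF, assuming NumPy) shows how marginal and conditional distributions fall out of a joint distribution by summing over the other variable.

```python
import numpy as np

# Hypothetical joint PMF of discrete variables X (rows) and Y (columns)
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

p_x = joint.sum(axis=1)    # marginal of X: sum over Y -> [0.3, 0.7]
p_y = joint.sum(axis=0)    # marginal of Y: sum over X -> [0.4, 0.6]
print(p_x, p_y)

# Conditional distribution P(Y | X = 0), derived from the joint
print(joint[0] / p_x[0])   # [1/3, 2/3]
```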

Independence and dependence

  • Independence and dependence describe the relationship between events or random variables
  • Understanding these concepts is crucial for correctly modeling and interpreting causal relationships

Independent events

  • Events A and B are independent if the occurrence of one does not affect the probability of the other
  • Mathematically, $P(A|B) = P(A)$ and $P(B|A) = P(B)$
  • For independent events, the joint probability is the product of the individual probabilities, $P(A \cap B) = P(A) \times P(B)$
  • Example: Flipping a fair coin twice, the outcome of the second flip is independent of the first

Dependent events

  • Events A and B are dependent if the occurrence of one affects the probability of the other
  • Mathematically, $P(A|B) \neq P(A)$ or $P(B|A) \neq P(B)$
  • The joint probability of dependent events is not equal to the product of their individual probabilities
  • Example: Drawing cards from a deck without replacement, the probability of drawing a specific card changes after each draw
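A short simulation (assuming NumPy; the deck encoding is illustrative) makes the card example concrete: without replacement, the probability of drawing two aces is (4/52)(3/51), not (4/52)², so the product rule for independent events fails.

```python
import numpy as np

rng = np.random.default_rng(1)
deck = np.repeat(np.arange(13), 4)      # 52 cards, 4 of each rank; rank 0 = ace

n = 100_000
draws = np.array([rng.choice(deck, size=2, replace=False) for _ in range(n)])
both_aces = ((draws == 0).sum(axis=1) == 2).mean()

print(both_aces)          # ≈ (4/52) * (3/51) ≈ 0.0045
print((4 / 52) ** 2)      # ≈ 0.0059 — what independence would predict
```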

Conditional independence

  • Events A and B are conditionally independent given event C if $P(A|B,C) = P(A|C)$ and $P(B|A,C) = P(B|C)$
  • Conditional independence implies that once we know the outcome of C, the occurrence of A does not provide any additional information about B, and vice versa
  • Plays a crucial role in causal inference, as it helps in identifying confounding factors and estimating causal effects
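As a sketch of the idea (assuming NumPy; the probabilities 0.8, 0.2, 0.7, and 0.3 are made up), the simulation below builds A and B that share only a common cause C: A and B are marginally dependent, but conditioning on C removes the association.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

C = rng.random(n) < 0.5                      # common cause
A = rng.random(n) < np.where(C, 0.8, 0.2)    # A depends only on C
B = rng.random(n) < np.where(C, 0.7, 0.3)    # B depends only on C

# Marginally dependent: P(A|B) differs from P(A)
print(A.mean(), A[B].mean())                 # ≈ 0.50 vs ≈ 0.62
# Conditionally independent given C: P(A | B, C=1) ≈ P(A | C=1)
print(A[C].mean(), A[B & C].mean())          # both ≈ 0.80
```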

Bayes' theorem

  • Bayes' theorem is a fundamental rule in probability theory that describes how to update probabilities based on new evidence
  • It is named after the Reverend Thomas Bayes, an 18th-century British statistician and Presbyterian minister

Bayes' rule

  • Bayes' rule states that the probability of an event A given event B is equal to the probability of event B given A, multiplied by the probability of A, divided by the probability of B
  • Mathematically, $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
  • Allows for updating prior probabilities (before observing evidence) to posterior probabilities (after observing evidence)
  • Example: In medical testing, Bayes' rule can be used to calculate the probability of a patient having a disease given a positive test result
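A minimal worked version of the medical-testing example, with made-up prevalence, sensitivity, and false-positive numbers, shows how Bayes' rule combines them:

```python
# Hypothetical screening test: 1% prevalence, 95% sensitivity, 10% false-positive rate
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.10

# Law of total probability for the denominator P(positive)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(disease | positive)
print(p_pos_given_disease * p_disease / p_pos)   # ≈ 0.088
```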

Prior vs posterior probabilities

  • Prior probability $P(A)$ is the initial probability of an event A before observing any evidence
  • Posterior probability $P(A|B)$ is the updated probability of event A after observing evidence B
  • Bayes' rule provides a way to calculate the posterior probability by combining the prior probability with the likelihood of the evidence
  • Example: Prior probability of a patient having a disease based on population prevalence, updated to a posterior probability after a positive test result

Bayesian inference

  • Bayesian inference is a method of statistical inference that uses Bayes' theorem to update probabilities as more evidence becomes available
  • Involves specifying a prior distribution for the parameters of interest, then updating it with observed data to obtain a posterior distribution
  • Allows for incorporating prior knowledge and beliefs into the analysis
  • Widely used in causal inference for estimating causal effects, handling missing data, and assessing the sensitivity of results to assumptions
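A minimal sketch of Bayesian updating (assuming SciPy; the prior and data are invented) uses the conjugate Beta-Binomial model, where the posterior has a closed form:

```python
from scipy.stats import beta

# Prior belief about a success probability p: Beta(2, 2), weakly centered at 0.5
a_prior, b_prior = 2, 2

# Hypothetical data: 7 successes and 3 failures in 10 independent trials
successes, failures = 7, 3

# Conjugate update: posterior is Beta(a_prior + successes, b_prior + failures)
posterior = beta(a_prior + successes, b_prior + failures)
print(posterior.mean())            # posterior mean = 9/14 ≈ 0.64
print(posterior.interval(0.95))    # 95% credible interval for p
```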

Expectation and variance

  • Expectation and variance are two fundamental concepts in probability theory that describe the central tendency and variability of a random variable
  • They are essential for summarizing and comparing probability distributions in causal inference

Expected value

  • The expected value (or mean) of a random variable X, denoted as $E(X)$, is the average value of X over its entire range
  • For a discrete random variable, $E(X) = \sum_{x} x \times P(X=x)$, where the sum runs over the possible values $x$ of X
  • For a continuous random variable, $E(X) = \int_{-\infty}^{\infty} x \times f(x)\,dx$, where $f(x)$ is the probability density function
  • Represents the long-run average value of the random variable if the experiment is repeated many times

Variance and standard deviation

  • Variance, denoted as $Var(X)$ or $\sigma^2$, measures the average squared deviation of a random variable X from its expected value
  • Calculated as $Var(X) = E[(X - E(X))^2]$
  • Standard deviation, denoted as $\sigma$, is the square root of the variance and measures the average deviation from the mean
  • Both variance and standard deviation quantify the spread or dispersion of a probability distribution
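The definitions of expected value, variance, and standard deviation can be computed directly for a small discrete distribution; the sketch below (assuming NumPy) uses a fair six-sided die as the example.

```python
import numpy as np

values = np.arange(1, 7)      # possible outcomes of a fair die
probs = np.full(6, 1 / 6)     # each outcome has probability 1/6

mean = np.sum(values * probs)                   # E(X) = 3.5
var = np.sum((values - mean) ** 2 * probs)      # Var(X) = E[(X - E(X))^2] ≈ 2.92
std = np.sqrt(var)                              # sigma ≈ 1.71
print(mean, var, std)
```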

Covariance and correlation

  • Covariance, denoted as $Cov(X,Y)$, measures the joint variability of two random variables X and Y
  • Calculated as $Cov(X,Y) = E[(X - E(X))(Y - E(Y))]$
  • A positive covariance indicates that X and Y tend to increase or decrease together, while a negative covariance suggests an inverse relationship
  • Correlation, denoted as $\rho(X,Y)$, is a standardized version of covariance that ranges from -1 to 1
  • Calculated as $\rho(X,Y) = \frac{Cov(X,Y)}{\sigma_X \sigma_Y}$, where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y
  • Correlation measures the strength and direction of the linear relationship between two variables
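A quick simulated example (assuming NumPy; the data-generating equation is made up) estimates covariance and correlation for two linearly related variables:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)   # y increases with x, plus independent noise

print(np.cov(x, y)[0, 1])             # sample covariance ≈ 2
print(np.corrcoef(x, y)[0, 1])        # correlation ≈ 2 / sqrt(5) ≈ 0.89
```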

Common probability distributions

  • Probability distributions are mathematical functions that describe the likelihood of different outcomes for a random variable
  • Understanding common probability distributions is essential for modeling and analyzing data in causal inference

Bernoulli and binomial distributions

  • Bernoulli distribution models a single trial with two possible outcomes (success or failure), with a fixed probability of success $p$
  • Probability mass function: $P(X=1) = p$ and $P(X=0) = 1-p$
  • Binomial distribution models the number of successes in a fixed number of independent Bernoulli trials
  • Probability mass function: $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $n$ is the number of trials and $k$ is the number of successes
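For instance, the binomial PMF above can be evaluated with SciPy (the coin-flip numbers are illustrative):

```python
from scipy.stats import binom

# Hypothetical example: 10 fair coin flips
n, p = 10, 0.5
print(binom.pmf(3, n, p))    # P(exactly 3 heads) = C(10,3) * 0.5^10 ≈ 0.117
print(binom.cdf(3, n, p))    # P(at most 3 heads) ≈ 0.172
```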

Poisson distribution

  • Models the number of events occurring in a fixed interval of time or space, given a constant average rate of occurrence
  • Probability mass function: $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$, where $\lambda$ is the average rate of occurrence
  • Often used to model rare events, such as the number of defects in a manufacturing process or the number of accidents in a given time period
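A small sketch (assuming SciPy; the rate is made up) evaluates the Poisson PMF for a defect-count example:

```python
from scipy.stats import poisson

lam = 2                           # hypothetical average of 2 defects per batch
print(poisson.pmf(0, lam))        # P(no defects) = e^{-2} ≈ 0.135
print(1 - poisson.cdf(4, lam))    # P(more than 4 defects) ≈ 0.053
```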

Normal distribution

  • Also known as the Gaussian distribution, it is a continuous probability distribution that is symmetric and bell-shaped
  • Probability density function: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation
  • Many natural phenomena and measurement errors follow a normal distribution
  • Central Limit Theorem states that the sum or average of a large number of independent random variables will be approximately normally distributed
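Probabilities under a normal distribution come from its CDF rather than the PDF directly; the sketch below (assuming SciPy, with made-up height parameters) illustrates this:

```python
from scipy.stats import norm

mu, sigma = 170, 10    # hypothetical heights in cm
# P(160 <= X <= 180), i.e. within one standard deviation of the mean
print(norm.cdf(180, mu, sigma) - norm.cdf(160, mu, sigma))   # ≈ 0.683
print(norm.ppf(0.975, mu, sigma))                            # 97.5th percentile ≈ 189.6
```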

Exponential distribution

  • Models the time between events in a Poisson process, or the time until a specific event occurs
  • Probability density function: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, where $\lambda$ is the rate parameter
  • Memoryless property: The probability of an event occurring in the next time interval does not depend on how much time has already passed
  • Often used to model waiting times, such as the time between customer arrivals or the time until a machine failure
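The memoryless property can be checked numerically with SciPy's survival function (the rate and times are arbitrary choices):

```python
from scipy.stats import expon

lam = 0.5
dist = expon(scale=1 / lam)          # SciPy parameterizes by scale = 1/lambda

s, t = 3.0, 2.0
# Memoryless property: P(X > s + t | X > s) = P(X > t)
print(dist.sf(s + t) / dist.sf(s))   # conditional survival probability
print(dist.sf(t))                    # unconditional: e^{-lambda * t} ≈ 0.368
```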

Limit theorems

  • Limit theorems are fundamental results in probability theory that describe the behavior of random variables and their distributions as the sample size increases
  • They are crucial for making inferences and justifying statistical methods in causal inference

Law of large numbers

  • States that the average of a large number of independent and identically distributed (i.i.d.) random variables will converge to their expected value as the sample size increases
  • Weak law of large numbers: The sample mean converges in probability to the expected value
  • Strong law of large numbers: The sample mean converges almost surely to the expected value
  • Provides a theoretical justification for using sample averages to estimate population means
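A simulation sketch (assuming NumPy; the success probability 0.3 is arbitrary) shows the sample mean of Bernoulli draws settling toward its expected value as the sample grows:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.3                                       # E(X) for a Bernoulli(0.3) variable
samples = rng.binomial(1, p, size=1_000_000)

for n in (100, 10_000, 1_000_000):
    print(n, samples[:n].mean())              # running mean approaches 0.3
```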

Central limit theorem

  • States that the sum or average of a large number of i.i.d. random variables will be approximately normally distributed, regardless of the underlying distribution
  • More precisely, if $X_1, X_2, ..., X_n$ are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, then $\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}}$ converges in distribution to a standard normal random variable as $n \to \infty$
  • Allows for using normal-based inference methods, such as confidence intervals and hypothesis tests, for non-normal data when the sample size is large
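The standardization in the statement above can be checked by simulation (assuming NumPy; the exponential base distribution and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 100_000

# Each row: n i.i.d. draws from a skewed exponential distribution with mean 1, variance 1
x = rng.exponential(scale=1.0, size=(reps, n))

# Standardize the sums: (sum - n*mu) / (sigma * sqrt(n))
z = (x.sum(axis=1) - n * 1.0) / np.sqrt(n)
print(z.mean(), z.std())   # ≈ 0 and ≈ 1; a histogram of z looks approximately standard normal
```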

Convergence in probability vs distribution

  • Convergence in probability: A sequence of random variables $X_n$ converges in probability to a random variable $X$ if, for any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$
  • Convergence in distribution: A sequence of random variables $X_n$ converges in distribution to a random variable $X$ if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ for all continuity points $x$ of $F_X$, where $F_{X_n}$ and $F_X$ are the cumulative distribution functions of $X_n$ and $X$, respectively
  • Convergence in probability is a stronger notion than convergence in distribution
  • Both types of convergence are important in causal inference for establishing the asymptotic properties of estimators and test statistics

Probability in causal inference

  • Probability plays a crucial role in causal inference by quantifying the uncertainty associated with cause-and-effect relationships
  • It provides a framework for defining and estimating causal effects, assessing the strength of evidence, and making predictions under different scenarios

Probability of causation

  • The probability of causation (PC) is the probability that an outcome would not have occurred in the absence of a particular cause
  • Formally, $PC = P(Y_0 = 0 | Y_1 = 1)$, where $Y_1$ is the observed outcome under the presence of the cause, and $Y_0$ is the counterfactual outcome under the absence of the cause
  • Quantifies the extent to which a cause is responsible for an observed effect
  • Helps in attributing outcomes to specific causes and making causal attributions

Probability of necessity and sufficiency

  • The probability of necessity (PN) is the probability that an outcome would not have occurred if the cause had been absent
  • Formally, $PN = P(Y_0 = 0 | Y = 1)$, where $Y$ is the observed outcome
  • The probability of sufficiency (PS) is the probability that an outcome would have occurred if the cause had been present
  • Formally, $PS = P(Y_1 = 1 | Y = 0)$
  • PN and PS provide information about the causal relationship between a cause and an effect
  • High PN suggests that the cause is necessary for the effect, while high PS suggests that the cause is sufficient for the effect

Probability and counterfactuals

  • Counterfactuals are hypothetical scenarios that describe what would have happened under different causal conditions
  • In causal inference, counterfactuals are used to define causal effects and reason about cause-and-effect relationships
  • Probability is used to express the uncertainty associated with counterfactual outcomes
  • For example, the average causal effect (ACE) can be defined as $ACE = E[Y_1 - Y_0] = E[Y_1] - E[Y_0]$, where $Y_1$ and $Y_0$ are the potential outcomes under treatment and control, respectively
  • Counterfactual probabilities, such as $P(Y_1 = 1)$ and $P(Y_0 = 1)$, are used to estimate causal effects from observational data
  • Probability and counterfactuals provide a unified framework for causal reasoning and inference
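As a closing sketch (assuming NumPy; all outcome probabilities are invented), the simulation below generates both potential outcomes for every unit, something real data never provides, computes the ACE directly, and shows that a randomized treatment recovers it from observed outcomes alone:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

# Hypothetical potential outcomes, known for every unit only in simulation
y0 = rng.binomial(1, 0.30, size=n)   # outcome without the treatment
y1 = rng.binomial(1, 0.45, size=n)   # outcome with the treatment

print(y1.mean() - y0.mean())         # ACE = E[Y1] - E[Y0] ≈ 0.15

# With randomized treatment assignment, the difference in observed means
# estimates the ACE even though only one potential outcome is seen per unit
t = rng.binomial(1, 0.5, size=n)
y_obs = np.where(t == 1, y1, y0)
print(y_obs[t == 1].mean() - y_obs[t == 0].mean())
```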