
Joint probability distributions are a fundamental concept in stochastic processes, describing how multiple random variables interact. They allow us to model complex systems with multiple uncertain components, providing a framework for analyzing their behavior and making predictions.

These distributions come in discrete and continuous forms, each with unique properties and calculation methods. Understanding marginal and conditional distributions derived from joint distributions is crucial for extracting specific information and updating probabilities based on observed data.

Joint probability distribution definition

  • Joint probability distributions describe the probabilistic relationship between two or more random variables, capturing how likely different combinations of values are to occur simultaneously
  • Allow modeling and analyzing systems or experiments involving multiple uncertain quantities, which is foundational in stochastic processes and many real-world applications

Discrete vs continuous

  • Discrete joint distributions are used when the random variables can only take on a countable number of distinct values (integers, specific categories)
  • Continuous joint distributions apply when the variables have an uncountably infinite range of possible values (real numbers on an interval or the whole real line)
  • The type of joint distribution affects how probabilities are calculated and represented mathematically (sums for discrete, integrals for continuous)

Marginal vs conditional distributions

  • Marginal distributions consider only one variable at a time, ignoring information about the others
    • Obtained by summing (discrete) or integrating (continuous) the joint distribution over the other variables
    • Represents the individual behavior of each component variable
  • Conditional distributions fix the values of some variables and look at the probabilities for the remaining ones
    • Calculated by dividing the joint probability by the marginal of the fixed variables (like Bayes' rule)
    • Shows how the distribution of certain variables changes based on knowledge of others
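
To make the marginal and conditional recipes concrete, here is a minimal sketch in Python; the joint PMF table is a hypothetical example, not one from the text. It sums over rows or columns to obtain marginals and divides by a marginal to obtain a conditional.

```python
import numpy as np

# Hypothetical joint PMF of two discrete variables: X indexes rows, Y indexes columns.
# Entry (i, j) is P(X = x_i, Y = y_j); all entries must sum to 1.
joint = np.array([
    [0.10, 0.05, 0.15],
    [0.20, 0.10, 0.10],
    [0.05, 0.15, 0.10],
])
assert np.isclose(joint.sum(), 1.0)

# Marginals: sum the joint table over the other variable.
p_x = joint.sum(axis=1)   # P(X = x_i)
p_y = joint.sum(axis=0)   # P(Y = y_j)

# Conditional PMF of X given Y = y_1: divide that column by the marginal P(Y = y_1).
p_x_given_y1 = joint[:, 1] / p_y[1]

print(p_x, p_y)
print(p_x_given_y1, p_x_given_y1.sum())   # a valid PMF: sums to 1
```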

Joint probability mass functions

  • A joint probability mass function (PMF) gives the probability of each possible combination of values for discrete random variables
  • The PMF is a function $p(x_1, x_2, \ldots, x_n)$ that maps the possible values of the variables to probabilities between 0 and 1
  • The probabilities for all possible outcomes must sum to 1, a key property of valid PMFs

Discrete random variables

  • PMFs are defined over a countable sample space, the set of all possible combinations of values the discrete random variables can take
  • Common discrete distributions used in multivariate settings include multinomial, Poisson, geometric, and more
  • Many concepts from univariate discrete distributions extend intuitively to the multivariate case (expected values, variance, generating functions)

Multivariate distributions

  • A multivariate distribution is a joint distribution over more than one variable, discrete or continuous
  • Multivariate PMFs can be represented by tables or matrices enumerating the probability of each possible combination of values
  • Sums and other operations on the PMF can be used to derive useful quantities and distributions (marginals, conditionals, moments)

Calculating probabilities

  • Probabilities of events are calculated by summing the PMF values for all outcomes contained in the event
  • For an event $A$ defined by conditions on the variables: $P(A) = \sum_{(x_1,\ldots,x_n) \in A} p(x_1,\ldots,x_n)$
  • The inclusion-exclusion principle and other counting techniques are often helpful in determining which outcomes satisfy the conditions defining an event of interest
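
As an illustration of the event-summation formula, the sketch below (with hypothetical values on {0, 1, 2} for both variables) computes P(X + Y ≤ 2) by summing the PMF over the outcomes that satisfy the condition.

```python
import numpy as np

# Hypothetical joint PMF over x in {0, 1, 2} (rows) and y in {0, 1, 2} (columns).
x_vals = np.array([0, 1, 2])
y_vals = np.array([0, 1, 2])
joint = np.array([
    [0.10, 0.05, 0.15],
    [0.20, 0.10, 0.10],
    [0.05, 0.15, 0.10],
])

# Event A = {X + Y <= 2}: sum p(x, y) over every outcome contained in A.
in_A = (x_vals[:, None] + y_vals[None, :]) <= 2
print("P(X + Y <= 2) =", joint[in_A].sum())   # 0.65 for this table
```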

Joint probability density functions

  • A joint probability density function (PDF) is used to specify a continuous multivariate distribution
  • Gives the relative likelihood of different combinations of values, but not directly interpretable as probabilities
  • Probabilities are found by integrating the PDF over a region of interest, not just evaluating it at a point

Continuous random variables

  • Joint PDFs apply to continuous random variables that can take any value in a specified range
  • Common continuous multivariate distributions include multivariate normal, exponential, beta, gamma, and more
  • Densities allow working with continuous quantities (measurements, times, etc.) without discretization

Multivariate density functions

  • A multivariate PDF is a function $f(x_1,\ldots,x_n)$ that gives the joint density of continuous random variables $X_1,\ldots,X_n$
  • Must be non-negative everywhere, and integrate to 1 over the entire domain
  • Can be used to find marginal and conditional PDFs through integration and division similar to the discrete case

Probability calculations with integrals

  • For an event $A$ defined by conditions on the continuous random variables, the probability is given by an integral: $P(A) = \int_{A} f(x_1,\ldots,x_n)\, dx_1 \cdots dx_n$
  • Multiple integrals are often required, taken over the region of the sample space corresponding to event $A$
  • Computational tools and clever manipulations are often needed to evaluate the integrals for complex regions
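
A small numerical sketch of such an integral using SciPy, with an illustrative density f(x, y) = x + y on the unit square (not a distribution from the text): it computes P(X + Y ≤ 1) by integrating over the triangular region where the condition holds.

```python
from scipy.integrate import dblquad

# Illustrative joint PDF on [0, 1]^2: f(x, y) = x + y (it integrates to 1 over the square).
f = lambda y, x: x + y          # dblquad expects the inner variable (y) first

# P(X + Y <= 1): x runs over [0, 1] and, for each x, y runs over [0, 1 - x].
prob, _ = dblquad(f, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)
print(prob)                      # analytically 1/3
```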

Joint cumulative distribution functions

  • The joint cumulative distribution function (CDF) of random variables $X_1,\ldots,X_n$ is defined as $F(x_1,\ldots,x_n) = P(X_1 \leq x_1,\ldots, X_n \leq x_n)$
  • Gives the probability that each variable is less than or equal to a specified value simultaneously
  • Applies to both discrete and continuous distributions, unifying the PMF and PDF perspectives

CDF definition for joint distributions

  • For discrete variables, the joint CDF can be expressed as a sum: $F(x_1,\ldots,x_n) = \sum_{y_1 \leq x_1} \cdots \sum_{y_n \leq x_n} p(y_1,\ldots,y_n)$
  • In the continuous case, the CDF is an integral: $F(x_1,\ldots,x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(y_1,\ldots,y_n)\, dy_n \cdots dy_1$
  • The CDF is the fundamental way to specify any multivariate distribution, from which other representations can be derived

Properties of joint CDFs

  • Joint CDFs are monotonically increasing in each argument: if $x_i \leq y_i$ for all $i$, then $F(x_1,\ldots,x_n) \leq F(y_1,\ldots,y_n)$
  • Marginal CDFs can be found by taking limits as the other arguments go to infinity: $\lim_{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n \to \infty} F(x_1,\ldots,x_n) = F_i(x_i)$
  • The joint CDF converges to 1 as all arguments go to infinity, and to 0 if any argument goes to $-\infty$

Relationship to probability

  • The joint CDF evaluated at particular values gives the probability of the random variables falling in the rectangular region bounded above by those values
  • $P(a_1 < X_1 \leq b_1, \ldots, a_n < X_n \leq b_n) = F(b_1,\ldots,b_n) - F(b_1,\ldots,b_{n-1},a_n) - \cdots - F(a_1,b_2,\ldots,b_n) + \cdots + (-1)^n F(a_1,\ldots,a_n)$
  • Intuitively, the probability is found by including and excluding the relevant corners of the rectangular region
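
In two dimensions the formula reduces to $F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$. The sketch below checks it numerically for an illustrative bivariate normal (the mean, covariance, and interval endpoints are assumptions for the example); it relies on scipy.stats.multivariate_normal, whose cdf method evaluates the joint CDF.

```python
from scipy.stats import multivariate_normal

# Illustrative bivariate normal with correlation 0.5.
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])
F = lambda x1, x2: mvn.cdf([x1, x2])      # joint CDF F(x1, x2)

a1, b1 = -1.0, 1.0
a2, b2 = -0.5, 2.0

# Inclusion-exclusion over the four corners of the rectangle (a1, b1] x (a2, b2].
p = F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)
print("P(a1 < X1 <= b1, a2 < X2 <= b2) =", p)
```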

Independent vs dependent variables

  • Independence and dependence describe the relationship between random variables in a joint distribution
  • Determine whether knowing the value of one variable provides any information about the likely values of the others
  • Have significant implications for inference, sampling, and many applications of joint distributions

Definition of independence

  • Random variables $X_1,\ldots,X_n$ are independent if their joint PMF or PDF factors as a product of marginals: $p(x_1,\ldots,x_n) = p_1(x_1) \cdots p_n(x_n)$ or $f(x_1,\ldots,x_n) = f_1(x_1) \cdots f_n(x_n)$
  • Intuitively, the variables are independent if knowing the values of some of them provides no information about the probabilities of the others
  • Independent variables can be treated separately, simplifying analysis and allowing results from univariate distributions to be applied more easily

Factoring joint distributions

  • For independent variables, the joint PMF, PDF, or CDF can be written as a product of the marginal distributions for each variable
  • This factorization greatly simplifies working with the joint distribution, as the individual variables can be considered in isolation
  • Many results for sums and transformations of independent random variables rely on this product structure
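
A quick way to check the factorization on a finite table is to compare the joint PMF with the outer product of its marginals, as in this sketch (the table is a made-up example that happens to factor exactly):

```python
import numpy as np

# Hypothetical joint PMF table: X indexes rows, Y indexes columns.
joint = np.array([
    [0.12, 0.18, 0.30],
    [0.08, 0.12, 0.20],
])
p_x = joint.sum(axis=1)
p_y = joint.sum(axis=0)

# X and Y are independent exactly when the joint equals the outer product of the marginals.
print(np.allclose(joint, np.outer(p_x, p_y)))   # True for this table
```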

Conditional distributions for dependence

  • If random variables are not independent, their conditional distributions provide a way to describe the dependence between them
  • The conditional PMF or PDF of $X_1,\ldots,X_k$ given $X_{k+1},\ldots,X_n$ is defined as $p(x_1,\ldots,x_k \mid x_{k+1},\ldots,x_n) = \frac{p(x_1,\ldots,x_n)}{p(x_{k+1},\ldots,x_n)}$ or $f(x_1,\ldots,x_k \mid x_{k+1},\ldots,x_n) = \frac{f(x_1,\ldots,x_n)}{f(x_{k+1},\ldots,x_n)}$
  • Conditional distributions allow updating probabilities based on observed values, a key idea in Bayesian inference and many applications

Covariance and correlation

  • Covariance and correlation are two measures of the linear dependence between random variables
  • Provide a way to quantify the strength and direction of any linear relationship
  • Are important summary statistics for multivariate data and appear in many formulas related to joint distributions

Measures of dependence

  • The covariance between random variables $X$ and $Y$ is defined as $\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$
    • Measures the joint variability of the variables around their means
    • Is positive when larger values of one variable tend to occur with larger values of the other, and negative when larger values of one tend to occur with smaller values of the other
  • The correlation between $X$ and $Y$ is defined as $\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}$
    • Normalizes the covariance to be between -1 and 1, allowing comparison across different scales
    • Measures the linear relationship: $\rho = \pm 1$ implies a perfect linear relationship, while $\rho = 0$ implies no linear relationship (but a nonlinear relationship may exist)
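
The sketch below computes both quantities directly from a small hypothetical joint PMF (the table and the value grids are assumptions for illustration), using the formulas above.

```python
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([0.0, 1.0, 2.0])
# Hypothetical joint PMF over the grid of (x, y) values.
joint = np.array([
    [0.20, 0.10, 0.05],
    [0.10, 0.20, 0.10],
    [0.05, 0.05, 0.15],
])

p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
E_x, E_y = x_vals @ p_x, y_vals @ p_y
E_xy = x_vals @ joint @ y_vals                 # E[XY] = sum over x, y of x*y*p(x, y)

cov = E_xy - E_x * E_y                         # Cov(X, Y) = E[XY] - E[X]E[Y]
var_x = (x_vals**2) @ p_x - E_x**2
var_y = (y_vals**2) @ p_y - E_y**2
rho = cov / np.sqrt(var_x * var_y)             # correlation, always in [-1, 1]
print(cov, rho)                                 # about 0.245 and 0.40 for this table
```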

Covariance matrix

  • The covariance matrix $\Sigma$ of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is an $n \times n$ matrix whose $(i,j)$ entry is $\text{Cov}(X_i,X_j)$
  • Summarizes all pairwise covariances between the components of the random vector
  • Is symmetric and positive semi-definite, with diagonal entries equal to the variances of each component
  • Appears in multivariate versions of Chebyshev's inequality, the weak law of large numbers, and the central limit theorem
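
For data, NumPy estimates these matrices directly; the sketch below simulates draws from an illustrative multivariate normal (the covariance used is an assumption for the example) and recovers both the sample covariance matrix and the corresponding correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative 3-dimensional covariance matrix (symmetric, positive definite).
true_cov = np.array([[2.0, 0.8, 0.0],
                     [0.8, 1.0, -0.3],
                     [0.0, -0.3, 0.5]])
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0], cov=true_cov, size=10_000)

Sigma = np.cov(X, rowvar=False)       # sample covariance matrix (3 x 3, symmetric)
R = np.corrcoef(X, rowvar=False)      # correlation matrix: 1s on the diagonal
print(np.round(Sigma, 2))
print(np.round(R, 2))
```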

Correlation coefficient

  • The correlation coefficient matrix $\mathbf{R}$ has $(i,j)$ entry equal to the correlation $\rho(X_i,X_j)$
  • Is the covariance matrix of the standardized variables $(X_i - \mu_i)/\sigma_i$, where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $X_i$
  • Has diagonal entries of 1 and off-diagonal entries between -1 and 1
  • Is often easier to interpret than the covariance matrix due to the normalized scale

Transformations of random vectors

  • Transformations of random vectors are used to create new random variables or vectors from existing ones
  • Often used to simplify calculations, standardize variables, or obtain distributions with desirable properties
  • The distribution of the transformed variables can be found using the joint distribution of the original variables

Linear transformations

  • A linear transformation of a random vector $\mathbf{X} = (X_1,\ldots,X_n)$ is a new vector $\mathbf{Y} = (Y_1,\ldots,Y_m)$ defined by $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ for an $m \times n$ matrix $\mathbf{A}$ and $m \times 1$ vector $\mathbf{b}$
  • The mean vector and covariance matrix of $\mathbf{Y}$ are given by $E[\mathbf{Y}] = \mathbf{A}E[\mathbf{X}] + \mathbf{b}$ and $\text{Cov}(\mathbf{Y}) = \mathbf{A}\,\text{Cov}(\mathbf{X})\,\mathbf{A}^T$
  • Many important results in statistics and signal processing involve linear transformations of random vectors (principal component analysis, filtering, etc.)
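
These two identities are easy to check numerically; the sketch below uses an arbitrary 2×3 matrix A, offset b, mean, and covariance (all illustrative choices) and compares the exact formulas with Monte Carlo estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, -1.0, 2.0]])          # maps a 3-vector X to a 2-vector Y
b = np.array([3.0, 0.0])

# Exact formulas: E[Y] = A E[X] + b  and  Cov(Y) = A Cov(X) A^T.
mean_Y = A @ mu + b
cov_Y = A @ Sigma @ A.T

# Monte Carlo check on simulated draws (purely illustrative).
X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = X @ A.T + b
print(mean_Y, Y.mean(axis=0))
print(cov_Y)
print(np.cov(Y, rowvar=False))
```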

Jacobian matrix

  • For a general (nonlinear) transformation $\mathbf{Y} = g(\mathbf{X})$, the joint PDF of $\mathbf{Y}$ is related to that of $\mathbf{X}$ by $f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y}))\, |\det(J_{g^{-1}}(\mathbf{y}))|$
  • $J_{g^{-1}}(\mathbf{y})$ is the Jacobian matrix of the inverse transformation $\mathbf{X} = g^{-1}(\mathbf{Y})$, with $(i,j)$ entry equal to $\frac{\partial x_i}{\partial y_j}$
  • The Jacobian matrix accounts for how the transformation stretches or compresses regions of the sample space, affecting the probability density
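
A standard worked case, included purely as an illustration: let $g$ map Cartesian coordinates $(x, y)$ to polar coordinates $(r, \theta)$, so $g^{-1}(r, \theta) = (r\cos\theta,\ r\sin\theta)$. The formula above then gives

$$f_{R,\Theta}(r, \theta) = f_{X,Y}(r\cos\theta,\ r\sin\theta)\,\left|\det\begin{pmatrix}\cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta\end{pmatrix}\right| = r\, f_{X,Y}(r\cos\theta,\ r\sin\theta)$$

The factor $r$ is exactly the amount by which the inverse map stretches area.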

Distribution of transformed variables

  • The joint CDF of $\mathbf{Y} = g(\mathbf{X})$ is given by $F_{\mathbf{Y}}(\mathbf{y}) = P(g(\mathbf{X}) \leq \mathbf{y}) = \int_{g(\mathbf{x}) \leq \mathbf{y}} f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x}$
    • The region of integration is the set of $\mathbf{x}$ values that map into the rectangle $(-\infty,y_1] \times \cdots \times (-\infty,y_m]$ under $g$
  • For invertible linear transformations of continuous random vectors, the joint PDF can be found using the Jacobian formula with $J_{g^{-1}}(\mathbf{y}) = \mathbf{A}^{-1}$
  • In the discrete case, the PMF of $\mathbf{Y}$ is given by $p_{\mathbf{Y}}(\mathbf{y}) = \sum_{\mathbf{x}: g(\mathbf{x}) = \mathbf{y}} p_{\mathbf{X}}(\mathbf{x})$
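
The following sketch applies the change-of-variables formula to an invertible linear map of a standard bivariate normal and compares it with the known answer $\mathbf{Y} \sim N(\mathbf{0}, \mathbf{A}\mathbf{A}^T)$; the matrix A and the evaluation point are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# X is standard bivariate normal; Y = A X with A invertible (an illustrative choice).
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
A_inv = np.linalg.inv(A)

f_X = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2)).pdf

# Change of variables: f_Y(y) = f_X(A^{-1} y) * |det(A^{-1})|.
def f_Y(y):
    return f_X(A_inv @ y) * abs(np.linalg.det(A_inv))

# For a linear map of a normal vector, the answer is known: Y ~ N(0, A A^T).
f_Y_exact = multivariate_normal(mean=[0.0, 0.0], cov=A @ A.T).pdf

y = np.array([1.0, -0.5])
print(f_Y(y), f_Y_exact(y))   # the two densities agree
```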

Sums of random variables

  • Sums of random variables arise in many applications, such as repeated measurements, cumulative effects, or aggregations
  • The distribution of a sum depends on the joint distribution of the individual variables being added together
  • Convolutions provide a general way to find the distribution of sums in both the discrete and continuous cases

Convolution for discrete variables

  • For independent discrete random variables $X$ and $Y$ with PMFs $p_X$ and $p_Y$, the PMF of their sum $Z = X + Y$ is given by the convolution sum: $p_Z(z) = \sum_k p_X(k)\, p_Y(z-k)$
    • The convolution evaluates the probability of all ways to achieve a sum of $z$ by adding values of $X$ and $Y$
  • The convolution sum extends to more than two variables: $p_{X_1 + \cdots + X_n}(z) = \sum_{k_1 + \cdots + k_n = z} p_{X_1}(k_1) \cdots p_{X_n}(k_n)$
  • Convolution sums can be efficiently computed using generating functions or Fourier transforms
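
For PMFs stored as arrays, the convolution sum is exactly numpy.convolve; the sketch below finds the distribution of the sum of two fair dice (an illustrative example).

```python
import numpy as np

# PMF of a fair six-sided die over the values 1..6.
die = np.full(6, 1 / 6)

# PMF of the sum of two independent dice = convolution of the two PMFs.
# The resulting array covers the sums 2, 3, ..., 12.
two_dice = np.convolve(die, die)
print(np.round(two_dice, 4))   # peaks at 7 with probability 6/36
print(two_dice.sum())          # still sums to 1
```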

Convolution integral for continuous variables

  • For independent continuous random variables $X$ and $Y$ with PDFs $f_X$ and $f_Y$, the PDF of their sum $Z = X + Y$ is given by the convolution integral: $f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx$
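
A quick numerical check of the convolution integral, using two independent Exp(1) variables (an illustrative choice) whose sum is known to follow a Gamma(2, 1) distribution:

```python
from scipy.integrate import quad
from scipy.stats import expon, gamma

# X and Y independent Exp(1); Z = X + Y should have the Gamma(2, 1) density z * exp(-z).
def f_Z(z):
    integrand = lambda x: expon.pdf(x) * expon.pdf(z - x)
    val, _ = quad(integrand, 0.0, z)   # the integrand vanishes outside [0, z]
    return val

z = 1.5
print(f_Z(z), gamma.pdf(z, a=2))       # both approximately 0.335
```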