
Expectation and variance are key concepts in probability theory. They help us understand the average behavior and spread of random variables. These tools are essential for analyzing and predicting outcomes in uncertain situations.

Expectation quantifies the average value, while variance measures dispersion. Together, they provide a foundation for more advanced probability concepts. Understanding these ideas is crucial for tackling complex problems in stochastic processes and statistical analysis.

Definition of expectation

  • Expectation is a fundamental concept in probability theory that quantifies the average value of a random variable
  • It provides a measure of the central tendency or long-run average behavior of a random variable
  • Expectation is denoted by the symbol $\mathbb{E}[X]$ for a random variable $X$

Discrete random variables

  • For a discrete random variable $X$ with probability mass function $p(x)$, the expectation is defined as: $\mathbb{E}[X] = \sum_{x} x \cdot p(x)$
  • The expectation is calculated by summing the product of each possible value of $X$ and its corresponding probability
  • Example: For a fair six-sided die, the expectation of the number shown is $\mathbb{E}[X] = \frac{1+2+3+4+5+6}{6} = 3.5$
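
As a quick numerical check of the die example above, here is a minimal Python sketch (assuming NumPy is available) that computes the expectation from the definition and also by simulation; both values should be close to 3.5.

```python
import numpy as np

# Expectation from the definition: sum of value * probability
values = np.arange(1, 7)           # faces 1..6
probs = np.full(6, 1 / 6)          # fair die: each face has probability 1/6
expectation = np.sum(values * probs)
print(expectation)                 # 3.5

# Monte Carlo estimate: average of many simulated rolls
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
print(rolls.mean())                # close to 3.5
```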

Continuous random variables

  • For a continuous random variable $X$ with probability density function $f(x)$, the expectation is defined as: $\mathbb{E}[X] = \int_{-\infty}^{\infty} x \cdot f(x) \, dx$
  • The expectation is calculated by integrating the product of each possible value of $X$ and its corresponding probability density over the entire domain
  • Example: For a standard normal distribution, the expectation is $\mathbb{E}[X] = 0$
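
To illustrate the continuous case, a small sketch (assuming SciPy is installed) evaluates $\int_{-\infty}^{\infty} x \cdot f(x)\,dx$ numerically for the standard normal density; the result should be essentially 0 up to numerical error.

```python
import numpy as np
from scipy.integrate import quad

# Standard normal density f(x)
def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# E[X] = integral of x * f(x) over the whole real line
expectation, abs_error = quad(lambda x: x * f(x), -np.inf, np.inf)
print(expectation)  # approximately 0
```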

Linearity of expectation

  • The expectation operator is linear, meaning that for any constants $a$ and $b$ and random variables $X$ and $Y$: $\mathbb{E}[aX + bY] = a\mathbb{E}[X] + b\mathbb{E}[Y]$
  • Linearity holds regardless of whether $X$ and $Y$ are independent
  • This property simplifies calculations involving sums or linear combinations of random variables
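
The linearity property above can be checked by simulation even when the variables are dependent. A minimal sketch with arbitrarily chosen constants $a = 2$, $b = -3$ and a deliberately dependent pair (these choices are for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, -3.0

# X ~ Normal(1, 2^2); Y = X^2 + noise, so X and Y are clearly dependent
x = rng.normal(loc=1.0, scale=2.0, size=500_000)
y = x**2 + rng.normal(size=500_000)

# Theory: E[X] = 1 and E[Y] = E[X^2] = Var(X) + E[X]^2 = 4 + 1 = 5,
# so E[aX + bY] = 2*1 - 3*5 = -13, despite the dependence between X and Y
print(np.mean(a * x + b * y))  # close to -13
```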

Law of the unconscious statistician

  • The law of the unconscious statistician (LOTUS) states that for a function $g(X)$ of a random variable $X$: $\mathbb{E}[g(X)] = \sum_{x} g(x) \cdot p(x)$ for discrete $X$, and $\mathbb{E}[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f(x) \, dx$ for continuous $X$
  • LOTUS allows the calculation of the expectation of a function of a random variable without explicitly deriving the distribution of $g(X)$
  • Example: To find the expected value of the square of a standard normal random variable, $\mathbb{E}[X^2] = \int_{-\infty}^{\infty} x^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \, dx = 1$
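
The LOTUS example above can be reproduced numerically. This sketch (assuming SciPy) integrates $x^2 f(x)$ against the standard normal density, without ever working out the distribution of $X^2$:

```python
import numpy as np
from scipy.integrate import quad

# LOTUS: E[g(X)] = integral of g(x) * f(x) dx, here with g(x) = x^2
def integrand(x):
    return x**2 * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

second_moment, _ = quad(integrand, -np.inf, np.inf)
print(second_moment)  # approximately 1
```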

Properties of expectation

  • Expectation has several important properties that facilitate calculations and provide insights into the behavior of random variables
  • These properties are essential for deriving and understanding more advanced concepts in probability theory

Non-negativity

  • If a random variable $X$ is non-negative (i.e., $X \geq 0$), then its expectation is also non-negative: $\mathbb{E}[X] \geq 0$
  • This property follows directly from the definition of expectation, as the sum or integral of non-negative values is always non-negative

Monotonicity

  • If two random variables $X$ and $Y$ satisfy $X \leq Y$ (i.e., $P(X \leq Y) = 1$), then their expectations satisfy: $\mathbb{E}[X] \leq \mathbb{E}[Y]$
  • Monotonicity implies that if one random variable is always smaller than or equal to another, its expectation will also be smaller than or equal to the expectation of the other variable

Bounds on expectation

  • The expectation of a random variable $X$ is bounded by its minimum and maximum values: $\min(X) \leq \mathbb{E}[X] \leq \max(X)$
  • This property follows from the monotonicity of expectation and provides a range within which the expected value must lie
  • Example: For a random variable $X$ representing the number of heads in three coin tosses, $0 \leq \mathbb{E}[X] \leq 3$

Conditional expectation

  • Conditional expectation extends the concept of expectation to situations where additional information or conditions are given
  • It allows for the calculation of expected values based on specific events or subsets of the sample space

Definition and properties

  • The conditional expectation of a random variable $X$ given an event $A$ with $P(A) > 0$ is defined as: $\mathbb{E}[X|A] = \frac{\mathbb{E}[X \cdot \mathbf{1}_A]}{P(A)}$, where $\mathbf{1}_A$ is the indicator function of event $A$
  • Conditional expectation satisfies the properties of linearity, non-negativity, and monotonicity, similar to regular expectation
  • Example: In a standard deck of 52 cards, with jack, queen, and king valued 11, 12, and 13, the conditional expected value of a card given that it is a face card is $\mathbb{E}[X|\text{face card}] = \frac{11+12+13}{3} = 12$
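
A small sketch of the face-card example, computing $\mathbb{E}[X \cdot \mathbf{1}_A] / P(A)$ directly over the 52 equally likely cards (with the assumed ranks ace = 1 through king = 13); only the Python standard library is needed.

```python
from fractions import Fraction

# A standard deck: 4 suits, ranks 1 (ace) through 13 (king), all equally likely
ranks = [rank for rank in range(1, 14) for _suit in range(4)]
p_card = Fraction(1, 52)

# Indicator of the event A = "card is a face card" (jack=11, queen=12, king=13)
def is_face(rank):
    return rank >= 11

# E[X * 1_A] and P(A), then the conditional expectation E[X | A]
e_x_indicator = sum(rank * p_card for rank in ranks if is_face(rank))
p_a = sum(p_card for rank in ranks if is_face(rank))
print(e_x_indicator / p_a)  # 12
```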

Tower property

  • The tower property states that for random variables $X$ and $Y$: $\mathbb{E}[\mathbb{E}[X|Y]] = \mathbb{E}[X]$
  • This property allows for the calculation of the unconditional expectation by first conditioning on another random variable and then taking the expectation of the conditional expectation
  • The tower property is particularly useful in situations where the conditional expectation is easier to compute than the unconditional expectation
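
A simulation can make the tower property concrete. In the hypothetical setup sketched below, $Y$ is a Poisson count and, given $Y = y$, $X$ is normal with mean $y$, so $\mathbb{E}[X|Y] = Y$; averaging the conditional means recovers $\mathbb{E}[X]$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

# Y ~ Poisson(4); given Y = y, X ~ Normal(y, 1), so E[X | Y] = Y and E[X] = E[Y] = 4
y = rng.poisson(lam=4.0, size=n)
x = rng.normal(loc=y, scale=1.0)

# Tower property: E[E[X | Y]] = E[X]
print(np.mean(y))  # empirical E[E[X|Y]], close to 4
print(np.mean(x))  # empirical E[X], also close to 4
```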

Law of total expectation

  • The law of total expectation states that for a random variable $X$ and a partition of the sample space $\{A_1, A_2, \ldots, A_n\}$: $\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[X|A_i] \cdot P(A_i)$
  • This law allows for the calculation of the unconditional expectation by conditioning on a partition of the sample space and then summing the products of the conditional expectations and their corresponding probabilities
  • Example: In a factory, the probability of a defective item is 0.1. The cost of a non-defective item is $10, and the cost of a defective item is $50. The expected cost of an item is $\mathbb{E}[X] = \mathbb{E}[X|\text{non-defective}] \cdot P(\text{non-defective}) + \mathbb{E}[X|\text{defective}] \cdot P(\text{defective}) = 10 \cdot 0.9 + 50 \cdot 0.1 = 14$
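
The factory example translates directly into a few lines of Python, with the probabilities and costs taken from the bullet above:

```python
# Partition of the sample space: {non-defective, defective}
p_defective = 0.1
p_ok = 1 - p_defective

# Conditional expected costs on each part of the partition
cost_given_ok = 10.0
cost_given_defective = 50.0

# Law of total expectation: E[X] = sum over parts of E[X | A_i] * P(A_i)
expected_cost = cost_given_ok * p_ok + cost_given_defective * p_defective
print(expected_cost)  # 14.0
```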

Variance and standard deviation

  • Variance and standard deviation are measures of the dispersion or spread of a random variable around its expected value
  • They quantify the degree to which the values of a random variable deviate from the mean

Definition of variance

  • The variance of a random variable $X$ is defined as: $\text{Var}(X) = \mathbb{E}[(X - \mathbb{E}[X])^2]$
  • Variance measures the average squared deviation of a random variable from its expected value
  • It can also be calculated using the formula $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$, which is often more convenient
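
Both formulas for the variance can be checked on the fair-die distribution from earlier; a short sketch assuming NumPy:

```python
import numpy as np

values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

mean = np.sum(values * probs)                    # E[X] = 3.5
var_def = np.sum((values - mean) ** 2 * probs)   # E[(X - E[X])^2]
var_alt = np.sum(values**2 * probs) - mean**2    # E[X^2] - (E[X])^2

print(var_def, var_alt)  # both equal 35/12 ≈ 2.9167
```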

Properties of variance

  • Variance has several important properties:
    • Non-negativity: $\text{Var}(X) \geq 0$ for any random variable $X$
    • Scaling: For a constant $a$, $\text{Var}(aX) = a^2 \text{Var}(X)$
    • Additivity for independent variables: If $X$ and $Y$ are independent, then $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
  • These properties are useful for calculating and manipulating variances of random variables
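
The scaling and additivity properties are easy to confirm by simulation. A minimal sketch with independent $X$ and $Y$ of known variance and an arbitrarily chosen constant $a = 3$ (all choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
a = 3.0

# Independent variables with known variances: Var(X) = 4, Var(Y) = 1
x = rng.exponential(scale=2.0, size=n)
y = rng.normal(loc=0.0, scale=1.0, size=n)

print(np.var(a * x))   # close to a^2 * Var(X) = 9 * 4 = 36
print(np.var(x + y))   # close to Var(X) + Var(Y) = 4 + 1 = 5
```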

Standard deviation

  • The standard deviation of a random variable $X$ is defined as the square root of its variance: $\sigma_X = \sqrt{\text{Var}(X)}$
  • Standard deviation has the same units as the random variable and provides a more interpretable measure of dispersion
  • Example: For a normal distribution with mean $\mu$ and variance $\sigma^2$, approximately 68% of the values lie within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations
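
The 68–95–99.7 figures quoted above can be recovered from the normal CDF; a short sketch assuming SciPy's scipy.stats.norm:

```python
from scipy.stats import norm

# Probability that a normal variable lies within k standard deviations of its mean:
# P(|X - mu| <= k*sigma) = Phi(k) - Phi(-k), independent of mu and sigma
for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)
    print(k, round(prob, 4))   # ~0.6827, ~0.9545, ~0.9973
```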

Coefficient of variation

  • The coefficient of variation (CV) is a dimensionless measure of relative dispersion, defined as: $CV = \frac{\sigma_X}{\mathbb{E}[X]}$
  • CV allows for the comparison of the relative variability of random variables with different scales or units
  • A higher CV indicates greater relative dispersion, while a lower CV suggests less relative variability
  • Example: Comparing the variability of stock returns, a stock with a mean return of 10% and a standard deviation of 5% has a CV of 0.5, while a stock with a mean return of 5% and a standard deviation of 5% has a CV of 1, indicating higher relative variability

Covariance and correlation

  • Covariance and correlation are measures of the linear relationship between two random variables
  • They quantify the degree to which two variables vary together or are associated with each other

Definition of covariance

  • The covariance between two random variables $X$ and $Y$ is defined as: $\text{Cov}(X, Y) = \mathbb{E}[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])]$
  • Covariance measures the average product of the deviations of two random variables from their respective means
  • A positive covariance indicates that the variables tend to move in the same direction, while a negative covariance suggests they tend to move in opposite directions

Properties of covariance

  • Covariance has several important properties:
    • Symmetry: $\text{Cov}(X, Y) = \text{Cov}(Y, X)$
    • Linearity: For constants $a$ and $b$, $\text{Cov}(aX + b, Y) = a \, \text{Cov}(X, Y)$
    • Relationship to variance: $\text{Cov}(X, X) = \text{Var}(X)$
  • These properties are useful for calculating and manipulating covariances of random variables

Correlation coefficient

  • The correlation coefficient between two random variables $X$ and $Y$ is defined as: $\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$
  • The correlation coefficient is a dimensionless measure of the linear relationship between two variables, ranging from -1 to 1
  • A correlation of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 suggests no linear relationship
  • Example: The correlation between the returns of two stocks can be used to assess the degree to which they move together in the market
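
A small sketch pulling the covariance and correlation definitions together: it simulates two dependent variables (the relationship $Y = 0.8X + \text{noise}$ is chosen arbitrarily for illustration), estimates both quantities from the definitions, and cross-checks against NumPy's built-in estimators.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Two positively related variables: Y = 0.8*X + noise
x = rng.normal(size=n)
y = 0.8 * x + 0.6 * rng.normal(size=n)

# Covariance and correlation straight from the definitions
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho_xy = cov_xy / (x.std() * y.std())
print(cov_xy, rho_xy)          # close to 0.8 and 0.8

# Cross-check against NumPy's estimators
print(np.cov(x, y, ddof=0)[0, 1], np.corrcoef(x, y)[0, 1])
```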

Cauchy-Schwarz inequality

  • The Cauchy-Schwarz inequality states that for any two random variables $X$ and $Y$: $(\mathbb{E}[XY])^2 \leq \mathbb{E}[X^2] \, \mathbb{E}[Y^2]$
  • Applied to the centered variables $X - \mathbb{E}[X]$ and $Y - \mathbb{E}[Y]$, it provides an upper bound on the absolute value of the covariance: $|\text{Cov}(X, Y)| \leq \sigma_X \sigma_Y$
  • The inequality becomes an equality if and only if $X$ and $Y$ are linearly dependent (i.e., one is almost surely a constant multiple of the other, $Y = cX$)
  • The Cauchy-Schwarz inequality is useful for proving various results in probability theory and statistics
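
A quick empirical sanity check of the inequality: the sketch below estimates both sides by simulation for a pair of dependent variables (chosen arbitrarily for illustration), so the left side should never exceed the right.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500_000

x = rng.normal(size=n)
y = x + rng.uniform(-1.0, 1.0, size=n)   # Y depends on X

lhs = np.mean(x * y) ** 2                # (E[XY])^2, close to 1
rhs = np.mean(x**2) * np.mean(y**2)      # E[X^2] * E[Y^2], close to 4/3
print(lhs, rhs, lhs <= rhs)              # lhs <= rhs
```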

Moment-generating functions

  • Moment-generating functions (MGFs) are a powerful tool for characterizing the probability distribution of a random variable
  • They provide a way to calculate moments of a distribution and can be used to derive various properties and results

Definition and properties

  • The moment-generating function of a random variable $X$ is defined as: $M_X(t) = \mathbb{E}[e^{tX}]$
  • MGFs have several important properties:
    • Uniqueness: If two random variables have the same MGF, they have the same probability distribution
    • Linearity: For constants $a$ and $b$, $M_{aX+b}(t) = e^{bt} M_X(at)$
    • Independence: If $X$ and $Y$ are independent, then $M_{X+Y}(t) = M_X(t) M_Y(t)$
  • These properties make MGFs a valuable tool for working with probability distributions

Relationship to expectation and variance

  • The moments of a random variable $X$ can be obtained by differentiating its MGF:
    • $\mathbb{E}[X] = M'_X(0)$
    • $\mathbb{E}[X^2] = M''_X(0)$
    • $\text{Var}(X) = M''_X(0) - (M'_X(0))^2$
  • Higher-order moments can be obtained by taking higher-order derivatives of the MGF
  • This relationship allows for the calculation of moments directly from the MGF without explicitly deriving the probability distribution
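
These moment relations can be verified symbolically. The sketch below (assuming SymPy) starts from the standard formula $M_X(t) = \lambda/(\lambda - t)$ for an exponential distribution with rate $\lambda$ and recovers its mean $1/\lambda$ and variance $1/\lambda^2$ by differentiating at $t = 0$.

```python
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)

# MGF of an Exponential(lam) variable (valid for t < lam): M_X(t) = lam / (lam - t)
mgf = lam / (lam - t)

mean = sp.diff(mgf, t).subs(t, 0)                 # M'_X(0) = 1/lam
second_moment = sp.diff(mgf, t, 2).subs(t, 0)     # M''_X(0) = 2/lam^2
variance = sp.simplify(second_moment - mean**2)   # 1/lam^2
print(mean, variance)
```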

Uniqueness and existence

  • If a random variable has a valid MGF, it uniquely determines its probability distribution
  • However, not all random variables have a valid MGF (e.g., Cauchy distribution)
  • The existence of an MGF depends on the behavior of the random variable's tails
  • MGFs exist for random variables with light tails (e.g., normal distribution) but may not exist for heavy-tailed distributions

Applications in probability calculations

  • MGFs can be used to derive the distribution of sums of independent random variables
  • They are particularly useful for deriving the distribution of linear combinations of independent random variables from well-known distributions (e.g., sum of independent normal random variables)
  • MGFs can also be used to prove various limit theorems, such as the central limit theorem and the law of large numbers
  • Example: The MGF of a standard normal random variable is $M_X(t) = e^{t^2/2}$, which can be used to show that the sum of independent standard normal random variables follows a normal distribution with variance equal to the number of terms in the sum
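
The example can be worked symbolically for a concrete case: multiplying three copies of $M_X(t) = e^{t^2/2}$ gives $e^{3t^2/2}$, which is the MGF of a mean-zero normal distribution with variance 3. A minimal SymPy sketch:

```python
import sympy as sp

t = sp.symbols('t', real=True)

mgf_standard_normal = sp.exp(t**2 / 2)

# Independence: the MGF of S_3 = X_1 + X_2 + X_3 is the product of three copies
mgf_sum = mgf_standard_normal**3
print(sp.simplify(mgf_sum))  # exp(3*t**2/2), the MGF of a Normal(0, 3) variable
```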

Inequalities involving expectation and variance

  • Several important inequalities relate the expectation and variance of random variables to their probability distributions
  • These inequalities provide bounds on the probability of events based on the moments of the random variables involved

Markov's inequality

  • Markov's inequality states that for a non-negative random variable $X$ and any $a > 0$: $P(X \geq a) \leq \frac{\mathbb{E}[X]}{a}$
  • This inequality provides an upper bound on the probability that a non-negative random variable exceeds a certain value
  • Markov's inequality is often used as a first step in deriving more powerful inequalities
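
Markov's inequality is easy to see in action. The sketch below compares the exact tail probability of an exponential variable with mean 1 (for which $P(X \geq a) = e^{-a}$) to the Markov bound $\mathbb{E}[X]/a$; the thresholds are chosen arbitrarily.

```python
import numpy as np

# X ~ Exponential with mean 1: E[X] = 1 and P(X >= a) = exp(-a)
for a in (1.0, 2.0, 4.0):
    exact_tail = np.exp(-a)
    markov_bound = 1.0 / a              # E[X] / a
    print(a, exact_tail, markov_bound)  # the bound always dominates the exact tail
```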

Chebyshev's inequality

  • Chebyshev's inequality states that for a random variable $X$ with finite mean $\mu$ and variance $\sigma^2$, and any $k > 0$: $P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$
  • This inequality provides an upper bound on the probability that a random variable deviates from its mean by more than a certain number of standard deviations
  • Chebyshev's inequality is typically sharper than Markov's inequality because it uses the variance as well as the mean, and it applies to any random variable with finite variance, not just non-negative ones
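
A companion sketch for Chebyshev's inequality, comparing the exact two-sided tail of a standard normal variable with the $1/k^2$ bound (the values of $k$ are chosen for illustration):

```python
from scipy.stats import norm

# For X ~ Normal(0, 1): P(|X - mu| >= k*sigma) = 2 * (1 - Phi(k))
for k in (1.5, 2.0, 3.0):
    exact_tail = 2 * (1 - norm.cdf(k))
    chebyshev_bound = 1 / k**2
    print(k, round(exact_tail, 4), round(chebyshev_bound, 4))  # bound >= exact tail
```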

Chernoff bounds

  • Chernoff bounds are a family of inequalities that provide exponentially decaying upper bounds on the probability of a sum of independent random variables deviating from its expected value
  • For a sum $S_n = \sum_{i=1}^n X_i$ of independent random variables $X_i$ with $0 \leq X_i \leq 1$ and any $\varepsilon > 0$: $P(S_n - \mathbb{E}[S_n] \geq \varepsilon) \leq e^{-2\varepsilon^2/n}$ and $P(S_n - \mathbb{E}[S_n] \leq -\varepsilon) \leq e^{-2\varepsilon^2/n}$
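
Finally, a simulation sketch of this bound for Bernoulli(0.5) summands, which satisfy $0 \leq X_i \leq 1$: it estimates $P(S_n - \mathbb{E}[S_n] \geq \varepsilon)$ empirically and compares it with $e^{-2\varepsilon^2/n}$ (the values of $n$, $\varepsilon$, and the number of trials are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100                      # number of Bernoulli(0.5) summands in each sum
trials = 200_000             # number of simulated sums
eps = 10.0                   # deviation threshold

# S_n = sum of n independent Bernoulli(0.5) variables, so E[S_n] = n/2;
# drawing S_n directly as a Binomial(n, 0.5) variable is equivalent
s_n = rng.binomial(n, 0.5, size=trials)
empirical_tail = np.mean(s_n - n / 2 >= eps)
bound = np.exp(-2 * eps**2 / n)

print(empirical_tail, bound)  # the empirical tail probability sits below the bound
```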