Expectation and variance are key concepts in probability theory. They describe the average behavior and the spread of random variables, and they are essential for analyzing and predicting outcomes in uncertain situations.
Expectation quantifies the average value, while variance measures dispersion. Together, they provide a foundation for more advanced probability concepts. Understanding these ideas is crucial for tackling complex problems in stochastic processes and statistical analysis.
Definition of expectation
Expectation is a fundamental concept in probability theory that quantifies the average value of a random variable
It provides a measure of the central tendency or long-run average behavior of a random variable
Expectation is denoted by the symbol E[X] for a random variable X
Discrete random variables
For a discrete random variable X with probability mass function p(x), the expectation is defined as:
$E[X] = \sum_{x} x \cdot p(x)$
The expectation is calculated by summing the product of each possible value of X and its corresponding probability
Example: For a fair six-sided die, the expectation of the number shown is $E[X] = \frac{1+2+3+4+5+6}{6} = 3.5$
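The defining sum can be computed directly; a minimal sketch for the die example (using exact rational arithmetic to avoid floating-point noise):

```python
from fractions import Fraction

# Expectation of a discrete random variable: sum of each value times its
# probability, computed exactly for a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # uniform probability mass function
expectation = sum(x * p for x, p in pmf.items())
print(float(expectation))  # 3.5
```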
Continuous random variables
For a continuous random variable X with probability density function f(x), the expectation is defined as:
$E[X] = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$
The expectation is calculated by integrating the product of each possible value of X and its corresponding probability density over the entire domain
Example: For a standard normal distribution, the expectation is $E[X] = 0$
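The integral can be approximated numerically; a sketch using a midpoint Riemann sum, under the assumption that the tails beyond ±10 contribute negligibly for the standard normal density:

```python
import math

# E[X] = integral of x * f(x) dx, approximated with a midpoint Riemann sum
# over a truncated range (assumption: tails beyond +/-10 are negligible).
def std_normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

lo, hi, n = -10.0, 10.0, 100_000
dx = (hi - lo) / n
mean = sum((lo + (i + 0.5) * dx) * std_normal_pdf(lo + (i + 0.5) * dx) * dx
           for i in range(n))
print(abs(mean) < 1e-6)  # True: the integrand x*f(x) is odd, so the integral is 0
```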
Linearity of expectation
The expectation operator is linear, meaning that for any constants a and b and random variables X and Y:
$E[aX + bY] = aE[X] + bE[Y]$
Linearity holds regardless of whether X and Y are independent or not
This property simplifies calculations involving sums or linear combinations of random variables
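The sample-mean analogue of this identity can be checked on simulated data; here Y is deliberately constructed to depend on X (an illustrative construction, not from the original text) to show that independence is not required:

```python
import random

random.seed(0)
# Sample-mean analogue of linearity: mean(a*X + b*Y) = a*mean(X) + b*mean(Y),
# even though Y is built to depend on X.
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x * random.random() for x in xs]  # Y deliberately depends on X
a, b = 2.0, 3.0
lhs = sum(a * x + b * y for x, y in zip(xs, ys)) / n
rhs = a * (sum(xs) / n) + b * (sum(ys) / n)
print(abs(lhs - rhs) < 1e-6)  # True: linearity needs no independence
```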
Law of the unconscious statistician
The law of the unconscious statistician (LOTUS) states that for a function g(X) of a random variable X:
$E[g(X)] = \sum_x g(x) \cdot p(x)$ for discrete $X$, and $E[g(X)] = \int_{-\infty}^{\infty} g(x) \cdot f(x)\,dx$ for continuous $X$
LOTUS allows the calculation of the expectation of a function of a random variable without explicitly deriving its distribution
Example: To find the expected value of the square of a standard normal random variable, $E[X^2] = \int_{-\infty}^{\infty} x^2 \cdot \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx = 1$
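The same quantity can be estimated by simulation in the spirit of LOTUS, averaging $g(X) = X^2$ over samples without ever deriving the distribution of $X^2$:

```python
import random

random.seed(1)
# LOTUS via simulation: estimate E[g(X)] = E[X^2] for standard normal X
# by averaging g over samples; the distribution of X^2 is never needed.
n = 200_000
second_moment = sum(random.gauss(0, 1) ** 2 for _ in range(n)) / n
print(round(second_moment, 1))  # ≈ 1.0
```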
Properties of expectation
Expectation has several important properties that facilitate calculations and provide insights into the behavior of random variables
These properties are essential for deriving and understanding more advanced concepts in probability theory
Non-negativity
If a random variable X is non-negative (i.e., X≥0), then its expectation is also non-negative:
E[X]≥0
This property follows directly from the definition of expectation, as the sum or integral of non-negative values is always non-negative
Monotonicity
If two random variables X and Y satisfy X≤Y (i.e., P(X≤Y)=1), then their expectations satisfy:
E[X]≤E[Y]
Monotonicity implies that if one random variable is always smaller than or equal to another, its expectation will also be smaller than or equal to the expectation of the other variable
Bounds on expectation
The expectation of a random variable X is bounded by its minimum and maximum values:
min(X)≤E[X]≤max(X)
This property follows from the monotonicity of expectation and provides a range within which the expected value must lie
Example: For a random variable X representing the number of heads in three coin tosses, 0≤E[X]≤3
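The coin-toss example can be computed from the binomial pmf, confirming the expectation falls inside the stated bounds:

```python
from math import comb

# E[X] for X = number of heads in three fair coin tosses, via the
# binomial pmf; the result must lie between min(X) = 0 and max(X) = 3.
n_tosses, p = 3, 0.5
pmf = {k: comb(n_tosses, k) * p ** k * (1 - p) ** (n_tosses - k)
       for k in range(n_tosses + 1)}
expected_heads = sum(k * q for k, q in pmf.items())
print(expected_heads)  # 1.5
assert 0 <= expected_heads <= 3
```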
Conditional expectation
Conditional expectation extends the concept of expectation to situations where additional information or conditions are given
It allows for the calculation of expected values based on specific events or subsets of the sample space
Definition and properties
The conditional expectation of a random variable X given an event A with P(A)>0 is defined as:
$E[X \mid A] = \frac{E[X \cdot 1_A]}{P(A)}$, where $1_A$ is the indicator function of event A
Conditional expectation satisfies the properties of linearity, non-negativity, and monotonicity, similar to regular expectation
Example: In a standard deck of 52 cards, with face cards valued jack = 11, queen = 12, and king = 13, the conditional expected value of a card given that it is a face card is $E[X \mid \text{face card}] = \frac{11 + 12 + 13}{3} = 12$
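The definition can be applied mechanically to the deck example; a sketch assuming ranks are numbered ace = 1 through king = 13 (the ace numbering is an assumption for illustration):

```python
from fractions import Fraction

# E[X | A] = E[X * 1_A] / P(A), for a 52-card deck with jack = 11,
# queen = 12, king = 13 (ace = 1 is an assumed numbering).
deck = [rank for rank in range(1, 14) for _suit in range(4)]
face = {11, 12, 13}
p_face = Fraction(sum(r in face for r in deck), len(deck))           # 12/52
e_times_indicator = Fraction(sum(r for r in deck if r in face), len(deck))
cond_expectation = e_times_indicator / p_face
print(cond_expectation)  # 12
```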
Tower property
The tower property states that for random variables X and Y:
E[E[X∣Y]]=E[X]
This property allows for the calculation of the unconditional expectation by first conditioning on another random variable and then taking the expectation of the conditional expectation
The tower property is particularly useful in situations where the conditional expectation is easier to compute than the unconditional expectation
Law of total expectation
The law of total expectation states that for a random variable X and a partition of the sample space $\{A_1, A_2, \ldots, A_n\}$:
$E[X] = \sum_{i=1}^{n} E[X \mid A_i] \cdot P(A_i)$
This law allows for the calculation of the unconditional expectation by conditioning on a partition of the sample space and then summing the products of the conditional expectations and their corresponding probabilities
Example: In a factory, the probability of a defective item is 0.1. The cost of a non-defective item is $10, and the cost of a defective item is $50. The expected cost of an item is $E[X] = E[X \mid \text{non-defective}] \cdot P(\text{non-defective}) + E[X \mid \text{defective}] \cdot P(\text{defective}) = 10 \cdot 0.9 + 50 \cdot 0.1 = 14$
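The factory calculation, as a small sketch of summing conditional expectations weighted by partition probabilities:

```python
# Law of total expectation over the partition {non-defective, defective}:
# E[X] = sum of E[X | A_i] * P(A_i), using the costs from the example.
cases = {"non-defective": (10.0, 0.9), "defective": (50.0, 0.1)}
expected_cost = sum(cost * prob for cost, prob in cases.values())
print(expected_cost)  # 14.0
```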
Variance and standard deviation
Variance and standard deviation are measures of the dispersion or spread of a random variable around its expected value
They quantify the degree to which the values of a random variable deviate from the mean
Definition of variance
The variance of a random variable X is defined as:
$\mathrm{Var}(X) = E[(X - E[X])^2]$
Variance measures the average squared deviation of a random variable from its expected value
It can also be calculated using the formula $\mathrm{Var}(X) = E[X^2] - (E[X])^2$, which is often more convenient
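Both formulas can be verified to agree; a sketch for the fair die, again in exact arithmetic:

```python
from fractions import Fraction

# Both variance formulas for a fair die, computed exactly:
# the definition E[(X - E[X])^2] and the shortcut E[X^2] - (E[X])^2.
p = Fraction(1, 6)
values = range(1, 7)
mean = sum(x * p for x in values)                       # 7/2
var_definition = sum((x - mean) ** 2 * p for x in values)
var_shortcut = sum(x * x * p for x in values) - mean ** 2
print(var_definition, var_shortcut)  # 35/12 35/12
```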
Properties of variance
Variance has several important properties:
Non-negativity: Var(X)≥0 for any random variable X
Scaling: For a constant a, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
Additivity for independent variables: If X and Y are independent, then Var(X+Y)=Var(X)+Var(Y)
These properties are useful for calculating and manipulating variances of random variables
Standard deviation
The standard deviation of a random variable X is defined as the square root of its variance:
$\sigma_X = \sqrt{\mathrm{Var}(X)}$
Standard deviation has the same units as the random variable and provides a more interpretable measure of dispersion
Example: For a normal distribution with mean μ and variance σ2, approximately 68% of the values lie within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations
Coefficient of variation
The coefficient of variation (CV) is a dimensionless measure of relative dispersion, defined as:
$\mathrm{CV} = \frac{\sigma_X}{E[X]}$
CV allows for the comparison of the relative variability of random variables with different scales or units
A higher CV indicates greater relative dispersion, while a lower CV suggests less relative variability
Example: Comparing the variability of stock returns, a stock with a mean return of 10% and a standard deviation of 5% has a CV of 0.5, while a stock with a mean return of 5% and a standard deviation of 5% has a CV of 1, indicating higher relative variability
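The stock comparison reduces to one division per stock; a minimal sketch:

```python
# CV = sigma / E[X] for the two stocks in the example: equal standard
# deviations, different means, hence different relative variability.
def coefficient_of_variation(mean, std):
    return std / mean

print(coefficient_of_variation(0.10, 0.05))  # 0.5
print(coefficient_of_variation(0.05, 0.05))  # 1.0
```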
Covariance and correlation
Covariance and correlation are measures of the linear relationship between two random variables
They quantify the degree to which two variables vary together or are associated with each other
Definition of covariance
The covariance between two random variables X and Y is defined as:
Cov(X,Y)=E[(X−E[X])(Y−E[Y])]
Covariance measures the average product of the deviations of two random variables from their respective means
A positive covariance indicates that the variables tend to move in the same direction, while a negative covariance suggests they tend to move in opposite directions
Properties of covariance
Covariance has several important properties:
Symmetry: Cov(X,Y)=Cov(Y,X)
Linearity: For constants a and b, Cov(aX+b,Y)=aCov(X,Y)
Relationship to variance: Cov(X,X)=Var(X)
These properties are useful for calculating and manipulating covariances of random variables
Correlation coefficient
The correlation coefficient between two random variables X and Y is defined as:
$\rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
The correlation coefficient is a dimensionless measure of the linear relationship between two variables, ranging from -1 to 1
A correlation of 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 suggests no linear relationship
Example: The correlation between the returns of two stocks can be used to assess the degree to which they move together in the market
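The definition can be estimated from data; a sketch on synthetic samples where Y = 0.8·X + 0.6·noise (an illustrative construction, not from the original text), so the true correlation is 0.8:

```python
import math
import random

random.seed(2)
# Sample correlation as an estimate of rho = Cov(X, Y) / (sigma_X * sigma_Y).
# With Y = 0.8*X + 0.6*noise and unit-variance pieces, Var(Y) = 1 and rho = 0.8.
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.8 * x + 0.6 * random.gauss(0, 1) for x in xs]
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
rho = cov / (sx * sy)
print(round(rho, 2))  # ≈ 0.8
```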
Cauchy-Schwarz inequality
The Cauchy-Schwarz inequality states that for any two random variables X and Y:
$(E[XY])^2 \leq E[X^2]\,E[Y^2]$
Applied to the centered variables $X - E[X]$ and $Y - E[Y]$, this inequality gives $|\mathrm{Cov}(X,Y)| \leq \sigma_X \sigma_Y$, an upper bound on the absolute value of the covariance
The inequality becomes an equality if and only if X and Y are linearly dependent (i.e., Y=aX+b for some constants a and b)
The Cauchy-Schwarz inequality is useful for proving various results in probability theory and statistics
Moment-generating functions
Moment-generating functions (MGFs) are a powerful tool for characterizing the probability distribution of a random variable
They provide a way to calculate moments of a distribution and can be used to derive various properties and results
Definition and properties
The moment-generating function of a random variable X is defined as:
$M_X(t) = E[e^{tX}]$
MGFs have several important properties:
Uniqueness: If two random variables have the same MGF, they have the same probability distribution
Affine transformation: For constants a and b, $M_{aX+b}(t) = e^{bt} M_X(at)$
Independence: If X and Y are independent, then $M_{X+Y}(t) = M_X(t)\,M_Y(t)$
These properties make MGFs a valuable tool for working with probability distributions
Relationship to expectation and variance
The moments of a random variable X can be obtained by differentiating its MGF:
$E[X] = M_X'(0)$
$E[X^2] = M_X''(0)$
$\mathrm{Var}(X) = M_X''(0) - (M_X'(0))^2$
Higher-order moments can be obtained by taking higher-order derivatives of the MGF
This relationship allows for the calculation of moments directly from the MGF without explicitly deriving the probability distribution
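These derivative relations can be checked numerically with finite differences; a sketch using the standard normal MGF $M(t) = e^{t^2/2}$:

```python
import math

# Numerical check that MGF derivatives at 0 give the moments, using the
# standard normal MGF M(t) = exp(t^2 / 2) and central finite differences.
M = lambda t: math.exp(t * t / 2)
h = 1e-4
first_moment = (M(h) - M(-h)) / (2 * h)               # approximates E[X] = 0
second_moment = (M(h) - 2 * M(0) + M(-h)) / h ** 2    # approximates E[X^2] = 1
variance = second_moment - first_moment ** 2          # approximates Var(X) = 1
print(round(first_moment, 6), round(second_moment, 4))
```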
Uniqueness and existence
If a random variable has a valid MGF, it uniquely determines its probability distribution
However, not all random variables have a valid MGF (e.g., Cauchy distribution)
The existence of an MGF depends on the behavior of the random variable's tails
MGFs exist for random variables with light tails (e.g., normal distribution) but may not exist for heavy-tailed distributions
Applications in probability calculations
MGFs can be used to derive the distribution of sums of independent random variables
They are particularly useful for deriving the distribution of linear combinations of independent random variables from well-known distributions (e.g., sum of independent normal random variables)
MGFs can also be used to prove various limit theorems, such as the central limit theorem and the law of large numbers
Example: The MGF of a standard normal random variable is $M_X(t) = e^{t^2/2}$, which can be used to show that the sum of n independent standard normal random variables follows a normal distribution with mean 0 and variance n
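The independence property behind this example can be verified pointwise: the product of two standard normal MGFs equals the MGF of N(0, 2), so the variances add.

```python
import math

# Independence property of MGFs: M_{X+Y}(t) = M_X(t) * M_Y(t). For two
# independent standard normals, exp(t^2/2)^2 = exp(t^2), the MGF of N(0, 2).
M_std = lambda t: math.exp(t * t / 2)   # MGF of N(0, 1)
M_sum = lambda t: math.exp(t * t)       # MGF of N(0, 2)
ok = all(abs(M_std(t) ** 2 - M_sum(t)) < 1e-9
         for t in (-1.5, -0.3, 0.0, 0.7, 1.2))
print(ok)  # True
```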
Inequalities involving expectation and variance
Several important inequalities relate the expectation and variance of random variables to their probability distributions
These inequalities provide bounds on the probability of events based on the moments of the random variables involved
Markov's inequality
Markov's inequality states that for a non-negative random variable X and any a>0:
$P(X \geq a) \leq \frac{E[X]}{a}$
This inequality provides an upper bound on the probability that a non-negative random variable exceeds a certain value
Markov's inequality is often used as a first step in deriving more powerful inequalities
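The bound can be checked empirically; a sketch using an exponential variable with mean 1 (an arbitrary non-negative choice for illustration):

```python
import random

random.seed(3)
# Empirical check of Markov's inequality P(X >= a) <= E[X] / a for a
# non-negative variable (exponential with mean 1, an arbitrary choice).
n, a = 200_000, 3.0
xs = [random.expovariate(1.0) for _ in range(n)]
sample_mean = sum(xs) / n
tail = sum(x >= a for x in xs) / n
print(tail <= sample_mean / a)  # True: observed ~0.05 vs bound ~0.33
```

The bound is loose here (the true tail is $e^{-3} \approx 0.05$ against a bound of about 1/3), which is typical: Markov's inequality trades tightness for generality.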
Chebyshev's inequality
Chebyshev's inequality states that for a random variable X with finite mean $\mu$ and variance $\sigma^2$, and any $k > 0$:
$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$
This inequality provides an upper bound on the probability that a random variable deviates from its mean by more than a certain number of standard deviations
Chebyshev's inequality follows from applying Markov's inequality to the non-negative variable $(X - \mu)^2$; unlike Markov's inequality it requires no non-negativity, applying to any random variable with finite variance and bounding deviations on both sides of the mean
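A quick empirical check of the bound on a Uniform(0, 1) variable (an arbitrary choice; its mean is 1/2 and variance 1/12):

```python
import math
import random

random.seed(4)
# Empirical check of Chebyshev's inequality P(|X - mu| >= k*sigma) <= 1/k^2
# for a Uniform(0, 1) variable (mu = 1/2, sigma = sqrt(1/12)).
n, k = 200_000, 1.5
mu, sigma = 0.5, math.sqrt(1 / 12)
xs = [random.random() for _ in range(n)]
tail = sum(abs(x - mu) >= k * sigma for x in xs) / n
print(tail, 1 / k ** 2)  # observed ~0.13 vs bound ~0.44
```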
Chernoff bounds
Chernoff bounds are a family of inequalities that provide exponentially decaying upper bounds on the probability of a sum of independent random variables deviating from its expected value
For a sum $S_n = \sum_{i=1}^{n} X_i$ of independent random variables $X_i$ with $X_i \in [0, 1]$ and any $\varepsilon > 0$ (the Hoeffding form of the bound):
$P(S_n - E[S_n] \geq \varepsilon) \leq e^{-2\varepsilon^2/n}$
$P(S_n - E[S_n] \leq -\varepsilon) \leq e^{-2\varepsilon^2/n}$ holds symmetrically for the lower tail
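The exponential decay can be observed by simulation; a sketch using fair Bernoulli trials (an arbitrary illustrative choice of bounded $X_i$):

```python
import math
import random

random.seed(5)
# Empirical check of the Chernoff/Hoeffding tail bound
# P(S_n - E[S_n] >= eps) <= exp(-2 * eps^2 / n) for X_i in [0, 1]
# (fair Bernoulli trials here, an arbitrary illustrative choice).
n, eps, trials = 100, 15.0, 20_000
bound = math.exp(-2 * eps ** 2 / n)   # e^{-4.5}, about 0.011
exceed = sum(
    sum(random.random() < 0.5 for _ in range(n)) - n / 2 >= eps
    for _ in range(trials)
)
print(exceed / trials, round(bound, 3))  # observed rate is well under the bound
```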