Statistical Inference Unit 13 – Asymptotic Theory & Large Sample Inference

Asymptotic theory explores how statistical estimators behave as sample sizes approach infinity. It's crucial for understanding the reliability and efficiency of statistical methods in large datasets, providing a foundation for hypothesis testing and confidence interval construction. Key concepts include consistency, efficiency, and asymptotic normality of estimators. These principles allow researchers to make inferences about population parameters using large samples, even when the exact distribution of an estimator is unknown or complex to derive.

Key Concepts and Definitions

  • Asymptotic theory studies the behavior of estimators and statistical procedures as the sample size approaches infinity
  • Consistency of an estimator means that as the sample size increases, the estimator converges in probability to the true parameter value (illustrated in the simulation sketch after this list)
  • Efficiency of an estimator refers to its variance relative to other estimators, with more efficient estimators having smaller variances
    • An estimator is asymptotically efficient if its variance achieves the Cramér-Rao lower bound as the sample size tends to infinity
  • Asymptotic normality implies that the distribution of an estimator, properly standardized, converges to a standard normal distribution as the sample size increases
  • Asymptotic unbiasedness indicates that the bias of an estimator tends to zero as the sample size grows large
  • Asymptotic equivalence of two sequences of random variables means that their difference converges in probability to zero as the sample size increases
  • Asymptotic relative efficiency (ARE) compares the efficiency of two estimators in the limit, calculated as the ratio of their asymptotic variances
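
A minimal simulation sketch of the first two ideas above (the normal population with mean 2 and standard deviation 3 is an arbitrary illustrative choice): the sample mean drifts toward the true mean as $n$ grows (consistency), and $\sqrt{n}(\bar{X} - \mu)$ has spread close to $\sigma$ (asymptotic normality).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0  # hypothetical population mean and standard deviation

# Consistency: the sample mean approaches mu as n grows
for n in [10, 100, 10_000]:
    xbar = rng.normal(mu, sigma, size=n).mean()
    print(f"n={n:>6}: sample mean = {xbar:.4f}, |error| = {abs(xbar - mu):.4f}")

# Asymptotic normality: sqrt(n) * (Xbar - mu) has standard deviation near sigma
n, reps = 1_000, 5_000
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbars - mu)
print("sd of sqrt(n)*(Xbar - mu):", round(z.std(), 3), "vs sigma =", sigma)
```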

Foundations of Asymptotic Theory

  • Asymptotic theory relies on the concept of limits and convergence of sequences of random variables
  • Convergence in probability means that for any $\epsilon > 0$, $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$
    • This is a weak form of convergence, as it only requires the probability of large deviations to vanish asymptotically
  • Almost sure convergence (or convergence with probability 1) is a stronger form of convergence, implying that $P(\lim_{n \to \infty} X_n = X) = 1$
  • Convergence in distribution (or weak convergence) means that the cumulative distribution function (CDF) of $X_n$ converges to the CDF of $X$ at all continuity points of the latter
    • This is denoted as $X_n \xrightarrow{d} X$
  • Convergence in quadratic mean (or $L^2$ convergence) requires that $E[(X_n - X)^2] \to 0$ as $n \to \infty$, which implies convergence in probability
  • Slutsky's theorem allows for the manipulation of sequences of random variables that converge in probability or distribution
    • For example, if $X_n \xrightarrow{p} a$ and $Y_n \xrightarrow{d} Y$, then $X_n Y_n \xrightarrow{d} aY$ (see the simulation sketch after this list)
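
A small simulation sketch of Slutsky's theorem in its most common use, the studentized mean (the exponential population and the sample sizes are illustrative assumptions): since $S_n \xrightarrow{p} \sigma$ and $\sqrt{n}(\bar{X} - \mu)/\sigma \xrightarrow{d} N(0,1)$, the theorem gives $\sqrt{n}(\bar{X} - \mu)/S_n \xrightarrow{d} N(0,1)$ even though the denominator is random.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = 1.0                      # exponential mean; for this population sigma = mu
n, reps = 2_000, 4_000

x = rng.exponential(mu, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)          # S_n ->p sigma  (convergence in probability)
t = np.sqrt(n) * (xbar - mu) / s   # Slutsky: ->d N(0, 1) despite the random S_n

# Kolmogorov-Smirnov distance from the standard normal should be small
print("KS distance from N(0,1):", round(stats.kstest(t, "norm").statistic, 4))
```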

Convergence Types and Properties

  • Convergence in probability is preserved under continuous transformations: if $X_n \xrightarrow{p} X$ and $g$ is a continuous function, then $g(X_n) \xrightarrow{p} g(X)$
  • The same holds for convergence in distribution: if $X_n \xrightarrow{d} X$ and $g$ is continuous, then $g(X_n) \xrightarrow{d} g(X)$
  • The continuous mapping theorem (also called the Mann-Wald theorem) unifies these properties, covering convergence in probability, in distribution, and almost surely
  • The converging-together lemma states that if $X_n \xrightarrow{d} X$ and $Y_n - X_n \xrightarrow{p} 0$, then $Y_n \xrightarrow{d} X$
    • This is useful for proving the asymptotic equivalence of two estimators: two estimators whose difference vanishes in probability share the same asymptotic distribution
  • The delta method approximates the distribution of a transformed random variable using a Taylor series expansion
    • If $\sqrt{n}(X_n - \mu) \xrightarrow{d} N(0, \sigma^2)$ and $g$ is a differentiable function with $g'(\mu) \neq 0$, then $\sqrt{n}(g(X_n) - g(\mu)) \xrightarrow{d} N(0, \sigma^2[g'(\mu)]^2)$ (see the delta-method sketch after this list)
  • The Cramér-Wold device relates the joint convergence in distribution of random vectors to the convergence of their linear combinations: $\mathbf{X}_n \xrightarrow{d} \mathbf{X}$ if and only if $\mathbf{t}'\mathbf{X}_n \xrightarrow{d} \mathbf{t}'\mathbf{X}$ for every fixed vector $\mathbf{t}$
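
A sketch of the delta method with $g(x) = \log x$ applied to the mean of an exponential population (an illustrative choice, convenient because there $\sigma^2 = \mu^2$ and $g'(\mu) = 1/\mu$, so the limiting variance is exactly 1 whatever $\mu$ is):

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 2.0                     # hypothetical exponential mean; variance is mu^2
n, reps = 5_000, 4_000

xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)

# Delta method with g(x) = log(x): asymptotic variance is sigma^2 * g'(mu)^2
#   sigma^2 = mu^2 and g'(mu) = 1/mu  =>  limiting variance = 1
z = np.sqrt(n) * (np.log(xbar) - np.log(mu))
print("empirical sd:", round(z.std(), 3), "vs delta-method sd: 1.0")
```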

Central Limit Theorem and Its Applications

  • The central limit theorem (CLT) states that the standardized sum of a large number of independent and identically distributed (i.i.d.) random variables with finite mean and variance converges in distribution to a standard normal distribution
    • Formally, if $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then $\frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1)$
  • The CLT holds under more general conditions, such as for independent but not identically distributed random variables with finite variances (Lindeberg-Feller CLT)
  • The CLT is the foundation for many statistical procedures, as it justifies the use of normal approximations for the sampling distributions of estimators
  • The sample mean $\bar{X}$ is asymptotically normal under the conditions of the CLT, with $\sqrt{n}(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2)$
  • The sample variance $S^2$ is also asymptotically normal, with $\sqrt{n}(S^2 - \sigma^2) \xrightarrow{d} N(0, \mu_4 - \sigma^4)$, where $\mu_4$ is the fourth central moment of the population
  • The CLT can be used to construct confidence intervals and hypothesis tests for population parameters based on large samples
    • For example, an approximate 95% confidence interval for the population mean is $\bar{X} \pm 1.96\, S/\sqrt{n}$ (its coverage is checked in the sketch after this list)
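
A coverage check for the interval above, under an assumed skewed (gamma) population with mean $\mu = 5$; by the CLT the empirical coverage should be close to 95% for large $n$ even though the population is not normal. All numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, n, reps = 5.0, 500, 2_000

# The interval Xbar +/- 1.96 * S / sqrt(n) should contain mu about 95% of the time
covered = 0
for _ in range(reps):
    x = rng.gamma(shape=2.0, scale=mu / 2.0, size=n)  # skewed population, mean mu
    xbar, s = x.mean(), x.std(ddof=1)
    half = 1.96 * s / np.sqrt(n)
    covered += (xbar - half <= mu <= xbar + half)
print("empirical coverage:", covered / reps)  # close to 0.95 in large samples
```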

Asymptotic Distributions of Estimators

  • The asymptotic distribution of an estimator characterizes its behavior as the sample size tends to infinity
  • Maximum likelihood estimators (MLEs) are asymptotically normal under regularity conditions, with $\sqrt{n}(\hat{\theta}_n - \theta) \xrightarrow{d} N(0, I^{-1}(\theta))$, where $I(\theta)$ is the Fisher information for a single observation (see the sketch after this list)
    • This result is known as the asymptotic normality of MLEs
  • The asymptotic variance of an MLE achieves the Cramér-Rao lower bound, making MLEs asymptotically efficient
  • Method of moments estimators are also asymptotically normal under certain conditions, with their asymptotic variance depending on the moments of the population
  • Sample quantiles are asymptotically normal: for the $p$-th quantile $\xi_p$ of a population with density $f$, $\sqrt{n}(\hat{\xi}_p - \xi_p) \xrightarrow{d} N(0, p(1-p)/f(\xi_p)^2)$, provided $f(\xi_p) > 0$
  • The asymptotic distribution of the sample correlation coefficient is normal, with variance depending on the population correlation and the fourth moments of the joint distribution
  • Asymptotically pivotal quantities, such as studentized statistics, have asymptotic distributions that do not depend on unknown parameters
    • These are useful for constructing confidence intervals and tests in large samples
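
A sketch of MLE asymptotic normality in a case where everything is available in closed form: for an exponential with rate $\lambda$, the MLE is $\hat{\lambda} = 1/\bar{X}$ and the per-observation Fisher information is $I(\lambda) = 1/\lambda^2$, so $\sqrt{n}(\hat{\lambda} - \lambda)$ should have standard deviation close to $\lambda$ (the rate 1.5 is an arbitrary illustrative value).

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 1.5                     # hypothetical rate; I(lambda) = 1 / lambda^2
n, reps = 3_000, 4_000

x = rng.exponential(1.0 / lam, size=(reps, n))  # NumPy parametrizes by the mean
lam_hat = 1.0 / x.mean(axis=1)                  # MLE of the rate

z = np.sqrt(n) * (lam_hat - lam)
print("empirical sd:", round(z.std(), 3), "vs sqrt(I^{-1}(lambda)) =", lam)
```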

Large Sample Hypothesis Testing

  • Hypothesis tests based on large sample theory rely on the asymptotic distributions of test statistics under the null hypothesis
  • The Wald test is based on the asymptotic normality of MLEs, with the test statistic $W = \frac{(\hat{\theta}_n - \theta_0)^2}{I^{-1}(\hat{\theta}_n)/n}$ asymptotically following a chi-square distribution with 1 degree of freedom under the null hypothesis
  • The likelihood ratio test (LRT) compares the maximized likelihoods under the null and alternative hypotheses, with the test statistic $-2\log(\Lambda_n)$ asymptotically following a chi-square distribution with degrees of freedom equal to the difference in the number of parameters
  • The score test (or Lagrange multiplier test) is based on the gradient of the log-likelihood at the null hypothesis parameter value, with the test statistic asymptotically following a chi-square distribution under the null
  • Rao's efficient score statistic is the standard form of the score test; it standardizes the score function by the Fisher information matrix evaluated at the null value (all three tests are computed for a binomial proportion in the sketch after this list)
  • Large sample tests for proportions, such as the z-test and the chi-square test for goodness of fit, rely on the asymptotic normality of the sample proportion and the asymptotic chi-square distribution of the Pearson statistic, respectively
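
A sketch computing all three statistics for a binomial proportion under $H_0\colon p = p_0$ (the counts $x = 430$, $n = 1000$, and $p_0 = 0.5$ are made-up illustrative data); each statistic is referred to a chi-square distribution with 1 degree of freedom.

```python
import numpy as np
from scipy import stats

def proportion_tests(x, n, p0):
    """Wald, score, and likelihood-ratio statistics for H0: p = p0."""
    p_hat = x / n
    wald  = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)   # info at the MLE
    score = (p_hat - p0) ** 2 / (p0 * (1 - p0) / n)         # info at the null
    lrt   = 2 * (x * np.log(p_hat / p0)
                 + (n - x) * np.log((1 - p_hat) / (1 - p0)))
    # Under H0 each statistic is asymptotically chi-square with 1 df
    return {name: (stat, stats.chi2.sf(stat, df=1))
            for name, stat in [("Wald", wald), ("score", score), ("LRT", lrt)]}

for name, (stat, p) in proportion_tests(x=430, n=1_000, p0=0.5).items():
    print(f"{name:>5}: statistic = {stat:6.3f}, p-value = {p:.4f}")
```

The three tests are asymptotically equivalent under the null, so for data like these their statistics and p-values are close; they can differ noticeably in small samples or far from the null.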

Confidence Intervals in Large Samples

  • Confidence intervals based on large sample theory utilize the asymptotic distributions of estimators to construct intervals with a desired coverage probability
  • The Wald confidence interval for a parameter $\theta$ is based on the asymptotic normality of the MLE, with the interval given by $\hat{\theta}_n \pm z_{\alpha/2}\sqrt{I^{-1}(\hat{\theta}_n)/n}$, where $z_{\alpha/2}$ is the $(1-\alpha/2)$ quantile of the standard normal distribution
  • The likelihood ratio confidence interval is constructed by inverting the likelihood ratio test, i.e., finding the set of parameter values for which the LRT fails to reject the null hypothesis at a given significance level
  • The score confidence interval is obtained by inverting the score test, i.e., finding the set of parameter values for which the score statistic falls within the acceptance region of the test
  • Large sample confidence intervals for proportions can be constructed using the normal approximation to the binomial distribution, with the interval given by $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ (computed in the sketch after this list)
  • The delta method can be used to construct confidence intervals for transformed parameters, such as the ratio of two means or the difference of two proportions
    • The interval is based on the asymptotic normality of the transformed estimator, with the variance obtained using the delta method
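
A sketch of the last two bullets for a binomial proportion (same made-up data as before): the Wald interval on the probability scale, and a delta-method interval on the log-odds scale, back-transformed so it stays inside $(0, 1)$. For $g(p) = \log(p/(1-p))$, $g'(p) = 1/(p(1-p))$, which gives the standard error used below.

```python
import numpy as np
from scipy import stats

x, n, alpha = 430, 1_000, 0.05          # hypothetical data
p_hat = x / n
z = stats.norm.ppf(1 - alpha / 2)

# Wald interval for p
se_p = np.sqrt(p_hat * (1 - p_hat) / n)
print("Wald CI for p:        ", (p_hat - z * se_p, p_hat + z * se_p))

# Delta method on the log-odds: se = se_p * g'(p_hat) = 1/sqrt(n*p_hat*(1-p_hat))
logit = np.log(p_hat / (1 - p_hat))
se_logit = 1 / np.sqrt(n * p_hat * (1 - p_hat))
lo, hi = logit - z * se_logit, logit + z * se_logit
expit = lambda t: 1 / (1 + np.exp(-t))  # back-transform to the probability scale
print("Delta-method CI for p:", (expit(lo), expit(hi)))
```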

Practical Applications and Examples

  • Large sample theory is widely used in various fields, such as economics, finance, social sciences, and medical research, where sample sizes are often large
  • In clinical trials, the asymptotic normality of the sample mean is used to compare the effectiveness of treatments, with confidence intervals and hypothesis tests based on the normal approximation
    • For example, a z-test can be used to compare the mean blood pressure reduction between a treatment and a placebo group (see the sketch at the end of this section)
  • In survey sampling, the CLT justifies the use of normal approximations for the sampling distribution of the sample mean or proportion, allowing for the construction of confidence intervals and hypothesis tests
    • For instance, a large sample confidence interval can be used to estimate the proportion of voters supporting a particular candidate
  • In finance, the asymptotic properties of estimators are used to analyze the performance of asset pricing models and to test market efficiency
    • The Fama-MacBeth regression, which relies on the asymptotic normality of the average estimated coefficients, is a common approach to test asset pricing models
  • In econometrics, large sample theory is the foundation for the asymptotic properties of ordinary least squares (OLS) and other estimation methods, as well as for the construction of hypothesis tests and confidence intervals
    • The asymptotic normality of the OLS estimator is used to test the significance of regression coefficients and to construct confidence intervals for the marginal effects of predictors
  • In machine learning, the asymptotic properties of estimators are relevant for understanding the behavior of learning algorithms as the sample size grows large
    • For example, the consistency and asymptotic normality of the k-nearest neighbors classifier can be studied using large sample theory
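
A sketch of the two-sample z-test from the clinical-trial example above; the data are simulated, and the group means, spread, and sample sizes are all hypothetical.

```python
import numpy as np
from scipy import stats

def two_sample_z(x1, x2):
    """Large-sample z-test for equality of two means."""
    n1, n2 = len(x1), len(x2)
    se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
    z = (x1.mean() - x2.mean()) / se
    return z, 2 * stats.norm.sf(abs(z))   # two-sided p-value

rng = np.random.default_rng(5)
treatment = rng.normal(8.0, 5.0, size=400)   # hypothetical BP reductions (mmHg)
placebo   = rng.normal(6.5, 5.0, size=400)
z, p = two_sample_z(treatment, placebo)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```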


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
