Least squares approximation is a powerful method for fitting mathematical models to observed data. It minimizes the sum of squared differences between observed and predicted values, making it widely applicable in statistics, signal processing, and machine learning.

This technique forms the foundation for various data fitting approaches, from simple linear regression to complex nonlinear models. Understanding least squares is crucial for anyone working with data analysis, as it provides a robust framework for extracting meaningful insights from noisy observations.

Least squares approximation

  • Fundamental technique in approximation theory for fitting a mathematical model to observed data points
  • Minimizes the sum of squared differences between the observed values and the values predicted by the model
  • Widely used in various fields such as statistics, signal processing, and machine learning

Definition of least squares

  • Approximation method that estimates parameters of a model by minimizing the squared differences between observed and predicted values
  • Given a set of data points $(x_i, y_i)$, the goal is to find a function $f(x)$ that minimizes $\sum_{i=1}^{n} (y_i - f(x_i))^2$
  • The function $f(x)$ can be linear or nonlinear, depending on the model

Minimizing sum of squared errors

  • The objective of least squares is to minimize the sum of squared errors (SSE), also known as the residual sum of squares
  • SSE is defined as $\sum_{i=1}^{n} (y_i - f(x_i))^2$, where $y_i$ is the observed value and $f(x_i)$ is the predicted value
  • Minimizing SSE leads to the best-fit model that reduces the overall discrepancy between the observed and predicted values

Applications in data fitting

  • Least squares is extensively used in curve fitting and regression analysis
  • Helps in finding the best-fit line, polynomial, or other functions to describe the relationship between variables
  • Examples include linear regression, polynomial regression, and nonlinear regression (exponential, logarithmic)

Linear least squares

  • Deals with fitting a linear model to the data points
  • The model is of the form $f(x) = a_0 + a_1 x$, where $a_0$ and $a_1$ are the parameters to be estimated
  • Closed-form solution exists for linear least squares using the normal equations or matrix decomposition methods (see the sketch below)
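
A minimal sketch of fitting such a linear model with NumPy, assuming synthetic noisy data generated around a made-up line:

```python
import numpy as np

# Hypothetical noisy data roughly following y = 2 + 3x
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 + 3 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix for the model f(x) = a0 + a1*x
A = np.column_stack([np.ones_like(x), x])

# np.linalg.lstsq solves the linear least squares problem directly
coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
a0, a1 = coeffs
print(f"a0 ≈ {a0:.3f}, a1 ≈ {a1:.3f}")
```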

Nonlinear least squares

  • Involves fitting a nonlinear model to the data points
  • The model can be any nonlinear function, such as exponential, logarithmic, or trigonometric functions
  • Requires iterative optimization techniques to find the best-fit parameters, as closed-form solutions may not exist
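
For example, a nonlinear exponential model can be fitted iteratively with SciPy's curve_fit; the model form, starting guess, and synthetic data below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical exponential model y = a * exp(b * x); curve_fit estimates a, b iteratively
def model(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(1)
x = np.linspace(0, 2, 30)
y = model(x, 1.5, 0.8) + 0.05 * rng.standard_normal(x.size)

# p0 is the initial guess required by the iterative solver
params, cov = curve_fit(model, x, y, p0=[1.0, 1.0])
print("estimated a, b:", params)
```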

Formulation of least squares problems

  • Least squares problems arise when there are more equations than unknowns in a linear system
  • The goal is to find the best approximation to the solution that minimizes the sum of squared errors

Overdetermined systems of equations

  • An overdetermined system has more equations than unknowns
  • In matrix form, $Ax = b$, where $A$ is an $m \times n$ matrix with $m > n$
  • No exact solution exists, so least squares approximation is used to find the best-fit solution

Normal equations

  • The normal equations are derived by minimizing the sum of squared errors
  • For the linear system $Ax = b$, the normal equations are given by $A^T A x = A^T b$
  • Solving the normal equations yields the least squares solution $\hat{x} = (A^T A)^{-1} A^T b$
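
A small sketch of forming and solving the normal equations directly with NumPy (random data for illustration); note that forming $A^T A$ squares the condition number, which is one reason the decomposition-based methods below are often preferred:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))   # overdetermined: 50 equations, 3 unknowns
b = rng.standard_normal(50)

# Form and solve the normal equations A^T A x = A^T b
AtA = A.T @ A
Atb = A.T @ b
x_hat = np.linalg.solve(AtA, Atb)

# Should agree with NumPy's built-in least squares solver
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_hat, x_ref))
```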

QR decomposition

  • QR decomposition factorizes the matrix $A$ into an orthogonal matrix $Q$ and an upper triangular matrix $R$
  • Useful for solving least squares problems efficiently and in a numerically stable way
  • The least squares solution is obtained by solving $Rx = Q^T b$
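
A sketch of the QR route using NumPy and SciPy on a random overdetermined system:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

# Reduced QR factorization: A = Q R with Q (50x3) orthonormal, R (3x3) upper triangular
Q, R = np.linalg.qr(A)

# Solve R x = Q^T b by back substitution
x_hat = solve_triangular(R, Q.T @ b, lower=False)
print(x_hat)
```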

Singular value decomposition (SVD)

  • SVD decomposes the matrix $A$ into three matrices: $A = U \Sigma V^T$
  • $U$ and $V$ are orthogonal matrices, and $\Sigma$ is a diagonal matrix containing singular values
  • SVD is robust and can handle rank-deficient and ill-conditioned matrices
  • The least squares solution is given by $\hat{x} = V \Sigma^+ U^T b$, where $\Sigma^+$ is the pseudoinverse of $\Sigma$
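
A sketch of the SVD route with NumPy on the same kind of random overdetermined system; the tolerance used to zero out tiny singular values is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse of Sigma: invert only singular values above a tolerance
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_inv = np.where(s > tol, 1.0 / s, 0.0)

# x_hat = V Sigma^+ U^T b
x_hat = Vt.T @ (s_inv * (U.T @ b))
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))
```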

Solving least squares problems

  • Various methods exist for solving least squares problems, depending on the size and structure of the problem

Direct methods vs iterative methods

  • Direct methods compute the least squares solution in a finite number of steps
  • Examples include normal equations, QR decomposition, and SVD
  • Iterative methods start with an initial guess and refine the solution iteratively
  • Examples include gradient descent, conjugate gradient, and LSQR

Cholesky decomposition

  • Cholesky decomposition factorizes a symmetric positive definite matrix $A$ into $LL^T$, where $L$ is a lower triangular matrix
  • Efficient for solving normal equations when $A^T A$ is symmetric positive definite
  • The least squares solution is obtained by solving $LL^T x = A^T b$ using forward and back substitution
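
A sketch of solving the normal equations via Cholesky factorization with SciPy, assuming random full-column-rank data for illustration:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

# A^T A is symmetric positive definite when A has full column rank
AtA = A.T @ A
Atb = A.T @ b

# Cholesky factorization A^T A = L L^T, then forward and back substitution
c, low = cho_factor(AtA)
x_hat = cho_solve((c, low), Atb)
print(x_hat)
```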

Gram-Schmidt process

  • Gram-Schmidt process orthogonalizes a set of vectors
  • Can be used to solve least squares problems by orthogonalizing the columns of the matrix $A$
  • The least squares solution is obtained by projecting the vector $b$ onto the orthonormal basis obtained from the columns of $A$
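
A short, illustrative modified Gram-Schmidt implementation that orthogonalizes the columns of $A$ and then solves the resulting triangular system; the helper function name and random data are assumptions:

```python
import numpy as np

def modified_gram_schmidt(A):
    """Orthogonalize the columns of A, returning Q (orthonormal) and R (upper triangular)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ v          # projection coefficient onto earlier basis vector
            v = v - R[i, j] * Q[:, i]      # remove that component
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(6)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

Q, R = modified_gram_schmidt(A)
# Project b onto the orthonormal basis, then back-substitute
x_hat = np.linalg.solve(R, Q.T @ b)
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))
```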

Householder transformations

  • Householder transformations are reflections that can be used to zero out elements below the diagonal of a matrix
  • Used in QR decomposition to transform the matrix $A$ into an upper triangular matrix
  • Numerically stable and efficient for solving least squares problems

Givens rotations

  • Givens rotations are orthogonal transformations that selectively zero out elements in a matrix
  • Used in QR decomposition to introduce zeros below the diagonal of the matrix $A$
  • Suitable for sparse matrices and can be applied selectively to specific elements

Regularization in least squares

  • Regularization techniques are used to prevent overfitting and improve the stability of least squares solutions

Tikhonov regularization

  • Tikhonov regularization adds a penalty term to the least squares objective function
  • The regularized objective function is $\sum_{i=1}^{n} (y_i - f(x_i))^2 + \lambda \|f\|^2$, where $\lambda$ is the regularization parameter
  • Helps in finding a smooth and stable solution by controlling the complexity of the model
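
A minimal sketch of the Tikhonov/ridge closed-form solution for a linear model, with an arbitrarily chosen $\lambda$ for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

lam = 0.1  # regularization parameter lambda (assumed value for illustration)

# Tikhonov/ridge solution: x = (A^T A + lam * I)^{-1} A^T b
n = A.shape[1]
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
print(x_ridge)
```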

L1 vs L2 regularization

  • L1 regularization (Lasso) adds the L1 norm of the parameters to the objective function
  • Promotes sparsity in the solution, as it tends to shrink some parameters to exactly zero
  • L2 regularization (Ridge) adds the squared L2 norm of the parameters to the objective function
  • Shrinks the parameters towards zero but does not set them exactly to zero
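
A brief illustration with scikit-learn's Lasso and Ridge estimators on synthetic data where only two features matter; the alpha values are arbitrary:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(8)
X = rng.standard_normal((100, 5))
# Only the first two features matter; the rest are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: some coefficients driven to exactly zero
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: coefficients shrunk but kept nonzero
print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)
```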

Regularization parameter selection

  • The regularization parameter $\lambda$ controls the balance between fitting the data and regularization
  • Proper selection of $\lambda$ is crucial for achieving the desired balance and preventing overfitting or underfitting
  • Techniques for selecting $\lambda$ include cross-validation, L-curve method, and Bayesian methods
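
One common approach is cross-validation over a grid of candidate values, sketched here with scikit-learn's RidgeCV; the grid itself is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(17)
X = rng.standard_normal((100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)

# Cross-validation over a grid of candidate regularization parameters
model = RidgeCV(alphas=[0.001, 0.01, 0.1, 1.0, 10.0]).fit(X, y)
print("selected lambda (alpha):", model.alpha_)
```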

Weighted least squares

  • Weighted least squares assigns different weights to each data point based on their importance or reliability

Definition and motivation

  • In weighted least squares, each squared error term is multiplied by a weight $w_i$
  • The objective function becomes $\sum_{i=1}^{n} w_i (y_i - f(x_i))^2$
  • Weights can be used to emphasize or de-emphasize certain data points based on their significance or uncertainty
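
A sketch of weighted least squares with NumPy, using the standard trick of scaling rows by $\sqrt{w_i}$ so an ordinary least squares solver can be reused; the weights and data are made up:

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
w = rng.uniform(0.5, 2.0, x.size)   # hypothetical per-point weights

A = np.column_stack([np.ones_like(x), x])

# Minimizing sum_i w_i (y_i - f(x_i))^2 is equivalent to ordinary least squares
# on the rows of A and y scaled by sqrt(w_i)
sw = np.sqrt(w)
coeffs = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
print(coeffs)
```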

Incorporating prior knowledge

  • Weights can be assigned based on prior knowledge about the reliability or importance of each data point
  • Data points with higher weights have a greater influence on the least squares solution
  • Weights can also be used to incorporate information about the variance or covariance of the measurements

Iteratively reweighted least squares (IRLS)

  • IRLS is an iterative method for solving weighted least squares problems
  • Starts with an initial set of weights and iteratively updates the weights based on the residuals
  • The process is repeated until convergence, where the weights and the solution stabilize
  • IRLS is particularly useful for robust regression, where the weights are adjusted to downweight outliers or influential points
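
An illustrative IRLS loop for robust line fitting with Huber-style weights; the weight function, threshold delta, and fixed iteration count are assumptions made for the sketch, not a prescribed recipe:

```python
import numpy as np

def irls_huber(A, b, delta=1.0, n_iter=20):
    """Iteratively reweighted least squares with Huber weights (illustrative sketch)."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]               # start from the ordinary LS solution
    for _ in range(n_iter):
        r = b - A @ x                                       # residuals at the current estimate
        # Huber weights: 1 for small residuals, delta/|r| for large ones (downweights outliers)
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        x = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return x

rng = np.random.default_rng(10)
x_data = np.linspace(0, 1, 40)
y = 1.0 + 2.0 * x_data + 0.05 * rng.standard_normal(x_data.size)
y[::10] += 3.0                                              # inject a few outliers
A = np.column_stack([np.ones_like(x_data), x_data])
print(irls_huber(A, y))
```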

Constrained least squares

  • Constrained least squares incorporates additional constraints on the parameters or the solution

Equality-constrained least squares

  • Equality constraints impose strict conditions on the parameters or the solution
  • The least squares problem is solved subject to linear equality constraints of the form $Cx = d$
  • Lagrange multipliers or elimination of variables can be used to solve equality-constrained problems
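
A sketch of the Lagrange-multiplier (KKT) approach for an equality-constrained problem, with a made-up constraint that the parameters sum to one:

```python
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)
C = np.array([[1.0, 1.0, 1.0, 1.0]])   # hypothetical constraint: parameters sum to 1
d = np.array([1.0])

n, p = A.shape[1], C.shape[0]
# KKT system combining the normal equations with the equality constraints
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(K, rhs)
x_hat = sol[:n]                         # last p entries are the Lagrange multipliers
print(x_hat, "constraint residual:", C @ x_hat - d)
```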

Inequality-constrained least squares

  • Inequality constraints impose bounds or restrictions on the parameters or the solution
  • The least squares problem is solved subject to linear inequality constraints of the form $Gx \leq h$
  • Quadratic programming or active set methods can be used to solve inequality-constrained problems

Active set methods

  • Active set methods solve constrained least squares problems by iteratively identifying the active constraints
  • The active constraints are the ones that are satisfied as equalities at the current solution
  • The problem is reduced to an equality-constrained least squares problem considering only the active constraints
  • The solution is updated iteratively by adding or removing constraints from the active set until the optimal solution is found

Least squares and orthogonal polynomials

  • Orthogonal polynomials play a significant role in least squares approximation and numerical integration

Legendre polynomials

  • Legendre polynomials are a family of orthogonal polynomials defined on the interval [-1, 1]
  • They are solutions to the Legendre differential equation and have the property of orthogonality
  • Legendre polynomials are used in least squares approximation and Gauss-Legendre quadrature
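
A short sketch of a least squares fit in a Legendre basis using NumPy's polynomial module; the target function and degree are arbitrary choices:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(15)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + 0.05 * rng.standard_normal(x.size)

# Least squares fit of a degree-5 expansion in Legendre polynomials on [-1, 1]
coeffs = legendre.legfit(x, y, deg=5)
y_fit = legendre.legval(x, coeffs)
print("max fit error:", np.max(np.abs(y - y_fit)))
```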

Chebyshev polynomials

  • Chebyshev polynomials are another family of orthogonal polynomials defined on the interval [-1, 1]
  • They are solutions to the Chebyshev differential equation and have the property of minimal maximum deviation
  • Chebyshev polynomials are used in least squares approximation, interpolation, and Chebyshev quadrature

Hermite polynomials

  • Hermite polynomials are orthogonal polynomials defined on the real line (-∞, ∞)
  • They are solutions to the Hermite differential equation and have the property of orthogonality with respect to the weight function $e^{-x^2}$
  • Hermite polynomials are used in least squares approximation, Gaussian quadrature, and quantum mechanics

Nonlinear least squares optimization

  • Nonlinear least squares involves fitting a nonlinear model to the data points

Gauss-Newton method

  • The Gauss-Newton method is an iterative algorithm for solving nonlinear least squares problems
  • It linearizes the nonlinear model around the current estimate of the parameters and solves the resulting linear least squares problem
  • The process is repeated iteratively until convergence, updating the parameter estimates at each iteration
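
A compact, illustrative Gauss-Newton loop for an exponential model; the model, data, and fixed iteration count are assumptions made for the sketch:

```python
import numpy as np

def gauss_newton(x, y, p0, n_iter=20):
    """Gauss-Newton iterations for the model f(x) = a * exp(b * x) (illustrative sketch)."""
    a, b = p0
    for _ in range(n_iter):
        f = a * np.exp(b * x)
        r = y - f                                            # residuals at the current estimate
        # Jacobian of the residuals with respect to (a, b)
        J = np.column_stack([-np.exp(b * x), -a * x * np.exp(b * x)])
        # Linearized subproblem: solve J * delta ≈ -r in the least squares sense
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        a, b = a + delta[0], b + delta[1]
    return a, b

rng = np.random.default_rng(12)
x = np.linspace(0, 2, 30)
y = 1.5 * np.exp(0.8 * x) + 0.05 * rng.standard_normal(x.size)
print(gauss_newton(x, y, p0=(1.0, 1.0)))
```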

Levenberg-Marquardt algorithm

  • The Levenberg-Marquardt algorithm is an extension of the Gauss-Newton method that incorporates a trust region approach
  • It introduces a damping parameter that controls the step size and direction of the parameter updates
  • The damping parameter is adjusted adaptively based on the reduction in the sum of squared errors
  • The Levenberg-Marquardt algorithm is more robust and efficient than the Gauss-Newton method, especially when the initial guess is far from the optimal solution
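
In practice one usually relies on an existing implementation; for instance, SciPy's least_squares exposes a Levenberg-Marquardt option. The residual function and data below are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(p, x, y):
    a, b = p
    return y - a * np.exp(b * x)

rng = np.random.default_rng(13)
x = np.linspace(0, 2, 30)
y = 1.5 * np.exp(0.8 * x) + 0.05 * rng.standard_normal(x.size)

# method='lm' selects a Levenberg-Marquardt implementation (via MINPACK)
result = least_squares(residuals, x0=[1.0, 1.0], args=(x, y), method='lm')
print(result.x)
```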

Trust region methods

  • Trust region methods solve nonlinear least squares problems by iteratively approximating the objective function with a simpler model within a trust region
  • The trust region is a neighborhood around the current estimate of the parameters, where the simpler model is considered to be a good approximation
  • The size of the trust region is adjusted based on the agreement between the simpler model and the actual objective function
  • Trust region methods provide global convergence properties and can handle ill-conditioned problems

Statistical properties of least squares

  • Least squares estimation has several desirable statistical properties when certain assumptions are met

Unbiasedness and consistency

  • Least squares estimators are unbiased, meaning that the expected value of the estimator is equal to the true value of the parameter
  • Consistency refers to the property that the least squares estimator converges to the true value of the parameter as the sample size increases
  • Under the assumptions of linearity, independence, homoscedasticity, and normality, least squares estimators are unbiased and consistent

Gauss-Markov theorem

  • The Gauss-Markov theorem states that among all linear unbiased estimators, the least squares estimator has the smallest variance
  • This makes the least squares estimator the best linear unbiased estimator (BLUE)
  • The Gauss-Markov theorem holds under the assumptions of linearity, independence, and homoscedasticity

Confidence intervals and prediction intervals

  • Confidence intervals provide a range of plausible values for the parameters with a specified level of confidence
  • Prediction intervals estimate the range of future observations based on the fitted model and the observed data
  • Both confidence intervals and prediction intervals can be constructed using the standard errors and the t-distribution, assuming normality of the errors
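
A sketch of constructing 95% confidence intervals for the coefficients of a fitted line from the estimated error variance and the t-distribution; the data are synthetic and normal errors are assumed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(16)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(x.size)
A = np.column_stack([np.ones_like(x), x])

coeffs = np.linalg.lstsq(A, y, rcond=None)[0]
n, p = A.shape
residuals = y - A @ coeffs
sigma2 = residuals @ residuals / (n - p)          # estimated error variance
cov = sigma2 * np.linalg.inv(A.T @ A)             # covariance of the coefficient estimates
se = np.sqrt(np.diag(cov))                        # standard errors
t_crit = stats.t.ppf(0.975, df=n - p)             # 95% two-sided critical value
for name, c, s in zip(["a0", "a1"], coeffs, se):
    print(f"{name}: {c:.3f} ± {t_crit * s:.3f}")
```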

Robust least squares methods

  • Robust least squares methods are designed to handle outliers and deviations from the assumptions of ordinary least squares

M-estimators

  • M-estimators are a class of robust estimators that minimize a generalized objective function
  • The objective function is chosen to downweight the influence of outliers or large residuals
  • Examples of M-estimators include Huber, Tukey's bisquare, and Hampel estimators
  • M-estimators provide robust estimates of the parameters in the presence of outliers

Least median of squares

  • Least median of squares (LMS) is a robust regression method that minimizes the median of the squared residuals
  • LMS is highly resistant to outliers, as it seeks to find the narrowest strip covering half of the data points
  • The LMS estimator has a breakdown point of 50%, meaning it can handle up to 50% of the data being outliers

Least trimmed squares

  • Least trimmed squares (LTS) is another robust regression method that minimizes the sum of the smallest squared residuals
  • LTS trims a specified percentage of the largest residuals and fits the model to the remaining data points
  • The LTS estimator has a high breakdown point and is resistant to outliers and leverage points

Computational aspects and software

  • Efficient and numerically stable algorithms are crucial for solving large-scale least squares problems

Numerical stability and conditioning

  • Numerical stability refers to the sensitivity of the solution to perturbations in the input data or roundoff errors
  • Ill-conditioned problems are sensitive to small changes in the data and can lead to inaccurate or unstable solutions
  • Techniques such as QR decomposition, SVD, and regularization can improve the numerical stability of least squares computations

Sparse least squares problems

  • Sparse least squares problems involve matrices with a large number of zero elements
  • Exploiting the sparsity structure can significantly reduce the computational cost and memory requirements
  • Specialized algorithms and data structures, such as sparse matrix representations and iterative methods, are used to solve sparse least squares problems efficiently

Least squares in Python, MATLAB, and R

  • Python libraries for least squares include NumPy, SciPy, and scikit-learn
  • MATLAB provides built-in functions for least squares, such as lsqr, lsqnonneg, and lsqlin
  • R offers functions like lm, glm, and nls for least squares regression and nonlinear least squares
  • These software tools provide efficient implementations of various least squares algorithms and support a wide range of models and regularization techniques