Least squares approximation is a powerful method for fitting mathematical models to observed data. It minimizes the sum of squared differences between observed and predicted values, making it widely applicable in statistics, signal processing, and machine learning.
This technique forms the foundation for various data fitting approaches, from simple linear regression to complex nonlinear models. Understanding least squares is crucial for anyone working with data analysis, as it provides a robust framework for extracting meaningful insights from noisy observations.
Least squares approximation
Fundamental technique in approximation theory for fitting a mathematical model to observed data points
Minimizes the sum of squared differences between the observed values and the values predicted by the model
Widely used in various fields such as statistics, signal processing, and machine learning
Definition of least squares
Approximation method that estimates parameters of a model by minimizing the squared differences between observed and predicted values
Given a set of data points $(x_i, y_i)$, the goal is to find a function $f(x)$ that minimizes $\sum_{i=1}^{n}(y_i - f(x_i))^2$
The function f(x) can be linear or nonlinear, depending on the model
Minimizing sum of squared errors
The objective of least squares is to minimize the sum of squared errors (SSE), also known as the residual sum of squares
SSE is defined as $\sum_{i=1}^{n}(y_i - f(x_i))^2$, where $y_i$ is the observed value and $f(x_i)$ is the predicted value
Minimizing SSE leads to the best-fit model that reduces the overall discrepancy between the observed and predicted values
Applications in data fitting
Least squares is extensively used in curve fitting and regression analysis
Helps in finding the best-fit line, polynomial, or other functions to describe the relationship between variables
Examples include linear regression, polynomial regression, and nonlinear regression (exponential, logarithmic)
Linear least squares
Deals with fitting a linear model to the data points
The model is of the form $f(x) = a_0 + a_1 x$, where $a_0$ and $a_1$ are the parameters to be estimated
A closed-form solution exists for linear least squares via the normal equations or matrix decomposition methods
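As a minimal sketch (assuming NumPy and illustrative synthetic data), the line parameters can be estimated with np.linalg.lstsq:

```python
import numpy as np

# Illustrative synthetic data: y = 2 + 3x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 + 3 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix for the model f(x) = a0 + a1*x
A = np.column_stack([np.ones_like(x), x])

# Least squares fit: minimizes ||A @ a - y||^2
coeffs, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
a0, a1 = coeffs
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
```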
Nonlinear least squares
Involves fitting a nonlinear model to the data points
The model can be any nonlinear function, such as exponential, logarithmic, or trigonometric functions
Requires iterative optimization techniques to find the best-fit parameters, as closed-form solutions may not exist
Formulation of least squares problems
Least squares problems arise when there are more equations than unknowns in a linear system
The goal is to find the best approximation to the solution that minimizes the sum of squared errors
Overdetermined systems of equations
An overdetermined system has more equations than unknowns
In matrix form, $Ax = b$, where $A$ is an $m \times n$ matrix with $m > n$
No exact solution exists, so least squares approximation is used to find the best-fit solution
Normal equations
The normal equations are derived by minimizing the sum of squared errors
For the linear system $Ax = b$, the normal equations are given by $A^T A x = A^T b$
Solving the normal equations yields the least squares solution $\hat{x} = (A^T A)^{-1} A^T b$
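A direct sketch of this formula in NumPy (illustrative only; forming $A^T A$ squares the condition number of $A$, so QR or SVD is usually preferred for ill-conditioned problems):

```python
import numpy as np

def normal_equations_solve(A, b):
    """Least squares via the normal equations A^T A x = A^T b.

    Simple and fast, but less numerically stable than QR or SVD
    because forming A^T A squares the condition number of A.
    """
    return np.linalg.solve(A.T @ A, A.T @ b)
```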
QR decomposition
QR decomposition factorizes the matrix A into an orthogonal matrix Q and an upper triangular matrix R
Useful for solving least squares problems efficiently and in a numerically stable way
The least squares solution is obtained by solving $Rx = Q^T b$
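A minimal sketch of this approach, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import solve_triangular

def qr_lstsq(A, b):
    """Least squares via the reduced QR factorization A = QR."""
    Q, R = np.linalg.qr(A)               # Q is m x n with orthonormal columns, R is n x n
    return solve_triangular(R, Q.T @ b)  # solve R x = Q^T b by back substitution
```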
Singular value decomposition (SVD)
SVD decomposes the matrix A into three matrices: $A = U \Sigma V^T$
$U$ and $V$ are orthogonal matrices, and $\Sigma$ is a diagonal matrix containing the singular values
SVD is robust and can handle rank-deficient and ill-conditioned matrices
The least squares solution is given by $\hat{x} = V \Sigma^{+} U^T b$, where $\Sigma^{+}$ is the pseudoinverse of $\Sigma$
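A sketch of the SVD-based solution in NumPy, with small singular values truncated (the rcond threshold below is an illustrative choice):

```python
import numpy as np

def svd_lstsq(A, b, rcond=1e-12):
    """Least squares via the SVD, x = V Sigma^+ U^T b.

    Singular values below rcond * s_max are treated as zero, which
    handles rank-deficient or ill-conditioned A gracefully.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s[0], 1.0 / s, 0.0)  # pseudoinverse of Sigma
    return Vt.T @ (s_inv * (U.T @ b))
```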
Solving least squares problems
Various methods exist for solving least squares problems, depending on the size and structure of the problem
Direct methods vs iterative methods
Direct methods compute the least squares solution in a finite number of steps
Examples include normal equations, QR decomposition, and SVD
Iterative methods start with an initial guess and refine the solution iteratively
Examples include the conjugate gradient method and LSQR
Cholesky decomposition
Cholesky decomposition factorizes a symmetric positive definite matrix $A$ into $LL^T$, where $L$ is a lower triangular matrix
Efficient for solving the normal equations when $A^T A$ is symmetric positive definite
The least squares solution is obtained by solving $LL^T x = A^T b$ using forward and back substitution
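A sketch using SciPy's Cholesky routines, assuming $A$ has full column rank so that $A^T A$ is positive definite:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def cholesky_lstsq(A, b):
    """Least squares via Cholesky factorization of the normal equations.

    Requires A to have full column rank so that A^T A is
    symmetric positive definite.
    """
    c, low = cho_factor(A.T @ A)         # factor A^T A = L L^T
    return cho_solve((c, low), A.T @ b)  # forward and back substitution
```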
Gram-Schmidt process
Gram-Schmidt process orthogonalizes a set of vectors
Can be used to solve least squares problems by orthogonalizing the columns of the matrix A
The least squares solution is obtained by projecting the vector b onto the orthogonal basis formed by the columns of A
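A minimal sketch of modified Gram-Schmidt (a more numerically stable variant of the classical process), written in NumPy for illustration; the least squares solution then follows from $Rx = Q^T b$:

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt orthogonalization of the columns of A.

    Returns Q with orthonormal columns and upper triangular R with A = QR.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * Q[:, k]
    return Q, R
```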
Householder transformations
Householder transformations are reflections that can be used to zero out elements below the diagonal of a matrix
Used in QR decomposition to transform the matrix A into an upper triangular matrix
Numerically stable and efficient for solving least squares problems
Givens rotations
Givens rotations are orthogonal transformations that selectively zero out elements in a matrix
Used in QR decomposition to introduce zeros below the diagonal of the matrix A
Suitable for sparse matrices and can be applied selectively to specific elements
Regularization in least squares
Regularization techniques are used to prevent overfitting and improve the stability of least squares solutions
Tikhonov regularization
Tikhonov regularization adds a penalty term to the least squares objective function
The regularized objective function is $\sum_{i=1}^{n}(y_i - f(x_i))^2 + \lambda \|f\|^2$, where $\lambda$ is the regularization parameter
Helps in finding a smooth and stable solution by controlling the complexity of the model
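For a linear model where the penalty is applied to the parameter vector, a minimal NumPy sketch of the resulting modified normal equations $(A^T A + \lambda I)x = A^T b$:

```python
import numpy as np

def tikhonov_lstsq(A, b, lam):
    """Tikhonov (ridge) regularized least squares.

    Minimizes ||A x - b||^2 + lam * ||x||^2, which leads to the
    modified normal equations (A^T A + lam * I) x = A^T b.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```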
L1 vs L2 regularization
L1 regularization (Lasso) adds the L1 norm of the parameters to the objective function
Promotes sparsity in the solution, as it tends to shrink some parameters to exactly zero
L2 regularization (Ridge) adds the squared L2 norm of the parameters to the objective function
Shrinks the parameters towards zero but does not set them exactly to zero
Regularization parameter selection
The regularization parameter λ controls the balance between fitting the data and regularization
Proper selection of λ is crucial for achieving the desired balance and preventing overfitting or underfitting
Techniques for selecting λ include cross-validation, L-curve method, and Bayesian methods
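As one illustration of data-driven selection (assuming scikit-learn and synthetic data), RidgeCV chooses λ from a candidate grid by cross-validation:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Illustrative synthetic data; X and y would come from the actual problem
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + 0.1 * rng.standard_normal(100)

# Select the regularization parameter from a grid by cross-validation
model = RidgeCV(alphas=np.logspace(-4, 2, 25)).fit(X, y)
print("selected regularization parameter:", model.alpha_)
```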
Weighted least squares
Weighted least squares assigns a different weight to each data point based on its importance or reliability
Definition and motivation
In weighted least squares, each squared error term is multiplied by a weight $w_i$
The objective function becomes $\sum_{i=1}^{n} w_i (y_i - f(x_i))^2$
Weights can be used to emphasize or de-emphasize certain data points based on their significance or uncertainty
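A minimal sketch, using the standard trick of rescaling each row by $\sqrt{w_i}$ so an ordinary least squares solver can be reused (NumPy assumed):

```python
import numpy as np

def weighted_lstsq(A, b, w):
    """Weighted least squares: minimize sum_i w_i * (b_i - (A x)_i)^2.

    Equivalent to ordinary least squares after scaling each row by sqrt(w_i).
    """
    sw = np.sqrt(w)
    x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x
```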
Incorporating prior knowledge
Weights can be assigned based on prior knowledge about the reliability or importance of each data point
Data points with higher weights have a greater influence on the least squares solution
Weights can also be used to incorporate information about the variance or covariance of the measurements
Iteratively reweighted least squares (IRLS)
IRLS is an iterative method for solving weighted least squares problems
Starts with an initial set of weights and iteratively updates the weights based on the residuals
The process is repeated until the weights stabilize and the solution converges
IRLS is particularly useful for robust regression, where the weights are adjusted to downweight outliers or influential points
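A simple IRLS sketch with Huber-style weights (the threshold delta and iteration count below are illustrative choices, not prescribed values):

```python
import numpy as np

def irls_huber(A, b, delta=1.0, n_iter=20):
    """Iteratively reweighted least squares with Huber-style weights.

    Residuals larger than delta are downweighted, giving a robust fit.
    """
    x, *_ = np.linalg.lstsq(A, b, rcond=None)   # ordinary LS as the starting point
    for _ in range(n_iter):
        r = b - A @ x
        # weight 1 for small residuals, delta/|r| for large ones
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x
```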
Constrained least squares
Constrained least squares incorporates additional constraints on the parameters or the solution
Equality-constrained least squares
Equality constraints impose strict conditions on the parameters or the solution
The least squares problem is solved subject to linear equality constraints of the form $Cx = d$
Lagrange multipliers or elimination of variables can be used to solve equality-constrained problems
Inequality-constrained least squares
Inequality constraints impose bounds or restrictions on the parameters or the solution
The least squares problem is solved subject to linear inequality constraints of the form $Gx \le h$
Quadratic programming or active set methods can be used to solve inequality-constrained problems
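For the special case of simple bound constraints (a subset of general inequality constraints), SciPy's lsq_linear gives a compact illustration:

```python
import numpy as np
from scipy.optimize import lsq_linear

# Illustrative overdetermined system with bound constraints on x
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.5])

# Minimize ||A x - b||^2 subject to 0 <= x <= 1
res = lsq_linear(A, b, bounds=(0.0, 1.0))
print(res.x)
```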
Active set methods
Active set methods solve constrained least squares problems by iteratively identifying the active constraints
The active constraints are the ones that are satisfied as equalities at the current solution
The problem is reduced to an equality-constrained least squares problem considering only the active constraints
The solution is updated iteratively by adding or removing constraints from the active set until the optimal solution is found
Least squares and orthogonal polynomials
Orthogonal polynomials play a significant role in least squares approximation and numerical quadrature
Legendre polynomials
Legendre polynomials are a family of orthogonal polynomials defined on the interval [-1, 1]
They are solutions to the Legendre differential equation and have the property of orthogonality
Legendre polynomials are used in least squares approximation and Gauss-Legendre quadrature
Chebyshev polynomials
Chebyshev polynomials are another family of orthogonal polynomials defined on the interval [-1, 1]
They are solutions to the Chebyshev differential equation and have the property of minimal maximum deviation
Chebyshev polynomials are used in least squares approximation, interpolation, and Chebyshev quadrature
Hermite polynomials
Hermite polynomials are orthogonal polynomials defined on the real line (-∞, ∞)
They are solutions to the Hermite differential equation and are orthogonal with respect to the weight function $e^{-x^2}$
Hermite polynomials are used in least squares approximation, Gaussian quadrature, and quantum mechanics
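As a small illustration of least squares in an orthogonal polynomial basis, NumPy's numpy.polynomial.legendre module fits a Legendre series directly (the data below are synthetic):

```python
import numpy as np
from numpy.polynomial import legendre

# Illustrative: least squares fit of a degree-5 Legendre series on [-1, 1]
x = np.linspace(-1, 1, 200)
y = np.exp(x) + 0.01 * np.random.default_rng(2).standard_normal(x.size)

coeffs = legendre.legfit(x, y, deg=5)   # least squares in the Legendre basis
y_fit = legendre.legval(x, coeffs)
print("max fit error:", np.max(np.abs(y_fit - np.exp(x))))
```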
Nonlinear least squares optimization
Nonlinear least squares involves fitting a nonlinear model to the data points
Gauss-Newton method
The Gauss-Newton method is an iterative algorithm for solving nonlinear least squares problems
It linearizes the nonlinear model around the current estimate of the parameters and solves the resulting linear least squares problem
The process is repeated iteratively until convergence, updating the parameter estimates at each iteration
Levenberg-Marquardt algorithm
The Levenberg-Marquardt algorithm is an extension of the Gauss-Newton method that incorporates a trust region approach
It introduces a damping parameter that controls the step size and direction of the parameter updates
The damping parameter is adjusted adaptively based on the reduction in the sum of squared errors
The Levenberg-Marquardt algorithm is more robust and efficient than the Gauss-Newton method, especially when the initial guess is far from the optimal solution
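A brief sketch using scipy.optimize.least_squares, whose method='lm' option wraps a Levenberg-Marquardt implementation (the exponential model and data below are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative nonlinear model: f(x) = p0 * exp(p1 * x)
x = np.linspace(0, 2, 30)
y = 2.0 * np.exp(1.3 * x) + 0.05 * np.random.default_rng(3).standard_normal(x.size)

def residuals(p):
    return y - p[0] * np.exp(p[1] * x)

# method='lm' uses a Levenberg-Marquardt implementation (MINPACK)
fit = least_squares(residuals, x0=[1.0, 1.0], method='lm')
print("estimated parameters:", fit.x)
```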
Trust region methods
Trust region methods solve nonlinear least squares problems by iteratively approximating the objective function with a simpler model within a trust region
The trust region is a neighborhood around the current estimate of the parameters, where the simpler model is considered to be a good approximation
The size of the trust region is adjusted based on the agreement between the simpler model and the actual objective function
Trust region methods provide global convergence properties and can handle ill-conditioned problems
Statistical properties of least squares
Least squares estimation has several desirable statistical properties when certain assumptions are met
Unbiasedness and consistency
Least squares estimators are unbiased, meaning that the expected value of the estimator is equal to the true value of the parameter
Consistency refers to the property that the least squares estimator converges to the true value of the parameter as the sample size increases
Under the assumptions of linearity, independence, homoscedasticity, and normality, least squares estimators are unbiased and consistent
Gauss-Markov theorem
The Gauss-Markov theorem states that among all linear unbiased estimators, the least squares estimator has the smallest variance
An estimator with this property is known as the best linear unbiased estimator (BLUE)
The Gauss-Markov theorem holds under the assumptions of linearity, independence, and homoscedasticity
Confidence intervals and prediction intervals
Confidence intervals provide a range of plausible values for the parameters with a specified level of confidence
Prediction intervals estimate the range of future observations based on the fitted model and the observed data
Both confidence intervals and prediction intervals can be constructed using the standard errors and the t-distribution, assuming normality of the errors
Robust least squares methods
Robust least squares methods are designed to handle outliers and deviations from the assumptions of ordinary least squares
M-estimators
M-estimators are a class of robust estimators that minimize a generalized objective function
The objective function is chosen to downweight the influence of outliers or large residuals
Examples of M-estimators include Huber, Tukey's bisquare, and Hampel estimators
M-estimators provide robust estimates of the parameters in the presence of outliers
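As one hedged example, scipy.optimize.least_squares exposes robust loss functions such as the Huber loss, which behaves like an M-estimator by downweighting large residuals (the straight-line data below are synthetic):

```python
import numpy as np
from scipy.optimize import least_squares

# Illustrative straight-line data with one gross outlier
x = np.linspace(0, 1, 15)
y = 1.0 + 2.0 * x
y[7] += 5.0  # outlier

def residuals(p):
    return y - (p[0] + p[1] * x)

# Huber loss downweights large residuals (robust M-estimation)
robust = least_squares(residuals, x0=[0.0, 0.0], loss='huber', f_scale=0.5)
print("robust estimates:", robust.x)
```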
Least median of squares
Least median of squares (LMS) is a robust regression method that minimizes the median of the squared residuals
LMS is highly resistant to outliers, as it seeks to find the narrowest strip covering half of the data points
The LMS estimator has a breakdown point of 50%, meaning it can handle up to 50% of the data being outliers
Least trimmed squares
Least trimmed squares (LTS) is another robust regression method that minimizes the sum of the smallest squared residuals
LTS trims a specified percentage of the largest residuals and fits the model to the remaining data points
The LTS estimator has a high breakdown point and is resistant to outliers and leverage points
Computational aspects and software
Efficient and numerically stable algorithms are crucial for solving large-scale least squares problems
Numerical stability and conditioning
Numerical stability refers to the sensitivity of the solution to perturbations in the input data or roundoff errors
Ill-conditioned problems are sensitive to small changes in the data and can lead to inaccurate or unstable solutions
Techniques such as QR decomposition, SVD, and regularization can improve the numerical stability of least squares computations
Sparse least squares problems
Sparse least squares problems involve matrices with a large number of zero elements
Exploiting the sparsity structure can significantly reduce the computational cost and memory requirements
Specialized algorithms and data structures, such as sparse matrix representations and iterative methods, are used to solve sparse least squares problems efficiently
Least squares in Python, MATLAB, and R
Python libraries for least squares include NumPy, SciPy, and scikit-learn
MATLAB provides built-in functions for least squares, such as lsqr, lsqnonneg, and lsqlin
R offers functions like lm, glm, and nls for least squares regression and nonlinear least squares
These software tools provide efficient implementations of various least squares algorithms and support a wide range of models and regularization techniques