
3.1 The Hessian matrix and its properties

4 min read • August 7, 2024

The Hessian matrix is a powerful tool in multivariable calculus. It's a square matrix of second-order partial derivatives that helps us understand the local behavior of functions. This matrix is key in classifying critical points and determining if they're minima, maxima, or saddle points.

The Hessian's properties, like symmetry and definiteness, are crucial for optimization problems. By analyzing its eigenvalues, we can use the second derivative test to classify critical points. This information is vital for understanding function behavior and solving real-world optimization challenges.

Definition and Properties

The Hessian Matrix and Its Components

  • The Hessian matrix is a square matrix of second-order partial derivatives of a scalar-valued function
  • Consists of the partial derivatives of the gradient vector with respect to each variable
  • For a function $f(x_1, x_2, ..., x_n)$, the Hessian matrix $H(f)$ is defined as:

$$H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

  • Each entry in the Hessian matrix represents a second partial derivative of the function with respect to two variables

Symmetry of the Hessian Matrix

  • The Hessian matrix is symmetric if the function $f$ has continuous second partial derivatives
  • Symmetry implies that $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$ for all $i$ and $j$
  • This property is a consequence of Clairaut's theorem, which states that the order of taking mixed partial derivatives does not matter if the function has continuous second partial derivatives
  • Example: For the function $f(x, y) = x^2 + xy + y^2$, the Hessian matrix is:

$$H(f) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

which is symmetric
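To make the symmetry concrete, here is a minimal sketch (assuming SymPy is available; the variable and function names are illustrative, not from the guide) that builds the Hessian of $f(x, y) = x^2 + xy + y^2$ symbolically and checks that the mixed partials agree.

```python
# Minimal sketch (assuming SymPy is installed): compute the Hessian of
# f(x, y) = x**2 + x*y + y**2 and check the symmetry guaranteed by
# Clairaut's theorem.
from sympy import symbols, hessian

x, y = symbols('x y')
f = x**2 + x*y + y**2

H = hessian(f, (x, y))     # Matrix([[2, 1], [1, 2]])
print(H)

# The mixed partials f_xy and f_yx coincide, so H equals its transpose.
print(H.is_symmetric())    # True
```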
Classification of Critical Points

Positive and Negative Definite Hessian Matrices

  • A critical point of a function is a point where the gradient is zero
  • The Hessian matrix can be used to classify critical points based on its definiteness
  • A Hessian matrix is positive definite if all of its eigenvalues are positive
  • At a critical point with a positive definite Hessian, the function has a local minimum
  • A Hessian matrix is negative definite if all of its eigenvalues are negative
  • At a critical point with a negative definite Hessian, the function has a local maximum

Indefinite Hessian Matrices and Saddle Points

  • A Hessian matrix is indefinite if it has both positive and negative eigenvalues
  • At a critical point with an indefinite Hessian, the function has a saddle point
  • A saddle point is a point where the function increases in some directions and decreases in others
  • Example: For the function $f(x, y) = x^2 - y^2$, the Hessian matrix is:

$$H(f) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$$

which is indefinite, and the critical point $(0, 0)$ is a saddle point

Eigenvalues and the Second Derivative Test

  • The eigenvalues of the Hessian matrix determine the definiteness of the matrix
  • If all eigenvalues are positive, the matrix is positive definite; if all eigenvalues are negative, it is negative definite
  • If the eigenvalues have mixed signs, the matrix is indefinite
  • The second derivative test uses the eigenvalues of the Hessian to classify critical points
  • If all eigenvalues are positive, the critical point is a local minimum
  • If all eigenvalues are negative, the critical point is a local maximum
  • If the eigenvalues have mixed signs, the critical point is a saddle point
  • If one or more eigenvalues are zero and the remaining eigenvalues do not have mixed signs, the test is inconclusive
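The second derivative test can be carried out numerically. Below is a short sketch (assuming NumPy; the helper name `classify_critical_point` is hypothetical, introduced only for illustration) that classifies the critical points of the two examples above from the signs of the Hessian's eigenvalues.

```python
# Short sketch (assuming NumPy) of the second derivative test: classify a
# critical point from the signs of the Hessian's eigenvalues.
import numpy as np

def classify_critical_point(H):
    eigvals = np.linalg.eigvalsh(H)   # Hessians are symmetric, so eigvalsh applies
    if np.all(eigvals > 0):
        return "local minimum (positive definite)"
    if np.all(eigvals < 0):
        return "local maximum (negative definite)"
    if np.any(eigvals > 0) and np.any(eigvals < 0):
        return "saddle point (indefinite)"
    return "inconclusive (some eigenvalue is zero)"

# Hessian of f(x, y) = x**2 + x*y + y**2 at its critical point (0, 0)
print(classify_critical_point(np.array([[2.0, 1.0], [1.0, 2.0]])))   # local minimum

# Hessian of f(x, y) = x**2 - y**2 at its critical point (0, 0)
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
```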
Applications

Taylor Series Expansion and the Hessian Matrix

  • The Hessian matrix plays a crucial role in the second-order Taylor series expansion of a function
  • The Taylor series expansion approximates a function near a point using its derivatives
  • For a function $f(x_1, x_2, ..., x_n)$ and a point $\mathbf{a} = (a_1, a_2, ..., a_n)$, the second-order Taylor series expansion is:

$$f(\mathbf{x}) \approx f(\mathbf{a}) + \nabla f(\mathbf{a})^T (\mathbf{x} - \mathbf{a}) + \frac{1}{2} (\mathbf{x} - \mathbf{a})^T H(f)(\mathbf{a}) (\mathbf{x} - \mathbf{a})$$

where $\nabla f(\mathbf{a})$ is the gradient vector and $H(f)(\mathbf{a})$ is the Hessian matrix evaluated at the point $\mathbf{a}$
  • The Hessian matrix captures the second-order information about the function, which helps improve the accuracy of the approximation
  • Example: Consider the function $f(x, y) = x^2 + xy + y^2$ and the point $(1, 1)$. The second-order Taylor series expansion around this point is:

$$f(x, y) \approx 3 + 3(x - 1) + 3(y - 1) + (x - 1)^2 + (x - 1)(y - 1) + (y - 1)^2$$

Optimization and Newton's Method

  • The Hessian matrix is used in optimization algorithms to find the minimum or maximum of a function
  • Newton's method is an iterative optimization algorithm that uses the Hessian matrix to find the roots of a function's gradient
  • The update rule for Newton's method is:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - [H(f)(\mathbf{x}_k)]^{-1} \nabla f(\mathbf{x}_k)$$

where $\mathbf{x}_k$ is the current estimate of the minimum or maximum, $H(f)(\mathbf{x}_k)$ is the Hessian matrix evaluated at $\mathbf{x}_k$, and $\nabla f(\mathbf{x}_k)$ is the gradient vector evaluated at $\mathbf{x}_k$
  • The Hessian matrix provides second-order information about the function, which helps Newton's method converge faster than gradient-based methods like gradient descent
  • Example: To find the minimum of the function $f(x, y) = x^2 + xy + y^2$ using Newton's method, starting from the point $(1, 1)$, the first iteration would be:

$$\mathbf{x}_1 = (1, 1) - \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 3 \end{bmatrix} = (0, 0)$$

which is the global minimum of the function
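As a check on the iteration above, here is a minimal sketch (assuming NumPy; the helper names `grad` and `hess` are illustrative) that applies one Newton update to $f(x, y) = x^2 + xy + y^2$ starting from $(1, 1)$. Because the function is quadratic, a single step lands exactly on the minimizer $(0, 0)$.

```python
# Minimal sketch (assuming NumPy) of one Newton step for
# f(x, y) = x**2 + x*y + y**2, starting from (1, 1).
import numpy as np

def grad(p):
    x, y = p
    return np.array([2 * x + y, x + 2 * y])

def hess(p):
    # The Hessian of this quadratic function is constant.
    return np.array([[2.0, 1.0], [1.0, 2.0]])

x_k = np.array([1.0, 1.0])
# Newton update: x_{k+1} = x_k - H(x_k)^{-1} grad(x_k)
x_next = x_k - np.linalg.solve(hess(x_k), grad(x_k))
print(x_next)   # [0. 0.]  -- the global minimum, reached in one step
```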