Lecture 5-2: Newton’s Method: Part 2#

Download the original slides: CMSE382-Lec5_2.pdf

Warning

This is an AI-generated transcript of the lecture slides and may contain errors or inaccuracies. Please refer to the original course materials for authoritative content.


Damped Newton’s Method#

Topics Covered#

Topics:

  • Damped Newton’s method

  • Cholesky factorization

  • Hybrid gradient-Newton method

Announcements:

  • Homework 2 posted, due Thursday, Feb 12 at 11:59pm

  • Midterm 1 on Wednesday, Feb 18.

  • No office hours Friday, Feb 13.


Damped Newton Method - Motivating Example#

Consider the function \(f(x,y) = \sqrt{x^2+1}+\sqrt{y^2+1}\) whose optimal solution is \((0,0)\).

\[\begin{split}\nabla^2 f(\textbf{x}) = \begin{bmatrix} (1+x^2)^{-3/2} & 0 \\ 0 & (1+y^2)^{-3/2} \end{bmatrix}. \quad \nabla^2 f(\mathbf{x}) > 0.\end{split}\]
  • There is NO \(m > 0\) such that \(\lambda_{\min}(\nabla^2 f(\textbf{x})) \geq m\) for all \(\mathbf{x} \in \mathbb{R}^2\): the eigenvalues \((1+x^2)^{-3/2}\) and \((1+y^2)^{-3/2}\) tend to \(0\) as \(\|\mathbf{x}\| \to \infty\)

(Figures: slow convergence and divergence of the generated sequence on this function.)

  • Descent of the generated sequence is not guaranteed, even when \(\nabla^2 f(\mathbf{x}) > 0\).

  • This is rectified by introducing a step size \(t_k\) (damped Newton’s method).


Damped Newton’s Method Algorithm#

Input: \(\varepsilon > 0\) tolerance parameter

Initialization: Pick \(\textbf{x}_0 \in \mathbb{R}^n\) arbitrarily

General step: For any \(k = 0, 1, 2, \ldots\), do:

  1. Compute the Newton direction \(\textbf{d}_k\), which is the solution to the linear system \(\nabla^2 f(\textbf{x}_k)\textbf{d}_k = -\nabla f(\textbf{x}_k)\)

  2. Pick \(t_k\) using constant stepsize, exact line search, or backtracking

  3. Set \(\textbf{x}_{k+1} = \textbf{x}_k+t_k\textbf{d}_k\)

  4. If \(\|\nabla f(\textbf{x}_{k+1})\| < \varepsilon\), stop and output \(\textbf{x}_{k+1}\).
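The general step above can be sketched in NumPy. This is a minimal illustration (function and variable names are my own), assuming \(\nabla^2 f(\textbf{x}_k)\) is invertible at every iterate and using backtracking for step 2:

```python
import numpy as np

def damped_newton(f, grad, hess, x0, eps=1e-8, alpha=0.25, beta=0.5, max_iter=100):
    """Damped Newton's method with a backtracking line search.

    f, grad, hess are callables; alpha and beta are the backtracking
    (sufficient-decrease and shrinkage) parameters.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:          # step 4: stopping criterion
            break
        # Step 1: Newton direction, solving H d = -g (never invert H explicitly)
        d = np.linalg.solve(hess(x), -g)
        # Step 2: backtracking line search for t_k
        t = 1.0
        while f(x + t * d) > f(x) + alpha * t * (g @ d):
            t *= beta
        # Step 3: update the iterate
        x = x + t * d
    return x

# Example from the lecture: f(x, y) = sqrt(x^2+1) + sqrt(y^2+1), minimized at the origin
f    = lambda x: np.sum(np.sqrt(x**2 + 1))
grad = lambda x: x / np.sqrt(x**2 + 1)
hess = lambda x: np.diag((1 + x**2) ** -1.5)
x_star = damped_newton(f, grad, hess, np.array([10.0, 10.0]))
```

Solving the linear system with `np.linalg.solve` rather than forming \(\nabla^2 f(\textbf{x}_k)^{-1}\) is both cheaper and numerically safer.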


Damped Newton Method - Revisiting the Example#

Revisiting the previous example: \(f(x,y) = \sqrt{x^2+1}+\sqrt{y^2+1}\)

\[\begin{split}\nabla^2 f(\textbf{x}) = \begin{bmatrix} (1+x^2)^{-3/2} & 0 \\ 0 & (1+y^2)^{-3/2} \end{bmatrix}. \quad \nabla^2 f(\mathbf{x}) > 0.\end{split}\]
  • There is NO \(m > 0\) such that \(\lambda_{\min}(\nabla^2 f(\textbf{x})) \geq m\) for all \(\mathbf{x} \in \mathbb{R}^2\)

(Figures: pure Newton’s method diverges, while damped Newton’s method converges on this function.)
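Both behaviors are easy to reproduce numerically. For the 1D piece \(g(x)=\sqrt{x^2+1}\) we have \(g'(x) = x(1+x^2)^{-1/2}\) and \(g''(x) = (1+x^2)^{-3/2}\), so the pure Newton update simplifies to \(x_{k+1} = x_k - g'(x_k)/g''(x_k) = -x_k^3\), which diverges whenever \(|x_0| > 1\). A sketch comparing the two methods from the same starting point (variable names are illustrative):

```python
import numpy as np

# Pure Newton: the update collapses to x_{k+1} = -x_k^3 for this function,
# so the magnitude explodes for any |x_0| > 1.
x_pure = 1.1
for _ in range(6):
    x_pure = -x_pure**3          # 1.1, -1.331, 2.357..., blowing up

# Damped Newton with backtracking converges from the same starting point.
g  = lambda x: np.sqrt(x**2 + 1)
dg = lambda x: x / np.sqrt(x**2 + 1)
x_damped = 1.1
for _ in range(30):
    d = -x_damped * (1 + x_damped**2)        # Newton direction
    t = 1.0
    while g(x_damped + t * d) > g(x_damped) + 0.25 * t * dg(x_damped) * d:
        t *= 0.5                             # backtrack until sufficient decrease
    x_damped += t * d
```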


Cholesky Decomposition#

Cholesky Decomposition#

When \(A\) is symmetric and positive definite, it has a Cholesky decomposition given by

\[A = L L^{T},\]

where \(L\) is a lower triangular matrix (a matrix with zeros everywhere above the diagonal).

If \(A\) is diagonal (with positive diagonal entries, as positive definiteness requires), the Cholesky factor is the matrix square root:

\[\begin{split}L=A^{\frac{1}{2}}= \begin{bmatrix} \sqrt{A_{11}} & 0 & \ldots & 0\\ 0 & \sqrt{A_{22}} & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \sqrt{A_{nn}} \end{bmatrix}\end{split}\]

Example: \(A=\begin{bmatrix}9 & 0\\ 0 & 25\end{bmatrix}\), \(L = A^{\frac{1}{2}}=\begin{bmatrix} 3 & 0\\ 0 & 5\end{bmatrix}\)

Verify: \(L L^T=\begin{bmatrix} 3 & 0\\ 0 & 5\end{bmatrix} \cdot \begin{bmatrix} 3 & 0\\ 0 & 5\end{bmatrix} = \begin{bmatrix}9 & 0\\ 0 & 25\end{bmatrix}\)
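In practice the factor is computed with a library routine; for instance, NumPy’s `np.linalg.cholesky` returns the lower-triangular \(L\). A sketch using the diagonal example above plus an illustrative non-diagonal matrix:

```python
import numpy as np

# The diagonal example from the slide: L is the elementwise square root.
A = np.array([[9.0, 0.0],
              [0.0, 25.0]])
L = np.linalg.cholesky(A)      # lower-triangular factor with A = L @ L.T
# L == [[3., 0.],
#       [0., 5.]]

# A non-diagonal positive definite example, verified by reassembling the matrix.
B = np.array([[4.0, 2.0],
              [2.0, 3.0]])
LB = np.linalg.cholesky(B)
assert np.allclose(LB @ LB.T, B)
```

`np.linalg.cholesky` raises `LinAlgError` when the matrix is not positive definite, which is what makes the factorization useful as a definiteness test.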


Hybrid Gradient-Newton Method#

Hybrid Gradient-Newton Method - Motivation#

Motivation:

  • Newton’s method assumes that \(\nabla^2 f(\mathbf{x})>0\).

  • Gradient Descent does not use the Hessian.

We avoid the assumption \(\nabla^2 f(\mathbf{x})>0\) by constructing a hybrid method.

The hybrid gradient-Newton method:

  • Does not require the Hessian to be positive definite

  • Is likely to converge faster than the gradient method

  • Approach: At each iteration, determine whether \(\nabla^2f(\textbf{x}_k) \succ 0\) (for example, by attempting a Cholesky factorization, which succeeds exactly when the matrix is positive definite)

    • If \(\nabla^2f(\textbf{x}_k) \succ 0\), use a Newton step

    • Otherwise, use a gradient descent step


Hybrid Gradient-Newton Method Algorithm#

Input: \(\varepsilon>0\) tolerance parameter, method for finding stepsize \(t\) (constant step size, exact line search, or backtracking)

Initialization: Pick \(\textbf{x}_0 \in \mathbb{R}^n\) arbitrarily

General step: For any \(k = 0,1,2, \ldots\), do:

  • If \(\nabla^2f(\textbf{x}_k) \succ 0\), then take \(\mathbf{d}_k\) as the solution to the system \(\nabla^2 f(\textbf{x}_k)\textbf{d}_k = -\nabla f(\textbf{x}_k)\). Else, set \(\textbf{d}_k = -\nabla f(\textbf{x}_k)\).

  • Pick stepsize \(t_k\) according to the input method

  • Set \(\textbf{x}_{k+1} = \textbf{x}_k+t_k\textbf{d}_k\)

  • If \(\|\nabla f(\textbf{x}_{k+1})\| \leq \varepsilon\), stop and output \(\textbf{x}_{k+1}\).
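The steps above can be sketched as follows. This is an illustrative implementation (names are my own), using an attempted Cholesky factorization as the positive-definiteness test and backtracking for the stepsize:

```python
import numpy as np

def hybrid_gradient_newton(f, grad, hess, x0, eps=1e-8, max_iter=200):
    """Hybrid gradient-Newton sketch: Newton step when the Hessian is
    positive definite, gradient descent step otherwise."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        H = hess(x)
        try:
            L = np.linalg.cholesky(H)       # succeeds iff H is positive definite
            # Newton direction: solve L L^T d = -g via the triangular factors
            d = np.linalg.solve(L.T, np.linalg.solve(L, -g))
        except np.linalg.LinAlgError:
            d = -g                          # fall back to a gradient descent step
        t = 1.0
        while f(x + t * d) > f(x) + 0.25 * t * (g @ d):  # backtracking
            t *= 0.5
        x = x + t * d
    return x

# Hypothetical test function: f(x) = x^4/4 - x^2/2 has an indefinite Hessian
# near 0 but is locally convex around its minimizers x = +-1.
f    = lambda x: x[0]**4 / 4 - x[0]**2 / 2
grad = lambda x: np.array([x[0]**3 - x[0]])
hess = lambda x: np.array([[3 * x[0]**2 - 1]])
x_star = hybrid_gradient_newton(f, grad, hess, np.array([0.1]))
```

On this example the method takes gradient steps while \(3x^2 - 1 < 0\) and switches to Newton steps once the iterate enters the locally convex region, which is exactly the behavior the hybrid scheme is designed for.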