Lecture 4-2: Gradient Method: Part 2#
Download the original slides: CMSE382-Lec4_2.pdf
Warning
This is an AI-generated transcript of the lecture slides and may contain errors or inaccuracies. Please refer to the original course materials for authoritative content.
Condition Number, Sensitivity, and Diagonal Scaling#
Topics Covered#
Topics:
Condition number
Gradient descent solution sensitivity
Diagonal scaling
Announcements:
Quiz 2 today, end of class
No office hours Friday
Condition Number#
Recall the definitions of positive definiteness and of a singular matrix.
An \(n\times n\) real symmetric matrix \(A\) is called positive definite if \(\boldsymbol{x}^T A \boldsymbol{x} > 0\) for every non-zero choice of \(\boldsymbol{x}\).
If \(A\) is positive definite, then the diagonal entries of \(A\) are positive.
\(A\) is positive definite if and only if there is an invertible matrix \(B\) such that \(A=B^T B\).
\(\mathbf{A}\) is positive definite if and only if all its eigenvalues are positive.
A singular matrix is a square matrix that does not have an inverse.
A matrix is singular if and only if its determinant is 0.
A matrix has a zero eigenvalue if and only if its determinant is 0.
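These facts about singular matrices can be checked numerically; a minimal sketch in NumPy, using an illustrative rank-deficient matrix:

```python
import numpy as np

# A rank-deficient matrix: the second row is twice the first,
# so the columns are linearly dependent and A has no inverse.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det_A = np.linalg.det(A)      # 0 (up to floating point): A is singular
eigs = np.linalg.eigvalsh(A)  # one eigenvalue is 0, the other is 5
print(det_A, eigs)
```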
Condition Number#
Definition: Let \(\mathbf{A}\) be a positive definite matrix. Then the condition number of \(\mathbf{A}\) is defined by
\(\kappa(\mathbf{A}) = \dfrac{\lambda_{\max}(\mathbf{A})}{\lambda_{\min}(\mathbf{A})}\)
Since \(\lambda_{\max}(\mathbf{A}) \geq \lambda_{\min}(\mathbf{A}) > 0\), we always have \(\kappa(\mathbf{A}) \geq 1\).
Matrices with large condition number are ill-conditioned and matrices with small condition number are well-conditioned.
A singular matrix has an infinite condition number.
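The eigenvalue ratio is easy to compute directly; a minimal sketch, using an arbitrary illustrative positive definite matrix:

```python
import numpy as np

# An arbitrary symmetric positive definite matrix for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Condition number of a positive definite matrix: largest eigenvalue
# divided by smallest eigenvalue.
eigvals = np.linalg.eigvalsh(A)   # returned in ascending order
kappa = eigvals[-1] / eigvals[0]  # eigenvalues here are 1 and 3
print(kappa)
```

For a symmetric positive definite matrix this agrees with NumPy's `np.linalg.cond`, which uses the 2-norm by default.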
Sensitivity of Solutions#
Sensitivity of gradient descent solution
For a linear system \(A \mathbf{x} = \mathbf{b}\), the condition number bounds the sensitivity of the solution \(\mathbf{x}\) to fluctuations in the observed data \(\mathbf{b}\):
\(\dfrac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \kappa(A)\,\dfrac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}\)
The rate of convergence of the gradient descent iterates \(\textbf{x}_k\) to a given stationary point \(\textbf{x}^\ast\) depends on \(\kappa(\nabla^2f(\textbf{x}^\ast))\).
Example: Solve the system \(\mathbf{A}\mathbf{x}=\mathbf{b}\) with \(\mathbf{A}=\begin{bmatrix}1.00001 & 1\\ 1 & 1.00001\end{bmatrix}\).
If \(\mathbf{b}=\begin{bmatrix}1\\1 \end{bmatrix}\), then \(\mathbf{x} = \begin{bmatrix} 0.4999975 \\ 0.4999975 \end{bmatrix}\)
If \(\mathbf{b}=\begin{bmatrix}1.01\\1 \end{bmatrix}\), then \(\mathbf{x} = \begin{bmatrix} 500.50249748\\-499.49750251 \end{bmatrix}\)
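A matrix consistent with both solutions shown is \(\begin{bmatrix}1.00001 & 1\\ 1 & 1.00001\end{bmatrix}\); a quick numerical check of the sensitivity:

```python
import numpy as np

# An ill-conditioned matrix consistent with both solutions above.
A = np.array([[1.00001, 1.0],
              [1.0, 1.00001]])

x1 = np.linalg.solve(A, np.array([1.0, 1.0]))
x2 = np.linalg.solve(A, np.array([1.01, 1.0]))

print(x1)                  # ≈ [0.4999975, 0.4999975]
print(x2)                  # ≈ [500.5025, -499.4975]
print(np.linalg.cond(A))   # ≈ 200001: a 1% change in b moves x enormously
```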
Diagonal Scaling - Motivating Example#
Assume we want to minimize a quadratic function, which we can write as \(\min\limits_{\textbf{x}} \{\textbf{x}^\top\textbf{A}\textbf{x}\}\) for a symmetric positive definite matrix \(\textbf{A}\).
The condition number of \(\textbf{A}\) is \(\kappa(A)=1668.001\).
Using gradient descent, we would therefore expect slow convergence: for a quadratic minimized with exact line search, the objective error contracts by a factor of only \(\left(\frac{\kappa-1}{\kappa+1}\right)^2 \approx 0.9976\) per iteration.
Diagonal Scaling#
Main idea: We have an ill-conditioned \(\mathbf{A}\) but we want to minimize \(\textbf{x}^\top\mathbf{A}\textbf{x}\).
Instead:
pick a non-singular matrix \(\mathbf{S}\)
set \(\textbf{x} = \mathbf{S}\textbf{y}\)
Then instead minimize the transformed problem: \(\min\limits_{\textbf{y}} \{\textbf{y}^\top\mathbf{S}^\top\mathbf{A}\mathbf{S}\textbf{y}\}\)
The transformation aims to speed up convergence while still giving a good answer.
The gradient method for the transformed problem, mapped back to the \(\textbf{x}\) variables, becomes: \(\textbf{x}_{k+1} = \textbf{x}_k - t_k\mathbf{S}\mathbf{S}^\top\nabla f(\textbf{x}_k)\)
Instead of picking \(\mathbf{S}\) directly, set \(\mathbf{D} = \mathbf{S}\mathbf{S}^\top\) and pick \(\mathbf{D}\) instead.
Scaled Gradient Method#
Input: tolerance parameter \(\varepsilon > 0\).
Initialization: Pick \(\textbf{x}_0 \in \mathbb{R}^n\) arbitrarily.
For \(k = 0, 1, 2, \ldots\) do:
Pick a scaling matrix \(\textbf{D}_k \succ 0\)
Pick a stepsize \(t_k\)
For example, using exact line search on the function \(g(t) = f(\textbf{x}_k - t\textbf{D}_k\nabla f(\textbf{x}_k))\).
Set \(\textbf{x}_{k+1} = \textbf{x}_k - t_k\textbf{D}_k\nabla f(\textbf{x}_k)\).
If \(\|\nabla f(\textbf{x}_{k+1})\| \leq \varepsilon\), then STOP and \(\textbf{x}_{k+1}\) is the output.
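The algorithm above can be sketched in code. This is a minimal sketch with a hypothetical helper `scaled_gradient_method` that uses a fixed scaling matrix and a constant stepsize rather than the per-iteration \(\textbf{D}_k\) and exact line search of the slides:

```python
import numpy as np

def scaled_gradient_method(grad, x0, D, t=0.01, eps=1e-6, max_iter=10_000):
    """Scaled gradient method with a fixed positive definite scaling matrix D.

    grad : function returning the gradient of f at x
    t    : constant stepsize (the slides use exact line search instead)
    eps  : tolerance on the gradient norm
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - t * D @ g  # scaled gradient step
    return x

# Example: f(x) = x^T A x with an ill-conditioned diagonal A.
A = np.diag([1.0, 1000.0])
grad = lambda x: 2.0 * A @ x          # gradient of x^T A x; Hessian is 2A
D = np.diag(1.0 / np.diag(2.0 * A))   # reciprocal of the Hessian diagonal
x_star = scaled_gradient_method(grad, [1.0, 1.0], D, t=0.9)
print(x_star)  # converges quickly to the minimizer [0, 0]
```

With this diagonal \(A\), the scaled step exactly rebalances the two coordinates, so both shrink at the same rate despite the factor-1000 mismatch in curvature.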
Choosing the Scaling Matrix \(\textbf{D}_k\)#
How to choose \(\textbf{D}_k\)?
Newton’s method: \(\textbf{D}_k = (\nabla^2 f(\textbf{x}_k))^{-1}\)
Diagonal method: \(\textbf{D}_k\) diagonal with entries \((\textbf{D}_k)_{ii} = \big((\nabla^2 f(\textbf{x}_k))_{ii}\big)^{-1}\)
These choices are popular because they often yield fast convergence.
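To illustrate why these choices help, we can compare the conditioning of the scaled Hessian under each choice; the quadratic below is a hypothetical ill-conditioned example:

```python
import numpy as np

# For a quadratic f(x) = x^T A x, the Hessian is constant: H = 2A.
A = np.array([[1.0, 0.2],
              [0.2, 500.0]])
H = 2.0 * A

# Newton's method: D = inverse of the full Hessian.
D_newton = np.linalg.inv(H)

# Diagonal method: D keeps only the reciprocals of the Hessian's diagonal.
D_diag = np.diag(1.0 / np.diag(H))

c_H = np.linalg.cond(H)               # ill-conditioned (≈ 500)
c_diag = np.linalg.cond(D_diag @ H)   # much better conditioned (≈ 1.2)
c_newton = np.linalg.cond(D_newton @ H)  # Newton gives the identity: κ = 1
print(c_H, c_diag, c_newton)
```

The diagonal choice recovers most of the conditioning benefit without the cost of inverting the full Hessian, which is why it is a common compromise.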