Worksheet Chapter 5, Part 1#
Topics#
This section focuses on Chapter 5.1 of Beck.
This notebook covers:
Pure Newton’s method
Quadratic local convergence of Newton’s method
%matplotlib inline
import matplotlib.pylab as plt
import numpy as np
Code from Last Time#
Before we get started, I’m going to copy in the code from previous classes so that we can use it here.
# Backtracking line search function
def step_size_backtrack(f, g, x_k, s=2, alpha=1/4, beta=0.5):
"""A function to compute the step size using backtracking line search.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
x_k : Current point
s : Initial step size (default is 2)
alpha : Parameter for backtracking (default is 1/4)
beta : Step size reduction factor in backtracking (default is 0.5)
Returns:
t : The step size found using backtracking line search
"""
t = s
fun_val = f(x_k)
grad_x = g(x_k)
while fun_val - f(x_k - t * grad_x) < alpha*t*np.linalg.norm(grad_x)**2:
t = beta * t
return t
def gradient_method_backtrack(f, g, x0, epsilon, s=2, alpha=1/4, beta=0.5, max_iter = 100):
"""A function to minimize x^t A x + 2 b^t x using the gradient method with backtracking line search.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
x0 : Initial guess for the solution
epsilon : Tolerance for the stopping criterion
max_iter : Maximum number of iterations to perform (default is 100)
s : Initial step size for backtracking line search (default is 2)
alpha : Parameter for backtracking (default is 1/4)
beta : Step size reduction factor in backtracking (default is 0.5)
Returns:
x : The vector of inputs that minimizes the objective function
fun_val : The value of the objective function at the minimum
x_all : A list of all iterates generated during the optimization
"""
# Start with initial guess
x_k = x0
f_k = f(x_k) # Initial function value
# Store all iterates (the x_k's used)
x_all = [x0]
# Keep a counter for how many iterations
ite = 0
grad_k = g(x_k)
while np.linalg.norm(grad_k) > epsilon and ite < max_iter:
grad_k = g(x_k)
t_k = step_size_backtrack(f, g, x_k, s, alpha, beta) #<----
d_k = -grad_k
# Update x_k to the next x_k (x_{k+1} in the notation above)
x_k = x_k + t_k * d_k
# Compute the function value at the new iterate
f_k = f(x_k)
# Append the new iterate to the list
x_all.append(x_k)
ite = ite + 1
if ite >= max_iter:
print("Maximum number of iterations reached.")
break
x_soln = x_k
f_soln = f_k
return x_soln, f_soln, x_all
#------
# Example usage for last class's function, x^2 + 2y^2
#------
def f(x):
return x[0]**2 + 2 * x[1]**2
def g(x):
return np.array([2*x[0], 4*x[1]])
x0 = np.array([1.0, 1.0])
epsilon = 1e-6
x_soln, f_soln, x_all = gradient_method_backtrack(f, g, x0, epsilon, s=2, alpha=1/4, beta=0.5, max_iter = 100)
print("Solution x:", x_soln)
Solution x: [0. 0.]
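As a quick aside (not part of the original worksheet), we can check that the step size returned by `step_size_backtrack` actually satisfies the sufficient-decrease condition it enforces, \(f(\mathbf{x}) - f(\mathbf{x} - t\nabla f(\mathbf{x})) \ge \alpha t \|\nabla f(\mathbf{x})\|^2\):

```python
import numpy as np

def step_size_backtrack(f, g, x_k, s=2, alpha=1/4, beta=0.5):
    """Backtracking line search (same logic as the worksheet version)."""
    t = s
    fun_val = f(x_k)
    grad_x = g(x_k)
    while fun_val - f(x_k - t * grad_x) < alpha * t * np.linalg.norm(grad_x)**2:
        t = beta * t
    return t

# Last class's function, x^2 + 2y^2, and its gradient
f = lambda x: x[0]**2 + 2 * x[1]**2
g = lambda x: np.array([2 * x[0], 4 * x[1]])

x = np.array([1.0, 1.0])
t = step_size_backtrack(f, g, x)
print(t)  # 0.25: s=2 is halved three times before sufficient decrease holds
print(f(x) - f(x - t * g(x)) >= 1/4 * t * np.linalg.norm(g(x))**2)  # True
```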
Pure Newton’s Method#
We consider the unconstrained problem

\[\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}),\]

where \(f(\cdot)\) is twice continuously differentiable over \(\mathbb{R}^n\).

Given \(\mathbf{x}_k\), the next iterate \(\mathbf{x}_{k+1}\) is obtained by minimizing the following quadratic approximation of the function around \(\mathbf{x}_k\):

\[\mathbf{x}_{k+1} = \operatorname{argmin}_{\mathbf{x}} \; f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_k)^T \nabla^2 f(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k).\]

We will further assume that \(\nabla^2 f(\mathbf{x}_k)\) is positive definite. Then the solution \(\mathbf{x}_{k+1}\) is unique. In this case, setting the gradient of the quadratic approximation to zero gives

\[\nabla f(\mathbf{x}_k) + \nabla^2 f(\mathbf{x}_k)(\mathbf{x}_{k+1} - \mathbf{x}_k) = \mathbf{0},\]

which means

\[\mathbf{x}_{k+1} = \mathbf{x}_k - \nabla^2 f(\mathbf{x}_k)^{-1} \nabla f(\mathbf{x}_k).\]

The vector \(-\nabla^2 f(\mathbf{x}_k)^{-1}\nabla f(\mathbf{x}_k)\) is called the Newton direction, and the algorithm is called the pure Newton’s method.
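To build intuition (this example is an addition, not from the worksheet): for a quadratic \(f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x} - \mathbf{b}^T\mathbf{x}\) with \(A\) positive definite, the quadratic approximation is exact, so a single pure Newton step from any starting point lands on the minimizer \(A^{-1}\mathbf{b}\):

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

grad = lambda x: A @ x - b   # gradient of the quadratic
hess = lambda x: A           # Hessian is constant

x_k = np.array([5.0, -7.0])  # arbitrary starting point
d_k = np.linalg.solve(hess(x_k), -grad(x_k))  # Newton direction
x_next = x_k + d_k

print(np.allclose(x_next, np.linalg.solve(A, b)))  # True: one step reaches A^{-1} b
```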
Today’s example#
Consider the minimization problem

\[\min_{(x_1, x_2) \in \mathbb{R}^2} \; 100 x_1^4 + 0.01 x_2^4,\]

whose optimal solution is \((0,0)\).

This function has gradient

\[\nabla f(\mathbf{x}) = \begin{pmatrix} 400 x_1^3 \\ 0.04 x_2^3 \end{pmatrix}\]

and Hessian

\[\nabla^2 f(\mathbf{x}) = \begin{pmatrix} 1200 x_1^2 & 0 \\ 0 & 0.12 x_2^2 \end{pmatrix}.\]
❓❓❓ Question ❓❓❓:
Write a function f which returns the function value at the input point.
# Your code here
def f(x):
pass #<-- your code here
# Test it. This should return 100.01.
x0 = np.array([1, 1])
f(x0)
### ANSWER ###
def f(x):
return 100*x[0]**4 + 0.01*x[1]**4
# Test it
x0 = np.array([1, 1])
f(x0)
np.float64(100.01)
❓❓❓ Question ❓❓❓:
Write a function g which returns our gradient.
# Your code here
def g(x):
pass #<-- your code here
# Test it. This should be a vector with entries 400 and 0.04.
x0 = np.array([1, 1])
g(x0)
### ANSWER ###
def g(x):
return np.array([400*x[0]**3,0.04*x[1]**3])
# Test it. This should be a vector with entries 400 and 0.04.
x0 = np.array([1, 1])
print(g(x0))
[4.e+02 4.e-02]
❓❓❓ Question ❓❓❓:
Write a function h which returns the Hessian.
# Your code here
def h(x):
pass
# Test it. This should be a 2x2 matrix.
x0 = np.array([1, 1])
h(x0)
### ANSWER ###
def h(x):
return np.array([[1200*x[0]**2, 0],[0,0.12*x[1]**2]])
# Test it. This should be a 2x2 matrix.
x0 = np.array([1, 1])
h(x0)
array([[1.2e+03, 0.0e+00],
[0.0e+00, 1.2e-01]])
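As an optional sanity check (the finite-difference helper `fd_grad` below is introduced here and is not part of the worksheet), we can verify `g` and `h` against central differences of `f` and `g`:

```python
import numpy as np

def f(x):
    return 100*x[0]**4 + 0.01*x[1]**4

def g(x):
    return np.array([400*x[0]**3, 0.04*x[1]**3])

def h(x):
    return np.array([[1200*x[0]**2, 0.0], [0.0, 0.12*x[1]**2]])

def fd_grad(fun, x, eps=1e-6):
    """Central-difference gradient of a scalar function (for checking only)."""
    out = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = eps
        out[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return out

x0 = np.array([1.0, 1.0])
# Row i of the Hessian is the gradient of the i-th component of g
H_fd = np.array([fd_grad(lambda y: g(y)[i], x0) for i in range(2)])

print(np.allclose(g(x0), fd_grad(f, x0), atol=1e-4))  # True
print(np.allclose(h(x0), H_fd, atol=1e-3))            # True
```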
❓❓❓ Question ❓❓❓: Does this function satisfy the requirements of the convergence theorem?
✎ Your answer here
### ANSWER ### No. The Hessian \(\nabla^2 f(\mathbf{x})\) is not Lipschitz continuous over \(\mathbb{R}^2\) (its entries grow quadratically in \(\mathbf{x}\)), and it is singular at the optimal solution \((0,0)\), so the positive-definiteness assumption fails there as well.
Newton’s Method Code#
❓❓❓ Question ❓❓❓: Update the code below to run the pure Newton method on our function.
def pure_newton(f, g, h, x0, epsilon=1e-5, max_iter=100):
"""A function to minimize a twice-differentiable function using pure Newton's method.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
h : The Hessian of the objective function. This should be a function of some input vector x.
x0 : Initial guess for the solution
epsilon : Tolerance for the stopping criterion (default is 1e-5)
max_iter : Maximum number of iterations to perform (default is 100)
Returns:
x : The vector of inputs that minimizes the objective function
fun_val : The value of the objective function at the minimum
x_all : A list of all iterates generated during the optimization
"""
# Start with initial guess
x_k = x0
f_k = f(x_k) # Initial function value
# Store all iterates (the x_k's used)
x_all = [x0]
# Keep a counter for how many iterations
ite = 0
grad_k = g(x_k)
while np.linalg.norm(grad_k) > epsilon:
ite = ite + 1
#------
# Remove break before working
break
#------
# Compute Hessian at current point
hess_k = h(x_k)
# Solve for Newton direction:
d_k = None #<-- your code here
# Update x_k to the next x_k (x_{k+1} in the notation above)
x_k = None #<-- your code here
# Compute the function value at the new iterate
f_k = f(x_k)
grad_k = g(x_k)
# Append the new iterate to the list
x_all.append(x_k)
print("iter_number = %3d \t grad = %2.6f \t fun_val = %2.6f" %(ite, np.linalg.norm(grad_k), f_k))
if ite >= max_iter:
print("Maximum number of iterations reached.")
break
x_soln = x_k
f_soln = f_k
return x_soln, f_soln, x_all
# Test it
x0 = np.array([1, 1])
x, fun_val, x_all = pure_newton(f, g, h, x0, epsilon=1e-6, max_iter=100)
### ANSWER ###
def pure_newton(f, g, h, x0, epsilon=1e-5, max_iter=100):
"""A function to minimize a twice-differentiable function using pure Newton's method.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
h : The Hessian of the objective function. This should be a function of some input vector x.
x0 : Initial guess for the solution
epsilon : Tolerance for the stopping criterion (default is 1e-5)
max_iter : Maximum number of iterations to perform (default is 100)
Returns:
x : The vector of inputs that minimizes the objective function
fun_val : The value of the objective function at the minimum
x_all : A list of all iterates generated during the optimization
"""
# Start with initial guess
x_k = x0
f_k = f(x_k) # Initial function value
# Store all iterates (the x_k's used)
x_all = [x0]
# Keep a counter for how many iterations
ite = 0
grad_k = g(x_k)
while np.linalg.norm(grad_k) > epsilon and ite < max_iter:
ite = ite + 1
# Compute Hessian at current point
hess_k = h(x_k)
# Compute Newton direction by solving the linear system hess_k @ d_k = -grad_k
d_k = np.linalg.solve(hess_k, -grad_k)  # more stable than forming the explicit inverse
# Update x_k to the next x_k (x_{k+1} in the notation above)
x_k = x_k + d_k
# Compute the function value at the new iterate
f_k = f(x_k)
grad_k = g(x_k)
# Append the new iterate to the list
x_all.append(x_k)
print("iter_number = %3d \t grad = %2.6f \t fun_val = %2.6f" %(ite, np.linalg.norm(grad_k), f_k))
if ite >= max_iter:
print("Maximum number of iterations reached.")
break
x_soln = x_k
f_soln = f_k
return x_soln, f_soln, x_all
# Test it
x0 = np.array([1, 1])
x, fun_val, x_all = pure_newton(f, g, h, x0, epsilon=1e-6, max_iter=100)
iter_number = 1 grad = 118.518519 fun_val = 19.755062
iter_number = 2 grad = 35.116598 fun_val = 3.902234
iter_number = 3 grad = 10.404918 fun_val = 0.770812
iter_number = 4 grad = 3.082939 fun_val = 0.152259
iter_number = 5 grad = 0.913463 fun_val = 0.030076
iter_number = 6 grad = 0.270656 fun_val = 0.005941
iter_number = 7 grad = 0.080194 fun_val = 0.001174
iter_number = 8 grad = 0.023761 fun_val = 0.000232
iter_number = 9 grad = 0.007040 fun_val = 0.000046
iter_number = 10 grad = 0.002086 fun_val = 0.000009
iter_number = 11 grad = 0.000618 fun_val = 0.000002
iter_number = 12 grad = 0.000183 fun_val = 0.000000
iter_number = 13 grad = 0.000054 fun_val = 0.000000
iter_number = 14 grad = 0.000016 fun_val = 0.000000
iter_number = 15 grad = 0.000005 fun_val = 0.000000
iter_number = 16 grad = 0.000001 fun_val = 0.000000
iter_number = 17 grad = 0.000000 fun_val = 0.000000
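Notice that the gradient norms above shrink by a roughly constant factor each iteration rather than squaring: the convergence here is linear, not quadratic, consistent with the theorem's assumptions failing for this function. In fact the Newton update decouples per coordinate: for a term \(c x^4\), the step is \(x \leftarrow x - \frac{4cx^3}{12cx^2} = \frac{2}{3}x\), so the gradient norm contracts by \((2/3)^3 = 8/27 \approx 0.2963\) per iteration (e.g. \(35.1166 / 118.5185 \approx 0.2963\) in the printout). A short self-contained check:

```python
import numpy as np

def g(x):
    return np.array([400*x[0]**3, 0.04*x[1]**3])

def h(x):
    return np.array([[1200*x[0]**2, 0.0], [0.0, 0.12*x[1]**2]])

x_k = np.array([1.0, 1.0])
norms = []
for _ in range(5):
    x_k = x_k + np.linalg.solve(h(x_k), -g(x_k))  # pure Newton step
    norms.append(np.linalg.norm(g(x_k)))

# Consecutive gradient-norm ratios: a constant ratio signals linear convergence
ratios = [norms[i+1] / norms[i] for i in range(len(norms) - 1)]
print(ratios)  # each ratio is 8/27 ≈ 0.2963
```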
Gradient Descent Version#
❓❓❓ Question ❓❓❓: Now let’s solve the same problem using gradient descent with backtracking line search, with parameters \((s=1, \alpha=0.5, \beta=0.5, \epsilon=10^{-6})\). Update the inputs to run this on our function. How many iterations did it take? More or fewer than Newton?
x0 = np.array([1, 1])
#Enter your code below to run the gradient method function with parameters specified in the problem above and print solution.
# x_gradient, fun_val_gradient, x_all_gradient = gradient_method_backtrack() #<- Uncomment and add your inputs here
### ANSWER ###
x0 = np.array([1, 1])
x_gradient, fun_val_gradient, x_all_gradient = gradient_method_backtrack(f, g, x0, epsilon = 1e-6, s=1, alpha=1/2, beta=0.5, max_iter = 10000)
print(x_gradient)
# Note that we haven't even made it to the solution with 10,000 iterations. This is definitely worse than Newton
Maximum number of iterations reached.
[0.00035356 0.03533898]
© Copyright 2025, The Department of Computational Mathematics, Science and Engineering at Michigan State University.