Worksheet Chapter 5, Part 1#
Topics#
This section focuses on Chapter 5.1 of Beck.
This notebook covers:
Pure Newton’s method
Quadratic local convergence of Newton’s method
%matplotlib inline
import matplotlib.pylab as plt
import numpy as np
Code from Last Time#
Before we get started, I’m going to copy in the code from previous classes so that we can use it here.
# Backtracking line search function
def step_size_backtrack(f, g, x_k, s=2, alpha=1/4, beta=0.5):
"""A function to compute the step size using backtracking line search.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
x_k : Current point
s : Initial step size (default is 2)
alpha : Parameter for backtracking (default is 1/4)
beta : Step size reduction factor in backtracking (default is 0.5)
Returns:
t : The step size found using backtracking line search
"""
t = s
fun_val = f(x_k)
grad_x = g(x_k)
while fun_val - f(x_k - t * grad_x) < alpha*t*np.linalg.norm(grad_x)**2:
t = beta * t
return t
def gradient_method_backtrack(f, g, x0, epsilon, s=2, alpha=1/4, beta=0.5, max_iter = 100):
"""A function to minimize x^t A x + 2 b^t x using the gradient method with backtracking line search.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
x0 : Initial guess for the solution
epsilon : Tolerance for the stopping criterion
max_iter : Maximum number of iterations to perform (default is 100)
s : Initial step size for backtracking line search (default is 2)
alpha : Parameter for backtracking (default is 1/4)
beta : Step size reduction factor in backtracking (default is 0.5)
Returns:
x : The vector of inputs that minimizes the objective function
fun_val : The value of the objective function at the minimum
x_all : A list of all iterates generated during the optimization
"""
# Start with initial guess
x_k = x0
f_k = f(x_k) # Initial function value
# Store all iterates (the x_k's used)
x_all = [x0]
# Keep a counter for how many iterations
ite = 0
grad_k = g(x_k)
while np.linalg.norm(grad_k) > epsilon and ite < max_iter:
grad_k = g(x_k)
t_k = step_size_backtrack(f, g, x_k, s, alpha, beta) #<----
d_k = -grad_k
# Update x_k to the next x_k (x_{k+1} in the notation above)
x_k = x_k + t_k * d_k
# Compute the function value at the new iterate
f_k = f(x_k)
# Append the new iterate to the list
x_all.append(x_k)
ite = ite + 1
if ite >= max_iter:
print("Maximum number of iterations reached.")
break
x_soln = x_k
f_soln = f_k
return x_soln, f_soln, x_all
#------
# Example usage for last class's function, x^2 + 2y^2
#------
def f(x):
return x[0]**2 + 2 * x[1]**2
def g(x):
return np.array([2*x[0], 4*x[1]])
x0 = np.array([1.0, 1.0])
epsilon = 1e-6
x_soln, f_soln, x_all = gradient_method_backtrack(f, g, x0, epsilon, s=2, alpha=1/4, beta=0.5, max_iter = 100)
print("Solution x:", x_soln)
Solution x: [0. 0.]
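As a quick aside (not part of the original worksheet), we can check that the step size returned by `step_size_backtrack` actually satisfies the sufficient-decrease condition it enforces, \(f(\mathbf{x}) - f(\mathbf{x} - t\nabla f(\mathbf{x})) \ge \alpha t \|\nabla f(\mathbf{x})\|^2\):

```python
import numpy as np

def step_size_backtrack(f, g, x_k, s=2, alpha=1/4, beta=0.5):
    """Backtracking line search (same logic as the worksheet version)."""
    t = s
    fun_val = f(x_k)
    grad_x = g(x_k)
    while fun_val - f(x_k - t * grad_x) < alpha * t * np.linalg.norm(grad_x)**2:
        t = beta * t
    return t

# Last class's function, x^2 + 2y^2, and its gradient
f = lambda x: x[0]**2 + 2 * x[1]**2
g = lambda x: np.array([2 * x[0], 4 * x[1]])

x = np.array([1.0, 1.0])
t = step_size_backtrack(f, g, x)
print(t)  # 0.25: s=2 is halved three times before sufficient decrease holds
print(f(x) - f(x - t * g(x)) >= 1/4 * t * np.linalg.norm(g(x))**2)  # True
```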
Pure Newton’s Method#
We consider the unconstrained problem

\[\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x}),\]

where \(f(\cdot)\) is twice continuously differentiable over \(\mathbb{R}^n\).

Given \(\mathbf{x}_k\), the next iterate \(\mathbf{x}_{k+1}\) is obtained by minimizing the following quadratic approximation of the function around \(\mathbf{x}_k\):

\[\mathbf{x}_{k+1} = \operatorname{argmin}_{\mathbf{x}} \; f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T(\mathbf{x} - \mathbf{x}_k) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_k)^T \nabla^2 f(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k).\]

We will further assume that \(\nabla^2 f(\mathbf{x}_k)\) is positive definite. Then the solution \(\mathbf{x}_{k+1}\) is unique. In this case, setting the gradient of the quadratic approximation to zero gives

\[\nabla f(\mathbf{x}_k) + \nabla^2 f(\mathbf{x}_k)(\mathbf{x}_{k+1} - \mathbf{x}_k) = \mathbf{0},\]

which means

\[\mathbf{x}_{k+1} = \mathbf{x}_k - \nabla^2 f(\mathbf{x}_k)^{-1} \nabla f(\mathbf{x}_k).\]

The vector \(-\nabla^2 f(\mathbf{x}_k)^{-1}\nabla f(\mathbf{x}_k)\) is called the Newton direction, and the algorithm is called the pure Newton’s method.
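To build intuition (this example is an addition, not from the worksheet): for a quadratic \(f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A \mathbf{x} - \mathbf{b}^T\mathbf{x}\) with \(A\) positive definite, the quadratic approximation is exact, so a single pure Newton step from any starting point lands on the minimizer \(A^{-1}\mathbf{b}\):

```python
import numpy as np

# Quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

grad = lambda x: A @ x - b   # gradient of the quadratic
hess = lambda x: A           # Hessian is constant

x_k = np.array([5.0, -7.0])  # arbitrary starting point
d_k = np.linalg.solve(hess(x_k), -grad(x_k))  # Newton direction
x_next = x_k + d_k

print(np.allclose(x_next, np.linalg.solve(A, b)))  # True: one step reaches A^{-1} b
```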
Today’s example#
Consider the minimization problem

\[\min_{(x_1, x_2) \in \mathbb{R}^2} \; 100 x_1^4 + 0.01 x_2^4,\]

whose optimal solution is \((0,0)\).

This function has gradient

\[\nabla f(\mathbf{x}) = \begin{pmatrix} 400 x_1^3 \\ 0.04 x_2^3 \end{pmatrix}\]

and Hessian

\[\nabla^2 f(\mathbf{x}) = \begin{pmatrix} 1200 x_1^2 & 0 \\ 0 & 0.12 x_2^2 \end{pmatrix}.\]
❓❓❓ Question ❓❓❓:
Write a function f which returns the function value at the input point.
# Your code here
def f(x):
pass #<-- your code here
# Test it. This should return 100.01.
x0 = np.array([1, 1])
f(x0)
### ANSWER ###
def f(x):
return 100*x[0]**4 + 0.01*x[1]**4
# Test it
x0 = np.array([1, 1])
f(x0)
np.float64(100.01)
❓❓❓ Question ❓❓❓:
Write a function g which returns our gradient.
# Your code here
def g(x):
pass #<-- your code here
# Test it. This should be a vector with entries 400 and 0.04.
x0 = np.array([1, 1])
g(x0)
### ANSWER ###
def g(x):
return np.array([400*x[0]**3,0.04*x[1]**3])
# Test it. This should be a vector with entries 400 and 0.04.
x0 = np.array([1, 1])
print(g(x0))
[4.e+02 4.e-02]
❓❓❓ Question ❓❓❓:
Write a function h which returns the Hessian.
# Your code here
def h(x):
pass
# Test it. This should be a 2x2 matrix.
x0 = np.array([1, 1])
h(x0)
### ANSWER ###
def h(x):
return np.array([[1200*x[0]**2, 0],[0,0.12*x[1]**2]])
# Test it. This should be a 2x2 matrix.
x0 = np.array([1, 1])
h(x0)
array([[1.2e+03, 0.0e+00],
[0.0e+00, 1.2e-01]])
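As an optional sanity check (the finite-difference helper `fd_grad` below is introduced here and is not part of the worksheet), we can verify `g` and `h` against central differences of `f` and `g`:

```python
import numpy as np

def f(x):
    return 100*x[0]**4 + 0.01*x[1]**4

def g(x):
    return np.array([400*x[0]**3, 0.04*x[1]**3])

def h(x):
    return np.array([[1200*x[0]**2, 0.0], [0.0, 0.12*x[1]**2]])

def fd_grad(fun, x, eps=1e-6):
    """Central-difference gradient of a scalar function (for checking only)."""
    out = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = eps
        out[i] = (fun(x + e) - fun(x - e)) / (2 * eps)
    return out

x0 = np.array([1.0, 1.0])
# Row i of the Hessian is the gradient of the i-th component of g
H_fd = np.array([fd_grad(lambda y: g(y)[i], x0) for i in range(2)])

print(np.allclose(g(x0), fd_grad(f, x0), atol=1e-4))  # True
print(np.allclose(h(x0), H_fd, atol=1e-3))            # True
```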
❓❓❓ Question ❓❓❓: Does this function satisfy the requirements of the convergence theorem?
✎ Your answer here
### ANSWER ### No. The Hessian \(\nabla^2 f(\mathbf{x})\) is not Lipschitz continuous over \(\mathbb{R}^2\) (its entries grow quadratically in \(\mathbf{x}\)), and it is singular at the optimal solution \((0,0)\), so the positive-definiteness assumption fails there as well.
Newton’s Method Code#
❓❓❓ Question ❓❓❓: Update the code below to run the pure Newton method on our function.
def pure_newton(f, g, h, x0, epsilon=1e-5, max_iter=100):
"""A function to minimize a twice-differentiable function using pure Newton's method.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
h : The Hessian of the objective function. This should be a function of some input vector x.
x0 : Initial guess for the solution
epsilon : Tolerance for the stopping criterion (default is 1e-5)
max_iter : Maximum number of iterations to perform (default is 100)
Returns:
x : The vector of inputs that minimizes the objective function
fun_val : The value of the objective function at the minimum
x_all : A list of all iterates generated during the optimization
"""
# Start with initial guess
x_k = x0
f_k = f(x_k) # Initial function value
# Store all iterates (the x_k's used)
x_all = [x0]
# Keep a counter for how many iterations
ite = 0
grad_k = g(x_k)
while np.linalg.norm(grad_k) > epsilon:
ite = ite + 1
#------
# Remove break before working
break
#------
# Compute Hessian at current point
hess_k = h(x_k)
# Solve for Newton direction:
d_k = None #<-- your code here
# Update x_k to the next x_k (x_{k+1} in the notation above)
x_k = None #<-- your code here
# Compute the function value at the new iterate
f_k = f(x_k)
grad_k = g(x_k)
# Append the new iterate to the list
x_all.append(x_k)
print("iter_number = %3d \t grad = %2.6f \t fun_val = %2.6f" %(ite, np.linalg.norm(grad_k), f_k))
if ite >= max_iter:
print("Maximum number of iterations reached.")
break
x_soln = x_k
f_soln = f_k
return x_soln, f_soln, x_all
# Test it
x0 = np.array([1, 1])
x, fun_val, x_all = pure_newton(f, g, h, x0, epsilon=1e-6, max_iter=100)
### ANSWER ###
def pure_newton(f, g, h, x0, epsilon=1e-5, max_iter=100):
"""A function to minimize a twice-differentiable function using pure Newton's method.
Args:
f : The objective function. Note this should be a function of some input vector x.
g : The gradient of the objective function. Again this should be a function of some input vector x.
h : The Hessian of the objective function. This should be a function of some input vector x.
x0 : Initial guess for the solution
epsilon : Tolerance for the stopping criterion (default is 1e-5)
max_iter : Maximum number of iterations to perform (default is 100)
Returns:
x : The vector of inputs that minimizes the objective function
fun_val : The value of the objective function at the minimum
x_all : A list of all iterates generated during the optimization
"""
# Start with initial guess
x_k = x0
f_k = f(x_k) # Initial function value
# Store all iterates (the x_k's used)
x_all = [x0]
# Keep a counter for how many iterations
ite = 0
grad_k = g(x_k)
while np.linalg.norm(grad_k) > epsilon and ite < max_iter:
ite = ite + 1
# Compute Hessian at current point
hess_k = h(x_k)
# Compute Newton direction by solving the linear system hess_k @ d_k = -grad_k
d_k = np.linalg.solve(hess_k, -grad_k)  # more stable than forming the explicit inverse
# Update x_k to the next x_k (x_{k+1} in the notation above)
x_k = x_k + d_k
# Compute the function value at the new iterate
f_k = f(x_k)
grad_k = g(x_k)
# Append the new iterate to the list
x_all.append(x_k)
print("iter_number = %3d \t grad = %2.6f \t fun_val = %2.6f" %(ite, np.linalg.norm(grad_k), f_k))
if ite >= max_iter:
print("Maximum number of iterations reached.")
break
x_soln = x_k
f_soln = f_k
return x_soln, f_soln, x_all
# Test it
x0 = np.array([1, 1])
x, fun_val, x_all = pure_newton(f, g, h, x0, epsilon=1e-6, max_iter=100)
iter_number = 1 grad = 118.518519 fun_val = 19.755062
iter_number = 2 grad = 35.116598 fun_val = 3.902234
iter_number = 3 grad = 10.404918 fun_val = 0.770812
iter_number = 4 grad = 3.082939 fun_val = 0.152259
iter_number = 5 grad = 0.913463 fun_val = 0.030076
iter_number = 6 grad = 0.270656 fun_val = 0.005941
iter_number = 7 grad = 0.080194 fun_val = 0.001174
iter_number = 8 grad = 0.023761 fun_val = 0.000232
iter_number = 9 grad = 0.007040 fun_val = 0.000046
iter_number = 10 grad = 0.002086 fun_val = 0.000009
iter_number = 11 grad = 0.000618 fun_val = 0.000002
iter_number = 12 grad = 0.000183 fun_val = 0.000000
iter_number = 13 grad = 0.000054 fun_val = 0.000000
iter_number = 14 grad = 0.000016 fun_val = 0.000000
iter_number = 15 grad = 0.000005 fun_val = 0.000000
iter_number = 16 grad = 0.000001 fun_val = 0.000000
iter_number = 17 grad = 0.000000 fun_val = 0.000000
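Notice that the gradient norms above shrink by a roughly constant factor each iteration rather than squaring: the convergence here is linear, not quadratic, consistent with the theorem's assumptions failing for this function. In fact the Newton update decouples per coordinate: for a term \(c x^4\), the step is \(x \leftarrow x - \frac{4cx^3}{12cx^2} = \frac{2}{3}x\), so the gradient norm contracts by \((2/3)^3 = 8/27 \approx 0.2963\) per iteration (e.g. \(35.1166 / 118.5185 \approx 0.2963\) in the printout). A short self-contained check:

```python
import numpy as np

def g(x):
    return np.array([400*x[0]**3, 0.04*x[1]**3])

def h(x):
    return np.array([[1200*x[0]**2, 0.0], [0.0, 0.12*x[1]**2]])

x_k = np.array([1.0, 1.0])
norms = []
for _ in range(5):
    x_k = x_k + np.linalg.solve(h(x_k), -g(x_k))  # pure Newton step
    norms.append(np.linalg.norm(g(x_k)))

# Consecutive gradient-norm ratios: a constant ratio signals linear convergence
ratios = [norms[i+1] / norms[i] for i in range(len(norms) - 1)]
print(ratios)  # each ratio is 8/27 ≈ 0.2963
```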
Gradient Descent Version#
❓❓❓ Question ❓❓❓: Now let’s solve the same problem using gradient descent with backtracking line search, with parameters \((s=1, \alpha=0.5, \beta=0.5, \epsilon=10^{-6})\). Update the inputs to run this on our function. How many iterations did it take? More or fewer than Newton?
x0 = np.array([1, 1])
#Enter your code below to run the gradient method function with parameters specified in the problem above and print solution.
# x_gradient, fun_val_gradient, x_all_gradient = gradient_method_backtrack() #<- Uncomment and add your inputs here
### ANSWER ###
x0 = np.array([1, 1])
x_gradient, fun_val_gradient, x_all_gradient = gradient_method_backtrack(f, g, x0, epsilon = 1e-6, s=1, alpha=1/2, beta=0.5, max_iter = 10000)
print(x_gradient)
# Note that we haven't even made it to the solution with 10,000 iterations. This is definitely worse than Newton
Maximum number of iterations reached.
[0.00035356 0.03533898]
© Copyright 2025, The Department of Computational Mathematics, Science and Engineering at Michigan State University.