Worksheet Chapter 3, Part 2#

✅ Put your name here.

✅ Put your group number here.

Topics#

This section focuses on Chapter 3 of Beck.

  • Regularized Least Squares

  • Tikhonov (Quadratic) Regularization

  • Denoising using regularized least squares

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.sparse import diags 

Regularized least squares#

Last time we solved problems of the form

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]

Today’s change is that we are going to add a regularization term, so we are going to solve the regularized least squares (RLS) problem

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda R(\mathbf{x}).\]
  • \(\lambda\) is the regularization parameter. If it is 0, we are just doing ordinary least squares again. The larger \(\lambda\) is, the more weight is given to the regularization function.

  • \(R(\mathbf{x})\) is some chosen function of the input vector \(\mathbf{x}\).

We are generally going to work with quadratic regularization (also called Tikhonov regularization), where \(R(\mathbf{x}) = \|\mathbf{D}\mathbf{x}\|^2\). If \(\mathbf{A}^\top \mathbf{A} + \lambda \mathbf{D}^\top \mathbf{D} \succ 0\), then the RLS solution is given by

\[ \mathbf{x}_{RLS} = (\mathbf{A}^\top \mathbf{A} + \lambda \mathbf{D}^\top \mathbf{D})^{-1} \mathbf{A}^\top \mathbf{b}. \]
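As a quick sketch of how this closed form translates into code (the matrices and \(\lambda\) below are made-up illustrative inputs, not the worksheet's example):

```python
import numpy as np

# Illustrative sketch: solve the RLS normal equations
# (A^T A + lam * D^T D) x = A^T b.
# Using np.linalg.solve is preferred over forming the inverse explicitly.
# A, b, D, and lam are arbitrary demo inputs.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
D = np.diff(np.eye(3), axis=0)   # a simple 2x3 difference matrix
lam = 0.5

x_rls = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)
```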

❓❓❓ Question ❓❓❓: In a moment we will be solving the regularized problem

\[\min_\mathbf{x} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2.\]

For this setting, what is \(\mathbf{D}\)? What does \(x_{RLS}\) simplify to in this case?

Answer the question here.

Example#

Let

\[\begin{split}\mathbf{A}=\begin{bmatrix} 2 & 3 & 4\\3 & 5 & 7\\ 4 & 7 & 10\end{bmatrix}+\begin{bmatrix} 10^{-3} & 0 & 0\\0 & 10^{-3} & 0\\ 0 & 0 & 10^{-3}\end{bmatrix}\qquad \mathbf{x}_{true}=\begin{bmatrix}1 \\ 2\\ 3\end{bmatrix}\end{split}\]

and \(\mathbf{b}\) is a noisy measurement of \(\mathbf{A}\mathbf{x}_{true}\), that is,

\[\mathbf{b}=\mathbf{A}\mathbf{x}_{true}+\mathbf{e},\]

where \(\mathbf{e}\) is randomly generated from a normal distribution with zero mean and standard deviation \(0.01\).

We don’t actually know \(\mathbf{e}\), so we are assuming someone handed us \(\mathbf{A}\) and \(\mathbf{b}\) and our goal is to find \(\mathbf{x}\) to minimize

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]
A0 = np.array([[2, 3, 4], [3, 5, 7], [4, 7, 10]]) 
A = A0 + 1e-3 * np.eye(3)

print(f"A = \n{A}")
x_true = np.array([1, 2, 3])
b = A@x_true + 0.01 * np.random.randn(3)
print("\n")
print(f"b = {b}")

❓❓❓ Question ❓❓❓:

  • Write code to find the least squares solution \(x_{LS}\) for

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]
  • Print your answer. Is it close to the true value of \(\mathbf{x}\)? If not, why do you think that is?

# Enter your code below.

Answer the question here.

Our computed \(\mathbf{x}\) is nowhere close to \(\mathbf{x}_{true}\), even though \(\mathbf{A}\) is full rank. This is because the condition number of \(\mathbf{A}\) is very large. For a symmetric positive definite matrix such as this one, the condition number is the ratio of the largest to the smallest eigenvalue.

❓❓❓ Question ❓❓❓:

  • Write code below to find the eigenvalues of \(A\)

  • Compute the ratio of the largest to the smallest to check that this ratio, which is the condition number, is indeed large.

# Enter code below.

To stabilize the least squares solution, we instead solve a regularized version of the problem:

\[\min_\mathbf{x} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2.\]

❓❓❓ Question ❓❓❓: Use what you figured out in the first section to solve for \(\mathbf{x}_{RLS}\) in the case \(\lambda = 0.1\) by multiplying the available matrices. Is your solution close to \(\mathbf{x}_{true}\) now?

# Your code here 

Denoising#

The denoising problem aims to remove noise or corruption from a signal. Say you are given a time series \(\mathbf{b} \in \mathbb{R}^n\), and you believe it is a noisy measurement of some true signal \(\mathbf{x}_{true} \in \mathbb{R}^n\), corrupted by noise \(\mathbf{w} \in \mathbb{R}^n\). That is,

\[ \mathbf{b} = \mathbf{x}_{true} + \mathbf{w}. \]

Your job is to find a good estimate of \(\mathbf{x}_{true}\), but you don't have much to go on, since you know nothing about \(\mathbf{w}\).

Let’s set up an example.

n = 300
t = np.linspace(0, 4, n)
x_true = np.sin(t) + t*(np.cos(t)**2)
b = x_true + 0.05 * np.random.randn(n)
plt.plot(t, b, 'k', label='Noisy data')
plt.plot(t, x_true, 'r-', label='True signal')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();

The least squares problem for \(\mathbf{x} \approx \mathbf{b}\), obtained by taking \(\mathbf{A}\) to be the identity matrix in \(\mathbf{A}\mathbf{x} = \mathbf{b}\), is

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{x}-\mathbf{b}\|^2.\]

However, the minimizer of this problem is obviously \(\mathbf{x} = \mathbf{b}\), so solving it gains us nothing. Instead, we can use the solution of the regularized version

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda R(\mathbf{x})\]

to get more useful information.

Choosing the regularization term#

We know that we want the signal to be smooth, but what does that mean? Here, we will try to find \(\mathbf{x}\) such that adjacent values \(x_i\) and \(x_{i+1}\) are close together, meaning \(|x_i-x_{i+1}|\) is small. This can be encouraged by using a regularization term like this:

\[R(\mathbf{x}) = \sum_{i=1}^{n-1}(x_i-x_{i+1})^2.\]

But that doesn’t look like our \(\|D\mathbf{x}\|^2\) term yet so we need to fix it.

In this case, we choose to use the “first-order Tikhonov regularization matrix”

\[\begin{split}\mathbf{L}_{(n-1)\times n}= \begin{bmatrix}1&-1&0&\cdots &0&0\\ 0&1&-1&\cdots &0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots &1&-1\end{bmatrix}.\end{split}\]
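As a numerical sanity check that this matrix does what we want, \(\|\mathbf{L}\mathbf{x}\|^2\) should equal the sum of squared adjacent differences (the vector `x` below is arbitrary demo data):

```python
import numpy as np
from scipy.sparse import diags

# Check that ||L x||^2 = sum_i (x_i - x_{i+1})^2
# for the first-order difference matrix L defined above.
n = 6
x = np.arange(n, dtype=float) ** 2
L = diags([1, -1], offsets=[0, 1], shape=(n - 1, n)).toarray()

lhs = np.linalg.norm(L @ x) ** 2
rhs = np.sum((x[:-1] - x[1:]) ** 2)
```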

❓❓❓ Question ❓❓❓:

  • Write down by hand the matrix \(L\) for \(n=4\). Note you should have an \((n-1) \times n = 3 \times 4\) matrix.

  • Once you have written it down, run the code below to check that you have the same matrix.

# Here's one way to get the matrix L

n = 4
L = diags([1,-1], offsets = [0,1], shape=(n-1,n)).toarray()

L

❓❓❓ Question ❓❓❓:

  • Now, again by hand, multiply \(\mathbf{L}\mathbf{x}\) for \(n=4\) where you can write

\[\begin{split} \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \end{split}\]
  • The result above should be a vector. What is \(\|L\mathbf{x}\|^2\)?

Space for notes here.

Back to the example#

Ok, so we were working with this data drawn below.

plt.plot(t, b, 'k', label='Noisy data')
plt.plot(t, x_true, 'r-', label='True signal')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();

❓❓❓ Question ❓❓❓: I’ve copied the code to make \(L\) below.

  • Add code to find the solution to the RLS problem when \(\lambda = 10\).

  • Plot your solution. What do you see?

### Complete the code below.

n = len(b)
L = diags([1,-1], offsets = [0,1], shape=(n-1,n)).toarray()
    
lam = 10

#Enter code below to find regularized least squares solution.


#Plot the solution

Effect of \(\lambda\)#

Next, we would like to study how the choice of the regularization parameter \(\lambda\) affects the solution.

❓❓❓ Question ❓❓❓: Write code below that solves for and plots the denoised signal for \(\lambda = [1, 10, 100, 1000]\) to compare with the true signal \(x_{true}\).

## Enter your code below.

❓❓❓ Question ❓❓❓: What values of \(\lambda\) give you a solution close to the clean signal? What happens when \(\lambda\) is too large?

Answer the question here.

Extra time#

If you have extra time, try this. Last class we did polynomial regression with ordinary least squares. For that data, add a quadratic regularization term. How does this change the learned function?
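One possible sketch of that exercise, using synthetic data in place of last class's dataset (the data, degree, and \(\lambda\) below are all made-up placeholders):

```python
import numpy as np

# Hedged sketch: polynomial regression with a quadratic (ridge)
# regularization term on the coefficients, D = I.
# The data here is synthetic, standing in for last class's dataset.
rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 40)
y = t**3 - t + 0.1 * rng.standard_normal(40)

deg = 8
V = np.vander(t, deg + 1)            # polynomial design matrix
lam = 1e-2

# Ridge solution: as lam grows, the coefficients shrink toward zero
# and the learned polynomial becomes smoother (less wiggly).
c = np.linalg.solve(V.T @ V + lam * np.eye(deg + 1), V.T @ y)
y_fit = V @ c
```

Comparing `y_fit` for several values of `lam` against the ordinary least squares fit (`lam = 0`) shows how the regularization tames the high-degree polynomial's oscillations.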

© Copyright 2025, The Department of Computational Mathematics, Science and Engineering at Michigan State University.