Worksheet Chapter 3, Part 2#

✅ Put your name here.

✅ Put your group number here.

Topics#

This section focuses on Ch. 3 from Beck:

  • Regularized Least Squares

  • Tikhonov (Quadratic) Regularization

  • Denoising using regularized least squares

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.sparse import diags

Regularized least squares#

Last time we solved problems of the form

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]

Today’s change is that we are going to add a regularization term, so we are going to solve the regularized least squares (RLS) problem

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda R(\mathbf{x}).\]
  • \(\lambda\) is the regularization parameter. If it is \(0\), we are just doing ordinary least squares again. The larger \(\lambda\) is, the more weight is given to the regularization function.

  • \(R(\mathbf{x})\) is some chosen function of the input vector \(\mathbf{x}\).

We are generally going to work with quadratic regularization (also called Tikhonov regularization) where \(R(\mathbf{x}) = \|D\mathbf{x}\|^2\). If \(A^\top A + \lambda D^\top D \succ 0\), then the RLS solution is given by

\[ x_{RLS} = (A^\top A + \lambda D^\top D)^{-1} A^\top b. \]
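For concreteness, here is a minimal sketch (with made-up matrices, purely for illustration) of computing \(x_{RLS}\) from this closed form. Solving the linear system with np.linalg.solve avoids forming the inverse explicitly.

import numpy as np

# Hypothetical small example of the closed-form RLS solution
# x_rls = (A^T A + lam * D^T D)^{-1} A^T b
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
D = np.array([[1.0, -1.0]])   # an arbitrary choice of D for illustration
lam = 0.5

# Solve the linear system rather than forming an explicit inverse
x_rls = np.linalg.solve(A.T @ A + lam * (D.T @ D), A.T @ b)
print(x_rls)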

❓❓❓ Question ❓❓❓: In a moment we will be solving the regularized problem

\[\min_\mathbf{x} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2.\]

For this setting, what is \(\mathbf{D}\)? What does \(x_{RLS}\) simplify to in this case?

Answer the question here.

Solution

Here \(R(\mathbf{x}) = \|\mathbf{x}\|^2\), so \(\mathbf{D} = \mathbf{I}\), the identity matrix, and the solution simplifies to

\(x_{RLS} = (A^\top A + \lambda I)^{-1} A^\top b.\)

Example#

Let

\[\begin{split}\mathbf{A}=\begin{bmatrix} 2 & 3 & 4\\3 & 5 & 7\\ 4 & 7 & 10\end{bmatrix}+\begin{bmatrix} 10^{-3} & 0 & 0\\0 & 10^{-3} & 0\\ 0 & 0 & 10^{-3}\end{bmatrix}\qquad \mathbf{x}_{true}=\begin{bmatrix}1 \\ 2\\ 3\end{bmatrix}\end{split}\]

and \(\mathbf{b}\) is a noisy measurement of \(\mathbf{A}\mathbf{x}_{true}\), that is,

\[\mathbf{b}=\mathbf{A}\mathbf{x}_{true}+\mathbf{e},\]

where \(\mathbf{e}\) is randomly generated from a normal distribution with zero mean and standard deviation \(0.01\).

We don’t actually know \(\mathbf{e}\), so we are assuming someone handed us \(\mathbf{A}\) and \(\mathbf{b}\) and our goal is to find \(\mathbf{x}\) to minimize

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]
A0 = np.array([[2, 3, 4], [3, 5, 7], [4, 7, 10]])
A = A0 + 1e-3 * np.eye(3)

print(f"A = \n{A}")
x_true = np.array([1, 2, 3])
b = A@x_true + 0.01 * np.random.randn(3)
print("\n")
print(f"b = {b}")
A = 
[[ 2.001  3.     4.   ]
 [ 3.     5.001  7.   ]
 [ 4.     7.    10.001]]


b = [20.00090976 33.99555657 47.99461542]

❓❓❓ Question ❓❓❓:

  • Write code to find the least squares solution \(x_{LS}\) for

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]
  • Print your answer. Is it close to the true value of \(\mathbf{x}\)? If not, why do you think that is?

# Enter your code below.

Answer the question here.


### ANSWER ###

x_ls = np.linalg.lstsq(A,b, rcond=None)[0]
print(f"x_ls = {x_ls}")

## Note that this is very different from the true solution 
print(f"x_true = {x_true}")
x_ls = [1.74182096 0.53042903 3.73105973]
x_true = [1 2 3]

Our found \(\mathbf{x}\) is nowhere close to \(\mathbf{x}_{true}\) even though \(\mathbf{A}\) is full rank. This is because the condition number of \(\mathbf{A}\) is very large. For a symmetric positive definite matrix such as this \(\mathbf{A}\), the condition number is the ratio of the largest to the smallest eigenvalue.
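To see concretely why this matters, here is a small sketch (reusing the \(\mathbf{A}\) and \(\mathbf{x}_{true}\) defined above) showing how a tiny perturbation of \(\mathbf{b}\) produces a large change in the least squares solution.

# A tiny perturbation of b can swing the least squares solution wildly
# when A is ill-conditioned (reuses A and x_true from the cells above).
b_clean = A @ x_true
b_pert = b_clean + 1e-3 * np.random.randn(3)   # perturb b slightly

x1 = np.linalg.lstsq(A, b_clean, rcond=None)[0]
x2 = np.linalg.lstsq(A, b_pert, rcond=None)[0]

print(f"solution from clean b:     {x1}")
print(f"solution from perturbed b: {x2}")
# Roughly, the relative change in x can be amplified by a factor of cond(A).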

❓❓❓ Question ❓❓❓:

  • Write code below to find the eigenvalues of \(A\)

  • Compute the ratio of the largest to the smallest to check that this ratio, which is the condition number, is indeed large.

# Enter code below.


### ANSWER ###

print(f"rank of A0: {np.linalg.matrix_rank(A0)}")  # the unperturbed matrix is singular
evals = np.linalg.eigvals(A)
print(f"eig(A) = {evals}")

# Ratio of largest to smallest eigenvalue (A is symmetric positive definite)
print(f"condition number = {evals.max()/evals.min()}")
# The ratio of largest to smallest eigenvalues of A is indeed large.

print("\n")
print("This can also be done with NumPy directly:")
print(f"condition number (np.linalg.cond) = {np.linalg.cond(A)}")
rank of A0: 2
eig(A) = [1.66404103e+01 3.61589702e-01 1.00000000e-03]
condition number = 16640.41029802771


This can also be done with NumPy directly:
condition number (np.linalg.cond) = 16640.41029805186

To stabilize the least squares solution, we instead optimize a regularized version, which minimizes

\[\min_\mathbf{x} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2.\]

❓❓❓ Question ❓❓❓: Use what you figured out in the first section to solve for \(\mathbf{x}_{RLS}\) in the case \(\lambda = 0.1\) by multiplying the available matrices. Is your solution close to \(x_{true}\) now?

# Your code here 


### ANSWER ###
# x_rls = (A^T A + lam * I)^{-1} A^T b

lam = 0.1
x_rls = np.linalg.solve(A.T@A + lam*np.eye(3), A.T@b)
print(f"Found x: {x_rls}")
print(f"True x: {x_true}")
Found x: [1.09262706 2.01892877 2.9452746 ]
True x: [1 2 3]

Denoising#

The denoising problem aims to remove noise or corruption from a signal. Say you are given a time series \(\mathbf{b} \in \mathbb{R}^n\), and you believe it came from some true signal \(\mathbf{x}_{true} \in \mathbb{R}^n\) plus some noise \(\mathbf{w} \in \mathbb{R}^n\). That is,

\[ \mathbf{b} = \mathbf{x}_{true} + \mathbf{w}. \]

Your job is to find a good guess for \(\mathbf{x}\) but you don’t have much to go on since you know nothing about \(\mathbf{w}\).

Let’s set up an example.

n = 300
t = np.linspace(0, 4, n)
x_true = np.sin(t) + t*(np.cos(t)**2)
b = x_true + 0.05 * np.random.randn(n)
plt.plot(t, b, 'k', label='Noisy data')
plt.plot(t, x_true, 'r-', label='True signal')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();
[Figure: the noisy data (black) plotted with the true signal (red).]

The least squares problem for \(\mathbf{x} \approx \mathbf{b}\), that is, solving \(\mathbf{A}\mathbf{x} = \mathbf{b}\) with \(\mathbf{A}\) equal to the identity matrix, is

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{x}-\mathbf{b}\|^2.\]

However, the minimizer of this problem is simply \(\mathbf{x} = \mathbf{b}\), which does nothing to remove the noise. Instead, we can use the solution of the regularized version

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda R(\mathbf{x})\]

to get more useful information.

Choosing the regularization term#

We know that we want the signal to be smooth, but what does that mean? Here, we will try to find \(\mathbf{x}\) so that adjacent values \(x_i\) and \(x_{i+1}\) are close together, meaning \(|x_i-x_{i+1}|\) is small. This can be encouraged by using a regularization term like this:

\[R(\mathbf{x}) = \sum_{i=1}^{n-1}(x_i-x_{i+1})^2.\]

But that doesn’t look like our \(\|D\mathbf{x}\|^2\) term yet, so we need to rewrite it.

In this case, we choose to use the “first order Tikhonov regularization matrix”

\[\begin{split}\mathbf{L}_{(n-1)\times n}= \begin{bmatrix}1&-1&0&\cdots &0&0\\ 0&1&-1&\cdots &0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots &1&-1\end{bmatrix}.\end{split}\]

❓❓❓ Question ❓❓❓:

  • Write down by hand the matrix \(L\) for \(n=4\). Note you should have an \((n-1) \times n = 3 \times 4\) matrix.

  • Once you have written it down, uncomment the code below to check that you have the same matrix.

# Here's one way to get the matrix L
# (float entries avoid a dtype-casting FutureWarning from scipy)

n = 4
L = diags([1.0, -1.0], offsets=[0, 1], shape=(n-1, n)).toarray()

L
array([[ 1., -1.,  0.,  0.],
       [ 0.,  1., -1.,  0.],
       [ 0.,  0.,  1., -1.]])

❓❓❓ Question ❓❓❓:

  • Now, again by hand, multiply \(\mathbf{L}\mathbf{x}\) for \(n=4\) where you can write

\[\begin{split} \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \end{split}\]
  • The result above should be a vector. What is \(\|L\mathbf{x}\|^2\)?

Space for notes here.

Solution
  • The multiplication results in

\[\begin{split} L\mathbf{x} = \begin{bmatrix} x_1 - x_2 \\ x_2 - x_3 \\ x_3 - x_4 \end{bmatrix} \end{split}\]
  • Then, since a norm with no subscript means the 2-norm (the square root of the sum of the squared entries), \(\|L\mathbf{x}\|^2\) is the sum of the squared entries of \(L\mathbf{x}\):

\[\|L\mathbf{x}\|^2 = \sum_{i=1}^{n-1}(x_i-x_{i+1})^2\]

which is exactly what we wanted in the first place.
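As a quick sanity check, this short sketch verifies the identity numerically for a random vector, using the same diags construction as above.

n = 6
x = np.random.randn(n)
L = diags([1.0, -1.0], offsets=[0, 1], shape=(n-1, n)).toarray()

# ||Lx||^2 computed two ways: via the matrix and via squared differences
print(np.linalg.norm(L @ x)**2)
print(np.sum((x[:-1] - x[1:])**2))   # the two values agree up to roundoff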

Back to the example#

Ok, so we were working with this data drawn below.

plt.plot(t, b, 'k', label='Noisy data')
plt.plot(t, x_true, 'r-', label='True signal')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();
[Figure: the noisy data (black) plotted with the true signal (red).]

❓❓❓ Question ❓❓❓: I’ve copied the code to make \(L\) below.

  • Add code to find the solution to the RLS problem when \(\lambda = 10\).

  • Plot your solution. What do you see?

### Complete the code below.

n = len(b)
L = diags([1.0, -1.0], offsets=[0, 1], shape=(n-1, n)).toarray()

lam = 10

# Enter code below to find the regularized least squares solution.


# Plot the solution


### ANSWER ###

n = len(b)
L = diags([1.0, -1.0], offsets=[0, 1], shape=(n-1, n)).toarray()
lam = 10

# Regularized least squares solution: here A = I and D = L, so
# x_rls = (I + lam * L^T L)^{-1} b; solve the system instead of inverting.
x_rls = np.linalg.solve(np.eye(n) + lam*(L.T@L), b)

# Plot the solution
# plt.plot(t, b, 'k', label='Noisy data')
# plt.plot(t, x_true, 'r-', label='True signal')
plt.plot(t, x_rls, "--", label='lambda = 10')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();

# Note that as lambda increases, the solution becomes smoother.

Effect of \(\lambda\)#

Next, we would like to study how the choice of the regularization parameter \(\lambda\) affects the solution.

❓❓❓ Question ❓❓❓: Write code below that solves for and plots the denoised signal for \(\lambda = [1, 10, 100, 1000]\) to compare with the true signal \(x_{true}\).

## Enter your code below.


### ANSWER ###

fig = plt.figure(figsize = (8,8))
lam = np.array([1, 10, 100, 1000])
for j in range(4):
    ax = fig.add_subplot(2, 2, j+1)
    # Solve (I + lam * L^T L) x = b for each value of lambda
    x_rls = np.linalg.solve(np.eye(n) + lam[j]*(L.T@L), b)

    ax.plot(t, x_true, 'grey', label='True signal')
    ax.plot(t, x_rls, 'purple', label='RLS solution')
    ax.set_title('lambda = %d' % lam[j])
plt.show()
[Figure: 2-by-2 grid of plots comparing the true signal (grey) with the RLS solution (purple) for lambda = 1, 10, 100, 1000.]


### ANSWER ###

start_index = 50
end_index = 200

fig = plt.figure(figsize = (8,8))
fig.suptitle("Zoomed in view")
lam = np.array([1, 10, 100, 1000])
for j in range(4):
    ax = fig.add_subplot(2, 2, j+1)
    x_rls = np.linalg.solve(np.eye(n) + lam[j] * L.T @ L, b)
    ax.plot(t[start_index:end_index], x_true[start_index:end_index], 'grey', label='True signal')
    ax.plot(t[start_index:end_index], x_rls[start_index:end_index], 'purple', label='RLS solution')
    ax.set_title('lambda = %d' % lam[j])
plt.show()
[Figure: zoomed-in view of the same comparison over the middle portion of the signal.]

❓❓❓ Question ❓❓❓: What values of \(\lambda\) give you a solution close enough to the clean signal? What happens when \(\lambda\) is too large?

Answer the question here.

Solution

\(\lambda=10\) gives a relatively clean solution. The result for \(\lambda = 1\) still looks noisy. For larger \(\lambda\) values, the solution is over-smoothed and deviates more from the clean signal.

Extra time#

If you have extra time, try this. Last class we did polynomial regression with ordinary least squares. For that data, add a quadratic regularization term. How does this change the learned function?
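Last class’s data isn’t reproduced here, so below is only a sketch with made-up data; the pattern is to build the polynomial feature (Vandermonde) matrix and solve the ridge-regularized normal equations for a few values of \(\lambda\).

# Hypothetical data standing in for last class's dataset.
rng = np.random.default_rng(0)
t_poly = np.linspace(-1, 1, 40)
y = 0.5 - t_poly + 2*t_poly**3 + 0.2 * rng.standard_normal(t_poly.size)

deg = 9
A_poly = np.vander(t_poly, deg + 1)   # columns are t^9, t^8, ..., t^0

plt.plot(t_poly, y, 'k.', label='data')
for lam in [0.0, 0.1, 10.0]:
    # Ridge-regularized least squares: (A^T A + lam I)^{-1} A^T y
    coef = np.linalg.solve(A_poly.T @ A_poly + lam * np.eye(deg + 1), A_poly.T @ y)
    plt.plot(t_poly, A_poly @ coef, label=f'lambda = {lam}')
plt.legend();

# Larger lambda shrinks the coefficients, flattening the learned polynomial.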

© Copyright 2025, The Department of Computational Mathematics, Science and Engineering at Michigan State University.