Worksheet Chapter 3, Part 2#

✅ Put your name here.

✅ Put your group number here.

Topics#

This section focuses on Chapter 3 of Beck.

  • Regularized Least Squares

  • Tikhonov (Quadratic) Regularization

  • Denoising using regularized least squares

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
from scipy.sparse import diags 

Regularized least squares#

Last time we solved problems of the form

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]

Today’s change is that we are going to add a regularization term, so we are going to solve the regularized least squares (RLS) problem

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda R(\mathbf{x}).\]
  • \(\lambda\) is the regularization parameter. If it is 0, we are just doing ordinary least squares again. The larger \(\lambda\) is, the more weight is given to the regularization function.

  • \(R(\mathbf{x})\) is some chosen function of the input vector \(\mathbf{x}\).

We are generally going to work with quadratic regularization (also called Tikhonov regularization), where \(R(\mathbf{x}) = \|\mathbf{D}\mathbf{x}\|^2\). If \(\mathbf{A}^\top \mathbf{A} + \lambda \mathbf{D}^\top \mathbf{D} \succ 0\), then the RLS solution is given by

\[ \mathbf{x}_{RLS} = (\mathbf{A}^\top \mathbf{A} + \lambda \mathbf{D}^\top \mathbf{D})^{-1} \mathbf{A}^\top \mathbf{b}. \]
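As a quick sketch of how this closed form translates into code (the matrices and \(\lambda\) below are made-up illustrative inputs, not the worksheet's example):

```python
import numpy as np

# Illustrative sketch: solve the RLS normal equations
# (A^T A + lam * D^T D) x = A^T b.
# Using np.linalg.solve is preferred over forming the inverse explicitly.
# A, b, D, and lam are arbitrary demo inputs.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
D = np.diff(np.eye(3), axis=0)   # a simple 2x3 difference matrix
lam = 0.5

x_rls = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)
```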

❓❓❓ Question ❓❓❓: In a moment we will be solving the regularized problem

\[\min_\mathbf{x} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2.\]

For this setting, what is \(\mathbf{D}\)? What does \(x_{RLS}\) simplify to in this case?

Answer the question here.

Example#

Let

\[\begin{split}\mathbf{A}=\begin{bmatrix} 2 & 3 & 4\\3 & 5 & 7\\ 4 & 7 & 10\end{bmatrix}+\begin{bmatrix} 10^{-3} & 0 & 0\\0 & 10^{-3} & 0\\ 0 & 0 & 10^{-3}\end{bmatrix}\qquad \mathbf{x}_{true}=\begin{bmatrix}1 \\ 2\\ 3\end{bmatrix}\end{split}\]

and \(\mathbf{b}\) is a noisy measurement of \(\mathbf{A}\mathbf{x}_{true}\), that is,

\[\mathbf{b}=\mathbf{A}\mathbf{x}_{true}+\mathbf{e},\]

where \(\mathbf{e}\) is randomly generated from a normal distribution with zero mean and standard deviation \(0.01\).

We don’t actually know \(\mathbf{e}\), so we are assuming someone handed us \(\mathbf{A}\) and \(\mathbf{b}\) and our goal is to find \(\mathbf{x}\) to minimize

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]
A0 = np.array([[2, 3, 4], [3, 5, 7], [4, 7, 10]]) 
A = A0 + 1e-3 * np.eye(3)

print(f"A = \n{A}")
x_true = np.array([1, 2, 3])
b = A@x_true + 0.01 * np.random.randn(3)
print("\n")
print(f"b = {b}")

❓❓❓ Question ❓❓❓:

  • Write code to find the least squares solution \(x_{LS}\) for

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2.\]
  • Print your answer. Is it close to the true value of \(\mathbf{x}\)? If not, why do you think that is?

# Enter your code below.

Answer the question here.

Our computed \(\mathbf{x}\) is nowhere close to \(\mathbf{x}_{true}\), even though \(\mathbf{A}\) is full rank. This is because the condition number of \(\mathbf{A}\) is very large. For a symmetric positive definite matrix such as this one, the condition number is the ratio of the largest to the smallest eigenvalue.

❓❓❓ Question ❓❓❓:

  • Write code below to find the eigenvalues of \(A\)

  • Compute the ratio of the largest to the smallest to check that this ratio, which is the condition number, is indeed large.

# Enter code below.

To stabilize the least squares solution, we instead solve a regularized version of the problem:

\[\min_\mathbf{x} \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2+\lambda\|\mathbf{x}\|^2.\]

❓❓❓ Question ❓❓❓: Use what you figured out in the first section to solve for \(\mathbf{x}_{RLS}\) in the case \(\lambda = 0.1\) by multiplying the available matrices. Is your solution close to \(\mathbf{x}_{true}\) now?

# Your code here 

Denoising#

The denoising problem aims to remove noise or corruption from a signal. Say you are given a time series \(\mathbf{b} \in \mathbb{R}^n\), and you believe it is a noisy measurement of some true signal \(\mathbf{x}_{true} \in \mathbb{R}^n\), corrupted by noise \(\mathbf{w} \in \mathbb{R}^n\). That is,

\[ \mathbf{b} = \mathbf{x}_{true} + \mathbf{w}. \]

Your job is to find a good estimate of \(\mathbf{x}_{true}\), but you don't have much to go on, since you know nothing about \(\mathbf{w}\).

Let’s set up an example.

n = 300
t = np.linspace(0, 4, n)
x_true = np.sin(t) + t*(np.cos(t)**2)
b = x_true + 0.05 * np.random.randn(n)
plt.plot(t, b, 'k', label='Noisy data')
plt.plot(t, x_true, 'r-', label='True signal')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();

The least squares problem for \(\mathbf{x} \approx \mathbf{b}\), obtained by taking \(\mathbf{A}\) to be the identity matrix in \(\mathbf{A}\mathbf{x} = \mathbf{b}\), is

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{x}-\mathbf{b}\|^2.\]

However, the minimizer of this problem is obviously \(\mathbf{x} = \mathbf{b}\), so solving it gains us nothing. Instead, we can use the solution of the regularized version

\[\min_{\mathbf{x}\in\mathbb{R}^{n}}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|^2 + \lambda R(\mathbf{x})\]

to get more useful information.

Choosing the regularization term#

We know that we want the signal to be smooth, but what does that mean? Here, we will try to find \(\mathbf{x}\) such that adjacent values \(x_i\) and \(x_{i+1}\) are close together, meaning \(|x_i-x_{i+1}|\) is small. This can be encouraged by using a regularization term like this:

\[R(\mathbf{x}) = \sum_{i=1}^{n-1}(x_i-x_{i+1})^2.\]

But that doesn’t look like our \(\|D\mathbf{x}\|^2\) term yet so we need to fix it.

In this case, we choose to use the “first-order Tikhonov regularization matrix”

\[\begin{split}\mathbf{L}_{(n-1)\times n}= \begin{bmatrix}1&-1&0&\cdots &0&0\\ 0&1&-1&\cdots &0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots &1&-1\end{bmatrix}.\end{split}\]
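As a numerical sanity check that this matrix does what we want, \(\|\mathbf{L}\mathbf{x}\|^2\) should equal the sum of squared adjacent differences (the vector `x` below is arbitrary demo data):

```python
import numpy as np
from scipy.sparse import diags

# Check that ||L x||^2 = sum_i (x_i - x_{i+1})^2
# for the first-order difference matrix L defined above.
n = 6
x = np.arange(n, dtype=float) ** 2
L = diags([1, -1], offsets=[0, 1], shape=(n - 1, n)).toarray()

lhs = np.linalg.norm(L @ x) ** 2
rhs = np.sum((x[:-1] - x[1:]) ** 2)
```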

❓❓❓ Question ❓❓❓:

  • Write down by hand the matrix \(L\) for \(n=4\). Note you should have an \((n-1) \times n = 3 \times 4\) matrix.

  • Once you have written it down, run the code below to check that you have the same matrix.

# Here's one way to get the matrix L

n = 4
L = diags([1,-1], offsets = [0,1], shape=(n-1,n)).toarray()

L

❓❓❓ Question ❓❓❓:

  • Now, again by hand, multiply \(\mathbf{L}\mathbf{x}\) for \(n=4\) where you can write

\[\begin{split} \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \end{split}\]
  • The result above should be a vector. What is \(\|L\mathbf{x}\|^2\)?

Space for notes here.

Back to the example#

Ok, so we were working with this data drawn below.

plt.plot(t, b, 'k', label='Noisy data')
plt.plot(t, x_true, 'r-', label='True signal')
plt.xlabel('t')
plt.ylabel('x(t)')
plt.legend();

❓❓❓ Question ❓❓❓: I’ve copied the code to make \(L\) below.

  • Add code to find the solution to the RLS problem when \(\lambda = 10\).

  • Plot your solution. What do you see?

### Complete the code below.

n = len(b)
L = diags([1,-1], offsets = [0,1], shape=(n-1,n)).toarray()
    
lam = 10

#Enter code below to find regularized least squares solution.


#Plot the solution

Effect of \(\lambda\)#

Next, we would like to study how the choice of the regularization parameter \(\lambda\) affects the solution.

❓❓❓ Question ❓❓❓: Write code below that solves for and plots the denoised signal for \(\lambda = [1, 10, 100, 1000]\) to compare with the true signal \(x_{true}\).

## Enter your code below.

❓❓❓ Question ❓❓❓: What values of \(\lambda\) give you a solution close to the clean signal? What happens when \(\lambda\) is too large?

Answer the question here.

Extra time#

If you have extra time, try this. Last class we did polynomial regression with ordinary least squares. For that data, add a quadratic regularization term. How does this change the learned function?
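One possible sketch of that exercise, using synthetic data in place of last class's dataset (the data, degree, and \(\lambda\) below are all made-up placeholders):

```python
import numpy as np

# Hedged sketch: polynomial regression with a quadratic (ridge)
# regularization term on the coefficients, D = I.
# The data here is synthetic, standing in for last class's dataset.
rng = np.random.default_rng(1)
t = np.linspace(-1, 1, 40)
y = t**3 - t + 0.1 * rng.standard_normal(40)

deg = 8
V = np.vander(t, deg + 1)            # polynomial design matrix
lam = 1e-2

# Ridge solution: as lam grows, the coefficients shrink toward zero
# and the learned polynomial becomes smoother (less wiggly).
c = np.linalg.solve(V.T @ V + lam * np.eye(deg + 1), V.T @ y)
y_fit = V @ c
```

Comparing `y_fit` for several values of `lam` against the ordinary least squares fit (`lam = 0`) shows how the regularization tames the high-degree polynomial's oscillations.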

© Copyright 2025, The Department of Computational Mathematics, Science and Engineering at Michigan State University.