Lecture 3-2: Least Squares: Part 2#

Download the original slides: CMSE382-Lec3_2.pdf

Warning

This is an AI-generated transcript of the lecture slides and may contain errors or inaccuracies. Please refer to the original course materials for authoritative content.


Regularized Least Squares#

Topics Covered#

  • Regularized least squares

  • Tikhonov regularization

  • De-noising


Ordinary Least Squares Approximation#

Definition: For \(\mathbf{x} \in \mathbb{R}^n\), the residual sum of squares error (or total sum of squared errors) is

\[\text{RSS}(\mathbf{x})=\|A \mathbf{x}-\mathbf{b}\|^2\]

The minimizer of this function is the least squares estimate

\[\textbf{x}_{\text{LS}} = (\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top\mathbf{b}\]
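As a minimal NumPy sketch (the data here are made up for illustration), the closed-form estimate from the normal equations can be checked against `numpy.linalg.lstsq`:

```python
import numpy as np

# Overdetermined system: 5 data points, 2 unknowns (illustrative data)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Normal-equations solution (A^T A)^{-1} A^T b;
# prefer solve() over forming the explicit inverse
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's built-in least-squares routine
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

# Residual sum of squares at the minimizer
rss = np.sum((A @ x_ls - b) ** 2)
```

In practice `lstsq` (based on the SVD) is preferred numerically; the normal-equations form is shown here to match the slide's formula.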

Regularized Least Squares#

Definition: We add a regularization function \(R(\cdot)\) to OLS to obtain the regularized least squares function:

\[\min\limits_{\textbf{x} \in \mathbb{R}^n}\|A\textbf{x} - \mathbf{b}\|^2 + \lambda R(\mathbf{x})\]
  • \(\lambda > 0\) determines the weight given to the regularization function.

  • \(R\) is chosen based on prior knowledge or desired behavior

  • When \(R(\mathbf{x})=0\), we recover ordinary least squares.

  • When \(R(\mathbf{x})=\|\mathbf{x}\|_2^2 = \sum_{i=1}^n x_i^2\), we have ridge regression.

  • When \(R(\mathbf{x})=\|\mathbf{x}\|_1\), we have LASSO regression.
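For the ridge case, \(R(\mathbf{x})=\|\mathbf{x}\|_2^2\) admits the closed form \((A^\top A + \lambda I)^{-1}A^\top\mathbf{b}\). A short sketch (random data and the \(\lambda\) values are purely illustrative) showing that larger \(\lambda\) shrinks the solution toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))   # illustrative random design matrix
b = rng.normal(size=20)
lam = 0.5

# Ridge regression closed form: x = (A^T A + lambda I)^{-1} A^T b
n = A.shape[1]
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# With a much larger lambda, the solution norm shrinks further
x_big = np.linalg.solve(A.T @ A + 100.0 * np.eye(n), A.T @ b)
```

(LASSO, by contrast, has no closed form and is typically solved iteratively.)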


Why Add Regularization?#

Why add \(R\)?

  • Can solve ill-posed problems.

    • Underdetermined systems have infinitely many solutions, so OLS has no unique answer. Regularized least squares constrains the problem to select one.

  • OLS minimizes the sum of squared errors at the data points, but the fitted function can behave poorly in between them.

    • A regularization term can penalize erratic behavior between data points, yielding a better overall fit.

  • Regularization allows including prior knowledge into the model:

    • E.g., allows smoother fits for noisy data (denoising)


Tikhonov (Quadratic) Regularization#

\[\min\limits_{\textbf{x} \in \mathbb{R}^n}\|A\textbf{x} - \mathbf{b}\|^2 + \lambda R(\mathbf{x}) \qquad \Rightarrow \qquad \min\limits_{\textbf{x} \in \mathbb{R}^n}\|A\textbf{x} - \mathbf{b}\|^2 + \lambda \|D \mathbf{x} \|^2\]

Definition: When \(R(\mathbf{x})=\|D \mathbf{x} \|^2\), we have Tikhonov regularization. \(D\) is called the Tikhonov matrix.

  • For a unique solution, we require that the null spaces of \(A\) and \(D\) intersect only at \(\mathbf{0}\): \(\mathcal{N}(A) \cap \mathcal{N}(D) = \{\mathbf{0}\}\).

  • Often \(D\) is a scalar multiple of the identity matrix.


Solving Tikhonov Regularization#

Expanding the objective \(f(\mathbf{x})\):

\[f(\mathbf{x}) = \|A\textbf{x} - \mathbf{b}\|^2 + \lambda \|D \mathbf{x} \|^2 = \mathbf{x}^\top (A^\top A+\lambda D^\top D)\mathbf{x} - 2 \mathbf{b}^\top A \mathbf{x} + \|\mathbf{b}\|^2\]

If \(\nabla^2 f = 2 (A^\top A+\lambda D^\top D) \succ 0\), then the unique minimizer is \(\mathbf{x}_{\text{RLS}}=(A^\top A + \lambda D^\top D)^{-1}A^\top \mathbf{b}\)
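This closed form can be verified numerically. A minimal sketch (random illustrative data, \(D = I\), and an arbitrary \(\lambda\)) that computes the solution and the objective it should minimize:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 4))   # illustrative random data
b = rng.normal(size=10)
D = np.eye(4)                  # Tikhonov matrix (identity here)
lam = 0.1

def f(x):
    """Tikhonov objective ||Ax - b||^2 + lambda ||Dx||^2."""
    return np.sum((A @ x - b) ** 2) + lam * np.sum((D @ x) ** 2)

# Closed-form minimizer from setting the gradient to zero
x_star = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)
```

Since the Hessian is positive definite here, perturbing `x_star` in any direction can only increase `f`.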


Tikhonov Regularization Choices#

| Order  | \(R(\mathbf{x})\)                    | Promotes                                                              |
|--------|--------------------------------------|-----------------------------------------------------------------------|
| Zeroth | \(\Vert L_0 \, \mathbf{x}\Vert^2\)   | small \(\Vert\mathbf{x}\Vert\); \(L_0\) is the identity matrix        |
| First  | \(\Vert L_1 \, \mathbf{x}\Vert^2\)   | smoothness by minimizing the 1st derivative                           |
| Second | \(\Vert L_2 \, \mathbf{x}\Vert^2\)   | smoothness by minimizing the 2nd derivative                           |

First-order finite difference approximation:

\[\begin{split}x'|_i \approx \frac{x_{i+1}-x_i}{h}, \qquad L_1 = \begin{bmatrix} -1 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 0 & -1 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 0 & -1 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & -1 & 1 \end{bmatrix}\end{split}\]

Second-order finite difference approximation:

\[\begin{split}x''|_i \approx \frac{x_{i+1}-2x_i+x_{i-1}}{h^2}, \qquad L_2= \begin{bmatrix} -2 & 1 & 0 & 0 & \ldots & 0 & 0 \\ 1 & -2 & 1 & 0 & \ldots & 0 & 0 \\ 0 & 1 & -2 & 1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & 1 & -2 \end{bmatrix}\end{split}\]
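The difference matrices above can be built directly. A minimal NumPy sketch (interior rows only; boundary-row conventions vary between texts, and the sign of each row is immaterial since \(R\) squares the result):

```python
import numpy as np

n = 6  # illustrative signal length

# First-order difference matrix: (L1 x)_i = x_{i+1} - x_i
L1 = np.zeros((n - 1, n))
for i in range(n - 1):
    L1[i, i] = -1.0
    L1[i, i + 1] = 1.0

# Second-order difference matrix: (L2 x)_i = x_{i+1} - 2 x_i + x_{i-1}
L2 = np.zeros((n - 2, n))
for i in range(n - 2):
    L2[i, i] = 1.0
    L2[i, i + 1] = -2.0
    L2[i, i + 2] = 1.0
```

As a sanity check, applying `L1` to a linear ramp gives a constant vector, and applying `L2` to it gives zero, consistent with first- and second-derivative behavior.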

Noise#

Noisy data can be written as

\[\mathbf{b}=\textbf{x}+\mathbf{w}\]

where \(\textbf{x}\) is the signal vector and \(\mathbf{w}\) is the noise vector.

Problem Statement: Given \(\textbf{b}\), find a good estimate of the signal \(\textbf{x}\).


Denoising Using Regularized Least Squares#

Idea: Directly incorporate prior knowledge that the solution is smooth.

  • Small difference between any two subsequent function values

  • Popular choice: \(R(\mathbf{x})=\sum\limits_{i=1}^{n-1}(x_i-x_{i+1})^2\)

Can write as a Tikhonov matrix: \(R(\mathbf{x})=\sum\limits_{i=1}^{n-1}(x_i-x_{i+1})^2=\|L\, \mathbf{x}\|^2\)

\[\begin{split}L = \begin{bmatrix} 1 & -1 & 0 & 0 & \ldots & 0 & 0 \\ 0 & 1 & -1 & 0 & \ldots & 0 & 0 \\ 0 & 0 & 1 & -1 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \ldots & 1 & -1 \end{bmatrix}\end{split}\]

The regularized least squares problem becomes \(\min\limits_{\textbf{x}} \|\textbf{x}-\textbf{b}\|^2 + \lambda \|\mathbf{L}\textbf{x}\|^2\), with solution \(\textbf{x}_{\text{RLS}}(\lambda) = (\mathbf{I}+\lambda \mathbf{L}^\top \mathbf{L})^{-1}\textbf{b}\)
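Putting the pieces together, the denoising formula above can be applied directly. A minimal sketch (the sine signal, noise level, and \(\lambda\) are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
t = np.linspace(0.0, 1.0, n)
signal = np.sin(2 * np.pi * t)           # smooth ground-truth signal
b = signal + 0.3 * rng.normal(size=n)    # noisy observations b = x + w

# First-order difference matrix L (rows: 1, -1), as in the slide
L = np.zeros((n - 1, n))
for i in range(n - 1):
    L[i, i] = 1.0
    L[i, i + 1] = -1.0

lam = 50.0  # illustrative smoothing weight; tune for the noise level

# x_RLS(lambda) = (I + lambda L^T L)^{-1} b
x_rls = np.linalg.solve(np.eye(n) + lam * L.T @ L, b)

# Errors against the true signal, before and after denoising
err_noisy = np.linalg.norm(b - signal)
err_denoised = np.linalg.norm(x_rls - signal)
```

Larger \(\lambda\) gives a smoother (but more biased) estimate; \(\lambda \to 0\) returns the noisy data unchanged.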