18 Pre-Class Assignment: Least Squares Fit (Regression)
1. Least Squares Fit#
Review Chapter 13 (pp. 225-239) of the Boyd textbook.
In the first part of this course, we tried to solve the system of linear equations \(Ax=b\) with an \(m\times n\) matrix \(A\) and a column vector \(b\).
There are three possible outcomes: a unique solution, no solution, and infinitely many solutions. (Review the earlier material if you are not familiar with when each of these outcomes happens.)
When \(m<n\), we call the matrix \(A\) underdetermined, because it cannot have a unique solution. When \(m>n\), we call the matrix \(A\) overdetermined, because it typically has no solution.
However, we often still need to find a best \(x\), even when there is no solution or there are infinitely many solutions. For this we use a technique called least squares fit (LSF). Least squares fit finds the \(x\) for which \(\|Ax-b\|\) is smallest (i.e., we minimize the estimation error).
When there is no solution, we want to find \(x\) such that \(Ax-b\) is small (more precisely, we want \(\|Ax-b\|\) to be small).
If the null space of \(A\) is just \(\{0\}\), we can find a unique \(x\) that attains the smallest \(\|Ax-b\|\).
If there is a unique solution \(x^*\) for \(Ax=b\), then \(x^*\) is the optimal \(x\) to obtain the smallest \(\|Ax-b\|\), which is 0.
Because the null space of \(A\) is just \(\{0\}\), \(Ax=b\) cannot have infinitely many solutions.
If the null space of \(A\) is not just \(\{0\}\), we can always add a nonzero vector \(x_0\) from the null space of \(A\) to a best solution \(x^*\), and \(\|A(x^*+x_0)-b\|=\|Ax^*-b\|\). Therefore, when there are multiple best solutions, we choose the \(x\) that lies in the row space of \(A\), and this choice is unique.
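As a quick numerical illustration (a minimal sketch with made-up numbers, not the answer to the questions below), numpy.linalg.lstsq computes exactly this minimizer; when the null space of \(A\) is nontrivial it returns the minimum-norm solution, which lies in the row space of \(A\).

import numpy as np

# Made-up overdetermined system: 3 equations, 2 unknowns (m > n)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# lstsq returns the x that minimizes ||Ax - b||
x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("best x:", x, " squared residual:", residual, " rank:", rank)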
QUESTION 1: Let
$$A=\begin{bmatrix}1\\2\end{bmatrix},\quad b=\begin{bmatrix}1.5 \\ 2\end{bmatrix}$$
Find the best \(x\) such that \(\|Ax-b\|\) has the smallest value.
Put your answer to the above question here.
QUESTION 2: Compute \((A^\top A)^{-1}A^\top b\).
Put your answer to the above question here.
2. Linear Regression#
Watch the following video on using least squares to do linear regression.
from IPython.display import YouTubeVideo
YouTubeVideo("Lx6CfgKVIuE",width=640,height=360, cc_load_policy=True)
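Below is a minimal sketch of fitting a line \(y \approx c_0 + c_1 t\) by least squares (the data points are made up for illustration); the size of the residual is one simple thing to consider for QUESTION 3.

import numpy as np

# Made-up data points for illustration
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Each row of A is [1, t_i], so Ac stacks c0 + c1*t_i
A = np.vstack([np.ones_like(t), t]).T
c, *_ = np.linalg.lstsq(A, y, rcond=None)

# A small residual norm (relative to the variation in y) suggests a good fit
residual = np.linalg.norm(A @ c - y)
print("intercept and slope:", c, " residual norm:", residual)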
QUESTION 3: How can you tell whether a fit is good or bad?
Put your answer to the above question here.
3. One-to-One and Inverse Transforms#
Read Section 4.9 of the textbook if you are not familiar with this part.
Definition: A transformation \(T:U\mapsto V\) is said to be one-to-one if each element in the range is the image of just one element in the domain. That is, for any two elements \(x\) and \(y\) in \(U\), \(T(x)=T(y)\) happens only when \(x=y\).
Theorem: Let \(T:U\mapsto V\) be a one-to-one linear transformation. If \(\{u_1,\dots,u_n\}\) is linearly independent in \(U\), then \(\{T(u_1),\dots,T(u_n)\}\) is linearly independent in \(V\).
Definition: A linear transformation \(T:U\mapsto V\) is said to be invertible if there exists a transformation \(S:V\mapsto U\) such that
$$S(T(u))=u,\qquad T(S(v))=v,$$
for any \(v\) in \(V\) and any \(u\) in \(U\).
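As a small numerical sketch (using a made-up invertible matrix, purely for illustration): for a square full-rank matrix \(A\), the transformation \(T(u)=Au\) is invertible with \(S(v)=A^{-1}v\).

import numpy as np

# Made-up invertible 2x2 matrix representing T(u) = A u
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)   # represents S(v) = A^{-1} v

u = np.array([3.0, -1.0])
v = A @ u
print(np.allclose(A_inv @ v, u))        # S(T(u)) == u
print(np.allclose(A @ (A_inv @ v), v))  # T(S(v)) == v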
QUESTION 4: If linear transformation \(T:U\mapsto V\) is invertible, and the dimension of \(U\) is 2, what is the dimension of \(V\)? Why?
Put your answer to the above question here.
4. Inverse of a Matrix#
Recall the four fundamental subspaces of an \(m\times n\) matrix \(A\):
The rowspace and nullspace of \(A\) in \(R^n\)
The columnspace and the nullspace of \(A^\top\) in \(R^m\)
The two-sided inverse gives us the following:
$$AA^{-1}=I=A^{-1}A$$
For this we need \(r = m = n\), where \(r\) is the rank of the matrix.
For a left-inverse, we have the following (a short numerical sketch follows this list):
Full column rank, with \(r = n \leq m\) (but possibly more rows)
The nullspace contains just the zero vector (columns are independent)
The rows might not all be independent
We thus have either no solution or exactly one solution to \(Ax=b\).
\(A^\top \) will now also have full row rank
From \((A^\top A)^{-1}A^\top A = I\) it follows that \((A^\top A)^{-1}A^\top\) is a left-sided inverse
Note that \((A^\top A)^{-1}A^\top\) is an \(n\times m\) matrix and \(A\) is of size \(m\times n\), so their product \((A^\top A)^{-1}A^\top A\) is an \(n\times n\) identity matrix
The matrix \(A(A^\top A)^{-1}A^\top\) is an \(m\times m\) matrix, BUT \(A(A^\top A)^{-1}A^\top\neq I\) if \(m\neq n\). The matrix \(A(A^\top A)^{-1}A^\top\) is the projection matrix onto the column space of \(A\).
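Here is the promised sketch (with a made-up full-column-rank matrix, chosen only for illustration): it checks that \((A^\top A)^{-1}A^\top\) is a left inverse and that \(A(A^\top A)^{-1}A^\top\) acts as a projection.

import numpy as np

# Made-up matrix with full column rank: m=3, n=2, r=2
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

left_inv = np.linalg.inv(A.T @ A) @ A.T          # n x m
print(np.allclose(left_inv @ A, np.eye(2)))      # left inverse: n x n identity

P = A @ left_inv                                 # m x m projection matrix
b = np.array([1.0, 0.0, 2.0])
print(np.allclose(P @ (P @ b), P @ b))           # projecting twice changes nothing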
QUESTION 5: What is the projection matrix that projects any vector onto the subspace spanned by \([1,2]^\top\)? (That is, what matrix gives the same result as projecting any point onto the vector \([1,2]^\top\)?)
Put your answer to the above question here.
QUESTION 6: If \(m=n\), is the left inverse the same as the inverse?
Put your answer to the above question here.
Theorem: For a matrix \(A\) with \(r=n<m\), the column space of \(A\) has dimension \(r(=n)\). The linear transform \(A: R^n\mapsto R^m\) is one-to-one. In addition, the linear transformation \(A\) from \(R^n\) to the column space of \(A\) is one-to-one and onto (meaning that for any element in the column space of \(A\), we can find an \(x\) in \(R^n\) such that it equals \(Ax\)). The left inverse of \(A\) is then a one-to-one mapping from the column space of \(A\) to \(R^n\), and it can be considered an inverse transform of \(A\).
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.