Motivation

One of my goals is to understand more deeply what Neural Networks are doing. Another is to have an easier time understanding and implementing cutting edge academic papers. In order to work toward those goals, I am revisiting the Math behind Neural Networks. This time my goal is to understand intuitively every piece of the material forward and backward - rather than to pass a course on a deadline.

This blog post will be my notes about Lecture 2 from the following course:

Gilbert Strang. 18.06 Linear Algebra. Spring 2010. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu. License: Creative Commons BY-NC-SA.

Goal

The goal is to solve N equations with N unknowns, just like it was with lecture 1. In the first Lecture it was about understanding how to solve them intuitively with linear combinations and matrix multiplication. In reality, that is not a scalable choice when you get to higher and higher dimensions. We need to be able to express these concepts in matrices and be more comfortable with that language and operations..

Matrix Form

As a recap, we can express a system of equations by listing the equations. x

$x+2y+z=2$

$3x+8y+z=12$

$4y+z=2$

We can write the same thing in matrix form using the formula $Ax=b$. The matrix A is from the system of equation above.

$A = \begin{bmatrix}1&2&1\\3&8&1\\0&4&1\end{bmatrix}$

Elimination

The general idea in elimination is to isolate variables so we can solve the equation. For example, if we have 0x + 0y + 1z = 10, it is very easy to solve for z. We can then plug it into another equation in the series that has z and 1 other unknown to get another value, and so on and so forth.

$E_{21}$

We are going to start by eliminating the first variable in the second row. Row 2, column 1. This would leave the second row with only 2 unknowns.

The way we do this is we subtract row 1 from row 2. If we substract 3 * 1 from 3, we get 0, so 3 will be the multiplier.

$\begin{bmatrix}1&0&0\\-3&1&0\\0&0&1\end{bmatrix} \begin{bmatrix}1&2&1\\3&8&1\\0&4&1\end{bmatrix} =\begin{bmatrix}1&2&1\\0&2&-2\\0&4&1\end{bmatrix}$

$E_{31}$

We then move on to row 3, column 1. Lucky for us it's already 0 so we can skip this step

$E_{32}$

We now move on to row 3, column 2. If we can get this to 0, then we have 1 unknown in that column. That's what we want as that's easily solvable.

$\begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix} \begin{bmatrix}1&2&1\\0&2&-2\\0&4&1\end{bmatrix} =\begin{bmatrix}1&2&1\\0&2&-2\\0&0&5\end{bmatrix}$

How can this fail?

A 0 cannot be in one of the pivot spots. The pivot positions are the diagonals. If that happens, elimination will fail. If you think about it, a 0 being in the pivot spot means that the left side of the equation is 0. Either the right side is also 0 and it gives you no information, or the right side is not 0 and there is no solution (ie 2 = 0 is False).

Of course, there are some tricks that can be added to minimize failures. The most common of these are row exchanges. By swapping the order of the rows, you can potentially solve an equation that would have failed in the original order.

For example, if I were to exchange the 2nd and 3rd rows I would multiply by the following matrix.

$\begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix}$

Back Substitution

Let's recap the steps that were taken. We started with A and through elimination in 2 steps we ended with a new matrix.

$A => \begin{bmatrix}1&2&1\\3&8&1\\0&4&1\end{bmatrix} => \begin{bmatrix}1&2&1\\0&2&-2\\0&4&1\end{bmatrix} => \begin{bmatrix}1&2&1\\0&2&-2\\0&0&5\end{bmatrix} => U$

Finish the Equation

You may have realized these matrices represented the left side of the equation. You may have wondered how we can modify the left side of the equation while leaving the right side alone? The answer is that we cannot. What is done on the left side must be applied to the right side. Let's do that now.

We have done 2 transformations.

  1. The first transformation was subtracting 3 of the first row from the second.
  2. The second transformation was subracting 2 of the second row from the third.

Let's do that to the the right side

b => $\begin{bmatrix}2\\12\\2\end{bmatrix} => \begin{bmatrix}2\\6\\2\end{bmatrix} => \begin{bmatrix}2\\6\\-10\end{bmatrix} => c$

Let's put the left and right side of the equations together to see what it looks like.

$Ux = C$

$\begin{bmatrix}1&2&1\\0&2&-2\\0&0&5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix}=\begin{bmatrix}2\\6\\-10\end{bmatrix}$

Final Solution

Intuitive Solution

Great! Let's translate the $Ux=C$ matrix above back to our systems of equations view just to see if we can see how this transformation helps us.

$x + 2y + z = 2$

$2y - 2z = 6$

$5z = -10$

When we look at it here, we can solve them with some simple algebra starting from the bottom equation.

$5z = -10 \;\;\;\mathbf{=>}\;\;\; z = -10/5 \;\;\;\mathbf{=>}\;\;\; z = -2$

$2y - 2z = 6 \;\;\;\mathbf{=>}\;\;\; 2y - 2(-2) = 6 \;\;\;\mathbf{=>}\;\;\; 2y + 4 = 6 \;\;\;\mathbf{=>}\;\;\; 2y = 2 \;\;\;\mathbf{=>}\;\;\; y = 1$

$x+2y+z=2 \;\;\;\mathbf{=>}\;\;\;x+2-2=2$$x+2y+z=2 \;\;\;\mathbf{=>}\;\;\;x=2$

Matrix Solution

So first we should ask if we have an intuitive solution, why bother with doing the whole thing in matrix format? Isn't it the same thing?

And yes, we will be doing the same thing. The reason is scalability and ability to transfer to N equation and N unknowns. There's a limit to what can be done by hand, and matrix form allows for easier scalability.

Recap

We did several steps intuitively. Let's recap our elimination steps from above

$E_{21}$ step $\begin{bmatrix}1&0&0\\-3&1&0\\0&0&1\end{bmatrix} \begin{bmatrix}1&2&1\\3&8&1\\0&4&1\end{bmatrix} =\begin{bmatrix}1&2&1\\0&2&-2\\0&4&1\end{bmatrix}$

$E_{32}$ step $\begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix} \begin{bmatrix}1&2&1\\0&2&-2\\0&4&1\end{bmatrix} =\begin{bmatrix}1&2&1\\0&2&-2\\0&0&5\end{bmatrix}$

Simplify into 1 step

Great! Now let's just combine these so we only have 1 equation.

$E_{21}E_{32}A=U$ $\begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix} \begin{bmatrix}1&0&0\\-3&1&0\\0&0&1\end{bmatrix} \begin{bmatrix}1&2&1\\3&8&1\\0&4&1\end{bmatrix}= \begin{bmatrix}1&2&1\\0&2&-2\\0&0&5\end{bmatrix} $

Now let's simplify this by multiplying and combining my $E_{21}$ and $E_{31}$ matrices together. We will call the result $E$

$E_{21}E_{32}=E$ $\begin{bmatrix}1&0&0\\0&1&0\\0&-2&1\end{bmatrix} \begin{bmatrix}1&0&0\\-3&1&0\\0&0&1\end{bmatrix} = \begin{bmatrix}1&0&0\\-3&1&0\\6&-2&1\end{bmatrix}$

Great! Now we can use this to simplify our formula.

$EA=U$ $\begin{bmatrix}1&0&0\\-3&1&0\\6&-2&1\end{bmatrix} \begin{bmatrix}1&2&1\\3&8&1\\0&4&1\end{bmatrix}= \begin{bmatrix}1&2&1\\0&2&-2\\0&0&5\end{bmatrix}$