The most important thing about reading this blog post is to not get scared off by the formulas. The post may look like all the crap you normally skim over, so you may be tempted to skim over this one. Don’t! None of this is hard. Just read the post top to bottom, and I promise you every individual step and the whole thing put together will make sense.
High school math
In high school your math teacher may have started a treatment of linear algebra by making you solve a system of linear equations, at which point you very sensibly zoned out because you knew you’d go on to program computers and never have to solve a system of linear equations again (don’t worry, I won’t be talking much about them here).
\[
\begin{eqnarray}
0x_1 + 1x_2 = 0\\
-1x_1 - 0x_2 = 0
\end{eqnarray}
\tag{1}
\]
You also may have learned that for some reason you can take the coefficients and put them in a 2D array like this: \(A=\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\). You’ve now defined a matrix \(A\), and you can re-express the system of linear equations above as follows:
\[
\newcommand\qvec[1]{\begin{bmatrix}#1\end{bmatrix}}
A\qvec{x_1\\x_2}=0
\tag{2}
\]
If you’re really hellbent on cleaning things up, you can express the vector \(\qvec{x_1, x_2}\) as \(x=\qvec{x_1 \\ x_2}\), which now gives you a really clean equation:
\[
Ax=0
\tag{3}
\]
Equations 1–3 are just different ways to say the exact same thing. In different situations you may prefer one notation over another, but there is no material difference between them. They are all equivalent.
Matrix-vector multiplication
I’ll talk about what matrix-vector multiplication means in a moment. For now let’s look at how the operation is defined. The precise definition of matrix-vector multiplication flows out of the notation above, so you never again have to look it up on Wikipedia. If you need to multiply a matrix by a vector, say \(\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \qvec{1 \\ 2}\), just recall that this is equivalent to the left side of the system of equations above. Before, we took the coefficients in the linear equation system and factored them out into a 2D array. Now we reverse the procedure: take our 2D array and factor it back in as coefficients:
\[
\qvec{
0*1 + 1*2\\
-1*1 - 0*2
}=\qvec{2 \\ -1}
\]
If you forget how matrix-vector multiplication works, just remember that its definition flows out of the notation. Convert the matrix-vector multiplication notation back into the linear equation system notation again, and you get the matrix-vector multiplication formula.
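To make the "factor the coefficients back in" recipe concrete, here is a minimal from-scratch sketch in Python (the function name `matvec` is my own, not from the post):

```python
# Matrix-vector multiplication written out by hand: each entry of the
# result plugs the vector's components back in as if they were the
# unknowns in the linear equation system.

def matvec(A, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

A = [[0, 1],
     [-1, 0]]

print(matvec(A, [1, 2]))  # [2, -1], matching the worked example above
```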
You may remember from high school that there is an operation defined on vectors called the dot product. The dot product is the sum of pairwise multiplication of elements of two vectors. E.g. \(\qvec{0, 1}\cdot\qvec{1, 2}=0*1 + 1*2=2\). The dot product of two vectors is an operation that represents the degree to which the two vectors point in the same direction.
That’s simple enough. But here is something curious. Another way to think of matrix-vector multiplication is by treating each row of a matrix as its own vector, and computing the dot products of these row vectors with the vector we’re multiplying by (in this case \(\qvec{1, 2}\)). How on earth does that work?! What does vector similarity have to do with linear equations, or with matrix-vector multiplication? I cannot answer this question quite yet. But we’ll eventually build up to an answer in future posts.
Matrices as functions
Now let’s look at what matrix-vector multiplication means. This blew my mind when I first learned about it. You can think of a matrix as a function, and you can think of multiplying a matrix by a vector as applying that function to the vector. So when you see \(Ax\), autocomplete it in your head to “calling some function \(A\) with argument \(x\)”.
This is actually not so strange: you can think of many structures as functions. For example, you can think of the number \(3\) as a function. When you multiply it by things, it makes them three times bigger. Thinking about matrices this way happens to be very convenient.
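A small sketch of that reading, with both the number-as-function and matrix-as-function views side by side (function names are mine, chosen for illustration):

```python
# "Structures as functions": the number 3 acts as a function that
# triples its input, and a matrix A acts as a function that is
# applied to a vector via multiplication.

def times_three(v):
    return 3 * v

def apply_A(x):
    """'Call' the matrix A with argument x, i.e. compute A times x."""
    A = [[0, 1],
         [-1, 0]]
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]

print(times_three(5))   # 15
print(apply_A([1, 2]))  # [2, -1]: calling function A with argument [1, 2]
```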
The fact that \(Ax=0\) denotes both the linear system in equation 1, and a call to a function \(A\) with argument \(x\) (getting the zero vector in return) leads to a curious insight about the relationship between high school math and programming.
In high school you’re given equations and asked to find their roots. We already established that a system of equations is equivalent to matrix-vector multiplication, which can be thought of as function application. And so, in high school you’re given a function \(A\) along with its output, and asked to find the inputs that match that output. Programming is usually the exact opposite: you’re given the inputs, and you write functions to compute the outputs.