For me, the most amusing feature of LA is how quickly results and tools escalate from trivial and obvious to surprising and hard. LA is often introduced as slick notation to express systems of linear equations. Gaussian elimination and matrix inverses (when they exist) are natural evolutions of this, but not what I consider surprising.
Matrix multiplication is an elementary next step, but it is not at all clear why one would multiply matrices. My first real mathematical surprise (I was in jr high at the time) is a result of matrix multiplication: there is a matrix A of real values that satisfies A @ A = -I. (The matrix is [[0,1],[-1,0]]). I was aware of complex numbers, but until that moment, I didn't know you could get "minus identity" without first artificially introducing the idea of "square root of -1". But this shows one can construct "square root of -1" without introducing anything like that.
When one gets to determinants, characteristic polynomials, eigenstructure, fixed subspaces, projection operators, SVD, orthogonal transformations, and so-on, the work really begins. That determinants are multiplicative is really wonderful, but not at all obvious. SVD in full generality was only shown in 1936 and modern algorithms emerged in the 1950's to compute it [1].
Finally, check out the LDU decomposition of the Walsh matrix [2].
[1] https://en.wikipedia.org/wiki/Singular_value_decomposition#H...
I disagree. Matrix multiplication is the composition of linear maps. AB is the transformation you get by applying B, then applying A to the result of that. Furthermore this provides a great intuition for why matrix multiplication is not, in general, commutative. If your professor didn't make this clear when you learned the definition of matrix multiplication, then they didn't do a great job.
>That determinants are multiplicative is really wonderful, but not at all obvious
Again, I think it is kind of obvious. The determinant of a map measures the degree to which it alters the area/volume/measure of the unit cube. So if A doubles it and B triples it, then BA should scale it by a factor of 6, since matrix multiplication is the composition of maps, and you're tripling the size of something that was already doubled.
This is why in a first course on linear algebra, it's more important to get the intuition and logical structure down than learn advanced matrix decompositions. Now your complex number example is a little less obvious. I actually think using the 2x2 matrix representation of complex numbers is the best way to introduce them because it sidesteps the part where you take roots of negative numbers, which is the most conceptually problematic part for many people and usually requires some kind of handwave.
This is an insight due to Cayley (see @ogogmad's comment, [1], and [2]).
> If your professor...
My professor provided historical context.
[1] https://abel.math.harvard.edu/~knill/history/matrix/index.ht...
[2] https://ia802808.us.archive.org/0/items/philtrans05474612/05...
Note: this quote originates from some early publications by amateur mathematician Randy Newman.
Arbitrary functions which take in vectors and output vectors can be very complex and thus difficult to reason about. A useful simplifying assumption is that of linearity: that f applied to a linear combination is just a linear combination of f applied to each piece of the combination separately. Linear algebra, broadly speaking, is the study of functions of this kind and the properties that emerge from making the linearity assumption.
It turns out that, if we assume a function f is linear, all of the information about that function is contained in what it does to a set of basis vectors. We can in essence "encode" the function by a table of numbers (a matrix), where the kth column contains the result of f applied to the kth basis vector. In this way, given a basis, any linear transformation f has a matrix A which compactly represents it.
Since f is linear, to compute f(v) I could write v in my chosen basis then apply f to each basis vector and recombine. Alternatively, I could write the matrix A representing f in that basis, and then multiply Av. The two are equivalent: that is, Av = f(v). And so matrix-vector multiplication is "just" evaluating the function f.
Matrix columns, you mean. I think it is Strang who is a staunch proponent of teaching this to students early on. It is indeed helpful to realize that the matrix can be seen as a collection of the images of the basis vectors under the linear transformation the matrix represents.
I think this becomes less surprising when phrased as: the deepest knowledge about linear algebra that we have so far is that gained most recently—at which point it sounds almost tautological. There's no reason to think that we've plumbed all the mysteries of linear algebra, no matter how much more we know (especially about the computational side of it) than we used to.
(Having said that, the article struck me as fairly vacuous.)
I can imagine this being true of a field that is no longer studied, although I can't think of any good example—the only unsatisfactory one that occurred to me is penmanship, but (a) that's not exactly a scientific discipline and (b) there are probably still people out there making strides in penmanship—but I find it hard to believe of any living discipline. In fact, I would argue that, if it is true of a discipline that its deepest mysteries were discovered in the past, then that discipline is effectively dead.
That article is not at all an exposition of what the Philosophy of Linear Algebra is.
>While nowadays most people learn linear algebra through the perspective of vectors and transformations, the matrix and determinant are much older than the ideas of vectors in a multi-dimensional space.
That is not true. Anyway, what is the article about? According to the title it's about linear algebra. Then, the basic objects are vectors, and the interesting topic of discussion are maps. Some maps can be represented by matrices. That's it.
Yet, the article begins by describing matrices. Is it like those fiction books who randomize the order of the chapters to make them more interesting?
Or is the article supposed to be about matrices in general, not about linear algebra? (Niche) uses of matrices outside of linear algebra do exist.
>An upper triangular matrix conveys a whole lot more than just the fact that the numbers form a triangle; it tells you that the matrix is describing a nonsingular system that is solvable.
That is not true in general.
If the matrix is an upper triangular matrix AND all the coefficients in the diagonal are non-zero, THEN it's regular (so invertible, so non-singular).
But philosophically, WHY does it do that?
The answer is because you are basically trying to invert the matrix but you notice half-way through that you can already use the upper triangular matrix to get the result with non-matrix means, so you just stop there instead (it's a lot less work). It can be shown that the kind of manipulations you do to get from the triangular matrix to the inverse matrix are all conserving the determinant, so you're in the clear.
That is slightly interesting from a philosophical standpoint.
>The diagonal matrix describes a system with elements that only operate on themselves,
It just means the elements don't mix the different axes of the multiplicand in the result. Coordinate along axis one in the result = coordinate along axis one multiplied by diagonal element one. And so on, for two, three.
>symmetric matrix tells the story of a system that is bi-directional
Argh. It just means that the eigenvectors are all orthogonal to each other and the eigenvalues are all real. It's nice to have these properties (for example because it's intuitive to have orthogonal coordinate axes), so it's often used for calculations.
Also, eigenvectors and eigenvalues are interesting because they give you a better basis for your calculation (literally).
If you choose your coordinate axes wrong, it will tell you the "better" ones (the one intrinsic to the problem--so it's much easier to calculate things there).
To summarize, to save others the work: that article is not about the philosophy of linear algebra. It's just showing a small example with matrices and not explaining any WHYs in the process. Where it does explain things, it's wrong multiple times.
----
Maybe a small real Philosophy of Linear Algebra here:
The real Philosophy of Linear Algebra is that there are a lot of systems that are linear, therefore it's nice to have a complete mathematical theory for systems of linear equations. That's it. It's not magical, it's not general enough (cough non-linear systems), and (looked at it right) it's simple.
Linear algebra's objects of discussion are vectors and maps.
What vectors are everyone learned in school. Suffice to say vectors are any objects which can be added together, multiplied by a scalar, and then form a commutative ring with respect to addition; and multiplication and addition distribute over in the usual way (like for real numbers). Notably, there's no division! This is not a complete definition, but you get the idea.
(It is also useful, but not necessary, to introduce a so-called "inner" product. Inner because the result stays inside the vector space. Nevermind!)
In any case, you can then have equations with vectors in them. Like 5 v⃗ = w⃗, where v⃗, w⃗ are vectors.
This works like any other system of equations: there are not enough equations for those two unknowns v⃗ and w⃗.
So what about this system of equations: { 5 v⃗ = w⃗, 5 v⃗ = w⃗ } ? That's silly you say?
Ok then, what about { 5 v⃗ = w⃗, 10 v⃗ = 2 w⃗ } ? Still doesn't work?
So what's the rule here? Turns out it has to do with those upper triangular matrices.
A system that has no interesting solution (that is not 0⃗):
5 v⃗ = w⃗
5 v⃗ = 2 w⃗
w⃗ = 2 w⃗
0⃗ = 2 w⃗ - w⃗
0⃗ = w⃗
Booring.Anyway, let's introduce functions on vectors, so-called "maps" (here, f and g are maps).
f(v⃗) := 5 v⃗
g(v⃗) := (5/2) v⃗
f(v⃗) = w⃗
g(v⃗) = w⃗
THAT is the basic philosophy of linear algebra.Now for some more practical stuff:
It's annoying to have those abstract v⃗ and w⃗ be around. In a lot of cases, you are only interested in coordinates with respect to some basis (the basis is a set of axes that you choose--usually your axes of measurement. That is the axes your measurement devices measure along). It can be shown that the number of basis vectors you need for that (to represent any abstract vector!) is equal to the dimension of the vector space, usually 3.
Let the basis be {b⃗_1, b⃗_2, b⃗_3}. These basis vectors obviously must be independent, so b⃗_1 := b⃗_2 is out because that's cheating. b⃗_1 := 2 b⃗_2 ? Out. b⃗_1 := 2 b⃗_2 + 3 b⃗_3 ? Stop it already.
Now you can find numbers q, r, s such that v⃗ = q b⃗_1 + r b⃗_2 + s b⃗_3.
If you want to be more conserative with your letter consumption, better is to call q and r and s: v^1, v^2, v^3 instead (it's just a notation, and the ^1 and ^2 and ^3 are part of the name for now).
So
v⃗ = v^1 b⃗_1 + v^2 b⃗_2 + v^3 b⃗_3.
w⃗ = w^1 b⃗_1 + w^2 b⃗_2 + w^3 b⃗_3.
Or in non-annoying notation v⃗ = ∑_i v^i b⃗_i
w⃗ = ∑_i w^i b⃗_i
Or in even less annoying notation, let the sum be implied when an index repeats in a term v⃗ = v^i b⃗_i
w⃗ = w^i b⃗_i
Ahhh. Much better.If the f is a LINEAR map, then you can do this (by definition, which you have to look up):
f(∑_i v^i b⃗_i) = ∑_i v^i f(b⃗_i)
So in order to specify the linear map f completely, you only need {_i f(b⃗_i) }, i.e. how f acts on the individual basis vectors each. That's it.So for f and g in non-annoying notation:
f(v^i b⃗_i) = v^i f(b⃗_i)
g(v^i b⃗_i) = v^i g(b⃗_i)
Also, f(v⃗) = w⃗ = w^i b⃗_i
g(v⃗) = w⃗ = w^i b⃗_i
So f(v^i b⃗_i) = v^i f(b⃗_i) = w^i b⃗_i
g(v^i b⃗_i) = v^i g(b⃗_i) = w^i b⃗_i
So v^i f(b⃗_i) = w^i b⃗_i
v^i g(b⃗_i) = w^i b⃗_i
And NOW we can go to matrices.Those b⃗_i are annoying, and the f(b⃗_i) are even more annoying.
So let's get rid of them.
Let c⃗_i := f(b⃗_i). Let d⃗_i := g(b⃗_i).
Now we'd like all those abstract basis vectors to disappear. But c⃗_i and b⃗_i are different bases (if they are bases at all--neeevermind for now :P), we'd like these to be specified only with the b⃗_i. Let c⃗_i = c^j_i b⃗_j
v^i c⃗_i = w^i b⃗_i
v^i c^j_i b⃗_j = w^i b⃗_i
v^i c^j_i b⃗_j = w^i δ^j_i b⃗_j
c^j_i v^i = w^j
c := (_i (_j c^j_i)) That's the first matrix!
v := (_i v^i) That's a column "vector"--a matrix
w := (_i w^i) That's a column "vector"--a matrix
Note: δ^j_i := {1 if i=j, otherwise 0} c v = w
Likewise for the g, with the d⃗_i killed and replaced by the b⃗_i.And now we are at the philosophically completely uninteresting result. Good job.
What do you have in mind here?
>Computers certainly helped accelerate the development of linear algebra and vice-versa.
Computers have helped with matrix analysis given floating point constraints inherit to computers. But linear algebra has been in a pretty solid place as theory since Halmos published "Finite Dimensional Vector Spaces" in 1942.
~ Irving Kaplansky
It's long joked that much pure math research is resolving really hard problems down to a calculus or linear algebra computation.
When I first learned differential geometry, I was enamored with all the stuff we could do without coordinates. Then I learned Riemannian geometry and felt like I had been duped.
I digress…
>> linear algebra doesn’t exist simply because humanity managed to concoct a system that coincidentally finds usage in many seemingly unrelated subjects
I’m not a stranger to this field, but this article doesn’t make this case at all, and I’m not sure it can be made. Linear algebra is fundamentally a set of notation rules that people built upon and found ways to represent their problems with. It’s ubiquity in many fields is no more surprising than the observation that many different genre of literature are written in English, and somehow nobody is confused enough to think that flight instruction manuals are the domain of English rather than the range of it. There is no practical limit to the number of academic papers one can write which awkwardly contort a simple intuitive computation into an indecipherable representation in linear algebra. If you want the concept of linear algebra to consume all linearity that exists in the world, then fine, whatever, but then the notation sucks. You may call a continuous Fourier transform a linear operation, but try writing down the values of that matrix.
No; there’s nothing fundamentally tying the inner-workings of the world to the mechanics of linear algebra, but has been considerable effort to adapt some narrowly-chosen set of mechanics to represent the world, and some subset of those things work out well, so that’s what we focus on. But this is just path dependency. A different set of rules for writing linear relationships could have worked out better, or worse, but it would have to get there regardless. I expect that the current system will disappear soon as the world shifts to talking about computational tensors, and more flexible rules about how to apply operations. Who says you can’t multiply a row by a row? That’s stupid; we can and should do better. Is the determinant magical, or is is just a an arbitrary series of shift-product-sum which is far more applicable in the general sense? Do we really need a whole matrix to describe a permutation?
Linearity may be considered well-described when there is no longer any need for a separate course to teach the notation and tricks behind it. When I was in grad school, one of my favorite books was Gene Golub’s “Matrix Computations”. It really is an impressive book from an incredible mind, like a book on chess positions written by Bobby Fisher. Regardless, it was hard for me to go through it without wondering why such a thing exists. Much like chess, the human mind can become enamored with any simple set of rules that can be extended to sufficient complexity. Faced with a real problem, sometimes it helps me to get to a solution, and sometimes it distracts me from the solution. I can never be sure which is more common.
> But this book was where I first learned about tensor products, and why the matrix elements go the way they do and not the other way (Halmos is very careful on this point).
I haven't read the book, but it sounds like the answer may be there.
My impression/speculation is that prefixing functions comes from the historical process turning prose into notation. You start with “sine of a” and that eventually becomes “sin a”, etc. Later the idea of a generic f(x) was from the early 18th century.
The spatial organization of written matrix notation is directly lifted from the way people wrote systems of simultaneous linear equations. Just take the variables out and put all of the numbers in a grid.
The subject is poorly taught imho. It's used for almost everything and yet most instruction focus on the matrix and vector operations without tying it to other things.
It's not as though there are no YouTube millionaires, after all.
Determinants, yes. Matrices, no. Matrices were originally invented by Arthur Cayley as a formalism for describing linear transformations.
Mathematicians making historical claims ought to cite their sources. Mathematicians should also define what their terms mean, otherwise it's not clear what they're saying. I'm defining matrices as a certain number system which supports operations called "plus" and "times" (for the pedants, a preadditive category).
[edit]
Secondary sources: If we look here [1], matrices were invented in parallel by Sylvester and Cayley. Cayley invented the notion of a linear transformation first (1845), and in parallel Sylvester developed a notion of "matrix" to study determinants (1850), and then Cayley finally combined the two ideas to form the modern matrix algebra (1858). Therefore the claim that matrices are older than their modern perspective in terms of vectors and transformations isn't true.