Yet, I have witnessed many young mathematics students who could not write a concise, self-contained proof, nor understand its value. I certainly was one of those, and this advice helped me. For these people, it is helpful to learn how to organize their thoughts in an over-the-top, nearly bourbakist, formal way. Also, the correctness of proofs is much easier to check this way, and anything incorrect or illogical sticks out immediately. Then, once you have written your stuff in that dry style, you can add some glimpses of discourse that become much more valuable than if you had started with some informal hand-waving. This is pretty much the writing style of Arnold: his proofs are breathtakingly concise and elegant, and there is an insightful discourse around them. The proofs without the discourse stand on their own, but the discourse alone would be worthless.
I like your analogy of climbing the cliff and pulling the ladder. But there is another cliff that goes even higher and you needed the ladder for that one! Of course you need to help others to build their own ladders.
> Somewhere not too far away, a student in the class you're teaching cries.
Maybe, maybe not. In any case, I agree that you cannot teach math in a purely bourbakist style. I prefer a "visual" style like that inspired by the books of Arnold, Strang, Needham, and I am the sole teacher in my lab that seriously uses the word "amplitwist" to refer to the complex derivative :)
Part 1:
==============
I feel like there is a survivorship bias in your assessment: the students who would benefit from learning to construct a rigid argument without holes are already the ones who can hand-wave their way to the result, i.e. they already have the motivation and intuition to get there.
On the other hand, I feel like over 9 out of 10 people who could and would enjoy advanced mathematics get turned off by unnecessary formalisms some time in high school (Lockhart's Lament vividly describes how Euclidean geometry is massacred there - and that's both the first and often the last time they see proofs!).
To add insult to injury, rigid reasoning is not introduced to students unless they are math majors, and even then, it's when they take a Real Analysis course. The way we teach intro Calculus and Linear Algebra classes should be classified as a Geneva Convention violation and a crime against humanity: all the intuition you can get from a Bourbakism with none of the rigor.
It doesn't need to be this way. Even rigor can be fun. Just like everything else, rigor is a part of mathematics that we do for a reason. Once you approach the very concept of rigor the same way you approach, say, derivatives, you will see that there is no need to impose it on people.
How many times have you seen a "proof" that 0 = 1, usually derived from a coy division by zero, or abusing square roots, etc? People repost those on Facebook as memes. They are fun!
But also, they are the motivating example for rigor. After all, that's the entire reason we need it in the first place: to avoid arriving at incorrect conclusions.
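For concreteness, here is one standard specimen of the division-by-zero kind (my choice of example, not necessarily the one in the memes). Take a = b with b nonzero:

```latex
\begin{align*}
a &= b \\
a^2 &= ab \\
a^2 - b^2 &= ab - b^2 \\
(a - b)(a + b) &= b\,(a - b) \\
a + b &= b && \text{(divides by } a - b = 0 \text{: the illegal step)} \\
2b &= b \\
2 &= 1 \\
1 &= 0
\end{align*}
```

Every line except the marked one is fine, which is exactly why it works as a motivating example.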
Without having a vast assortment of examples of arriving at incorrect conclusions, rigor is both unmotivated and unnecessary. Newton and Leibniz didn't need rigor when they invented Calculus, after all; hand-wavy infinitesimals did just fine. Why should the students bother?
There is no value in rigor in and of itself. All that the effort to put mathematics on a rigorous footing gave us are things like the Banach-Tarski paradox (which is, objectively, absurd and only shows that math models physics only so far!) and Gödel's Incompleteness Theorem (which shows that even attempting to reach Perfect Rigor is futile).
You don't need to introduce Peano's axioms to talk about number theory, and neither does any number theorist, really. And we wouldn't want any student to crank out a Principia while working on their topology homework.
So, treating rigor as a branch of math (which it is!), it needs to be introduced and taught just like any other branch - starting with context, stories, pitfalls, and seeing all the motivation for why we do things the way we do.
It starts with basic critical thinking, logic and philosophy classes, where people learn the difference between "All liberals support free healthcare" and "All supporters of free healthcare are liberals" (....well, I wish).
Going further, it's seeing the "proofs" that 0=1, or that all cats are grey (by induction). The latter "proof" is still the only thing that motivates me to check the "obviously true" things, like the induction step being applicable to the base case.
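A sketch of where that induction breaks, assuming the usual "all horses are the same colour" form of the argument (the statement P(n) and the names c_1, ..., c_{n+1} are my notation):

```latex
Let $P(n)$ be: ``any $n$ cats all have the same colour.'' $P(1)$ is trivially true.

Inductive step: given cats $c_1, \dots, c_{n+1}$, apply $P(n)$ to
$\{c_1, \dots, c_n\}$ and to $\{c_2, \dots, c_{n+1}\}$.
The two sets share $c_2, \dots, c_n$, so all $n+1$ cats have the same colour.

For $n = 1$ the two sets are $\{c_1\}$ and $\{c_2\}$, and the shared part
$c_2, \dots, c_n$ is empty. So $P(1) \Rightarrow P(2)$ fails,
even though $P(n) \Rightarrow P(n+1)$ is valid for every $n \ge 2$.
```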
In high school, I had a great little book called "Lapses in Mathematical Reasoning" by Bradis and co-authors. It was a perfectly accessible assortment of gotchas.
Zeno's paradoxes are a motivation for some of the rigor of Calculus (convergent sequences and infinite sums are the answer to the paradox).
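In the dichotomy version (assuming each step covers half of the remaining distance, with the total distance normalized to 1), the resolution is a one-liner:

```latex
\[
s_N = \sum_{n=1}^{N} \frac{1}{2^n} = 1 - \frac{1}{2^N},
\qquad
\lim_{N \to \infty} s_N = 1 .
\]
```

Infinitely many steps, finite total: that is the whole content of "convergent".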
And when we look at rigor like this - like a thing that needs to be motivated, not an a-priori good - we see a rather disturbing pattern that rigor has been introduced at the expense of clear reasoning.
Take Calculus. Teaching it with limits, epsilon-deltas, etc. without giving a motivation for why this complex machinery is needed is purely a waste, and a thing that made many people despise math (it turned me off from analysis for a very long time, personally).
The problems that this rigor addresses aren't even taught to the vast majority of people who take Calculus! Everything that the intro course covers can be taught with infinitesimals just fine without introducing the epsilon-delta rigor.
And, in fact, epsilon-delta rigor can be entirely dispensed with (because the infinitesimals can be put on a rigorous basis, with non-standard analysis). Epsilon-delta was not an achievement. It was a defeat. It was the greatest minds of the time not being able to figure out how to add rigor to the concepts that Leibniz and Newton introduced, and so they simply powered through and worked around the concept of infinitesimal to make some hairy math work.
With rigor, just like with anything else, we have to ask: what's the return on investment there? Is it a good bang for the buck? Why is hand-waving bad?
Having learned a subject, we know where hand-wavy reasoning can lead. We know that not all cats are grey, or that a continuous function doesn't need to be differentiable anywhere.
But there is no value in rigorous reasoning in Calculus if we are not running into monstrosities like the Weierstrass function. And, when we start out, we don't - because Nature is quite nice, math-wise. At least on a day-to-day scale.
Adopting Arnold's mindset, the amount of rigor in a mathematical argument is somewhat like the amount of precision in a physical model.
No sane person would start teaching physics with Einstein's relativity. But in math, not only do we do that, we never teach Newton's Laws - and in introductory classes, we don't even explain the formulas!
Imagine forcing high-schoolers to crunch Einstein's tensors where all they needed was F = mg, without ever explaining what curvature even is or why it's needed ("it'll come in handy, trust us").
This is what we do with Calculus - or in any area where rigor is used without justification.
>Then, once you have written your stuff in that dry style, you can add some glimpses of discourse that become much more valuable than if you had started with some informal hand-waving
You always start with some informal hand-waving. Not including it in your paper is, put simply, lying by omission.
And great mathematicians didn't shy away from prose, especially when introducing significant concepts. When I was trying to understand quaternions, I found all the texts I looked at stupefying - until I found Hamilton's book where he introduced them.
Not only did I get more from the first chapter than I was aware there was to know, but I also learned things like where the word vector comes from when we use it to mean "a magnitude and a direction". Learning it was infinitely more valuable to me than seeing the axioms of a vector space (which, of course, you never need to remember - just write down a handful or so of rules that translations in a plane satisfy, and the chances that something that fits ain't a vector space will be zero unless you go out of your way to make up a contrived example just for that purpose).
In fact, and that's Arnold's point, you lose no rigor by ditching formal reasoning when you can be concrete.
I believe that it is detrimental to the human brain to go through the exercise of "proving" that a collection of invertible operators, along with all their compositions, forms a group.
And yet, this is a common exercise! People spend time on this! Just watch: [1]. The video goes for seventeen minutes! For diagonal matrices with non-zero entries!
I feel that having this "rigor" is worse than saying that these matrices form a group because of course they do.
On the other hand, no time is ever spent explaining why the formal definition of "set with an operation" is introduced. That's because it's needless, of course; it seems that the sole purpose of this definition is to create exercises.
=====
[1] https://www.youtube.com/watch?v=q_JqHQPbmUk
[2] https://math.stackexchange.com/questions/919040/proving-a-gr...
[3] https://math.stackexchange.com/questions/1108349/prove-that-...
=============
From the comment on that video:
>It is more fun to proof that the set of a 2 by 2 matrices with everywhere the same value x with x not equal to 0 is a group. The determinant of such matrices is 0 but it is still a group.
This is only surprising if you don't understand 2x2 matrices as operators on a plane, in which case the exercise is a cruel perversion (why on Earth would anyone want to consider such matrices, or check that they form a group, with identity that's not the identity matrix?! And how would one come up with this to begin with?!).
"Even though the determinant is zero" is a symptom of a conceptual gap. Of course the determinant being zero has nothing to do with these matrices forming a group! You can embed GL(2) into GL(3) by filling the rest of the entries with zero, and of course this will still be a group: because it acts on the XY plane in just the same way as before, and matrix multiplication still gives their composition because we defined it to work this way.
And of course matrices of the form [x x; x x] form a group. A better question would be, why wouldn't they?
Take the following kindergarten-accessible definition of a group: actions that you can undo, repeat, and combine.
Let's hand-wave the above exercise with this definition. What does [t t; t t] do? Let's take t=1. [1 1; 1 1] takes a point [a; b] in the plane, and sends it to the point [a+b; a+b].
Doesn't seem like you can undo that, because you don't know whether [3, 3] came from [1, 2] or [2, 1]. Bummer.
Well no surprise, the operation [1 1; 1 1] smashes the whole plane into a single line spanned by [1; 1], aka y = x (the image is spanned by column vectors). We might as well ignore anything off this line, 'cause we can't tell points away from the line apart after applying [1 1; 1 1].
What does [1 1; 1 1] do on its image, the line y = x? It sends [a; a] to [2a; 2a] on the same line. So [1 1; 1 1] acts like multiplying by 2. That's certainly something you can undo.
The same works for other matrices; [x x; x x] acts like multiplying by 2x on that line. You can undo that as long as x is not 0.
And you can repeat/combine these operations because who's gonna stop you?
There is nothing left to prove.
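If you don't trust the hand-waving, here is a minimal sketch that checks it with exact arithmetic. The helper names (mat, matmul, apply) and the sample values are mine, purely for illustration:

```python
from fractions import Fraction

def mat(t):
    """The 2x2 matrix [t t; t t], stored as nested tuples."""
    t = Fraction(t)
    return ((t, t), (t, t))

def matmul(A, B):
    """Plain 2x2 matrix product."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def apply(A, v):
    """Apply A to the column vector v = (v0, v1)."""
    return tuple(A[i][0] * v[0] + A[i][1] * v[1] for i in range(2))

# Arbitrary nonzero sample values (an assumption; any nonzero pair works).
x, y = Fraction(3), Fraction(-5, 7)

# Combine: [x x; x x][y y; y y] = [2xy 2xy; 2xy 2xy], still of the same form.
assert matmul(mat(x), mat(y)) == mat(2 * x * y)

# On the line y = x, [x x; x x] is just "multiply by 2x".
assert apply(mat(x), (1, 1)) == (2 * x, 2 * x)

# The identity is t = 1/2 ("multiply by 1"), not the identity matrix.
e = mat(Fraction(1, 2))
assert matmul(e, mat(x)) == mat(x) == matmul(mat(x), e)

# Undo: the inverse of t = x is t = 1/(4x) ("multiply by 1/(2x)").
assert matmul(mat(x), mat(1 / (4 * x))) == e
```

Note that the determinant never shows up anywhere in the check.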
This "hand-wavy" argument, of course, is something that gives much more understanding than a "formal" proof from the "definition". That proof is "fun" because it is surprising - and it is surprising because it doesn't make sense.
And I would argue that it's much easier to make a mistake there - and conclude that it's not a group because such-and-such axiom doesn't hold.
The hand-wavy argument, though, ultimately comes from (or would lead to) an understanding that matrices act on their eigenspaces - and little more needs to be said (given that all of those matrices share an eigenspace).
Furthermore, this gives an example of a group representation for the group of nonzero real numbers with the operation defined by x * y = 2xy.
Of course, such a definition is utterly confusing; why would anyone come up with such a thing, other than to torture people? Why *would* one want to redefine the product of real numbers to be something else?!
Seeing people work it out on Math StackExchange [3] is painful.
The people giving the answers are confident that a * b := 2(a+b) both is and isn't a group! This alone should tell you that at some point, rigor becomes a hindrance. This is that point.
I say, the correct answer is that if someone gives you a group without a thing that it acts on, ask for your money back.
Of course that thing isn't a group, but what breaks if we just say it is? Since nobody is giving me a refund on this, let's see how it would act on itself. An element a would act by sending b to (2a) + 2b, so we have translation and scaling. Can we undo this? Sure, we just need to shift back by (-2a) and scale down by 1/2. But scaling down isn't an option here, so tough luck.
It's not the only problem, of course; but the student is left utterly confused (again, see comments in [3]!) by this exercise, whose point seems to be that "arbitrarily messing with definitions sometimes works and sometimes doesn't".
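For the record, the other problems are easy to exhibit. A minimal sketch, with star and right_identity_for as my own illustrative names and 1, 2, 3 as arbitrary sample values:

```python
def star(a, b):
    """The proposed operation from [3]: a * b := 2(a + b)."""
    return 2 * (a + b)

a, b, c = 1, 2, 3

# Associativity fails: the two bracketings disagree.
left = star(star(a, b), c)    # 2(2(a + b) + c) = 4a + 4b + 2c
right = star(a, star(b, c))   # 2(a + 2(b + c)) = 2a + 4b + 4c
print(left, right)            # 18 22

def right_identity_for(a):
    """Solve a * e = a for e: 2(a + e) = a gives e = -a/2."""
    return -a / 2

# The "identity" depends on the element, so there is no single identity.
print(right_identity_for(1), right_identity_for(2))
```

None of which tells the student why anyone would define such an operation in the first place.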
But it *feels* like this should be a group. Let's fix it. The rule "a * x -> 2(a+x)" is kosher; we can take it to be the action of a on the real line.
What does the composition look like?
Well, a * b * x = a * (2b + 2x) = 2a + 4b + 4x
That "4x" there tells us that the group generated by these actions is larger than just the generating set. Nothing in our generating set can multiply by 4 (again, that would be a way to see that the rule doesn't define a group). The exploration can then go on further to examining which subgroup of the affine transformations of the real line this generates. It's interesting!
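A minimal sketch of that exploration, storing the affine map x -> s*x + t as a pair (s, t). The names act, compose and inverse are mine, not anything from [3]:

```python
from fractions import Fraction

def act(a):
    """Action of a on the line: x -> 2(a + x), i.e. (scale, shift) = (2, 2a)."""
    return (2, 2 * a)

def compose(f, g):
    """f after g: x -> f(g(x)) = s1*(s2*x + t2) + t1."""
    (s1, t1), (s2, t2) = f, g
    return (s1 * s2, s1 * t2 + t1)

def inverse(f):
    """Undo x -> s*x + t: shift back by t, then scale by 1/s."""
    s, t = f
    return (Fraction(1, s), Fraction(-t, s))

a, b = 1, 2

# a * b * x = 2a + 4b + 4x: the scale is 4, which no generator has.
assert compose(act(a), act(b)) == (4, 2 * a + 4 * b)

# Undoing act(a) needs scale 1/2, which no generator has either...
assert inverse(act(a)) == (Fraction(1, 2), -a)

# ...but inside the affine maps of the line it really is an inverse.
assert compose(inverse(act(a)), act(a)) == (1, 0)
```

Every generator has scale 2, so anything built from them and their inverses has a scale that is a power of 2; pinning down which shifts occur is the interesting part.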
I trust you that rigor being forced on you could have improved your mathematical reasoning. But in that case, you are exceptional - or there was more to it than "do it this way just because". The most common case, in my experience, is represented in [1][2][3] (particularly in the comments): it makes people confused, wrong, and lost.
I'd rather they had never seen a definition of a group than go through that kind of brain damage.
>I agree that you cannot teach math in a purely bourbakist style. I prefer a "visual" style like that inspired by the books of Arnold, Strang, Needham, and I am the sole teacher in my lab that seriously uses the word "amplitwist" to refer to the complex derivative :)
In that case, they might be crying tears of joy or grief over all the years they were taught otherwise :)