Take the first example in Figure 4: "Find the derivative of the function using the definition of a derivative. f(x) = (x**2-1) / (2*x-3)". The "solution" produced is to just call a symbolic math package's 'diff' function. I assume the actual intent of the question is for the student to use the definition of a derivative, f'(x) = limit of (f(x+e)-f(x))/e as e goes to zero, and to find the derivative by evaluating that limit directly.
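For comparison, here is roughly what the by-definition working looks like for this function (my own sketch, not taken from the paper, writing e for the increment as above and assuming x is not 3/2):

```latex
\begin{aligned}
f(x+e) - f(x) &= \frac{\bigl((x+e)^2 - 1\bigr)(2x-3) - (x^2-1)(2x+2e-3)}{(2x+2e-3)(2x-3)} \\
&= \frac{e\,\bigl[\,2x^2 - 6x + 2 + e(2x-3)\,\bigr]}{(2x+2e-3)(2x-3)}, \\
f'(x) &= \lim_{e \to 0} \frac{f(x+e)-f(x)}{e}
= \lim_{e \to 0} \frac{2x^2 - 6x + 2 + e(2x-3)}{(2x+2e-3)(2x-3)}
= \frac{2x^2 - 6x + 2}{(2x-3)^2}.
\end{aligned}
```

The final expression is the same one a 'diff' call would return, but the intermediate algebra is the part the question is actually testing.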
The "answers" for other questions similarly miss the point. For example, convergence of a series is determined by just asking a symbolic math package whether it converges, not by any actual reasoning of the kind expected from a student. And the question asking for the Type I error probability of a statistical test is "solved" using a simulation program, whereas I expect a human student would be required to get the exact answer by analytical calculation.
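To make the simulation-versus-analysis distinction concrete, here is a minimal sketch on a made-up test, not the one from the paper: flip a fair coin n = 20 times and reject the null hypothesis "the coin is fair" when the number of heads is at least 15 or at most 5. The function names, the test itself, and the trial count are all my own assumptions for illustration.

```python
import random
from math import comb


def exact_type1_error(n, upper):
    """Exact P(reject | coin is fair) for the hypothetical two-sided test:
    reject when heads >= upper or heads <= n - upper."""
    lower = n - upper
    # Count the outcomes in the rejection region, weighted by binomial coefficients.
    rejected = sum(comb(n, k) for k in range(n + 1) if k >= upper or k <= lower)
    return rejected / 2 ** n


def simulated_type1_error(n, upper, trials, seed=0):
    """Monte Carlo estimate of the same probability (what a simulation 'solution' does)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lower = n - upper
    rejections = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(n))
        if heads >= upper or heads <= lower:
            rejections += 1
    return rejections / trials


exact = exact_type1_error(20, 15)                # 43400 / 2**20, about 0.0414
approx = simulated_type1_error(20, 15, 20_000)   # close to exact, but only an estimate
print(exact, approx)
```

The analytical version gives the exact value 43400/2**20 in one line of counting, while the simulation only ever gives a noisy estimate of it. That is the gap between what the question asks for and what the generated program delivers.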
So basically they created some sort of general-purpose math library that can automatically detect the type of problem, find the correct library to solve it, and supply the right inputs to get the right output. That is all very impressive and would actually be a great product, if refined.
No need for the bullshit headline.
They likely trained a language model on a bunch of Stack Overflow questions or something similar.
Likewise, we still teach bubble sort because it's an easy thing to teach rudimentary proof skills with. Similarly, we start students off with Newtonian physics.
This is a classic local-maximum situation. If you don't train people to solve problems as well as a low- or middle-competence program, then no-one is ever going to develop the expertise that will allow them to outperform the program.