I see literally 0 evidence of this sort of trend.
Copilot is still BY FAR the best AI coding tool, and it’s just ok and hasn’t improved much since release IMO.
We don’t have infinite code to train models on; we’ve almost certainly trained GPT on just about everything on GitHub and probably GitLab too.
I’d be very surprised to see code assistants reach that level with our current AI methods.
Aider ranks the LLM engines, with claude-3.5-sonnet coming out on top, but doesn't test against Copilot.
I haven’t tried codestral yet tbf.
Also, LLM leaderboards are pretty useless. They’re either contrived tests or synthetic benchmarks which don’t reflect reality. We don’t have a good evaluation framework for LLMs yet.
The one you posted, for example, has models run through 133 small, specifically Python coding exercises from the exercism GitHub repo. Those are not representative of the tasks most SWEs deal with, AND it’s very likely that those models were “contaminated” by already being trained on those exact exercises (since they are open source).
That leaderboard is little better than marketing. Which makes sense since it’s from an AI code assistant company.
EDIT: and even on this extremely minor eval, the best the LLMs could do was ~78%. Not exactly what I’d call “good”.
Given that humans have agency and codebases do not, we eventually end up with Conway's Law in effect again as humans start to shape everything around themselves.
I think the author has problems understanding causation.
Take a baseball batter. Pitch to the batter a bunch of times, leaving all the balls to rest wherever they land. If we later move all the balls farther away, we don't expect the batter to hit farther.
I was about to mention the psychological barrier phenomenon, according to which, once a record is broken, scores of athletes surpass it in short order. This might actually be a case where awareness could have an impact, if the batter knows about the moved balls.
But then I realized that I never questioned this idea and was just believing something I heard at some point many years ago. Turns out, it's just another piece of "pop sci" trivia, aka misconstrued or wrong: https://www.scienceofrunning.com/2017/05/the-roger-bannister...