I mourn the loss of working on intellectually stimulating programming problems, but that’s a part of my job that’s fading. I need to decide if the remaining work - understanding requirements, managing teams, what have you - is still enjoyable enough to continue.
To be honest, I’m looking at leaving software because the job has turned into a different sort of thing than what I signed up for.
So I think this article is partly right, Bob is not learning those skills which we used to require. But I think the market is going to stop valuing those skills, so it’s not really a _problem_, except for Bob’s own intellectual loss.
I don’t like it, but I’m trying to face up to it.
The problem arises when Bob encounters a problem too complex or unique for agents to solve.
To me, it seems a bit like the difference between learning how to cook versus buying microwave dinners. Sure, a good microwave dinner can taste really good, and it will be a lot better than what a beginning cook will make. But imagine aspiring cooks just buying premade meals because "those aren't going anywhere". Over the span of years, eventually a real cook will be able to make way better meals than anything you can buy at a grocery store.
The market will always value the exact things LLMs cannot do, because if an LLM can do something, there is no reason to hire a person for that.
"Being able to deliver using AI" wasn't the point of the article. If it was the point, your comment would make sense.
The point of the program referred to in the article is not to deliver results, but to deliver an Alice. Delivering a Bob is a failure of the program.
Whether you think that a Bob+AI delivers the same results is not relevant to the point of the article, because the goal is not to deliver the results, it's to deliver an Alice.
To the reader and the casual passerby, I ask: Do you have to work at this pace, in this manner? I understand completely that mandates and pressure from above may instill a primal fear to comply, but would you be willing to summon enough courage to talk to maybe one other person you think would be sympathetic to these feelings? If you have ever cared about quality outcomes, if for no other reason than the sake of personal fulfillment, would it not be worth it to firmly but politely refuse purely metrics-focused mandates?
But let's assume Bob continues to have an active role, because the people above him bought in to the hype and are convinced that "prompt engineer" is the job of the future. When things inevitably start falling apart because the Bobs of the world hit a wall and can't solve the problems that need to be solved (spoiler: this is already happening), what do we do? We need Alices to come in and fix it, but the market actively discourages the existence of Alice, so what happens when there are no more Alices left? Do we just give up and collectively forget how to do things beyond a basic level?
I have a feeling that, yes, we as a species are just going to forget how to do things beyond a certain level. We are going to forget how to write an innovative science paper. We are going to forget how to create websites that aren't giant, buggy piles of React spaghetti that make your browser tab eat 2GB of RAM. We've always been forgetting, really - there are many things that humans in the past knew how to do, but nobody knows how to do today, because that's what happens when the incentive goes missing for too long. Price and convenience often win over quality, to the point that quality stops being an option. This is a form of evolutionary regression, though, and negatively affects our quality of life in many ways. AI is massively accelerating this regression, and if we don't find some way to stop it, I believe our current way of life will be entirely unrecognizable in a few decades.
I do think coding with local agents will keep improving to a good level but if deep thinking cloud tokens become too expensive you'll reach the limits of what your local, limited agent can do much more quickly (i.e. be even less able to do more complex work as other replies mention).
I dread the flip side of this which is dealing with obtuse bullshit like trying to understand why Oracle ADF won’t render forms properly, or how to optimize some codebase with a lot of N+1 calls when there’s looming deadlines and the original devs never made it scalable, or needing to dig into undercommented legacy codebases or needing to work on 3-5 projects in parallel.
Agents iterating until those start working (at least cases that are testable) and taking some of the misery and dread away makes it so that I want to theatrically defenestrate myself less.
Not everyone has the circumstance to enjoy pleasant and mentally stimulating work that’s not a frustrating slog all the time - the projects that I actually like working on are the ones I pick for weekends, I can’t guarantee the same for the 9-5.
AI in software engineering is kept afloat by the bullshitters who jump on any new bandwagon because they are incompetent and need to distract from that. Managers like bullshit, so these people thrive for a couple of years until the next wave of bullshit is fashionable.
This point is directly addressed in the paper: Bob will ultimately not be able to do the things Alice can, with or without agents, because he didn't build the necessary internal deep structure and understanding of the problem space.
And if Alice later on ends up being a better scientist (using agents!) than Bob will ever be, would you not say there was something lost to the world?
Learning needs a hill to climb, and somebody to actually climb it. Bob only learned how to press an elevator button.
Following the model of how startups have worked for the last 20 years or so, I expect agents to eventually be locked-down/nerfed/ad-infested for higher payments. We are enjoying the fruits of VC money at the moment and they are getting everyone addicted to agents. Eventually they need to turn a profit.
Not sure how this plays out, but I would hang on to any competencies you have for anyone (or business) that wants to stick around in software. Use agents strategically, but don't give up your ability to code/reason/document, etc. The only way I can see this working differently is that there are huge advances in efficiency and open-source models.
Yes, but how does he know if it worked? If you have instant feedback, you can use LLMs and correct when things blow up. In fact, you can often try all options and see which works, which makes it ”easy” in terms of knowledge work. If you have delayed feedback, costly iterations, or multiple variables changing underneath you at all times, understanding is the only way.
That’s why building features and fixing bugs is easy, and system level technical decision making is hard. One has instant feedback, the other can take years. You could make the ”soon” argument, but even with better models, they’re still subject to training data, which is minimal for year+ delayed feedback and multivariate problems.
Some people treat the toilet as a magic hole: they throw stuff in, flush, and think it is fine.
If you throw garbage in you will at some point have problems.
We are at a stage where people think it is fine to drop everything into an LLM, but then they will see the bill for usage and might be surprised that they burned money and the result was not exactly what they expected.
Aren't they currently propped up by investor money?
What happens when the investors realize the scam that it is and stop investing or start investing less...
The problem with unlearning generic tools and relying on ones you rent from big corporations is that it is unreliable in the long term. The prices will be rising. The conditions will worsen. Oh nice that Bob made a thing using HammerAsAService™, but the terms and conditions (changing once a week) he accepted last week clearly say it belongs to the company now. Bob should be happy they are not suing him yet, but Bob isn't sure whether the thing that came out a month after was independently developed by that company or just a clone of his work. Bob wishes he knew how to use a hammer.
But Bob can't do things with agents.
He can get a project from someone else and ask the agents to do that project. Then give the output of the agents back to that someone else, and that someone else reviews it, says why it's wrong, and sends it back. Bob feeds the review to the agents, gets something back, then gives the output back to that someone else who reviews it, etc. So,
1) The loop requires his advisor to know how to go about doing the thing.
2) Bob is absolutely unnecessary and should be discarded.
3) Alice will eventually be qualified to be an advisor.
edit: And the crisis that the article is really pointing out is that when the advisor is using the LLM (while Bob is driving an Uber), and his productivity goes way up because he's only handling the things only he can handle, what about Alice?
Let's say that pre-AI the advisor could either do the job in 2 months or assign it to Alice who could do it in 12 with a week of the advisor's supervision/review. Now, with the LLM, the advisor can do the job in 2 weeks without Alice. Before, Alice made barely any money and had no health insurance. After, Alice is also driving Uber.
Now the advisor has a heart attack and now the thing just can't be done. Also, Ubers become pretty much self-driving, so Bob and Alice are not only ignorant, but unemployed. They can't even afford to take an Uber.
Code agents are great template generators and modifiers, but for net new (innovative!) work the output is often barely usable without a ton of handholding or "non code generation coding".
I am in the same boat, but close enough to retirement that I'm less "scared" about it. For me I'm moving up the chain; not people management, but devoting a lot more of my time up the abstraction continuum. Looking a lot more at overall designs and code quality and managing specs and inputs and requirements.
I wrote some design docs past few days for a big project the team is embarking on. We never had that before, at least not in the level of detail (per time quantum) that I was able to produce. Used 2 models from 2 companies - one to write, one to review, and bounce between them until the 3 of us agree.
Honestly it didn't take any less time than I would have done it alone, but the level of detail was better, and covered more edge cases. Calling it a "win" right now. I still enjoy it, as most of the code I/we was/are writing is mostly fancy CRUD anyway, and doesn't have huge scaling problems to solve (and too few devs I feel are being honest about their work, here).
If not, you're changing learning to cook for Uber only meals.
And since the alternative is starving, Uber will boil the pot.
Don't give up your self sufficiency.
Can he? If he outsources all his thinking and understanding to agents, can he then fix things he doesn't know how to fix without agents?
Any skill is practice first and foremost. If Bob has had no practice, what then?
Also, the premise that it took each of them a year to do the project means Bob was slacking because he probably could've done it in less than a month.
More importantly, what's gonna be the next stable category of remote-first jobs that a person with a tech-adjacent or tech-minded skillset can tack onto? That's all I care about, to be honest.
I may hate tech with a passion at times and be overly bullish on its future, but there's no replacing my past jobs which have graced me and many others with quality time around family, friends, nature and sports while off work.
There is a vast range of scenarios in which being more or less independent from agents to perform cognitive tasks will be both desirable and necessary, at the individual, societal and economic level.
The question of how much territory we should give up to AI really is both philosophical and political. It isn’t going to be settled in mere one-sided arguments.
You're still working on intellectually stimulating programming problems. AI doesn't go all the way with any reliability, it just provides some assistance. You're still ultimately responsible for getting things right, even with key AI help.
It’s not for me. Being a middle manager, with all of the liability and none of the agency, is not what I want to do for a living. Telling a robot to generate mediocre web apps and SVGs of penguins on bicycles is a lousy job.
But he does things wrong.
I think the key issue is whether Bob develops the ability to choose valuable things to do with agents and to judge whether the output is actually right.
That’s the open question to me: how people develop the judgment needed to direct and evaluate that output.
He'll get things (papers, code, etc) which he can't evaluate. And the next round of agents will be trained on the slop produced by the previous ones. Both successive Bobs and successive agents will have less understanding.
Let’s wait until they have a business model that creates profit.
Most of them won’t go away, but many will become outdated or slow or enshittified.
Imagine building your career based on the quality of Google’s search.
Why not? Once the true cost of token generation is passed on to the end user and costs go up by 10 or 100 times, and once the honeymoon delusion of "oh wow I can just prompt the AI to write code" fades, there's a big question as to if what's left is worth it. If it isn't, agents will most certainly go away and all of this will be consigned to the "failed hype" bin along with cryptocurrency and "metaverse".
Now, you don't do the thing, and do other things when LLMs get stuck. There is no "given enough time I can do it".
I can't see how somebody would go about solving slop bugs (slugs :)) in a heavily AI-generated codebase.
I hope I'm wrong, but that's something I personally encountered. Stay sharp.
Didn't PhD projects used to be about advancing the state of art?
Maybe we'll get back to that.
I’ve been reminded lately of a conversation I had with a guy at a hackerspace cafe in Berlin around ten years ago.
He had been working as a programmer for a significantly longer time than me. Long enough that for many years of his career, he had been programming in assembly.
He was lamenting that these days, software was written in higher level languages, and that more and more programmers no longer had the same level of knowledge about the lower level workings of computers. He had a valid point and I enjoyed talking to him.
I think about this now when I think about agentic coding. Perhaps over time most software development will be done without knowledge of the higher level programming languages that we know today. There will still be people who work in those languages and are intimately familiar with them, just as today there are still people who work in assembly, even if their share has shrunk over time relative to those who don’t.
And just like there are areas where assembly is still required knowledge, I think there will be areas where knowledge of the programming languages we use today will remain necessary and vibe coding alone won’t cut it. But the percentage of people working in high level languages will go down, relative to the number of people vibe coding and never even looking at the code that the LLM is writing.
And so the paradox is, the LLMs are only useful† if you're Schwartz, and you can't become Schwartz by using LLMs.
Which means we need people like Alice! We have to make space for people like Alice, and find a way to promote her over Bob, even though Bob may seem to be faster.
The article gestures at this but I don't think it comes down hard enough. It doesn't seem practical. But we have to find a way, or we're all going to be in deep trouble when the next generation doesn't know how to evaluate what the LLMs produce!
---
† "Useful" in this context means "helps you produce good science that benefits humanity".
I hope it will encourage people to think more about what they get out of the work, what doing the work does for them; I think that's a good thing.
That you can't "become Schwartz" by using LLMs is an unproven assumption. Actually, it's a contradiction in the logic of the essay: if Bob managed to produce a valid output by using an LLM at all, then it means that he must have acquired precisely that supervision ability that the essay claims to be necessary.
Btw, note that in the thought experiment Bob isn't just delegating all the work to the LLM. He makes it summarise articles, extract important knowledge and clarify concepts. This is part of a process of learning, not being a passive consumer.
The solution is relatively simple though - not sure the article suggests this as I only skimmed through:
Being good in your field doesn't only mean pushing articles but also being able to talk about them. I think academia should drift away from written form toward more spoken form, i.e. conferences.
What if, say, you can only publish something after presenting your work in person, answer questions, etc? The audience can be big or small, doesn't matter.
It would make publishing anything at all more expensive but maybe that's exactly what academia needs even irrespective of this AI craze?
So I settled on very incremental work. It's annoying cutting and pasting code blocks into the web interface while I'm working on my interface to Neovim, spent a whole day realizing I can't trust it to instrument neovim and don't want to learn enough lua to manage it. (I moved onto neovim from Emacs because I don't like elisp and gpt is even worse at working on my emacs setup than neovim, the end goal is my own editor in ruby but gpt damn sure can't understand that atm) But at least I'm pushing a real flywheel and not the brooms from Fantasia.
If you are a massive company that owns all of the knowledge and all of the technology needed to apply that knowledge, then you don't need Alice. You don't _want_ Alice. You want more Bobs. It looks better on the books.
Tale as old as time.
I built a full stack app in Python+typescript where AI agents process 10k+ near-real-time decisions and executions per day.
I have never done full stack development and I would not have been able to do it without GitHub Copilot, but I have worked in IT (data) for 15 years including 6 in leadership. I have built many systems and teams from scratch, set up processes to ensure accuracy and minimize mistakes, and so on.
I have learned a ton about full stack development by asking the coding agent questions about the app, bouncing ideas off of it, planning together, and so on.
So yes, you need to have an idea of what you're doing if you want to build anything bigger than a cheap one shot throwaway project that sort of works, but brings no value and nobody is actually gonna use.
This is how it is right now, but at the same time AI coding agents have come an incredibly long way since 2022! I do think they will improve, but an agent can't know exactly what you want to build. It's making an educated guess. An approximation of what you're asking it to do. You ask the same thing twice and it will have two slightly different results (assuming it's a big one shot).
This is the fundamental reality of LLMs, sort of like having a human walking (where we were before AI), a human using a car to get to places (where we are now) and FSD (this is future, look how long this took compared to the first cars).
I have gained a lot of benefit using LLMs in conjunction with textbooks for studying. So, I think LLMs could help you become Schwartz.
Was the LLM even useful for Schwartz, if it produced false output?
For so many workers, their companies just want them to produce bullshit. Their managers wouldn't frame it this way, but if their subordinates start producing work with strict intellectual rigor it's going to be an issue and the subordinates will hear about it.
So, you're not wrong. But the majority of LLM customers don't care and they just want to report success internally, and the product needs to be "just good enough." An LLM might produce a shitty webpage. So long as the page loads, no one will ever notice or care that it's wrong in the way that a physics paper could be wrong.
It's exactly the same for coding.
This is one more argument to the only LLM trained to find "truth"
What may well happen instead is that Bob publishes two papers. He then outcompetes Alice based on the insistence that others have on "publish or perish". Alice becomes unemployed and struggles, having been pushed out.
The person who puts the time and effort in doesn't just sit at the same level and they don't both just find decent employment. Competition happens and the authentic learning is considered a waste of time, which leads to real and often life threatening consequences (like being homeless after being unable to find employment).
I thought I'd give Gemini a go. When I uploaded the 18-page PDF, it complained the output exceeded some limit. So I used pdftk to break it up into 4-page chunks, which seemed to work - the output looked very good and passed a couple of spot checks. But I don't trust these things as far as I can kick them.
There was a transaction column and a running balance column, so I did a quick check to see if every new balance equalled the previous one plus the transaction. And it almost always did. There were a couple of errors I put down to transcription errors. I was wrong. I eventually twigged that these errors only happened where I had split the PDFs. After tracking where the balance first went wrong, it became evident it had dropped chunks of lines, duplicated others, and misaligned the transaction and balance columns. It was complete rubbish, in other words.
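The check described above can be sketched in a few lines of Python. This is only a rough illustration, not the exact script used: the `(transaction, balance)` row layout, the `find_balance_breaks` name, and the sample figures are all assumptions made up for the example. `Decimal` is used so that currency values compare exactly.

```python
from decimal import Decimal

def find_balance_breaks(rows, opening_balance):
    """Return the indices of rows where balance != previous balance + transaction.

    rows: list of (transaction, balance) pairs as strings (hypothetical layout).
    opening_balance: the balance before the first row, as a string.
    """
    breaks = []
    previous = Decimal(opening_balance)
    for i, (amount, balance) in enumerate(rows):
        amount, balance = Decimal(amount), Decimal(balance)
        if previous + amount != balance:
            breaks.append(i)
        # Carry on from the stated balance so a single bad row is reported once.
        previous = balance
    return breaks

# Made-up sample data: the third row does not follow from the second.
rows = [
    ("-20.00", "80.00"),
    ("50.00", "130.00"),
    ("-5.50", "120.50"),
    ("10.00", "130.50"),
]
print(find_balance_breaks(rows, "100.00"))  # [2]
```

A clean result here only shows that the rows are internally consistent. It says nothing about rows that were dropped or duplicated and then had their balances adjusted to match, which is why the break points clustered at the places where the PDF had been split.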
So why did my balance check show so few errors? I put that down to it knowing what a good bank statement looked like. A good bank statement adds up. So it adjusted all the balances so it looked like a real bank statement. I also noticed these errors got more frequent in later pages. I tried splitting the PDF into single pages and loading them into the model one at a time. That didn't help much for the later pages, but the first one was usually good. So then I loaded each page into a fresh context, with a fresh prompt. If that didn't produce something that balanced, the second go always did.
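The page-at-a-time workflow could look roughly like the sketch below. It is a guess at the shape of the loop, not the actual procedure: `extract_page` stands in for whatever call sends one page to the model in a fresh context (it is a hypothetical callable supplied by the caller), and the opening balance for a page is assumed to be the closing balance of the page before it.

```python
from decimal import Decimal

def balances(rows, opening):
    """True if every balance equals the previous balance plus the transaction."""
    previous = Decimal(opening)
    for amount, balance in rows:
        if previous + Decimal(amount) != Decimal(balance):
            return False
        previous = Decimal(balance)
    return True

def extract_with_retry(page, opening, extract_page, max_attempts=2):
    """Call extract_page(page) until the rows balance, or give up.

    extract_page is a hypothetical function that transcribes one page in a
    fresh context and returns a list of (transaction, balance) string pairs.
    Returns the accepted rows, or None if no attempt balanced.
    """
    for _ in range(max_attempts):
        rows = extract_page(page)
        if balances(rows, opening):
            return rows
    return None

# Stand-in extractor for demonstration: wrong on the first call, right on the second.
attempts = []
def fake_extract(page):
    attempts.append(page)
    if len(attempts) == 1:
        return [("-20.00", "75.00")]
    return [("-20.00", "80.00")]

print(extract_with_retry("page-1", "100.00", fake_extract))  # [('-20.00', '80.00')]
```

Passing the extractor in as a parameter keeps the sketch independent of any particular model API, and returning `None` forces the caller to handle a page that never balances rather than silently accepting it.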
I'm not sure it saved time over doing it manually in the end. It's a tired analogy now, but it's true: at their heart, these things are stochastic parrots. They almost never produce the same output twice when given the same input. Instead, they produce output that has a high probability of following the input tokens supplied. If there is only one correct output but the output is small enough, the odds are decent they will get it right. But once the size grows, the odds of it outputting complete crap become a near certainty.
In the past, if you neglected your institutes and academia, what happened? A rival state got electricity/cars/nuclear weapons first and you would be SOL.
These days, what happens? They invent faster phones? Higher res video?
This take may be a bit hyperbolic but I find it's a good thought exercise prompt.
Academia doesn't want to produce astrophysics (or any field) scientists just so the people who became scientists can feel warm and fuzzy inside when looking at the stars, it wants to produce scientists who can produce useful results. Bob produced a useful result with the help of an agent, and learned how to do that, so Bob had, for all intents and purposes, the exact same output as Alice.
Well, unless you're saying that astrophysics as a field literally does not matter at all, no matter what results it produces, in which case, why are we bothering with it at all?
Once they have to solve a novel problem that was not already solved for all intents and purposes, Alice will be able to apply her skillset to that, whereas Bob will just run into a wall when the LLM starts producing garbage.
It seems to me that "high-skill human" > "LLM" > "low-skill human", the trap is that people with low levels of skills will see a fast improvement of their output, at the hidden cost of that slow build-up of skills that has a way higher ceiling.
Because we largely want people who have committed to tens of thousands of dollars of debt to feel warm and fuzzy enough to promote the experience so that the business model doesn’t collapse.
It’s difficult to think anyone would end up truly regretting doing a course in astrophysics, or any of the liberal arts and sciences if they have a modicum of passion, but it’s very believable that a majority of them won’t go on to have a career in it, whatever it is, directly.
They’re probably more likely to gain employment on their data science skills, or whatever core competencies they honed, or just the fact that they’ve proven they can learn highly abstract concepts, or whatever their field generalises to.
Most of the jobs are in not-highly-specific academic-outcome.
We're minting an entire generation of people completely dependent on VC funding. What happens if/when the AI companies fail to find a path to profitability and the VC funding dries up?
Once I realized that this white on black contrast was hurting my eyes, I decided to stop as I didn't want to see stripes for too long when looking away.
Some activity has outcomes that aren't strictly in the results.
I will make an explicit, plausible, counterpoint: academia wants to produce understanding. This is, more or less, by definition, not possible with an AI directly (obviously AIs can be useful in the process).
Take GR as an example. The vast majority of the dynamical character of the theory is inaccessible to human beings. We study it because we wanted to understand it, and only secondarily because we had a concrete "result" we were trying to "achieve."
A person who cares only about results and not about understanding is barely a person, in my opinion.
The industrialization of academia hasn't even produced more results, it has produced more meaningless papers. Just like LLMs produce the 10,000th note taking app, which for the LLM psychosis afflicted is apparently enough.
LLMs are exceptionally good at building prototypes. If the professor needs a month, Bob will be done with the basic prototype of that paper by lunch on the same day, and try out dozens of hypotheses by the end of the day. He will not be chasing some error for two weeks; the LLM will very likely figure it out in a matter of minutes, or not make it in the first place. Instructing it to validate intermediate results and to profile along the way can do magic.
The article is correct that Bob will not have understood anything, but if he wants to, he can spend the rest of the year trying to understand what the LLM has built for him, after verifying that the approach actually works in the first couple of weeks already. Even better, he can ask the LLM to train him to do the same if he wishes. Learn why things work the way they do, why something doesn't converge, etc.
Assuming that Bob is willing to do all that, he will progress way faster than Alice. LLMs won't take anything away if you are still willing to take the time to understand what it's actually building and why things are done that way.
5 years from now, Alice will be using LLMs just like Bob, or without a job if she refuses to, because the place will be full of Bobs, with or without understanding.
This won’t affect everyone equally. Some Bobs will nerd out and spend their free time learning, but other Bobs won’t.
But do you actually understand it? The article argues exactly against this point - that you cannot understand the problems in the same way when letting agents do the initial work as you would when doing it without agents.
from the article: "you cannot learn physics by watching someone else do it. You have to pick up the pencil. You have to attempt the problem. You have to get it wrong, sit with the wrongness, and figure out where your reasoning broke. Reading the solution manual and nodding along feels like understanding. It is not understanding. Every student who has tried to coast through a problem set by reading the solutions and then bombed the exam knows this in their bones. We have centuries of accumulated pedagogical wisdom telling us that the attempt, including the failed attempt, is where the learning lives. And yet, somehow, when it comes to AI agents, we've collectively decided that maybe this time it's different. That maybe nodding at Claude's output is a substitute for doing the calculation yourself. It isn't. We knew that before LLMs existed. We seem to have forgotten it the moment they became convenient."
Your perspective is cut off. In the real world Bob is supposed to produce outcomes that work. If he moves on into the industry and keeps producing hallucinated, skewed, manipulated nonsense, then he will fall flat instantly. If he manages to survive unnoticed, he will become CEO. The latter rather unlikely.
I don’t think we have good answers, unfortunately. I’m very happy to be able to get the exact tools I want for my specific niche and be able to run experiments in no time. But I also see that intellectually I do not engage at the same level. I can justify my reasoning for high level design decisions I tried to get the agent to follow, but if there is an issue or I need to justify an implementation decision, that’s way messier. I had that experience a few times over the past year, and every time I have to reverse engineer what the agent might have been doing before I can answer, or I realize I completely misunderstood a specific protocol because I didn’t have to actively engage with it.
Isn't this learning swimming by watching others explaining swimming? Bob would think he knows swimming, until he has to get into the water.
Speaking from personal experience, 99% people/students don't, and that is the problem.
> Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
No they won't be. They might be worse. They might be better. But they'll be very different.
And, like you said...
> Alice and Bob had the same year. One paper each.
No they won't. Alice would've taken a year. Bob would've taken a few days.
You've already covered why that might actually be OK, so I'll talk about the author's other error:
> This sounds idealistic until you think about what astrophysics actually is. Nobody's life depends on the precise value of the Hubble constant. No policy changes if the age of the Universe turns out to be 13.77 billion years instead of 13.79. Unlike medicine, where a cure for Alzheimer's would be invaluable regardless of whether a human or an AI discovered it, astrophysics has no clinical output. The results, in a strict practical sense, don't matter. What matters is the process of getting them: the development and application of methods, the training of minds, the creation of people who know how to think about hard problems. If you hand that process to a machine, you haven't accelerated science. You've removed the only part of it that anyone actually needed.
Keep asking why. Why does the development and application of methods, the training of minds, matter?
The goal isn't abstract. The goal is still ultimately for the benefit of humanity, just like the cure for Alzheimer's.
Humanity learned physics, so we made rockets and now we have satellites, and the entire planet is connected with communication and information.
Humanity must continue to invest in astrophysics so that we do not get wiped out by a single rogue asteroid barreling through the cosmos, like the dinosaurs did.
Now I'm not saying that there isn't other benefit to making generically intelligent humans that know how to think. But at the end of the day, the purpose of astrophysics is no less existential than the purpose for developing medicine.
I want to know the age of the universe so that we can understand what created it, and if we can reverse entropy, and if there is anything beyond the universe. That is a quest for humanity that will take hundreds if not millions of years.
> You do what your supervisor did for you, years ago: you give each of them a well-defined project. Something you know is solvable, because other people have solved adjacent versions of it. Something that would take you, personally, about a month or two. You expect it to take each student about a year ...
Is that how PhD projects are supposed to work? The supervisor is a subject matter expert and comes up with a well-defined achievable project for the student?
This can lead to a lot of problems as I think in some fields, by some academics, the default assumption is the former, when it's really the latter. This leads to a kind of overattribution of contribution by senior faculty, or conversely, an underappreciation of less senior individuals. The tendency for senior faculty to be listed last on papers, and therefore, for the first and last authors to accumulate credit, is a good example of how twisted this logic has become.
It's one tiny example of enormous problems with credit in academics (but also maybe far afield from your question).
And there is everything in between.
How this plays out:
I use Claude to write some moderately complex code and raise a PR. Someone asks me to change something. I look at the review and think, yeah, that makes sense, I missed that and Claude missed that. The code works, but it's not quite right. I'll make some changes.
Except I can't.
For me, it turns out having decisions made for you and fed to you is not the same as making the decisions and moving the code from your brain to your hands yourself. Certainly every decision made was fine: I reviewed Claude's output, got it to ask questions, answered them, and it got everything right. I reviewed its code before I raised the PR. Everything looked fine within the bounds of my knowledge, and this review was simply something I didn't know about.
But I didn't make any of those decisions. And when I have to come back to the code to make updates - perhaps tomorrow - I have nothing to grab onto in my mind. Nothing is in my own mental cache. I know what decisions were made, but I merely checked them, I didn't decide them. I know where the code was written, but I merely verified it, I didn't write it.
And so I suffer an immediate and extreme slow-down, basically re-doing all of Claude's work in my mind to reach a point where I can make manual changes correctly.
But wait, I could just use Claude for this! But for now I don't, because I've seen this before. Just a few moments ago. Using Claude has just made it significantly slower when I need to use my own knowledge and skills.
I'm still figuring out whether this problem is transient (because this is a brand new system that I don't have years of experience with), or whether it will actually be a hard blocker to me using Claude long-term. Assuming I want to be at my new workplace for many years and be successful, it will cost me a lot in time and knowledge to NOT build the castle in the sky myself.
Then I spend time reading each file change and giving feedback on things I'd do differently. It saves me a great deal of time, and the result is very close to, or even better than, what I would have written.
If the result is something you can't explain, then slow down and follow the steps it takes as they are taken.
In a living codebase you spend long stretches learning how it works. It's like reading a book that doesn't match your taste, but you eventually need to understand and edit it, so you push through. That process is extremely valuable: you get familiar with the codebase, you map it out in your head, you imagine big red alerts on the problematic stuff. Over time you become more and more efficient at editing and refactoring the code.
The short term state of AI is pretty much outlined by you. You get a high level bug or task, you rephrase it into proper technical instructions and let a coding agent fill in the code. Yell a few times. Fix problems by hand.
But you are already "detached" from the codebase, and you have to learn it the hard way each time your agent is too stupid. You are less efficient, at least in this phase. But your overall understanding of the codebase will degrade over time. Once the serious data corruption hits the company, it will take weeks to figure it out.
I think this psychological detachment can potentially play out really badly for the whole industry. If we get stuck for too long in this weird phase, the whole tech talent pool might implode. (Is anyone working on plumbing LLMs?)
You can do this during the previous change phase of course. Just ask "How would one plan this change to the codebase? Could you explain in depth why?" If you're expected to be thoroughly familiar with that code, it makes no sense to skip that step.
I do wonder where all the novel products are, the ones produced by 10x devs who are now 100x with LLMs, or by the “idea guys” who can now produce products from whole cloth without having to hire pesky engineers. Where are the one-man 10 billion dollar startups, etc.? We are 3-4 years into this mania and all I see on the other end of it is the LLMs themselves.
Why hasn’t anything gotten better?
Show me an llm that can sell my product and find market fit.
In reality llms are taking away profitable tools and keeping the revenue themselves.
The 10x dev doesn’t just set out to build a hello world app, ya know.
I don't know if I mind.
Example. This paragraph, to me, has an eerily perfect rhythm. The ending sentence perfectly delivers the twist. Like, why would you write an argument piece in the science realm in perfect prose?
> Unlike Alice, who spent the year reading papers with a pencil in hand, scribbling notes in the margins, getting confused, re-reading, looking things up, and slowly assembling a working understanding of her corner of the field, Bob has been using an AI agent. When his supervisor sent him a paper to read, Bob asked the agent to summarize it. When he needed to understand a new statistical method, he asked the agent to explain it. When his Python code broke, the agent debugged it. When the agent's fix introduced a new bug, it debugged that too. When it came time to write the paper, the agent wrote it. Bob's weekly updates to his supervisor were indistinguishable from Alice's. The questions were similar. The progress was similar. The trajectory, from the outside, was identical.
LLM speak. But the rest of that quote doesn't look LLM-generated, it's too fiddly and complex of an argument. I think this was edited with AI, but the underlying argument at least is human.
If you could, why wouldn't you? LLM witch-hunts over every halfway competent writer are becoming quite tiresome
The author is a bit naive here:
1. Society only progresses when people are specialised and can delegate their thinking
2. Specialisation has been happening for millennia. Agriculture allowed people to become specialised due to abundance of food
3. We accept delegation of thinking in every part of life. A manager delegates thinking to their subordinates. I delegate some thinking to my accountant
4. People will eventually get the hang of using AI to do the optimum amount of delegation such that they still retain what is necessary and delegate what is not necessary. People who don't do this optimally will get outcompeted
The author just focuses on some local problems like skill atrophy but does not see the larger picture and how this specific pattern has repeated throughout humanity's history.
> It is a profoundly erroneous truism ... that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them.
> 4. People will eventually get the hang of using AI to do the optimum amount of delegation such that they still retain what is necessary and delegate what is not necessary. People who don't do this optimally will get outcompeted
Then they’ll be at the mercy of the online service's availability and of the company itself. There's also the non-deterministic result. I can delegate my understanding of some problems to a library, a piece of software, or a framework, because their operations are deterministic. Not so with LLMs.
Do you lack a fundamental understanding of those apps you built that are still in use? Did you lack an understanding of their workings when you built them?
However, at one point in my career, I was frustrated with limitations in a language (Fortran II) and my curiosity got the better of me and I studied compilers thoroughly.
This led to a new job and the understanding of many new useful programming concepts. Very rewarding.
But if you are curious, studying compilers, maybe even writing a new one, will give you tools to do other things.
While working with LLMs, much of my experience gives me new ideas to push the LLM to explore.
And indeed running it through a few AI text detectors, like Pangram (not perfect, by any means, but a useful approximation), returns high probabilities.
It would have felt more honest if the author had included a disclaimer that it was at least partly written with AI, especially given its length and subject matter.
Maybe he won't be able to blog himself out of a wet paper bag tomorrow, but people seem to think he's a great thinker today. Isn't that all that matters?
(I tried faking some AI style.)
I'd draw a comparison to high-level languages and language frameworks. Yes, 99% of the time, if I'm building a web frontend, I can live in React world and not think about anything that is going on under the hood. But, there is 1% of the time where something goes wrong, and I need to understand what is happening underneath the abstraction.
Similarly, I now produce 99% of my code using an agent. However, I still feel the need to thoroughly understand the code, in order to be able to catch the 1% of cases where it introduces a bug or does something suboptimally.
It's possible that in future, LLMs will get _so_ good that I don't feel the need to do this, in the same way that I don't think about the transistors my code is ultimately running on. When doing straightforward coding tasks, I think they're already there, but I think they aren't quite at that point when it comes to large distributed systems.
The problem is, they're nothing like transistors, and never will be. Those are simple. Work or don't, consistently, in an obvious, or easily testable, way.
LLMs are more akin to biological things. Complex. Not well understood. Unpredictable behavior. To be safely useful, they need something like a lion tamer, except every individual LLM is its own unique species.
I like working on computers because it minimizes the amount of biological-like things I have to work with.
In academia, understanding is vital. The same for research.
But in production, results are what matters.
Alice would be a better researcher, but Bob would be a better producer. He knows how to wrangle the tools.
Each has its value. Many researchers develop marvelous ideas but struggle to commercialize them, while production-oriented engineers struggle to come up with the ideas.
You need both.
Yeah, there are multiple parts to education: 1) teach skills useful to the economy, 2) teach the theories of the subject, and finally 3) tweak existing theories and create new ones. An electrician can fix problems without understanding the theory of electromagnetism. These are the trades folks. An EE college graduate has presumably understood some theory, and can apply it in different useful ways. These are the engineers. Finally, there are folks who not only understand the theory of the craft, but can tweak it creatively for the future. These are the researchers.
Bob better fits as a trades-person or engineer whereas Alice fits better as a researcher.
I think the real danger is no longer caring about what you're doing. Yesterday I just pointed Claude at my static site generator and told it to clean it up. I wanted to care but... I didn't.
If someone claims to have "understood [middle school] algebra" but they aren't able to solve equations by themselves, you'd be skeptical. Of course past some point of familiarity it's simply faster to throw things into a CAS, but if you remove the initial manual struggle, then have you wired up your brain for understanding? There was a post on HN a few days back about how familiarity with a tool leads to a sense of "embodied understanding" [1], and I think the initial struggle is an intrinsic part of learning to get to the "unconscious competence" level.
Weak ownership, unclear direction, and "sure, I guess" reviews were survivable when output was slow. When changes came in one at a time, you could get away with not really deciding.
AI doesn't introduce a new failure mode. It puts pressure on the old one. The trickle becomes a firehose, and suddenly every gap is visible. Nobody quite owns the decision. Standards exist somewhere between tribal memory, wishful thinking, and coffee. And the question of whether something actually belongs gets deferred just long enough to merge it, but forces the answer without input.
The teams doing well with agentic workflows aren't typically using magic models. They've just done the uncomfortable work of deciding what they're building, how decisions are made, and who has the authority to say no.
AI is fine, it just removed another excuse for not having our act together. While we certainly can side-eye AI because of it, we own the problems. Well, not me. The other guy who quit before I started.
And it's not just a volume problem.
Mediocre devs previously couldn't complete a project by themselves and were forced to solicit help and receive feedback along the way.
When all managers care about is "shipping", development becomes a race to the bottom. Devs who used to collaborate are now competing. Whoever gets the slop into the codebase fastest, wins.
The difficulty of passing the defence varies wildly between universities, departments and committees. Some are very serious affairs with a decent chance of failure while others are more of a show event for friends and family. Mine was more of the latter, but I doubt I would have passed that day if I had spent the previous years prompting instead of doing the grunt work.
The process you describe is a gatekeeping exercise which will change to include LLM judges at some point.
Kids should start off with Commodore 64s, then get late 80’s or early 90’s Macs, then Windows 95, Debian and internet access (but only HTML). Finally, when they’re 18, be allowed an iPhone, Android and modern computing.
Parenting can’t prevent the use of LLMs in grad school, but a similar approach could be taken by grad departments: don’t allow LLMs for the first few years, and require pen and paper exams, as well as oral examinations for all research papers.
It's actually great since a lot of older technology is cheap and still readily available. My little ones love listening to old records, controlling the playback speed and hearing the music go up in pitch if the RPMs are set too high. We look at the tracks on the vinyl under a microscope and talk about how the music is written on it that way. VHS and audio cassettes offer their own talking points.
For computers, we don't literally use a Commodore 64 but we run simpler, old software on new hardware. Mostly because a lot of newer education software is somehow also funded by injecting ads into the games (awful). But there is also some good "modern" educational software worth checking out. I highly recommend gcompris.net.
I'm sure this approach breaks down at the very frontiers of highly technical fields but... virtually all work, even work by educated professionals, happens outside that area anyway. On well-trodden ground, you can improve at supervising agents by doing things that test your ability to supervise agents.
Vs fields where there is not a reliable feedback path, or that feedback path is much more noisy.
In my experience, doing these things with the right intentions can actually improve understanding faster than not using them. When studying physics I would sometimes get stuck on small details - e.g. what algebraic rule was used to get from Eq 2.1 to 2.2? what happens if this was d^2 instead of d^3 etc. Textbooks don't have space to answer all these small questions, but LLMs can, and help the student continue making progress.
Also, it seems hard to imagine that Alice and Bob's weekly updates would be indistinguishable if Bob didn't actually understand what he was working on.
I've been having a lot of fun vibe coding little interactive data visualizations so when I present the feature to stakeholders they can fiddle with it and really understand how it relates to existing data. I saw the agent leave a comment regarding Cramer's rule and yeah, it's a bit unsettling that I forgot what that is and haven't bothered to look it up, but I can tell from the graphs that it's doing the correct thing.
There's now a larger gap between me and the code, but the chasm between me and the stakeholders is getting smaller and so far that feels like an improvement.
There’s a reason most people aren’t promoted to manager until they have years of experience under their belt. And now we’re expecting folks to be managers on day 1.
It'll be valuable to refer back to this in 5-10 years.
Archived here so we can always come back to this: https://archive.md/MZtsI
One thing I've seen asserted:
> What he demonstrated is that Claude can, with detailed supervision, produce a technically rigorous physics paper. What he actually demonstrated, if you read carefully, is that the supervision is the physics. Claude produced a complete first draft in three days... The equations seemed right... Then Schwartz read it, and it was wrong... It faked results. It invented coefficients...
The argument that AI output isn't good enough is somewhat in opposition to the idea that we need to worry about folks losing or never gaining skills/knowledge.
There are ways around this:
"It's only evident to experts and there won't be experts if students don't learn"
But at the end of the day, in the long run, the ideas and results that last are the ones that work. By work, I mean ones that strictly improve outcomes (all outputs are the same with at least one better). This is because, with respect to technological progress, humans are pretty well modeled as just a slightly better than random search for optimal decisioning where we tend to not go backwards permanently.
All that to say that, at times, AI is one of the many things that we've come up with that is wrong. At times, it's right. If it helps on aggregate, we'll probably adopt it permanently, until we find something strictly better.
> Take away the agent, and Bob is still a first-year student who hasn't started yet.
This may be true, but I can see almost no conceivable world where the agent will be taken away. I think we should evaluate Bob's ability based on what he can do with an agent, not without, and here he seems to be doing quite well.
> I've been hearing "just wait" since 2023.
On almost any timeline, this is very short. Given the fact that we have already arrived at models able to almost build complete computer programs based on a single prompt, and solve frontier level math problems, I think any framework that relies on humans continuing to have an edge over LLMs in the medium term may be built on shaky grounds.
Two very interesting questions today in this vein for me are:
- Is the best way to teach complex topics to students today to have them carry out simple tasks?
The author acknowledges that the difference between Bob and Alice only materializes at a very high level, basically when Alice becomes a PI of her own. If we were solely focused on teaching thinking at this level (with access to LLMs), how would we frame the educational path? It may look exactly like it does now, but it could also look very different.
- Is there inherent value in humans learning specific skills?
If we get to a stage where LLMs can carry out most/all intellectual tasks better than humans, do we still want humans to learn these skills? My belief is yes, but I am frankly not sure how to motivate this answer.
LLM access is a paid service. HN concerns itself with inequality constantly and it's not inconceivable that some individuals get ahead because they can afford to pay for more tokens and better models than those who are poorer.
This article first says that you give juniors well-defined projects and let them take a long time because the process is the product. Then it goes on to lament the fact that they will no longer have to debug Python code, as if debugging Python code is the point of it all. The thing that LLMs can't yet do is pick a high-level direction for a novel problem and iterate until the correct solution is reached. They absolutely can and do iterate until a solution is reached, but it's not necessarily correct. Previously, guiding the direction was the job of the professor. Now, in a smaller sense, the grad student needs to be guiding the direction and validating the details, rather than implementing the details with the professor guiding the direction. This is an improvement - everybody levels up.
I also disagree with the premise that the primary product of astrophysics is scientists. Like any advanced science it requires a lot of scientists to make the breakthroughs that trickle down into technology that improves everyday life, but those breakthroughs would be impossible otherwise. Gauss discovered the normal distribution while trying to understand the measurement error of his telescope. Without general relativity we would not have GPS or precision timekeeping. It uncovers the rules that will allow us to travel interplanetary. Understanding the composition and behavior of stars informs nuclear physics, reactor design, and solar panel design. The computation systems used by advanced science prototyped many commercial advances in computing (HPC, cluster computing, AI itself).
So not only are we developing the tools to improve our understanding of the universe faster, we're leveling everybody up. Students will take on the role of professors (badly, at first, but are professors good at first? Probably not, they need time to learn under the guidance of other faculty). Professors will take on the role of directors. Everybody's scope will widen because the tiny details will be handled by AI, but the big picture will still be in the domain of humans.
You have a good point, but I would argue that debugging itself is a foundational skill. Like imagine Sherlock Holmes being able to use any modern crime-fighting technology, and using it extensively. If Sherlock is not using his deductive reasoning, then he's not a 'detective'. He's just some schmuck who has a cool device to find the right/wrong person to arrest.
Debugging is "problem-solving" in a specific domain. Sure, if the problem is solved, then I guess that's the point of it all and you don't have to solve the problem. But we're all looking towards a world in which people have to solve problems, but their only problem-solving skill is trying to get an AI to find someone to arrest. We need more Sherlocks to use their minds to get to the bottom of things, not more idiot cops who arrest the wrong person because the AI told them to.
When I was fresh out of undergrad, joining a new lab, I followed a similar arc. I made mistakes, I took the wrong lessons from grad student code that came before mine, I used the wrong plotting libraries, I hijacked Python's module import logic to embed a new language in its bytecode. These were all avoidable mistakes and I didn't learn anything except that I should have asked for help. Others in my lab, who were less self-reliant, asked for and got help avoiding the kinds of mistakes I confidently made.
With 15 more years of experience, I can see in hindsight that I should have asked for help more frequently because I spent more time learning what not to do than learning the right things.
If I had Claude Code, would I have made the same mistakes? Absolutely not! Would I have asked it to summarize research papers for me and to essentially think for me? Absolutely not!
My mother, an English professor, levies similar accusations about the students of today, and how they let models think for them. It's genuinely concerning, of course, but I can't help but think that this phenomenon occurs because learning institutions have not adjusted to the new technology.
If the goal is to produce scientists, PIs are going to need to stop complaining and figure out how to produce scientists who learn the skills that I did even when LLMs are available. Frankly I don't see how LLMs are different from asking other lab members for help, except that LLMs have infinite patience and don't have their own research that needs doing.
The problem, and I think the article indirectly points at that, is that the next generation to come along won't learn to think for themselves first. So they will on average end up on the 'B' track rather than that they will be able to develop their intelligence. I see this happening with the kids my kids hang out with. They don't want to understand anything because the AI can do that for them, or so they believe. They don't see that if you don't learn to think about smaller problems that the larger ones will be completely out of reach.
Yet what did not change in this process is that it only made the production of the text more efficient; the act of writing, constructing a compelling narrative plot, and telling a story were not changed by this revolution.
Bad writers are still bad writers, good writers still have a superior understanding of how to construct a plot. The technological ability to produce text faster never really changed what we consider "good" and "bad" in terms of written literature; it just allowed more people to produce it.
It is hard to tell if large language models can ever reach a state where they will have "good taste" (I suspect not). Their output will always reflect the taste and skill of the operator to some extent. Just because a model allows you to produce more code faster does not mean it allows you to create a better product or better code. You still need to have good taste to create the structure of the product or codebase; you still have to understand the limitations of one architectural decision over another when the output is operationalized and run in production.
The AI industry is a lot of hype right now because they need you to believe that this is no longer relevant. That Garry Tan producing 37,000 LoC/day somehow equates to producing value. That a swarm of agents can produce a useful browser or kernel compiler.
Yet if you just peek behind the curtains at the Claude Code repo and see the pile of unresolved issues, regressions, missing features, half-baked features, and so on -- it seems plainly obvious that there are limitations, because Anthropic, with functionally unlimited tokens and frontier models, cannot use them to triage and fix its own product.
AI and coding agents are like the printing press in some ways. Yes, it takes some costs out of a labor intensive production process, but that doesn't mean that what is produced is of any value if the creator on the other end doesn't understand the structure of the plot and the underlying mechanics (be it of storytelling or system architecture).
How do we know that while the AI was writing Python scripts, Bob wasn't reading more papers, getting more data and just overall doing more than Alice?
Maybe Bob is terrible at debugging Python scripts while Alice is a pro at it?
Maybe Bob used his time to develop different skills that Alice couldn't dream of?
Maybe Bob will discover new techniques or ideas because he didn't follow the traditional research path that the established researchers insist you follow?
Maybe Bob used the AI to learn even more because he had a customized tutor at his disposal?
Or maybe Bob just spent more time at the Pub with his friends.
I don’t believe this. Totally plausible that someone would be able to produce passable work with LLMs at a similar pace to a curious and talented scientist. But if you, their advisor, are sitting down and talking with them every week? It’s obvious how much they care or understand, I can’t believe you wouldn’t be able to tell the difference between these students.
Very well said. I think people are about to realize how incredibly fortunate and exceptional it is to actually get paid, and in our industry very well, through a significant fraction of one's career while still "just" doing the grunt work, that arguably benefits the person doing it at least as much as the employer.
A stable paid demand for "first-year grad student level work" or the equivalent for a given industry is probably not the only possible way to maintain a steady supply of experts (there's always the option of immense amounts of student debt or public funding, after all), but it sure seems like a load-bearing one in so many industries and professions.
At the very least, such work being directly paid has the immense advantage of making artificially (often without any bad intentions!) created bullshit tasks that don't exercise actually relevant skillsets, or exercise the wrong ones, much easier to spot.
This is solvable with harness engineering.
The model’s first try is never ready for human consumption. There needs to be automation (bespoke, a mix of code and prompt based hooks - which agents can build) to force the agent’s output back through itself to tell it to be more rigorous, search online for proof of its claims, etc etc. and not stop until every claim is verifiable.
No human should see the model’s output until it’s met these (again bespoke but not hand written) guardrails.
What I’m talking about doesn’t exist and really has no analogy yet, so you can think of it as a super advanced form of linting. It’s grounding, but also verification that the grounding links to the material, and refusal to accept the model’s work until it meets the bar.
We are asking models to dream (invent purely from their weights), and are surprised when their dreams, just like ours, have little relationship to reality. The current state of the art is going to look very naive in a few years’ time.
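To make the idea a bit more concrete, here is a toy sketch of the "force the output back through itself" loop described above. It is only an illustration under stated assumptions, not an existing tool: `run_harness`, `require_citation` and `fake_agent` are hypothetical names, the "agent" is a stub function standing in for a real model call, and the single check (does the draft carry a source marker?) is a placeholder for the far richer bespoke guardrails being proposed.

```python
from dataclasses import dataclass
from typing import Callable, List

# A check inspects a draft and returns a list of problems (empty means "passes").
Check = Callable[[str], List[str]]


@dataclass
class HarnessResult:
    draft: str
    rounds: int
    accepted: bool
    problems: List[str]


def run_harness(generate: Callable[[str], str], task: str,
                checks: List[Check], max_rounds: int = 5) -> HarnessResult:
    """Re-prompt the (hypothetical) agent until every check passes or rounds run out."""
    prompt = task
    draft = ""
    problems: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(prompt)
        problems = [p for check in checks for p in check(draft)]
        if not problems:
            return HarnessResult(draft, round_no, True, [])
        # Feed the failures back to the agent rather than showing the draft to a human.
        prompt = (task + "\n\nFix these problems before answering:\n"
                  + "\n".join(f"- {p}" for p in problems))
    return HarnessResult(draft, max_rounds, False, problems)


def require_citation(draft: str) -> List[str]:
    """Placeholder guardrail: reject drafts that carry no source marker."""
    return [] if "[source:" in draft else ["claim has no cited source"]


def fake_agent(prompt: str) -> str:
    """Stub standing in for a model call; it only cites a source when told to fix things."""
    if "Fix these problems" in prompt:
        return "The constant is 42 [source: lab-notebook p.3]"
    return "The constant is 42"


result = run_harness(fake_agent, "Report the constant.", [require_citation])
print(result.accepted, result.rounds)
```

The design choice worth noticing is that a rejected draft never leaves the loop: the human only sees a result with `accepted` set, or an explicit list of the problems that were never resolved. Checking that a cited source actually supports the claim is the hard part, and this sketch does not attempt it.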
> But the real threat isn't either of those things. It's quieter, and more boring, and therefore more dangerous. The real threat is a slow, comfortable drift toward not understanding what you're doing. Not a dramatic collapse. Not Skynet. Just a generation of researchers who can produce results but can't produce understanding. Who know what buttons to press but not why those buttons exist. Who can get a paper through peer review but can't sit in a room with a colleague and explain, from the ground up, why the third term in their expansion has the sign that it does.
I have seen versions of this in the wild where a firm has gone through hard times and internally systems have lost all their original authors, and every subsequent generation of maintainers… being left with people in awe of the machine that hasn’t been maintained in a decade.
I interviewed a guy once that genuinely was proud of himself, volunteering the information to me as he described resolving a segfault in a live trading system by putting kill -9 in a cronjob. Ghastly.
I’ve recently started csprimer and whilst mentally stimulating I wonder if I’m not completely wasting my time.
If that comes to pass, we'll be rediscovering the same principles that biological evolution stumbled upon: the benefits of the imperfect "branch" or "successive limited comparison" approach of agentic behaviour, which perhaps favours heuristics (that clearly sometimes fail), interaction between imperfect collaborators with non-overlapping biases, etc etc
https://contraptions.venkateshrao.com/p/massed-muddler-intel...
> Lindblom’s paper identifies two patterns of agentic behavior, “root” (or rational-comprehensive) and “branch” (or successive limited comparisons), and argues that in complicated messy circumstances requiring coordinated action at scale, the way actually effective humans operate is the branch method, which looks like “muddling through” but gradually gets there, where the root method fails entirely.
I think this is a simplification. Of course Bob relied on AI, but they also used their own brain to think about the problem. Bob is not reducible to "a competent prompt engineer"; if you think so, take any person who writes prompts in a field unrelated to physics and ask them to do Bob's work.
In fact Bob might have a chance to cover more mileage on the higher level of work while Alice does the same on the lower level. Which is better? It depends on how AI will evolve.
The article assumes the alternative to AI-assisted work is careful human work. I am not sure careful human work is all that good, or that it will scale well in the future. Better to rely on AI on top of careful human work.
My objection comes from remembering how senior devs review PRs ... "LGTM" .. it's pure vibes. If you are to seriously review a PR you have to run it, test it, check its edge cases, eval its performance - more work than making the PR itself. The entire history of software is littered with bugs that sailed through review because review is performative most of the time.
Anyone remember the verification crisis in science?
[0] http://employees.oneonta.edu/blechmjb/JBpages/m360/Professio...
[1] https://s3.us-west-1.wasabisys.com/luminist/EB/A/Asimov%20-%...
https://boxobarks.leaflet.pub/3mj42airv3s2o#fingerprints-of-...
A two-hour thesis defense isn't enough to uncover this, but a 40-hour deep probing examination by an AI might be. And the thesis committee gets a "highlight reel" of all the places the student fell short.
The general pattern is: "Suppose we change nothing but add extensive use of AI, look how everything falls apart." When in reality, science and education are complex adaptive systems that will change as much as needed to absorb the impact of AI.
To some extent, the reason models will get better is because companies will hire PhDs to train them on increasingly complex problems.
The problem is that more complex problems take longer to train, more time to test, require more compute, and are harder to verify. This is why “just make it bigger” is a losing proposition imo.
A lot of what I just said is also true for RLVR.
Most of what we call thinking merely justifies beliefs that make us emotionally happy, and is not creative per se. I am making a distinction between "thinking" as we know it and "creative thinking", which is rare and can see things in an unbiased manner, breaking out of known categories. Arguably, at the PhD level, there need to be new ideas instead of remixes of existing ones.
To be fair to pedagogy, it's also being defeated by eons of human brain wiring that says: "People in stories are somewhat real, and the author is always real."
We are disarmed by how well these things mimic all the typical indicators of thought, and it takes work (work our brains can't really sustain) to not fall into assuming that there's another humanoid bearing some of the responsibility for catching problems.
The risk is that civilization is over its skis because humans are lazy. Humans are always lazy. In science there’s a limit to bs because dependent works fail. In economics there’s a crash. In physics stuff breaks. Then there is a correction.
This is going to get worse, and eventually cause disastrous damage unless we do something about it, as we risk losing human institutional memory across just about every domain, and end up as child-like supplicants to the machines.
But as the article says, this is a people problem, not a machine problem.
i.e. science.
If an AI is capable of producing an elegant solution with fewer levels of abstraction it could be possible that we end up drifting towards having a better understanding of what's going on.
It had great success; now, when I propose that they use some model to do something, they tend to avoid it.
A combination of beancounters running the show and the old, experienced engineers dying, retiring, and going through buyouts has pretty much left things in a pretty sad state.
It's of course devious, exactly the style of some of us :)
Give AI to VCs to use for all their domain stuff....
They then make wrong investment decisions based on the AI's wrong info and get killed in the market....
Market ends up killing AI outright....problem solved temporarily
The Aeneid, Virgil, Dryden translation.
The way I think about this is: we can't catch the hallucinations that we don't know are hallucinations.
- Caddyshack
It's why I push for a hybrid mentor-apprentice model. We need to actively cultivate the next generation of "Schwartzes" with hands-on, critical thinking before throwing them into LLM-driven environments. The current incentive structure, as conception points out, isn't set up for this, but it's crucial if we want to avoid building on sand.
If I could make a rocket that could accelerate at 3 Gs for 10 years, how long would it take to travel from Earth to Alpha Centauri by accelerating at 3 Gs for half the time, then decelerating at 3 Gs for half the time?
Hint: They don't all get it right. Some of them never got it right after hints, corrections, etc.
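For anyone who wants to check an answer, here is a minimal sketch of the arithmetic. It assumes things the question does not state: a distance of about 4.37 light-years to Alpha Centauri, a constant proper acceleration of 3 g, a flip at the midpoint, and the standard special-relativity relation x = (c²/a)(cosh(aτ/c) − 1) for each half of the trip. The function name `flip_and_burn` and the dictionary keys are made up for illustration.

```python
import math

C = 299_792_458.0        # speed of light, m/s
G0 = 9.80665             # standard gravity, m/s^2
YEAR = 365.25 * 86400.0  # Julian year, s
LIGHT_YEAR = C * YEAR    # metres in one light-year

def flip_and_burn(distance_ly, accel_g):
    """Accelerate for the first half of the distance, decelerate for the second.

    Returns trip durations in years: on-board (proper) time, Earth-frame
    time, and the naive Newtonian estimate for comparison.
    """
    a = accel_g * G0
    d = distance_ly * LIGHT_YEAR
    half = d / 2.0

    # Relativistic: invert x = (c^2/a)(cosh(a*tau/c) - 1) for the half-distance.
    tau_half = (C / a) * math.acosh(1.0 + a * half / C**2)
    # Earth-frame time for the same half: t = (c/a) sinh(a*tau/c).
    t_half = (C / a) * math.sinh(a * tau_half / C)

    # Newtonian: half = a*t^2/2 per leg, so the total is 2*sqrt(d/a).
    newton_total = 2.0 * math.sqrt(d / a)

    return {
        "ship_years": 2.0 * tau_half / YEAR,
        "earth_years": 2.0 * t_half / YEAR,
        "newtonian_years": newton_total / YEAR,
    }

if __name__ == "__main__":
    for name, years in flip_and_burn(4.37, 3.0).items():
        print(f"{name}: {years:.2f}")
```

Under those assumptions the Newtonian estimate is not usable, because it implies a midpoint speed above c. The relativistic version gives a shorter trip for the crew than for observers on Earth, and the Earth-frame time necessarily exceeds the 4.37 years light would take.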
This follow-the-leader kind of "thinking" is probably a requirement. The amount of expertise it would take to understand and decide about everything in our daily life is overwhelming. Do you fix your own car, decide each day how to travel, get food, and understand how all that works? No.
So what is the problem? The problem arises if you follow the leader and the leader has an agenda that differs from yours. Do you really think that Jeff Bezos being a (the?) major investor in the Washington Post has anything to do with democracy? You know, as in the WaPo slogan "Democracy Dies in Darkness".
Does Jeff have an agenda that differs from yours? Yes. NYT? Yes. Hacker news? Yes. Google? Yes. We now live in a world so filled with propaganda that it makes no difference whether something is AI. We all "follow". Or not.
The bit they are missing, IMHO, is that if LLMs keep getting better, doing the steering-the-LLM version keeps getting easier, and being an LLM-using-expert rapidly drops to a value of zero. Like literally fucking nothing. If anyone can do it easily with an LLM, why would anyone pay you anything? Why would they care? You might as well be a teenager at a fast food job.
In this scenario, only actually understanding will be any kind of differentiator at all. And if that scenario doesn't come to pass, it will still be a far better differentiator to have a clue.
We're collectively racing towards the last scene of "Up", justifying each milestone along the way.
I've used this analogy more times than I thought I ever would; no regrets.
We're entering a post-thought era where knowledge itself is devalued and, worse, _seeking_ that knowledge is discouraged.
There's no time to stumble through learning a codebase when Claude can ship features faster than you can think. There's no time to learn.
Pixar warned us. Asimov warned us. Orwell warned us. This phenomenon isn't new; we just now have the technology to finally execute on this vision.
One of the marks of an educated person is the ability to dispassionately think from first principles. It is not a sufficient criterion, but it is a necessary one. In this case, the basic questions we must ask are: what is education, and what is education for?
An instrumentalist view of education, the one that has claimed the soul of the modern university and primary education, tells us that education is about preparing for a career - preparing to be an economic actor - and about the effect you can have. In short, it is about practical power and economic utility.
Now, the power to be able to do good things, to be practically able, is a good thing as such, and indeed one does acquire facility during one’s education. (And I would argue schooling today isn’t great at practicality either.) But the practical, unlike the theoretical, is always about something else. It is never for its own sake. What this means is that there must be a terminus. You cannot have an infinite regress of practical ends, because the justification for any practical end is not found in itself. And if the primary proximate end of education is the career, then what distinguishes education from training? Nothing. What’s more, if you then ask what the purpose of a career is, you find it is about consumption. So education today is about enabling people to be consumers. You wish to be effective so you can be paid more so you can buy more crap. Pure nihilism.
True education is best captured by the classical liberal arts, which is to say the free arts. Human beings are intellectual and moral creatures. The purpose of education is to free a person to be more human, to free them to be able to reason effectively and competently for the sake of wisdom and for the sake of living wisely. In other words, it is about becoming what you ought to become as a human being in the most definitive sense.
What good does AI do you if you haven’t become a better version of yourself in the process? So AI writes a paper for you. So what? The purpose of the paper is not the paper, but the knowledge, understanding, and insight that results from writing it.
Interestingly, the text has a number of AI-like writing artifacts, e.g. frequent use of the pattern "The problem isn't X. The problem is Y." Unlike much of the typical slop I see, I read it to the end and found it insightful.
I think that's because the author worked with an AI exactly as he advocates, providing the deep thinking and leaving some of the routine exposition to the bot.
D. W. Hogg, "Why do we do astrophysics?", https://arxiv.org/abs/2602.10181, February 2026.