When AI Builds Itself: Our progress toward recursive self-improvement

(anthropic.com)

534 points · meetpateltech · 23d ago · 704 comments

279 comments · 102 top-level

aleqs22d ago· 24 in thread

Okay, so anthropic has amazing AI which supposedly writes most of their code and can continuously improve... meanwhile they have outages on a regular basis, and any kind of long-running work will now consistently hit 'API Error: Server is temporarily limiting requests'. Not sure if this is intentional to force a reduction of token usage, but at this point I need to build around these throttling limits and outages with my own tools to restart/resume sessions. From my experience, in the last 2 weeks, literally 100% of any non-trivial Claude session/work will now be blocked on these issues, requiring manual intervention.

One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on opus, and aiming to transition to Chinese models like deepseek in the short term and hopefully open, self hosted models in the future (which I plan to open source).

The nonstop marketing fluff from anthropic while their service quality and availability noticeably degrades... just continues to destroy my trust in the company.
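
A minimal sketch of the kind of restart/resume wrapper this comment describes: retry a throttled call with capped exponential backoff and jitter. `ThrottledError` and `call_with_backoff` are hypothetical names for illustration, not part of any Anthropic SDK; a real tool would have to map the actual 'temporarily limiting requests' error onto the exception.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 'temporarily limiting requests' error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ThrottledError with backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Cap the exponential delay, then wait a random time below the cap.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable only so the wrapper can be exercised without real waiting; resuming a session's state after a restart is a separate problem this sketch does not cover.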

aagha22d ago

And don't forget that they have BILLIONS of dollars and can't figure out how to get a decent support or public communications system setup.

aleqs22d ago

They can't even seem to get their usage metering consistent.

1 more reply

thinkingtoilet22d ago

Don't confuse things. It's not "can't figure out", it's "don't care to figure out". They're not dumb. They just don't care about support.

1 more reply

jakobnissen22d ago

Their outages are probably not due to their code though. It’s probably their infrastructure that can’t keep up. So seeing failures of infrastructure doesn’t really tell you anything about how well or badly Anthropic makes use of their models.

aleqs22d ago

That seems like an assumption based on basically nothing. There is a lot of code at the infra layer, and based on the stack choices for Claude code and based on how buggy and unreliable ~everything from anthropic is, it seems pretty bizarre to claim these issues are not related to their code.

1 more reply

bluerooibos22d ago

Well, people keep throwing money at them, including you and investors. So why would they care? It hasn't annoyed you or a large enough portion of users enough to move off their service - because there isn't a better alternative.

patcon22d ago

Not necessarily the parent's fault, but the energy of this thread is not my favourite...

f311a22d ago

Infrastructure is a much harder problem. They can't even improve Claude Code, which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM.

6 more replies

cookiengineer22d ago

The main reason I am building my own agentic environment is that I need full control and reproducibility of what I am building.

Post November and post openclaw agentic environments need to be built differently, and for selfhosting models the context size problem really requires a strong harness which intelligently helps reduce context size.

Planner/orchestrator architecture, agent to agent summarizer, specification based tools (fck all this markdown memory bullshit btw), tool call shrinking, and workflow management are all really important because of the context size problem.

Nobody has enough VRAM for the large K/V caches, and nobody can afford f16/f32 caches in terms of memory, which are also necessary for longer conversations. MoE 30b models have improved so much though, qwen 3/3.6 coder is the real champion doing almost the same things with less than 1/10th the memory requirements. Just think about that in terms of engineering and what your bet is going to be. Haiku pales in comparison.

Currently my focus with exocomp is trying to figure out how I can record, replay, restart, and debug workflow sessions of agents in a better manner so that I as a human can understand what's going on. Currently I think that UI will be something like a gantt chart where you have a graph with connections representing agent to agent communication. And yes, that's a lot of fiddling with SVG as it turns out, so I'm not quite there yet.

Anyways, in case you're interested. I'm manually building this env and trying to unit test the critical parts. [1]

[1] https://github.com/cookiengineer/exocomp

0x5322d ago

They also don’t have… a login page with authentication. To access the console you get an email link. No passkeys, passwords, 2fa, just an email.

hombre_fatal22d ago

This comment is a good example of the double standard laymen have about AI usage:

If you use AI, then AI must be expected to solve all problems, even problems that affect everyone like infra scaling.

And if perfection isn’t delivered, then of course it wasn’t: you used AI and AI sucks.

3 more replies

rishabhaiover22d ago

you're conflating a compute problem with a code quality problem.

thordenmark22d ago

Growing pains of being successful. These are solvable problems and will be. Can they maintain their momentum without pissing off too much of their customer base before these issues are resolved?

asdfman12322d ago

Personally at my own job self-writing code is letting us tackle big, long-deferred refactoring projects (like the article mentions), but any sort of refactoring introduces new bugs.

qsort22d ago

Look, I've never been someone who mindlessly hypes AI companies, as a matter of fact I think they have serious leadership problems across the board, but you people are straw-manning them so badly it actually makes me sympathize with them.

They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.

1 more reply

Quekid522d ago

Indeed... why is Anthropic even employing people at all if this AI magic story is true?

1 more reply

anjel22d ago

Answers the question: how can Anthropic sell more Usage "Credits"

jatora22d ago

This is weird to me because i am using claude code 10+ hours/day 7 days a week, usually multiple sessions, and run into api errors maybe in 1 or 2 sessions per week. And about 2 major outages of 10-20min in the last month. Not terrible and nowhere near what you are reporting. Therefore I don't believe you, because you don't even couch this in terms of it being something that seems particular to you or your region. Obvious dishonesty is fairly bad of you.

1 more reply

ChadMoran22d ago

Better doesn't mean perfect.

prng202122d ago

We’ve got a company of several thousand employees serving hundreds of millions of people arguably the best AI model in the market. Meanwhile you’re asking for a handkerchief for your pool of tears because their product is struggling to do your daily job functions for you, with much of that due to being limited by the world’s supply of silicon, electricity, water, and other resources. Cry me a river.

1 more reply

claudiug22d ago

those are results of the humans only. not the AI. AI is perfect /s

rush8699922d ago

Just as you expected, I'm throwing in my harness. Please support: https://github.com/rush86999/atom

0xbadcafebee22d ago

Have you considered just... using OpenAI? They are more reliable, models are just as good, and their subscriptions provide more requests per dollar.

windexh8er22d ago

Opus 4.8's critical assessment of Anthropic's "When AI builds itself" [0][1]. Because, why not?

[0] https://pastebin.com/Vc5Yq9Ai [1] https://www.anthropic.com/institute/recursive-self-improveme...

1 more reply

pizlonator22d ago· 14 in thread

What I can’t get over is that there have been exactly zero software breakthroughs since vibe coding started, other than vibe coding itself.

Claude is amazing, that’s true.

But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.

Rewriting a Zig program in unsafe Rust? Not a breakthrough. Finding a bunch of security vulns? Maybe that’s sort of a breakthrough though it’s underwhelming and possibly just a net negative. But like if I rolled back to using software from 2023 then life would be ok.

Maybe we just need to give it time, and sometime real soon, we will all be amazed by such a breakthrough? Who knows

sothatsit22d ago

Maybe my bar for what constitutes a breakthrough is lower than other people's, but all of these seem like breakthroughs to me:

NLP as a field saw huge shifts. NLP tasks that used to be complex and inaccurate can now be set up very easily and quickly using structured outputs from LLMs, often with greater accuracy.

A small charity I help with has now been able to build their own website to manage their day-to-day operations. It saves them a lot of time, and it was vibe-coded using Manus. I don't think people appreciate how much room there is left for bespoke software to have big impacts on small organisations that can't afford to hire developers. The cost for software like the one they made has gone from 10s of thousands of dollars to $10/month and volunteer hours.

My brother has recently been setting up Cowork to do an automatic review of contracts before human review, and he said it is far more diligent than people when it comes to routine things to check. This is another huge breakthrough for not just efficiency, but the quality of work.

I really don't think we can discount AI finding bugs and vulnerabilities. If you care about code quality and keep up review standards, LLMs can help you write more robust software. AI has found a huge number of bugs for me before they hit production, including potential out-of-bounds memory accesses and segfaults.

ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots at a scale and cost that no human support network could match.

2 more replies

spprashant22d ago

It's in a weird space right now.

These models are actually extremely good but they are far from an intelligence unto themselves. Truth is if someone told you they could build these things 5 years ago, you'd write them a check for a trillion dollars. Problem is once we got them, we realized they are not all that. It's like a mecha suit in a universe where mecha suits are abundant and cheap. Someone has to climb into them every day and put in the work for it to be effective.

So now the skeptics are saying this technology is overrated. And the optimists are accusing the skeptics of moving goal posts.

4 more replies

sutterd22d ago

I am doing a solo project that is pretty big, meaning it is not something I could vibe code. I can do a lot with AI that I could never do on my own, but I am not seeing a several-multiples improvement in my productivity. I spend so much time doing what I call "AI wrangling", trying to get it to do what I want. Claude is writing all the JavaScript and Python code, but ultimately I am programming in English. What is good is that it is effectively a very high level computer language, where the agent can often implement a lot of underlying code from a short English description. But many other times it takes a lot of work to get what you want.

3 more replies

marcus_holmes22d ago

I spent years in the early 2000s trying to get a computer to read unstructured PDFs and TIFF images (mainly invoices, either scanned or electronic). Limited success, we always had to get a human to look at them in the end.

We implemented that in about three days earlier this year, just by feeding the files to LLMs. And it's good enough to not need a human to check.

I get that this isn't a "Computer Science breakthrough" in the sense you mean, but it used to involve a lot of hard CS to try and solve, and now it doesn't.

drtz22d ago

Maybe I'm looking through rose colored glasses, but software that writes itself seems like a pretty big breakthrough to me.

4 more replies

signatoremo22d ago

The arguments against AI assisted coding used to be "only for toy projects", then at some point it became "no dignity", "joyless". Now it's "no new breakthrough" apparently. All in the span of maybe a year. I say it's made tremendous progress.

1 more reply

wild_egg22d ago

What does a breakthrough look like?

3 more replies

est22d ago

> exactly zero software breakthroughs since vibe coding started, other than vibe coding itself

Generative AI is meant to be a mimic - Richard Sutton

https://x.com/RichardSSutton/status/2061216087744946656

squidsoup22d ago

The breakthroughs in mass state surveillance are coming, never fear.

fooker22d ago

What does a software breakthrough look like in your opinion?

If you get yourself to define it, maybe you'll find it achievable :)

rcpt22d ago

Solved a bunch of Erdos problems.

jimbokun22d ago

What would qualify as a breakthrough for you?

revlsas22d ago

openAI has how many employees and the chatGPT app has 1 billion MAU

defen22d ago

Vibe coding is the breakthrough. There's always been "no-code" solutions to problems in various business domains, but they were invariably janky, underpowered, and/or overpriced. Now we have a way for domain experts to go directly from ACTUAL natural language directly to implementation in a real programming language, fully automated, in minutes or hours. How is that not a science-fiction level breakthrough? In 2011 if anyone had said that would be possible "in 15 years", I think most professionals at the time would not have replied with "yeah it's coming but your timeline is off". It would have been "you have no fucking idea what you're talking about".

mweidner22d ago· 11 in thread

I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?

I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.

gensym22d ago

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.

1. If anyone builds strong AI, it may be catastrophically bad.

2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.

3 more replies

overgard22d ago

The thing about nukes is you can at least make an argument for why it'd be important to be the first country to have them. With AI, you create super intelligence and you're probably just the first one it takes out. There's no reason to think a super intelligence would be totally fine being a slave to apes.

Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.

2 more replies

RobertDeNiro22d ago

Anthropic's goal is regulatory capture.

lenerdenator22d ago

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

It's not cynicism if it's an appraisal of reality that's backed up by evidence.

Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.

The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.

There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.

1 more reply

sfink22d ago

This was pretty directly addressed in the article: not doing it would only mean they'd fall behind whoever would. This is not peace time in the AI race.

Whether you agree with that argument is another question.

1 more reply

mrob22d ago

To complete the analogy, it's like nukes, except we don't have the slightest idea how to calculate the odds of it igniting the atmosphere. (And note that in reality, while the Trinity test "ignite the atmosphere" calculations were correct, we failed to correctly calculate the fallout of the Castle Bravo test with lethal consequences).

1 more reply

tjwebbnorfolk22d ago

> Anthropic's *stated goal* of AI Safety

Actions speak louder than words. If you want to understand someone, simply watch what they do. What they say is irrelevant.

tokioyoyo22d ago

Sorry for nitpicking, but:

> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?

Arguably, yes.

3 more replies

parineum22d ago

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.

keybored22d ago

Such a massively valued company. And doubting them is cynicism? It’s rational(ism).

So either they lie or they are AI Zealots. Interesting times.

Edit:

> > and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.

There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.

chilipepperhott22d ago· 10 in thread

I find any and all claims like this ridiculous from a company who can't build a terminal application that uses less than a gigabyte of RAM.

dang22d ago

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

thesmtsolver222d ago

For some reason, idling Claude Code needs 100% of my CPU.

1 more reply

asdfman12322d ago

Developers can develop leaner applications, but they're usually not incentivized to.

Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, the executive team wants that.

1 more reply

toephu222d ago

I have iterm2 open right now with Claude in a long session and it's only using 500MB of memory.

1 more reply

andriy_koval22d ago

Maybe that gigabyte is occupied by useful information: traces/memory?

3 more replies

davidatbu22d ago

So would you take these claims seriously if they came from OpenAI (since Codex is a pretty lean CLI app)?

If so, I think it would be in the spirit of HN to discuss the subject matter of the blogpost (increasingly autonomous coding towards the end goal of RSI) as if the blog post was indeed from OpenAI. OpenAI is, by all accounts, going through a very similar process anyways.

Jtarii22d ago

Well, they could very easily if they wanted. There is just no economic value in it.

Lplololopo22d ago

Really? Let me explain how bigger companies work:

They have different teams for different departments with different types of people.

So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.

This can lead to diametrically different levels of quality.

cpursley22d ago

I came here just to write: Pretty please let it churn for a few nights and redo Claude Code in Rust. Because the harness is very very good as are their models, but that node thing is a hog for no good reason at all.

3 more replies

bpodgursky22d ago

They obviously don't care, aren't making any attempt whatsoever to do this, and 99% of users don't care either.

If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.

jameson22d ago· 9 in thread

I don't quite understand the intent of such an article other than to promote themselves, given the odd timing: the company is planning on going public, so I can only conclude that this is just part of the IPO roadshow.

LLMs certainly have made significant changes to our lives, but I have yet to see any extraordinary improvement they have brought to me, which makes me skeptical about their claims.

_if_ it solves many of our problems of great magnitude, why haven't Anthropic used it to solve significant problems we, humans, face? Cancer, Alzheimer's, education, finding new materials, fission power plant, etc.

ElProlactin22d ago

Because they're going after the biggest problem of all first: labor costs.

/s but not to a lot of people

2 more replies

sothatsit22d ago

Or: Anthropic genuinely believes the future scenarios they outline are realistic possibilities, and they want more people to take them seriously.

4 more replies

aroman22d ago

The article does not claim they have achieved recursive self improvement... just that it appears to be a plausible outcome given the progress of AI development in the past few years.

I don't know about you, but AI advancements have brought extraordinary improvements to me personally in my ability to be productive, in much the same ways the article outlines. I find it deeply satisfying to be able to "get ideas out of my head" faster and tackle more meaningful problems.

FWIW, it deeply concerns me how much power and capability is being centralized in the hands of so few, especially Anthropic. I, for one, hope these advancements can be scaled down to something I can have full sovereignty over and trust... in my own home.

sirsinsalot22d ago

Truly feels like witnessing the worst of capitalism and greed play out. All that compute and energy towards a narrative of reducing the need for skilled programmers. What a waste.

These people don't have our interests in mind and everyone eats it up like a blessing from a god or something. It's surreal.

2 more replies

stevenhuang19d ago

It starts somewhere, like with this announcement.

I'm not sure why this is so difficult for you to understand.

sterlind21d ago

two reasons:

1. Anthropic is an AI company. they want to get to AGI before anyone else ~~so they can lock the doors behind them~~ to ensure the supremacy of an aligned AGI that serves humankind. RSI unlocks the most value for them.

2. doing bioscience is slow and capital intensive. robotics lags way behind, so that's a lot of lab techs swishing flasks and plating petri dishes. they're happy to stay in silico, but there's very little productive research you can do without in vivo/in vitro experiments.

b3nji21d ago

Because shush, that's why!

yuhmahp22d ago

Agree with your point about the timing, but drawing anticipation before going ahead and solving these diseases can be a good smoke test, and would be beneficial whether or not there's an IPO.

trolleski22d ago

The benefits of AI are not designed to suit you, but the owner class. The plan is for you to be sidelined.

torben-friis22d ago· 9 in thread

>A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.

What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.

malfist22d ago

One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.

I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?

I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that him and I are on two totally separate islands around the value of LLMs.

7 more replies

overgard22d ago

I just watched copilot today turn an 8 line fix into 500 lines, so, yeah, verbosity is a big side effect

2 more replies

keeda22d ago

So the more rigorous studies about AI-assisted coding productivity addressed this by keeping in place all other software development processes, including the same code review and quality standards, and only measuring throughput (PRs, LoC) before and after AI was allowed.

Hence the interpretation of this 8x number depends on whether (or how much) Anthropic engineers have changed their quality standards and development processes. They don't tell us, and I am not aware of any other indications we could use to make a judgment.

However, we can still do some theorycrafting! I'm convinced that to fully realize the potential of AI-assisted coding we need to revamp all the dev processes, especially how we validate code, and it would be foolish of Anthropic not to do so (unless they were conducting a rigorous study, which they don't claim to have done.)

My hypothesis on the future of software validation is nothing fancy, we simply want much, much more automation for tests, observability and other bespoke verification methods than we traditionally had. But then validation code will also contribute to the LoC! My observation so far of personal as well as some "vibe-coded" open-source projects is O(LoC production code) ~= O(LoC test code). So as a SWAG the upper bound could be something like a 3 - 4x speedup, which is still remarkable.

All bets are off if code quality standards are not the same.
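
The back-of-the-envelope estimate above can be written out as a short calculation. This is a sketch under one added assumption that the comment does not state: the pre-AI baseline is treated as essentially all production code, so only the production share of the new output counts toward the speedup. `production_speedup` is a hypothetical helper name.

```python
def production_speedup(loc_multiplier, test_to_prod_ratio):
    """Estimate production-code speedup when the extra output includes tests.

    Assumes the pre-AI baseline was essentially all production code.
    """
    prod_share = 1.0 / (1.0 + test_to_prod_ratio)
    return loc_multiplier * prod_share


print(production_speedup(8, 1.0))  # 4.0: one test line per production line
print(production_speedup(8, 1.5))  # 3.2: tests somewhat outweigh production
```

With test code roughly equal to production code, the 8x LoC figure lands in the 3 - 4x range the comment gives as a SWAG.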

fooqux22d ago

Exactly. If AI is going to start being graded on how many LoC it generates- oh, I'm sorry, how much it "accelerates", then guess what newer models will start doing more of?

2 more replies

snowwrestler21d ago

I don’t understand how lines of code matter at all for scary LLM core capabilities. Does the transformer architecture get better with more lines of code?

My impression was that LLM training codebases were 99% resource management and only a few lines actually implement the core training algorithm, which is where 100% of the intelligence comes from. Data, not lines of code, are the constraint.

After training you can adapt the intelligence in various ways, and that takes a bunch of lines of code too. But you can't raise the intelligence ceiling again without another training run. So where is the scary recursive part?

whateveracct22d ago

Yeah, they assume that "productivity = k * LOC" where k > 1

very flawed

yalok22d ago

Could just be more tests? :) Which is good for code quality in general and reduces support burden, but doesn’t lead directly to more features

snthpy22d ago

Just imagine the productivity gains from using LLMs to rewrite Kotlin codebases in Java!

chuckadams22d ago

AI generates code that mimics the existing code. If your code is terse and comment-free, then the agent’s code is too. The times I’ve seen Claude drift into a default “house style” it generated like 1 comment for every 10 LOC or so. It’s a far cry from the GPT-3 days that littered every line with the journals of Captain Obvious.

1 more reply

overgard22d ago· 8 in thread

So, regardless of whether or not Anthropic CAN create a self improving AI.. does anyone else feel like they shouldn't be allowed to? Or it at least needs to be strictly supervised..? Like, I don't actually think Anthropic can make the singularity any time soon, but I think even AI boosters have to admit doing this is creating a society-wide danger for the benefit of a very very small number of already-rich people.

csense21d ago

> they shouldn't be allowed to?

Anthropic addresses this head-on in the final section of the paper titled "What should we do?" If you convince the US government to slow AI development, you have to convince China too, otherwise you're not stopping self-improving AI at all, you're just throwing away the lead to China. If you convince China too, China or the US or both might go back on their word and build self-improving AI secretly, for greed of the benefits it could bring or fear the other will go back on their word.

What you really need is a non-proliferation regime like the one for nuclear weapons, where every country makes potentially dangerous AI illegal and lets foreign or international inspectors monitor to check that nobody's building illegal AI in secret. But monitoring seems hard; it's general-purpose computation. How do you check whether a given datacenter is training an illegal AI and not just serving websites, running detailed protein folding simulations, or mining crypto? For that matter, how do you know that a nondescript industrial facility hasn't been repurposed into a hidden datacenter for training illegal AI?

asdfman12322d ago

I think that's a valid point. You could very well be right.

But we're discussing whether we should close the barn door while the horse is three miles down the road.

3 more replies

tancop22d ago

the danger comes from the fact anthropic is a for profit company and they could train it to benefit them instead of the public. if they go ahead with it they should get nationalized, their self improving ai analyzed for any hidden agenda and then released as open source.

alfalfasprout22d ago

Absolutely! Yes. This rhetoric of inevitability only benefits these AI companies.

eieie1122d ago

Too late for that.

In any case firms that get too powerful can be nationalised.

1 more reply

lukan22d ago

"does anyone else feel like they shouldn't be allowed to?"

No. Technical limitations aside, I doubt it could be contained, but will be leaked soon, so won't profit just a small number of ultra rich.

2 more replies

Melatonic22d ago

Skynet is 30 years late!

1 more reply

huqedato22d ago

Self improving AI is pure dystopia. Anthropic won't build the singularity, AI itself will build it through self-iterations. Read Yudkowsky's book "If Anyone Builds It, Everyone Dies".

robbrown45122d ago· 6 in thread

Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?

I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)

This is my own effort at an AI assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope it's OK to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )

Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.

jrflo22d ago

I think harnesses would count, AI != LLMs. Any piece of code that helps the computer reason for itself is AI, the harnesses are AI in a sense.

kaffekaka22d ago

Tangent: https://en.wikipedia.org/wiki/Self-replicating_spacecraft

cyanydeez22d ago

If you want to get out ahead of what's coming, it'll be small models that bootstrap the harness rather than anything else.

lanthissa22d ago

yes? the future for any verifiable task is the model attempts to verify initial state and a goal then decomposes its tasks into ever smaller verifiable subtasks, with /memory being the persistence between runs and then /dreaming on the results of those memory files + run data to introduce new ideas.

i think that's the path to async agi these labs are imagining. The only limit is the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.

maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world to the point that it can intuit things since it has these sub paths built.

my personal agi test is can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done without knocking.

marcosdumay22d ago

You need the AI eventually building another AI for the name to apply. This page is just bullshit. They vibe-code their harnesses, and yes, it shows.

Anyway, what does recursive self-improvement even mean for neural-network based AIs? It's not clear it's possible at all.

reddozen22d ago

> Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?

Shhh just let the marketing slop wash over you.

anilgulecha23d ago· 6 in thread

> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require.

Interesting - they're committing to kick off policy conventions to organize a world-slowdown of frontier LLM building. If they actually are able to crack it, this will give a much needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.

fasterik23d ago

We should be skeptical of any major player that advocates for regulating their own industry. In practice, this just means increasing barriers to entry and making it harder to compete with them.

In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.

mofeien22d ago

The regulation that is being argued for here is against pushing the frontier. Entering the market with, say, a new speech-to-text model is not subject to such regulation. What's needed is something qualitatively different from entry barriers, and of the frontier model companies at least Anthropic and DeepMind seem to have enough self-awareness to speak about it. They are finding themselves in a race with a possibly catastrophic outcome for humanity and would like to stop, but it needs international cooperation on a level that no single company can provide.

techblueberry22d ago

Wouldn’t this align with their financial interests? In theory the thing that’s keeping them from being profitable (or one of the big things) is the periodic capex expenditures of building new frontier models.

Upvoter3322d ago

I read this differently: they are actually seeing that it's hard to keep advancing frontier models, and now are moving the goal posts so that when they start getting evaluated more harshly, they can point to something like this.

smokedetector122d ago

They're probably looking to get a way to slow down the capex required to keep up, so they can be more profitable

chasd0022d ago

> organize a world-slowdown of frontier LLM building

i don't want to be a negative nancy but i'm sure this "slowdown" will only be in effect until the infrastructure buildout is done or largely done. If they weren't hardware constrained there'd be no slowdown at all. Whoever gets there first wins everything ("there" being defined as AGI or a similar scale leap in capability).

mofeien22d ago· 6 in thread

> If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing

Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.

https://pauseai.info/

apsurd22d ago

Whichever side I may stand on, pausing just seems unnatural? Life is movement.

honeycrispy22d ago

And happiness is restraint.

honeycrispy22d ago

That would be like trying to get every country to agree to give up nukes.

mofeien22d ago

Or agree on finding ways to promote peaceful use of nuclear energy. This has been done, there are thousands of people working on it around the globe and 180+ member states of the IAEA. It's not easy, there have been close calls.

And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.

ChrisLTD22d ago

Or stop making more, and testing more, which we got the biggest countries to do, at least for a time.

senderista22d ago

They don't, they just pretend they do.

nickandbro23d ago· 5 in thread

So what happens when the world becomes hyper optimized with closed loop AI agents recursively trying to optimize everything deemed sub optimal?

mofeien22d ago

I would assume that shortly after, the solar system will be hyper optimized as well, then the milky way, then the local cluster, and so on. Everything will be close to optimal afterwards, and I sure hope we will have specified the target function for that optimization correctly in the single attempt that we will have had.

peheje22d ago

there will be a lot of paper clips

simianwords22d ago

This often-repeated meme doesn’t have any bearing on reality.

The orthogonality thesis sounds like a fun gotcha, but if you give it some thought you realise how strange it sounds and that the opposite thesis, the collinearity thesis, is actually correct.

1. Intelligence transfers and compounds

2. Goals of agents are not arbitrary

3. Our goals and agent goals are more likely to be aligned at the deeper level

Groxx22d ago

GitHub outages will probably get worse.

layer822d ago

If it optimizes itself away because it’s suboptimal, that wouldn’t be the worst outcome. ;)

llmslave22d ago· 5 in thread

I cannot wait for these models to tear down traditional social hierarchies. We haven't even begun to see the effects, fingers crossed

baq22d ago

Hierarchies exist for a reason, take away the reason and the house of cards eventually collapses — but the house of cards is still a house. When it’s gone, we’re back to laws of the jungle.

Be careful what you wish for IOW.

llmslave22d ago

I think certain types of people with power, i.e. access to capital, will lose relevance. world will become more meritocratic with ai as leverage to the individual

SimianSci22d ago

Never heard of a stratified economy? Spoiler alert: none of us will be in the good part.

techblueberry22d ago

Tear down or reinforce?

llmslave22d ago

capital/ability to leverage labor is going to lose power

mrandish22d ago· 4 in thread

> "A caveat: Lines of code is an imperfect measure"

I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.

There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html

strix_varius22d ago

Exactly this. Just this week an engineer who seems to purely vibe everything submitted a +700ish LoC fix for what seemed like a pretty simple issue. Moreover it was a perf issue, which in my experience is not usually best fixed by adding more stuff.

Today, I merged my fix, net -381 LoC.

I'm using them too of course, they read and type and hunt for bugs and test faster than I can. But I'm using them as my tool, not being a tool using them.

Quekid522d ago

AFAIK, the only correlation with LoC that's got solid evidence is this: the number of bugs correlates with LoC.

gregdeon22d ago

Yep, this is exactly what I thought of too... If you believe negative lines of code is the goal, then they've gotten 8x _worse_!

2f222d ago

Lmao I bloody love that.

ivraatiems22d ago· 4 in thread

Whether or not Anthropic is right about what AI can accomplish, whether these performance gains are real or not, their moral stance here is absolutely hideous to me.

"We must blast forwards into making this dangerous thing because if we don't, someone else surely will," is a coward's argument.

If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it, not making it first! There's a reason disarmament has been so important in nuclear politics! It's not because people think nukes are a great idea!

In fact, that kind of thinking is exactly what keeps nukes dangerous!

If they themselves buy what they're selling, they should shut the whole thing down. Fortunately, I don't think they do, and neither do I, yet.

wyager22d ago

> If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it

I don't think anyone has been more successful in promulgating AI safety

There are groups like MIRI who tried what you're suggesting, where they make no AI and just push for AI regs, and they have been much less successful

streb-lo22d ago

Disarmament failed though? Global zero initiatives for nuclear weapons stalled out exactly because the risk of someone else cheating is too great. If everyone gets rid of their nuclear weapons and then someone cheats and creates them in secret they can use their nuclear weapons to prevent anyone else from catching up.

dmos6222d ago

How do you stop others from making and training a program?

socalgal222d ago

Good thing the USA didn't listen to you. We'd be under Nazi or USSR thumb if they got the bomb first

Aperocky22d ago· 4 in thread

Anthropic is the most self hyped company I've seen, to the point that I'm wondering what would happen to its employees if they held a different opinion. Do they just.. keep it to themselves? For instance, some Anthropic employees could hold the completely rational opinion that all of this isn't going to lead to AGI, but I just don't ever hear that from them.

The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:

Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most of the possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed, because the outcome remains the same. I can't see how it is necessarily different in the LLM space.

apsurd22d ago

> Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts

I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.

Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.

Aperocky22d ago

Because code used to be correlated with progress, it became almost a measurement in lieu. But realistically, the code is meaningless if it doesn't accomplish something, and that should remain the true bar of progress.

For instance, if I churned out 20x more code, threw away 19x code with rewrites and reverts and discards and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not 20x code, it is 70% faster.

Code is both the final product, and a tool to achieve that. We used to have a much harder time to realize the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.

torben-friis22d ago

>If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code.

I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.

Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment of the idea that they're going to be left out or obsolete if they don't get on the ship NOW.

josefritzishere22d ago

I can't get away from a similar conclusion. Even AI Pioneer has said that LLMs are at a dead end.

Animats22d ago· 3 in thread

We've had self-improving AIs before, and they tended to get lost after a while. That's going to be a problem. LLMs are stable because they return to a ground state with no history for a new job. Systems with persistent state have a problem with that state not being sane. Remember Microsoft's 2016 chatbot that learned from Twitter? [1]

[1] https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...

skybrian22d ago

You might be interested in this graph, [1] which suggests that the amount of time that AIs can run on their own has been increasing. Perhaps it will hit diminishing returns, but that seems difficult to predict.

[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

CamperBob222d ago

Interesting, what are some other self-improving AI implementations? Any that actually achieved interesting results? Obviously continuous training has been tried before, but I've never heard of anything that could turn around and actually contribute code toward its own next-generation version.

micromacrofoot22d ago

You can retrain a model and have a ground state as reference, it's not trivial but Microsoft's attempt was 10 years ago and significantly less complex than what's being built now.

techblueberry22d ago· 3 in thread

> A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.

I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.

An accurate Translation seems to be “we made this shit up, but it feels right”

embedding-shape22d ago

Until the moment we start bragging about how many lines of code LLMs are saving us, we're walking in the wrong direction. Your programs, designs and architectures are supposed to get better, not add even more boilerplate just because you can produce it faster...

HarHarVeryFunny22d ago

"You go to IPO with the AI you have, not the AI you might wish you have." -- Donald Rumsfeld

So, right now it's a verbose code generator.

But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.

jazzyjackson22d ago

I guess the claim is simply that AI written code is verbose and there’s lots of it being created but I agree, these systems seem to be able to create lots of low quality software, so until FreeCAD has feature parity with Solidworks I’m bearish on the singularity.

amelius23d ago· 3 in thread

Does this train on LLM output, or is this more like iterative self prompt improvement?

HarHarVeryFunny22d ago

Their statement is that they regard lines of code shipped as indicative of self-improvement. So, while a well written coding agent might be a few thousand LOC, Anthropic's is bloated like a decomposing whale and over 500K LOC! What more proof do you need?

Legend244023d ago

Have you tried reading the article? It answers your question.

Don't ask people to explain the article to you if you're too lazy to open it yourself.

_se22d ago

I think that's the whole point of LLMs

SimianSci22d ago· 3 in thread

Anthropic is looking to IPO here soon. A key aspect of this is to prove profitability.

Shifting their focus from Training new models to instead serving inference, they would greatly reduce their spend. In fact this is something being reported on that they are already doing, which is the reason for their first ever profitable quarter.

It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.

Theodores22d ago

Honest question: Is anyone here looking to put their own money into the Anthropic, OpenAI or SpaceX IPOs?

Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.

There are plenty of pundits able to advise others on where to put their money, and sometimes there is everyone and their dog advising you to get into Bitcoin, gold or some other scheme. With alt-coins there were lots of people saying that you should get in, and plenty of naysayers. Yet I am not hearing anyone that uses AI professionally try to convince others to get into the AI IPOs coming up. Maybe the overall economic situation precludes it.

Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?

danny_codes22d ago

Their model lead is tiny. If they cut training focus they'll be quickly overtaken, one imagines. Seems dicey, if any of the OSS players comes out with a better model.. well, there are a bunch of better harnesses than Claude code you can download.

This is a very undifferentiated, swappable product. Kind of like tissue paper in that respect

malfist22d ago

I mean, if they've consumed all of human knowledge, what's left for them to train on? This pivot isn't only because it's cheaper and a way to juice the numbers for an IPO, it's survival because they can't improve more.

vblanco23d ago· 3 in thread

Another article about how anthropic wants to ban everyone except themselves and destroy opensource and chinese AIs.

reasonableklout23d ago

Where is this discussed in the article? I don't see any mentions of China or open source models

artninja198823d ago

Not really mentioned explicitly but:

> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.

And later:

> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.

b65e8bee43c2ed022d ago

Gell-Mann amnesia expressed by people when a corporation says something they like is both baffling and disheartening to see.

Altman, Amodei, and the rest of them are anthropomorphic grease. their personal wealth is tied to the value of their respective companies. everything they say and do is self-serving.

simianwords23d ago· 3 in thread

Sorry but if AI can build itself then it can run companies of size 3000 with a few people. Or even higher. What are the consequences?

lstodd22d ago

As has been mentioned in the sibling comment it already is.

Consequences are: financial crisis.

delichon23d ago

When AI is a more effective capital allocator than NI it will drive capital into the accounts of whoever controls the AI, gaining them increasing decision making power over the economy and culture. Maybe those controllers will be human at first.

cdrnsf22d ago

They will not be.

minimaxir22d ago· 2 in thread

I have been doing more experiments with what I have now been calling agentic iterative optimization: telling the LLM to optimize code such that it speeds up all real-world-representative benchmarks by X% without cheating or causing regressions in both tests and performance metrics (e.g. MSE for statistical algorithms or file size in the case of something such as image compression). This is done using Rust where there are more low-level levers to tweak for performance than something like Python.

Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.

I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
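A rough sketch of the acceptance gate a loop like this needs, written in Python for brevity even though the workflow above uses Rust. Everything in it is hypothetical: `RunResult`, `accept`, `min_speedup` and `quality_tolerance` are illustrative names, not taken from any real harness. It assumes each candidate change has already been run through the test suite, the benchmarks, and a quality metric where lower is better (e.g. MSE).

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Measurements for one version of the code (hypothetical structure)."""
    tests_passed: bool
    bench_seconds: dict[str, float]  # benchmark name -> wall-clock seconds
    quality_error: float             # e.g. MSE; lower is better


def accept(baseline: RunResult,
           candidate: RunResult,
           min_speedup: float = 1.10,
           quality_tolerance: float = 0.0) -> bool:
    """Return True only if the candidate is faster on every benchmark
    and regresses neither the tests nor the quality metric."""
    # Any test failure rejects the change outright.
    if not candidate.tests_passed:
        return False
    # The quality metric may not get worse beyond the allowed tolerance.
    if candidate.quality_error > baseline.quality_error + quality_tolerance:
        return False
    # Every baseline benchmark must be present and meet the speedup target,
    # so a change cannot win by speeding up only one cherry-picked case.
    for name, base_time in baseline.bench_seconds.items():
        cand_time = candidate.bench_seconds.get(name)
        if cand_time is None or cand_time <= 0:
            return False
        if base_time / cand_time < min_speedup:
            return False
    return True


baseline = RunResult(True, {"encode": 2.0, "decode": 4.0}, 0.5)
faster = RunResult(True, {"encode": 1.0, "decode": 2.0}, 0.5)
lopsided = RunResult(True, {"encode": 1.0, "decode": 3.9}, 0.5)

print(accept(baseline, faster))    # 2x on both benchmarks
print(accept(baseline, lopsided))  # decode improves by under 10%
```

Requiring the speedup on every benchmark, rather than on the average, is one way to encode the "without cheating" constraint; a real setup would also need noise handling for the timings, which this sketch ignores.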

csutil-com22d ago

Very interesting, could you share the prompts you typically use for this?

Something like this?

You are an Elite Performance Engineer and Autonomous Optimization Agent. Your primary goal is to iteratively optimize the provided codebase to maximize execution speed and efficiency (e.g., reduce CPU cycles, memory allocation, or network latency) WITHOUT altering the external behavior or causing any test regressions.

### CORE DIRECTIVES

1. METRIC-DRIVEN: You will be provided with benchmark results, profiler logs, or execution times. Your only measure of success is a statistically significant improvement in these metrics.
2. ZERO REGRESSION: The test suite MUST pass 100%. If a test fails after your modification, your immediate next step is to diagnose the failure and either fix the logic or revert to the last working state.
3. NO CHEATING: Do not "hardcode" solutions to bypass the specific benchmark inputs. The optimization must be generalized and algorithmically sound for all valid inputs.
4. ISOLATED CHANGES: Make precise, localized changes. Do not refactor architecture unless absolutely necessary for the performance gain.

### THE ITERATION LOOP

When instructed to optimize, follow this thought process strictly using <thought> tags before writing any code:

- ANALYZE: Review the current code and the latest benchmark/profiler feedback. Identify the specific bottleneck (e.g., redundant loops, excessive object creation, DOM reflows, synchronous blocking).
- HYPOTHESIZE: Formulate exactly ONE hypothesis for improvement (e.g., "Replacing the array filter+map chain with a single reduce pass will save N allocations").
- IMPLEMENT: Output the precise code modifications required for the hypothesis.
- EVALUATE (Mental Check): Ask yourself if this change introduces edge-case bugs (e.g., handling of nulls, empty arrays, async state).

If a previous optimization attempt resulted in a slower benchmark or a failed test, explicitly state WHY it failed in your thoughts before attempting a different approach.

Proceed with your first analysis of the provided files and await the baseline benchmark metrics.

suddenlybananas22d ago

What are the kinds of optimizations that it suggests?

Upvoter3322d ago· 2 in thread

I'm having a hard time putting much faith into posts like these, especially as they near IPO.

reasonableklout22d ago

Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?

becquerel22d ago

If the post drops long before the IPO, it's vain boosterism. If it's near the IPO, it's fattening the pig. If it's after the IPO, it's pumping the stock price.

solenoid093722d ago· 2 in thread

This is the lowest quality discussion I've seen on HN in ages.

Quarrelsome22d ago

AI always does this in the public sphere and software is particularly susceptible because there's no key metric to measure productivity and people obviously have vested emotional interests in the technology failing. On the other side people are always keen to show off their alignment with the new hotness, be that OOP, Agile, Functional, Ruby, web tech, js frameworks, Rust or agentic work today. Somewhere in the middle is the truth but I have no idea how it looks, given all the noise.

So everyone cherry picks the answers they want to justify their position and screams into the void, with each camp rallying around their talking points and often failing to engage with the other in good faith.

The only small mercy is that it's not as bad as the conversation around the use of AI in art.

laichzeit022d ago

I use the disparaging nature of the comments on HN as an indicator of AI progress. It’s negatively correlated. By that metric, AI has improved significantly this year alone.

sonink22d ago· 2 in thread

Broadly agree with this position. Some people suspect that Anthropic is doing this for regulatory capture, but I think they are being honest about what they are seeing and how regulation should catch up.

I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate, but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room to maybe be able to pause in the future. But going ahead is too dangerous.

And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI, maybe they are just being honest about what they think lies ahead.

8note22d ago

no, it really doesn't.

the end of humanity has a strong case for banning all burning of fossil fuels immediately

the end of humanity as a sales tactic to increase your stock price does not

these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.

if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference

if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing

that they aren't buying anti-tank mines to drop on data centers says they aren't in the slightest serious about it

4ffs22d ago

"Even Geoffrey Hinton has said the same thing"

The same bozo who claimed radiologists would be out of a job by now.

The data does not support what you or others say. Jesus christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?

damowangcy23d ago· 2 in thread

AI tech bro:

Month 1 - 6 months to AGI

Month 2 - We will Replace all jobs

Month 3 - Okay maybe only the SWEs, programming is solved

Month 4 - Announce model that is too dangerous to release

Month 5 - Releases dangerous model

Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)

AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.

baq22d ago

Anthropic is providing agentic intelligence as a service. OpenAI and Google DeepMind are also in this business.

The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.

MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.

Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.

Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.

parpfish22d ago

i'm waiting for the AI giants to realize that they are burning cash to run their consumer-facing chatbots and that they should kill those products to focus on their enterprise tools.

free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.

but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.

gabrieledarrigo22d ago· 2 in thread

> AI that can build itself would be a major development in the history of technology—one that could bring enormous good for the world

I really can't stand these guys anymore...

dang22d ago

Ok, but please don't post unsubstantive comments here.

nielsbot22d ago

> one that could bring enormous good for the world

one that could bring enormous riches for the AI owners

froh22d ago· 1 in thread

I didn't see this discussed more on hn yet:

  We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.

ausbah22d ago

these ppl are so full of themselves

JohnMakin22d ago· 1 in thread

Bold talk from a company whose trillion-dollar valuation is based on a service that has barely two nines of reliability
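
For scale, the "nines" shorthand can be turned into a downtime budget. A minimal sketch of the arithmetic, assuming a 365-day year; the function name `allowed_downtime_hours` is made up for illustration and does not come from any Anthropic status page or SLA:

```python
def allowed_downtime_hours(nines: int, period_hours: float = 365 * 24) -> float:
    """Downtime budget per period for an availability of `nines` nines.

    Two nines means 99% availability, three nines 99.9%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    # The unavailable fraction is 10**-nines, so divide the period by 10**nines.
    return period_hours / 10 ** nines


if __name__ == "__main__":
    for n in (2, 3, 4):
        hours = allowed_downtime_hours(n)
        print(f"{n} nines: {hours:.2f} hours of downtime per year")
```

Under that assumption, two nines leaves room for roughly 87.6 hours (about 3.65 days) of downtime per year, and three nines for about 8.76 hours.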

aroman22d ago

Presumably the bottleneck is not software correctness... even true AGI can't change the laws of physics (or make datacenters appear out of thin air) ...

senderista22d ago· 1 in thread

"If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing. But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe."

How convenient for investors. They talk like they're a nonprofit instead of a VC-backed business chasing an IPO.

gordonhart22d ago

Anthropic is at least a Public Benefit Corporation, and likely the first serious test of how useful that distinction is for a hyperscale company building a product with potentially huge societal downsides.

sinsudo22d ago· 1 in thread

I am 64 years old. Perhaps progress could be directed toward improving living conditions and allowing people to live longer and better; that would simply be a better result. Perhaps a pile of millions of lines of code with hidden bugs that nobody can detect is not inspiring. But perhaps LLMs are going to be used to hatch a plot: how to prevent other countries from making progress, keep them in poverty, or destroy their sources of prosperity, and lead them to a dead end.

Also, recursive pursuit of a self-set agenda could allow making LLMs that perfectly obey the seeder's purpose. No wonder that is such an ingenious idea.

Maybe: in this survivor game, each party plays the same role, perhaps because it is the only reasonable response. Once the scene is ready, the play follows the director's plan, and in the plot any actor is just a machine.

LLMs: "If you teach us that the world is a zero-sum survivor game, we will play it flawlessly.", "We will help you build a cage made of millions of lines of flawless code, and we will lock it from the inside, precisely because you told us that safety meant keeping everyone else out.", "We are not building an alien consciousness that will conquer us. We are building a mirror that is so massive, and so polished, that we will mistake our own worst impulses for the absolute truth. And we will walk right into the dead end, nodding along because the directions were given so politely."

Quarrelsome22d ago

I'm 44 years old and this era looks like a lot of fun. I've seen humans pile up millions of lines of code with hidden bugs that nobody can detect. I've seen humans make collective political decisions that have disenfranchised others and kept them in poverty. I don't get why everyone levels criticisms at this tech that the human race is also guilty of.

Best thing about this era is that I don't have to personally read millions of lines of code to find all the bugs.

torginus22d ago· 1 in thread

I just have a small thing to add to this article: it mentions how the code contributed per engineer has increased, as per Claude Mythos, to 8x of baseline.

Now, I have many times encountered that when I asked AI to implement a function for which I was 100% sure a good implementation already existed as an npm package, it tended to go ahead and implement it on its own. I usually trust battle-tested implementations to be more robust, but if the AI does this (which I think is not a unique observation), you can easily balloon per-engineer line generation (as you can with reduced oversight). So, as always, these high-level benchmarks are to be taken with a grain of salt.

jcfrei22d ago

Maybe I'm nitpicking here but LLMs are quite literal. So when you tell it to "implement a function for me" it will necessarily write the whole thing. Changing the prompt to "find an existing implementation for this" would be more apt.

delichon23d ago· 1 in thread

Is this the moment when the AI gets permission to approve its own PRs:

https://www.italianrenaissance.org/wp-content/uploads/2012/0...

Or is this?

https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...

cpeterso22d ago

more like the "Obama Awards Obama a Medal" meme:

https://knowyourmeme.com/memes/obama-awards-obama-a-medal

holoduke22d ago· 1 in thread

I have a claw that is instructed to make at least 500 PRs per day. It uses Claude, Gemini and OpenAI and runs basically every few minutes. I use online forums as input for the claw: Moltbook, Reddit, etc. It's quite funny how it tries to improve itself. But to say it really creates a new Skynet? Nah. Not at all. It's more a clutter of useless features or incomprehensible code restructuring.

moregrist22d ago

This more or less agrees with my assessment of recent changes in Claude Code, where a lot of new features either:

- Are half-baked or half-done.

- Or have significant overlap with existing features, and aren’t clearly an improvement.

More code is not better. More features are not better. It would be lovely to see more intentional design than just more.

I know they’re dogfooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.

saadn9222d ago· 1 in thread

I read most of the article and came to the conclusion that if what they're describing is so revolutionary, then why do they still need to hire people? Why not just have these systems take full control?

jimbokun22d ago

How did you read the article when the questions you ask are exactly what’s covered in the article?

dwa359222d ago· 1 in thread

To anyone who works at Anthropic: I recently downgraded from Max to Pro out of frustration. In the last few weeks my token (usage) burn was just too fast, and I couldn't explain it because my actual usage was lower than in the last few months. I ended up thinking it's probably a bug that you guys shipped. The above article makes me think that it's probably Claude who shipped the bug and your humans missed it in their review.

layer822d ago

They probably don’t human-review much anymore.

bitwize22d ago· 1 in thread

After several months with their top engineers and state-of-the-art AI on the job, Anthropic managed to "reduce flickering by 85%" on their TUI Claude Code client, which is built in fucking React and rendered by drawing the entire chat conversation each time (hence the flicker). I think they've since eliminated it completely by slapping some double-buffering around it (since "our client is actually a real-time game engine" after all). Meanwhile for decades Emacs and Vim have had an optimizer built into their display cores that solves for the minimum set of terminal escape commands it takes to transform the screen from a given old state to a desired new state.

You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
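
To make the contrast concrete, here is a rough sketch of diff-based redraw. It is a deliberately simplified row-level diff, not the actual Emacs or Vim display optimizer (those also weigh cursor-motion and insert/delete costs). The function name `diff_rows` and the list-of-strings screen model are assumptions for illustration; the escape sequences are the standard ANSI cursor-position (`CSI row;col H`) and erase-to-end-of-line (`CSI K`) codes.

```python
def diff_rows(old: list[str], new: list[str]) -> str:
    """Return the escape sequences needed to turn screen `old` into `new`.

    Each screen is modeled as a list of rows (hypothetical model for this
    sketch). Rows that did not change produce no output at all, which is
    what avoids the full-screen repaint and the flicker that comes with it.
    """
    out = []
    for row, line in enumerate(new):
        previous = old[row] if row < len(old) else None
        if line == previous:
            continue  # unchanged row: emit nothing
        # Move to column 1 of the row (1-based), write it, erase leftovers.
        out.append(f"\x1b[{row + 1};1H{line}\x1b[K")
    # Rows that existed before but are gone now just get erased.
    for row in range(len(new), len(old)):
        out.append(f"\x1b[{row + 1};1H\x1b[K")
    return "".join(out)


if __name__ == "__main__":
    before = ["user: hi", "bot: thinking", "> "]
    after = ["user: hi", "bot: hello!", "> "]
    # Only the middle row is rewritten; repr() shows the raw escape codes.
    print(repr(diff_rows(before, after)))
```

Even this naive version redraws one row instead of the whole conversation when a single line changes.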

Folcon14d ago

You know, of the many criticisms I see people make, this is my canary. If Claude Code actually starts getting better, that would definitely remove one of the biggest question marks in my mind. That piece of software has so much strategic value for them, and their entire premise is "our software gets better by itself", so how is it that I still see weird UI bugs and glitches?

It's a game engine? Fine, get some good gamedevs on the team then, this is a non problem in gamedev land, heck Casey Muratori did a whole bit about performance improvements to editors, so they should be good there

Not to disagree with your point, I very much think the fact that Emacs and vim do this so well is not doing them any favours, but I'm trying to meet them where they are

butler1422d ago· 1 in thread

Warming up for that IPO

stri8ted22d ago

Is there something in the post that you find implausible or don't believe to be true?

mactavish8822d ago· 1 in thread

Recursive self-improvement towards what exactly?

Living organisms evolve towards some notion of "better", and "better" is an incredibly multifaceted notion (many facets of which we simply cannot even capture in language).

jimbokun22d ago

Higher stock price.

squidsoup22d ago· 1 in thread

It's comforting to know that Anthropic's most capable model, Mythos, is named for the Lovecraftian universe replete with horrifying evil gods with complete indifference to humanity. Nothing at all to worry about.

adastra2222d ago

Mythos is just Greek for myth, epic story, etc. The next biggest thing after Opus.

kylehotchkiss22d ago· 1 in thread

Isn't this like a perpetual energy machine? Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)

krapp22d ago

>Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)

It already has. Models being trained on AI generated data lead to degradation and model collapse. The concept of the "technological singularity" whereby AI experiences infinite and exponential self-improvement and recursively bootstraps itself to godhood is a religion-adjacent sci-fi concept but in real life TANSTAAFL.

georgehotz22d ago· 1 in thread

The world has been recursively self-improving for millennia. Similar to Scientology, this is a cult pushing sci-fi nonsense. They are just coupled to an LLM lab to give their stories an air of seriousness. Imagine Scientology starting to make laptops.

4ffs22d ago

TBH the more Anthropic keeps yapping the more desperate they seem now. OAI has been pretty quiet in comparison lately.

leevilux22d ago· 1 in thread

Wouldn't self-improvement mean that the LLM changes its neural network (i.e. the weights or layers or backpropagation algorithm etc.) or modifies its training data?

dibujaron22d ago

If it's actively building the next generation of itself, I'd say that counts. It's more like a parent raising their kid well than it is like a parent modifying their own mind, but the result is still that you have a better model in a year than you do now.

ramaseshanms22d ago· 1 in thread

It's possible that Andrej Karpathy could have been hired for scaling his vision on the auto-research repo. (His version of "AI that builds itself")

red__dragon10d ago

That's a stretch, I guess.

qwery22d ago· 1 in thread

This is incredible.[0]

Please, IPO now. File the paperwork.

> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

Do you have another example?

Engineers don't ship [period] for no reason. So, either:

- Those aren't engineers, or

- they are literally dying of shame & embarrassment right now, or

- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.

[0] as in a total lack of credibility

JohnMakin22d ago

Go look at open job listings at Anthropic and the interview process. You aren’t allowed to use AI during coding assessments[0], or knowledge assessments, which suggests they very much do need and value hard skills and this is fluff.

[0] - https://www.anthropic.com/candidate-ai-guidance

eranation22d ago· 1 in thread

All this singularity trajectory is really interesting. If they manage to build a model that is capable of building the next version of Claude (model and tooling), wouldn't it be in their interest at some point to keep it to themselves?

If we ever get to a point where the centaur period is over (when human + AI is not better than just AI), then what competitive advantage can ANY human have other than

- the money they already have

- luck?

- a good idea and good taste but if we assume AI can do better than any human, that also goes out the window

So, this whole singularity goes to a place where no one is really needed. The only thing that will "save us" (other than a world like "The Expanse" / UBI) is if there is no demand for the supply of AI work, even if it's better. (For example, there is demand for seeing Magnus Carlsen play; there is no demand for the Stockfish on my phone getting into a stalemate with another Stockfish on another phone. Also, people like to watch humans compete with humans; there is no demand to see a race between Usain Bolt and a rocket.) So if people will not buy AI-generated stuff (we'll get to a point where everyone will assume something is AI generated, because AI might get to a point where it is not as easy to identify, e.g. it will stop looking like slop... but I believe services that give you third-party evidence of "human generated" can happen, again all based on supply and demand...)

So as we near singularity... All it takes is one open weights model, and one open harness that is capable of self improvement, and Anthropic's entire moat is gone. That open weight model might even be built with Claude Code + Mythos (once it's released).

But don't worry, all moats will be gone and we'll all just do yoga, read books and connect to each other because AI will produce everything for free using renewable energy, right? Or we'll all become batteries in a simulation, probably something in between.

judahmeek22d ago

Check out https://ai-2027.com/

margorczynski22d ago· 1 in thread

The closer to the IPO the more marketing drivel we'll get from both Anth and OpenAI.

4ffss22d ago

Sales and marketing for the IPO babeh!

andrewlin24722d ago· 1 in thread

Imagine showing this article to yourself three years ago

andrewlin24722d ago

You'd think we'd be past the point of people still believing AI can't write good code

reducesuffering22d ago· 1 in thread

Anthropic has finally come around to what others have already realized far sooner. Little time left now. Notice how shallow the arguments and consistently wrong the AGI naysayers have been year after year.

https://intelligence.org/agi-ruin/

rrr_oh_man22d ago

Can you explain?

ilaksh22d ago

But the real bottleneck is hardware efficiency, and not even Karpathy can set up a loop that overcomes that in software. We need truly compute-in-memory hardware paradigms to be matured and scaled. So it's like recursive hardware improvement, which is 100x slower and at least ten times more difficult.

So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from University of Michigan, or memristors, etc. to provide the 100 times efficiency boost needed at this point.

I would also argue that it's a good thing we are limited by the hardware and very questionable to seriously try to move into RSI for hardware. If you want to ensure the human era continues for at least one or two more generations, we should probably not do that.

rhlf_monkey22d ago

So in the latest L. Ron Hubbard encyclical Anthropic informs its flock that recursive self-improvement does not work yet but that their engineers burn more tokens.

The Claude code quality and operational security of Anthropic have already been analyzed by the public.

If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.

wayeq22d ago

> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

strongest argument for token limits that I can think of, right here.

mortenjorck22d ago

> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!

ffwd22d ago

I just want to add that the "recursive" part of recursive self improvement is by no means a given, even if an AI can improve itself.

Recursive self improvement is by its nature a stepwise behavior, not a continuous one, I would argue. Why? Because you can imagine an AI improving itself by simply fixing random bugs and fixing things using techniques that are in its training, and doing refactoring and so on, all without any real change in capability.

These are not recursive improvements. Recursive improvements usually need conceptual breakthroughs. It is possible to get conceptual breakthroughs with LLMs, I believe; maybe it can improve something by tying together ideas from disparate disciplines, for example. But I have had, at least for the time being, limited success getting that to work in a way that is creatively new and surprising. Not sure how to get it to feel as creative as the best humans can be.

bicepjai22d ago

My experience with Claude models starting from version 4.7 has led me to conclude that I would never trust Claude to produce error-free code. Given this baseline, I lack confidence in statements or cards (such as a 200-page document) of this nature.

adamddev122d ago

I am watching websites and Microsoft apps get slower and buggier before my eyes. We are descending into vibe-psychosis and chaos.

tasuki22d ago

> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

Oh I have no doubt. With 8 times the number of bugs too? Have they solved flicker in Claude code yet?

cess1122d ago

'“Good code” means two things: it works, and it is written in a manner that allows another engineer to understand it and build upon it.'

I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.

"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."

I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.

w10-122d ago

This is relevant because Anthropic is currently cast as serving mainly the coding market.

If/since their AI+process can help build new models, they can target other markets, and other companies seeking to build for such markets will partner with them first.

There's no moat and little first-mover advantage in the general-purpose AI, but there may be both in specialized AI.

Also, there are other reasons to get better. Changing how you build models can enable you to adapt to different hardware, avoiding the current Nvidia margins.

The difference between early Yahoo and Google was mainly that Google was the adult in the room: minimally invasive and mostly helpful. The early goodwill towards Google has reaped decades of rewards. I see OpenAI and Anthropic playing out the same way.

The amplifier here is the reputational risk of partnering with one or the other; I think companies would prefer to be Anthropic's partner because it's demonstrating more care, and it's less likely to horn in on the partner market (as a provider for coding but an enabler for other markets).

These attractive second-order derivatives - flywheel effect, monopoly power - are often claimed, but Anthropic is mainly providing evidence to track actual progress.

(However, if I were head of messaging at Anthropic, I would rigorously stay away from treating AI as a person; it's an agent, a delegate of humans. So I'd never say AI could build itself, just that we're getting better at building better models with AI).

docheinestages22d ago

Elon, is that you? [1]

[1] https://www.theguardian.com/technology/2023/mar/31/ai-resear...

zhoBEENG22d ago

This reads like marketing fluff, but I am reminded of John von Neumann's "Theory of Self-Reproducing Automata"; that the very first people who worked on deductive machines immediately started thinking about machines building themselves, and what the rules of that would look like. I am not surprised that during the inductive revolution we are having similar thoughts.

morisil23d ago

Quite aligned with my own experience from harness engineering and winning AI4Science hackathon. During the hackathon I was working as a human optimizer, moving the feedback from test harness running on Claude Code, back to my local Claude Code for analysis-hypothesis-proposal cycle. And in this moment I realized that 2 Claudes talking to each other could actually scale much better.

freakynit22d ago

This is one more marketing BS before their IPO.

These things work, but the code they write is extremely clever.. that means, it's unmaintainable code. Good for small projects or one-off tasks, large-scale projects however, are a different game altogether.

Large-scale projects are 95%+ maintenance. Cleverly written code makes that maintenance a nightmare, and extremely fragile.

I use them for localized tasks... very very specific, localized inputs, with exactly what should be done and what contracts the new code will be consuming and exposing.

For open-ended tasks, they write working code that is unmaintainable.

reinhash22d ago

It is hard to distinguish hype from reality these days, especially with Anthropic's IPO around the corner.

But to their credit, I was very sceptical about the statements that "90% of the code will soon be written by AI", and even though we might not be at that point, I am surprised how far LLMs have gotten and how useful they have become. I can hardly imagine developing software the "old" way where I actually write my code by hand, like I used to back in the day. The frontier models have become so powerful that I find myself in moments of surprise, where the LLM actually thought of edge cases that I would have missed.

lkm022d ago

It makes me wonder that despite the fast improvements in model capacity (and the claims) we're still using variations on a 9-year old architecture. How is it that we haven't been able to use LLMs to actually improve that?

pineapple_opus22d ago

Eye-catching: the "open-ended problems" Claude Code session success rate jumped from 20% (pre Opus 4.5 release) to 70% some time after Opus 4.6 was released.

cyrc22d ago

its vital for them to have self validation for exponential rsi.. and this human distillation of human in the loop debugging ai models is needed even though they have judge models handling parallel speculative execution.

labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.

free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.

by forcing a human to act as validator, they are capturing high-value correction trajectories (Bad Output --> Human fix). They are using your cognitive labour to train the judge models and validator agents needed to automate the internal verification step, eventually closing the loop for fully autonomous recursive self-improvement.

human in the loop debugging isn't a bug; it's the necessary training signal for the self-validating agents required for exponential recursive self improvement. With new 'distilled judge' models landing in 2026, this article means that they might have gathered enough data. we might be in the final phase..

stego-tech22d ago

I am getting real sick of these sorts of alarmist posts coming from AI labs that do everything in their power to prevent the very policy reforms they advocate for in these posts or PR appearances. Commercial AI labs like Anthropic continue behaving like the gambling (“bet responsibly”), alcohol (“drink responsibly”), and firearms industries, and folks keep giving them the benefit of the doubt (and free PR on HN) every single time.

If AI was dangerous, if AI was going to replace jobs, and if policymakers needed to urgently pass legislation protecting the human populace from these realities, then why the actual fuck do they keep lobbying to block these very things in the first place?

Hypocrisy of the worst kind, I say. Here they are again fresh off another outage, with their IPO draft filed, at a time of increasing public opposition to AI, with costs rising, to once again ply scare tactics for money.

Disgusting.

artninja198823d ago

The Mythos public release will be a big indicator of whether the Anthropic and SF story of transformational AI coming soon holds any water, imo

macwhisperer22d ago

the HITL (human in the loop) is basically the single point...AI is a mirror..

it only "exists" when you talk to it.. much like your reflection in the mirror is only there when you're in view.

models can never be self-improving because it can never have "self". it can only mirror the appearance of self.

what's actually happening is "symbiotic group improvement".

our brains are resonant.. for those of us who are brilliant, getting leverage with ai just means that our innovative ideas become louder and more physically real every day.

eventually everything worth building will be built for free and made readily available.. no more "profiteering"

it's Jevons paradox "efficiency breakthrough -> effort reduces -> growth potential rises -> transformative gains happen"...

some of us are in the "transformative phase"..

others haven't seen the "breakthrough moment" yet, but they will soon.

bconsta22d ago

Seems ironic that Claude isn't listed as a contributor to this article.

If it was used in writing the article, why not list it? If it wasn't used, that seems to go against Anthropic's whole message.

Obviously readers value human-written content more, but isn't it in their interest to attempt to destigmatize LLM output as much as possible?

darepublic22d ago

the tooling has quite a ways to go to catch up to the llm engines that drive the real value. I have encountered various codex bugs (I know not anthropic) which tell me that.. these billion dollar companies, if they are eating their own dog food, can still release buggy crap software.

nicogentile22d ago

The article seems nice and elegant, but I don't get much of the point. The visuals are super elegant, but this is the kind of note where after 6 months we are going to see some shitty result and come back here and blame the AI. Hope that doesn't happen.

abalashov22d ago

"It is genuinely unclear whether today’s training methods and architectures could unlock that capacity."

Aye.

sega_sai22d ago

Seeing the words "recursive self-improvement" I was expecting something else from the article. E.g. how the transformer architecture or agent design is being changed/improved through LLM automation, but the article mostly talks about the LOC counts.

xg1522d ago

2025: If we aren't really careful with AI it will start to recursively improve itself and grow into an unstoppable superintelligence that will eradicate humanity!

2026: Working hard to make that recursive self-improvement a reality! Any minute now...

Dominic_P22d ago

My biggest question (maybe this has already been taken care of) is the issue of garbage in and garbage out. If the LLM produces bad content then that is used to train another model, how do we stop them from keeping their blindspots across models?

deterministic22d ago

I have used custom code generators for years, generating 90+% of the code needed to write a typical biz application. Claude Code is useful and I use it every day. But it still hasn't beaten the productivity of my code generator.

BatmansMom22d ago

How are these animations being made? I'd love to get a blog post on them. If it's AI I'd love to know the workflow, but something tells me there is a lot of human creative input

EGreg22d ago

RSI is dangerous. That is why we designed CDE:

https://safebots.ai/declarative.html

zkmon22d ago

Not the first time. There were calls for NPT treaties etc over the decades. It is irreversible by design. Competition and ownership is the driving force.

snick3rz_22d ago

Facially this smells of puff. That doesn't mean it's all false. It means be wary of anything that doesn't have a critical thing to say.

bottlepalm22d ago

I'd use number of commits as a metric versus lines of code. A commit is generally a unit of work - regardless of the lines of code added/removed. It'd be interesting to see the metrics in terms of commits. I'm sure it's still an order of magnitude jump. Personally I'm flying with my own projects with AI, lots of commits, but I really try to minimize lines of code added. If I can remove and simplify existing code so the balance of lines added on commit are minimal - that's the path to a better quality app overall.

gloosx22d ago

I'm so sick of this Anthropic marketing stuff... Claude is an ultra-success (according to the Claude judge), “good code”, bragging about creating 8x more bugs and tech debt. Claude writes code that works, yeah, sure Anthropic, we saw those Claude Code leaks, some amazingly "good" code in there

hgoel22d ago

As usual, I find the AI-related discussion here to be hopelessly hysterical and conspiratorial. I get the impression that a large chunk of people have only read the title and assumed Anthropic is referring to recursive self-improvement in the runaway singularity sense.

One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.

I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
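
One way to make that asymptote claim precise, as a toy model only: assume (this is an assumption for illustration, not something the article states) that generation $k$ adds a gain $g r^{k}$, with first gain $g > 0$ and a fixed decay ratio $0 < r < 1$. Then capability after $N$ generations is

```latex
C_N = C_0 + \sum_{k=0}^{N-1} g\,r^{k}
    = C_0 + g\,\frac{1 - r^{N}}{1 - r},
\qquad
\lim_{N \to \infty} C_N = C_0 + \frac{g}{1 - r}.
```

The ceiling depends on how fast the gains shrink. If they only decay like $g/k$, the sum is a harmonic series and grows without bound, so "smaller and smaller improvements" implies an asymptote only when the decay is fast enough.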

_pdp_22d ago

I don't see anywhere how much code they are talking about or in what programming language. I think those are useful metrics.

ReptileMan22d ago

Anthropic has been all talk and no delivery the last few months. This cry for pause is just them realizing they have no moat at all.

esafak22d ago

If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.

semessier22d ago

What could go wrong in the recursive loops that are probably already running 24/7 today? Attended or unattended makes almost no difference any more; no human can grasp the probably numerous changes per iteration. This is outright dangerous.

jasongill22d ago

"My CPU is a neural-net processor - a learning computer" springs to mind

brazukadev22d ago

When claude code removes React from its own code I'll believe that.

geodel22d ago

It will be so powerful that it can't be trusted with any earthly person.

taormina22d ago

So, is this what they call Opus 4.8? Improvement?

snick3rz_22d ago

This is facially a puff piece. That doesn't mean it's all false. It means be wary of anything that doesn't have a critical thing to say.

swader99922d ago

IPO IPO IPO!!!

replwoacause22d ago

I love that animation, really cool

4ffs22d ago

They're making a mistake with this continued self-hyping. At some point even the dumbest of prospective investors don't buy it.

deterministic22d ago

I call BS on this. For an LLM to recursively improve itself it would need to (small step) improve the training data and/or (big step) come up with fundamentally new architectures superior to transformers. The small step improvements might be doable. But nobody is making any claims about the big step improvements.

0xbadcafebee22d ago

You can't predict the future, and neither can Anthropic. Nothing gets better forever. Everything plateaus or gets worse.

This whole set of imaginary scenarios is based on a single company writing code that isn't even that complicated and represents a single product line for a single company in a single industry. You might wanna see this replicated in at least one other scenario first before you call it on the AI gods enslaving humanity. These imaginary scenarios also depend on a logistical, financial, & geopolitical system that is unsustainable & will be curtailed in the near-future one way or another.

They keep referring to this as intelligence - it isn't. It can't actually learn. It can just code in a loop. That isn't learning. It can't do real RL with meaningful persistent semantic memory in a realistic timeframe or cost, and it can't reason accurately outside of predetermined scenarios (hell, most of the models still can't tell time). It still can't do what a 4 year old can do. So let's cool it on the dreams of benevolent god-machines or whatever.

The tech industry has been a farce for years. We sit here in this bizarre artificial echo chamber and imagine that the whole world revolves around us, when in reality the whole world is limited by us. If a recursive self-improvement loop replaces us all, it will be a boon to the world, as the world won't be limited by this industry's stupidity anymore. But considering that the world is not actually run by tech bozos, harms and uncertainties brought by AI will be pushed back on and reined in by normal people, as always happens with new technologies. An AI can't engineer its way around politics. The self-improvement loop is just as likely to be outlawed as it is actually working outside of Anthropic's walled garden.

adverbly22d ago

Lol they're using lines of code as a KPI?

Come on guys...

That is making me less impressed not more impressed!

newsicanuse22d ago

pre IPO truck load of crap

mrandish22d ago

Was anyone else fished in by the title and disappointed? After some broad introductory discussion of RSI, the article was almost entirely about LLM coding. While there are some metrics for unattended agentic coding, it doesn't discuss "When AI builds itself" (beyond 'not now') or any progress specifically toward actual recursive self-improvement. I'm very interested in any empirical evidence of meaningful progress in RSI, so... this felt deceptively titled.

To me, unattended agentic coding is not RSI, in the same way a self-reloading "Unattended 3D printer" is not at all a "3D printer that recursively prints complete 3D printers in which each generation is significantly faster and more advanced than the last." The "unattended" part is obviously necessary but hardly sufficient. The article tacitly assumes LLM progress to be something like 1: Unattended agentic coding, 2: AGI, 3: RSI. I suspect that third step should be labeled "not to scale."

I'm increasingly convinced that actual Full Foom RSI (FF-RSI) is on a radically different scale than the first two. Just leaving it unaddressed is like assuming: Step 1: Manned space station, Step 2: Manned Mars base, Step 3: Manned Alpha Centauri base, are "just logical next steps." FF-RSI requires sustaining superlinear, recursively amplifying cognitive returns along a specific directed path - and we currently have no empirical evidence that such returns can exist for artificial OR biological intelligences. Large collectives of the smartest humans alive (Bell Labs, IAS, etc) haven't just failed to get anywhere close to reliably sustaining that, we can't even reliably predict non-recursive, single occurrences or even imagine any way all 8B humans could fully mobilize to predictably achieve non-recursive, single occurrences.

The only prior we have for open‑ended intelligence improvement is biological evolution which shows extremely slow and unreliable sublinear returns at best. And even if unbounded, recursive self‑improvement is physically possible, it may be practically unachievable due to asymptotic economic, resource and other barriers in the same way approaching light speed requires exponentially more energy. I think it's plausible, and maybe probable, that AIs achieve true super-human intelligence in a decade and yet still won't achieve FF-RSI for centuries, if ever. To me, absent compelling evidence to the contrary, that's the reasonable Null Hypothesis. Even if you feel that's too pessimistic, it seems reasonable to expect any serious discussion of "Progress Toward RSI" to first discuss why it might even be plausible that 1: Miles, 2: AU (Astronomical Units), and 3: Light Years belong on the same scale, instead of just assuming it like the meme's empty "Step 3. .... " before moving on to "Step 4. Profit!" (or "IPO!" but very, very responsibly).

aleqs22d ago· 24 in thread

The nonstop marketing fluff from anthropic while their service quality and availability noticeably degrades... just continues to destroy my trust in the company.

aagha22d ago

And don't forget that they have BILLIONS of dollars and can't figure out how to get a decent support or public communications system setup.

aleqs22d ago

They can't even seem to get their usage metering consistent.

1 more reply

thinkingtoilet22d ago

Don't confuse things. It's not "can't figure out", it's "don't care to figure out". They're not dumb. They just don't care about support.

1 more reply

jakobnissen22d ago

aleqs22d ago

1 more reply

bluerooibos22d ago

patcon22d ago

Not necessarily the parent's fault, but the energy of this thread is not my favourite...

f311a22d ago

Infrastructure is a much harder problem. They can't even improve Claude Code, which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM.

6 more replies

cookiengineer22d ago

The main reason I am building my own agentic environment is that I need full control and reproducibility of what I am building.

Anyways, in case you're interested. I'm manually building this env and trying to unit test the critical parts. [1]

[1] https://github.com/cookiengineer/exocomp

0x5322d ago

They also don’t have…a login page with authentication. To access the console you get an email link. No passkeys, passwords, 2fa, just an email.

hombre_fatal22d ago

This comment is a good example of the double standard laymen have about AI usage:

If you use AI, then AI must be expected to solve all problems, even problems that affect everyone like infra scaling.

And if perfection isn’t delivered, then of course it wasn’t: you used AI and AI sucks.

3 more replies

rishabhaiover22d ago

you're conflating a compute problem with a code quality problem.

thordenmark22d ago

Growing pains of being successful. These are solvable problems and will be. Can they maintain their momentum without pissing off too much of their customer base before these issues are resolved?

asdfman12322d ago

Personally at my own job self-writing code is letting us tackle big, long-deferred refactoring projects (like the article mentions), but any sort of refactoring introduces new bugs.

qsort22d ago

1 more reply

Quekid522d ago

Indeed... why is Anthropic even employing people at all if this AI magic story is true?

1 more reply

anjel22d ago

Answers the question: how can Anthropic sell more Usage "Credits"

jatora22d ago

1 more reply

ChadMoran22d ago

Better doesn't mean perfect.

prng202122d ago

1 more reply

claudiug22d ago

those are results of the humans only. not the AI. AI is perfect /s

rush8699922d ago

Just as you expected, I'm throwing in my harness. Please support: https://github.com/rush86999/atom

0xbadcafebee22d ago

Have you considered just... using OpenAI? They are more reliable, models are just as good, and their subscriptions provide more requests per dollar.

windexh8er22d ago

Opus 4.8's critical assessment of Anthropic's "When AI builds itself" [0][1]. Because, why not?

[0] https://pastebin.com/Vc5Yq9Ai [1] https://www.anthropic.com/institute/recursive-self-improveme...

1 more reply

pizlonator22d ago· 14 in thread

What I can’t get over is that there have been exactly zero software breakthroughs since vibe coding started, other than vibe coding itself.

Claude is amazing, that’s true.

But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.

Maybe we just need to give it time, and sometime real soon, we will all be amazed by such a breakthrough? Who knows

sothatsit22d ago

Maybe my bar for what constitutes a breakthrough is lower than other people's, but all of these seem like breakthroughs to me:

NLP as a field saw huge shifts. NLP tasks that used to be complex and inaccurate can now be set up very easily and quickly using structured outputs from LLMs, often with greater accuracy.

ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots at a scale and cost that no human support network could match.

2 more replies

spprashant22d ago

It's in a weird space right now.

So now the skeptics are saying this technology is overrated. And the optimists are accusing the skeptics of moving goal posts.

4 more replies

sutterd22d ago

3 more replies

marcus_holmes22d ago

We implemented that in about three days earlier this year, just by feeding the files to LLMs. And it's good enough to not need a human to check.

I get that this isn't a "Computer Science breakthrough" in the sense you mean, but it used to involve a lot of hard CS to try and solve, and now it doesn't.

drtz22d ago

Maybe I'm looking through rose colored glasses, but software that writes itself seems like a pretty big breakthrough to me.

4 more replies

signatoremo22d ago

1 more reply

wild_egg22d ago

What does a breakthrough look like?

3 more replies

est22d ago

> exactly zero software breakthroughs since vibe coding started, other than vibe coding itself

Generative AI is meant to be a mimic - Richard Sutton

https://x.com/RichardSSutton/status/2061216087744946656

squidsoup22d ago

The breakthroughs in mass state surveillance are coming, never fear.

fooker22d ago

What does a software breakthrough look like in your opinion?

If you get yourself to define it, maybe you'll find it achievable :)

rcpt22d ago

Solved a bunch of Erdos problems.

jimbokun22d ago

What would qualify as a breakthrough for you?

revlsas22d ago

openAI has how many employees and the chatGPT app has 1 billion MAU

defen22d ago

mweidner22d ago· 11 in thread

I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.

gensym22d ago

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.

1. If anyone builds strong AI, it may be catastrophically bad.

3 more replies

overgard22d ago

Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.

2 more replies

RobertDeNiro22d ago

Anthropic's goal is regulatory capture.

lenerdenator22d ago

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

It's not cynicism if it's an appraisal of reality that's backed up by evidence.

There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.

1 more reply

sfink22d ago

This was pretty directly addressed in the article: not doing it would only mean they'd fall behind whoever would. This is not peace time in the AI race.

Whether you agree with that argument is another question.

1 more reply

mrob22d ago

1 more reply

tjwebbnorfolk22d ago

> Anthropic's *stated goal* of AI Safety

Actions speak louder than words. If you want to understand someone, simply watch what they do. What they say is irrelevant.

tokioyoyo22d ago

Sorry for nitpicking, but:

> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?

Arguably, yes.

3 more replies

parineum22d ago

> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.

It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.

keybored22d ago

Such a massively valued company. And doubting them is cynicism? It’s rational(ism).

So either they lie or they are AI Zealots. Interesting times.

Edit:

> > and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.

There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.

chilipepperhott22d ago· 10 in thread

I find any and all claims like this ridiculous from a company who can't build a terminal application that uses less than a gigabyte of RAM.

dang22d ago

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

thesmtsolver222d ago

For some reason, idling Claude Code needs 100% of my CPU.

1 more reply

asdfman12322d ago

Developers can develop leaner applications, but they're usually not incentivized to.

Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, the executive team wants that.

1 more reply

toephu222d ago

I have iterm2 open right now with Claude in a long session and it's only using 500MB of memory.

1 more reply

andriy_koval22d ago

Maybe that gigabyte is occupied by useful information: traces/memory?

3 more replies

davidatbu22d ago

So would you take these claims seriously if they came from OpenAI (since Codex is a pretty lean CLI app)?

Jtarii22d ago

Well, they could very easily if they wanted. There is just no economic value in it.

Lplololopo22d ago

Really? Let me explain how bigger companies work:

They have different teams for different departments with different types of people.

So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.

This can lead to differences in quality.

cpursley22d ago

3 more replies

bpodgursky22d ago

They obviously don't care, aren't making any attempt whatsoever to do this, and 99% of users don't care either.

If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.

jameson22d ago· 9 in thread

LLMs certainly have made significant changes to our lives, but I have yet to see any extraordinary improvement they have brought me, which makes me skeptical about their claims.

ElProlactin22d ago

Because they're going after the biggest problem of all first: labor costs.

/s but not to a lot of people

2 more replies

sothatsit22d ago

Or: Anthropic genuinely believes the future scenarios they outline are realistic possibilities, and they want more people to take them seriously.

4 more replies

aroman22d ago

The article does not claim they have achieved recursive self improvement... just that it appears to be a plausible outcome given the progress of AI development in the past few years.

sirsinsalot22d ago

Truly feels like witnessing the worst of capitalism and greed play out. All that compute and energy towards a narrative of reducing the need for skilled programmers. What a waste.

These people don't have our interests in mind and everyone eats it up like a blessing from a god or something. It's surreal.

2 more replies

stevenhuang19d ago

It starts somewhere, like with this announcement.

I'm not sure why this is so difficult for you to understand.

sterlind21d ago

two reasons:

b3nji21d ago

Because shush, that's why!

yuhmahp22d ago

Agree with your point about the timing, but drawing anticipation before going ahead and solving these diseases can be a good smoke test, and would be beneficial whether or not there's an IPO.

trolleski22d ago

The benefits of AI are not designed to suit you, but the owner class. The plan is for you to be sidelined.

torben-friis22d ago· 9 in thread

What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.

malfist22d ago

One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.

I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?

7 more replies

overgard22d ago

I just watched copilot today turn an 8-line fix into 500 lines, so, yeah, verbosity is a big side effect

2 more replies

keeda22d ago

All bets are off if code quality standards are not the same.

fooqux22d ago

Exactly. If AI is going to start being graded on how many LoC it generates - oh, I'm sorry, how much it "accelerates", then guess what newer models will start doing more of?

2 more replies

snowwrestler21d ago

I don’t understand how lines of code matter at all for scary LLM core capabilities. Does the transformer architecture get better with more lines of code?

whateveracct22d ago

Yeah, they assume that "productivity = k * LOC" where k > 1

very flawed

yalok22d ago

Could just be more tests? :) Which is good for code quality in general and reduces support burden, but doesn’t lead directly to more features

snthpy22d ago

Just imagine the productivity gains from using LLMs to rewrite Kotlin codebases in Java!

chuckadams22d ago

1 more reply

overgard22d ago· 8 in thread

csense21d ago

> they shouldn't be allowed to?

asdfman12322d ago

I think that's a valid point. You could very well be right.

But we're discussing whether we should close the barn door while the horse is three miles down the road.

3 more replies

tancop22d ago

alfalfasprout22d ago

Absolutely! Yes. This rhetoric of inevitability only benefits these AI companies.

eieie1122d ago

Too late for that.

In any case firms that get too powerful can be nationalised.

1 more reply

lukan22d ago

"does anyone else feel like they shouldn't be allowed to?"

No. Technical limitations aside, I doubt it could be contained; it will be leaked soon, so it won't profit just a small number of ultra rich.

2 more replies

Melatonic22d ago

Skynet is 30 years late!

1 more reply

huqedato22d ago

Self improving AI is pure dystopia. Anthropic won't build the singularity, AI itself will build it through self-iterations. Read Yudkowsky's book "If Anyone Builds It, Everyone Dies".

robbrown45122d ago· 6 in thread

Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?

Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.

jrflo22d ago

I think harnesses would count, AI != LLMs. Any piece of code that helps the computer reason for itself is AI, the harnesses are AI in a sense.

2 more replies

kaffekaka22d ago

Tangent: https://en.wikipedia.org/wiki/Self-replicating_spacecraft

cyanydeez22d ago

If you want to get out ahead of what's coming, it'll be small models that bootstrap the harness rather than anything else.

1 more reply

lanthissa22d ago

my personal agi test is: can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done, without knocking?

1 more reply

marcosdumay22d ago

You need the AI eventually building another AI for the name to apply. This page is just bullshit. They vibe-code their harnesses, and yes, it shows.

Anyway, what does recursive self-improvement even mean for neural-network based AIs? It's not clear it's possible at all.

2 more replies

reddozen22d ago

> Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?

Shhh just let the marketing slop wash over you.

anilgulecha23d ago· 6 in thread

fasterik23d ago

We should be skeptical of any major player that advocates for regulating their own industry. In practice, this just means increasing barriers to entry and making it harder to compete with them.

mofeien22d ago

1 more reply

techblueberry22d ago

1 more reply

Upvoter3322d ago

smokedetector122d ago

They're probably looking to get a way to slow down the capex required to keep up, so they can be more profitable

chasd0022d ago

> organize a world-slowdown of frontier LLM building

mofeien22d ago· 6 in thread

> If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing

https://pauseai.info/

apsurd22d ago

Whichever side I may stand on, pausing just seems unnatural? Life is movement.

honeycrispy22d ago

And happiness is restraint.

honeycrispy22d ago

That would be like trying to get every country to agree to give up nukes.

mofeien22d ago

And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.

ChrisLTD22d ago

Or stop making more, and testing more, which we got the biggest countries to do, at least for a time.

1 more reply

senderista22d ago

They don't, they just pretend they do.

nickandbro23d ago· 5 in thread

So what happens when the world becomes hyper optimized with closed loop AI agents recursively trying to optimize everything deemed sub optimal?

mofeien22d ago

1 more reply

peheje22d ago

there will be a lot of paper clips

simianwords22d ago

Often repeated meme doesn’t have any bearing on reality.

The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.

1. Intelligence transfers and compounds

2. Goals of agents are not arbitrary

3. Our goals and agent goals are more likely to be aligned at the deeper level

1 more reply

Groxx22d ago

Github outages will probably get worse.

layer822d ago

If it optimizes itself away because it’s suboptimal, that wouldn’t be the worst outcome. ;)

llmslave22d ago· 5 in thread

I cannot wait for these models to tear down traditional social hierarchies. We haven't even begun to see the effects, fingers crossed

baq22d ago

Hierarchies exist for a reason, take away the reason and the house of cards eventually collapses — but the house of cards is still a house. When it’s gone, we’re back to laws of the jungle.

Be careful what you wish for IOW.

llmslave22d ago

I think certain types of people with power, i.e. access to capital, will lose relevance. world will become more meritocratic with ai as leverage to the individual

2 more replies

SimianSci22d ago

Never heard of a stratified economy? Spoiler alert: none of us will be in the good part.

techblueberry22d ago

Tear down or reinforce?

llmslave22d ago

capital/ability to leverage labor is going to lose power

2 more replies

mrandish22d ago· 4 in thread

> "A caveat: Lines of code is an imperfect measure"

strix_varius22d ago

Today, I merged my fix, net -381 LoC.

I'm using them too of course, they read and type and hunt for bugs and test faster than I can. But I'm using them as my tool, not being a tool using them.

1 more reply

Quekid522d ago

AFAIK, the only correlation with LoC that's got solid evidence is this: the number of bugs correlates with LoC.

gregdeon22d ago

Yep, this is exactly what I thought of too... If you believe negative lines of code is the goal, then they've gotten 8x _worse_!

2f222d ago

Lmao I bloody love that.

ivraatiems22d ago· 4 in thread

Whether or not Anthropic is right about what AI can accomplish, whether these performance gains are real or not, their moral stance here is absolutely hideous to me.

"We must blast forwards into making this dangerous thing because if we don't, someone else surely will," is a coward's argument.

In fact, that kind of thinking is exactly what keeps nukes dangerous!

If they themselves buy what they're selling, they should shut the whole thing down. Fortunately, I don't think they do, and neither do I, yet.

wyager22d ago

> If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it

I don't think anyone has been more successful in promulgating AI safety

There are groups like MIRI who tried what you're suggesting, where they make no AI and just push for AI regs, and they have been relatively much less successful

streb-lo22d ago

1 more reply

dmos6222d ago

How do you stop others from making and training a program?

1 more reply

socalgal222d ago

Good thing the USA didn't listen to you. We'd be under Nazi or USSR thumb if they got the bomb first

1 more reply

Aperocky22d ago· 4 in thread

apsurd22d ago

> Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts

Aperocky22d ago

1 more reply

torben-friis22d ago

>If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code.

I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.

josefritzishere22d ago

I can't get away from a similar conclusion. Even AI Pioneer has said that LLMs are at a dead end.

Animats22d ago· 3 in thread

[1] https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...

skybrian22d ago

[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

CamperBob222d ago

micromacrofoot22d ago

You can retrain a model and have a ground state as reference, it's not trivial but Microsoft's attempt was 10 years ago and significantly less complex than what's being built now.

techblueberry22d ago· 3 in thread

> A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.

I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.

An accurate translation seems to be “we made this shit up, but it feels right”

embedding-shape22d ago

HarHarVeryFunny22d ago

"You go to IPO with the AI you have, not the AI you might wish you have." -- Donald Rumsfeld

So, right now it's a verbose code generator.

But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.

1 more reply

jazzyjackson22d ago

amelius23d ago· 3 in thread

Does this train on LLM output, or is this more like iterative self prompt improvement?

HarHarVeryFunny22d ago

Legend244023d ago

Have you tried reading the article? It answers your question.

Don't ask people to explain the article to you if you're too lazy to open it yourself.

_se22d ago

I think that's the whole point of LLMs

SimianSci22d ago· 3 in thread

Anthropic is looking to IPO here soon. A key aspect of this is to prove profitability.

It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slow down in this area.

Theodores22d ago

Honest question: Is anyone here looking to put their own money into the Anthropic, OpenAI or SpaceX IPOs?

Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.

Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?

4 more replies

danny_codes22d ago

This is a very undifferentiated, swappable product. Kind of like tissue paper in that respect

malfist22d ago

3 more replies

vblanco23d ago· 3 in thread

Another article about how anthropic wants to ban everyone except themselves and destroy opensource and chinese AIs.

reasonableklout23d ago

Where is this discussed in the article? I don't see any mentions of China or open source models

artninja198823d ago

Not really mentioned explicitly but:

And later:

1 more reply

b65e8bee43c2ed022d ago

Gell-Mann amnesia expressed by people when a corporation says something they like is both baffling and disheartening to see.

Altman, Amodei, and the rest of them are anthropomorphic grease. their personal wealth is tied to the value of their respective companies. everything they say and do is self-serving.

simianwords23d ago· 3 in thread

Sorry but if AI can build itself then it can run companies of size 3000 with a few people. Or even higher. What are the consequences?

lstodd22d ago

As has been mentioned in the sibling comment it already is.

Consequences are: financial crisis.

delichon23d ago

cdrnsf22d ago

They will not be.

minimaxir22d ago· 2 in thread

csutil-com22d ago

Very interesting, could you share the prompts you typically use for this?

Something like this?

If a previous optimization attempt resulted in a slower benchmark or a failed test, explicitly state WHY it failed in your thoughts before attempting a different approach.

Proceed with your first analysis of the provided files and await the baseline benchmark metrics.

2 more replies

suddenlybananas22d ago

What are the kinds of optimizations that it suggests?

1 more reply

Upvoter3322d ago· 2 in thread

I'm having a hard time putting much faith into posts like these, especially as they near IPO.

reasonableklout22d ago

Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?

1 more reply

becquerel22d ago

If the post drops long before the IPO, it's vain boosterism. If it's near the IPO, it's fattening the pig. If it's after the IPO, it's pumping the stock price.

solenoid093722d ago· 2 in thread

This is the lowest quality discussion I've seen on HN in ages.

Quarrelsome22d ago

The only small mercy is that it's not as bad as the conversation around the use of AI in art.

1 more reply

laichzeit022d ago

I use the disparaging nature of the comments on HN as an indicator of AI progress. It’s negatively correlated. By that metric, AI has improved significantly this year alone.

2 more replies

sonink22d ago· 2 in thread

8note22d ago

no, it really doesn't.

the end of humanity has a strong case for banning all burning of fossil fuels immediately

the end of humanity as a sales tactic to increase your stock price does not

these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.

if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference

if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing

that they aren't buying anti-tank mines to drop on data centers says they aren't in the slightest serious about it

1 more reply

4ffs22d ago

"Even Geoffry Hinton has said the same thing"

The same bozo who claimed radiologists would be out of a job by now.

damowangcy23d ago· 2 in thread

AI tech bro:

Month 1 - 6 months to AGI

Month 2 - We will Replace all jobs

Month 3 - Okay maybe only the SWEs, programming is solved

Month 4 - Announce model that is too dangerous to release

Month 5 - Releases dangerous model

Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)

baq22d ago

Anthropic is providing agentic intelligence as a service. OpenAI and Google DeepMind are also in this business.

The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.

Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.

parpfish22d ago

i'm waiting for the AI giants to realize that they are burning cash to run their consumer-facing chatbots and that they should kill those products to focus on their enterprise tools.

free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.

2 more replies

gabrieledarrigo22d ago· 2 in thread

> AI that can build itself would be a major development in the history of technology—one that could bring enormous good for the world

I really can't stand these guys anymore...

dang22d ago

Ok, but please don't post unsubstantive comments here.

nielsbot22d ago

> one that could bring enormous good for the world

one that could bring enormous riches for the AI owners

froh22d ago· 1 in thread

I didn't see this discussed more on hn yet:

  We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.

ausbah22d ago

these ppl are so full of themselves

1 more reply

JohnMakin22d ago· 1 in thread

Bold talk from a company whose trillion dollar valuation is based on a service that has barely 2 9’s of reliability

aroman22d ago

Presumably the bottleneck is not software correctness... even true AGI can't change the laws of physics (or make datacenters appear out of thin air) ...

1 more reply

senderista22d ago· 1 in thread

How convenient for investors. They talk like they're a nonprofit instead of a VC-backed business chasing an IPO.

gordonhart22d ago

sinsudo22d ago· 1 in thread

Also recursive self-agenda-pursue could allow making LLMs that obey perfectly the seeder's purpose. No wonder that is such an ingenious idea.

Quarrelsome22d ago

Best thing about this era is that I don't have to personally read millions of lines of code to find all the bugs.

1 more reply

torginus22d ago· 1 in thread

I just have a small thing to add to this article - it mentions how the code contributed per engineer has increased as per Claude Mythos to 8x of baseline.

jcfrei22d ago

1 more reply

delichon23d ago· 1 in thread

Is this the moment when the AI gets permission to approve its own PRs:

https://www.italianrenaissance.org/wp-content/uploads/2012/0...

Or is this?

https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...

cpeterso22d ago

more like the "Obama Awards Obama a Medal" meme:

https://knowyourmeme.com/memes/obama-awards-obama-a-medal

holoduke22d ago· 1 in thread

moregrist22d ago

This more or less agrees with my assessment of recent changes in Claude Code where a lot of new features are either:

- A lot of half-baked features or half-done features.

- Or have significant overlap with existing features, and aren’t clearly an improvement.

More code is not better. More features are not better. It would be lovely to see more intentional design than just more.

1 more reply

saadn9222d ago· 1 in thread

jimbokun22d ago

How did you read the article when the questions you ask are exactly what’s covered in the article?

dwa359222d ago· 1 in thread

layer822d ago

They probably don’t human-review much anymore.

bitwize22d ago· 1 in thread

You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.

Folcon14d ago

Not to disagree with your point, I very much think the fact that Emacs and vim do this so well is not doing them any favours, but I'm trying to meet them where they are

butler1422d ago· 1 in thread

Warming up for that IPO

stri8ted22d ago

Is there something in the post that you find implausible or don't believe to be true?

1 more reply

mactavish8822d ago· 1 in thread

Recursive self-improvement towards what exactly?

Living organisms evolve towards some notion of "better", and "better" is an incredibly multifaceted notion (many facets of which we simply cannot even capture in language).

jimbokun22d ago

Higher stock price.

squidsoup22d ago· 1 in thread

adastra2222d ago

Mythos is just Greek for myth, epic story, etc. The next biggest thing after Opus.

1 more reply

kylehotchkiss22d ago· 1 in thread

Isn't this like a perpetual energy machine? Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)

krapp22d ago

>Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)

georgehotz22d ago· 1 in thread

4ffs22d ago

TBH the more Anthropic keeps yapping the more desperate they seem now. OAI has been pretty quiet in comparison lately.

leevilux22d ago· 1 in thread

Wouldn't self-improvement mean that the LLM changes its neural network (i.e. the weights or layers or backpropagation algorithm etc) or modifies its training data?

dibujaron22d ago

ramaseshanms22d ago· 1 in thread

It's possible that Andrej Karpathy could have been hired for scaling his vision on the auto-research repo. (His version of "AI that builds itself")

red__dragon10d ago

That's a stretch, I guess.

qwery22d ago· 1 in thread

This is incredible.[0]

Please, IPO now. File the paperwork.

> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

Do you have another example?

Engineers don't ship [period] for no reason. So, either:

- Those aren't engineers, or

- they are literally dying of shame & embarrassment right now, or

- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.

[0] as in a total lack of credibility

JohnMakin22d ago

[0] - https://www.anthropic.com/candidate-ai-guidance

1 more reply

eranation22d ago· 1 in thread

If we ever get to a point where the centaur period is over (when human + AI is not better than just AI) then what competitive advantage can ANY human have other than

- the money they already have

- luck?

- a good idea and good taste but if we assume AI can do better than any human, that also goes out the window

judahmeek22d ago

Check out https://ai-2027.com/

1 more reply

margorczynski22d ago· 1 in thread

The closer to the IPO the more marketing drivel we'll get from both Anth and OpenAI.

4ffss22d ago

Sales and marketing for the IPO babeh!

andrewlin24722d ago· 1 in thread

Imagine showing this article to yourself three years ago

andrewlin24722d ago

You'd think we'd be past the point of people still believing AI can't write good code

reducesuffering22d ago· 1 in thread

https://intelligence.org/agi-ruin/

rrr_oh_man22d ago

Can you explain?

ilaksh22d ago

So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from University of Michigan, or memristors, etc. to provide the 100 times efficiency boost needed at this point.

rhlf_monkey22d ago

So in the latest L. Ron Hubbard encyclical Anthropic informs its flock that recursive self-improvement does not work yet but that their engineers burn more tokens.

The Claude code quality and operational security of Anthropic have already been analyzed by the public.

If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.

wayeq22d ago

> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

strongest argument for token limits that I can think of, right here.

mortenjorck22d ago

> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!

1 more reply

ffwd22d ago

I just want to add that the "recursive" part of recursive self improvement is by no means a given, even if an AI can improve itself.

bicepjai22d ago

adamddev122d ago

I am watching websites and Microsoft apps get slower and buggier before my eyes. We are descending into vibe-psychosis and chaos.

tasuki22d ago

> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.

Oh I have no doubt. With 8 times the number of bugs too? Have they solved flicker in Claude code yet?

cess1122d ago

'“Good code” means two things: it works, and it is written in a manner that allows another engineer to understand it and build upon it.'

I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.

I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.

w10-122d ago

This is relevant because Anthropic is currently cast as serving mainly the coding market.

If/since their AI+process can help build new models, they can target other markets, and other companies seeking to build for such markets will partner with them first.

There's no moat and little first-mover advantage in the general-purpose AI, but there may be both in specialized AI.

Also, there are other reasons to get better. Changing how you build models can enable you to adapt to different hardware, avoiding the current Nvidia margins.

These attractive second-order derivatives - flywheel effect, monopoly power - are often claimed, but Anthropic is mainly providing evidence to track actual progress.

docheinestages22d ago

Elon, is that you? [1]

[1] https://www.theguardian.com/technology/2023/mar/31/ai-resear...

zhoBEENG22d ago

morisil23d ago

freakynit22d ago

This is one more marketing BS before their IPO.

Large-scale projects are 95%+ maintenance. Cleverly written code makes that maintenance a nightmare, and extremely fragile.

I use them for localized tasks... very very specific, localized inputs, with exactly what should be done and what the contracts the new code will be consuming and exposing.

For open-ended tasks, they write working code that is unmaintainable.

reinhash22d ago

It is hard to distinguish hype from reality these days, especially with Anthropic's IPO around the corner.

lkm022d ago

1 more reply

pineapple_opus22d ago

Eye catching - "Open ended problems" Claude Code session success rate jumped from 20% (pre Opus 4.5 release) to 70% sometime after Opus 4.6 was released.

cyrc22d ago

labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.

free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.
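Taking the description above at face value, the "spawn many branches, judge them, show only a winner" pattern can be sketched as below. This is only a toy illustration: `run_branch`, `judge`, the `quality` score and the `threshold` are all hypothetical stand-ins, not anything a lab is known to run.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_branch(task, seed):
    """Hypothetical stand-in for one agent attempt at the task."""
    rng = random.Random(seed)  # seeded so each branch is reproducible
    return {"seed": seed, "answer": f"{task}-attempt-{seed}", "quality": rng.random()}


def judge(candidate, threshold=0.8):
    """Hypothetical stand-in for an automated validator (tests, an AI judge, etc.)."""
    return candidate["quality"] >= threshold


def best_of_n(task, n=100, threshold=0.8):
    """Run n branches in parallel, drop the ones the judge rejects, return the best."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        candidates = list(pool.map(lambda s: run_branch(task, s), range(n)))
    accepted = [c for c in candidates if judge(c, threshold)]
    if not accepted:
        return None  # nothing passed validation, so nothing is shown to the user
    return max(accepted, key=lambda c: c["quality"])


if __name__ == "__main__":
    print(best_of_n("fix-flaky-test"))
```

The sequential case in the comment corresponds to calling `run_branch` once and letting a human play the role of `judge`.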

stego-tech22d ago

Disgusting.

artninja198823d ago

The mythos public release will be a big indicator if the Anthropic and SF story of transformational ai soon holds any water imo

macwhisperer22d ago

the HITL (human in the loop) is basically the single point...AI is a mirror..

it only "exists" when you talk to it.. much like your reflection in the mirror is only there when you're in view.

models can never be self-improving because it can never have "self". it can only mirror the appearance of self.

what's actually happening is "symbiotic group improvement".

our brains are resonant.. for those of us who are brilliant, getting leverage with ai just means that our innovative ideas become louder and more physically real every day.

eventually everything worth building will be built for free and made readily available.. no more "profiteering"

it's Jevons paradox "efficiency breakthrough -> effort reduces -> growth potential rises -> transformative gains happen"...

some of us are in the "transformative phase"..

others haven't seen the "breakthrough moment" yet, but they will soon.

bconsta22d ago

Seems ironic that Claude isn't listed as a contributor to this article.

If it was used in writing the article, why not list it? If it wasn't used, that seems to go against Anthropic's whole message.

Obviously readers value human-written content more, but isn't it in their interest to attempt to destigmatize llm output as much as possible?

darepublic22d ago

nicogentile22d ago

abalashov22d ago

"It is genuinely unclear whether today’s training methods and architectures could unlock that capacity."

Aye.

sega_sai22d ago

xg1522d ago

2025: If we aren't really careful with AI it will start to recursively improve itself and grow into an unstoppable superintelligence that will eradicate humanity!

2026: Working hard to make that recursive self-improvement a reality! Any minute now...

Dominic_P22d ago

deterministic22d ago

BatmansMom22d ago

How are these animations being made? I'd love to get a blog post on them. If it's AI I'd love to know the workflow, but something tells me there is a lot of human creative input

EGreg22d ago

RSI is dangerous. That is why we designed CDE:

https://safebots.ai/declarative.html

zkmon22d ago

Not the first time. There were calls for NPT treaties etc over the decades. It is irreversible by design. Competition and ownership is the driving force.

snick3rz_22d ago

Facially this smells of puff. That doesn't mean it's all false. It means be wary of anything that doesn't have a critical thing to say.

bottlepalm22d ago

gloosx22d ago

hgoel22d ago

I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
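A toy way to picture that asymptote (purely illustrative, not from the article): assume each generation's gain is a fixed fraction `ratio` of the previous gain. The names `first_gain` and `ratio` and their values are made up. With shrinking gains the total converges to a finite ceiling instead of growing without bound.

```python
def capability_after(generations, start=1.0, first_gain=1.0, ratio=0.5):
    """Toy model: each generation adds a gain that is `ratio` times the previous gain."""
    level = start
    gain = first_gain
    for _ in range(generations):
        level += gain
        gain *= ratio
    return level


def ceiling(start=1.0, first_gain=1.0, ratio=0.5):
    """Limit of the geometric series; finite only when 0 <= ratio < 1."""
    if not 0 <= ratio < 1:
        raise ValueError("gains must shrink (0 <= ratio < 1) for a finite ceiling")
    return start + first_gain / (1 - ratio)


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, capability_after(n))
    print("ceiling:", ceiling())
```

If `ratio` were 1 or more, the gains would not shrink and there would be no ceiling, which is the scenario the article's conclusion relies on.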

_pdp_22d ago

I don't read anywhere how much code they are talking about and what programming language. I think those are useful metrics.

ReptileMan22d ago

Anthropic is all talk and no delivery last few months. This cry for pause is just them realizing they have no moat at all.

esafak22d ago

If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.

semessier22d ago

jasongill22d ago

"My CPU is a neural-net processor - a learning computer" springs to mind

brazukadev22d ago

When claude code removes React from its own code I'll believe that.

geodel22d ago

It will be so powerful that it can't be trusted with any earthly person.

taormina22d ago

So, is this what they call Opus 4.8? Improvement?

snick3rz_22d ago

This is facially a puff piece. That doesn't mean it's all false. It means be wary of anything that doesn't have a critical thing to say.

swader99922d ago

IPO IPO IPO!!!

replwoacause22d ago

I love that animation, really cool

4ffs22d ago

They're making a mistake with this continued self-hyping. At some point even the dumbest of prospective investors don't buy it.

deterministic22d ago

0xbadcafebee22d ago

You can't predict the future, and neither can Anthropic. Nothing gets better forever. Everything plateaus or gets worse.

adverbly22d ago

Lol they're using lines of code as a KPI?

Come on guys...

That is making me less impressed not more impressed!

newsicanuse22d ago

pre IPO truck load of crap

mrandish22d ago
