What if A.I. doesn't get better than this? (opens in new tab)

(newyorker.com)

90 pointssundache9mo ago112 comments

112 comments

My current intuition on this topic is that they are right about scaling but they are training on the wrong data.

LLMs were not intended to be the core foundation of artificial intelligence but an experiment around deep learning and language. Its success was an almost accidental byproduct of the availability of large amount of structured data to train from and the natural human bias to be tricked by language (Eliza effect).

But human language itself is quite weak from a cognitive perspective and we end up with an extremely broad but shallow and brittle model. The recent and extremely costly attempts to build reasoning around don't seem much more promising than using a lot of hardcoded heuristics, basically ignoring the bitter lesson.

I've seen many argue that a real human level AI should be trained from real-world experience, I am not sure this is true, but training should likely start from lower-level data than language, still using tokens and huge scale, and probably deeper networks.

reactordev9mo ago

Not all AI is LLMs. That's just what's most prevalent right now. There's still great work being done by models that don't "speak" but "perform". The issue is they need to be trained to perform like you said. The more tools like Claude Code are used, the more training they receive as well. I do think we'll see a plateau (if we haven't reached it already) of diminishing returns and we'll seek out new algorithms to improve it.

Never underestimate the will of someone determined to gain an extra 10% performance or accuracy. It's the last 1% I worry about. 99.99% uptime is great until it isn't. 99% accuracy is great until it isn't. These things could be mitigated by running inference on different quantinizations of a model tree but ultimately we're going to have to triple check the work somehow.

eddythompson809mo ago

> The more tools like Claude Code are used, the more training they receive as well.

What do you mean? A model doesn't improve because it's being used more. Are you saying Anthropic invests more into Claude Code the more people use it? Or are you saying they collect its output and train it on it?

xnx8mo ago

They probably mean https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu..., but I don't think that is a huge factor for Claude Code.

microtonal9mo ago

I assume they mean that they can gather users inputs (e.g. the user correcting the model, suggesting improvements, etc.).

bee_rider9mo ago

Definitely smarter people than me have thought about this already, but I’ve been trying to think about human language and how thoughts form in my head lately. How does thinking feel to you?

I feel like thoughts appear in my head conceptually mostly formed, but then I start sequentially coming up with sentences to express them, almost as if I’m writing them down for somebody else. In that process, I edit a bunch, so the final thought is influenced quite a bit by how English ends to be written. Maybe even constrained by expressability in English. But English has the ability to express fuzzy concepts. And the kernel started as a more intuitive thing.

It is a weird interplay.

olddustytrail9mo ago

LLMs aren't monolingual so you might want to expand that beyond English. Consider how multilingual people think.

Also, apparently it's pretty common for people to think in words and have an internal monologue. I hadn't realized this was a thing until recently but it seems many people don't think abstractly as you've described.

legacynl9mo ago

There have been psychological experiments that have shown what OP experiences.

In this particular experiment subjects were asked to make a series of arbitrary choices between 2 items, while being hooked up to some brain scanning equipment. Subjects were asked to think about it for 5 seconds before making the choice.

The experiment showed that scientists were able to predict the choice before the subjects consciously reasoned about it. So the experiment indicates that the choice is made subconsciously, and the reasoning ends up on whatever choice was made.

This aligns pretty well with some other psychological theories, and it points at that a lot/most of our brain processes happen subconsciously, and our conscious experience mainly serves as a way of providing a coherent story about why we experience what we experience.

So it seems that no matter if you 'think' abstractly or visually or with a monologue, this is merely a small step in our overall cognition, and doesn't really change the fact that thing bubble up from our subconscious before we become conscious of it.

2 more replies

rossdavidh9mo ago

What happens is they go out of business: "these firms spent five hundred and sixty billion dollars on A.I.-related capital expenditures in the past eighteen months, while their A.I. revenues were only about thirty-five billion."

DeepSeek (and the like) will prevent the kind of price increases necessary for them to pay back hundreds of billions of dollars already spent, much less pay for more. If they don't find a way to make LLMs do significantly more than they do thus far, and a market willing to pay hundreds of billions of dollars for them to do it, and some kind of "moat" to prevent DeepSeek and the like from undercutting them, they will collapse under the weight of their own expenses.

mgfist9mo ago

DeepSeek is also undercutting itself. No one is making a profit here, everyone is trying to gobble market share. Even if you have the best model and don't care to make a dime, inference is very expensive.

disgruntledphd29mo ago

I'd be surprised if Google weren't closer to profitability than basically anyone else, as they have their own hardware and have been running these kinds of applications for much longer than anyone else.

tonyedgecombe9mo ago

Google is still investing heavily in data centres. Presumably without the AI they could lift their foot off that throttle.

tim3339mo ago

Google are well set up to monetize compared to the others. They just need people to see their ads.

1 more reply

cloverich9mo ago

I'm not familiar with the actual strategy, but one strategy of something like DeepSeek or any open model could be, roughly, to avoid moat creation by other companies. E.g. one reason Google maintained dominance for so long is they simply paid off every smart person and company that might build competitors.

rossdavidh8mo ago

Matt Levine of Bloomberg said he believes most big hedge funds have a proprietary LLM (and various other machine learning models), which they use for their own purposes anyway, and therefore it was relatively straightforward for them to get into the business.

ben_w9mo ago

> inference is very expensive

I am surprised that this claim keeps getting made, given the observed prices.

Even if one thinks that the losses of big model providers are due to selling below operating costs (rather than below that plus training costs plus the cost of growth), then even big open-weights models that need beefy machines, look like they eventually* amortise the cost so low that electricity is what matters; so when (and *only* when) the quality is good enough, inference is cheaper than the food needed to have a human work for peanuts — and I mean literally peanuts, not metaphorical peanuts, as in the calories and protein content of bags of peanuts sufficient to not die.

* this would not happen if computers were still following the improvements trends of the 90s, because then we'd be replacing them every few years; a £10k machine that you replace every 3 years cost you £9.13/day even if it did nothing.

https://www.tesco.com/groceries/en-GB/products/300283810 -> £0.59 per bag * (2500 per day/645 per bag) = £2.29/day; then combine your pick about which model, which model of home server, electricity costs etc. with your estimate of how many useful tokens a human does in 8,760 hours per calendar year given your assumptions about hours per working week and days of holiday or sick leave.

I know that even just order-of 100k useful tokens is implausible for any human because that would be like writing a novel a day, every day; and this article (https://aichatonline.org/blog-lets-run-openai-gptoss-officia...) claims a Mac Studio can output 65.9/second = 65.9 * 3600 * 24 = 5,693,760 / day or ~= 2e9/year, compare to a deliberate over-estimate of human output (100k/day * 5 days a week * 47 weeks a year = 2.35e7/year)

The top-end Mac Studio has a maximum power draw of 270 W: https://support.apple.com/en-us/102027

270 W for *at least (2e9/year / 2.35e7/year) 85 times* the quantity (this only matters when the quality is sufficient, and as we all know AI often isn't that good yet) of output that a human can do with 100 W, is a bit over 31 times the raw energy efficiency, and electricity is much cheaper than calories — cheaper food than peanuts could get the cost of the human down to perhaps about £1/day, but even £1/day is equivalent to electricity costing £1/(24 hours * 100 W) = £0.416666… / kWh

mgfist8mo ago

Running a local model is not an apples comparison. Yes, if you run a small model 24/7 without a care for output latency and utilization is completely static with no bursts, then it can look cheap. But most people want output now, not in 10 hours. And they want it from the best models. And they want large context windows. And when you combine that with serving millions of users, it gets complicated and expensive.

1 more reply

rusk9mo ago

DeepSeek doesn’t need to make a profit to be successful.

rossdavidh8mo ago

I could think of several reasons this could be so, but it would be good to hear your logic on this claim?

1 more reply

9rx9mo ago

> If they don't find a way to make LLMs do significantly more than they do thus far...
They only need two things, really: A large user base and a way to include advertising in the responses. The market willing to pay hundreds of billions of dollars will soon follow.
The businesses are currently in the user base building stage. Hemorrhaging money to get them is simply the cost of doing business. Once they feel that is stable, adding advertising is relatively easy.
> and some kind of "moat" to prevent DeepSeek and the like from undercutting them*

Once users are accustomed to using a service, you have to do some pretty horrendous things to get them to leave. "Give me your best hamburger recipe" -> "Sure, here is my best burger recipe [...] However, if you don't feel like cooking tonight, give the Big Mac a try!". wouldn't be enough to see any meaningful loss of users.

tempodox9mo ago

That's nothing new. The question is, will those users be willing and able to pay ten times as much or more for those same services they get for a significant discount now.

9rx9mo ago

Will they pay ten times more for a Big Mac? Probably not, but why would they need to? Hundreds of billions is Facebook's revenue. The businesses in this space are there if they can take those customers alone, never mind all the other places where advertising takes place. The market exists, is sufficiently large, and willing to spend. All these "AI" businesses need to do is show that the users are spending time on their services instead, which is exactly what they are working on right now.

The question is really only: Will users actually want to continue to use these services once the novelty wears off? The assumption is that they are useful enough to become an integral part of our lives, but time will tell...

2 more replies

variadix9mo ago

I don’t see how advertising is going to work with agents, especially if they’re being used by companies to replace or supplement jobs. Am I going to have comments in my code with ads for McDonalds? Will the AI support agent start trying to sell me a VPN?

I don’t think any of these AI companies can justify their expenses without meaningfully automating a significant amount of white collar work, which is yet to happen.

9rx9mo ago

> I don’t think any of these AI companies can justify their expenses without meaningfully automating a significant amount of white collar work

The businesses in question are building chatbots, not trying to automate jobs. Their primary goal at this stage is to get legions of people hooked on chatting with their service on a regular basis. Once (if) they have succeeded with that, then they can move on to step two.

> Am I going to have comments in my code with ads for McDonalds?

"Give me a function which stores a supplied string to a file in $HOME." -> "I have updated your code with said function. It [...] Have you considered storing the file on Amazon S3 for additional robustness and reliability?"

ml_more9mo ago

We did a test of GPT5 yesterday. We asked it to generate a synopsis of a scientific topic and cite sources. We then checked those sources. GPT5 still hallucinated 65% of the citations. It did things like: Make up the paper title Make up the authors for a real paper title Mix a real title and a real journal If it can't even reference real papers it certainly can't be trusted to match up claims of fact with real sources.

Current AI tools generate citations that LOOK real but ARE fake. This might not be solvable inside the LLM. If anyone could do it, it'd be OpenAI. (OK maybe I'm giving them too much credit, but they have a crap-ton of money and seem to show a real interest in making their AI better)

If it can't be done in the LLM we can't trust LLMs basically ever. I suppose there's a pretty big loophole here. Doing it outside the LLM but INSIDE the LLM product would be good enough.

The first AI tool to incorporate that (internal citation and claim checking) will win because if the AI can check itself and prevent hallucinated garbage from ever reaching the user we can start to trust them and then they can do everything we've been promised. Until that day comes we can't trust them for anything.

beacon2948mo ago

Google already did this, give free gemini deepresearch a spin. It's not perfect, but I have a feeling you'll be surprised if this is your honest impression.

brainwipe9mo ago

The title is irritating, conflating AI with LLMs. LLMs are a subset of AI. I expect future systems will be mobs of expert AI agents rather than relying on LLMs to do everything. An LLM will likely be in the mix for at least the natural language processing but I wouldn't bet the farm on them alone.

DanHulton9mo ago

That battle was long-ago lost when the leading LLM companies and organizations insisted on referring to their products and models solely as "AI", not the more-specific "LLMs". Implementers of that technology followed suit, and that's just what it means now.

You can't blame the New Yorker for using the term in its modern, common parlance.

dasil0039mo ago

Agreed, and ultimately it's fine because they're talking about products not technology. If these products go in a completely different direction and LLMs become obsolete the AI label will adapt just fine. Once these things hit common parlance there's no point in arguing technical specificity as 99.99% of the people using the term don't care, will never care, and language will follow their usage not the angry pedant.

IAmGraydon9mo ago

This is something I immediately noticed when ChatGPT first released. It was instantly called "AI", but previous to that, HN would have been up in arms that it's "machine learning" not actual intelligence. For some reason, the crowd here and everywhere else just accepted the misuse of the word intelligence and let it happen. Elsewhere I can understand, but people here know better.

Intentionally misconstruing it as actual intelligence was all a part of the grift from the beginning. They've always known there's no intelligence behind the scenes, but pushing this lie has allowed them to take in hundreds of billions in investor money. Perhaps the biggest grift the world has ever seen.

brookst9mo ago

Sure I can. If someone writing for the New Yorker has conflated the two concepts and is drawing bad conclusions because of it, that’s bad writing.

A good writer would tease apart this difference. That’s literally what good writing is about: giving a deeper understanding than a lay person would have.

DanielHB9mo ago

The computing power alone of all these gpus would bring a revolution in simulation software. I mean 0 AI/machine-learning, just being able to simulate much more things than we can.

Most industry-specific simulation software is REALLY crap, most from the 90s and 80s and barely evolved since then. Many stuck on single core CPUs.

bee_rider9mo ago

It could be a nice side-effect of having all this “LLM hardware” built into everything, nice little throughput focused accelerators in everybody’s computers.

I think if I were starting grad school now and wanted some easy points, I’d be looking at mixed precision numerical algorithms. Either coming up with new ones, or applying them in the sciences.

simonw9mo ago

If the New Yorker published a story titled "What if LLMs Don't Get Better Than This?" I expect the portion of their readers who understood what that title meant would be pretty tiny.

rusk9mo ago

Indeed, and the title itself contains an operational definition of AI as “this” (LLM’s) - if AI becomes more than “this” then the question has been answered in the affirmative.

dr_dshiv9mo ago

AI is what people think AI is. In the 80s, that was expert systems. In the 2000s, it was machine learning (not expert systems). Now, it is LLMs — not machine learning.

You can complain, but it’s like that old man shaking their fist at the clouds.

Now, if you want to talk about cybernetics…

tim3339mo ago

The title annoys me more because if doesn't mention anything about time. AI will almost certainly get a good bit better eventually. The questions will it in the next couple of years or will we have to wait for some breakthrough.

I'm amused they seem to refer to Marcus and Zitron as "these moderate views of A.I". They are both pretty much professional skeptics who seem to fill their days writing AI is rubbish articles.

svara9mo ago

AI is LLMs now. Similar to how machine learning became AI 5-10 years ago.

I'm not endorsing this, just stating an observation.

I do a lot of deep learning for computer vision, which became AI a while ago. Now, when you use the word AI in this context, it will confuse people because it doesn't involve LLMs.

lokar9mo ago

A* search, literally textbook AI, is still doing great work.

qcnguy9mo ago

> You didn’t need a bar chart to recognize that GPT-4 had leaped ahead of anything that had come before.

You did though. I remember when GPT-4 was announced, OpenAI downplayed it and Altman said the difference was subtle and wouldn't be immediately apparent. For a lot of the stuff ChatGPT was being used for the gap between 3 and 4 wasn't going to really leap out at you.

https://fortune.com/2023/03/14/openai-releases-gpt-4-improve...

In the lead up to the announcement, Altman has set the bar low by suggesting people will be disappointed and telling his Twitter followers that “we really appreciate feedback on its shortcomings.”

OpenAI described the distinction between GPT-3.5—the previous version of the technology—and GPT 4, as subtle in situations when users are having a “casual conversation” with the technology. “The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5,” a research blog post read.

In the years since we got a lot more demanding of our models. Back then people were happy if they got models to write a small simple function and it worked. Now they expect models to manipulate large production codebases and get it right first time. So, the difference between GPT-3 and GPT-4 would be more apparent. But at the time, the reaction was somewhat muted.

supriyo-biswas9mo ago

> Back then people were happy if they got models to write a small simple function and it worked. Now they expect models to manipulate large production codebases and get it right first time.

This push is mostly coming from the C-level and the hustler types, both of which need this to work out in order for their employeeless corporation fantasy to work out.

noboostforyou9mo ago

> This push is mostly coming from the C-level and the hustler types, both of which need this to work out in order for their employeeless corporation fantasy to work out.

The irony, at least in my mind, is that C-level hustler types are exactly the perfect role to be replaced by "AI" for big cost-savings. For obvious reasons, it won't happen.

benterix9mo ago

Frankly, I'm not sure if LLM would be the best replacement. Given they need to make decisions, I'd say ensemble methods could work better, at least in scenarios where you can actually perform reliable simulations (e.g. production, but not necessarily marketing). LLMs tend to averages whereas what shareholders want is to optimize for X, where X is usually, but not necessarily, net profit.

(It would be a fun experiment to create such an optimization startup and advertise it to boards as "AI-CEO-as-a-service" and watch the faces of these CEOs pushing for AI in workplace. Maybe they would start having a more nuanced view.)

1 more reply

delusional9mo ago

I'm not going to say that nobody is expecting it to do these things, but I don't think they should. It's still unable to write a simple function.

What we've seen isn't a reasonable increase in expectations based upon validation of previous experiments. Instead it's racking up of expectations by all the signals of success. When they time and time again take in more VC cash at ever greater valuations, we are forced to assume they want to do something more, and since they get the cash we have to assume somebody believes them.

Its a pyramid scheme, but instead of paying out earlier investors with the later investors cash its a confidence pyramid scheme. They obsolete the previous investors valuations by making bigger claims with larger expectations. Then they use those larger expectations as proof they already fulfilled the previous expectations.

EcommerceFlow9mo ago

OpenAi has 700+ million users. Sam recently said only 7% of Plus users were using thinking (o3)!!! That means 93% of their users were using nothing but 4o!

Clearly the OpenAi leadership saw these stats and understood the main initial goal of GPT5 is to introduce this auto-router, and not go all in on intelligence for the 3-7% who care to use it.

This is a genius move IMO, and will get tons of users to flood to ChatGPT over competitors. Grok, Gemini, etc are now fighting over scraps of the top 1% while OpenAi is going after the blue ocean of users.

throwaway0123_59mo ago

> Sam recently said only 7% of Plus users were using thinking (o3)

Thinking or just o3, and over what timeframe? There were a lot of days where I would just rely on o4-mini and o4-mini (high) b.c. my queries weren't that complex and I wanted to save my o3 quota and get faster responses.

> That means 93% of their users were using nothing but 4o!

Also potentially 4.1 and 4.5?

simonw9mo ago

> the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.

https://x.com/sama/status/1954603417252532479

SideburnsOfDoom9mo ago

If they're not paying users then they're just a liability.

EcommerceFlow9mo ago

How so? You capture the market first, then you turn on paid ads and reap benefits for decades like Google.

SideburnsOfDoom9mo ago

Even when a business makes a profit, then the costs are still technically liabilities.

And you're assuming that ad revenues or whatever will be high enough to cover the costs. And that's not a given.

jcfrei9mo ago

monetizing those will come eventually - it's just hard to get right

SideburnsOfDoom9mo ago

> (monetizing LLMs is) just hard to get right

Time will tell if that's just a euphemism for "there is no business model here".

energy1239mo ago

How can you say progress has stalled two weeks after LLMs won gold medals at IOI and IMO?

How can you say progress has stalled without having visibility on the compute costs of gpt-5 relative to o3?

How can you say progress has stalled by referring to changes in benchmarks at the frontier over just 3.5 months?

svara9mo ago

You can't say that with any certainty, but I personally share the impression that growth has not kept up with the hype of 2023. Take the following for example. That's an article from April 2023, that strongly implies that the next version of GPT would be so much more powerful than the current one that it would be dangerous to work on or even release.

Altman specifically used the version number "GPT5" back then. GPT5 is quite good, but is it the kind of technology that requires a word-wide moratorium on its development, lest it make humanity redundant?

"""

(Friedman) asked Altman for his thoughts on the recently released and widely circulated open letter demanding an AI pause. In response, the OpenAI founder shared some of his critiques. “An earlier version of the letter claimed OpenAI is training GPT-5 right now. We are not, and won’t for some time,” Altman noted. “So in that sense, [the letter] was sort of silly.”

But, GPT-5 or not, Altman’s statement isn’t likely to be particularly reassuring to AI’s critiques, as first pointed out in a report from the Verge. The tech founder followed up his “no GPT-5″ announcement by immediately clarifying that upgrades and updates are in the works for GPT-4. There are ways to increase a technologies’ capacity beyond releasing an official, higher-number version of it.

"""

(from: https://gizmodo.com/sam-altman-open-ai-chatbot-gpt4-gpt5-185...)

rudedogg9mo ago

All that feels like specialized stunts like IBM’s Watson beating Ken Jennings at Jeopardy.

The rate of improvement has slowed significantly. And chasing benchmarks is making everything worse IMO. Opus 4.1 is worse than Sonnet 3.7 to me :/.

I think the future will be:

1. Ads and quantization/routing to chase profits

2. Local models start taking over. New companies will slide in without the huge losses and provide what Claude/OpenAI do today at reasonable margins

3. Apple/Google eat up lots of the market by shipping good-enough models with iOS/Android

AtlasBarfed9mo ago

My personal test question keeps bombing, and I think it's something they should be capable of doing?

Are those math contests? Are their questions and answers in the training set?

Let's say that these things really won a math Olympiad by thinking. Ok, I would like it to to write parsers based on a well defined expression or language spec. Not as bad as near unparseable C++ or JavaScript.

The AIs refuse, despite the prompt, to write a complete parser, hallucinate tests, do things like just call the already working compiler on the CLI, force repetitive reprompts that still won't complete the task.

To me, this is a good example of a task I would give AI as a service to see if it will reliably do something that's well specified, moderately annoying, and is most definitely in the training set if they are pulling data from "the internet".

energy1239mo ago

> My personal test question keeps bombing, and I think it's something they should be capable of doing?

The problem is that "they" isn't a monolith. How much compute went into your tests? Gpt-5 thinking in ChatGPT Plus uses less compute than Gpt-5 thinking in ChatGPT Pro, which uses less compute than the "high" reasoning effort when "gpt-5" is called via the API, which uses less compute than Gpt-5 Pro in ChatGPT Pro, which uses less compute than custom scaffolds, which uses less compute than what went into the IMO/IOI solutions. This is not just my idle speculation, it's publicly available information.

Mistletoe9mo ago

The bear market decade the stock market has been putting off since 2021 with this AI gasping phase happens.

https://www.currentmarketvaluation.com/models/s&p500-mean-re...

https://www.cell.com/fulltext/S0092-8674(00)80089-6

danjl9mo ago

Investors are betting on growth. The public loves the hype. As a user, I already have something useful. Schadenfreude.

zahirbmirza9mo ago

AI doesn't need to get better than this. It is already saving millions of hours of previously wasted human productivity. The biggest threat to these companies if their products do not improve is the local running of LLMS. That would finally justify consumers buying more memory and processor speed.

puppycodes9mo ago

AI getting better is like maybe 50% or less of the equation. The other part is the infrastructure supporting AI applications. The infrastructure and interfaces that need to be built to fully take advantage of whats already here already has a long way to catch up.

garyrob9mo ago

It appears that Cal Newport has decided to be the one to most publicly initiate the inevitable Trough Of Disillusionment stage of the Hype Cycle. I'm not sure it'll last very long, though, considering (for starters) Google DeepMind's gold medal at the recent International Math Olympiad. Also, while he criticizes the cost-cutting measure which is ChatGPT 5, he doesn't even mention ChatGPT 5 Pro, which is performing excellently.

KevinMS9mo ago

I always expect things like this to eventually deliver about 90% of what they promise, which turns out to be 100% for some niche uses, and for the rest it just gets abandoned because 90% isn't good enough. Like when voice recognition became super hyped in the late 90's, it was going to change how the world interacts with machines, and eventually it turned into "Hey Siri"

k__9mo ago

Good question.

In the one side I read stuff about exponential gains with every new model. On the other side, the coding improvements look logarithmic to me.

AtlasBarfed9mo ago

Which to me means, what's the Big o of this entire venture?

Ultimately, what they need to do is add nines of reliability. I guess I could argue that what they are producing now is like two nines: 99% accuracy.

Of course, that depends on how you measure it and yada yada yada. So for things like self-driving, I could see how people could argue that the accuracy rate is 99.9% on a minute by minute basis.

But how many nines do you need? Especially for self-driving five more? What's the computational cost to achieve that? Is it just five times? Is it 25 times? Is it two to the five power?

microtonal9mo ago

That's completely possible if the development of LLMs follow an S-curve (sigmoidal). At the beginning of the curve it will look exponential, then linear, and finally logarithmic. For different tasks, LLMs could be on different points on the curve, which would explain why some people perceive the improvements as exponential and others perceive them as logarithmic - they are simply working on different things and so experience different gradients.

latexr9mo ago

https://archive.ph/20250813061454/https://www.newyorker.com/...

Ekshef9mo ago

Thank you for that!

kerblang9mo ago

They didn't answer much of the "What if," though... Am just imagining the massive financial losses taken by so many, and if a bailout becomes necessary, because too-big-to-fail now means Microsoft, Google, Facebook et al since we transferred so much of financial engineering economics onto them since '08.

danjl9mo ago

Those three companies have products outside AI and won't die quickly. The ones that will collapse are betting exclusively on improvements in AI. It will be fun to watch the VC money burn.

BeFlatXIII9mo ago

I hope the necessary bailout fails due to political fighting in Congress.

woodpanel9mo ago

Last time I've checked each of these companies were still hugely profitable. So it's not going to be your average FANG in trouble here, but rather VCs and others who've jumped onto the AI-craze

alecco9mo ago

LLMs will sure hit a dead end. But I think they will be a major stepping stone to help figure out and write the next generation.

scotty799mo ago

I predict that the article of roughly the same title will be popping up on him every couple of years.

behole9mo ago

https://web.archive.org/web/20250813134114/https://www.newyo...

bbqfog9mo ago

AI is so new and so powerful, that we don't really know how to use it yet. The next step is orchestration. LLMs are already powerful but they need to be scaled horizontally. "One shotting" something with a single call to an LLM should never be expected to work. That's not how the human brain works. We iterate, we collaborate with others, we reflect... We've already unlocked the hard and "mysterious" part, now we just need time to orchestrate and network it.

monkpit9mo ago

I think you’re right - even if we accept the premise that there’s only room for minor marginal improvements, there’s vast amounts of room for improvement with integrations, mcp, orchestration, prompting, etc. I’m talking mostly about coding agents here but it applies more widely.

It’s a completely new tool, it’s like inventing the internal combustion engine and then going, “well, I guess that’s it, it’s kinda neat I guess.”

kbelder9mo ago

I think that's it. Even if there were no improvements with LLMs as they exist today, the integration and usage can still be vastly improved. Right now, we don't have multiple LLM-aware systems communicating, with a standardized information repository.

Right now we have the technology to have an AI observe a room, count the people in it, see what they're doing, observe their mood, and set the lighting to the appropriate level. We just don't have all the sensors and integrations and protocols to manage that. The LLM interfaces with email, your bank, your phone, etc., is crude and clunky. So much more could be done with the LLMs we have now.

(And just to be clear, most of those integrations sound horrible and dystopian. But they're examples.)

player12348mo ago

Powerful but we don't know how to use it? If it is as powerful as all you true believers spout the usefulness would be self evident and that would be the display of its power.

But apparently it is powerful just because you say so, and then something, something ... business model ...

bbqfog8mo ago

It does incredible things today that we wouldn't have thought possible 5 years ago.

fuzzfactor9mo ago

>What If A.I. Doesn't Get Better Than This?

What if it does?

There's a certain type of fear . . .

  "It's the fear . . . they're gonna take my job away . . . "

  It's the fear . . . I'll be working here the rest of my days . . . "

-- David Fahl

Same fear, different day.

AtlasBarfed9mo ago

Wow did you encapsulate millenia of management-labor disputes by saying don't worry be happy?

Let's play the same game with totalitarianism!

It's the fear they are watching everything

It's the fear nobody is watching at all

Oh wow, I totally understand the threat of totalitarianism from that.

And I bring up totalitarianism quite in particular, because aside from vastly empowering the elites in the war against labor, AI vastly empowers the elites for totalitarian monitoring and control.

fuzzfactor9mo ago

>Wow did you encapsulate millenia of management-labor disputes by saying don't worry be happy?

Nope, sorry to disappoint.

That would be quite an accomplishment though, but I can't take credit for any progress in that direction no matter how far others have gone :)

Not trying to hurt any feelings.

I probably should have kept it simple and not included the sample of vastly pre-AI lyrics from Fahl.

Just trying to emphasize that the fear of AI getting better, is very similar to the fear of it not getting better.

Like a number of other unrelated things. Which are nothing new at all.

I guess more often I've got to expect the unexpected with such a short comment, when I don't even try to explain very effectively, that some are going to read between the lines in some of the most unrelated ways I can not always anticipate.

If I may ask, what made you such a fan of totalitarianism anyway, I know it's more popular than ever but is that all there is?

piskov9mo ago

Because every s-curve looks like an exponent for those in the start.

I mean look at the first plane, then first air-jets: it’s understandable to assume we would travel the galaxy in something like 2050.

Meanwhile planes are basically the same last 60 years.

LLMs are great but I firmly believe that in 2100 all is basically the same as in 2020: no free energy (fusion), no AGI.

II2II9mo ago

> I mean look at the first plane, then first air-jets: it’s understandable to assume we would travel the galaxy in something like 2050.

To someone who did not understand what flight is, perhaps. For anyone who understood the laws of physics, no. A similar thing can be said of Moore's Law. The main difference is we likely exceeded the rational expectations derived from Moore's Law (though that was based more on computer performance, rather than the actual expression of Moore's Law in terms of transistor count) while the more rational expectations of flight (routine supersonic, perhaps even suborbital) are still flights of fancy. But that is more a product of cost than technical ability. Simply put, we figured out how to make semiconductors extraordinarily inexpensive. Flight is still expensive.

SAI_Peregrinus9mo ago

Nature abhors an exponential. They all seem to either turn out to be sigmoid or collapse entirely.

IAmGraydon9mo ago

Well...everything has a limit, so yes every example of exponential growth gets capped somewhere and becomes a sigmoid. Viral spread, population growth, radioactive decay, tree branching, etc. are all exponential until they hit their limit. Otherwise each process would quickly proceed to infinity. That doesn't really tell us much about where the ceiling is for LLMs, however.

marviio9mo ago

OTOH: First flight: 1903. Moon landing: 1969. Humanity went from ”Look, we’re 3 meters off the ground!” to “We just parked on the Moon” in barely a lifetime. 66 years.

CagedCoder9mo ago

And how much further have we gotten past the moon since then?

This isn't an OTOH, it's just another example of looking at the exponential part of the S-curve.

Yossarrian229mo ago

And has gone no further in almost 60 years

olddustytrail9mo ago

> Because every s-curve looks like an exponent for those in the start.

No, it looks like an exponential all the way to the top of the curve. And the natural reaction when you consider it an s-curve is to think you're near the top. Unfortunately near the top looks exactly the same as near the bottom, so you might consider that you're nowhere near the top and that there's no reason you should be.

Then you go on to speculate about tech that didn't exist 3 years ago and extrapolate 75 years in the future.

No one has any idea what we'll have in 10 years time never mind 75. Even linearly, it's like someone from 1950 trying to guess about 2025.

lenerdenator9mo ago

Nah, I'm not afraid of working here the rest of my days. Consistent paycheck, benefits, challenging-but-rewarding work.

If you provide people with that they typically shut up and stay out of the way. Everyone should be more afraid of the former than the latter.

izzydata9mo ago

There is no plan in place at all for the outcome of most work becoming redundant. At least in the US I highly doubt we will be capable of implementing some system such as UBI for the benefit of all citizens so everyone can take advantage of most work being automated. Everyone will be left to pick up scraps and barely survive.

But I am extremely skeptical that current "AI" will be capable of eliminating so much of the modern workforce any time soon if ever. I can see it becoming a common place tool, maybe it already has, but not as a human replacement.

disgruntledphd29mo ago

> There is no plan in place at all for the outcome of most work becoming redundant. At least in the US I highly doubt we will be capable of implementing some system such as UBI for the benefit of all citizens so everyone can take advantage of most work being automated. Everyone will be left to pick up scraps and barely survive.

If 80% of US citizens lose their jobs, I assure you that there will be a political response. It might not be one you (or I) like, but it will happen and it will be a big deal.

1 more reply

j / k navigate · click thread line to collapse

112 comments

stephc_int139mo ago

My current intuition on this topic is that they are right about scaling but they are training on the wrong data.

reactordev9mo ago

eddythompson809mo ago

> The more tools like Claude Code are used, the more training they receive as well.

xnx8mo ago

They probably mean https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu..., but I don't think that is a huge factor for Claude Code.

microtonal9mo ago

I assume they mean that they can gather users inputs (e.g. the user correcting the model, suggesting improvements, etc.).

bee_rider9mo ago

Definitely smarter people than me have thought about this already, but I’ve been trying to think about human language and how thoughts form in my head lately. How does thinking feel to you?

It is a weird interplay.

olddustytrail9mo ago

LLMs aren't monolingual so you might want to expand that beyond English. Consider how multilingual people think.

legacynl9mo ago

There have been psychological experiments that have shown what OP experiences.

2 more replies

rossdavidh9mo ago

mgfist9mo ago

disgruntledphd29mo ago

tonyedgecombe9mo ago

Google is still investing heavily in data centres. Presumably without the AI they could lift their foot off that throttle.

tim3339mo ago

Google are well set up to monetize compared to the others. They just need people to see their ads.

1 more reply

cloverich9mo ago

rossdavidh8mo ago

ben_w9mo ago

> inference is very expensive

I am surprised that this claim keeps getting made, given the observed prices.

The top-end Mac Studio has a maximum power draw of 270 W: https://support.apple.com/en-us/102027

mgfist8mo ago

1 more reply

rusk9mo ago

DeepSeek doesn’t need to make a profit to be successful.

rossdavidh8mo ago

I could think of several reasons this could be so, but it would be good to hear your logic on this claim?

1 more reply

9rx9mo ago

tempodox9mo ago

That's nothing new. The question is, will those users be willing and able to pay ten times as much or more for those same services they get for a significant discount now.

9rx9mo ago

2 more replies

variadix9mo ago

I don’t think any of these AI companies can justify their expenses without meaningfully automating a significant amount of white collar work, which is yet to happen.

9rx9mo ago

> I don’t think any of these AI companies can justify their expenses without meaningfully automating a significant amount of white collar work

> Am I going to have comments in my code with ads for McDonalds?

ml_more9mo ago

If it can't be done in the LLM we can't trust LLMs basically ever. I suppose there's a pretty big loophole here. Doing it outside the LLM but INSIDE the LLM product would be good enough.

beacon2948mo ago

Google already did this, give free gemini deepresearch a spin. It's not perfect, but I have a feeling you'll be surprised if this is your honest impression.

brainwipe9mo ago

DanHulton9mo ago

You can't blame the New Yorker for using the term in its modern, common parlance.

dasil0039mo ago

IAmGraydon9mo ago

brookst9mo ago

Sure I can. If someone writing for the New Yorker has conflated the two concepts and is drawing bad conclusions because of it, that’s bad writing.

A good writer would tease apart this difference. That’s literally what good writing is about: giving a deeper understanding than a lay person would have.

DanielHB9mo ago

The computing power alone of all these gpus would bring a revolution in simulation software. I mean 0 AI/machine-learning, just being able to simulate much more things than we can.

Most industry-specific simulation software is REALLY crap, most from the 90s and 80s and barely evolved since then. Many stuck on single core CPUs.

bee_rider9mo ago

It could be a nice side-effect of having all this “LLM hardware” built into everything, nice little throughput focused accelerators in everybody’s computers.

I think if I were starting grad school now and wanted some easy points, I’d be looking at mixed precision numerical algorithms. Either coming up with new ones, or applying them in the sciences.

simonw9mo ago

If the New Yorker published a story titled "What if LLMs Don't Get Better Than This?" I expect the portion of their readers who understood what that title meant would be pretty tiny.

rusk9mo ago

Indeed, and the title itself contains an operational definition of AI as “this” (LLM’s) - if AI becomes more than “this” then the question has been answered in the affirmative.

dr_dshiv9mo ago

AI is what people think AI is. In the 80s, that was expert systems. In the 2000s, it was machine learning (not expert systems). Now, it is LLMs — not machine learning.

You can complain, but it’s like that old man shaking their fist at the clouds.

Now, if you want to talk about cybernetics…

tim3339mo ago

I'm amused they seem to refer to Marcus and Zitron as "these moderate views of A.I". They are both pretty much professional skeptics who seem to fill their days writing AI is rubbish articles.

svara9mo ago

AI is LLMs now. Similar to how machine learning became AI 5-10 years ago.

I'm not endorsing this, just stating an observation.

I do a lot of deep learning for computer vision, which became AI a while ago. Now, when you use the word AI in this context, it will confuse people because it doesn't involve LLMs.

lokar9mo ago

A* search, literally textbook AI, is still doing great work.

qcnguy9mo ago

> You didn’t need a bar chart to recognize that GPT-4 had leaped ahead of anything that had come before.

https://fortune.com/2023/03/14/openai-releases-gpt-4-improve...

supriyo-biswas9mo ago

> Back then people were happy if they got models to write a small simple function and it worked. Now they expect models to manipulate large production codebases and get it right first time.

This push is mostly coming from the C-level and the hustler types, both of which need this to work out in order for their employeeless corporation fantasy to work out.

noboostforyou9mo ago

> This push is mostly coming from the C-level and the hustler types, both of which need this to work out in order for their employeeless corporation fantasy to work out.

The irony, at least in my mind, is that C-level hustler types are exactly the perfect role to be replaced by "AI" for big cost-savings. For obvious reasons, it won't happen.

benterix9mo ago

1 more reply

delusional9mo ago

I'm not going to say that nobody is expecting it to do these things, but I don't think they should. It's still unable to write a simple function.

EcommerceFlow9mo ago

OpenAi has 700+ million users. Sam recently said only 7% of Plus users were using thinking (o3)!!! That means 93% of their users were using nothing but 4o!

Clearly the OpenAi leadership saw these stats and understood the main initial goal of GPT5 is to introduce this auto-router, and not go all in on intelligence for the 3-7% who care to use it.

throwaway0123_59mo ago

> Sam recently said only 7% of Plus users were using thinking (o3)

> That means 93% of their users were using nothing but 4o!

Also potentially 4.1 and 4.5?

simonw9mo ago

> the percentage of users using reasoning models each day is significantly increasing; for example, for free users we went from <1% to 7%, and for plus users from 7% to 24%.

https://x.com/sama/status/1954603417252532479

SideburnsOfDoom9mo ago

If they're not paying users then they're just a liability.

EcommerceFlow9mo ago

How so? You capture the market first, then you turn on paid ads and reap benefits for decades like Google.

SideburnsOfDoom9mo ago

Even when a business makes a profit, then the costs are still technically liabilities.

And you're assuming that ad revenues or whatever will be high enough to cover the costs. And that's not a given.

jcfrei9mo ago

monetizing those will come eventually - it's just hard to get right

SideburnsOfDoom9mo ago

> (monetizing LLMs is) just hard to get right

Time will tell if that's just a euphemism for "there is no business model here".

energy1239mo ago

How can you say progress has stalled two weeks after LLMs won gold medals at IOI and IMO?

How can you say progress has stalled without having visibility on the compute costs of gpt-5 relative to o3?

How can you say progress has stalled by referring to changes in benchmarks at the frontier over just 3.5 months?

svara9mo ago

"""

(from: https://gizmodo.com/sam-altman-open-ai-chatbot-gpt4-gpt5-185...)

rudedogg9mo ago

All that feels like specialized stunts like IBM’s Watson beating Ken Jennings at Jeopardy.

The rate of improvement has slowed significantly. And chasing benchmarks is making everything worse IMO. Opus 4.1 is worse than Sonnet 3.7 to me :/.

I think the future will be:

1. Ads and quantization/routing to chase profits

2. Local models start taking over. New companies will slide in without the huge losses and provide what Claude/OpenAI do today at reasonable margins

3. Apple/Google eat up lots of the market by shipping good-enough models with iOS/Android

AtlasBarfed9mo ago

My personal test question keeps bombing, and I think it's something they should be capable of doing?

Are those math contests? Are their questions and answers in the training set?

energy1239mo ago

> My personal test question keeps bombing, and I think it's something they should be capable of doing?

Mistletoe9mo ago

The bear market decade the stock market has been putting off since 2021 with this AI gasping phase happens.

https://www.currentmarketvaluation.com/models/s&p500-mean-re...

https://www.cell.com/fulltext/S0092-8674(00)80089-6

danjl9mo ago

Investors are betting on growth. The public loves the hype. As a user, I already have something useful. Schadenfreude.

zahirbmirza9mo ago

puppycodes9mo ago

garyrob9mo ago

KevinMS9mo ago

k__9mo ago

Good question.

In the one side I read stuff about exponential gains with every new model. On the other side, the coding improvements look logarithmic to me.

AtlasBarfed9mo ago

Which to me means, what's the Big o of this entire venture?

Ultimately, what they need to do is add nines of reliability. I guess I could argue that what they are producing now is like two nines: 99% accuracy.

Of course, that depends on how you measure it and yada yada yada. So for things like self-driving, I could see how people could argue that the accuracy rate is 99.9% on a minute by minute basis.

But how many nines do you need? Especially for self-driving five more? What's the computational cost to achieve that? Is it just five times? Is it 25 times? Is it two to the five power?

microtonal9mo ago

latexr9mo ago

https://archive.ph/20250813061454/https://www.newyorker.com/...

Ekshef9mo ago

Thank you for that!

kerblang9mo ago

danjl9mo ago

Those three companies have products outside AI and won't die quickly. The ones that will collapse are betting exclusively on improvements in AI. It will be fun to watch the VC money burn.

BeFlatXIII9mo ago

I hope the necessary bailout fails due to political fighting in Congress.

woodpanel9mo ago

Last time I've checked each of these companies were still hugely profitable. So it's not going to be your average FANG in trouble here, but rather VCs and others who've jumped onto the AI-craze

alecco9mo ago

LLMs will sure hit a dead end. But I think they will be a major stepping stone to help figure out and write the next generation.

scotty799mo ago

I predict that the article of roughly the same title will be popping up on him every couple of years.

behole9mo ago

https://web.archive.org/web/20250813134114/https://www.newyo...

bbqfog9mo ago

monkpit9mo ago

It’s a completely new tool, it’s like inventing the internal combustion engine and then going, “well, I guess that’s it, it’s kinda neat I guess.”

kbelder9mo ago

(And just to be clear, most of those integrations sound horrible and dystopian. But they're examples.)

player12348mo ago

Powerful but we don't know how to use it? If it is as powerful as all you true believers spout the usefulness would be self evident and that would be the display of its power.

But apparently it is powerful just because you say so, and then something, something ... business model ...

bbqfog8mo ago

It does incredible things today that we wouldn't have thought possible 5 years ago.

fuzzfactor9mo ago

>What If A.I. Doesn't Get Better Than This?

What if it does?

There's a certain type of fear . . .

  "It's the fear . . . they're gonna take my job away . . . "

  It's the fear . . . I'll be working here the rest of my days . . . "

-- David Fahl

Same fear, different day.

AtlasBarfed9mo ago

Wow did you encapsulate millenia of management-labor disputes by saying don't worry be happy?

Let's play the same game with totalitarianism!

It's the fear they are watching everything

It's the fear nobody is watching at all

Oh wow, I totally understand the threat of totalitarianism from that.

And I bring up totalitarianism quite in particular, because aside from vastly empowering the elites in the war against labor, AI vastly empowers the elites for totalitarian monitoring and control.

fuzzfactor9mo ago

>Wow did you encapsulate millenia of management-labor disputes by saying don't worry be happy?

Nope, sorry to disappoint.

That would be quite an accomplishment though, but I can't take credit for any progress in that direction no matter how far others have gone :)

Not trying to hurt any feelings.

I probably should have kept it simple and not included the sample of vastly pre-AI lyrics from Fahl.

Just trying to emphasize that the fear of AI getting better, is very similar to the fear of it not getting better.

Like a number of other unrelated things. Which are nothing new at all.

If I may ask, what made you such a fan of totalitarianism anyway, I know it's more popular than ever but is that all there is?

piskov9mo ago

Because every s-curve looks like an exponent for those in the start.

I mean look at the first plane, then first air-jets: it’s understandable to assume we would travel the galaxy in something like 2050.

Meanwhile planes are basically the same last 60 years.

LLMs are great but I firmly believe that in 2100 all is basically the same as in 2020: no free energy (fusion), no AGI.

II2II9mo ago

> I mean look at the first plane, then first air-jets: it’s understandable to assume we would travel the galaxy in something like 2050.

SAI_Peregrinus9mo ago

Nature abhors an exponential. They all seem to either turn out to be sigmoid or collapse entirely.

IAmGraydon9mo ago

marviio9mo ago

OTOH: First flight: 1903. Moon landing: 1969. Humanity went from ”Look, we’re 3 meters off the ground!” to “We just parked on the Moon” in barely a lifetime. 66 years.

CagedCoder9mo ago

And how much further have we gotten past the moon since then?

This isn't an OTOH, it's just another example of looking at the exponential part of the S-curve.

Yossarrian229mo ago

And has gone no further in almost 60 years

olddustytrail9mo ago

> Because every s-curve looks like an exponent for those in the start.

Then you go on to speculate about tech that didn't exist 3 years ago and extrapolate 75 years in the future.

No one has any idea what we'll have in 10 years time never mind 75. Even linearly, it's like someone from 1950 trying to guess about 2025.

lenerdenator9mo ago

Nah, I'm not afraid of working here the rest of my days. Consistent paycheck, benefits, challenging-but-rewarding work.

If you provide people with that they typically shut up and stay out of the way. Everyone should be more afraid of the former than the latter.

izzydata9mo ago

disgruntledphd29mo ago

If 80% of US citizens lose their jobs, I assure you that there will be a political response. It might not be one you (or I) like, but it will happen and it will be a big deal.

1 more reply

j / k navigate · click thread line to collapse