One big misconception is that these models are trained to mimic humans and are therefore limited by the quality of the human training data. That is not true, and the fact that it is not true is basically the entire reason you see so much bullishness and what looks like premature adoption of agentic coding tools.
Coding agents use human traces as a starting point. You technically don’t have to do this at all, but that’s an academic point; practically, you can’t skip it (today). The early training stages with human traces (plus verified synthetic traces from your last model) get you to a point where RL is stable and efficient, and RL pushes you the rest of the way. What really powers this is synthetic data via rejection sampling: you generate a bunch of traces, figure out which ones pass verification, and keep those as training examples.
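To make the rejection-sampling step concrete, here is a minimal sketch in Python. The names here (generate_trace, passes, build_synthetic_dataset, samples_per_task) are hypothetical placeholders, not any real library’s API; the verifier stands in for whatever checks a real pipeline runs (unit tests, compilers, output checkers).

```python
# Minimal sketch of rejection sampling for synthetic training data.
# `model.generate_trace` and `verifier.passes` are hypothetical stand-ins
# for a real generation and verification pipeline.

def build_synthetic_dataset(model, verifier, tasks, samples_per_task=8):
    kept = []
    for task in tasks:
        for _ in range(samples_per_task):
            trace = model.generate_trace(task)   # sample a candidate solution trace
            if verifier.passes(task, trace):     # keep only traces that verify
                kept.append((task, trace))
    return kept  # verified traces become training examples for the next model
```

The design point is simply that the filter, not the generator, sets the quality bar: the model can emit plenty of junk as long as verification reliably separates the traces you keep from the ones you throw away.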
So, because:
- we know how this works on a fundamental level and have for some time
- human training data is a bootstrap, not a fundamental limitation
- you are absolutely right about your observations, yet look at where you are today versus, say, Claude Sonnet 3.x. It’s an entire world away in about a year
- we have imperfect benchmarks, each with its own weaknesses, yet all of them telling the same compelling story. Plus you have adoption numbers and walled-garden data, which is the proof in the pudding
the onus is on the people who say “this is plateauing” or “this has some fundamental limitation that we will not get past fairly quickly”.