ahamilton454 on Hacker News

Ask HN: Aren't Agents the Bitter Lesson?

It’s been a while since I read “The Bitter Lesson” by Rich Sutton. That said, the general “vibe” I picked up from the essay was that “building human features and heuristics into a program will eventually be surpassed by general brutish methods.”

So this raises the question: orchestrating workflows using an agent framework seems to go against the gospel that Hacker News users (like myself) preach so often.

So what do you think about approaching problems with specialized agent flows? Is this something that will shine in many use cases, or will it become obsolete in the next few years of innovation?

1 | ahamilton454 | 1y ago | 1

Ask HN: Are LLMs getting better, how can you tell?

There are so many benchmarks out there for evaluating models: ARC-AGI, FrontierMath, MMLU, Berkeley Function Calling, and many, many more. I guess the general idea behind all of them is to “approximate” all possible types of problems that can be tokenized and solved by an LLM.

That said, I can’t seem to do better than just “vibes”. Basically: oh, this model gave me a good response to this question, so it must be better.

I have tried keeping track of a couple of benchmarks like the ones I mentioned above, but I generally can’t translate their results into utility outside of the small scope each benchmark tests for. There are also so many benchmarks to keep track of, and each takes some learning to understand.

So perhaps my scope isn’t well enough defined. But as a programmer, everything after GPT-4o feels pretty damn similar.

Would love to hear how others evaluate LLMs beyond “just vibes”, generally for programming use but also when trying to create new AI projects.

1 | ahamilton454 | 1y ago | 1

ahamilton454

Recent submissions

MCPVault

Ask HN: Aren't Agents the Bitter Lesson?

Making GitHub Copilot Order Me Cheeseburgers

Ask HN: Are LLMs getting better, how can you tell?

LLM Extensibility: Race for an Ecosystem

Stay DRY in Your API (APIFlask)
