The rumour/reasoning I’ve heard is that most advances are being made on synthetic data experiments happening after post-training. It’s a lot easier and faster to iterate on these with smaller models.
Eventually a lot of these learnings/setups/synthetic data generation pipelines will be applied to larger models but it’s very unwieldy to experiment with the best approach using the largest model you could possibly train. You just get way fewer experiments per day done.
The models bigger labs are playing with seem to be converging to about what is small enough for a researcher to run an experiment overnight.
Smaller/simpler/weird/different models can be an incredible advantage due to iteration speed. I think this is the biggest meta problem in AI development. If you can try a large range of hyperparameters, fitness function implementations, etc. in a few hours, you will eventually wipe the floor with the parties forced to wait days, weeks, or months for their results each time.
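To make the iteration-speed point concrete, here's a back-of-the-envelope sketch. The run times are made-up assumptions for illustration, not measurements from any lab:

```python
# Hypothetical wall-clock time per experiment, in hours (assumptions, not data).
SMALL_MODEL_RUN_H = 3        # an overnight-sized run on a small model
LARGE_MODEL_RUN_H = 24 * 7   # a week-long run on a frontier-scale model

DAYS = 30  # one month of research time

small_experiments = DAYS * 24 // SMALL_MODEL_RUN_H
large_experiments = DAYS * 24 // LARGE_MODEL_RUN_H

print(small_experiments)  # 240 experiments in a month
print(large_experiments)  # 4 experiments in a month
```

Under these (invented) numbers, the small-model researcher gets 60x the experimental throughput, which compounds quickly.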
The bitter lesson certainly applies and favors those with a lot of compute and data, but if your algorithms fundamentally suck or are approaching a dead end, none of that compute or information will matter.
I humbly suggest looking into philosophers more like Quine. His problem of "radical translation" maps much more easily onto LLMs. (Thinking specifically here of the model as the "translator".) It's maybe a little harder to grasp for non-domain experts, but at least there is no need for hyperstitional armchair interpretations of old problems in order to make it relevant.
People jump straight into cognitive science/philosophy with this stuff, I just want to be like "whoa, slow down! So much to establish before that.."
Why not?
> Of course, at some level of complexity, it will be stuck in a local maximum of work quality simply because the book has no guide on how to solve the problem at hand.
I find this a pretty un-optimistic view, especially from someone building a coding autopilot. Having myself used LLMs for a bunch of software development in the last year, it seems its 'local maximum' is no different from a developer's _if_ you split the process up appropriately. The author alludes to this when they mention 'workflow'.
Everyone is trying to use LLMs in a 'single inference pass', assuming that's as good as it gets, but that's like trying to find human creativity in a single cascading activation of neurons. A brain doesn't fit on an axon. So, I kinda think the author should be less shy about their optimism. Inference is soon ~free, as they say, so to me, naive as I might be, the future of AI coding agents is not limited to grunt tasks; it is as creative and exploratory as any human coder.
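As a toy illustration of what 'more than a single inference pass' might look like: a plan pass, a draft pass, then review/revise cycles. The `call_llm` parameter is a stand-in for whatever completion API you happen to use, not a real library function:

```python
from typing import Callable

def multi_pass_code(task: str, call_llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Toy multi-pass loop: plan, draft, then review/revise cycles.

    This sketch only shows the control flow; `call_llm` is any
    prompt-in, text-out completion function you supply.
    """
    # Pass 1: plan before writing any code.
    plan = call_llm(f"Break this task into small steps:\n{task}")
    # Pass 2: first draft from the plan.
    draft = call_llm(f"Implement this plan:\n{plan}")
    for _ in range(max_rounds):
        # Separate review pass: the model critiques its own draft.
        review = call_llm(f"List concrete problems in this code:\n{draft}")
        if "no problems" in review.lower():
            break
        draft = call_llm(f"Revise this code to fix the problems:\n{review}\n\n{draft}")
    return draft
```

The point is just that creativity-like behavior may live in the loop, not in any single forward pass.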
Ps. Fume looks cool. I'd suggest people take a look at aider.chat and claude-engineer too (on github).
Unsure if this is a useful answer. But Searle/LLM could make something that looks like it has a creative spark, and that's it.
Why I think that's different is in the case of a human artist, they create something because they have something they want to say. Whatever they produce is a way of saying 'this is what the world feels like to me, is it the same for you?'. And if it is, it resonates.
But I cannot see how an LLM would 'want' to say anything. If we're talking psychoanalytically about where wanting comes from, and call it a desire to fill a void of how incoherent you actually are, then an LLM doesn't go through that process.
Maybe Searle does, and still wants the characters to make you feel a certain way, in which case the comparison doesn't fit.
Ironically, many people complain LLMs are too incoherent, with all their confabulations and hallucinations.
But I agree. Desire is a good verb. I think that's what differentiates us from the 'machines'. In art, we try to create meaning. From our lives. From our discontents. Even a million LLMs cannot be in deficit of meaning; they are precisely tuned to their own capacity. Whereas something strange about humans is our endless desire for 'more'.
The whole free will debate seems a bit out of scope (and out of my reach, hah), but nonetheless it feels interesting in the LLM context.
edit: Note that I don't necessarily think LLMs are there or even can be. We seem too technologically small to produce the complexity in ourselves. Nonetheless I'm always interested in how far reduced complexity can take us.
> _if_ you split the process up appropriately

I believe this prerequisite is very important. LLMs are terrible at planning and splitting a complex task into simpler steps. This might be a natural limitation of `next token prediction`: for complex planning, each step should be the result of both the previous steps and speculative future ones. We try to tackle this by dividing a plan in two, a macro and a micro plan, but there's still a lot to improve there.
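A rough sketch of what such a macro/micro split could look like. The names and structure here are my own illustration of the general idea, not Fume's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class MicroStep:
    description: str   # one concrete, verifiable action
    done: bool = False

@dataclass
class MacroStep:
    goal: str                              # coarse milestone, e.g. "add auth middleware"
    micro: list[MicroStep] = field(default_factory=list)

def expand(macro: MacroStep) -> None:
    """Placeholder for asking the model to expand a macro step into
    micro steps only when we reach it, so later macro steps can be
    revised in light of what earlier micro steps actually produced."""
    macro.micro = [MicroStep(f"do part of: {macro.goal}")]

plan = [MacroStep("set up project"), MacroStep("add auth middleware")]
for step in plan:
    expand(step)          # lazy expansion keeps the macro plan revisable
    for m in step.micro:
        m.done = True     # execute each micro step here
```

Lazy expansion is one way to let the "speculative future" stay cheap: the macro plan commits to milestones without committing to token-level details.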
p.s. thanks! aider is awesome too!
It seems we've reached the point that understanding of LLMs would be a great candidate for the beginner/intermediate/expert meme. "It's just autocomplete" -> "It's got a world model, it's thinking for itself" -> "It's just autocomplete".
If you zoom out on the first graphic from December 2023 back to 2020, the capabilities of models released at that time on these benchmarks would be much much lower. The best lens for future performance of large models is uncertainty.
This analogy seems flawed to me. Searle is in an empty room, but LLMs are not. They are constantly learning from user inputs, and data is continuously being made more available for LLM ingestion. I still don't think an LLM will completely replace humans at pure creativity, but I don't see why it can't come close, especially since we're only two years into this craze.
lol no. Are we looking at the same graph? You can't just slap a trend line on 9 data points from 3 different companies and call it "clear".
The entirety of Common Crawl is 424 terabytes. That's merely 6 days of 8K raw video.
424 terabytes of text is over a billion books' worth of data. The Common Crawl website even says "Over 250 billion pages spanning 17 years." That's an impressive amount of information.
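A rough sanity check on the "billion books" figure, assuming ~400 KB of plain text per book (a few hundred pages; the per-book size is my assumption, not from the thread):

```python
COMMON_CRAWL_BYTES = 424 * 10**12  # 424 TB, the Common Crawl figure above
BYTES_PER_BOOK = 400 * 10**3       # assumed ~400 KB of plain text per book

books = COMMON_CRAWL_BYTES // BYTES_PER_BOOK
print(f"{books:,}")  # 1,060,000,000
```

So "over a billion books" holds under that assumption, and text at this scale is indeed far denser in information per byte than raw video.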
I'm so tired of the assumption that AI tools are going to get increasingly more capable until they can effectively take over any task that humans currently excel at. They are already useful, but they don't seem likely to take over everything. This is especially true when it comes to making critical decisions.
This take about cost, however, seems well-grounded. I appreciate clear statements like this that can act as guiding principles for what kinds of things to build, and how to anticipate changes in the coming months and years.