I wasted a few days trying to incorporate aider and other tools into my workflow. I had a simple screen I was working on for configuring an AI Agent. I gave screenshots of the expected output. Gave a detailed description of how it should work. Hours later I was trying to tweak the code it came up with. I scrapped everything and did it all myself in an hour.
I just don't know what to believe.
I try to keep that in mind when I hear people who work with LLMs, who usually have an emotional investment in AI and often a financial one, speak about them in glowing terms that just don't match up with my own small experiments.
But the vast majority of the world is not A players. They’re B and C players
I don’t think the people evaluating AI tools have ever worked in wholly mediocre organizations - or even know how many mediocre organizations exist
An assembly language programmer might have said the same about C programming at one point. I think the point is, that once you depend on a more abstract interface that permits you to ignore certain details, that permits decades of improvements to that backend without you having to do anything. People are still experimenting with what this abstract interface is and how it will work with AI, but they've already come leaps and bounds from where they were only a couple of years ago, and it's only going to get better.
I think though there is a lot of focus on AI agents in software development though because that's just an early adopter market, just like how it's always been possible to find a lot of information on web development on the web!
> "you basically just need to know a lot of rules..."
This comment commits one of the most common fallacies that I see really often in technical people, which is to assume that any subject you don't know anything about must be really simple.I have no idea where this comment comes from, but my father was a chemical engineer and his father was mechanical engineer. A family friend is a structural engineer. I don't have a perspective about AI replacing people's jobs in general that is any more valuable than anyone elses, but I can say with a great deal of confidence that in those three engineering disciplines specifically literally none of any of their jobs are about knowing a bunch of rules and best practices.
Don't make the mistake of thinking that just because you don't know what someone does, that their job is easy and/or unnecessary or you could pick it up quickly. It may or may not be true but assuming it to be the case is unlikely to take you anywhere good.
In my experience this word means you don't know whatever you're speaking about. "Just" almost always hide a ton of unknown unknowns. After being burned enough times nowadays when I'm going to use it I try to stop and start asking more questions.
The main role of the engineer is being responsible for the building not collapsing.
Software development does not have that kind of protection.
This has been my observation. I got into Github Copilot as early as it launched back when GPT-3 was the model. By that time (late 2021) copilot can already write tests for my Rust functions, and simple documentation. This was revolutionary. We didn't have another similar moment since then.
The Github copilot vim plugin is always on. As you keep typing, it keeps suggesting in faded text the rest of the context. Because it is always on, I kind of can read into the AI "mind". The more I coded, the more I realized it's just search with structured results. The results got better with 3.5/4 but after that only slightly and sometimes not quite (ie: 4o or o1).
I don't care what anyone says, as yesterday I made a comment that truth has essentially died: https://news.ycombinator.com/item?id=43308513 If you have a revolutionary intelligence product, why is it not working for me?
I describe it like "an eager intern who can summarize a 20-min web search session instantly, but ultimately has insufficient insight to actually help you". (Note to current interns: I'm mostly describing myself some years ago; you may be fantastic so don't take it personally!)
Most of my interactions with it via text prompt or builtin code suggestions go like this:
1. Me: I want to do X in C++. Show me how to do it only using stdlib components (no external libraries).
2. LLM: Gladly! Here is solution X
3. Me: Remove the undefined behavior from foo() and fix the methods that call it
4. LLM: Sure! Here it is (produces solution X again)
5. Me: No you need to remove the use of uninitialized variables as the out parameters.
6. LLM: Oh certainly! Here is the correct solution (produces a completely different solution that also has issues)
7. Me: No go back to the first one
etc
For the ones that suggest code, it can at least suggest some very simple boilerplate very easily (e.g. gtest and gmock stuff for C++), but asking it to do anything more significant is a real gamble. Often I end up spending more time scrutinizing the suggested code than writing a version of it myself.
AI is just AI. You can upload a reference file for it to summarize, but it's not going to be able to look at the structure of the file and use that as a template for future reports. You'll still have to spoon-feed it constantly.
I'll very rarely ask someone to completely rewrite a patch, but so often a few minor comments get addressed with an entire new block of code that forces me to do a full re-review, and I can't get it across to him that that's not what I'm asking for.
interns can also be clever and think outside the box. this is mostly not good, but sometimes they will surprise you in a good way. the AI by definition can only copy what someone else has done.
I just recently heard this quote from a clip of Jeff Bezos: "When the data and the anecdotes disagree, the anecdotes are usually right.", and I was like... wow. That quote is the zeitgeist.
If it's so revolutionary, it should be immediately obvious to me. I knew Uber, Netflix, Spotify were revolutionary the first time I used them. With LLMs for coding, it's like I'm groping in the dark trying to find what others are seeing, and it's just not there.
Maybe re-tune your revolution sensor. None of those are revolutionary companies. Profitable and well executed, sure, but those turn up all the time.
Uber's entire business model was running over the legal system so quickly that taxi licenses didn't have time to catch up. Other than that it was a pretty obvious idea. It is a taxi service. The innovations they made were almost completely legal ones; figuring out how to skirt employment and taxi law.
Netflix was anticipated online by and is probably inferior to YouTube except for the fact that they have a pretty traditional content creator lab tacked on the side to do their own programs. And torrenting had been a thing for a long time already showing how to do online distribution of video content.
the personal computer
the internet
the internet connected phone
social media
those technologies are revolutionary, because they caused fundamental changes to how people behave. People who behaved differently in the "old world" were forced to adapt to a "new world" with those technologies, whether they wanted to or not. A newer more convenient way of ordering a taxicab or watching a movie or music are great consumer product stories, and certainly big money makers. They don't cause complex and not fully understood changes to way people work, play, interact, self-identify, etc. the way that revolutionary technologies do.
Language models feel like they have the potential to be a full blown sociotechnological phenomenon like the above four. They don't have a convenient consumer product story beyond ChatGPT today. But they are slowly seeping into the fabric of things, especially on social media, and changing the way people apply to jobs, draft emails, do homework, maybe eventually communicate and self-identify at a basic level.
I'd almost say that the lack of a smash bang consumer product story is even more evidence that the technology is diffusing all over the place.
Build the much maligned Todo app with Aider and Claude for yourself. give it one sentence and have it spit out working, if imperfect code. iterate. add a graph for completion or something and watch it pick and find a library without you having to know the details of that library. fine, sure, it's just a Todo app, and it'll never work for a "real" codebase, whatever that means, but holy shit, just how much programming did you need to get down and dirty with to build that "simple" Todo app? Obviously building a Todo app before LLMs was possible, but abstracted out, the fact that it can be generated like that's not a game changer?
It's not even like humans are all that different here. Strip a human of their tools (pen&paper, keyboard, monitor, etc.) and have them try solving problems with nothing but the power of their brain and they'll struggle a hell of a lot too, since our memory ain't exactly perfect either. We don't have perfect recall, we look things up when we need to, a large part of our "memory" is out there in the world around us, not in our head.
The open question is how to move forward. But calling AI progress a dead end before we even started exploring long term memory, tool use and on-the-fly learning is a tad little premature. It's like calling quits on the development of the car before you put the wheels on.
Is programming itself revolutionary? Yes. Does it work for most people? I don't even know how to parse that question, most people aren't programmers and need to spend a lot of effort to be able to harness a tool like programming. Especially in the early days of software dev, when programming was much harder.
Your position of "I'll only trust things I see with my own eyes" is not a very good one, IMO. I mean, for sure the internet is full of hype and tricksters, but your comment yesterday was on a Tweet by Steve Yegge, a famous and influential software developer and software blogger, who some of us have been reading for twenty years and has taught us tons.
He's not a trickster, not a fraud, and if he says "this technology is actually useful for me, in practice" then I believe he has definitely found an actual use of the technology. Whether I can find a similar use for that technology is a question - it's not always immediate. He might be working in a different field, with different constraints, etc. But most likely, he's just doing something he's learned how to do and I don't, meaning I want to learn it.
This is kind of like if I said when the first dumbbell was invented “why don’t I look like arnold schwarzenegger…
Claude Code, Cline, Cursor… all of them with Claude 3.7.
Oh course lesswrong, being heavily AI doomers, may be slightly biased against near term AGI just from motivated reasoning.
Gotta love this part of the post no one has yet addressed:
> At some unknown point – probably in 2030s, possibly tomorrow (but likely not tomorrow) – someone will figure out a different approach to AI. Maybe a slight tweak to the LLM architecture, maybe a completely novel neurosymbolic approach. Maybe it will happen in a major AGI lab, maybe in some new startup. By default, everyone will die in <1 year after that
(Sadly, I'm not.)
LessWrong was predicting AI doom within decades back when people thought it wouldn't happen in our lifetimes; even as recently as 2018~2020, people there were talking about 2030-2040 while the rest of the world laughed at the very idea. I struggle to accept an argument that they're somehow under-estimating the likelihood of doom given all the historical evidence to the contrary.
That said, the actual forecast odds on metaculus are pretty similar for nuclear and AI catastrophies: https://possibleworldstree.com/
1 year may be slightly exaggerated, but it aligns with his view
We're already seeing this with tech doing RIFs and not backfilling domestically for developer roles (the whole, "we're not hiring devs in 202X" schtick), though the not-so-quiet secret is that a lot of those roles just got sent overseas to save on labor costs. The word from my developer friends is that they are sick and tired of having to force a (often junior/outsourced) colleague to explain their PR or code, only to be told "it works" and for management to overrule their concerns; this is embedding AI slopcode into products, which I'm sure won't have any lasting consequences.
My bet is that software devs who've been keeping up with their skills will have another year or two of tough times, then back into a cushy Aeron chair with a sparkling new laptop to do what they do best: write readable, functional, maintainable code, albeit in more targeted ways since - and I hate to be that dinosaur - LLMs produce passable code, provided a competent human is there to smooth out its rougher edges and rewrite it to suit the codebase and style guidelines (if any).
Spoiler alert: they are giving just barely enough to not get prematurely fired, because they know if you’re cheap enough to outsource in the first place, you’ll give the contract to whoever is cheapest at renewal anyway.
There's absolutely no way that we're not going to see a massive reduction in the need for "humans writing code" moving forward, given how good LLMs are getting at writing code.
That doesn't mean people won't need devs! I think there's a real case where increased capabilities from LLMs leads to bigger demand for people that know how to direct the tools effectively, of which most would probably be devs. But thinking we're going back to humans "writing readable, functional, maintainable code" in two years is cope.
Sure, but in the same way that Squarespace and Wix killed web development. LLMs are going to replace a decent bunch of low-hanging fruit, but those jobs were always at risk of being outsourced to the lowest bidder over in India anyways.
The real question is, what's going to happen to the interns and the junior developers? If 10 juniors can create the same output as a single average developer equipped with a LLM, who's going to hire the juniors? And if nobody is hiring juniors, how are we supposed to get the next generation of seniors?
Similarly, what's going to happen to outsourcing? Will it be able to compete on quality and price? Will it secretly turn into nothing more than a proxy to some LLM?
increased capabilities from LLMs leads to bigger demand for people that know how to direct the tools effectively
This is the key thing.Just a simple crud-ish project needs frontend, backend, infra, cloud, ci/cd experience, and people who could build that as one man shows were like unicorns - a lot of people had a general how most of this stuff worked, but lacked the hands on familiarity with them. LLMs made that knowledge easy and accessible. They certainly did for me.
I've shipped more software in the past 1-2 years than the 5 years before that. And gained tons of experience doing it. LLMs helped me figure out the necessary software, and helped me gain a ton of experience, I gained all those skills, and I feel quite confident in that I could rebuild all these apps, but this time without the help of these LLMs, so even the fearmongering that LLMs will ;make people forget how to code' doesn't seem to ring true.
Talking with Claude about design feels like talking with that one coworker who's familiar with every trendy library and framework. Claude knows the general sentiment around each library and has gone through the quickstart, but when you start asking detailed technical questions Claude just nods along. I wouldn't bet money on it, but my gut feeling is that LLMs aren't going to be a straight or even curved shot to AGI. We're going to see plenty more development in LLMs, but it'll be just be that. Better LLMs that remain LLMs. There will be areas where progress is fast and we'll be able to get very high intelligence in certain situations, but there will also be many areas where progress is slow, and the slow areas will cripple the ability of LLMs to reach AGI. I think there's something fundamentally missing, and finding what that "something" is is going to take us decades.
Sometimes technology far predates science and other times you need a scientific revolution to develop new technology. In this case, I have serious doubts that we can develop "intelligent" machines without understanding the scientific and even philosophical underpinnings of human intelligence. But sometimes enough messing around yields results. I guess we'll see.
That seems a pretty human thought process and shows that fundamental improvements might not depend as much on the quality of the LLM itself but on the cognitive structure it is embedded.
So I wrote tests thinking it could implement the code from the tests, and it couldn't do that either. At one point it went so far with the edge cases that it just imported the test runner into the code so it could check the test name to output the expected result. It's like working with a VW engineer.
Edit: I ended up writing the code and it wasn't that hard, I don't know why it struggled with this one task so badly. I wasted far more time trying to make the LLM work than just doing it myself.
Be careful about consuming information from chatters, not doers. There is only knowledge from doing, not from pondering.
To make an analogy - most people who will tell you not to invest in cryptocurrency are not blockchain engineers. But does that make their opinion invalid?
You cannot lead to truth by learning from people who don't know. People who know can be biased, sure, so the best way to learn is to learn the knowledge, not the "hot-takes" or "predictions".
The doers produce a new javascript framework every week, claiming it finally solves all the pains of previous frameworks, whereas the chatters pinpoint all the deficiencies and pain points.
One group has an immensely better track record than the other.
One group has an immensely more convincing power to me.
He has tons of links for the objective statements. You either accept the interpretation or you don't.
I stopped at this quote
> LLMs still seem as terrible at this as they'd been in the GPT-3.5 age.
This is so plainly, objectively and quantitatively wrong that I need not bother. I get hyperbole, but this isn't it. This shows a doubling-down on biases that the author has, and no amount of proof will change their mind. Not an article / source for me, then.
It's easy to spot people who secretly hate LLMs and feel threatened by them these days. GPT-5 will be a unified model, very different from 4o or 4.5. Throwing around numbers related to scaling laws shows a lack of proper research. Look at what DeepSeek accomplished with far fewer resources; their paper is impressive.
I agree that we need more breakthroughs to achieve AGI. However, these models increase productivity, allowing people to focus more on research. The number of highly intelligent people currently working on AI is astounding, considering the number of papers and new developments. In conclusion, we will reach AGI. It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.
It's slightly confusing terminology, but in fairness there is no agreed upon name for the next three orders of magnitude size-ups of pretraining. In any case, it's not the case that the author is confused about what OpenAI intends to brand GPT-5.
I'm a little confused by this confidence? Is there more evidence aside from the number of smart people working on it? We have a lot of smart people working on a lot of big problems, that doesn't guarantee a solution nor a timeline.
I really don't understand the level optimism that seems to exist for LLMs. And speculating that people "secretly hate LLMs" and "feel threatened by them" isn't an answer (frankly, when I see arguments that start with attacks like that alarm bells start going off in my head).
> It's easy to spot people who secretly hate LLMs and feel threatened by them these days.
I don't think OP is threatened or hates LLM, if anything, OP is on the position that LLM are so far away from intelligence that it's laughable to consider it threatening.
> In conclusion, we will reach AGI
The same way we "cured" cancer and Alzheimer's, two arguably much more important inventions than a glorified text predictor/energy guzzler. But I like the confidence, it's almost as much as OP's confidence that nothing substantial will happen.
> It's a race with high stakes, and history shows that these types of races don't stop until there is a winner.
So is the existential threat to humanity in the race to phase out fossil fuels/stop global warming, and so far I don't see anyone "winning".
> However, these models increase productivity, allowing people to focus more on research
The same way the invention of the computer, the car, the vacuum cleaner and all the productivity increasing inventions in the last centuries allowed us to idle around, not have a job, and focus on creative things.
> It's easy to spot people who secretly hate LLMs and feel threatened by them these days
It's easy to spot e/acc bros feeling threatened that all the money they sunk into crypto, AI, the metaverse, web3 are gonna go to waste and try to fan the hype around it so they can cash in big. How does that sound?
However, I'd like to clarify why optimism regarding AGI isn't merely wishful thinking. Historical parallels such as heavier-than-air flight, Go, and protein folding illustrate how sustained incremental progress combined with competition can result in surprising breakthroughs, even where previous efforts had stalled or skepticism seemed warranted. AI isn't just a theoretical endeavor; we've seen consistent and measurable improvements year after year, as evidenced by Stanford's AI Index reports and emergent capabilities observed at larger scales.
It's true that smart people alone don't guarantee success. But the continuous feedback loop in AI research—where incremental progress feeds directly into further research—makes it fundamentally different from fields characterized by static or singular breakthroughs. While AGI remains ambitious and timelines uncertain, the unprecedented investment, diversity of research approaches, and absence of known theoretical barriers suggest the odds of achieving significant progress (even short of full AGI) remain strong.
To clarify, my confidence isn't about exact timelines or certainty of immediate success. Instead, it's based on historical lessons, current research dynamics, and the demonstrated trajectory of AI advancements. Skepticism is valuable and necessary, but history teaches us to stay open to possibilities that seem improbable until they become reality.
P.S. I apologize if my comment particularly triggered you and compelled you to log in and downvote. I am always open to debate, and I admit again that I started too strongly.
1. Sonnet 3.7 is a mid-level web developer at least
2. DeepResearch is about as good an analyst as an MBA from a school ranked 50+ nationally. Not lower than that. EY, not McKinsey
3. Grok 3/GPT-4.5 are good enough as $0.05/word article writers
Its not replacing the A-players but its good enough to replace B players and definitely better than C and D players
(Based on my everyday experience with Sonet and Cursor)
Bumpers are not gonna make you a pro bowler. You aren't going to be hitting tons of strikes. Most pro bowlers won't notice any help from bumpers, except in some edge cases.
If you are an average joe however, and you need to knock over pins with some level of consistency, then those bumpers are a total revolution.
And the I stumble across a comment where some LLM hallucinated a library that means clearly AI is useless.
Bear cases always welcome. This wouldn't be the first time in computing history that progress just falls off the exponential curve suddenly. Although I would bet money on there being a few years left and AGI is achieved.
LLM Progression seems to be linear and compute needed exponential. And I don't see exponential hardware improvements besides some new technology (that we should not bet on coming ayntime soon).
Yeah? I'll take you up on that offer. $100AUD AGI won't happen this decade.
What about reforming democracy? Use the corrupt system to buy the votes, then abolish all laws allowing these kind of donations that allow buying votes.
I'll litigate the hell out of all the oligarchs now that they can't out pay justice.
This would pay off more than a moon shot. I would give a bit of money for the moon shot, why not, but not all of it.
I also like practical tools like NotebookLM where I can pose some questions, upload PDFs, and get a summary based in what my questions.
My point is: my brain and experience are often augmented in efficient ways by LLMs.
So far I have addressed practical aspects of LLMs. I am retired so I can spend time on non practical things: currently I am trying to learn how to effectively use code generated by gemini 2.0 flash at runtime; the gemini SDK supports this fairly well so I am just trying to understand what is possible (before this I spent two months experimenting with writing my own tools/functions in Common Lisp and Python.)
I “wasted” close to two decades of my professional life on old fashioned symbolic AI (but I was well paid for the work) but I am interested in probabilistic approaches, such as in a book I bought yesterday “Causal AI” that was just published.
Lastly, I think some of the recent open source implementations of new ideas from China are worth carefully studying.
I once spend over an hour trying to unescape JSON containing UTF8 values that's been escaped prior to being written to AWS's Cloudwatch Logs for MySQL audit logs. It was a horrific level of pain until I just asked ChatGPT to do it and it figured out all the series of escapes and encoding immediately and gave me the step to reverse them all.
LLM as a sidekick has saved me so much time. I don't really use it to generate code but for some odd tasks or API look up, it's a huge time saver.
Maybe that's changed recently, but I have struggled to get all but the most basic regex working from GPT-4o-mini
If you've been using LLMs effectively to build agents or AI-driven workflows you understand the true power of what these models can do. So in some ways the author is being a little selective with his confirmation bias.
I promise you that if you do your due diligence in exploring the horizon of what LLMs can do you will understand what I'm saying. If ya'll want a more detailed post I can get into the AI systems I have been building. Don't sleep on AI.
Current AI is clearly economically valuable, but if we freeze everything at the capabilities it has today it is also clearly not going to result in mass transformation of the economy from "basically being about humans working" to "humans are irrelevant to the economy." Lots of LW people believe that in the next 2-5 years humans will become irrelevant to the economy. He's arguing against that belief.
”People are extending LLMs a hand, hoping to pull them up to our level. But there's nothing reaching back.”
Shame on you for making me laugh. That was very inappropriate.
That is enough for me.
I'm not sure that it's a technology difference that makes LLM a better experience than search today, it's that the VC's are still willing to subsidize user experience today, and won't start looking for return on their investment for a few more years. Give OpenAI 10 years to pull all the levers to pay back the VC investment and what will it be like?
Ask it how to deploy an app to the cloud and it will insist you need to deploy it to Azure.
These ads would be easily visible though. You can probably sell far more malicious things.
The disruption goes both ways. When AI slashes production costs by 10-100x, what's the value proposition of traditional capital? If you don't need to organize large teams or manage complex operations, the advantage of "being a capitalist" diminishes rapidly.
I'm betting on the rise of independents and small teams. The idea that your local doctor or carpenter needs VC funding or an IPO was always ridiculous. Large corps primarily exist to organize labor and reduce transaction costs.
The interesting question: when both executives and frontline workers have access to the same AI tools, who wins? The manager with an MBA or the person with practical skills and domain expertise? My money's on the latter.
Capital is crucial when tools and infrastructure are expensive. Consider publishing: pre-internet, starting a newspaper required massive investment in printing presses, materials, staff, and distribution networks. The web reduced these costs dramatically, allowing established media to cut expenses and focus on content creation. However, this also opened the door for bloggers and digital news startups to compete effectively without the traditional capital requirements. Many legacy media companies are losing this battle.
Unless AI systems remain prohibitively expensive (which seems unlikely given current trends), large corporations will face a similar disruption. When the tools of production become accessible to individuals and small teams, the traditional advantage of having deep pockets diminishes significantly.
(IMO) Apart from programmer assistance (which is already happening), AI agents will find the most use in secretarial, ghostwriting and customer support roles, which generally have a large labor surplus and won't immediately "crash and burn" companies even if there are failures. Perhaps if it's a new startup or a small, unstable business on shaky grounds this could become a "last straw" kind of a factor, but for traditional corporations with good leeway I don't think just a few mistakes about AI deployment can do too much harm. The potential benefits, on the other hand, far outmatch the risk taken.
To me, this is the biggest question mark. If you could get good generalized "thinking" from just training on math/code problems with verifiers, that would be a huge deal. So far, generalization seems to be limited. Is this because of a fundamental limitation, or because the post-training sets are currently too small (or otherwise deficient in some way) to induce good thinking patterns? If the latter, is that fixable?
"Thinking" isn't a singular thing. Humans learn to think in layer upon layer of understandig the world, physical, social and abstract, all at many different levels.
Embodiment will allow them to use RL on the physical world, and this in combination with access to not only means of communication but also interacting in ways where there is skin in the game, will help them navigate social and digital spaces.
> (If math is solved, though, I don't know how to estimate the consequences, and it might invalidate the rest of my predictions.)
What does it mean for math to be solved in this context? Is it the idea that an AI will be able to generate any mathematical proof? To take a silly example, would we get a proof of whether P=NP from an AI that had solved math?
Regardless I don't think our AI systems are close to a proficiency breakthrough.
Edit: it is odd that "math is solved" is never explained. But "proficient to do math research" makes the most sense to me.
Um... I don't think companies are going to perform mass layoffs because "OpenAI said they must happen". If that were to happen it'd be because they are genuinely able to automate a ton of jobs using LLMs, which would be a bull case (not for AGI necessarily, but for the increased usefulness of LLMs)
I’ve known a guy since college who now has a PhD in something niche, supposedly pulls a $200k/yr salary. One of our first conversations (in college, circa 2014) was how he had this clever and easy way to mint money- by selling Minecraft servers installed on Raspberry Pis. Some of you will recognize how asinine this idea was and is. For everyone else- back then, Minecraft only ran on x86 CPUs (and I doubt a Pi would make a good Minecraft server today, even if it were economical). He had no idea what he was talking about, he was just spewing shit like he was God’s gift. Actually, the problem wasn’t that he had no idea- it was that he knew a tiny bit- enough to sound smart to an idiot (remind you of anyone?).
That’s an LLM. A jackass with access to Google.
I’ve had great success with SLMs (small language models), and what’s more I don’t need a rack of NVIDIA L40 GPUs to train and use them.
i think this is true of ai/ml systems in general. we tend to anthropomorphise their capability curves to match the cumulative nature of human capabilities, where often times the capability curve of the machine is discontinuous and has surprising gaps.
What we really need is people who can certify that a task was done correctly, who can use LLMs as an aid. LLMs simply cannot be responsible for complex requirements. There is no way to hold them accountable.
LLMs are already super useful.
It does all my coding and scripting for me @home
It does most of the coding and scripting at the workplace
It creates 'fairly good' checklists for work (not perfect, but it takes a 4 hour effort and makes it 25mins - but the "Pro" is still needed to make this or that checklist usable - I call this a win)(need both the tech AND the human)
If/when you train an 'in-house' LLM it can make some easy wins (on mega-big-companies with 100k staff they can get quick answers on "which Policy writes about XYZ, which department can I talk to about ABC, etc.)
We won't have the "AGI"/Skynet anytime soon, and when one will exist the company (let's use OpenAI for example) will split in two. Half will give LLMs for the masses at $100 per month, the "Skynet" will go to the DOD and we will never hear about it again, except in the Joe Rogan podcast as a rumor.
It is a great 'idea generator' (search engine and results aggregator): give me a list of 10 things I can do _that_ weekend in _city_I_will_be_traveling_to so if/when I go to (e.g. London): here are the cool concerts, theatrical performances, parks, blah blah blahI don't buy that at all, most of my use cases don't involve model's personality, if anything I usually instruct to skip any commentary and give the result excepted only. I'm sure most people using AI models seriously would agree.
> My guess is that it's most of the reason Sonnet 3.5.1 was so beloved. Its personality was made much more appealing, compared to e. g. OpenAI's corporate drones.
I would actually guess it's mostly because it was good at code, which doesn't involve much personnality
This is all the standard timeline for new technology - we enter the diminishing returns period, investment slows down a year or so afterwards, layoffs, contraction of industry, but when the hype dies down the real utilitarian part of the cycle begins. We start seeing it get integrated into the use cases it actually fits well with and by five years time its standard practice.
This is a normal process for any useful technology (notably crypto never found sustainable use cases so it’s kind of the exception, it’s in superposition of lingering hype and complete dismissal), so none of this should be a surprise to anyone. It’s funny that I’ve been saying this for so long that I’ve been pegged an AI skeptic, but in a couple of years when everyone is burnt out on AI hype it’ll sound like a positive view. The truth is, hype serves a purpose for new technology, since it kicks off a wide search for every crazy use case, most of which won’t work. But the places where it does work will stick around
Then other times it blows me away. Even figuring out things that can’t possibly have been in its training data.
I think there are groups of people that have either had all of the first experience or all of the latter. And that’s why we see over optimistic and over pessimistic takes (like this one)
I think the reality is current LLM’s are better than he realizes and even if we plateau I really don’t see how we don’t make more breakthroughs in the next few years.
I believe it is high time we come out this madness and reveal the lies of the marketers and grifters of AI for what it is. If AI can replace anyone, it should begin with doctors, they work with rote knowledge and service based on explicit(though ambiguous) inputs, same as an LLM needs, but I still have doctors and wait for hours on end in the waiting room to get prescribed a cough hard candy only to later comeback again because it was actually covid and my doctor had a brain fart.