Sadly, the answer was wrong.
It also returned 8 "sources", like stackexchange.com, youtube.com, mpmath.org, ncert.nic.in, and kangaroo.org.pk, even though I specifically told it not to use web search.
Still a useful tool though. It definitely gets the majority of the insights.
Prompt: https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
> (for) most research projects, it would not help to have input from the general public. In fact, it would just be time-consuming, because error checking [...]
Since frontier LLMs make clumsy mistakes, they may fall into this category of 'error-prone' contributor whose net contribution is actually negative, despite being impressive some of the time.
How fast can you check the contribution? How small a part is it? An unsolicited contribution is different from one you directed yourself. Do you need to reply? How fast are the follow-ups? Multi-day back-and-forths are a pain; a fast, directed chat is different. And you don't have to worry about being rude to an LLM.
Then it comes down to how smart a frontier model is vs. the people who write to mathematicians. The latter group will be filled with both smart, helpful people and cranks.
What boggles the mind: we strove for so long for correctness, and suddenly being right 70% of the time and wrong the remaining 30% is fine. The parallel with self-driving is pretty strong here: solving 70% of the cases is easy; the remaining 30% are hard or maybe even impossible. Statistically speaking, these models do better than most humans, most of the time. But they do not do better than all humans, they can't do it all of the time, and when they get it wrong they make such tremendously basic mistakes that you have to wonder how they manage to get anything right.
Maybe it's true that with ever-increasing model size and more and more data (proprietary data: the public sources are exhausted by now, so private data is the frontier where model owners can still gain an edge) we will reach a point where the models are right 98% of the time or more. But the killer feature for me would be an indication of the confidence level of the output. Because whether junk or pearls, it all looks the same, and that is more dangerous than having nothing at all.
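Some APIs already expose a crude version of this via per-token log-probabilities. A minimal sketch with the OpenAI Node SDK (the model name is just an example, and per-token probabilities are not a calibrated answer-level confidence, so treat it as a heuristic at best):

    // Sketch: per-token logprobs as a rough confidence signal.
    // Assumes the official "openai" Node SDK (ESM) and OPENAI_API_KEY set.
    import OpenAI from "openai";

    const client = new OpenAI();
    const resp = await client.chat.completions.create({
      model: "gpt-4o-mini", // example model
      messages: [{ role: "user", content: "Is 97 prime? Answer yes or no." }],
      logprobs: true,
    });

    for (const tok of resp.choices[0].logprobs.content) {
      // Math.exp(logprob) is the model's probability for that token
      console.log(tok.token, Math.exp(tok.logprob).toFixed(3));
    }

A low probability on the decisive token is at least a hint to double-check, even if it is nothing like the calibrated confidence level I'd actually want.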
We're not exactly swimming in power generation, and efficient code uses less power.
Well, there's your problem. If you enable Google Search it behaves like a search-summary tool, not a problem solver.
It's an old userscript, so it is glitchy and only halfway works. I already pre-chewed the work by telling Gemini 3 exactly which new HTML elements it needs to match and which contents it needs to parse. So basically, the scaffolding is already there, the sources are already there, it just needs to put everything in place.
It fails miserably and produces very convincing-looking but failing code. Even letting it iterate multiple times does nothing, nor does nudging it in the correct direction. Mind you, JavaScript is probably the most trained-on language together with Python, and parsing HTML is one of the most common use cases.
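Roughly the shape of what I'm asking it to produce, for reference (the selectors here are made up for illustration; the real script targets site-specific elements):

    // ==UserScript==
    // @name   thread parser (sketch)
    // @match  https://example.com/*
    // ==/UserScript==
    // Minimal sketch of the scaffolding described above.
    (function () {
      "use strict";
      // Hypothetical selectors; the real ones come from the site's new HTML.
      const items = document.querySelectorAll(".thread-item");
      const parsed = Array.from(items, (el) => ({
        title: el.querySelector(".thread-title")?.textContent.trim() ?? "",
        link: el.querySelector("a")?.href ?? "",
      }));
      console.log(parsed);
    })();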
Another hilarious example is mpv, which has very well-documented settings. I used to think that LLMs would mean you could just tell people to ask Gemini how to configure it, but 9 out of 10 times it will hallucinate a bunch of parameters that never existed.
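For contrast, here is what a handful of settings that actually exist in mpv's manual look like:

    # ~/.config/mpv/mpv.conf -- a few real, documented options
    # GPU video output, hardware decoding where it's known safe
    vo=gpu
    hwdec=auto-safe
    # keep the window open at end of file, resume playback position
    keep-open=yes
    save-position-on-quit
    # load external subtitles with fuzzy filename matching
    sub-auto=fuzzy

The hallucinated configs look exactly like this, except the option names don't exist.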
It gives me an extremely weird feeling when other people are cheering that it is solving problems at superhuman speeds or that it coded a way to ingest their custom XML format in record time, with relatively little prompting. It seems almost impossible that LLMs can be both so bad and so good at the same time, so what gives?
This is mostly because HA changes so frequently and the documentation is sparse. To get around this and improve the correctness rate, I give it access to the source code of the same version I'm running, plus instructions in CLAUDE.md on where to find the source and a requirement that it actually use it.
This fixes 99% of my issues.
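The CLAUDE.md part is only a few lines, something along these lines (HA meaning Home Assistant; the checkout path is hypothetical):

    # CLAUDE.md (sketch)
    ## Home Assistant source
    - A checkout of the exact HA version I run lives in ./vendor/core
    - Verify every entity, service call, and config option against that
      source tree before suggesting it.
    - Do not rely on memorized HA docs; they go stale fast.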
As the available work increases in complexity, I reckon more people will push themselves to take jobs further out of their comfort zone. Previously, the choice was to upskill for the challenge and greater earnings, or stay where you were, which was easy and reliable; the current choice is upskill or find a new career. Rather than switch careers to something they have zero experience in, most will move upward, and that puts pressure on the moderately higher-skill job market, which has far fewer people. They in turn start to upskill to outrun the implosion, which puts pressure on the tier above them, and so on. With even modest productivity gains in the whole industry, it's not hard for me to envision a world where general software development just isn't a particularly valuable skill anymore.
I'm rooting for biological cognitive enhancement through gene editing or whatever other crazy shit. I do not want to have some corporation's AI chip in my brain.
Why?