I want to see what happens when somebody looks at the high-rise sand castle that is modern technology and decides that there wasn't enough sand involved.
I think enough companies are being stung by their hastily made AI decisions that the option of losing billions to save millions again isn't that appealing. That requires grown-ups to be involved in making decisions, though.
It seems very obvious that an AI will solve all the hardest coding problems first because they’re all documented really well, but fuck if I can get Claude to figure out a basic CSS layout. I'm wrestling with Cursor and Tailwind now.
But if you task it with something that needs existing context for CSS, you're pretty much screwed.
I don't get how intelligent people here can fall for the AGI myth. I guess it's human nature, like a statistician who keeps buying lottery tickets.
- Models: Not commoditized but evolving rapidly. Intelligence increasing, costs dropping 10x yearly. GPT-5 will merge GPT and reasoning model series.
- Coding Automation:
  - O1 Preview: ~millionth best programmer
  - O1: ~thousandth best
  - O3 (upcoming): 175th best globally
  - Will surpass humans at programming this year/next, not 2027 as Anthropic suggested
- Deep Research: OpenAI's best product since ChatGPT. Provides capabilities users couldn't achieve alone.
- Model Strengths:
  - GPT-4.5: Better at writing, human-like interaction
  - O1/O3: Better at reasoning, structured problems
  - Large models encode more "subtlety and nuance"
- Future Applications:
  - AI tutoring for personalized education
  - Robotics as the next frontier
- Creation Value: Expertise + AI will still outperform.
Value shifts toward idea generation, management, and quality assessment.

Makes me wonder: are they testing this in real-world scenarios or just benchmarking on HackerRank? Because the 1000th best programmer in the world would be AMAZING!!! You could probably 10x the productivity of any engineering org if all engineers were suddenly 1000th best in the world. There’s absolutely no need to surpass human ability, which makes me think they’re measuring something only tangentially related to the core skill of the job.