What it feels like to work with Mythos

(oneusefulthing.org)

365 points · swolpers · 11d ago · 320 comments

179 comments · 57 top-level

eithed · 11d ago · 45 in thread

What I find fascinating is that there is so little substance in this article about the quality of the produced code and the medium. Is the code documented and tested? Is it understandable and extendable? Is it secure? What language, framework, database was used? Author mentions judgement and taste - well, is the code tasteful? Will the model rearchitect the entire thing if I ask it to add new functionality, spending another 9.5h in tokens? I assume that the research part is domain knowledge = how different types of travel translate to time, and making it presentable; how did the author verify this?

These questions are not even about AI: if I were to give money to a human agency and were given something they tell me works, I would ask the same questions. If I did not know how to evaluate, I would hire people that do. With LLMs the verification part is what bothers me the most.

an0malous · 11d ago

These posts are never written by software engineers, it’s always some tech exec, retired engineer, or VC. This author is apparently a professor at the Wharton School of Management? None of these people have to ship or maintain real products, they’re just making side projects.

The only decent software engineering perspective I’ve seen has been from Mitchell Hashimoto.

jimbokun · 11d ago

Well that’s kind of the point.

They can just summon bespoke software out of the ether that only handles the use cases of themselves and a few of their collaborators.

Making “side projects” was not possible for non-developers before powerful LLMs. Now it is.

neilv · 10d ago

Relevant quote:

> I am sure it is not perfect (I only spent an hour working with the results), but a software engineer would iron out the remaining potential bugs that I could not find quickly [...]

People have said things like this many times in the past, and, in the past (perhaps not now), it's always been a misunderstanding of what is good and bad, what's difficult and easy.

For example, someone would draw a UI in a GUI painter that generates code (or a resource file), and a manager would see it and think the majority of the work towards the product is done. (Incidentally, then there seemed to be a reaction, towards making your UI mockups look abstract or otherwise different from runnable code, helping the nontechnical to understand that this isn't 90% of the finished product.)

Or a student intern hacks out a homework-grade demo, and a manager who understands neither software engineering nor product domain says "we just need some engineers to polish it up for production", and thinks the student is a star and why can't their engineers be as brilliant and productive. (I might have once been that energetic intern, who was happy for the encouragement, but then learned more, and saw it was a thing.)

This common misunderstanding was sometimes self-correcting -- when trying to ship became a disaster of misery and regretted-attrition, or the product was poorly received by the market because it was neither thought through nor implemented well, or building subsequent functionality atop it was a nightmare. (But adverse effects of bad approaches are one of the reasons for management and ICs to job-hop, before the unwanted effects affect them personally.)

What might be different now is that some of these AI tools are outputting better-engineered work than some software engineers, and much faster.

At the back of my mind, I'm wondering how the really great software engineers will continue to stand out, as the discipline is being devalued in the minds of most leadership, and anyone can prompt an AI to generate something that superficially appears to them like what they assume a great software engineer would produce. (Even if the great engineer would do much better quality of implementation, have innovative ideas that ML from open source code would not, and maybe arrive at better product concepts as they worked through the problems.)

cgearhart · 11d ago

I’m starting to realize that LLMs are really good at building low-stakes projects. Your questions mostly presume that the stakes are higher. The software will last a long time; the requirements will evolve; we can’t tolerate mistakes; etc.

The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

qaq · 11d ago

You don't need an LLM for that. You make _all_ projects low-stakes by working on a green field project using (insert buzzword soup of the day) and leaving for a new green field opportunity (that requires experience with buzzword soup of the day) before the project ships.

rpdillon · 10d ago

This is really insightful, but I think it also extends to making the project either low stakes or low complexity. I have this lurking feeling that the preferable architecture for software will change as a result of LLMs because they're good at working on low complexity modular components more than they are on high complexity million-line code bases.

dchftcs · 10d ago

If there's a viable way to make all projects low-stakes we'd have done it. Consider this: microservices.

acedTrex · 11d ago

> The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

This doesn't really work in the real world. There are many things that actually matter, and engineering is fundamentally about handling them.

skywhopper · 10d ago

But not all projects can be low stakes. None of the important ones are.

spicyusername · 10d ago

    the quality of produced code and the medium

A thought I have been tossing around in my head as the models get better is that it really may not matter what the code looks like.

If the observed behavior of the software is good, then the software is good. If a bug, of whatever kind, can be fixed by a model on a vibe-coded codebase, then that's a fixable bug. If there are no exploitable vulnerabilities, then the code is secure. If the performance is adequate, then the code is performant.

It simply does not matter what the code looks like if, from the outside, it does what it's supposed to, and, from the inside, a model can fix the issue if one is found.

More than ever, software engineering is now really a job about making sure the code is doing what it's supposed to.

And even if it DOES matter what the code looks like, you can have a model fix that too.

skydhash · 10d ago

The thing is that a lot of code relies on multiple layers of abstractions with their own correctness and failure states. And then you overlay the domain correctness and failure cases on top of that.

But all of those notions of correctness are imaginary. The hardware only enforces a few (and it may be buggy). The OS adds some more (and it’s buggy). The compiler/interpreter may have bugs (but that’s rarely a nuisance) and the libraries are often brittle. There are cracks everywhere in the tower of abstractions.

The code has never mattered. What has always mattered is the knowledge of what the model of correctness of the software is ("Programming as Theory Building" by Naur), so that you can discern where a program is wrong.

The thing is, a crash or some other immediate error is actually nice to have. You get to react immediately and can have a core dump or a stacktrace that points you to the error. What is truly a terror is silent corruption (wrong order of operations, wrong values for a comparison that has expanded the idea of correctness, security issues that have been backdoored for years,…).

As Hoare said:

  There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
  The first method is far more difficult.

LLMs are very much the second kind. You write a lot of complicated code, and then you can no longer reason about its correctness.

eithed · 10d ago

Don't forget that LLMs are trained on human code. If they cannot understand what your code does then they cannot make changes to it, or at least - having them understand your codebase becomes expensive (more trips to Anthropic servers)

coldtea · 11d ago

>What I find fascinating that there is so little substance in this article about the quality of produced code and the medium.

I clicked one of his examples intrigued "a snake game where the snake is self-aware and crazy things happen;". Played for 1-2 minutes, and it's the classic 1980s snake game. Am I missing something? What is "self-aware" about it? Some funny messages at the bottom of the screen? And what are the "crazy things"?

starshadowx2 · 10d ago

It sounds like you either didn't play enough or you are missing the new mechanics that get added over time. There's definitely more to it than just regular snake.

vunderba · 11d ago

I had the exact same thought. To me, it feels like they just took the fairly common “sentient video game character” trope and bolted it onto a very conventional snake game.

I will say, the way the act of eating creates a "bulge distortion" that flows down the length of the snake is a nice touch though.

kesor · 10d ago

You didn't play long enough. There are layers and layers and layers of features in that game if you play for 10 minutes or more.

soraminazuki · 10d ago

Welcome to every LLM discussion in the past 2 years or so. When asked for anything of substance, we're faced with a barrage of "but humans aren't good at this either!" Very little quantifiable evidence and lots of pure rhetoric.

skydhash · 10d ago

I’ve seen this pattern again and again, and I don’t bother replying. There’s also the “strong statement, and when you contradict it, they point out some particular circumstances that no one cares about”.

viking123 · 10d ago

Yeah, never concrete examples from these guys.

I am creating a game and I can say that with the coding part the models help a lot, mostly gpt 5.5 high. Tbh to me all the frontier models feel the same and they can all solve the stuff I do quite well with some guidance and prompting. But that kind of makes me appreciate the other stuff more like visual style, sound design, mechanics etc etc. Tons of work still.

For brainstorming I find the models bad nowadays, or maybe I am just too critical of the results.

hypfer · 11d ago

Being the first to release an article gives you great SEO or whatever. Doing the things you've mentioned takes time.

jstummbillig · 11d ago

Less fascinating when you consider that this is a non-coder's perspective.

CobrastanJorji · 10d ago

It's still fascinating, but for a different reason. The "Concord" tool that got created bills itself as "Instrument-grade measurement of qualitative text. Explore in minutes, publish with honest statistics." Instrument-grade! How wonderful! That presumably means its accuracy has been ensured, and it's been carefully calibrated, right? What, nobody's ever measured or even examined the code? Well, no matter, let's go ahead and publish it and advertise it as "honest" "instrument-grade measurements."

eithed · 11d ago

Fair enough, but entrepreneurship should, I guess, ask whether a given Next Big Thing has substance behind it or is just snake oil.

unholiness · 11d ago

Yeah, this made it basically clickbait for me, in terms of time I wasted with the wrong expectation.

The lack of downvotes on posts on HN has always felt like more of a bug than a feature to me.

nomel · 11d ago

So, the perspective of the one that gains the most, that will value this the most, and that will pay the most? ;)

andai · 10d ago

These days it's uneconomical for humans to verify AI generated code. So we ask the AI to do it. Like when we asked the FBI to audit itself and they found no problems :)

chickensong · 10d ago

You probably don't care about the ingredients or engineering of asphalt, only if the road does its job well or is filled with potholes. Outside of the software industry, nobody gives a shit about code or databases.

geraneum · 10d ago

> You probably don't care about the ingredients or engineering of asphalt

Everyone does. You don’t think about it every day because we’ve delegated it to experts who don’t come up with a new composition of asphalt every time you press “generate”. It’s rigorously battle tested and, short of intentional negligence, it’s consistent. I’m amazed how people are forgetting how the world actually works.

eithed · 10d ago

I agree. But if I'm paying for the road (even as a taxpayer) I get angry that after a year it's full of potholes and that there are unnecessary signs warning about penguin crossing, making it cost 2 times more than it should have (and don't get me started on why this road is really a highway leading to my house). I'd want certain qualities. And this article is basically = you will get a road, built quickly.

But yes, you are right - I don't build roads and don't know what the price to build a road is or how to determine the quality of a correctly built one, nor will I ever care or learn.

fwip · 10d ago

Sure, but if there's a trillion dollar company saying that it's going to replace all our road workers or engineers - I'd want to listen to the opinion of an expert. Some reporter from CNN driving over it like "yeah seems good to me, good this" has approximately zero persuasive power to me.

queenkjuul · 10d ago

I care that the engineer followed industry standard best practices and used high quality asphalt. How could I not care about that? How do you think potholes aren't related to the engineering of asphalt?

Tylerian · 10d ago

The ingredients and composition of the tarmac are the difference between having the road full of potholes after a week of use and not.

jknoepfler · 10d ago

There also isn't any meaningful articulation of why this is a "leap forward"... literally everything claimed in the article has been claimed in the same breathless tones in articles written a year prior.

I get that there's little sense in arguing with the MBA hivemind, but... c'mon.

I manage two teams of highly motivated, largely pro-AI engineers. Both teams have independently concluded that they needed to ramp down GenAI usage because of code quality / maintainability concerns. Both teams have suffered from protracted outages caused by LLM jank not being sufficiently fenced off and guarded against. Both teams have expressed concern that the code generated by LLMs is far too verbose, full of slop, and rapidly becomes an unmaintainable mess.

These are teams that are building non-trivial LLM solutions (deep agentic data synthesis and multi-modal data tagging). They are using the technology creatively and pro-actively, not just vibe-coding slop and throwing their hands up when it fails. Both teams will continue using GenAI coding agents, don't get me wrong - but the gains are incremental, not transformative, and need careful fencing to make sustainable.

Nothing in these articles resonates as real. People who work in reality don't agree. I don't understand why this shit keeps getting attention (or rather I do, but the reasons aren't good).

markoloko · 10d ago

So would you be more comfortable if the user then just prompted the AI to use a specific language, framework and database? Aren't we all just going to reddit and finding out what all goes best with what? But also I don't trust nothing from it, even though I've seen it.

jimbokun · 11d ago

Does it matter to the people requesting the software if it acts in the way they expect?

crystal_revenge · 10d ago

We've lived in a software bubble for so long, most software engineers have completely forgotten that the purpose of (most) software is to solve a problem. If that software solves the problem well and reliably, the quality of the code doesn't matter.

In fact, that's the entire reason we care about "quality code", because we assume that quality code is code that does what you expect well and consistently.

I say this as someone who hand writes code pretty much every night for fun, just to experiment with computation. Which, oddly, is more fun than ever because I don't feel like there's any need to connect this type of programming with "real world software", and I can really enjoy code for its own sake; meanwhile my job is mostly just running agent loops (which I quite like as well).

eithed · 10d ago

True, but you should say that about everything. Does it matter to you how the car drives, as long as it takes you to your destination? Well, yes, it matters: how it will deal with a crash, whether it's possible to replace a part, and whether anybody can just open it if you leave it outside. I will be amazed if somebody shows me their home-printed car, but if they try to sell it to me like a new one...

sexylinux · 10d ago

It still does make errors, yes? Because it is not usable, if we need to verify everything. AI is only interesting if it can do things that humans can not do. If you can verify results because you can do it yourself, then why use AI? It will just bind highly skilled people to do verification work. Instead these people should do the actual work, results will come quicker.

So AI is only interesting to you / your org / humans if it can do things that you can not achieve. But if it still makes errors, how could we ever know that a super-invention by AI is not wrong?

If we can not rely on the correctness of the result, it is not usable at all. AI must create reliable and correct results always. That was a very fundamental requirement for computing. This problem has not been solved.

fisf · 10d ago

By that measure, most software developers should be unemployed.

danlugo92 · 10d ago

You can either adapt or survive man, coping and negation don't help. AI is here to stay, and yes, it does require pilots, but this map would have taken you weeks to do; the AI did it in 10 hours, and you can still dedicate a week to refactor.

Also this is easily solved by .md spec files, this whole "bad code" cope is just FUD.

Anamon · 5d ago

I don't think that putting a text file saying "don't make mistakes" is going to get LLM output to the point where it doesn't need professional input, guidance, review and refinement anymore. They don't make these systems more deterministic. There have even been study results showing spec files reducing prompt adherence.

grafporno · 11d ago

It's an ad.

otabdeveloper4 · 10d ago

Don't harsh my vibes, man.

adamtaylor_13 · 11d ago

I'm becoming more convinced these are questions of the Before Times. Yes, yes—heresy, I know.

Yet, I can't deny the reality that I observe working with LLMs every day. If this truly is a step-function (as some are suggesting), then I have absolutely zero concern for the quality of the code.

fwip · 10d ago

Kind of a circular argument, isn't it? "Some people are saying it's very good at coding. If that's true, I don't care if the code is good."

gopalv · 11d ago · 11 in thread

> It worked for nine and a half hours.

> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct

That's the bit that stuck out to me - that's longer than I would expect to work on a problem in a day or even expect to go back & fix the output of something that has a core reward loop of hours.

My customers are currently clamoring to push down my agent response times from 85 seconds down to below the 20s mark.

At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

matneyx · 11d ago

In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.

We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."

petesergeant · 11d ago

Sadly I didn't get very many answers to my Ask HN, "What are you doing during inference?": https://news.ycombinator.com/item?id=47944917

giancarlostoro · 11d ago

This. I get told things like "you can't build all that on your own?" I've had Claude poop out full feature web apps in under 30 minutes, to a spec. Was it perfect? No, but sometimes even in a simple setup phase you can burn 15 minutes to some obscure setup step that's failing. I cannot just code nonstop at 900WPM or whatever ridiculous speed, and poop out an entire full feature web app, with maybe a few bugs here or there. If you can, come show me, I'll gladly have you race against my Claude prompting capabilities.

Will Claude's code be perfect in one shot? Probably not, will it get you 80 to 90% of the way there with your chosen design patterns in under a few hours? Absolutely.

torginus · 10d ago

I tried to read the 'design doc' - it's full of vague platitudes and impressive sounding but impossible to pin down management speak - in short, it's slop, and I still don't really get what it's supposed to do exactly.

It's some prompt engineered AI harness that guides the AI to create stats after it researches a subject and ingests the data, but I'm not sure what it is that the tool actually does on top of this.

neogodless · 11d ago

For the rare uninitiated:

https://xkcd.com/303/

giancarlostoro · 11d ago

> At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

At this point, pay me significantly more, and I'll do it.

warkdarrior · 11d ago

> pay me significantly more

Ha ha, that's how you negotiate yourself out of a job!

PeterStuer · 11d ago

My Opus 4.8 regularly works for 10+ minutes on a single non-trivial coding request.

ASalazarMX · 11d ago

Your Opus 4.8? Is it now usual to refer to LLMs like that?

hedgehog · 11d ago

Work duration is also not that valuable of a measure, you're usually better off defining the process yourself in code and having that delegate chunks of work to the models. The only real issue there is that it's harder to take advantage of the providers' subscription discounts, but on the other hand it's easier to do your own model routing, and there's no way I've seen for the normal chatbots to maintain coherence on streams of work measured in days and weeks.

cyanydeez · 11d ago

I think we hit the sigmoid back when the QWEN models were released. By properly structuring my project, I can point it at any extension I want and get it going for 30 minutes to extend whatever. It can't effectively do 'god mode' on all the code, but being a mindful observer and code "professional" I don't need more than what a 128GB VRAM needs.

I'm amazed we're so far into SOTA bloat that the chinese will kill once they start etching silicon with these models.

theturtletalks · 11d ago · 10 in thread

This is what he built:

https://isochronic-passage-chart.netlify.app/

Doesn’t work too well on mobile but looks interesting

jampa · 11d ago

It is hallucinating many flights in my region, some that never existed (so it is not an outdated data problem).

I also see some logic flaws. It overlooks the option of going to a major hub to access faster aircraft, rather than hopping on local hubs.

Also, immigration and customs are cleared at the first airport you arrive at in the country, not at the last one.

In some countries, you need to clear immigration even while going to a third country, so 1 hour is not enough to do it.

skipants · 11d ago

It looks interesting but, like a lot of AI output, looks correct but is not. Most of northwestern Canada says you can get there by road. If you look at Google Maps, there are no roads there for quite a while. I see one highway between Inuvik and Tuktoyaktuk but that's about it.

neom · 11d ago

Reminds me of a fun story. Some 20 years ago when I moved from Fort Frances to Toronto for college, my high school best friend was also going to college in Toronto, and his dad offered to drive us together in his truck with all our stuff in the back. We were saying our goodbyes and my buddy's dad said to my dad "We'll get there a lot faster, I found a shortcut!" My dad, confused, says "shortcut? there is no shortcut, just highway 1..." and his dad insists he found an alternative route, much shorter by kms and we'll fly up there 6 hours faster! Get into the truck and he pulls out 5 pages of printed mapquest... I assure you, having done it, Sault Ste. Marie to Sudbury via Elliot Lake on logging roads may look interesting, but not correct: it added a good 8 hours to the trip.

rgmerk · 10d ago

It put the chart title directly on top of Australia.

Which just about sums up my experience with using LLMs to code, really (though not with these state-of-the-art models, admittedly) - it's amazing what they can do, but left to their own devices they'll make boneheaded decisions.

justinclift · 10d ago

> it's amazing what they can do, but left to their own devices they'll make boneheaded decisions.

Yeah, the whole "can run for 9 hours on a task" to me is not a positive.

I tend to find if Opus 4.8 runs for ~15 mins on a task, then the end result has gone off in a weird direction at some point, and it needs winding back a fair bit.

And that's with extremely clear direction, literal specification docs to follow, etc.

That being said, having functional code already created beforehand (ie by a human) goes a long way to ensuring the AI model has a path it can build on without making too many dumb architectural choices by itself. Generally.

alt227 · 10d ago

I believe that's why they put 'Sydney' as an option at the top to recenter the map.

The real issue with the title is that it doesn't fit in the box!

ImaCake · 10d ago

It's fun and it looks good regardless of whether it's 100% correct (It would certainly take me more than 9 hours of work to do better than this). Making these bespoke tools possible for most people is a big deal.

aix1 · 10d ago

The UI is full of glitches: the legend that's placed right on top of Australia, the title that doesn't fit in the box, the crosshair that doesn't accurately track the cursor, the pixellated fonts along the perimeter, the unreadable colour combinations in the overlay, the rendering glitches along the axes when you flip from tab to tab and so on and so forth.

It's like someone took a beautiful, intricate piece of vintage jewellery and made a slapdash imitation out of cheap plastic.

KeplerBoy · 10d ago

It is cool, but still weird that it gets very basic stuff wrong like mapping the cursor coordinate to the canvas. There's some y-axis scaling issue.
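
For anyone unfamiliar with this class of bug, here is a minimal sketch of the usual cursor-to-canvas mapping, written in Python for neutrality. It assumes (and this is only a guess about the cause, not something verified against the app) that the canvas is displayed at a different size than its drawing buffer, so each axis needs its own scale factor. `Rect` and `client_to_canvas` are hypothetical names for illustration; in a browser the rectangle would come from the element's bounding box.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Hypothetical stand-in for the canvas's on-screen box, in CSS pixels."""
    left: float
    top: float
    width: float
    height: float


def client_to_canvas(client_x, client_y, rect, canvas_width, canvas_height):
    """Map a pointer position (CSS pixels) to drawing-buffer pixels.

    Each axis gets its own scale factor. Reusing the x scale for y
    (or skipping the scaling entirely) produces the kind of drift
    where the crosshair is right at one edge and wrong at the other.
    """
    scale_x = canvas_width / rect.width
    scale_y = canvas_height / rect.height
    return ((client_x - rect.left) * scale_x,
            (client_y - rect.top) * scale_y)


# Example: an 800x600 buffer shown in a 400x200 box offset by (10, 20).
rect = Rect(left=10, top=20, width=400, height=200)
print(client_to_canvas(210, 120, rect, 800, 600))
```

When the displayed aspect ratio differs from the buffer's, as in the example, the two scale factors differ, which is exactly the case where a single shared factor would only look wrong on one axis.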

endymion-light · 10d ago

Doesn't work too well on desktop either! This is decent but it's also an early hackathon set-up - this is something that you can set up on a sonnet model fairly easily (without the weird CSS slop that anthropic models seem to love).

I'm not very threatened by this if this is the dangerous Mythos model - it just seems like a slightly incrementally better sonnet

selfawareMammal · 11d ago · 8 in thread

What are people working on that they see such a substantial difference between Mythos and Opus? I'd say I'm working with advanced stuff and more often than not Deepseek is more than enough. Why is everybody a genius in here?

jenniferhooley · 11d ago

Just depends what you are working on. If you are trying to make a video game that's at a level of a decent indie game (think Hades/Baazar/etc), making UI elements/VFX/complex shaders/etc that are organic/interactive/animated that don't feel like a little dogshit vibeslop web-game, then none of the models are even close to good enough to get it done easily. Huge percentage of problems in top 3% games is really hard for any of the models to do with simple prompting.

Personally I don't really care, because I like coding and learning myself and DeepSeek Flash is all I really care about. But it's really easy to have a ton of benchmarks where the top models can't get anywhere close - and I like to test them on these problems to see how good they are getting.

Fable 5 is def a little better than 4.8 btw.

mervz · 11d ago

We see the same thing when new laptops are announced and every employee all of a sudden needs to upgrade, despite the fact that 90% of people would be able to make do with a Macbook Neo.

Our_Benefactors · 11d ago

> despite the fact that 90% of people would be able to make do with a Macbook Neo.

Myth. Total myth! I recently had to beg for more RAM after continually hitting swap space which causes tools like dictation to stop working, failure to load certain websites without rebooting, and so on. Devs do in fact need powerful machines and the ~$500-1000 an employer saves upfront in machine costs is dwarfed by productivity losses.

Giving your engineering employees new machines in a 2-year cycle that are between the middle and high end is one of the cheapest ROI decisions that a tech org can make.

ianm218 · 11d ago

I’ve been working on implementing some common web infra type projects in Rust lately. Basically trying to use a lot of the great primitives in Rust like rustls (modern openSSL) and Tokio (async) to build memory safe, or close to it, nginx drop-in replacements.

A small portion of this effort is having a high quality Lua in Rust repo. I’m using mythos to fix some of the performance issues with my Lua interpreter that gpt 5.5/ opus 4.8 had stone walled on.

Not sure if Mythos will be able to crack this but it has been running for a couple hours now with some promising results.

Performance charts linked here if you're curious: https://github.com/ianm199/lua-rs

mplanchard · 10d ago

What’s wrong with mlua?

matheusmoreira · 10d ago

I'm working on my own programming language. I've also been exploring open source projects to contribute to. Maybe something that helps me pivot from hobbyist to professional. If such a thing is even possible in this day and age.

Fable 5 found quite a few issues Opus 4.8 missed on code review, even though the stupid cybersecurity nonsense downgraded it. I can't tell you more, I only get a single session per 5h window on Max 5x. Only ran two sessions so far.

jstummbillig · 10d ago

I am sure you would not find it hard to exhaust any model, if you kept upping your ask enough times.

On the margins, suppose the prompt is literally: "Build a feature complete, high polish Facebook clone". Facebook is complex but likely not super complicated tech, and still I would assume that (after having burned through a substantial amount of tokens) you would find substantial enough differences in the outcomes between different models on that prompt on various fronts.

The above ask is obviously not useful, but what's preventing you from taking on bigger chunks until you approach the limit? At some point you would hit a boundary, where the diff will be obvious.

mohsen1 · 11d ago

I had left a few of the benchmarks alone and was working on tech debt, knowing that a new model was going to be released soon. For my project (tsz.dev), Opus 4.8 had been running in circles on those tasks for a while without producing results.

JumpCrisscross · 11d ago · 6 in thread

Anecdote: I fed Fable some models I’ve been hand verifying (basically, I sketch out a scenario for Opus to model, it builds it, I ask it to show me the math, I correct it, we iterate like this, then I double check its code to make sure the math matches the model logic). Fable found almost every error I found, and then had some interesting suggestions for additional variables.

It also burned through my usage quota like a late-90s Hummer.

matheusmoreira · 11d ago

> It also burned through my usage quota like a late-90s Hummer.

Yeah. I have a Max 5x subscription and Fable burned through 16% of my weekly quota in a 40 minute code review session. It didn't even finish the review, it switched back to Opus 4.8 in the critical memory safety parts where I actually needed Fable.

I feel like I'm going to get priced out of these models soon. I should probably try to get the most out of Fable until June 22nd.

cyanydeez11d ago

now for the best question: what's your ROI here?

Ferret744611d ago

Humans are very expensive, so the equation almost always falls against them.

It's not just salary, but also safety/labor regulation, legal risk, vacations, sick time, personal conflicts, HR, benefits.

Even when automation is more expensive on paper, it's generally still cheaper

6 more replies

crystal_revenge10d ago

The parent comment is describing a test they ran so they could assess their trust in the model for scenarios they don't have time to fully understand.

Do you not believe in running tests, evaluations, or experiments at all to better understand your environment?

The ROI in the case of a positive outcome is the reduced time needed to inspect the results in the future (the entire point of AI is to know what you can trust it on, so you can delegate everything at that level with less oversight). The ROI in the negative case is the tokens not wasted on tasks too ambitious for the model.

Qhemlomo11d ago

It just got released, it shouldn't matter.

We know this model will be cheaper and faster with time.

And we have not even reached the timespan/timeframe where we have ASIC style models.

OpenAI has to do something which will beat Fable otherwise Anthropic won. China currently overtakes cars, pv, batteries and very soon silicon chip making, it has all the incentive to also take over AI.

2 more replies

PunchyHamster11d ago

It will be great when the price of compute/memory drops to normal level!

1 more reply

neaden11d ago· 5 in thread

Man, that poem it made is terrible. Like just incredibly bad. Sure it's neat that software can make an incredibly bad poem but there is enough bad poetry in the world that we don't need it.

Kiro11d ago

How good can a rhyming poem about a haircut where every word starts with S be?

electroweak10d ago

A whole lot better when written by a human, such as Michael Kandel. This was one of the tests of the electrobard in a story from the Cyberiad ("fables for the cybernetic age"). The key point about Samson was his suicide, which despite the obvious isn't mentioned in the six pages of this rubbish. Perhaps guardrails are throttling this corporate "fable"s ability to comment on the human condition.

The poem Kandel translated from the original Polish was, for artistic reasons, completely different. I will be impressed when machine translation can duplicate that!

endymion-light10d ago

Seduced, shaggy Samson snored.

She scissored short. Sorely shorn,

Soon shackled slave, Samson sighed.

Silently scheming,

Sightlessly seeking

Some savage, spectacular suicide.

- That's the translated Cyberiad poem the blog post based it off of (or the AI decided to do so)

layer811d ago

I wonder what Vogons would think of it.

throw31082210d ago

Terrible? Incredibly bad? Something tells me you are not very familiar with poetry, literature or writing in general. This exercise gets its inspiration and tone from one of Stanislaw Lem's Cyberiad short stories ("Trurl's electronic bard"). Besides, what did you expect from a "10 pages epic rhyming poem about a haircut where every word starts with the letter S"? Robert Frost?

asdK12011d ago· 4 in thread

Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.

He is a professor but sadly also an AI shill. He should switch to advertising washing powder.

MostlyStable11d ago

So...no engagement with the substance? Not even to explain why it is that this is not a useful description or test of capabilities? Ok.

dthread311d ago

I would like to see it do something useful, like converting pytorch to golang.

4 more replies

CuriouslyC11d ago

Ethan is a booster but I wouldn't call him a shill. He cites data and mostly in a fair way, though you could argue the sources he chooses to focus on are biased.

whyenot11d ago

Instead of attacking the author, please respond to the content of the article. That is the HN way, and it leads to more substantive and interesting discussions.

michaelteter10d ago· 4 in thread

As a software engineer and solution provider, I do not feel threatened by this.

I do not fear that management will get tools like Mythos and then not need people like me. Most of the value I provide is in translating what the management/client _thinks_ they need into what is the real problem and solution.

That's not an insult to them, it's just pointing out that they see only their problem, and they imagine what would be the solution. They then ask for that solution. Quite often, what they want built isn't what they need. And I've seen so many problems, from so many domains and scenarios, that I can usually recognize the core need and propose (and build or direct building of) a solution which resolves that need AND has an eye toward the likely future needs.

Mythos may do an excellent job providing a high quality result based on what is asked of it. But the result will only be as good as the quality, clarity, and presentation of the request.

If I hire a home builder to build me a custom home, that builder is going to ask me a thousand questions - questions I had never even thought of. Mythos isn't going to ask all those questions - it's going to make the best choices it can without the consultant's level of interaction. And the buyer will get what they get. Sure, the buyer can then say, "oh, I don't want any hallways - just connected spaces." Then the house gets demolished and rebuilt to the new, clearer spec. Repeat, repeat repeat. Maybe eventually the buyer gets what they really want. More likely they give up before reaching that point, and they go and hire a real builder.

I'll sum it up like this: You can get great results with minimal effort if you don't really care too much about the details. But if you don't care much about the details, then your need probably wasn't very significant.

redhale10d ago

I will never stop being fascinated with takes like this. Maybe you're right. But people said very similar things over the last few years and many of those statements look unbelievably naive in retrospect.

Sure, AI can auto-complete the line, but it can't write full functions.

Sure, AI can write functions, but it can't complete full features.

Sure, AI can write full features, but it can't build full applications.

Sure, AI can write full applications, but it can't build them in the right way / ask the right questions / write beautiful maintainable code / do what _I_ do..

Time will tell.

ChrisLTD10d ago

Even the early versions of AI autocomplete tools like Tabnine and the original Copilot could autocomplete entire functions, so I think you might be strawmanning a bit.

zelphirkalt10d ago

I currently see the problem as follows: The knowledge worker like you sees the need for people like themselves to still be hired, and can reasonably argue for it. However, the management dudes and investors do not understand it, and it is difficult to make them understand, when their (short to medium term) profits depend on not understanding it. So whether you feel threatened or not, is just a matter of you feeling bad or not, but doesn't really matter, when it comes to finding a job.

michaelteter9d ago

You are exactly right. Whether AI is better than a human or not is irrelevant if bad, unqualified corporate leaders are making rash silver-bullet decisions that cost workers their jobs.

The problem is much broader though - consolidation of wealth and power has enabled, frankly, idiots to control how the world works - from politics to business. Greed and stupidity are eating the world.

I don't see any solution. This is like a disease that will either eventually kill the body or take a long time to heal, leaving deep scars and forever changing humanity.

Maybe War Games was right - the only way to win is not to play. Therefore, find something you love (even if it doesn't pay well), and do that.

(I spent two years looking for a tech job. My 30 years of broad and meaningful experience is apparently not interesting to at least the 200 companies I applied to. So now I'm a teacher, and I'm quite happy.)

1 more reply

olafmol11d ago· 3 in thread

This little line from the article scares me: "but a software engineer would iron out the remaining potential bugs that I could not find quickly"

Every sw dev knows this is a very dangerous, and unrealistic, assumption.

bluegatty10d ago

it's basically a tiny statement that kind of hand waves all the 'actual stuff'.

BigJono10d ago

It's "I did the first/easy 90% now someone else do the second/hard 90%". Same as it ever was.

1 more reply

eithed10d ago

https://www.danielzarick.com/uploads/2018-05-draw-the-owl.jp...

anonzzzies10d ago· 3 in thread

Been working on my pet project today with Fable; it seems pretty solid but not too far removed from 4.8: same hallucinating, same type of bugs, same focus in large projects on just doing what you ask and ignoring whatever that may touch/break/influence. It runs tests in the beginning, but once the context fills up it just says 'will run later' and never does it in the end unless you tell it to (using some assorted swear words). I will keep using it, but it's incremental as far as I'm seeing, not the OMG OMG OMG Mythos is here!

LogicFailsMe10d ago

It clearly saw things immediately that 4.8 had missed on my projects. But shortly thereafter, having step functioned past those issues and impressed the crap out of me doing so, it got stuck in the usual endless loop talking about stuff more than doing stuff, occasionally deciding to pause so I'd have to whack it to get it going again.

So nope, not the AGI. But definitely an improvement.

justinclift10d ago

> it got stuck in the usual endless loop talking about stuff more than doing stuff

That's the kind of behaviour I've seen in Claude Code (Opus 4.8) when its context space is over the 40-50% range.

I tend to keep an eye on the context usage (ie `/context`) quite a lot, and generally see good results as long as the context usage is ~30% or below.

Which isn't heaps, considering having to ensure it has the required docs/stuff it needs can take 15-20% of context by itself.

matheusmoreira10d ago

I'm having the opposite experience. Fable seems to anticipate everything and do it all without asking. It's been very impressive and great to work with.

Not exactly strange behavior, Opus acted just like this too when I first subscribed. The popular meme is Anthropic nerfed Opus during their capacity crunch. No idea if it's true, but I do wonder if Fable will fall victim to the same fate.

mohsen111d ago· 3 in thread

I have been using it for less than an hour so take this with a grain of salt of being excited for the new tech.

In a project like mine (https://github.com/tsz-org/tsz) I am constantly frustrated that models were not doing enough research and were not taking into account other situations. Again and again models would produce code that would fix one thing and break 2 other tests that were "unrelated".

With Fable it seems like tasks are taking much longer (I have not seen a pull request from Fable sessions yet) but reading the transcription of those sessions I can see how it is doing the right thing by not leaving any stone unturned.

As the article says, it's hard to communicate this "feeling" about models because it is very project specific, but I thought I'd share.

anematode11d ago

Does this not indicate that the project might not be structured in an appropriate way that allows incrementally adding features?

layer811d ago

In general, sooner or later you need to restructure one thing or another when requirements are changing. Good code lets you reason about a refactoring, and experience tells you when it is necessary or appropriate. Coding agents aren’t very good at the latter.

mohsen111d ago

the setup is solid. there are thousands of tests and CI won't let things merge if tests are failing.

But overall, this is pretty normal for compilers to have this sort of "unexpected" tests failing due to some work in an area. It happened to me when I was coding everything manually back in the day too

1 more reply

nstart10d ago· 3 in thread

Desperate to know what the prompt for the poem is. The idea of it felt familiar so I went down the rabbit hole and found: 14 years ago, a poem on reddit [https://www.reddit.com/r/RedditDayOf/comments/tjjw2/may_12_a...] . Nowhere near the length of the one the author shared but the same idea.

> This is from "The Cyberiad", a collection of science-fiction fairy tales by Polish author Stanislaw Lem ... In one of the stories, a robot constructor named Trurl creates a machine that writes poetry. A jealous rival named Klapaucius challenges the machine to compose "...a poem about a haircut! But lofty, noble, tragic, timeless, full of love, treachery, retribution, quiet heroism in the face of certain doom! Six lines, cleverly rhymed, and every word beginning with the letter s!!"

And the computer responds with:

"Seduced, shaggy Samson snored.

She scissored short. Sorely shorn,

Soon shackled slave, Samson sighed.

Silently scheming,

Sightlessly seeking

Some savage, spectacular suicide"

The author had to be referencing this moment in their challenge to Fable/Mythos. I'm curious to know what their exact prompt was.

Erwin10d ago

What's fascinating is that this is the difficulty of English translation -- which uses a different start letter and different words than the Polish one:

  Cyprian cyberotoman, cynik, ceniąc czule
  Czarnej córy cesarskiej cud ciemnego ciała,
  Ciągle cytrą czarował. Czerwieniała cała,
  Cicha, co-dzień czekała, cierpiała, czuwała...
  ... Cyprian ciotkę całuje, cisnąwszy czarnulę!!

You can consider the job of a translator as compared to LLM. Both derivative works, working within some constraints but with room for creativity.

philipwhiuk10d ago

> the author had to be referencing this moment in their challenge to Fable/Mythos.

Or it just swept it up in the training data, given that Anthropic licenses Reddit comments.

nstart3d ago

Right. But this is why I want to know the prompt. My hunch is that the author knew this story. But likely prompted Fable without hinting at it. And if so, the fact that Fable defaulted to the story of Samson shows that while it can impressively extend that over so many "scrolls", it also could only generate the idea based on what it had gobbled up. I'm thinking of this because given to another human, I doubt they'd only ever go for Samson's story by default.

recursivedoubts11d ago· 2 in thread

would it be possible for mythos to make the space bar scroll the pages on your website properly?

mulr00ney11d ago

Seems to be hijacked by the video of some game they generated. :(

albedoa11d ago

If you delete the video from the DOM, then click back into the content area, it reattaches the video lol.

mawadev11d ago· 2 in thread

Isn't it weird that we started to gauge the quality of a model by checking the vibe of the vibe coding?

geraneum10d ago

You can see this all over the place. Under the Fable post in HN, you have simonw talking about the “feel” of working with Fable and how much better it is. If I believed in conspiracies, I’d have said it’s all orchestrated marketing…

orangecat10d ago

And yet when there are objective results like "we ported Bun to Rust and 100% of the test suite passes" the response is "so what, the code is obviously crap and has tons of bugs that the tests don't cover".

Aperocky11d ago· 2 in thread

> This is a map that shows the distance you can travel in a given length of time, and the first one was created in 1881 showing travel times from London.

The first item on the article, the first thing it showed, was wrong though.

It is 100% faster to go from London to New York in 1881 than to Volgograd. Or any of the Russian hinterland colored green, or Turkey, or Egypt.

patcon11d ago

> faster to go from London to New York in 1881 than Volgograd

the map is for 2026, yeah?

Aperocky10d ago

yeah the original map was not for this purpose. Though I would say there are heavy assumptions made for 2026 too, namely that flights are available immediately upon demand.

younglunaman11d ago· 2 in thread

>What it feels like to work with Mythos

>Looks Inside

>So I did this with fable...

What?

warkdarrior11d ago

Fable is Mythos with extra guardrails, so the analysis holds.

Chu4eeno10d ago

Considering all the initial Mythos hype (before they released Fable) was for things that Mythos explicitly can't do, no, not really.

et-al11d ago· 2 in thread

[flagged]

astrange11d ago

It is not a sponsored article and he writes one of these every time a new model releases. Why would a professor at Wharton need to write sponsored Substack articles?

0x1ceb00da11d ago

"I don't care who the IRS sends I am not paying taxes!"

economistbob10d ago· 1 in thread

And therein is the problem most perfectly expressed. He prompted that all the data should be real and validated and then simply trusted that it was. That was for a data driven project. People will do that for countless things, even critical things.

an0malous10d ago

I wish I had learned earlier in life how much more I could BS things because no one was going to check.

thepasch11d ago· 1 in thread

What it feels like to work with Fable:

> Switched to Opus 4.8: Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback or learn more.

matheusmoreira11d ago

Same experience here. The parts of my project that actually could have benefited from Fable's code review got this instead.

wxw11d ago· 1 in thread

I am… underwhelmed by the artifacts in the post.

I don’t see why working longer is a pro. The results don’t seem much better than you’d get from putting Opus in a long loop.

warkdarrior11d ago

> The results don’t seem much better than you’d get from putting Opus in a long loop.

Care to share the results you got from Opus working on the same prompt? It should be easy to compare quality.

catigula11d ago· 1 in thread

>Ethan Mollick

Just an FYI this guy is an AI hype-beast. Some of his tweets are truly out there.

dogmayor11d ago

Huge fanboy for sure

382hi11d ago· 1 in thread

I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.

giancarlostoro11d ago

Would love to see samples of the kinds of prompts you use with both. I sometimes wonder if the specific wording is the secret sauce, I have very few issues with Opus / Claude, but when I try premier GPT models, I get weird output from what I've grown to expect with Claude.

steve197711d ago· 1 in thread

> it is indicative of AI solving a hard problem involving research, math, visual development, taste, judgement, complex coding, and more.

Is it a hard problem or is it just labor intensive?

warkdarrior11d ago

Depends on the skill of the person working on it.

the_doctah11d ago· 1 in thread

More Mythos Marketing.

boringg11d ago

The mythos of Mythos is marketing.

ecocentrik11d ago

Reading the first few paragraphs of what he calls "the most sophisticated academic social science paper I have yet seen from an AI" does not impress as much as I hoped.

"Posterior beliefs about market demand are purely reference-dependent: holding dollars raised constant, they track only performance relative to the founder’s self-chosen goal—jumping half a standard deviation at the threshold, responding steeply for the first ten points past it, and flattening thereafter"

Humans generally don't verbalize data this way. The summary document is also very fluffy.

2 more replies

shadytrees10d ago

The Balatro game that Fable spit out (Flipside) https://play-flipside.netlify.app/ is buggy but fun. Fable also fixed one of my personal pet peeves. Unlike Balatro, it comes with a calculator to preview the score!

xavierforge10d ago

No question the capability jump is real, but in my experience it correlates with shortcut-taking. Fable 5 (and Opus 4.8 before it) hallucinates more than any Claude model I’ve used. The most common failure mode is asking it to modify existing code and watching it skip reading the original file, reconstruct that section from imagination, and then apply edits on top of its own invention, even with full context provided.

Maybe my prompts are too vague, but it’s worth noting that every example in the post is a greenfield build, and vague prompting seems to hold up fine when there are no existing constraints to respect.

pu_pe11d ago

The isochrone maps are quite beautiful [1], and go beyond the scope and refinement of some earlier human attempts I could find [2][3][4].

[1] https://isochronic-passage-chart.netlify.app/

[2] https://mapitout.welcome-to-nl.nl/

[3] https://commutetimemap.com/

[4] https://andrewding.ca/flightisochrones/

Ameo10d ago

Most depressing thing I've read in weeks, and that's a high bar. Hooray to humanity for creating the thing which has destroyed all the value of being good at creating things.

mieubrisse10d ago

Having a good prompt-engineering skill is the highest-leverage thing IMO, so I burnt 2 Max 20x usage windows to help Fable help me refactor mine. With its partnership we:

- Went deep on "what types of guidance even are there? what does giving good guidance mean?"

- Sampled my existing Claude guidance (CLAUDE.md, skills, hooks, etc.) and broke their guidance into "atoms"

- Categorized them by clustering, the same way Big Five was generated

- Generated a new candidate

- Then used independent agents to compare it against my existing corpus assuming that the new one would be worse

Working with it felt like working with a supersmart entity capable of generating very plausible-sounding but not-necessarily-true statements. The outcome certainly felt like an alien artifact, like nothing I'd make myself.

Only time'll tell if it holds up, but it sure had some interesting ideas.

SupremumLimit10d ago

I took a brief look at the code for one of the projects (https://github.com/emollick/concord/) he breathlessly praises and says "a software engineer would iron out the remaining potential bugs that I could not find quickly". The code looks like an unmaintainable mess.

Other commenters have pointed out that his isochrone map contains a lot of nonsense as well.

So the most charitable interpretation here is that this is a case of Gell-Mann amnesia.

mjamesaustin11d ago

The snake game is legit very fun. Once I got the ability to pick up the apples and plant apple trees, I was sold.

fractorial10d ago

I’ve been falling back to Opus 4.6 since 4.7 and 4.8. I recently found success in using Opus 4.6 for cheap orchestration and reasoning while Opus 4.8 High/Max agents do the work.

I made serious progress towards repairing a proof for a conjecture that was published 10 days ago but kept running into a wall with one of the Lemmas.

I threw Fable 5 Max at it with the same subagent set up and in an hour it claimed to have disproved a core theorem of the paper.

The Lean construction looks correct, but I still need to verify it rigorously. This is certainly not something Opus 4.6 Max could do and it’s likely something Opus 4.8 Max could do with more delicate orchestration and time. However, the “one-shot” Fable 5 did give me pause.

Planktonne9d ago

The outputs are impressive because a machine made them, but they're not impressive in themselves.

Given a tool that is supposed to unlock creativity and excitement, he made a series of worse clones of things.

Again, technically impressive, but the world has never needed the ability to make Balatro but less polished and coherent. We already have Balatro.

I'd be more convinced if people made things that didn't already exist; show me that these tools enable something you actually want.

dmzxnico10d ago

Probably just a model that was trained on high-quality code bases and tuned to find security breaches and bugs by being "smart" enough to actually test the code by itself or manually go through the app / website. That feels easy for Fable, so Mythos is just a better version.

clhodapp9d ago

So, the author gave the model single sentence prompts like "Balatro, but for the game of coin flips" plus generic encouragement like "make it better" three times and ended up with netlify-hostable web games each time? That is hard for me to believe.

It's likely that at least some amount of additional context was provided to the model to enable it to reliably create the desired form factor. This introduces the caveat that the author probably views some amount of context as being trivial / beneath the level of mentioning. But then the question becomes where they draw the line.

jgilias10d ago

Cool. But.

Most of the “impressive” stuff is not “the model” but “the harness”. Spinning up the subagents and teams of lower models, letting them explore, do adversarial coding. It’s all in the harness. Granted, Mythos might be better at that orchestration, but it’s still the harness.

Second is the prompting. The author is an expert in what they’re doing and prompts the system in a way that yields useful results. I see too many people believing that if an expert can achieve those results in a domain they’re familiar with, then they, as non-experts, will be able to as well. And that’s a fallacy that Mythos doesn’t change.

vb-844811d ago

Nice, but I'm really curious about how many tokens have been used.

There is only one hint: 475k tokens in the screenshot when OP asked the model to fix some behaviour, but it would be fascinating to know the total token count.

kgeist10d ago

Judging by the benchmarks on Artificial Analysis, "a very real leap over every model" is 2-3 points over competitors (say, 62 for Fable 5 vs. 59 for ChatGPT 5.5 xhigh for coding).

12345hn678910d ago

The coin flip game does not work. I tossed 2 coins and it broke after that. You cannot progress forward.

Not a great start for "a generational leap in model effectiveness"

ElijahLynn10d ago

Loved the article!

And I'm excited to try it, but also have a fear that I will like it too much and then won't have access to it in 2 weeks... But maybe I will and maybe it will be worth it and I'll just pay a bunch of extra for it and it'll be great!

I think the article could be improved by actually sharing more feelings. I clicked on the article for feelings but I didn't see that many feelings described.

root_axis11d ago

I just can't stand this type of fawning language.

ComplexSystems11d ago

Who can afford to use this damn thing though? They're pricing everyone out of the market with stuff like this.

philipswood10d ago

> It also created a 10-page epic rhyming poem about a haircut where every word starts with the letter s

Wow

lominming10d ago

My main issue with many of these tests and reviews is that most of the results focus on testing the harness (in this case, likely Claude Code) rather than evaluating the model’s inherent performance.

zuzululu11d ago

> First, how good is Fable? In experiment after experiment I conducted, it outperformed basically every other public model I have used by a considerable margin.

What makes me excited is that GPT 5.6 (it's actually GPT 6) is going to be crazy

kleiba210d ago

> So I asked Fable to solve the problem, first generating a complex 19 page design document and then executing it.
> It worked for nine and a half hours.

And how much did that cost?

brockVond202110d ago

on the places I've checked, mostly Paris to places in Ireland or Britain, the times are off by an order of magnitude

looks nice but deeply flawed

classic LLM output

queenkjuul10d ago

not only is the site completely unusable on mobile ootb, but when i enable desktop mode on Android, my taps are detected in the wrong spot--clicking Chicago registers as Saskatoon.

At first i thought its routing was just completely botched.

The text overflow on the legend is pretty funny considering how well the other graphics turned out

(Edit: referring to the map app)

honeycrispy11d ago

Reading it, I can't help but feel he's being paid to write this. Or maybe he hopes to be paid. The language he uses makes him sound like he's fawning over the lost days of his childhood. Pardon me for being skeptical, but a trillion dollar company running a net-loss is hoping to IPO, and needs to sway public opinion by any means necessary. I would imagine that no dirty marketing scheme is off of the table, even from the self-proclaimed "good guys".

philipwhiuk10d ago

Given that token counts are easily available, not providing how much any of his examples cost is lunacy.

zb311d ago

Was the condition of being granted early access to this castrated model writing a post praising it?

ElijahLynn10d ago

> The work has shifted from process to outcome. I no longer steer; I commission.

PaulHoule11d ago

My wife likes to say "feelings aren't facts"

ThejaCH11d ago

What it feels like to work with Mythos? Feels like am poor

LogicFailsMe11d ago

I'm using Fable this afternoon and it's definitely a step up from Opus 4.8, finding and fixing things Opus 4.8 was blind to even perceiving. The next 13 days are going to be fun IMO. And Opus 4.8 was less annoying than Opus 4.7 FWIW.

Edit: A couple hours in and I just got my first gaslighting attempt from the model. Good times!

nickphx10d ago

oh look, more overhyped drivel from a non-technical person.


eithed11d ago· 45 in thread

an0malous11d ago

The only decent software engineering perspective I’ve seen has been from Mitchell Hashimoto.

jimbokun11d ago

Well that’s kind of the point.

They can just summon bespoke software out of the ether that only handles the use cases of themselves and a few of their collaborators.

Making “side projects” was not possible for non-developers before powerful LLMs. Now it is.

4 more replies

neilv10d ago

Relevant quote:

> I am sure it is not perfect (I only spent an hour working with the results), but a software engineer would iron out the remaining potential bugs that I could not find quickly [...]

People have said things like this many times in the past, and, in the past (perhaps not now), it's always been a misunderstanding of what is good and bad, what's difficult and easy.

What might be different now is that some of these AI tools are outputting better-engineered work than some software engineers, and much faster.

cgearhart11d ago

The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

qaq11d ago

2 more replies

rpdillon10d ago

2 more replies

dchftcs10d ago

If there's a viable way to make all projects low-stakes we'd have done it. Consider this: microservices.

acedTrex11d ago

> The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

this doesn't really work in the real world. There are many things that actually matter, engineering is fundamentally about handling them.

skywhopper10d ago

But not all projects can be low stakes. None of the important ones are.

spicyusername10d ago

    the quality of produced code and the medium

A thought I have been tossing around in my head as the models get better is that it really may not matter what the code looks like.

It simply does not matter what the code looks like if, from the outside, it does what it's supposed to, and, from the inside, a model can fix the issue if one is found.

More than ever, software engineering is now really a job about making sure the code is doing what it's supposed to.

And even if it DOES matter what the code looks like, you can have a model fix that too.

skydhash10d ago

The thing is that a lot of code relies on multiple layers of abstractions with their own correctness and failure states. And then you overlay the domain correctness and failure cases on top of that.

As Hoare said:

  There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
  The first method is far more difficult.

LLMs are very much the second kind. You write a lot of complicated code, and then you can no longer reason about its correctness.

eithed10d ago

coldtea11d ago

>What I find fascinating that there is so little substance in this article about the quality of produced code and the medium.

starshadowx210d ago

It sounds like you either didn't play enough or you are missing the new mechanics that get added over time. There's definitely more to it than just regular snake.

vunderba11d ago

I had the exact same thought. To me, it feels like they just took the fairly common “sentient video game character” trope and bolted it onto a very conventional snake game.

I will say, the way the act of eating creates a "bulge distortion" that flows down the length of the snake is a nice touch though.

kesor10d ago

You didn't play long enough. There are layers and layers and layers of features in that game if you play for 10 minutes or more.

soraminazuki10d ago

skydhash10d ago

viking12310d ago

Yeah, never concrete examples from these guys.

For brainstorming I find the models bad nowadays or maybe I am just too critical of the results

hypfer11d ago

Being the first to release an article gives you great SEO or whatever. Doing the things you've mentioned takes time.

jstummbillig11d ago

Less fascinating when you consider that this is a non-coder's perspective.

CobrastanJorji10d ago

eithed11d ago

Fair enough, but entrepreneurship should, I guess, ask whether a given Next Big Thing has substance behind it or is just snake oil.

unholiness11d ago

Yeah, this made it basically clickbait for me, in terms of time I wasted with the wrong expectation.

The lack of downvotes on posts on HN has always felt like more of a bug than a feature to me.

nomel11d ago

So, the perspective of the one that gains the most, that will value this the most, and that will pay the most? ;)

andai10d ago

These days it's uneconomical for humans to verify AI generated code. So we ask the AI to do it. Like when we asked the FBI to audit itself and they found no problems :)

chickensong10d ago

geraneum10d ago

> You probably don't care about the ingredients or engineering of asphalt

eithed10d ago

But yes, you are right - I don't build roads, I don't know what it costs to build one or how to judge the quality of a correctly built one, nor will I ever care or learn.

fwip10d ago

queenkjuul10d ago

Tylerian10d ago

The ingredients and composition of the tarmac is the difference between having the road full of pot holes after a week of use

jknoepfler10d ago

I get that there's little sense in arguing with the MBA hivemind, but... c'mon.

Nothing in these articles resonates as real. People who work in reality don't agree. I don't understand why this shit keeps getting attention (or rather I do, but the reasons aren't good).

markoloko10d ago

jimbokun11d ago

Does it matter to the people requesting the software if it acts in the way they expect?

crystal_revenge10d ago

In fact, that's the entire reason we care about "quality code", because we assume that quality code is code that does what you expect well and consistently.

eithed10d ago

sexylinux10d ago

So AI is only interesting to you / your org / humans if it can do things that you cannot achieve. But if it still makes errors, how could we ever know that a super-invention by AI is not wrong?

fisf10d ago

By that measure, most software developers should be unemployed.

danlugo9210d ago

Also this is easily solved by .md spec files, this whole "bad code" cope is just FUD.

Anamon5d ago

grafporno11d ago

It's an ad.

otabdeveloper410d ago

Don't harsh my vibes, man.

adamtaylor_1311d ago

I'm becoming more convinced these are questions of the Before Times. Yes, yes—heresy, I know.

Yet, I can't deny the reality that I observe working with LLMs every day. If this truly is a step-function (as some are suggesting), then I have absolutely zero concern for the quality of the code.

fwip10d ago

Kind of a circular argument, isn't it? "Some people are saying it's very good at coding. If that's true, I don't care if the code is good."

gopalv11d ago· 11 in thread

> It worked for nine and a half hours.

> Again, it wasn’t perfect. As an expert, I was able to spot some errors and omissions (some as a result of the design I had asked for) that I had the AI correct

That's the bit that stuck out to me - that's longer than I would expect to work on a problem in a day or even expect to go back & fix the output of something that has a core reward loop of hours.

My customers are currently clamoring to push down my agent response times from 85 seconds down to below the 20s mark.

At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

matneyx11d ago

In Claude's defense (and I cannot believe I'm defending it), I know no single dev who could create what it did (Concord), from a 19-page design document, in 9.5 working hours.

We're gonna go back to the days where our bosses ask why we're just sitting around, but instead of saying "compiling," we'll just say, "waiting for Claude."

petesergeant11d ago

Sadly I didn't get very many answers to my Ask HN, "What are you doing during inference?": https://news.ycombinator.com/item?id=47944917

giancarlostoro11d ago

Will Claude's code be perfect in one shot? Probably not. Will it get you 80 to 90% of the way there with your chosen design patterns in under a few hours? Absolutely.

torginus10d ago

neogodless11d ago

For the rare uninitiated:

https://xkcd.com/303/

giancarlostoro11d ago

> At the same time, it is very dissonant to see the industry heading towards hour+ long workflows with an agent.

At this point, pay me significantly more, and I'll do it.

warkdarrior11d ago

> pay me significantly more

Ha ha, that's how you negotiate yourself out of a job!

PeterStuer11d ago

My Opus 4.8 regularly works for 10+ minutes on a single non-trivial coding request.

ASalazarMX11d ago

Your Opus 4.8? Is it now usual to refer to LLMs like that?

hedgehog11d ago

cyanydeez11d ago

I'm amazed we're so far into SOTA bloat that the Chinese will kill once they start etching silicon with these models.

theturtletalks11d ago· 10 in thread

This is what he built:

https://isochronic-passage-chart.netlify.app/

Doesn’t work too well on mobile but looks interesting

jampa11d ago

It is hallucinating many flights in my region, some that never existed (so it is not an outdated data problem).

I also see some logic flaws. It overlooks the option of going to a major hub to access faster aircraft, rather than hopping on local hubs.

Also, immigration and customs are cleared at the first airport you arrive at in the country, not at the last one.

In some countries, you need to clear immigration even while going to a third country, so 1 hour is not enough to do it.
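The hub-routing point can be illustrated with a small shortest-path sketch. This is only an assumption about how such a travel-time map could be computed, not the logic of the linked app: the network, the airport names, and the hour values below are all made up, and `earliest_arrival` is a hypothetical helper. It shows that a plain Dijkstra search over travel times picks the major-hub route when that route is faster than chaining local hops.

```python
import heapq


def earliest_arrival(graph, source):
    """Dijkstra over travel times in hours; graph[u] is a list of (v, hours)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbor, hours in graph.get(node, []):
            candidate = elapsed + hours
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


# Hypothetical network: three slow local hops versus one detour via a major hub.
network = {
    "origin": [("local_a", 2.0), ("major_hub", 3.0)],
    "local_a": [("local_b", 4.0)],
    "local_b": [("destination", 4.0)],
    "major_hub": [("destination", 5.0)],
}

times = earliest_arrival(network, "origin")

# An isochrone is then just the set of places reachable within a time budget.
within_eight_hours = {place for place, hours in times.items() if hours <= 8.0}
```

Fixed overheads such as immigration at the first airport of entry could be modeled as extra hours on the edge into that airport, but that is left out here to keep the sketch short.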

skipants11d ago

neom11d ago

rgmerk10d ago

It put the chart title directly on top of Australia.

justinclift10d ago

> it's amazing what they can do, but left to their own devices they'll make boneheaded decisions.

Yeah, the whole "can run for 9 hours on a task" to me is not a positive.

I tend to find if Opus 4.8 runs for ~15 mins on a task, then the end result has gone off in a weird direction at some point, and it needs winding back a fair bit.

And that's with extremely clear direction, literal specification docs to follow, etc.

alt22710d ago

I believe that's why they put 'Sydney' as an option at the top to recenter the map.

The real issue with the title is that it doesn't fit in the box!

ImaCake10d ago

aix110d ago

It's like someone took a beautiful, intricate piece of vintage jewellery and made a slapdash imitation out of cheap plastic.

KeplerBoy10d ago

It is cool, but still weird that it gets very basic stuff wrong like mapping the cursor coordinate to the canvas. There's some y-axis scaling issue.
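For reference, here is a minimal sketch of the usual cursor-to-canvas mapping, written as plain arithmetic. It is one common source of this kind of bug, not a diagnosis of the linked app. In a browser the inputs would typically come from `getBoundingClientRect()` and the canvas `width`/`height` attributes; the function name and the numbers here are made up for illustration. The key detail is that each axis needs its own scale factor whenever the displayed size differs from the canvas's internal resolution.

```python
def client_to_canvas(client_x, client_y, rect, canvas_width, canvas_height):
    """Map a pointer position in page pixels to canvas-internal pixels.

    rect is (left, top, shown_width, shown_height): where the canvas is drawn
    on the page and how large it appears there.
    """
    left, top, shown_width, shown_height = rect
    scale_x = canvas_width / shown_width    # horizontal stretch factor
    scale_y = canvas_height / shown_height  # vertical stretch factor, often forgotten
    return ((client_x - left) * scale_x, (client_y - top) * scale_y)


# Hypothetical case: an 800x600 canvas displayed as 400x150 at page offset (10, 20).
point = client_to_canvas(210, 95, (10, 20, 400, 150), 800, 600)
```

Reusing `scale_x` for both axes, or skipping the scale entirely, would give the right x coordinate here and the wrong y coordinate.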

endymion-light10d ago

I'm not very threatened by this if this is the dangerous Mythos model - it just seems like a slightly incrementally better Sonnet

selfawareMammal11d ago· 8 in thread

jenniferhooley11d ago

Fable 5 is def a little better than 4.8 btw.

mervz11d ago

We see the same thing when new laptops are announced and every employee all of a sudden needs to upgrade, despite the fact that 90% of people would be able to make do with a Macbook Neo.

Our_Benefactors11d ago

> despite the fact that 90% of people would be able to make do with a Macbook Neo.

Giving your engineering employees new machines in a 2-year cycle that are between the middle and high end is one of the cheapest ROI decisions that a tech org can make.

ianm21811d ago

A small portion of this effort is having a high quality Lua in Rust repo. I’m using Mythos to fix some of the performance issues with my Lua interpreter that GPT 5.5 / Opus 4.8 had stonewalled on.

Not sure if Mythos will be able to crack this but it has been running for a couple hours now with some promising results.

Performance charts linked here if you're curious https://github.com/ianm199/lua-rs

mplanchard10d ago

What’s wrong with mlua?

matheusmoreira10d ago

jstummbillig10d ago

I am sure you would not find it hard to exhaust any model, if you kept upping your ask enough times.

The above ask is obviously not useful, but what's preventing you from taking on bigger chunks until you approach the limit? At some point you would hit a boundary, where the diff will be obvious.

mohsen111d ago

JumpCrisscross11d ago· 6 in thread

It also burned through my usage quota like a late-90s Hummer.

matheusmoreira11d ago

> It also burned through my usage quota like a late-90s Hummer.

I feel like I'm going to get priced out of these models soon. I should probably try to get the most out of Fable until June 22nd.

cyanydeez11d ago

Now for the best question: what's your ROI here?

Ferret744611d ago

Humans are very expensive, so the equation almost always falls against them.

It's not just salary, but also safety/labor regulation, legal risk, vacations, sick time, personal conflicts, HR, benefits.

Even when automation is more expensive on paper, it's generally still cheaper

crystal_revenge10d ago

The parent comment is describing a test they ran so they could assess their trust in the model for scenarios they don't have time to fully understand.

Do you not believe in running tests, evaluations, or experiments at all to better understand your environment?

Qhemlomo11d ago

It just got released, it shouldn't matter.

We know this model will be cheaper and faster with time.

And we have not even reached the timeframe where we have ASIC style models.

PunchyHamster11d ago

It will be great when the price of compute/memory drops to normal level!

neaden11d ago· 5 in thread

Man, that poem it made is terrible. Like just incredibly bad. Sure it's neat that software can make an incredibly bad poem but there is enough bad poetry in the world that we don't need it.

Kiro11d ago

How good can a rhyming poem about a haircut where every word starts with S be?

electroweak10d ago

The poem Kandel translated from the original Polish was, for artistic reasons, completely different. I will be impressed when machine translation can duplicate that!

endymion-light10d ago

Seduced, shaggy Samson snored.

She scissored short. Sorely shorn,

Soon shackled slave, Samson sighed.

Silently scheming,

Sightlessly seeking

Some savage, spectacular suicide.

- That's the translated Cyberiad poem the blog post based it off of (or the AI decided to do so)

layer811d ago

I wonder what Vogons would think of it.

throw31082210d ago

asdK12011d ago· 4 in thread

Mollick runs the Generative AI Lab at Wharton, with all the corporate sponsors.

He is a professor but sadly also an AI shill. He should switch to advertising washing powder.

MostlyStable11d ago

So...no engagement with the substance? Not even to explain why it is that this is not a useful description or test of capabilities? Ok.

dthread311d ago

I would like to see it do something useful, like converting pytorch to golang.

CuriouslyC11d ago

Ethan is a booster but I wouldn't call him a shill. He cites data and mostly in a fair way, though you could argue the sources he chooses to focus on are biased.

whyenot11d ago

Instead of attacking the author, please respond to the content of the article. That is the HN way, and it leads to more substantive and interesting discussions.

michaelteter10d ago· 4 in thread

As a software engineer and solution provider, I do not feel threatened by this.

Mythos may do an excellent job providing a high quality result based on what is asked of it. But the result will only be as good as the quality, clarity, and presentation of the request.

redhale10d ago

Sure, AI can auto-complete the line, but it can't write full functions.

Sure, AI can write functions, but it can't complete full features.

Sure, AI can write full features, but it can't build full applications.

Sure, AI can write full applications, but it can't build them in the right way / ask the right questions / write beautiful maintainable code / do what _I_ do..

Time will tell.

ChrisLTD10d ago

Even the early versions of AI autocomplete tools like Tabnine and the original Copilot could autocomplete entire functions, so I think you might be strawmanning a bit.

zelphirkalt10d ago

michaelteter9d ago

You are exactly right. Whether AI is better than a human or not is irrelevant if the bad, unqualified corporate leaders are making rash silver bullet decisions that cost workers jobs.

I don't see any solution. This is like a disease that will either eventually kill the body or take a long time to heal, leaving deep scars and forever changing humanity.

Maybe War Games was right - the only way to win is not to play. Therefore, find something you love (even if it doesn't pay well), and do that.

olafmol11d ago· 3 in thread

This little line from the article scares me: "but a software engineer would iron out the remaining potential bugs that I could not find quickly"

Every sw dev knows this is a very dangerous, and unrealistic, assumption.

bluegatty10d ago

it's basically a tiny statement that kind of hand waves all the 'actual stuff'.

BigJono10d ago

It's "I did the first/easy 90% now someone else do the second/hard 90%". Same as it ever was.

eithed10d ago

https://www.danielzarick.com/uploads/2018-05-draw-the-owl.jp...

anonzzzies10d ago· 3 in thread

LogicFailsMe10d ago

So nope, not the AGI. But definitely an improvement.

justinclift10d ago

> it got stuck in the usual endless loop talking about stuff more than doing stuff

That's the kind of behaviour I've seen in Claude Code (Opus 4.8) when its context space is over the 40-50% range.

I tend to keep an eye on the context usage (ie `/context`) quite a lot, and generally see good results as long as the context usage is ~30% or below.

Which isn't heaps, considering having to ensure it has the required docs/stuff it needs can take 15-20% of context by itself.
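The arithmetic behind that can be written out as a rough sketch. The percentages are the ones quoted above; the 200,000-token window size is purely an assumed figure for illustration, and `working_headroom` is a hypothetical helper, not part of any tool.

```python
def working_headroom(window_tokens, threshold_pct, docs_pct):
    """Tokens left for actual work before crossing the 'good results' threshold."""
    remaining_pct = max(threshold_pct - docs_pct, 0)
    return window_tokens * remaining_pct // 100


ASSUMED_WINDOW = 200_000  # hypothetical context window size in tokens

# Docs take 15-20% of context and results stay good up to roughly 30%.
best_case = working_headroom(ASSUMED_WINDOW, threshold_pct=30, docs_pct=15)
worst_case = working_headroom(ASSUMED_WINDOW, threshold_pct=30, docs_pct=20)
```

Under those assumptions only 10-15% of the window is left for the task itself before the quality threshold is reached.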

matheusmoreira10d ago

I'm having the opposite experience. Fable seems to anticipate everything and do it all without asking. It's been very impressive and great to work with.

mohsen111d ago· 3 in thread

I have been using it for less than an hour so take this with a grain of salt of being excited for the new tech.

As the article says, it's hard to communicate this "feeling" about models because it is very project specific but I thought I'd share

anematode11d ago

Does this not indicate that the project might not be structured in an appropriate way that allows incrementally adding features?

layer811d ago

mohsen111d ago

the setup is solid. there are thousands of tests and CI won't let things merge if tests are failing.

nstart10d ago· 3 in thread

And the computer responds with:

"Seduced, shaggy Samson snored.

She scissored short. Sorely shorn,

Soon shackled slave, Samson sighed.

Silently scheming,

Sightlessly seeking

Some savage, spectacular suicide"

The author had to be referencing this moment in their challenge to Fable/Mythos. I'm curious to know what their exact prompt was.

Erwin10d ago

What's fascinating is that this is the difficulty of English translation -- which uses a different start letter and different words than the Polish one:

  Cyprian cyberotoman, cynik, ceniąc czule
  Czarnej córy cesarskiej cud ciemnego ciała,
  Ciągle cytrą czarował. Czerwieniała cała,
  Cicha, co-dzień czekała, cierpiała, czuwała...
  ... Cyprian ciotkę całuje, cisnąwszy czarnulę!!

You can consider the job of a translator as compared to LLM. Both derivative works, working within some constraints but with room for creativity.

philipwhiuk10d ago

> the author had to be referencing this moment in their challenge to Fable/Mythos.

Or it just swept it up in the training data given Anthropic licenses Reddit comments.

nstart3d ago

recursivedoubts11d ago· 2 in thread

would it be possible for mythos to make the space bar scroll the pages on your website properly?

mulr00ney11d ago

Seems to be hijacked by the video of some game they generated. :(

albedoa11d ago

If you delete the video from the DOM, then click back into the content area, it reattaches the video lol.

mawadev11d ago· 2 in thread

Isn't it weird that we started to gauge the quality of a model by checking the vibe of the vibe coding?

geraneum10d ago

orangecat10d ago

Aperocky11d ago· 2 in thread

> This is a map that shows the distance you can travel in a given length of time, and the first one was created in 1881 showing travel times from London.

The first item on the article, the first thing it showed, was wrong though.

It is 100% faster to go from London to New York in 1881 than Volgograd. Or any of the Russian hinterland colored green or Turkey or Egypt.

patcon11d ago

> faster to go from London to New York in 1881 than Volgograd

the map is for 2026, yeah?

Aperocky10d ago

yeah the original map was not for this purpose. Though I would say there are heavy assumptions made for 2026 too, namely that flights are available immediately upon demand.

younglunaman11d ago· 2 in thread

>What it feels like to work with Mythos
>Looks Inside
>So I did this with fable...

What?

warkdarrior11d ago

Fable is Mythos with extra guardrails, so the analysis holds.

Chu4eeno10d ago

Considering all the initial Mythos hype (before they released Fable) was for things that Mythos explicitly can't do, no, not really.

et-al11d ago· 2 in thread

[flagged]

astrange11d ago

It is not a sponsored article and he writes one of these every time a new model releases. Why would a professor at Wharton need to write sponsored Substack articles?

0x1ceb00da11d ago

"I don't care who the IRS sends I am not paying taxes!"

economistbob10d ago· 1 in thread

an0malous10d ago

I wish I had learned earlier in life how much more I could BS things because no one was going to check

thepasch11d ago· 1 in thread

What it feels like to work with Fable:

matheusmoreira11d ago

Same experience here. The parts of my project that actually could have benefited from Fable's code review got this instead.

wxw11d ago· 1 in thread

I am… underwhelmed by the artifacts in the post.

I don’t see why working longer is a pro. The results don’t seem much better than you’d get from putting Opus in a long loop.

warkdarrior11d ago

> The results don’t seem much better than you’d get from putting Opus in a long loop.

Care to share the results you got from Opus working on the same prompt? It should be easy to compare quality.

catigula11d ago· 1 in thread

>Ethan Mollick

Just an FYI this guy is an AI hype-beast. Some of his tweets are truly out there.

dogmayor11d ago

Huge fanboy for sure

382hi11d ago· 1 in thread

I think Qwen 3.7-Plus is better at reasoning than Mythos, and I've used both for quite a while.

giancarlostoro11d ago

steve197711d ago· 1 in thread

> it is indicative of AI solving a hard problem involving research, math, visual development, taste, judgement, complex coding, and more.

Is it a hard problem or is it just labor intensive?

warkdarrior11d ago

Depends on the skill of the person working on it.

the_doctah11d ago· 1 in thread

More Mythos Marketing.

boringg11d ago

The mythos of Mythos is marketing.

ecocentrik11d ago

Reading the first few paragraphs of what he calls "the most sophisticated academic social science paper I have yet seen from an AI" does not impress as much as I hoped.

Humans generally don't verbalize data this way. The summary document is also very fluffy.

shadytrees10d ago

xavierforge10d ago

pu_pe11d ago

The isochrone maps are quite beautiful [1], and go beyond the scope and refinement of some earlier human attempts I could find [2][3][4].

[1] https://isochronic-passage-chart.netlify.app/

[2] https://mapitout.welcome-to-nl.nl/

[3] https://commutetimemap.com/

[4] https://andrewding.ca/flightisochrones/

Ameo10d ago

Most depressing thing I've read in weeks, and that's a high bar. Hooray to humanity for creating the thing which has destroyed all the value of being good at creating things.

mieubrisse10d ago

Having a good prompt-engineering skill is the highest-leverage thing IMO, so I burnt 2 Max 20x usage windows to help Fable help me refactor mine. With its partnership we:

- Went deep on "what types of guidance even are there? what does giving good guidance mean?"

- Sampled my existing Claude guidance (CLAUDE.md, skills, hooks, etc.) and broke their guidance into "atoms"

- Categorized them by clustering, the same way Big Five was generated

- Generated a new candidate

- Then used independent agents to compare it against my existing corpus assuming that the new one would be worse

Only time'll tell if it holds up, but it sure had some interesting ideas.

SupremumLimit10d ago

Other commenters have pointed out that his isochrone map contains a lot of nonsense as well.

So the most charitable interpretation here is that this is a case of Gell-Mann amnesia.

mjamesaustin11d ago

The snake game is legit very fun. Once I got the ability to pick up the apples and plant apple trees, I was sold.

fractorial10d ago

I’ve been falling back to Opus 4.6 since 4.7 and 4.8. I recently found success in using Opus 4.6 for cheap orchestration and reasoning while Opus 4.8 High/Max agents do the work.

I made serious progress towards repairing a proof for a conjecture that was published 10 days ago but kept running into a wall with one of the Lemmas.

I threw Fable 5 Max at it with the same subagent set up and in an hour it claimed to have disproved a core theorem of the paper.

Planktonne9d ago

The outputs are impressive because a machine made them, but they're not impressive in themselves.

Given a tool that is supposed to unlock creativity and excitement, he made a series of worse clones of things.

Again, technically impressive, but the world has never needed the ability to make Balatro but less polished and coherent. We already have Balatro.

I'd be more convinced if people made things that didn't already exist; show me that these tools enable something you actually want.

dmzxnico10d ago

clhodapp9d ago

jgilias10d ago

Cool. But.

vb-844811d ago

Nice, but I'm really curious about how many tokens have been used.

There is only one hint: 475k tokens in the screenshot when OP asked the model to fix some behaviour, but it would be fascinating to know the total token count.

kgeist10d ago

Judging by the benchmarks on Artificial Analysis, "a very real leap over every model" is 2-3 points over competitors (say, 62 for Fable 5 vs. 59 for ChatGPT 5.5 xhigh for coding).

12345hn678910d ago

The coin flip game does not work. I tossed 2 coins and it broke after that. You cannot progress forward.

Not a great start for "a generational leap in model effectiveness"

ElijahLynn10d ago

Loved the article!

I think the article could be improved by actually sharing more feelings. I clicked on the article for feelings but I didn't see that many feelings described.

root_axis11d ago

I just can't stand this type of fawning language.

ComplexSystems11d ago

Who can afford to use this damn thing though? They're pricing everyone out of the market with stuff like this.

philipswood10d ago

> It also created a 10-page epic rhyming poem about a haircut where every word starts with the letter s

Wow

lominming10d ago

zuzululu11d ago

> First, how good is Fable? In experiment after experiment I conducted, it outperformed basically every other public model I have used by a considerable margin.

What makes me excited is that GPT 5.6 (it's actually GPT 6) is going to be crazy

kleiba210d ago

> So I asked Fable to solve the problem, first generating a complex 19 page design document and then executing it.
> It worked for nine and a half hours.

And how much did that cost?

brockVond202110d ago

on the places I've checked, mostly Paris to places in Ireland or Britain, the times are off by an order of magnitude

looks nice but deeply flawed

classic LLM output

queenkjuul10d ago

not only is the site completely unusable on mobile ootb, but when i enable desktop mode on Android, my taps are detected in the wrong spot--clicking Chicago registers as Saskatoon.

At first i thought its routing was just completely botched.

The text overflow on the legend is pretty funny considering how well the other graphics turned out

(Edit: referring to the map app)

honeycrispy11d ago

philipwhiuk10d ago

Given that token counts are easily available, not providing how much any of his examples cost is lunacy.

zb311d ago

Was the condition of being granted early access to this castrated model writing a post praising it?

ElijahLynn10d ago

> The work has shifted from process to outcome. I no longer steer; I commission.

PaulHoule11d ago

My wife likes to say "feelings aren't facts"

ThejaCH11d ago

What it feels like to work with Mythos? Feels like am poor

LogicFailsMe11d ago

Edit: A couple hours in and I just got my first gaslighting attempt from the model. Good times!

nickphx10d ago

oh look, more overhyped drivel from a non-technical person.
