reviews, Codex builds

(github.com)

104 points · DanMcInerney · 14d ago · 41 comments

34 comments · 17 top-level

felixgallo · 14d ago · 4 in thread

Fable will do this itself, by spawning Opus/Sonnet subagents to do easy work.

RazerWazer · 14d ago

GPT 5.5 xhigh is better than Opus and Sonnet.

timcobb · 14d ago

Not in my subjective experience sadly

sosodev · 14d ago

I don’t know why you’re getting downvoted. It’s true. Averaged across a wide variety of benchmarks Fable is the only Anthropic model that performs better than GPT 5.5 xhigh.

apsurd · 14d ago

/advisor has been a really good experience for me, especially since I only have a Pro plan.

I exclusively use Sonnet, and advisor is basically “hey Opus, chime in on my approach”. It's been working great as far as I can tell.

colechristensen · 14d ago · 4 in thread

Last night I switched back to Codex for a minute having burned through my tokens for the week with Fable and oh boy I had a terrible experience. Running in circles over simple problems (which I ended up solving myself, like a peasant) and running "terraform apply" several times despite several instructions all over the place to never do that. The performance difference was stark.

malshe · 14d ago

I had a similar experience. So far Fable has been a game changer, at least for the work I used it for. Having said that, I think its writing is definitely worse than GPT 5.5. Ethan Mollick also observed the same. He called it more "Claudy." It generates worse academic prose than other frontier models.

colechristensen · 14d ago

I think the Claude Code harness made up a significant part of the improvements co-released with Fable. The nested agent capabilities seem to be much better even with Opus (which I guess we're stuck with for a while).

nsingh2 · 14d ago

Could you provide some details, if possible, like what model & thinking effort, what kinds of tasks? I used to swap between Claude Code and Codex often, and these days use Codex more because of the usage limits. Wondering if I should go to Claude for a month, I get a strange FOMO when I read vague comments like this.

The one major difference I noticed is that the GPT models are more analytical (e.g. better at mathematical analysis and code review), whereas Claude models tend to write more straightforward code. Besides that I don't really see any significant differences.

There are a few gotchas with swapping, like being careful with AGENTS.md/CLAUDE.md naming (Claude Code only recognizes CLAUDE.md, and I think Codex only works with AGENTS.md), and updating skill files to match the tool.

colechristensen · 14d ago

I just symlink AGENTS.md and CLAUDE.md
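For anyone who wants that as a repeatable step, here is a minimal sketch, not anything from the linked project. It assumes AGENTS.md is the real file and CLAUDE.md becomes the link; the function name `link_agent_docs` is made up for illustration.

```python
import tempfile
from pathlib import Path


def link_agent_docs(repo: Path, source: str = "AGENTS.md", alias: str = "CLAUDE.md") -> Path:
    """Make `alias` a relative symlink to `source` inside `repo` (hypothetical helper)."""
    src = repo / source
    dst = repo / alias
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")
    if dst.is_symlink():
        dst.unlink()  # replace a stale link from an earlier run
    elif dst.exists():
        # Never clobber a real file; the two documents may have diverged.
        raise FileExistsError(f"{dst} is a regular file; merge it by hand first")
    # A relative target keeps the link valid when the repo is moved or cloned.
    dst.symlink_to(source)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "AGENTS.md").write_text("Never run terraform apply.\n")
        link = link_agent_docs(repo)
        print(link.name, "->", link.read_text().strip())
```

Refusing to overwrite a regular CLAUDE.md is a deliberate choice here: a silent replace would throw away whatever instructions had only been written to that file.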

I was using gpt-5.5 high. Writing terraform code for GCP, debugging app launch and Dockerfile issues, that sort of thing. It was going in loops hallucinating features of GCP, looking things up in strange ways, running terraform apply after being explicitly told in the last interaction not to, and overall not solving problems. These were very straightforward tasks and it couldn't be trusted for five minutes. It's the difference in what I would trust an early senior engineer to do vs what I would trust an unreliable high school intern to do.

mpalmer · 14d ago · 3 in thread

Reduce Fable tokens by 80%, simply by not using it!

> I am fairly convinced this is the shape serious agent work keeps converging toward.

"this" being "plan with expensive model, implement with cheap model".

Anyone who follows HN would be hard-pressed to disagree; this architecture is re-invented twice monthly.

https://www.facebook.com/groups/vibecodinglife/posts/1946207... https://github.com/openai/codex/discussions/10628 https://build5nines.com/stop-burning-premium-requests-how-to...

> Not because it is aesthetically pleasing. Because every other shape eventually runs into the same boring failures: context rot, self-grading, goalpost drift, and merge chaos.

Actual failure isn't boring. But struggling through a generated software project that celebrates its own genius and doesn't have a single self-critical or genuinely reflective thing to say...at least watching paint dry I might get giddy off the fumes.

I'm not interested in critiquing the project itself, either, you'll just run that through a model, too.

seaal · 14d ago

>https://www.facebook.com/groups/vibecodinglife/posts/1946207...

Wow, linking a Facebook groups post might actually be worse than X. Is there an xcancel alternative for Facebook?

DanMcInerney (OP) · 14d ago

I don't disagree with any of this. It is generated software, and it's not a novel idea. I didn't mean for it to come off like that. It's just scratching an itch that I couldn't find a solution to, and I'm getting a lot of personal utility out of it. I do have a lot of experience with agentic memory, multi-agent systems and harnesses, and I wasn't super impressed by the workflow of Fable calling Opus subagents, so I figured I'd apply best practices to what already exists to make it a teensy bit better and easier to use.

mpalmer · 13d ago

Cheers. Absent explanation, I do think it's reasonable to assume that you stood by the wording/claims of the README when you posted it, but I appreciate the patch you made to the docs.

FWIW, re: best practices, your install script potentially runs `rm -rf` on the user's global skills whose names shadow your project's.

Denvercoder9 · 14d ago · 2 in thread

DESIGN.md:

> Each rule below is enforced mechanically by the skill, not left to vibes.

> R1. Repo docs are the memory; not in HANDOFF.md = didn't happen

SKILL.md:

> Not in docs/HANDOFF.md = didn't happen. Refuse to judge results that exist only in conversation or builder chat output.

"Mechanical enforcement" just means "prompting the LLM a bit extra" these days? It (still) amazes me how much effort and how many tokens we expend on what could and should be a two-line script...
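To make the "script instead of prompt" point concrete, here is a rough sketch of what a mechanical check for R1 could look like. It is an assumption-heavy illustration, not the project's code: the idea that each result carries a searchable identifier (`marker`) and both function names are hypothetical; only the `docs/HANDOFF.md` path comes from the quoted SKILL.md.

```python
from pathlib import Path


def recorded_in_handoff(repo: Path, marker: str) -> bool:
    """Return True only if `marker` literally appears in docs/HANDOFF.md."""
    handoff = repo / "docs" / "HANDOFF.md"
    return handoff.is_file() and marker in handoff.read_text(encoding="utf-8")


def require_recorded(repo: Path, marker: str) -> None:
    """Refuse to proceed with a result that exists only in chat output."""
    if not recorded_in_handoff(repo, marker):
        raise LookupError(f"{marker!r} is not in docs/HANDOFF.md, so it didn't happen")
```

A harness would call `require_recorded` before handing a result to the reviewing model, so the rule holds whether or not the model remembers its instructions.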

everforward · 14d ago

Agents are in a wacky state, which makes projects like this fall into a weird spot. E.g., I vaguely expect my agent to do two disparate things: manage dependency injection for tools, prompt modifications, etc., but also be the sort of “brain trust” that controls the flow of execution (can we stop now, do we keep going, etc.).

This project is meant to be the latter, but there’s not a clean way to integrate that into Claude Code or Codex because they expect to do both.

Pi can do it, but then your users can’t use their Claude subscriptions, so you have to kludgily try to do the same thing via LLM prompts.

nostrebored · 14d ago

But why does your agent control doneness? It seems to me the most odd part to delegate. All LLMs are terrible at it. Most LLM tasks can be expressed as a DAG or DAG of DAGs. Why delegate that to a random point in context instead of enforcing the flow?
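As a sketch of what "enforcing the flow" might mean in practice: the step names and the `run_pipeline` helper below are invented for illustration, and each step is a plain callable standing in for an agent call. Doneness is decided by the graph being exhausted, not by a model.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    steps: dict[str, Callable[[], bool]],
    deps: dict[str, set[str]],
) -> list[str]:
    """Run every step in dependency order; stop at the first failure.

    `deps` maps a step name to the set of steps that must finish before it.
    """
    completed: list[str] = []
    for name in TopologicalSorter(deps).static_order():
        if not steps[name]():
            raise RuntimeError(f"step {name!r} failed; downstream steps not run")
        completed.append(name)
    return completed  # "done" means every node in the graph has run


if __name__ == "__main__":
    # Hypothetical plan/build/test/review chain; real steps would call agents.
    steps = {name: (lambda: True) for name in ("plan", "build", "test", "review")}
    deps = {"build": {"plan"}, "test": {"build"}, "review": {"test"}}
    print(run_pipeline(steps, deps))  # ['plan', 'build', 'test', 'review']
```

A "DAG of DAGs" falls out of the same shape: a step's callable can itself call `run_pipeline` on a nested graph.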

Retr0id · 14d ago · 2 in thread

> freezes the gates

LLM-written readmes love to use inscrutable jargon that means nothing outside of the context window that birthed it.

nostrebored · 14d ago

LLMs are obsessed with “gates”. Freezing the gates here is intuitive to me at this point: don’t let validation drift.

Retr0id · 14d ago

"drift" is another one!

avaer · 14d ago · 1 in thread

Reducing token usage is this year's "one weird trick". It doesn't make sense on the face of it.

Even if one discovered something that millions (billions?) of dollars of AI compute and the best statisticians in the world were not able to find via exhaustive research, domain search and training... what do you think are the chances this won't be folded into the next update of every model, making the rigmarole moot?

Extraordinary claims require extraordinary evidence, and technology-shattering innovations in AI are not known to come from a Markdown file.

apsurd · 14d ago

incentives aren’t aligned

analogpixel · 14d ago · 1 in thread

I know how to reduce Fable tokens by 100%: https://www.anthropic.com/news/fable-mythos-access

testfrequency · 14d ago

I ran this and seem to have good results with a 100% reduction also: curl -fsSL https://chatgpt.com/codex/install.sh | sh

Teknomadix · 14d ago

US Govt reduces Fable Tokens by 100%.

rockwotj · 14d ago

I actually just started doing this by having Fable roleplay as Jeff Dean and use Codex as Sanjay to drive the implementation, with the two going back and forth. Works really well, and it’s cool to see AI pair program.

phpp · 11d ago

@DanMcInerney Thank you for sharing this! Using a larger model for planning and a cheaper, smaller model for execution is a smart way to save tokens and seems like the way to go in general.

I wanted to see what would happen if Claude delegated work to pi with a model like DeepSeek, so I forked your repo and tried it out. It's working really well so far. https://github.com/pcomans/architect-loop-pi

corvad · 14d ago

Who's gonna tell them...

cohix · 14d ago

I do exactly this with awman workflows: https://github.com/prettysmartdev/awman/blob/main/docs/05-wo...

You can use any agent and/or model for each step and share context between them.

diavelguru · 14d ago

Yes, I'm using Fable to inspect the code and generate the plan and architectural docs, then Gemini to implement, then Fable again to review and find bugs. Saving lots of usage.

hmokiguess · 14d ago

I guess that didn’t age well

aetherspawn · 14d ago

Fool me once. Fool me twice. Fool me thirty three times and here we are trying lucky number 34.

DanMcInerney (OP) · 14d ago

ANNNNNND it's gone. Guys, I found a way to reduce Fable token usage 100%. You can find it here: github.com/USGov/idiotic-overreach.

Uptrenda · 14d ago

Reduce Fable token usage even more by not using it. What a clever idea, OP! Wow.
