With how much vendor harnesses are now actively steering the agent with their own instructions on top of user prompts, I think it’d be super interesting to see a comparison of one of the already tested models - so Opus 4.7 or GPT-5.5 - across a range of different harnesses that aren’t their native. OpenCode, Pi, Hermes, Kilo Code. The most popular coding-focused harnesses, basically.