What harness are you using? Some of them (e.g. OpenCode) mutate the system prompt every turn, and therefore can't work with a KV cache.
I've had the best luck with Pi so far, but it comes without some bells and whistles you might be used to (e.g. plan mode, subagents, MCP client support)