But that's only going to work if the cache looks like: "h", "hu", "hun", ..., "hunter2"
If just "hunter2" is in the cache, you won't get any signal until you stumble on exactly that password. And that's before getting into the block size granularity of the caches discussed elsewhere in this thread.
That's not to say timing attacks aren't possible. I haven't looked at Claude Code's prompt generation, but there's no intrinsic reason why you couldn't do things like figure out what open source code and research papers your competitors are loading into context.
Sharing caches between orgs would be an incredible misstep.
A local model running alone on your machine will 100% always return the exact same thing and the internal state will be exactly the same and you can checkpoint or cache that to avoid rerunning to that point.
But… conditions can be different, and batching requests tends to affect other items in flight. I believe Thinking Machines had an article about how to make a request deterministic again without performance going to complete crap.
I tend to think of things this way (completely not what happens though): what if you were to cache based on a tensor as the key? To generate a reasonably sized key what is an acceptable loss of precision to retrieve the same cache knowing that there is inherent jitter in the numbers of the tensor?
And then the ever so slight leak of information. But also multiplied since there are internal kv caches for tokens and blah blah blah.