undefined | Better HN

0 pointssamwho6mo ago0 comments

The only thing that comes to mind is some kind of timing attack. Send loads of requests specific to a company you’re trying to spy on and if it comes back cached you know someone has sent that prompt recently. Expensive attack, though, with a large search space.

0 comments

9 comments · 2 top-level

gwern6mo ago· 4 in thread

No, the search space is tiny: you can just attack 1 BPE at a time! Stuff like password guessing is almost trivial when you get to do a timing attack on each successive character. So that lets you quickly exfiltrate arbitrary numbers of prompts, especially if you have any idea what you are looking for. (Note that a lot of prompts are already public information, or you can already exfiltrate prompts quite easily from services and start attacking from there...)

reitzensteinm6mo ago

Hill climbing a password would only be possible if intermediate KV cache entries were stored. To hillclimb "hunter2", you're going to try "a", "b", "c", etc, until you notice that "h" comes back faster. Then you try "ha", "hb" and so on.

But that's only going to work if the cache looks like: "h", "hu", "hun", ..., "hunter2"

If just "hunter2" is in the cache, you won't get any signal until you stumble on exactly that password. And that's before getting into the block size granularity of the caches discussed elsewhere in this thread.

That's not to say timing attacks aren't possible. I haven't looked at Claude Code's prompt generation, but there's no intrinsic reason why you couldn't do things like figure out what open source code and research papers your competitors are loading into context.

Sharing caches between orgs would be an incredible misstep.

jgeralnik6mo ago

Right, you can’t actually guess a letter (byte) at a time but you can guess a token at a time (I believe the vocabulary is 200000 possible tokens in gpt 5) So you could send each of the 200000 possible tokens, see which is cached, and then send 200000 more tokens to find the next cached token Certainly less efficient but well within the realm of a feasible attack

1 more reply

IanCal6mo ago

Do any providers do this level of granularity? Anthropic require explicit cache markers, for example.

jgeralnik6mo ago

Anthropic requires explicit cache markers but will “look backwards” some amount, so you don’t need to fall on the exact split to get cached tokens

gunalx6mo ago· 3 in thread

I habe come across turning on caching means the llm has a faint memory of what was in the cache, even to unrelated queries. If this is the case its fully unreasonable to share the cache, because of possibility of information leakage.

weird-eye-issue6mo ago

This is absolutely 100% incorrect.

samwhoOP6mo ago

How would information leak, though? There’s no difference in the probability distribution the model outputs when caching vs not caching.

sroussey6mo ago

the probability distribution the model outputs is identical under identical conditions.

A local model running alone on your machine will 100% always return the exact same thing and the internal state will be exactly the same and you can checkpoint or cache that to avoid rerunning to that point.

But… conditions can be different, and batching requests tends to affect other items in flight. I believe Thinking Machines had an article about how to make a request deterministic again without performance going to complete crap.

I tend to think of things this way (completely not what happens though): what if you were to cache based on a tensor as the key? To generate a reasonably sized key what is an acceptable loss of precision to retrieve the same cache knowing that there is inherent jitter in the numbers of the tensor?

And then the ever so slight leak of information. But also multiplied since there are internal kv caches for tokens and blah blah blah.

j / k navigate · click thread line to collapse

0 comments

9 comments · 2 top-level

gwern6mo ago· 4 in thread

reitzensteinm6mo ago

But that's only going to work if the cache looks like: "h", "hu", "hun", ..., "hunter2"

Sharing caches between orgs would be an incredible misstep.

jgeralnik6mo ago

1 more reply

IanCal6mo ago

Do any providers do this level of granularity? Anthropic require explicit cache markers, for example.

jgeralnik6mo ago

Anthropic requires explicit cache markers but will “look backwards” some amount, so you don’t need to fall on the exact split to get cached tokens

gunalx6mo ago· 3 in thread

weird-eye-issue6mo ago

This is absolutely 100% incorrect.

samwhoOP6mo ago

How would information leak, though? There’s no difference in the probability distribution the model outputs when caching vs not caching.

sroussey6mo ago

the probability distribution the model outputs is identical under identical conditions.

And then the ever so slight leak of information. But also multiplied since there are internal kv caches for tokens and blah blah blah.

j / k navigate · click thread line to collapse