GoFetch: New side-channel attack using data memory-dependent prefetchers

(gofetch.fail)

297 points · kingsleyopara · 2y ago · 92 comments

60 comments · 12 top-level

jerf · 2y ago · 19 in thread

As long as we're getting efficiency cores and such, maybe we need some "crypto cores" added to modern architectures, that make promises specifically related to constant time algorithms like this and promise not to prefetch, branch predict, etc. Sort of like the Itanium, but confined to a "crypto processor". Given how many features these things wouldn't have, they wouldn't be much silicon for the cores themselves, in principle.

This is the sort of thing that would metaphorically drive me to drink if I were implementing crypto code. It's an uphill battle at the best of times, but even if I finally get it all right, there's dozens of processor features both current and future ready to blow my code up at any time.

FiloSottile · 2y ago

Speaking as a cryptography implementer, yes, these drive us up the wall.

However, crypto coprocessors would be a tremendously disruptive solution: we'd need to build mountains of scaffolding to allow switching to and from these cores, and to share memory with them, etc.

Even more critically, you can't just move the RSA multiplication to those cores and call it a day. The key is probably parsed from somewhere, right? Does the parser need to run on a crypto core? What if it comes over the network? And if you even manage to protect all the keys, what if a CPU side channel leaks the message you encrypted? Are you ok with it just because it's not a key? The only reason we don't see these attacks against non-crypto code is that finding targets is very application specific, while in crypto libraries everyone can agree leaking a key is bad.

No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.

saagarjha · 2y ago

Processor designers are very unlikely to do that for you, because everyone not working on constant time crypto gives them a whole lot of money to keep doing this. The best you might get is a mode where the set of assumptions they violate is reduced.

olliej · 2y ago

> No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.

No, you don't get to say processor designers need to stop violating your assumptions. You need to stop making assumptions about behaviour if that behaviour is important (for cryptographic or other reasons). Your assumptions being faulty is not a valid justification, because that would mean no one could have ever added any caches or predictors at any point because that would be "violating your assumptions". Also, let's be real here: even if "not violating your assumptions" were a reasonable position to take, it is not reasonable in any way to assume that modern (<30 years old) processors don't cache, predict, buffer, or speculate anything.

If you care about constant time behaviour you should either be writing your code such that it is timing agnostic, or you could read the platform documentation rather than making assumptions. The Apple documentation tells you how to actually get constant time behaviour, rather than making assumptions.
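For context, "timing agnostic" (constant-time) code usually means no branches and no memory indices that depend on secret data. A minimal Python sketch of the two classic building blocks, a branchless select and a comparison that always scans every byte. It assumes 64-bit words and public input lengths, and it is illustrative only: Python itself makes no timing guarantees, so production code does this in C, Go, or assembly.

```python
MASK64 = (1 << 64) - 1  # work with 64-bit unsigned words


def ct_select(bit: int, a: int, b: int) -> int:
    """Return a if bit == 1 else b, without branching on bit.

    bit must be 0 or 1; a and b are 64-bit unsigned words.
    """
    mask = (-bit) & MASK64                 # all zeros or all ones
    return (a & mask) | (b & ~mask & MASK64)


def ct_equal(x: bytes, y: bytes) -> bool:
    """Compare two byte strings, scanning every byte (no early exit)."""
    if len(x) != len(y):                   # lengths are assumed public
        return False
    diff = 0
    for bx, by in zip(x, y):
        diff |= bx ^ by                    # accumulate any difference
    return diff == 0
```

In real Python code, the standard library's `hmac.compare_digest` provides this kind of comparison.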

FiloSottile · 2y ago

> you should either be writing your code such that it is timing agnostic, or you could read the platform documentation rather than making assumptions

Have you even read the paper? Especially the part where the attack applies to everyone’s previous idea of “timing agnostic” code, and the part where Apple does not respect the (new) DIT flag on M1/M2?
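Why constant-time code is not enough here can be shown with a toy model. A data memory-dependent prefetcher looks at the *values* being loaded, not only the addresses, and fetches whatever a pointer-looking value points to. The Python sketch below assumes a made-up mapped page, a 64-byte cache line, and a plain range check as the "looks like a pointer" test. All of these are simplifications; the real Apple heuristic is more involved and is described in the paper. The same branch-free operation triggers a prefetch for one secret and none for another.

```python
LINE = 64                          # assumed cache-line size in bytes
MASK64 = (1 << 64) - 1

# Hypothetical mapped page; a plain range check stands in for the real
# (more involved) "does this value look like a pointer?" heuristic.
PAGE_LO = 0x1_8000_0000
PAGE_HI = PAGE_LO + 0x1000


def dmp_prefetches(loaded_words: list[int]) -> set[int]:
    """Toy DMP: the cache lines it would fetch after seeing these values."""
    lines = set()
    for w in loaded_words:
        if PAGE_LO <= w < PAGE_HI:         # value looks like a pointer
            lines.add(w & ~(LINE - 1))     # fetch the line it points to
    return lines


def ct_mix(secret: int, attacker_input: int) -> int:
    """Branch-free, index-free step: 'constant time' in the classic sense."""
    return (secret ^ attacker_input) & MASK64


target = PAGE_LO + 0x40                    # line the attacker can probe
secret_a = 0x5A5A_0000_0000_1234
secret_b = 0x0123_4567_89AB_CDEF
chosen = target ^ secret_a                 # input crafted for a guess of secret_a

print(sorted(hex(x) for x in dmp_prefetches([ct_mix(secret_a, chosen)])))  # ['0x180000040']
print(sorted(hex(x) for x in dmp_prefetches([ct_mix(secret_b, chosen)])))  # []
```

Both calls execute exactly the same operations; only the data differs, yet an attacker probing the `target` cache line could tell the two cases apart.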


heresie-dabord · 2y ago

> processor designers "just" need to stop violating assumptions

"Security" rarely (almost never) seems to be part of any commercially-significant spec.

Almost as if by design...

eximius · 2y ago

Wouldn't that "just" allow someone to see if a key was present (and any information that informs) but dramatically help prevent secret key extraction?

bee_rider · 2y ago

I don’t think the security community is also going to become experts in chip design, these are two full skill sets that are already very difficult to obtain.

We must stop running untrustworthy code on modern full-performance chips.

The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.

I mean we’re talking about prefetching. If there were a way to give RAM cache-like latencies, why wouldn’t the hardware folks have already done it?

titzer · 2y ago

I almost gave you an upvote until your third paragraph, but I have to now give a hard disagree. We're running more untrusted code than ever, and we absolutely should trust it less than ever and have hardware and software designed with security in mind. Security should be priority #1 from here on out. We are absolutely awash in performance and memory capacity but keep getting surprised by bad security outcomes because it's been second fiddle for too long.

Software is now critical infrastructure in modern society, akin to the power grid and telephone lines. It's a strategic vulnerability to neglect security, and security must happen at all levels of the software and hardware stack. Meaning: an adversary could try to crash an enemy's entire society by bricking all of its computers, sending it back to the dark ages in milliseconds. I fundamentally don't understand the mindset of people who want to take that kind of risk for a 10% boost in their games' FPS[1].

Part of that is paying back the debt that decades of cutting corners has yielded us.

In reality, the vast majority of the 1000x increase in performance and memory capacity over the past four decades has come from shrinking transistors and increasing clockspeeds and memory density--the 1 or 5 or 10% gains from turning off bounds checks or prefetching aren't the lion's share. And for the record, turning off bounds checks is monumentally stupid, and people should be jailed for it.

[1] I'm exaggerating to make a point here. What we trade for a little desktop or server performance is an enormous, pervasive risk. Not just melting down in a cyberwar, but the constant barrage of intrusion and leaks that costs the economy billions upon billions of dollars per year. We're paying for security, just at the wrong end.


FiloSottile · 2y ago

> download random programs onto their computers and run them at random

To be clear, that includes what we're all doing by downloading and running JavaScript to read HN.

Maybe I can say "don't run adversarial code on my same CPU" and only care about over-the-network CPU side-channels (of which there are still some), because I write Go crypto, but it doesn't sound like something my colleagues writing browser code can do.


tadfisher · 2y ago

> The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.

Note that in the vast majority of cases, crypto-related code isn't what we spend compute cycles on. If there was a straightforward, cross-architecture mechanism to say, "run this code on a single physical core with no branch prediction, no shared caches, and using in-order execution" then the real-world performance impact would be minimal, but the security benefits would be huge.


sargun · 2y ago

I think what's more likely is "mode switching" in which you can disable these components of the CPU for a certain section of executing code (the abstraction would probably be at the thread level).

Kluggy · 2y ago

Isn't that the entire point of the secure enclave[1]?

https://support.apple.com/guide/security/secure-enclave-sec5...

lxgr · 2y ago

The secure enclave is not a general-purpose/user-programmable processor. It only runs Apple-signed code, and access is only exposed via the Keychain APIs, which only support a very limited set of cryptographic operations.

Presumably latency for any operation is also many orders of magnitude higher than in-thread crypto, so that just doesn't work for many applications.

john_alan · 2y ago

If you look at the CryptoKit API docs, the Secure Enclave essentially only supports P-256. Which is maybe why they didn’t include ECC crypto in the examples.

Joel_Mckay · 2y ago

Encrypted-bus MMUs have existed since the 1990s.

However, the trend to consumer-grade hardware for cost-optimized cloud architecture ate the CPU market.

Thus, the only real choice now is consumer CPUs even in scaled applications.

gabrielhidasy · 2y ago

Many modern architectures have crypto extensions, usually to accelerate a few common algorithms. Maybe it would be good to add a few crypto-primitive instructions to allow new algorithms?

a-dub · 2y ago

See the DIT and DOIT flags referenced in the paper and in the FAQ question about mitigations. Newer CPUs apparently provide a way to do just that.

bee_rider · 2y ago

One option would be for people to stop downloading viruses and then running them.

SpaghettiCthulu · 2y ago

Except when these vulnerabilities are exploitable from JavaScript in your web browser.

martinky24 · 2y ago · 12 in thread

Why does every attack need its own branding, marketing page, etc...? Genuine question.

modeless · 2y ago

Science isn't just about discovering information. Dissemination is critical. Communicating ideas is just as important as discovering them and promotion is part of effective communication. It's natural and healthy for researchers to promote their ideas.

FiloSottile · 2y ago

Names are critical to enable discussion.

The "marketing" page is where the documentation is. Summaries that don't require reading a whole academic paper are a good thing, and they are the place where all the different links are collected. Same reason software has READMEs.

Logos... are cute and take 10-60 minutes? If you spend months on some research might as well take the satisfaction of giving it a cute logo, why not.


sapiogram · 2y ago

Well, names are useful for the same reason people's names are useful. The rest just kinda happens naturally, I think.

yborg · 2y ago

Yes, it saves time vs. starting a discussion on "that crypto cache sidechannel attack that one team in China found".

martinky24 · 2y ago

Name makes enough sense. "Branding, marketing page, etc..." was my question.

"Happens naturally" isn't really an answer.

ziddoap · 2y ago

Is your position that any write-up about an attack must be plain text only, and must not use its own URL?

I truly cannot understand why this is brought up so often. You aren't paying for it, it doesn't hurt you in any way, it detracts nothing from the findings (in fact, it makes the findings easier to discuss), etc. There is no downside I can think of.

Can you share what the downsides of a picture of a puppy and a $5 domain are? Sorry, "branding" and "marketing page"?

Or at least, maybe you can share what you think would be a more preferable way?

12_throw_away · 2y ago

Dunno, but I'm glad they do it. In other fields of research, researchers often purposely hold off on naming something, so that the community kind of has no choice but to name it after the authors themselves.

E.g., in my field, they would have called Spectre "the Horn-Genkin-Hamburg vulnerability" or something. Which one of these is hard-to-remember jargon, and which one is catchy and evocative?

fruktmix · 2y ago

It's science these days. They need funding, and one way to get it is to get people to recognize the importance of their work.

xena · 2y ago

So people talk about it

saagarjha · 2y ago

Why do the comments of every such attack need a question about why it has its own branding, marketing page, etc…? Genuine question.

(Seriously, this comes up every time, just do a search for it if you actually want to figure out why.)

0xedd · 2y ago

Because it makes it feel like you need some marketing department if you want to publish your work. Rather than give _only_ the work merit, we give too much merit to its colorful presentation. That shouldn't be the case.

howinteresting · 2y ago

Good communication has always been a part of making sure your work is influential.

theobservor · 2y ago · 8 in thread

The end result of these side channel attacks would be to have CPUs that perform no optimizations at all and all opcodes would run in the same number of cycles in all situations. But that will never happen. No one wants a slow CPU.

As long as these effects cannot be exploited remotely, it's not a concern. Of course multi-tenant cloud-based virtualization would be a no go.

bee_rider · 2y ago

We need to drop all the untrusted code onto some horrible in-order, no-speculative-execution, no-prefetching, 5-stage-pipeline core from an architecture 101 class.

graemep · 2y ago

It might be preferable.

We have ridiculously fast hardware. In many use cases (client machines in particular) we do not usually really need that. I would gladly drop features for security.

kbolino · 2y ago

If you account for all of the CPU "features" that can be exploited, you're looking at probably 80% of what makes it "ridiculously fast". If you also account for all of the ways in which the entire modern hardware ecosystem can be exploited, you're probably looking at gross performance loss of over 90% to remove these "holes".

An overclocked 486 PC that can only run a single program at a time and isn't continuously connected to a network might be very secure, but replacing every modern computer with something like it will not be even remotely feasible. In most situations, it would be better to have some risk tolerance, and couple modern hardware with mitigations, disposability, and supply-chain security instead.

bee_rider · 2y ago

It will also be good because users will become more annoyed when people try to sneak full programs into their websites, hopefully resulting in a generally less bloated internet.

wmf · 2y ago

If untrusted code includes JavaScript that would make Web apps ridiculously slow. (I know what you're thinking...)

bee_rider · 2y ago

Oh no, a totally unexpected side effect, less complex webpages.

lenerdenator · 2y ago

> multi-tenant cloud-based virtualization

And that's why I'm not as worried about this as I was about the same vulnerability in Intel chips a few years ago.

There are a few cloud service providers that will rent you clock cycles on a rack-mounted Mac Mini, but not many, and even then they're for highly-specific workloads or build tasks. I suppose that's a problem for people paying far out the butt for that kind of service, but the vast majority of Apple Silicon devices are never, ever going to host cloud services.

_factor · 2y ago

This is why high core counts and isolation matter. Isolate the code to a specific core. Assuming everything is working as intended, an exploit won’t compromise other tenants.

saagarjha · 2y ago · 3 in thread

> Can the DMP be disabled?

> Yes, but only on some processors. We observe that the DIT bit set on m3 CPUs effectively disables the DMP. This is not the case for the m1 and m2.

Surely there is a chicken bit somewhere to do this?

john_alan · 2y ago

I’ve often wondered how these bits are set.

Like can you do it from Swift? Or need assembly?

saagarjha · 2y ago

It's probably in an MSR accessible from the kernel?

lxgr · 2y ago

It seems to be userspace accessible: https://developer.apple.com/documentation/xcode/writing-arm6...

The kernel would have to be aware of it in order to be able to restore its state across context switches though, unless it's part of a set of registers that is automatically persisted. But given that Apple is publicly documenting this flag, I suppose it is.

Here's an interesting conversation by the Go developers from as early as 2021 being suspicious of DIT: https://github.com/golang/go/issues/49702


0xedd · 2y ago · 3 in thread

Why does Apple have so many hardware backd... innocent bugs?

olliej · 2y ago

why do we even need caches?

why do we need prefetchers?

But in answer to your bullshit backdoor conspiracy theory (JFC, processors have caches and timing variation because people want fast CPUs; you cannot have constant time and fast, and Apple is not the only company with prefetchers), here's some Apple-provided documentation on how to disable the hardware backd... enable constant time operations specifically for the purpose of cryptography, almost like it's designed into the hardware. So weird. https://developer.apple.com/documentation/xcode/writing-arm6...

howinteresting · 2y ago

The M1 and M2 don't have that bit.

m0rshu · 2y ago

Same reason Intel and AMD had Meltdown and Spectre.

woadwarrior01 · 2y ago · 2 in thread

Reminded me of the Augury attack[1] from 2022, which also exploits the DMP prefetcher on Apple Silicon CPUs.

[1]: https://www.prefetchers.info

Findecanor · 2y ago

BTW, three of the authors of GoFetch were also behind Augury.

loeg · 2y ago

Yes, they specifically mention that in the article and FAQ.

xiconfjs · 2y ago · 1 in thread

From the paper: "OpenSSL reported that local side-channel attacks (...) fall outside of their threat model. The Go Crypto team considers this attack to be low severity".

lxgr · 2y ago

At least one Go crypto developer publicly expressed concern about this very issue in 2021: https://github.com/golang/go/issues/49702

john_alan · 2y ago

On reading it seems a lib like libsodium can simply set the disable bit prior to cryptographic operations that are sensitive on M3 and above.

Also looks like they need to predetermine aspects of the key.

Very cool but I don’t think it looks particularly practical.
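Setting the bit before sensitive operations is usually written as a save/set/restore scope, so that nested callers and error paths leave the thread's state as they found it. A minimal Python sketch of that discipline only: `ToyCPU` and `dit_enabled` are hypothetical names, not a real API. On real hardware the DIT bit is written with an MSR instruction on ARMv8.4+ cores (see the Apple documentation linked in the thread), which Python cannot do directly. Per the GoFetch FAQ quoted in the thread, setting DIT only disables the DMP on M3, not on M1/M2.

```python
import contextlib
from typing import Iterator


class ToyCPU:
    """Hypothetical per-thread CPU state; NOT a real API."""

    def __init__(self, has_feat_dit: bool) -> None:
        self.has_feat_dit = has_feat_dit   # does the core implement DIT at all?
        self.dit = 0                       # models the DIT bit (0 = off)


@contextlib.contextmanager
def dit_enabled(cpu: ToyCPU) -> Iterator[bool]:
    """Set DIT for the enclosed block, then restore the previous value.

    Yields True if DIT was actually set, False if the core lacks the feature.
    """
    if not cpu.has_feat_dit:
        yield False                        # caller decides how to proceed
        return
    previous = cpu.dit                     # save, so nested scopes compose
    cpu.dit = 1
    try:
        yield True
    finally:
        cpu.dit = previous                 # restore even if the block raises


cpu = ToyCPU(has_feat_dit=True)
with dit_enabled(cpu) as active:
    with dit_enabled(cpu):                 # nested use keeps DIT on
        pass
    assert active and cpu.dit == 1         # still set after the inner scope exits
assert cpu.dit == 0                        # restored on exit
```

Restoring the saved value, rather than unconditionally clearing the bit, is what lets a library call this from inside a caller that already enabled DIT.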

olliej · 2y ago

If you're writing cryptographic routines you should either use the platform cryptography libraries, or follow the documentation:

https://developer.apple.com/documentation/xcode/writing-arm6...

slowmovintarget · 2y ago

So malware scanning and virus scanners just became relevant for Macs and iPads.

(Compromise must be running on the same hardware.)

Shtirlic · 2y ago

Is it naive to ask whether implementing this mitigation would impact performance and memory interaction speed?

d-z-m · 2y ago

What's the attack vector here? Access to an encrypt oracle and co-location on the target machine?
