This is the sort of thing that would metaphorically drive me to drink if I were implementing crypto code. It's an uphill battle at the best of times, but even if I finally get it all right, there are dozens of processor features, both current and future, ready to blow my code up at any time.
However, crypto coprocessors would be a tremendously disruptive solution: we'd need to build mountains of scaffolding to allow switching to and from these cores, and to share memory with them, etc.
Even more critically, you can't just move the RSA multiplication to those cores and call it a day. The key is probably parsed from somewhere, right? Does the parser need to run on a crypto core? What if it comes over the network? And even if you manage to protect all the keys, what if a CPU side channel leaks the message you encrypted? Are you ok with it just because it's not a key? The only reason we don't see these attacks against non-crypto code is that finding targets is very application-specific, while in crypto libraries everyone can agree leaking a key is bad.
No, processor designers "just" need to stop violating assumptions, or at least talk to us before doing it.
No, you don't get to say processor designers need to stop violating your assumptions. You need to stop making assumptions about behavior if that behavior is important (for cryptographic or other reasons). Your assumptions being faulty is not a valid justification, because by that logic no one could ever have added any caches or predictors at any point, since that would be "violating your assumptions". Also let's be real here: even if "not violating your assumptions" were a reasonable position to take, it is not reasonable in any way to assume that modern processors (<30 years old) are not caching, predicting, buffering, or speculating anything.
If you care about constant time behavior you should either write your code so that it is timing agnostic, or read the platform documentation rather than making assumptions. The Apple documentation tells you how to actually get constant time behavior.
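For anyone unfamiliar with what "timing agnostic" usually means in practice, here is a minimal Go sketch. It contrasts an early-exit comparison with the standard library's `crypto/subtle` helpers. The function names `leakyEqual`, `constantTimeEqual`, and `selectWord` are made up for illustration. This only addresses the classic "running time depends on the secret" channel; it says nothing about what a prefetcher does with the data.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// leakyEqual returns at the first mismatching byte, so its running time
// depends on how long the matching prefix is.
func leakyEqual(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// constantTimeEqual examines every byte no matter where the inputs differ.
// subtle.ConstantTimeCompare returns 1 for equal slices and 0 otherwise
// (it returns 0 right away when the lengths differ).
func constantTimeEqual(a, b []byte) bool {
	return subtle.ConstantTimeCompare(a, b) == 1
}

// selectWord returns x when choose is 1 and y when choose is 0,
// using subtle.ConstantTimeSelect instead of a secret-dependent branch.
func selectWord(choose, x, y int) int {
	return subtle.ConstantTimeSelect(choose, x, y)
}

func main() {
	tag := []byte("expected-mac-tag")
	guess := []byte("expected-mac-tah")
	fmt.Println(leakyEqual(tag, guess), constantTimeEqual(tag, guess)) // false false
	fmt.Println(selectWord(1, 7, 9), selectWord(0, 7, 9))             // 7 9
}
```

Both comparisons give the same answer; the difference is only in how long they take to give it.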
Have you even read the paper? Especially the part where the attack applies to everyone’s previous idea of “timing agnostic” code, and the part where Apple does not respect the (new) DIT flag on M1/M2?
"Security" rarely (almost never) seems to be part of any commercially-significant spec.
Almost as if by design...
We must stop running untrustworthy code on modern full-performance chips.
The feedback loop that powers everything is: faster chips allow better engineering and science, creating faster chips. We’re not inserting the security community into that loop and slowing things down just so people can download random programs onto their computers and run them at random. That’s just a stupid thing to do, there’s no way to make it safe, and there never will be.
I mean we’re talking about prefetching. If there were a way to give RAM cache-like latencies, why wouldn’t the hardware folks already have done it?
Software is now critical infrastructure in modern society, akin to the power grid and telephone lines. Neglecting security is a strategic vulnerability, and security must be addressed at all levels of the software and hardware stack. Meaning, someone trying to crash an enemy's entire society by bricking all of its computers and sending it back to the dark ages in milliseconds. I fundamentally don't understand the mindset of people who want to take that kind of risk for a 10% boost in their games' FPS[1].
Part of that is paying back the debt that decades of cutting corners has yielded us.
In reality, the vast majority of the 1000x increase in performance and memory capacity over the past four decades has come from shrinking transistors and increasing clock speeds and memory density; the 1 or 5 or 10% gains from turning off bounds checks or prefetching aren't the lion's share. And for the record, turning off bounds checks is monumentally stupid, and people should be jailed for it.
[1] I'm exaggerating to make a point here. What we trade for a little desktop or server performance is an enormous, pervasive risk. Not just melting down in a cyberwar, but the constant barrage of intrusion and leaks that costs the economy billions upon billions of dollars per year. We're paying for security, just at the wrong end.
To be clear, that includes what we're all doing by downloading and running JavaScript to read HN.
Maybe I can say "don't run adversarial code on my same CPU" and only care about over-the-network CPU side-channels (of which there are still some), because I write Go crypto, but it doesn't sound like something my colleagues writing browser code can do.
Note that in the vast majority of cases, crypto-related code isn't what we spend compute cycles on. If there was a straightforward, cross-architecture mechanism to say, "run this code on a single physical core with no branch prediction, no shared caches, and using in-order execution" then the real-world performance impact would be minimal, but the security benefits would be huge.
https://support.apple.com/guide/security/secure-enclave-sec5...
Presumably latency for any operation is also many orders of magnitude higher than in-thread crypto, so that just doesn't work for many applications.
However, the trend to consumer-grade hardware for cost-optimized cloud architecture ate the CPU market.
Thus, the only real choice now is consumer CPUs even in scaled applications.
The "marketing" page is where the documentation is. Summaries that don't require reading whole academic papers are a good thing, and they are the place where all the different links are collected. Same reason software has READMEs.
Logos... are cute and take 10-60 minutes? If you spend months on some research might as well take the satisfaction of giving it a cute logo, why not.
"Happens naturally" isn't really an answer.
I truly cannot understand why this is brought up so often. You aren't paying for it, it doesn't hurt you in any way, it detracts nothing from the findings (in fact, it makes the findings easier to discuss), etc. There is no downside I can think of.
Can you share what the downsides of a picture of a puppy and a $5 domain are? Sorry, "branding" and "marketing page"?
Or at least, maybe you can share what you think would be a preferable way?
E.g. in my field, they would have called Spectre "the Horn-Genkin-Hamburg vulnerability" or something. Which one of these is hard-to-remember jargon, and which one is catchy and evocative?
(Seriously, this comes up every time, just do a search for it if you actually want to figure out why.)
As long as these effects cannot be exploited remotely, it's not a concern. Of course multi-tenant cloud-based virtualization would be a no go.
We have ridiculously fast hardware. In many use cases (client machines in particular) we do not usually really need that. I would gladly drop features for security.
An overclocked 486 PC that can only run a single program at a time and isn't continuously connected to a network might be very secure, but replacing every modern computer with something like it will not be even remotely feasible. In most situations, it would be better to have some risk tolerance, and couple modern hardware with mitigations, disposability, and supply-chain security instead.
And that's why I'm not as worried about this as I was about the same vulnerability in Intel chips a few years ago.
There are a few cloud service providers that will rent you clock cycles on a rack-mounted Mac Mini, but not many, and even then they're for highly-specific workloads or build tasks. I suppose that's a problem for people paying far out the butt for that kind of service, but the vast majority of Apple Silicon devices are never, ever going to host cloud services.
> Yes, but only on some processors. We observe that the DIT bit set on m3 CPUs effectively disables the DMP. This is not the case for the m1 and m2.
Surely there is a chicken bit somewhere to do this?
Like can you do it from Swift? Or need assembly?
The kernel would have to be aware of it in order to be able to restore its state across context switches though, unless it's part of a set of registers that is automatically persisted. But given that Apple is publicly documenting this flag, I suppose it is.
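To make the "Swift or assembly" and context-switch points concrete, here is a rough Go sketch of the wrapper pattern. DIT is a PSTATE bit written with an `MSR` instruction, so a language without inline assembly needs a small assembly or C shim for the actual write. Everything named here (`ditEnabled`, `setDIT`, `withDIT`) is hypothetical, and the bit is simulated with a plain variable so the sketch runs on any machine. The real state is per-thread, which is why the wrapper pins the goroutine to its OS thread first. If I remember right, Go 1.24 added `crypto/subtle.WithDataIndependentTiming` along these lines.

```go
package main

import (
	"fmt"
	"runtime"
)

// ditEnabled stands in for the per-thread PSTATE.DIT bit.
// HYPOTHETICAL: real code would read and write the bit from arm64 assembly;
// a plain variable keeps this sketch runnable everywhere.
var ditEnabled bool

// setDIT is a hypothetical stub: it sets the simulated bit and returns
// the value it had before, so callers can restore it.
func setDIT(on bool) (previous bool) {
	previous = ditEnabled
	ditEnabled = on
	return previous
}

// withDIT runs f with the simulated DIT bit set, then restores the old value.
// The goroutine is locked to its OS thread for the duration, because the
// real bit belongs to the thread and the Go scheduler could otherwise move
// the goroutine to a thread where the bit is clear.
func withDIT(f func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	previous := setDIT(true)
	defer setDIT(previous) // runs before UnlockOSThread (defers are LIFO)

	f()
}

func main() {
	withDIT(func() {
		fmt.Println("inside:", ditEnabled) // inside: true
	})
	fmt.Println("after:", ditEnabled) // after: false
}
```

Restoring the previous value instead of blindly clearing the bit keeps nested calls working. Per the paper quote above, this would only help on M3 anyway.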
Here's an interesting conversation by the Go developers from as early as 2021 being suspicious of DIT: https://github.com/golang/go/issues/49702
why do we need prefetchers?
But in answer to your bullshit backdoor conspiracy theory (JFC processors have caches and timing variants because people want fast CPUs, you cannot have constant time and fast, Apple is not the only company with prefetchers), here's some Apple-provided documentation on how to disable the hardware backd... enable constant time operations specifically for the purpose of cryptography, almost like it's designed into the hardware. So weird. https://developer.apple.com/documentation/xcode/writing-arm6...
Also looks like they need to predetermine aspects of the key.
Very cool but I don’t think it looks particularly practical.
https://developer.apple.com/documentation/xcode/writing-arm6...
(Compromise must be running on the same hardware.)