> While the PoC demonstrates the JavaScript Spectre attack against Chrome 88's V8 JavaScript engine on an Intel Core i7-6500U 'Skylake' CPU on Linux, Google notes it can easily be tweaked for other CPUs... It was even successful on Apple's M1 Arm CPU...
And Augury [1] in 2022 also affected Apple's A14 and M1 chips.
So have Apple been attempting to mitigate and failing, or ignoring the issue?
Surely chip manufacturers can't keep ignoring these fundamental flaws.
[0] https://security.googleblog.com/2021/03/a-spectre-proof-of-c...
The reason this keeps coming up is that it isn’t a single issue but a class of attacks exploiting performance features, and attackers are getting more sophisticated as smart people figure out new techniques. Chip designers have been adjusting but are trying not to throw out the last couple decades of performance improvements, too.
I'm not sure, but after a bit more reading, it sounds like private keys or symmetric keys can be extracted by other user-space (or possibly kernel-space) code, but NOT from the Secure Enclave.
Just for what it's worth.
So for a) it might be sufficient to run JavaScript, and for b) there are of course ways to inject data into server processes; processing data submitted by clients is what servers are for.
But a) happens on clients (web browsers) and b) would be a way to extract encryption keys from servers. But in what case can an attacker run code on a machine where they can also trigger the encryption (constantly, for an hour, as in the demonstration)? The only thing that comes to mind would be a server-side code-execution sandbox that runs SSL termination on the same machine.
edit: Maybe stealing client certificate keys?
> The cryptographic key itself isn’t placed in cache. But bits of material derived from the key gets placed in the cache, and an attacker can piece these bits together in a way that allows them to reconstruct the key, after causing the processor to do this multiple times. The researchers were able to derive the key for four different cryptographic algorithms: Go, OpenSSL, CRYSTALS-Kyber and CRYSTALS-Dilithium.
> [Green] notes that in theory this attack might be used to break the TLS cryptography that a computer’s browser uses to encrypt communication between their computer and web sites, which could allow attackers to decrypt that communication to extract a user’s session cookie for their Gmail or other web-based email account and use it to log into the account as them.
This is probably worse given that people were trying to experiment with local LLMs on CPU. It's not like they even offer Nvidia.
The mitigation for the issue was posted in coordination with the publishing of the vulnerability. Given that the mitigation only applies to the M3 processor, it's reasonable to assume that there is no currently known mitigation for the M1 and M2 processors.
Discussion on the actual vulnerability post: https://news.ycombinator.com/item?id=39779195
For example, the vaporware Mill architecture is in-order at the CPU level, but its compilers can schedule code to run very concurrently.
There are a bunch of second-hand cars below 6,000 Singapore Dollars on this website [1], which is the price of the 64GB/1TB Mac Studio [2].
1 - https://www.sgcarmart.com/used_cars/listing.php?MOD=&PRC=18&...
Found this on the exploit site.
But it also says: "We observe that the DIT bit set on m3 CPUs effectively disables the DMP. This is not the case for the m1 and m2."
So if libraries set the DIT bit, then the M3 should be safe.
Are we nearing any sort of consensus that any form of speculation is bad? Is there a fundamentally secure way to do it?
I think for "normal" users the main risk is JavaScript, which can (kind of) be mitigated in software without affecting the rest of the system, so no one really cares about these attacks. But the fundamental abstraction leak between physics and programming will always be there.
Examples (I'm running Debian)
The kernel, the X11 server, the terminal, ssh, bash, anything coming from the official Debian repos (including the password manager): in the trusted environment.
Browsers, Electron apps, anything installed from unofficial repos or via language package managers (npm, rvm, asdf, etc.): in the untrusted environment.
It reminds me of mainframes and their redundant and compartmentalized hardware architecture.
> X11 server
Those can very easily execute untrusted code.
The risk here is that there are more individuals with the skills to take this type of attack and bring it to a browser near you.
One app's data is another app's code.
I would like to salute my shitty ISP for keeping me safe from timing attacks using their unreliable network infrastructure.
At the hardware level, any optimization that changes performance characteristics either locally (how long the crypto operation itself takes) or non-locally (in this case, secrets leaking via observation of cache timings from the attacker's untrusted code) is unsafe.
Intel DMPs already have a flag to turn off the same behavior that was exploited on the M1/M2, which may suggest that the risk of this type of optimization was understood previously.
Mixing crypto operations with general-purpose computation and memory accesses is a fragile balance. Where possible, use HSMs, YubiKeys, secure enclaves - any specialized hardware that has been hardened to protect key material.
Are there any circumstances where this hardware is accessible in the browser? As I understand, it is not generally available (if at all) for any cryptography you might want to do in the browser.
One thing to remember here, though, is that there isn’t anything special about key material in this attack other than it being a high-value target. If we move all crypto to purpose-made hardware, someone could just start trying to target the messages to/from the crypto system.
Why only cryptographic applications? What if I'm writing a very sensitive e-mail, for instance?
In contrast, the timing of virtually any email operation is not dependent on the contents of the email, other than the size. That is, whether you wrote "my password is hunter2" or "my password is passwor", the timing of any operation running on this email will be identical.
I’ve been using the TPM 2.0 chip on my ASUS-based Linux box to store various keys. Tooling for this on the Linux side has improved significantly [0], and TPM 2.0 has been supported since kernel 4.0 in 2015 (developed as 3.20) [1].
How effective this is at improving one’s security posture is another question and it’s probably not a huge security upgrade, but it does mitigate some classes of attack.
I’m curious why you’re saying it’s explicitly not allowed? At least for standard TPM 1.2/2.0 chips, that isn’t the case.
- [0] https://wiki.archlinux.org/title/Trusted_Platform_Module
- [1] https://www.phoronix.com/news/Linux-3.20-TPM-2.0-Security
https://developer.apple.com/documentation/security/certifica...
https://developer.android.com/privacy-and-security/keystore#...
But I agree in general with your point
The question is what the solution going forward could be, and that is going to be a huge change in any case. I do not see a way out of this with our current architectures.
Let's say I multiply two 64-bit numbers. The CPU could just do it the same way every time, with a worst case of 4 cycles of latency. As an extreme example, it could also track whether one of the factors is zero and dynamically replace the multiplication with a zeroing idiom that "executes" in 0 cycles once the scheduler learns that either input is zero.
Less radically, it could track whether the upper halves of registers are zero to fast-path smaller multiplications (e.g. 32-bit x 32-bit -> 64-bit) and shave off a cycle. IIRC some PowerPC chips did exactly that, but only for the second operand; the ISA allowed it.
A realistic example is CPUs with data-dependent-latency shift/rotate instructions. What do you do if an ISA doesn't specify whether shift/rotate is constant-time, but every implementation so far has been? Do you slowly emulate it out of paranoia that a future implementation may have variable latency? Another real-world example is FPUs with higher latency for denormalized numbers; it's just not relevant to (most) cryptographic algorithms.
How the fuck are you supposed to build anything secure, useful, and fast enough from that?
EDIT: The downvotes make no sense. What this bug has in common with Spectre is that it involves cache timing. But in Spectre, the cache is affected by speculative execution; with "GoFetch", it's the pre-fetcher pre-fetching things that look like memory addresses. Pre-fetching is not speculative execution.
But if they did mean to include pre-fetching in "speculation" then I retract my comment
Plus all the secrets in the Secure Enclave are immune to this attack, so your FileVault keys and your Apple Pay cards and all that jazz are completely safe.
It sucks that it exists, and crypto libraries that run on the platform outside of the Secure Enclave will get slightly slower, but no one will notice.
People said the _exact_ same thing about Spectre/Meltdown. Then the JS PoCs came out
The angle I’d worry about is something like a password manager, but most of those already have an authentication step and I’d be surprised if they didn’t have rate-limiting.
These vulnerabilities definitely exist, that much is a matter of fact. But whether it's something someone should consider in their threat model is a different matter.
What does this mean? All I read is access to user space. Wouldn’t any web browser be enough?
I may be misunderstanding what you intended but how do you use traditional means of payment (credit card) without an identity? How do you check your email without identity?
If I add a custom crypto extension to a RISC-V core and implement it badly, is that the fault of RISC-V? No! It's my own. And RISC-V doesn't help anyone here because their license allows me to keep my extension completely closed source - no different than Apple is today with ARM.
Complexity leads to bugs, some of which are going to be security bugs.
ISAs impose complexity upon implementations. To claim they do not matter would be disingenuous.
See the FAQ at https://gofetch.fail/