It's hard to get your head around how big a deal this is. This vulnerability is so bad they killed x86 indirect jump instructions. It's so bad compilers --- all of them --- have to know about this bug, and use an incantation that hacks ret like an exploit developer would. It's so bad that to restore the original performance of a predictable indirect jump you might have to change the way you write high-level language code.
It's glorious.
It truly is difficult to predict all the ripple effects from this. I can't think of a single computer bug in the last 30 years that's similar in reach to this Intel Meltdown.
[EDITED following text to replace "Intel bug" with "Spectre bug" based on ars and jcranmer clarification. The Intel Meltdown bug can be fixed with operating system patches (KPTI) instead of a complete recompile.]
Journalists like to overuse the bombastic metaphor "shaken the very foundations," but this Spectre bug actually seems to warrant it. Off the top of my head:
- browsers like Chrome & Firefox have to be compiled with new defensive compilation flags because they run untrusted JavaScript
- cloud providers have to recompile and patch their code to protect themselves from hostile customer VMs
- operating systems like Linux/Windows/macOS have to recompile and patch code to protect users from malware
Imagine the economics of all these mitigations. Also imagine that each of the cloud vendors AWS/Google/Azure/Rackspace had very detailed Excel spreadsheets extrapolating CPU usage for the next few years to plan millions of $$$ in capital expenditures. Because of the severe performance implications of the bugfix (5% to 50% slowdown?), the CPU-utilization assumptions in those spreadsheets are now wrong. They will have to spend more than they planned to meet their workload-throughput goals.
There are dozens of other scenarios that we can't immediately think of.
Wrong bug. Intel meltdown is bad, but not anywhere near as bad as Spectre which affects everything! No AMD immunity here.
Spectre needs a more perfect storm of factors to lead to exploitation. No hardware is immune to it, but not all software is vulnerable, either. You need code execution, you need a vulnerable target, you need to somehow trigger the vulnerable target's code path, and that vulnerable target needs to hold data you want.
Meltdown just needs code execution and you have full read access to all memory.
I mentioned Meltdown because multiple entities (gcc, llvm, Google Cloud, Azure, Linux, Windows, etc.) have already converged on concrete solutions such as new compiler flags and patches, which gives us a glimpse into the costs and severity. The Spectre bug may be "bigger" but it doesn't yet have a consensus mitigation, and in the meantime we really can't tell people to "just keep your laptop unplugged from the internet and don't run any apps" to avoid the Spectre bug. The Spectre hole seems like it will be an open problem for many years, and the new gcc/llvm is an incomplete fix.
Not meaning to be rude, but this itself summarises (and the issue will perhaps shed more light on) how stupid an idea it is to let everybody run untrusted code from other people, let alone third-party stuff like "privacy-intrusion-as-a-service" startups and the like.
From the spectre paper:
"A minor variant of this could be to instead use an out-of-bounds read to a function pointer to gain control of execution in the mis-speculated path. We did not investigate this variant further."
If I were developing processors, I'd be having emergency meetings on trying to craft exploits to figure out where our processors' weaknesses are --- all while being happy that Intel is getting the bad PR for this and I'm not.
Speculative execution does not create side channels in and of itself; the side effects of speculative execution do that. In this case, the side effect is cache state. Just don't change the cache during speculative execution and there's no problem.
Agreed. This is an entirely new class of vulnerabilities, and we're just at the beginning.
For any other non-sandboxed application you pretty much have to trust the code anyway. Privilege escalation is always a bad thing of course, but for single-user desktop machines, getting user shell access as an attacker means that you can do pretty much anything you want.
As far as I can see, the only attack surface on my current machine would be a website running untrusted JS. For all other applications running on my machine, if one of them is actually hostile then I'm already screwed.
Frankly, I'm more annoyed at the ridiculous over-engineering of the Web than at CPU vendors. Because in 2017 you need to enable a Turing-complete language interpreter in your browser in order to display text and pictures on many (most?) websites.
Gopher should've won.
Does the non-shared non-virtualized system have any encryption keys in memory that you want to protect?
Do you use full-disk encryption or ssh to other machines or use a cryptocurrency wallet?
For JavaScript, won't it be sufficient to check all the calls out of it so that they can't pass data that controls an exploitable speculative execution, and also to generate JIT code such that the JS itself can't create exploitable instruction sequences? The API will have to be heavily scrutinized, and the JS will run somewhat slower.
If the rest of the browser code is vulnerable, but the JS code can't control the speculative execution then it should be safe to run any JS.
It ends with the performance advantages of OOO execution being effectively negated by the workarounds to address the security issues it causes.
The following parable is edifying: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD05xx/E...
Yes. We should really start to learn from history: the MULTICS operating system already supported 16 CPU rings back in the early 1970s. MULTICS is the mother of UNIX, its smaller child. MULTICS had so many advanced features that barely got implemented (and were often reinvented) in newer OSes. It's time to read the old docs and ask the old devs who are still alive. (Another such often-overlooked gem is Plan 9, though it's better known thanks to the Go language devs.)
Older Intel CPUs only supported 2 rings. Modern Intel CPUs support only 4 rings. Windows and Linux use ring 0 for kernel mode and ring 3 for user mode. And Intel introduced a ring -1 for VT.
"To assist virtualization, VT and Pacifica insert a new
privilege level beneath Ring 0. Both add nine new machine
code instructions that only work at "Ring -1," intended to
be used by the hypervisor
It's time for modern operating systems to use more rings, and modern CPUs to correctly protect between different rings.

What's glorious is that serious software security people now have to start being literate about what it means to reverse engineer and dump the branch history buffers on different CPUs. Getting dragged through this kind of minutiae is the reason I'm still in this field after 22 years.
And I'm just a bystander here. Imagine what it must have been like for Jann Horn over the last several months!
This subsection describes how we reverse-engineered the internals of the Haswell branch predictor. Some of this is written down from memory, since we didn't keep a detailed record of what we were doing.
... because shit was so crazy while they were working this out that they didn't have the cycles to write everything down!
[Edit] Or, how far down does the rabbit hole go?
Additionally, it is quite fascinating to me to compare the complexity of modern CPUs with, say, a compiler.
I believe the generalized fix is to restore the entire CPU state after a mispredict. You’d either need to add an extra copy of the entire processor state (tens of megabits) for every simultaneous predict you support ($$$) or keep track of how to revert all changes and revert them one at a time ($, slow).
Only the "extra copy of processor state" thing is really viable. You have to have a speculative cache and buffer in reads that only get flushed to the main cache once they're confirmed to be valid, which is enormously complicated. This facility already exists for writes, but now it needs to exist for reads too.
GP is absolutely correct that this is a fundamental assault on processor design as we know it, the speculative execution concept is going back to the drawing board for a major re-think.
Sorry for quoting Wikipedia, but I'm not at school, hah! [1]
'''' TSX provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers.

TSX enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, while aborting and rolling back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions.

In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow (fallback) path is still a normal lock. ''''
[1] https://en.wikipedia.org/wiki/Transactional_Synchronization_...
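To make the RTM interface concrete, here's a minimal sketch using the real _xbegin/_xend intrinsics from immintrin.h (compile with -mrtm on a TSX-capable CPU). It's only a sketch: a production lock-elision scheme would also read the lock word inside the transaction so it aborts if the lock is already held.

    #include <immintrin.h>
    #include <pthread.h>

    /* Try a hardware transaction first; fall back to a real lock
       if the transaction aborts. */
    static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;
    static long counter;

    void increment(void)
    {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            counter++;   /* executed transactionally */
            _xend();     /* commit: all or nothing */
        } else {
            /* Aborted (conflict, capacity, etc.): slow path. */
            pthread_mutex_lock(&fallback);
            counter++;
            pthread_mutex_unlock(&fallback);
        }
    }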
- we didn't have hyperconverged cloud infrastructures running arbitrary entities' code next to each other
The web issue is easier to mitigate if not fix completely since there is already a massive infrastructure for widespread, rapid browser updates, and crippling Javascript to eliminate attack vectors such as high-resolution timers is completely acceptable.
The cloud/vm infrastructure is a massive problem though. It is 100% required that VMs be fully isolated. The entire infrastructure breaks down if they aren't.
What's new and surprising is the power of these side-channel attacks--you can use these, reliably, to exfiltrate arbitrary memory, including across privilege modes in some cases (apparently, some ARM cores are affected by the latter vulnerability, in addition to Intel).
Are you sure?
The right fix is to prevent speculatively executed code from leaking information.
Here that perhaps means associating cache lines with a speculative branch somehow so that they aren't accessible until/unless the speculative branch becomes the real branch. (I have no idea exactly how that would be done or what the performance cost might be... I'd really need to know the details of how speculative execution is implemented in a particular CPU to even be able to guess.)
Ouch! This is independent of other performance hurts, like from the kernel syscall overhead that was the hot topic yesterday. This is pretty crazy.
I'll be intrigued to see how processor manufacturers respond to this. If they were even slightly relaxed about it prior to disclosure, I expect there are going to be some very hurried attempts to engineer solutions pronto. This is the sort of thing where it might even be worth throwing away all of your future roadmap plans and just getting a revision of the current chips out there ASAP, whatever that may do to the rest of your roadmap.
I'd hope maybe, just maybe, this would be enough to put a focus on compilers producing code that ends up using processor-optimized paths chosen at runtime, to avoid "overheads ranging from 10% to 50%".
Though, in this case, that would essentially mean making the entire executable region writable for some window of time, which is clearly too dangerous, so I guess the 0.1% speedups from compiling undefined behavior in new and interesting ways will continue taking priority.
I mean, it's a compiler flag, right? Obviously whoever's going to run a program on an unaffected platform will take the effort to recompile everything with the flag removed.
Just the same way every serious application currently provides different executables for running on systems where SSE2, SSE4.1, or AVX2 is present.
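For compiled-in dispatch, a minimal sketch of how that looks with GCC/Clang's __builtin_cpu_supports builtin (the process_* names are just illustrative):

    #include <stdio.h>

    /* Illustrative stand-ins for ISA-specific implementations. */
    static void process_avx2(void) { puts("AVX2 path"); }
    static void process_sse2(void) { puts("SSE2 baseline path"); }

    /* Choose the best implementation once the actual CPU is known,
       instead of baking the choice in at compile time. */
    void process(void)
    {
        if (__builtin_cpu_supports("avx2"))
            process_avx2();
        else
            process_sse2();
    }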
When this all shakes out, the general story is going to be "upgrade sooner, current-gen Intel chips are x% faster", where x is going to be a larger number than it was a week ago.
It's more-or-less the Apple battery story all over again. Current devices are going to be slower, newer ones are going to be faster. Even if you know the why and the how, you're still in the same place as everybody else (at best, you could upgrade just the chip if your MB is new enough, but you're still buying Intel). Unless there's some clear way of imposing the external cost of this bug on Intel, it's a win-win for them.
Ryzen 1700X has 3.3 times the multicore performance of my i5 3450 from 2012 for almost the same price. With 7nm we will see at least another 50% increase in multicore performance on top of that.
Although actually, we already are - binary distros already don't take into account per-microarchitecture scheduling, nor any ISA extensions above a common baseline (e.g. just SSE2, no autovectorising to AVX2 etc).
This might provide enough impetus to restructure how binary distros work and get the whole distro compiled with some newer CPU flags (march={first corrected architecture}?), but in the short term I assume every package will take the hit.
Great time to learn about source-based distros!
> However, real-world workloads exhibit substantially lower performance impact.
I feel like you could have mentioned this.
I feel bad for all of the engineers currently working on performance sensitive applications in these languages. There's a whole lot of Java, .NET, and JavaScript that's about to get slower[1]. Enterprise-y, abstract class heavy (i.e.: vtable using) C++ will get slower. Rust trait objects get slower. Haskell type classes that don't optimize out get slower.
What a mess.
[1] These mitigations will need to be implemented for interpreters, and JITs will want to switch to emitting "retpoline" code for dynamic dispatch. There's no world in which I don't expect the JVM, V8, and others to switch to these by default soon.
Maybe I'm being naive, but would a simple modulo instruction work? Consider the example code from https://googleprojectzero.blogspot.com/2018/01/reading-privi...:
    unsigned long untrusted_offset_from_caller = ...;
    if (untrusted_offset_from_caller < arr1->length) {
        unsigned char value = arr1->data[untrusted_offset_from_caller];
        ...
    }
If instead we did: unsigned char value = arr1->data[untrusted_offset_from_caller % arr1->length];
Would this produce a data dependency that prevents speculative execution from reading an out-of-bounds memory address? (Ignore for the moment that a sufficiently smart compiler might "optimize" out the modulo here.)

You only need to wipe between syscalls that have side effects. Number-crunching, AVX-heavy subroutines should never have to deal with safety once entered.
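A minimal sketch of the masking idea above, using a branchless clamp instead of a modulo (the same idea as the Linux kernel's array_index_nospec; the helper name here is just illustrative):

    #include <stddef.h>

    /* Clamp the index without a conditional branch, so that even a
       mispredicted bounds check cannot form an out-of-bounds address.
       Compilers usually emit a flag-based, branch-free sequence for
       this; the kernel uses inline asm to guarantee it. */
    static size_t clamp_index(size_t idx, size_t size)
    {
        size_t mask = 0 - (size_t)(idx < size); /* all-ones iff in bounds */
        return idx & mask;
    }

    unsigned char read_element(const unsigned char *data, size_t len,
                               size_t idx)
    {
        if (idx < len)
            return data[clamp_index(idx, len)];
        return 0;
    }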
The obvious drawback is that it effectively disables sharing code in memory; it would still allow sharing code on disk, though. So it would be a middle ground between the current states of dynamic and static linking.
https://www.technovelty.org/c/position-independent-code-and-...
* https://lkml.org/lkml/2018/1/3/797 (https://news.ycombinator.com/item?id=16066968)
What do people more knowledgeable in the field think about this?
All of these attacks assume you are running something you don't trust on your CPU, whether it is another user's program, a non-root executable, or a JavaScript program from a website.
When do we stop hacking processors, kernels, and compilers and revisit our assumptions of what we can and can't do securely?
But my usecase might be a physical computer that isn't networked which does data science with some programs and prints out results.
These patches are focused on Amazon and other cloud providers that are in the business of running separate individuals' applications on the same machine.
In the consumer world, the scope would be browser scripts and user applications that aren't running as superuser. But even then, do you download and run software that you expect might steal information or damage your computer?
These are fundamental security questions. Creating rings and sandboxes are what create the assumptions of privacy and security.
Side channels have always been some of the most insidious exploits. Many are basically unsolvable (timing attacks are always going to leak some information, and compression is basically completely at odds with secure information storage), many more are easily enough overlooked that it would be easy to maliciously include them without raising any eyebrows, and the "fixes" for them almost always murder performance.
I think the only real fully-encompassing solution to this is a redesign in how we use computers. Either a massive step backwards on performance and turning off most automatic "optimizations" until they can be proven through a much more rigorous process (both in compilers, and in hardware), or a significant change in how computers are architected adding more hardware level isolation for processes and systems running on the machine (just daydreaming now, but something like a cluster of isolated micro-CPUs that run one application only).
The "true secure" way is not running any untrusted code, not connecting to any untrusted networks, and not accepting or storing any untrusted data.
At that point you can't run a computer. It's just not an answer to say "don't let bad things in" because bad things are always going to get in.
And with side channel exploits getting more and more common, and with them being worse and worse, running any code on your machine is basically giving that code any and all of your data on the machine...
Telling users (even highly technical users) to "never ever run any untrusted code ever, and if you mess up once and run untrusted code you have completely ruined the trust of the whole machine and need to start over from scratch while assuming all of your data has been compromised" is not only infeasible, it's impossible. If this is the case, we have lost the "security" game.
It's easy to say "throwing OoO execution away" is an overreaction, but if it's necessary to allow multiple programs to run on one machine without them all having what amounts to full access to one another's information, then it might be necessary. Already we know that we can't use compression with encryption, and we can't use any kind of "exit early" with most kinds of encryption. It might just be that OoO execution is fundamentally opposed to secure computing. At its core, it's letting the processor do different things depending on what it can see is coming; it's almost a definition of an oracle! That's always going to be a very dangerous game to play.
It could even just be that CPUs need a "secure computing" mode, or maybe even a secure co-processor that disables all of these optimizations. But at the very least, I think changes are going to be necessary, and a 15% perf reduction might be the least of our worries.
Chrome and FF need "execute JS" to be an explicit per-site permission, similar to the permissions model of native smartphone apps.
Google will never do this because they're an ad company and care more about targeting ads than protecting Chrome users.
However this wouldn't completely solve the issue (see Google Play Store).
You can do a lot by separating these speedup mechanisms across security boundaries. The biggest factor that makes this hard to mitigate is in-process security boundaries. Total isolation between processes is neither necessary nor sufficient.
https://marc.info/?l=openbsd-misc&m=118296441702631&w=2
(from 2007)
RISC-V has already had to fix its memory consistency model, so it is not without problems. But that is a spec bug, not an implementation bug. Whether there is an out-of-order, speculatively executing RISC-V core in the wild which suffers from this is, as far as I know, very unlikely. If there is, no doubt its designers have had a busy time lately.
I'd presume that the slowest RISC-V designs are immune due to not speculating enough, while any high-performance implementation is vulnerable.
>> While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.
Early processors had speculative execution? I thought this had been added to Intel/AMD/ARM about 20 years ago?
Trusting in a compiler you merely hope was used to build all the executables on your system isn't enough to be the final solution.
[1] https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html
Unless the compiler is also patched to either disallow inserted assembly, or to modify the inserted assembly (this being both hard and dangerous), someone who wants to exploit the bug will just add their own inserted assembly code that exploits the bug, and a patched compiler won't help one bit in that case.
* https://lkml.org/lkml/2018/1/4/432
* http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a...
It appears that Skylake and later can actually predict retpolines? Some hardware features called IBRS, IBPB, STIBP (not a lot of details on this are out there) are supposedly coming in a microcode update.
There's a lot of prominence being given to all kinds of damage malicious users might inflict, and ways to prevent or mitigate, but little to the malice itself. Whence does it arise? What emotions drive those users? What unmet needs?
Meanwhile, when these slowing-down patches for Spectre and Meltdown arrive, I intend not to run them, to the extent possible. I intend to keep aside a VM with patches for critical stuff, like banking or others' data entrusted to me. But I don't want my machine slowed down just because someone, sometime, might invest effort in targeting these attacks at it. Given how transparent I want to be with my life, that's a risk I'm willing to take.
Sure, you might not have anything you want to hide in your life, but the drive-by javascript doesn't care about your secrets - it'll hack you anyway. Best-case scenario, you lose access to a bunch of accounts you used to use and need to create new identities from scratch. Worst-case, they clean you out financially, steal your identity, etc.
Also, any insight about performance impact here?
Most programs have indirect jumps somewhere. Higher-level languages with virtual function calls have lots of indirect jumps, because they parameterize functions: to get the "length" of the variable "foo", the function "bar" has to call one of 30 different functions, depending on the type of "foo"; the function to call is read out of a table at some offset from the base address of "foo". Or, another example is switch statements, which can compile down to jump tables.
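A sketch of that "length of foo" dispatch in C (names illustrative); the call target is loaded from memory at runtime, so it becomes an indirect call:

    #include <stddef.h>

    struct obj;
    typedef size_t (*length_fn)(const struct obj *);

    /* Each runtime type installs one of the ~30 implementations. */
    struct obj {
        length_fn length;
    };

    size_t bar(const struct obj *foo)
    {
        /* Indirect call: the target address is read out of 'foo',
           not encoded in the instruction stream. */
        return foo->length(foo);
    }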
What we want, to mitigate Spectre, is to be able to disable speculative execution for indirect jumps. The CPU doesn't provide a clean way to do that directly.
So we just stop using the indirect jump instructions. Instead, we abuse the fact that "ret" is an indirect jump.
"Call" and "ret" are how CPUs support function calls. When you "call" a function, the CPU pushes the return address --- the next instruction address after the "call" --- to the stack. When you return from a function, you pop the return address and jump to it. There's a sort of "jmp %register" hidden in "ret".
You abuse "ret" by replacing indirect jumps with a sequence of call/mov/jump, where the mov does a switcheroo on the saved return address.
The obvious next question to ask here is, "why don't CPUs predict and speculatively execute rets?" And, they do. So the retpoline mitigates this: instead of just "call/pop/jump", it does "call/...pause/jmp.../mov/jmp", where the middle sequence of instructions set off in "..." is jumped over and not executed, but captures the speculative execution that the CPU does --- the CPU expects the "ret" to return to the original "call", and does not know how to predict around the fact that we did the switcheroo on the return address.
How'd I do?
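For reference, here's a sketch of that sequence for an indirect jump whose target sits in %r11. This is the widely published thunk shape (compilers emit it as e.g. __x86_indirect_thunk_r11), not any particular compiler's exact output:

    /* Hypothetical standalone retpoline thunk, as top-level asm. */
    __asm__(
        ".globl retpoline_r11_sketch     \n"
        "retpoline_r11_sketch:           \n"
        "    call 1f                     \n" /* push addr of '2:'         */
        "2:  pause                       \n" /* speculation of the 'ret'  */
        "    lfence                      \n" /* lands here and spins      */
        "    jmp 2b                      \n" /* harmlessly                */
        "1:  mov %r11, (%rsp)            \n" /* the switcheroo: overwrite */
        "    ret                         \n" /* return addr; 'ret' jumps  */
    );                                       /* to the real target        */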
Is the overhead of the retpoline such that it's no longer a benefit to compile switches to jump tables?
Here is a sketch (in C; names are illustrative) of some of the most common programming patterns that end up causing indirect jumps/calls:
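    #include <stdlib.h>

    /* Illustrative sketch of common indirect-jump/call sources. */

    /* 1. Callbacks: qsort reaches the comparator through a pointer. */
    int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    void sort_ints(int *v, size_t n)
    {
        qsort(v, n, sizeof *v, cmp_int); /* indirect call inside qsort */
    }

    /* 2. A dense switch typically becomes a jump table: an indirect
          jump indexed by 'op'. */
    long apply(int op, long x, long y)
    {
        switch (op) {
        case 0: return x + y;
        case 1: return x - y;
        case 2: return x * y;
        case 3: return y ? x / y : 0;
        default: return 0;
        }
    }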
Imagine every virtual function call in a C++ program being mispredicted and taking twice as long.
(Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)
> (Instead of forcing us to recompile the world, maybe Intel should just disable branch prediction in microcode.)
Wouldn't the performance impact be dramatic? In this[1] example there's a 6x slowdown between the cases with and without correct branch prediction.
[1]: https://stackoverflow.com/questions/11227809/why-is-it-faste...
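The hot loop in that example boils down to something like this sketch; with sorted input the branch is predicted almost perfectly, with random input it's wrong about half the time, which is where the ~6x gap comes from:

    #include <stddef.h>

    /* Sketch of the loop benchmarked in [1]. */
    long sum_over_threshold(const int *data, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (data[i] >= 128)   /* the data-dependent branch */
                sum += data[i];
        }
        return sum;
    }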
I’m not sure, but that’s what it looks like, so far.
Quite possibly, the worst affected code will be OO code that is dynamically (open) polymorphic.
It's just that some language features such as virtual functions in C++ often require indirect invocation when the compiler can't devirtualize a call, and there is lots of it in the kernel in performance-critical paths (think interrupts, syscalls).
Something like that could allow the CPU to speculate aggressively while preventing information-leak exploits.
The bug here is that the CPU is not aborting the speculation when fetches occur to addresses marked as "access denied". Instead the fetch happens and a line of normally inaccessible memory is put into cache by code that should not be able to get it read into the cache normally.
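Schematically, the access pattern being described looks like this sketch (following the published write-ups; the fault handling is omitted, and on a patched or unaffected CPU it recovers nothing):

    /* 256 pages of probe memory, one page per possible byte value. */
    static unsigned char probe[256 * 4096];

    void meltdown_gadget(const volatile unsigned char *kernel_addr)
    {
        /* This load faults architecturally, but on affected CPUs the
           value is forwarded to dependent instructions before the
           fault is delivered... */
        unsigned char secret = *kernel_addr;

        /* ...so this dependent load pulls one page of 'probe' into
           the cache, indexed by the secret byte. A FLUSH+RELOAD
           timing pass over 'probe' then reveals which page is hot. */
        (void)probe[secret * 4096];
    }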
One hardware fix would be to plug that hole. Speculative reads get blocked when they encounter permission denied errors from the paging system and do not change the cache state. That blocks the Meltdown attack, but not the Spectre attack.
Also, maybe context switching would need to be made faster, because you would need to switch whenever, e.g., JavaScript calls browser interfaces.
It also sets a very bad precedent: I understand people want to mitigate/fix as much as possible, but this is basically giving an implicit message to the hardware designers: "it doesn't matter if our instructions are broken, regardless of how widespread in use they already are --- they'll just fix it in the software."
What other options are there? It's hardware that cannot be patched. Of course they will change chip designs going forward, but what else do you suggest folks do with the billions of chips that exhibit this problem?
> We built multi-tenant cloud computing on top of processors and chipsets that were designed and hyper-optimized for
> single-tenant use. We crossed our fingers that it would be OK and it would all turn out great and we would all profit.
> In 2018, reality has come back to bite us.
This is the root of all the problems.
Right now many function calls don't safely wipe registers or the side-channel cache state exploited by Spectre. There really need to be two kinds of function calls. Maybe a C #pragma?
The compiler has call wiping as a flag; the code has pragmas that override the flag.
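Purely as a syntax sketch (no such pragma exists in any compiler today), the idea might look like:

    /* Hypothetical pragmas: illustrating the proposal above, not a
       real compiler feature. */
    #pragma call_wipe(on)   /* hypothetical: default set by a flag   */
    long syscall_with_side_effects(int fd);  /* call sites get wiped */

    #pragma call_wipe(off)  /* hypothetical override for hot paths   */
    void avx_number_crunch(double *x, long n); /* no wiping once entered */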
Wayback Machine: https://web.archive.org/web/20180104131631/https://reviews.l...
:)
You basically can't make a deeply-pipelined processor fast without speculative execution.