> the only part that could theoretically make sense is that the compiler could emit non-temporal store instructions to bypass the cache. I know compilers currently don't do that for volatile, but I don't know why.
Two reasons:
First, using nontemporal accesses would break programs that mix volatile and non-volatile accesses to the same memory: nontemporal stores are weakly ordered, so without an explicit fence, plain loads elsewhere could observe them out of order. Mixing accesses like that is not defined by the C standard, but some programs rely on it anyway.
Second, more importantly: why would they?
- If the address you’re accessing points to hardware registers, the page table entry should be marked non-cacheable, which makes nontemporal accesses unnecessary. And if for some reason it’s not marked properly, nontemporal accesses wouldn’t be sufficient to guarantee that things work anyway, because nontemporal is just a hint which the hardware may not respect. In any case, at least on x86, nontemporal loads only exist for 128-bit and wider accesses (MOVNTDQA and friends); MOVNTI can do 32- or 64-bit nontemporal stores, but that still wouldn’t cover hardware registers, which generally require you to use a specific access size for both reads and writes.
- If the address you’re using points to regular memory, on the other hand, volatile is probably being used to implement atomics, in which case bypassing the cache is unnecessary and also slow. In theory, compilers could compile volatile into accesses surrounded by memory barrier instructions, which would enforce a stronger memory ordering (while being faster than bypassing the cache entirely), especially useful on architectures with weaker memory models than x86. In fact, that’s what volatile does in Java. But in C, it’s pretty long-established that volatile accesses should just compile to regular load/store instructions, and any necessary barriers must be inserted manually. People writing high-performance code wouldn’t be happy if the compiler started inserting unnecessary barrier instructions for them… In any case, usage of volatile for atomics is deprecated in favor of C/C++11 atomics, which do insert barriers for you.