
haberman 2y ago

> jump-if-equal and set-if-equal would seem to have the same level of predictability.

The difference is that branches have dedicated hardware (branch predictors) that will speculatively execute subsequent instructions based on their best guess about which way the branch will go. Whereas conditional moves cannot execute any subsequent instructions until the correct value is available.

Put another way, CPUs have control flow speculation, but not conditional move speculation. I don't know if conditional move speculation would be a feasible thing to implement or not, but I'm pretty sure that no mainstream CPUs have such a feature.
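A back-of-the-envelope way to see the trade-off described above, as a minimal sketch. The function names and all cycle counts are made up for illustration and do not describe any real CPU: with a branch, dependent instructions only lose time when the guess is wrong, while with a conditional move they always wait for the condition.

```python
def expected_branch_stall(mispredict_rate, flush_penalty):
    """Average cycles lost per branch when dependents run speculatively.

    Assumption: a correct prediction costs nothing, a wrong one costs
    a full flush_penalty.
    """
    return mispredict_rate * flush_penalty


def cmov_stall(condition_latency):
    """Cycles that dependents of a cmov wait: the condition is never guessed."""
    return condition_latency


# Made-up numbers: 15-cycle flush, condition ready after 4 cycles.
for rate in (0.01, 0.5):
    print(rate, expected_branch_stall(rate, 15), cmov_stall(4))
```

Under these assumed numbers the branch wins easily when it is well predicted and loses badly when it is a coin flip, which is the usual argument for reserving conditional moves for hard-to-predict conditions.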


11 comments · 2 top-level

Lk7Of3vfJS2n2y ago· 7 in thread

I'd be curious to learn why CPUs don't have conditional move speculation.

Tuna-Fish2y ago

Because modern CPUs as a rule don't speculate on values to arithmetic, only on control flow, and CMOV acts like arithmetic.

That is, if there is an add instruction on rax and rbx, no matter what, the add instruction will not execute until both rax and rbx are available. If the result went into rax, and there is another instruction that uses that as a source, no matter what, that instruction will not execute until the add has completed.

CMOV is implemented as an ALU instruction that always writes into its output, and either writes the value that is already in there (which is why it depends on the value of its output) or the value provided, depending on flags.
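To make that description concrete, here is a minimal sketch in Python rather than hardware. The function names are hypothetical; the point is only that the conditional move behaves like a three-input arithmetic operation (old destination, source, flag) that always produces a result.

```python
def cmov(old_dest, src, flag):
    """Toy model of a conditional move as a plain ALU operation.

    The result is always written; the flag only selects which input
    is written, so old_dest, src and flag must all be available first.
    """
    return src if flag else old_dest


def add(a, b):
    """An ordinary two-input ALU operation, for comparison."""
    return a + b


# rax keeps its old value when the flag is clear and takes rbx when set.
rax, rbx = 10, 99
print(cmov(rax, rbx, flag=False))  # 10
print(cmov(rax, rbx, flag=True))   # 99
```

Seen this way, a consumer of the result waits on `cmov` for the same reason it would wait on `add`: the inputs are not all known yet.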

torstenvl2y ago

I'm not saying you're wrong — I'm completely ignorant at the microcode level — but it seems to me like between

    cmp x, y
    je z

and

    cmp x, y
    sete z

the actual speculative part is the same: speculating as to the result of cmp x, y

If that's true, why would it not simply pipeline sete and the following instructions and simply execute (or not execute) sete according to its prediction, and then double check itself and reverse (or apply) the operation if the prediction was wrong?

I probably just have a bad mental model of what's going on under the (under the) hood, so whatever patience you have to deal with my stupid questions would be greatly appreciated.


sitkack2y ago

I hope you work on compiler backends.


nwallin2y ago

Speculative execution is all about control flow. It's about what value is in the instruction pointer at some nebulous point in the future.

A conditional jump can put one of two values into the instruction pointer: it either increments the instruction pointer (jump not taken) or puts the immediate value into the instruction pointer (jump taken).

cmov/sete are utterly deterministic; they always increment the instruction pointer. There's nothing to speculate on, there's nothing to predict. They just go to the next instruction.
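The same point as a toy next-instruction-pointer function. This is a sketch only: the instruction kinds, lengths and addresses are hypothetical and not tied to any real ISA encoding.

```python
def next_pc(pc, instr_len, kind, taken=False, target=None):
    """Return the address of the next instruction to fetch.

    Only a taken conditional jump ("jcc") redirects fetch; every other
    kind, including cmov and sete, falls through to pc + instr_len
    regardless of what the flags say.
    """
    if kind == "jcc" and taken:
        return target
    return pc + instr_len


print(next_pc(100, 2, "jcc", taken=False, target=200))  # 102
print(next_pc(100, 2, "jcc", taken=True, target=200))   # 200
print(next_pc(100, 4, "cmov"))                          # 104
```

For `jcc` the fetch unit has to guess `taken` to know where to go next; for `cmov` there is nothing about the next address left to guess.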

sweetjuly2y ago

> Speculative execution is all about control flow

It's murkier than that. Speculation also deals with the order in which instructions can be executed. Take for example memory ordering (discussed in a mini essay elsewhere here): we typically speculate that all loads are unrelated to any other older in-flight stores with unresolved addresses so that we can optimistically launch them. This is not a control flow issue but it is something we both speculate and predict (memory dependence predictors!) despite the next PC being essentially deterministic.

kazinator2y ago

> Speculative execution is all about control flow. It's about what value is in the instruction pointer at some nebulous point in the future.

.. and all about what we can wheedle out of all the background speculation that will help us get root on this box.

sweetjuly2y ago

One other perspective is that by speculating the outcomes of conditional instructions, you naturally open yourself up to misspeculating them. This sounds obvious but the consequences for the uarch are quite severe. This is because anytime you misspeculate an instruction, most (all?) contemporary CPUs throw out all younger speculative progress (even if it is unrelated!) and restart at the instruction that was originally misspeculated. Throwing out all this work is both i) a waste of power/cycles (you did all this speculative work for nothing!) and ii) quite an expensive operation, because you either have to iteratively roll back the state (slow!) or take a snapshot of the state on every conditional instruction (expensive from a power/area perspective).

A similar idea to what you're proposing (and a possible solution to the above issue) does come up in another part of the processor, however! Specifically, high performance processors launch loads very aggressively and oftentimes return data as soon as the address is known. This is because memory is often the bottleneck for performance. This, unfortunately, brings some challenges, namely memory ordering violations. Take for example the following snippet (ARMv8):

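    mov x1, #1         // x1 = 1
    udiv x3, x2, x1    // x3 = x2 / x1 = x2, but only after a slow divide
    str x2, [x3]       // store x2 to address x3 (same address as x2)
    ldr x4, [x2]       // load from address x2, which is known early
    add x5, x4, x4     // x5 = x4 + x4, consumes the loaded value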

This is a silly and somewhat contrived code sequence, but note here that both str x2 and ldr x4 access the same address and thus the value in x4 should be x2. Note, however, that since str x2's address (x3) is produced by a slow division operation but ldr x4's address (x2) is available much more quickly, ldr x4 will likely launch before the CPU even knows that str x2 conflicts with it. Thus, the data returned by the load will be whatever stale data is in the cache rather than the correct value that is currently sitting in x2. This means that the subsequent add which consumes this data will produce an incorrect value, derailing the whole program. Once the CPU detects this issue, it has to throw away all the state and restart execution of the program at ldr x4 in order to fix up the memory ordering violation. In essence, the CPU is speculating that str x2 and ldr x4 are unrelated because doing so is very important for performance. Unfortunately, however, memory ordering violations are actually somewhat common, and constantly having to restart execution has negative performance implications.
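Here is the same five-instruction sequence as a toy Python simulation, to show the stale read concretely. This is a sketch under obvious simplifications: memory is a dict keyed by address, the address 64 and the stale value 7 are made up, and "launching the load early" is modeled by simply doing the load first.

```python
def run_in_program_order(mem, x2):
    """Execute the five instructions strictly in order."""
    x1 = 1
    x3 = x2 // x1           # udiv x3, x2, x1
    mem[x3] = x2            # str x2, [x3]
    x4 = mem[x2]            # ldr x4, [x2]
    return x4 + x4          # add x5, x4, x4


def run_with_early_load(mem, x2):
    """Launch the load before the older store's address is known."""
    x4 = mem[x2]            # ldr issued early, reads the stale value
    x5 = x4 + x4            # add consumes the stale value
    x1 = 1
    x3 = x2 // x1           # slow divide finally produces the store address
    mem[x3] = x2            # str x2, [x3]
    violation = (x3 == x2)  # the older store hit the address the load read
    return x5, violation


stale = {64: 7}  # hypothetical old contents at address 64
print(run_in_program_order(dict(stale), 64))  # 128
print(run_with_early_load(dict(stale), 64))   # (14, True)
```

When `violation` comes back true, a real core would discard the wrong `x5` and replay from the load, which amounts to the in-order result here.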

Now, this is actually a very similar problem to the one we'd see with conditional instruction speculation! So how do we solve this issue for memory ordering violations? Well, we predict which pairs of stores and loads are dependent and block the load from launching until the address of its supposed dependent store resolves. If this predictor is functioning well, we are able to both aggressively launch loads while also avoiding many costly fixups!
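A minimal sketch of that idea, assuming a very simple predictor: a table that remembers which store a load last conflicted with. The class name, the PC values and the one-entry-per-load policy are all hypothetical; real memory dependence predictors are considerably more elaborate.

```python
class MemDepPredictor:
    """Toy memory dependence predictor: load PC -> store PC to wait for."""

    def __init__(self):
        self.wait_for = {}

    def should_wait(self, load_pc):
        """Return the store PC this load should wait on, or None."""
        return self.wait_for.get(load_pc)

    def train(self, load_pc, store_pc):
        """Record that this load conflicted with this older store."""
        self.wait_for[load_pc] = store_pc


def count_flushes(iterations, predictor=None):
    """Run a loop whose load always conflicts with an older, slow store."""
    load_pc, store_pc = 0x10C, 0x108  # made-up instruction addresses
    flushes = 0
    for _ in range(iterations):
        if predictor is not None and predictor.should_wait(load_pc) == store_pc:
            continue  # load held back until the store address resolves
        flushes += 1  # load launched early, violation detected, flush
        if predictor is not None:
            predictor.train(load_pc, store_pc)
    return flushes


print(count_flushes(100))                     # 100
print(count_flushes(100, MemDepPredictor()))  # 1
```

In this toy loop the predictor pays for one flush to learn the pair and then holds the load back on every later iteration.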

So, how would we translate this to conditional instruction speculation? Well, one idea is that we could predict both whether a given instruction is predictable and, if so, which way we should predict it. If a conditional instruction is predicted as unpredictable, its result will not be speculated (thereby avoiding frequent costly restarts) but if it is predicted to be predictable, we can try to predict which one to take.
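One way to picture "predicting predictability", as a sketch only: remember the last outcome plus a confidence counter, and refuse to speculate until the counter is high enough. The class, the threshold of 4 and the cap of 7 are invented for illustration and are not taken from any real design.

```python
class ConfidencePredictor:
    """Toy predictor that may decline to predict at all."""

    def __init__(self, threshold=4, max_conf=7):
        self.last = False  # most recent outcome seen
        self.conf = 0      # how many times in a row it has repeated
        self.threshold = threshold
        self.max_conf = max_conf

    def predict(self):
        """Return the predicted outcome, or None for 'do not speculate'."""
        return self.last if self.conf >= self.threshold else None

    def update(self, outcome):
        if outcome == self.last:
            self.conf = min(self.conf + 1, self.max_conf)
        else:
            self.conf = 0
            self.last = outcome


def run(outcomes):
    """Return (times speculated, times speculated wrongly)."""
    predictor = ConfidencePredictor()
    speculated = wrong = 0
    for outcome in outcomes:
        guess = predictor.predict()
        if guess is not None:
            speculated += 1
            wrong += guess != outcome
        predictor.update(outcome)
    return speculated, wrong


print(run([True] * 20))         # (15, 0)
print(run([True, False] * 10))  # (0, 0)
```

A steady condition gets speculated after a short warm-up, while an alternating one never builds enough confidence, so it is never speculated and never causes a restart.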

Would this work? Maybe. Will anyone actually do this? Likely not. As others have suggested, conditional instructions are almost exclusively used for hard-to-predict conditions, specifically because CPUs don't speculate them. Thus, in most existing code the predictor would just say "yep, can't predict it" and we'd have ended up wasting a bunch of area and power on a predictor that never gets used.

If you're really dedicated to this cause though, feel free to write a paper on it. Spitballing performance numbers is easy but often wrong in quite surprising ways, so maybe this might just work for some weird reason I've missed :)

bjourne2y ago· 2 in thread

> Whereas conditional moves cannot execute any subsequent instructions until the correct value is available.

That is incorrect. Superscalar processors have no problem executing subsequent instructions before the cmov writes back. However, the register the cmov writes to can of course not be read before the cmov has passed the execution unit. But that's no different from other arithmetic instructions.

habermanOP2y ago

You are correct, I should have clarified: subsequent instructions that depend on the result of the cmov cannot execute until the cmov has executed, whereas subsequent instructions that depend on the result of the branch instruction can be speculatively executed even before the branch condition has been evaluated.

bjourne2y ago

True, but independently of whether "cmov rax, ..." or "jnz L; mov rax, ...; L:" is used, subsequent instructions that read rax need to stall until rax has been written to (or at least until the cmov/mov has executed if bypasses are used).

