
contravariant · 4y ago

It gets even weirder when you realize it's not just running your code for each pixel; it's running the exact same instructions in lockstep across whole blocks of neighboring pixels at once, which can make branching very expensive.


junon · 4y ago

Only as expensive as the slowest pixel in the batch :D

ladberg · 4y ago

That's not exactly true: it can be slower than the slowest individual pixel. The GPU isn't just running the same code for each pixel in parallel across many cores; a single core* actually runs many pixels at once and therefore has to keep the same program counter (PC) for all of them. If two pixels diverge, the core has to alternate between the different PCs, toggling each lane on and off depending on which pixel's path is currently executing.

That means if you had a shader like:

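    // Odd and even pixels take different branches
    if (pixelIndex % 2 != 0) {
        longFunctionA();  // odd pixels; even-pixel lanes sit masked off
    } else {
        longFunctionB();  // even pixels; odd-pixel lanes sit masked off
    }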

It would actually take roughly twice as long to run as it would if every pixel called the same function (assuming the two functions cost about the same). Each core is executing a batch of pixels (a warp) that is evenly split between two completely different sections of code, so it has to run one section and then the other until both finish.
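A toy Python model of this behavior, under assumptions that are not from the comment: one 32-lane warp, the branch above, and `longFunctionA`/`longFunctionB` each costing a hypothetical fixed 100 steps. `run_warp` and `run_phase` are made-up names for illustration, not any GPU API, and real hardware has scheduling and reconvergence details this ignores. The divergent warp spends 200 steps even though no single pixel does more than 100 steps of its own work.

```python
# Toy model of one lockstep (SIMT) warp running:
#   if (pixelIndex % 2 != 0) longFunctionA(); else longFunctionB();
# Assumptions (hypothetical, not from the comment): 32 lanes per warp and a
# fixed step count for each function.
COST_A = 100  # hypothetical steps for longFunctionA
COST_B = 100  # hypothetical steps for longFunctionB


def run_phase(mask, cost, active_steps):
    """Step the shared PC through one branch; return steps the warp spent."""
    if not any(mask):  # assume a path no lane took is skipped entirely
        return 0
    for _ in range(cost):  # one program counter for the whole warp
        for lane, on in enumerate(mask):
            if on:  # masked-off lanes sit idle but cannot run ahead
                active_steps[lane] += 1
    return cost


def run_warp(pixel_indices):
    """Return (total warp steps, per-lane steps of useful work)."""
    takes_a = [idx % 2 != 0 for idx in pixel_indices]  # per-lane condition
    active_steps = [0] * len(takes_a)
    steps = run_phase(takes_a, COST_A, active_steps)
    # Same warp, inverted mask: now the other half of the lanes runs.
    steps += run_phase([not t for t in takes_a], COST_B, active_steps)
    return steps, active_steps


steps, per_lane = run_warp(range(32))        # odd/even lanes diverge
print(steps, max(per_lane))                  # 200 100
uniform_steps, _ = run_warp(range(0, 64, 2)) # all even: every lane takes B
print(uniform_steps)                         # 100
```

If the two functions had different costs, the divergent warp would pay `COST_A + COST_B`, so the slowdown would no longer be exactly 2x.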

* Core might not be the exact right term: Nvidia calls them SMs (streaming multiprocessors), and other GPU vendors use different names (AMD, for example, calls them compute units).
