Phil style: track end and remaining. End doesn’t change. To allocate size, check if lesseq remaining and subtract from it. End minus remaining is the result (using the old value of remaining). Has two variants - one that burns an extra register but saves an instruction. Empirically, this one is fast enough that I just use it all the time.
Bmalloc style bump: independently subtract from remaining and add to bump. It’s more instructions but they are instruction level parallel.
Another friend’s bump style: have bump/end ptrs like usual. Fast path checks if size < end - bump. Then you add to bump like usual.
Note that instruction counts on these don’t matter so much on modern CPU’s. But dependency chain length does matter, a lot. Register usage matters only if you want to inline the allocator (and it’s not always beneficial to do it).