The algorithm is recursive in the sense that merged tokens can participate in further merges, just as in byte-pair encoding, but it involves a great deal of pointer-chasing, so the core is inherently iterative. The pointers let each merge skip over every token it does not affect, so updating the pair counts is very cheap. You can imagine each initial token being equipped with a fixed compute budget: merging two tokens consumes the budget of one of them, while the other's budget carries over to the merged result. The overall time is therefore bounded by (compute budget per token) × (number of tokens).
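A minimal sketch of the pointer-chasing idea, assuming the tokens live in a doubly-linked list (the `Node` class and helpers here are illustrative, not the actual implementation): merging a token with its successor splices one node out in O(1), and only the immediate neighbors need their adjacent-pair counts refreshed, so unaffected tokens are skipped entirely.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    token: str
    prev: "Optional[Node]" = None
    next: "Optional[Node]" = None

def build_list(tokens):
    """Link a sequence of tokens into a doubly-linked list; return the head."""
    head, prev = None, None
    for t in tokens:
        node = Node(t)
        if prev is None:
            head = node
        else:
            prev.next, node.prev = node, prev
        prev = node
    return head

def merge_pair(left: Node) -> Node:
    """Merge `left` with its successor in place.

    One node (the successor) is consumed -- its "compute budget" pays
    for this merge -- while `left` survives, carrying its own budget
    forward so the merged token can take part in later merges.
    Only O(1) pointers change; all other tokens are untouched.
    """
    right = left.next
    left.token = left.token + right.token
    left.next = right.next
    if right.next is not None:
        right.next.prev = left
    return left

head = build_list(["a", "b", "c", "d"])
merge_pair(head.next)          # merge "b" and "c" into "bc"

out = []
node = head
while node is not None:        # walk the list to see the result
    out.append(node.token)
    node = node.next
# out == ["a", "bc", "d"]
```

Because each merge permanently removes one node, at most (number of tokens) merges can ever happen, which is the amortized bound described above.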