I would definitely be in favor of an even faster `SmallBump` variant that assumes allocations are small relative to `usize::MAX` in exchange for some additional speed.
I also wouldn't mind a default "fast" bump allocator library that does every trick it can without sacrificing safety.
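
To make the `SmallBump` idea a bit more concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical and not taken from any existing crate: the struct layout, the `MAX_SMALL` bound, and the `alloc_checked` / `alloc_small` names are assumptions for illustration only. It bumps an offset into a `Vec<u8>` rather than raw pointers so that it stays in safe Rust, and it ignores alignment entirely. The point is only that bounding the request size lets the hot path drop the overflow check on the addition while keeping the capacity check.

```rust
/// Hypothetical sketch of a "small allocation" bump arena.
/// Not the API of any existing crate; alignment is ignored for brevity.
pub struct SmallBump {
    buf: Vec<u8>,
    used: usize,
}

/// Assumed upper bound on a single "small" allocation.
const MAX_SMALL: usize = u16::MAX as usize;

impl SmallBump {
    pub fn with_capacity(cap: usize) -> Self {
        // Checked once up front, so that `used + size` in `alloc_small`
        // can never overflow `usize`.
        assert!(cap <= usize::MAX - MAX_SMALL);
        SmallBump { buf: vec![0; cap], used: 0 }
    }

    /// General path: `size` is arbitrary, so the addition must be checked.
    pub fn alloc_checked(&mut self, size: usize) -> Option<&mut [u8]> {
        let end = self.used.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(&mut self.buf[start..end])
    }

    /// Small path: `size` is at most `MAX_SMALL` by its type, and
    /// `used <= cap <= usize::MAX - MAX_SMALL`, so the sum cannot wrap.
    /// Only the capacity comparison remains.
    pub fn alloc_small(&mut self, size: u16) -> Option<&mut [u8]> {
        let end = self.used.wrapping_add(size as usize);
        if end > self.buf.len() {
            return None;
        }
        let start = self.used;
        self.used = end;
        Some(&mut self.buf[start..end])
    }
}
```

With a 16-byte arena, for example, two 8-byte requests succeed and a third request of any non-zero size returns `None`. Whether dropping the one checked add is actually measurable would need benchmarking; I'm only showing where the assumption buys something.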