Implementations may also define certain sizes that are more efficient. On modern x86-64, large pages may allow 2MB, 1GB, 512 GB, etc. (multiples of 512) to be more efficient. On modern ARMv8/v9 there is similar large page support (the exact details depend on the minimum mapping granularity) and they may also support aligned, contiguous mappings which making other sizes efficient (the exact details are even more complicated than simple large pages and highly optional).
As such, if you want to get just "1 byte", then you create a userspace allocator that asks for memory that conforms to the limits of the MMU and then it is the job of the userspace allocator to abstract that away for you by chopping it up under its own auspices.