The same group published another paper, https://arxiv.org/abs/2301.00774, which shows that in addition to reducing the precision of each parameter, you can also prune out a bunch of parameters entirely. It's harder to apply this optimization because models are usually loaded into RAM as dense arrays, so a zeroed parameter still takes up a slot, but I hope someone figures out how to do it for popular models.
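To make the dense-versus-sparse point concrete, here is a minimal sketch using plain magnitude pruning. This is an assumption for illustration only, not the method from the linked paper, and the weight values and function names are made up.

```python
# Illustrative only: simple magnitude pruning, NOT the method from the linked paper.

def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


def to_sparse(weights):
    """Keep only the nonzero entries as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]


weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.4, -0.1]  # made-up values
pruned = prune_by_magnitude(weights, sparsity=0.5)
sparse = to_sparse(pruned)

print(pruned)       # dense layout: still 8 slots, half of them zero
print(len(sparse))  # sparse layout: only the surviving entries are stored
```

The dense list is the same length after pruning, so the savings only show up once the storage format (and the code doing the math) skips the zeros.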
Note too that the number formats are standardized, e.g. floats are defined by the IEEE 754 standard. Hardware has specialized circuits for doing math in these formats, so when considering which number format to use it's difficult to get outside of the established ones (float32, float16, int8).
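A quick way to see the trade-off between two of these formats is to round-trip a value through each. This is a small sketch using Python's standard `struct` module, where `"<f"` and `"<e"` are the format codes for IEEE 754 binary32 and binary16; the `round_trip` helper is just a name chosen for this example.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
as_f32 = round_trip(x, "<f")  # IEEE 754 binary32 (float32)
as_f16 = round_trip(x, "<e")  # IEEE 754 binary16 (float16)

print(struct.calcsize("<f"), as_f32)  # 4 bytes per value
print(struct.calcsize("<e"), as_f16)  # 2 bytes per value
print(abs(as_f16 - x) > abs(as_f32 - x))  # float16 loses more precision
```

Half the bytes, noticeably more rounding error: that is the basic bargain when dropping from float32 to float16.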
Ex: Since C and C++ number sizes depend on the platform, C++ has types like int16_t and int32_t to enforce a size regardless of architecture. Python's built-in int has no fixed size (it grows as needed), but NumPy has np.int16 and np.int32. Java's sizes are the same on every platform: short is 16-bit and int is 32-bit.
It just happens that some higher-level languages hide this detail from programmers and often standardize on one default size for integers.
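As a rough illustration of the difference between a language's default integer and an explicitly sized one, here is a sketch using Python's standard `ctypes` module, which exposes C's fixed-width types. The variable names are only for this example.

```python
import ctypes

# Python's built-in int grows as needed, so going past 32 bits is not a problem.
big = 2**31 - 1  # largest value a signed 32-bit integer can hold
print(big + 1)

# ctypes exposes C's fixed-width types; the size is the same on every platform.
print(ctypes.sizeof(ctypes.c_int16), ctypes.sizeof(ctypes.c_int32))

# ctypes does no overflow checking, so the value wraps around at 32 bits.
wrapped = ctypes.c_int32(big + 1).value
print(wrapped)
```

The built-in int just keeps growing, while the 32-bit type wraps to a negative number, which is exactly the detail the higher-level default hides.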