You fundamentally misunderstand the bottleneck of running large LLMs: it is memory, not compute. You cannot really make gains that way.
A 405B LLM has 405 billion parameters. If you run it at full precision (16-bit floats), each parameter takes up 2 bytes, which means you need 810GB of memory. If the model does not fit in RAM or GPU memory, it will swap to disk and be unusably slow.
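To make the arithmetic concrete, here is the back-of-the-envelope version (plain Python, weights only; it ignores KV cache, activations, and runtime overhead):

```python
params = 405e9       # 405 billion parameters
bytes_per_param = 2  # fp16/bf16: 2 bytes per parameter

# Memory needed just to hold the weights
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB")  # -> 810 GB
```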
You can run the model at reduced precision to save memory, a technique called quantisation, but this degrades the quality of the responses. The exact amount of degradation depends on the task, the specific model, and its size; larger models seem to suffer slightly less. At 8 bits (1 byte) per parameter, output is pretty much as good as full precision; 4 bits per parameter is still good quality; 3 bits is noticeably worse; and 2 bits is often bad to unusable.
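For reference, here is what those quantisation levels work out to in memory for a 405B model (same rough math as above, weights only):

```python
params = 405e9  # 405 billion parameters

# Common quantisation levels, in bits per parameter
for bits in (16, 8, 4, 3, 2):
    gb = params * bits / 8 / 1e9
    print(f"{bits:>2} bits -> {gb:.0f} GB")

# 16 bits -> 810 GB
#  8 bits -> 405 GB
#  4 bits -> 202 GB
#  3 bits -> 152 GB
#  2 bits -> 101 GB
```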
With 128GB of RAM, zero overhead, and a 405B model, you would have to quantise to about 2.5 bits per parameter, which would noticeably degrade response quality.
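You can invert the same formula to see where the 2.5-bit figure comes from (assuming, optimistically, zero overhead):

```python
ram_bytes = 128e9  # 128 GB of RAM
params = 405e9     # 405 billion parameters

# Maximum bits per parameter that fit in that budget
max_bits = ram_bytes * 8 / params
print(f"{max_bits:.2f} bits per parameter")  # -> 2.53
```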
There is also model pruning, which removes parameters entirely, but it is far more experimental than quantisation, also degrades response quality, and I have not seen it used widely.