In August 2023, when Llama 2 34B came out, fitting this model without quantization required a GPU, or a set of GPUs, with a total of roughly 34 × 2.5 ≈ 85 GB of VRAM (about 2 bytes per parameter for fp16 weights, plus headroom for the KV cache and activations).
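For reference, here's the back-of-the-envelope math as a small Python sketch (my own illustration, not an exact formula; the overhead factor varies with runtime, batch size, and context length):

```python
# Rough VRAM estimate for inference: weights dominate, with some
# extra headroom for the KV cache and activations (hypothetical
# helper; the 0.5 bytes/param overhead is a ballpark assumption).
def vram_estimate_gb(n_params_billion: float,
                     bytes_per_param: float = 2.0,      # fp16 weights
                     overhead_bytes_per_param: float = 0.5) -> float:
    # params (1e9) * bytes/param -> bytes (1e9) -> GB
    return n_params_billion * (bytes_per_param + overhead_bytes_per_param)

print(vram_estimate_gb(34))              # fp16: ~85 GB, as above
print(vram_estimate_gb(34, 0.5, 0.25))   # ~4-bit quantized: ~25 GB
```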
That said, can you be more specific about which "algorithmic" and "hardware" improvements have driven cost and hardware requirements down? AFAIK I still need the same hardware to run this very same model.