The naming is confusing... these models aim to equal or beat LLaMA by reproducing the training data and methodology that were used for LLaMA.
But the actual model architecture is slightly different: it's based on Pythia.
I guess what's needed is a pythia.cpp: https://github.com/ggerganov/llama.cpp/issues/742