well, it seems like GPT is causing issues and there's only so much you can do
iirc, there is some analysis suggesting that alignment training nerfs model capabilities, so we are likely making the models less capable without meaning to, because... FUD?
> Codellama is likely a lot less powerful than GPT.
Certainly true, but is it capable enough for your task? You'd have to try it and find out. There is also analysis showing that smaller models trained for a specific task can outperform large, generalist models like GPT-3.5/4.