GPT-2's largest model was 1.5B params. LLaMA-65B was similar to the largest GPT-3 in benchmark performance, but that GPT-3 model was expensive in the API, and a number of people would use the cheaper one(s) instead, IIRC.
So this is similar to a mid-tier GPT-3-class model.
Basically, there's not much reason to pooh-pooh it. It may not perform quite as well, but I find it to be useful for the things it's useful for.