Perplexity is a rough metric for sure, but the non-linear dependency on model size is also directly observable. I'd definitely agree that 7b and 13b give better results with 8-bit, but with 30b the difference is much more subtle.
It should also be noted that the quantization method makes a big difference. In particular, if you were experimenting with llama.cpp, their original quantization scheme was considerably worse than GPTQ. And with GPTQ, parameters such as group size matter too.
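To get a rough feel for why group size matters, here is a toy sketch. It uses plain round-to-nearest quantization with one scale/zero-point per group. That is *not* GPTQ, which also compensates rounding error using second-order information, and it is not llama.cpp's actual formats either. The function names and the synthetic "weights" (Gaussian with a few injected outliers) are made up purely for illustration:

```python
import random

def quantize_groupwise(weights, bits=4, group_size=128):
    """Hypothetical helper: round-to-nearest asymmetric quantization,
    one (min, scale) pair per group, returning the dequantized values."""
    levels = 2 ** bits - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0
        for w in group:
            q = round((w - lo) / scale)  # integer code in [0, levels]
            out.append(lo + q * scale)   # map back to float
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic weights (an assumption, not real model data).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]
# Inject a few outliers: one outlier widens the range of its whole group,
# so larger groups waste more of their quantization levels.
for i in range(0, len(weights), 512):
    weights[i] *= 20

results = {g: mse(weights, quantize_groupwise(weights, group_size=g))
           for g in (32, 128, 1024, 4096)}
for g, err in results.items():
    print(f"group_size={g:5d}  MSE={err:.2e}")
```

Smaller groups mean more scales to store (so slightly more bits per weight), but each scale only has to cover a narrower range, which is the trade-off the group size parameter controls.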