I'm not going off pure feelings either. I have benchmarks in place that compare pipeline outputs to ground truth. But like I said, it's comparable enough to 4 at a much lower price, which makes it a great model.
Edit: After the outage the outputs are better, wtf. Nvm, it just has some run-to-run variance even at temp = 0. I should use a fixed seed.
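For anyone who wants to check the same thing: a rough sketch of how run-to-run variance could be measured before trusting a single benchmark number. Everything here is an assumption for illustration, not my actual setup: `run_pipeline` is a hypothetical callable that takes one input and returns one output, and the score is plain exact-match accuracy against ground truth.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def accuracy(outputs: Sequence[str], truth: Sequence[str]) -> float:
    """Exact-match accuracy of pipeline outputs against ground truth."""
    if len(outputs) != len(truth):
        raise ValueError("outputs and truth must have the same length")
    if not truth:
        raise ValueError("need at least one example")
    return sum(o == t for o, t in zip(outputs, truth)) / len(truth)


def score_spread(
    run_pipeline: Callable[[str], str],  # hypothetical: wraps the model call
    inputs: Sequence[str],
    truth: Sequence[str],
    n_runs: int = 5,
) -> Tuple[float, float, List[float]]:
    """Run the same benchmark n_runs times; return (mean, std dev, scores)."""
    scores = [
        accuracy([run_pipeline(x) for x in inputs], truth)
        for _ in range(n_runs)
    ]
    # Population std dev: we only describe the runs we actually made.
    return statistics.mean(scores), statistics.pstdev(scores), scores
```

If the std dev across runs is in the same range as the difference between two models (or between "before" and "after" the outage), that difference probably isn't telling you much. A fixed seed may narrow the spread, but I'd still measure it rather than assume the outputs are fully deterministic.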