But on a more serious note, writing automated tests for game engines involves a lot more than just "duh, unit tests" (especially when testability wasn't a concern in the original design).
For sure OpenGL/DX requires more infrastructure to run unit tests than a generic block of C code. But it's absolutely possible to "unit test" shader code, with buffer read-back and/or vertex stream out, among other options. It's more the game engines themselves that aren't set up for unit tests rather than the graphics stack.
Which is what I said, you can screenshot & compare. But it becomes a fuzzy compare due to acceptable precision differences.
And it ends up being more of an integration test and not a unit test.
> Every language has valid-per-spec differences.
They really don't, but that's not entirely what I'm talking about anyway. I'm talking about valid hardware behavior differences, which broadly don't exist elsewhere. How a float behaves in Java is well-defined and never changes. How numbers behave in most languages is well-defined and does not vary.
GPU shaders are completely different. Numbers do not have consistent behavior across differing hardware & drivers. This is a highly unique situation. Even things that are claimed to be variable in other languages (like the size of int in C & C++) end up not actually varying in practice, because too much code doesn't cope well with it. Shaders don't play any such similar games.
About the only thing you can do is take before/after screenshots and compute a signal-to-noise ratio on a diff between the images. Which makes for an extremely fragile test definition. What if you change the default FOV of the camera? Now all your tests fail for no good reason.
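A minimal sketch of the fuzzy screenshot comparison being described, using PSNR (peak signal-to-noise ratio) as the diff metric. The 40 dB threshold, the synthetic "screenshots", and the `frames_match` helper are all assumptions for illustration, not any engine's actual test harness:

```python
import numpy as np

def psnr(reference: np.ndarray, candidate: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two images, in dB (higher = closer)."""
    mse = np.mean((reference.astype(np.float64) - candidate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # bit-identical images
    return 10.0 * np.log10((max_value ** 2) / mse)

def frames_match(reference: np.ndarray, candidate: np.ndarray, threshold_db: float = 40.0) -> bool:
    """Accept a frame if it is 'close enough' to the golden screenshot."""
    return bool(psnr(reference, candidate) >= threshold_db)

# Stand-in "screenshots": a golden image, a copy with tiny per-pixel noise
# (mimicking acceptable precision drift between GPUs/drivers), and a copy
# where everything moved (mimicking e.g. a camera/FOV change).
rng = np.random.default_rng(0)
golden = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
noisy = np.clip(golden + rng.normal(0, 1.0, size=golden.shape), 0, 255)
shifted = np.roll(golden, 8, axis=1)

print(frames_match(golden, noisy))    # tiny precision noise: passes
print(frames_match(golden, shifted))  # scene actually changed: fails
```

Note how this captures the fragility: the shifted frame fails even though nothing is "wrong" with the rendering, which is exactly the FOV-change scenario above.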
Which is completely fine, because you probably wouldn't want to accidentally change the FOV, would you?
High confidence tests fail on unexpected results. If only some aspects of the results are checked, the tests have obvious blind spots.
A lot of small and low-level stuff can be unit tested, but during production, things like writing good tests fall through the cracks.
There was one bug where a character had a deployable ability that doubles the damage and healing of all projectiles that pass through it. One day, the patch notes read "fixed an issue where healing was not amplified when passing through the amplification matrix". And, I totally get it... every conference talk I've seen out of Blizzard goes into detail about all the infrastructure they've made for play testing their games. It sounds easy to get your coworkers into a build of your latest PR and try it out. But subtle number adjustments like these just don't translate well to play testing -- sometimes the enemy is doing so much damage that you can't really be sure whether the Amp Matrix is multiplying the healing by the right number. So, from time to time, refactors break it!
But, in a world where you could easily write integration tests, this problem would never happen. You'd write a simple scenario like "create empty room. place baptiste at position 0,0. deploy amp matrix at position 10,0 with orientation 90 degrees. place sombra at position 20,0. set her health to 80. make baptiste fire a healing grenade along vector 1,0 at an angle of 45 degrees. wait 10 ticks. ensure that sombra's health is now 200." The framework to be able to write tests like this is not difficult (you can do it in their "workshop"), and it's not difficult to write a test like this for every ability, and even every combination of abilities. And, it would mean that play testers never ever need to be suspicious of numbers; the automated tests already check that. You'd make developers more productive (the computer can check the basics like this), and play testers more productive (they don't need to test simple stuff anymore). But... I don't think they do it. The buggiest releases are when the team is under time pressure to hit a deadline (Overwatch has seasonal events; the patch that introduces a seasonal event always has some weird bugs), and I don't think automated tests miss things under time pressure -- but humans sure do.
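To make the scenario concrete, here's a hypothetical sketch of that test against an invented toy simulation (Overwatch's real test framework and numbers are not public; the base heal of 60 is an assumption chosen so that doubling it turns 80 health into the 200 the scenario expects):

```python
from dataclasses import dataclass

@dataclass
class Hero:
    x: float
    health: float

@dataclass
class AmpMatrix:
    x: float  # projectiles crossing this x-position get their healing doubled

def fire_healing_grenade(origin_x: float, target: Hero, matrix: AmpMatrix,
                         base_heal: float = 60.0) -> None:
    """Trace the projectile along the x-axis toward the target; if its path
    crosses the amp matrix, the healing is amplified (doubled)."""
    heal = base_heal
    if origin_x < matrix.x < target.x or target.x < matrix.x < origin_x:
        heal *= 2.0  # this is exactly the branch the shipped bug had broken
    target.health += heal

def test_amp_matrix_amplifies_healing():
    # "place baptiste at 0, amp matrix at 10, sombra at 20 with 80 health"
    baptiste = Hero(x=0.0, health=200.0)
    matrix = AmpMatrix(x=10.0)
    sombra = Hero(x=20.0, health=80.0)
    fire_healing_grenade(baptiste.x, sombra, matrix)
    # "ensure that sombra's health is now 200": 80 + 60 * 2
    assert sombra.health == 200.0

test_amp_matrix_amplifies_healing()
print("ok")
```

The point isn't the toy physics; it's that a deterministic scripted scenario turns "was the healing amplified?" from a judgment call during playtests into a one-line assertion.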
The one thing I'll give Blizzard credit for is that their games are fun. All that playtesting is certainly a good idea. I'd supplement it with some gameplay-focused integration tests, though. They have the money and the tools teams, and their games last longer than a few months, so it seems like a smart investment to me. It just baffles me what bugs ship to production.