I'm curious to know if these approaches are also considered a MUST for AAA game development.
Is there anyone who has worked with these approaches while developing games, or who knows of any big AAA game company that requires test-first approaches during development?
Instead, AAA teams have large staffs of cheap manual testers--high school and college kids who think they're getting paid to play video games. The managers in charge of these QA teams work their manual testers to the bone, encouraging them to work late and do unpaid overtime. These testers inevitably burn out, but there's always a new batch of kids coming in to replace them.
When manual QA is that cheap, it turns the economics of automated testing on its head. Why maintain an automated regression suite when you have humans who will do the testing for you?
There's a similar effect on the dev side. A lot of inexperienced developers are willing to take a significant pay cut to work at a AAA game company, just for the dream of working on a AAA game. The devs, too, are notoriously managed as an expendable human resource. "You'd better keep working overtime, or I'll just replace you with another kid who'll work harder than you for less money."
As a result, AAA game developers burn out pretty quickly, too, leaving the game industry for other software companies that both pay better and offer a better work/life balance.
As a result, you'll find that a lot of AAA teams are relatively inexperienced. Sure, there's a core of long-time devs on every team, technical leads who can point the way, but it's not an environment where you can get solid mentoring on the best practices of software development.
Even if one team does pick up TDD, who knows if the same team will even be there in another year or two? Good software practices are regularly "rediscovered" on AAA dev teams, and then forgotten, as the team turns over and their tribal knowledge fades away.
For example, how would you test your renderer? Do you run the pipeline and check it against a final image? No, strict TDD would require that the asserted image be hand-crafted. Do you check that each part of the pipeline works as a unit? You can't without running shaders on the GPU (requiring the previous image test).
What game developers are slowly starting to master (already far more than anyone else) is integration testing. Instead of requiring detailed repro steps, just record the testing session in extreme detail and replay it - if anything goes wrong this can be turned into an automated test case.
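A minimal sketch of the record-and-replay idea, assuming the simulation step is fully deterministic. `GameState`, `step`, `record_session` and `replay_session` are hypothetical names made up for illustration, not any engine's API.

```python
from dataclasses import dataclass


@dataclass
class GameState:
    """Toy game state: a player position on a 1-D track."""
    x: int = 0
    tick: int = 0


def step(state: GameState, player_input: str) -> GameState:
    """Advance the simulation by one tick. Must be deterministic for replay to work."""
    dx = {"left": -1, "right": 1}.get(player_input, 0)
    return GameState(x=state.x + dx, tick=state.tick + 1)


def record_session(inputs):
    """Run a session and record each input together with the state it produced."""
    state = GameState()
    recording = []
    for player_input in inputs:
        state = step(state, player_input)
        recording.append((player_input, state))
    return recording


def replay_session(recording):
    """Re-feed recorded inputs; return the first tick whose state diverges, else None."""
    state = GameState()
    for player_input, expected in recording:
        state = step(state, player_input)
        if state != expected:
            return state.tick
    return None


recording = record_session(["right", "right", "left", "idle"])
assert replay_session(recording) is None  # unchanged code replays cleanly
```

A recording captured during a manual test session can then be checked in and replayed on every build; a non-`None` result points at the tick where behaviour changed.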
There are no golden hammers and that includes TDD. TDD is an amazing solution that can provide value in 99.99% of scenarios. We should be scrutinizing areas where it doesn't add value; there is interesting stuff going on there.
Example: Sam is writing an engine. He has OpenGL, DX9, and DX11 bindings. He also has a Mock that logs calls and checks that they match known good logs. Sam then sets up a demo to cycle through all the different graphical capabilities of his engine; this also gives Sam the ability to test his AI/scene crafting code.
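Roughly what Sam's call-logging mock could look like, as a sketch only. `LoggingRenderBackend`, `draw_triangle` and the method names `clear`, `bind_shader` and `draw` are all invented for illustration and do not correspond to real OpenGL or DirectX calls.

```python
class LoggingRenderBackend:
    """Hypothetical mock backend: records each call instead of talking to a GPU."""

    def __init__(self):
        self.log = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. any backend method.
        # The call is logged with its arguments and otherwise does nothing.
        def record(*args):
            self.log.append(f"{name}{args!r}")
        return record


def draw_triangle(backend):
    """Stand-in for the engine code under test (hypothetical)."""
    backend.clear(0, 0, 0)
    backend.bind_shader("flat")
    backend.draw(3)


# The "known good log", captured from a run that was verified by eye.
GOLDEN_LOG = ["clear(0, 0, 0)", "bind_shader('flat',)", "draw(3,)"]

backend = LoggingRenderBackend()
draw_triangle(backend)
assert backend.log == GOLDEN_LOG
```

This only proves the engine issues the same calls as before, not that the picture is right, which is why the demo pass over every graphical capability is still needed.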
Oh, that's easy.
npm install --save-dev mock-opengl
All games are throwaway projects. Write one, sell for a year, gone. There is no maintenance period.
Also, he likes to chop and change every now and then, and the norm seems to be that once a company delivers a game the vast majority of the development team is made redundant. He takes the redundancy, moves to another company/game, and repeats this cycle.
One other thing to realize is that you're building a game, you're not building tech. The tech supports the game. A buggy game can be fixed, but a game that works but isn't fun is hard to mend.
A lot of the game is also in the UI, which is notoriously hard to test, let alone use TDD.
But in practice those who aren't paid a lot don't particularly care much about thorough testing. It requires a systematic approach that a kid out of college who just wants to get paid playing video games isn't going to develop.
Aside from critical technical elements like netplay protocol and file parsing, the only reliable signal you'll get from automated testing is whether the code crashes under normal circumstances on the test platform with the tested driver and operating system patch release. Most properties of the game are subjective or emergent from the code itself.
I think this is generally true although it has limits; the higher up you go in the stack, the more value the test has. Still, if it's prohibitively difficult to test higher up the stack, might as well test what you can, no?
I found from my years at EA in the early 2000s that no matter what the technical issues, the absence of automated testing was at the very least a significant cultural issue. Engineers just didn't think about unit tests. Sounds like that hasn't changed.
And yes, we are recruiting: https://www.rare.co.uk/careers
Are you testing the rendering engine? Lighting / physics / shaders / collision detect?
Are you testing scripting? Win conditions? AI (probably not for SoT I guess)?
Are you testing the core engine? File mgmt? memory mgmt? input reading etc?
As someone who is a developer and a gamer but has never combined the two, the idea of TDD in that scenario is fascinating.
We are testing the rendering engine, lighting, shaders, etc. in a visual test framework we created ourselves that compares each run of the tests with the previous and uses SSIM to output a comparison metric. At the moment, that's still checked by someone every week but could be automated. However, when changing rendering code or shaders, we do require that a test result is shown during code review.
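For a feel of what an SSIM-based comparison computes, here is a deliberately simplified sketch. It treats the whole image as a single window, whereas real SSIM implementations average the score over many local windows, and it is not the framework described above. `global_ssim`, `frame_matches` and the 0.98 threshold are assumptions for illustration; images are flat lists of grayscale values.

```python
from statistics import fmean


def global_ssim(img_a, img_b, dynamic_range=255.0):
    """SSIM computed over the whole image as one window (simplification)."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and the same size")
    # Standard SSIM stabilising constants.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_b = fmean(img_a), fmean(img_b)
    var_a = fmean((p - mu_a) ** 2 for p in img_a)
    var_b = fmean((q - mu_b) ** 2 for q in img_b)
    cov = fmean((p - mu_a) * (q - mu_b) for p, q in zip(img_a, img_b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def frame_matches(previous, current, threshold=0.98):
    """Hypothetical pass/fail gate: flag the frame for review below the threshold."""
    return global_ssim(previous, current) >= threshold


previous_run = [10, 50, 200, 255]
assert frame_matches(previous_run, list(previous_run))           # identical frames pass
assert not frame_matches(previous_run, [245, 205, 55, 0])        # inverted frame fails
```

Emitting the score rather than a hard pass/fail is what lets a person (or later an automated gate) decide how much drift is acceptable.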
For physics and collision detection, we do have tests covering that too, a lot of them involving our FFT water simulation as well.
We are testing the whole flow of the game from start menu to entering a session, matchmaking, loading a map etc. I think AI also has some good coverage, notably for the navmesh which is essential.
The core engine is Unreal Engine 4. It's not an engine made for testing unfortunately, even though the vanilla version has some (but not nearly enough). We've stuck with a specific version for a while now as upgrading usually meant that we'd also get new bugs from the new version. All the core is covered by all the existing tests, we've added tests where we thought it was critical and where we fixed or improved internal code of the engine.
Before joining Rare, I really wasn't aware of any testing whatsoever in the video games industry. It's been an eye opener and I wouldn't do it any other way.
Reading other comments, we might not follow TDD all the time or to the letter, but the point is that testing itself is very much a part of everything we do as engineers. That allows our QA team to focus on stuff that is actually broken and can't be tested (such as whether a feature is actually fun).
At our studio we don't do TDD, but we do continuous integration with a lot of asset testing.
In the early days, we tried doing unit tests, but honestly, it's very hard in a game. We had little success testing gameplay code.
What we have a huge amount of is asset tests.
We have 4 times more non-programmers than programmers working on the game, so most of what is created isn't code anyway.
Every time anyone commits anything, it goes through our continuous integration pipeline, and any asset that changed, or is dependent on an asset that changed, is loaded and tested. This includes test loading entire levels.
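Working out "any asset that changed, or is dependent on an asset that changed" amounts to a walk over the reversed dependency graph. A small sketch, assuming the pipeline can supply a mapping from each asset to the assets it uses; `assets_to_test` and the asset names are hypothetical.

```python
from collections import deque


def assets_to_test(changed, depends_on):
    """Return the changed assets plus everything that depends on them, transitively."""
    # Invert the graph: for each asset, which assets use it?
    used_by = {}
    for asset, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(asset)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        asset = queue.popleft()
        for user in used_by.get(asset, ()):
            if user not in affected:
                affected.add(user)
                queue.append(user)
    return affected


depends_on = {
    "level_1": ["orc_model", "grass_tex"],
    "level_2": ["grass_tex"],
    "orc_model": ["orc_tex"],
}
# Changing one texture pulls in the model and the level that uses it.
assert assets_to_test(["orc_tex"], depends_on) == {"orc_tex", "orc_model", "level_1"}
```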
Test loading every asset catches most types of failures.
We try, where possible, to add specific tests for common mistakes in order to catch things as early as possible.
For example, if an animation has "walk" or "run" in the name, then we test that it also contains footstep events somewhere. If a model is used as a monster, we test that it has an attachment point called "frame_upper" onto which various effects can be attached. We look for things like the bounding box of an object being far away from the coordinate origin (a common mistake when making a big Maya scene with lots of exported objects in it).
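Checks like these are easy to express as plain rules over asset metadata. A sketch, assuming each asset is available as a dictionary; the field names (`type`, `events`, `used_as`, `attachment_points`, `bounds_center`) and the distance limit are invented for illustration, and only the "frame_upper" name comes from the description above.

```python
import math


def validate_asset(asset, max_origin_distance=1000.0):
    """Return a list of human-readable problems found in one asset record."""
    problems = []
    name = asset["name"].lower()

    # Walk/run animations must carry footstep events.
    if asset["type"] == "animation" and ("walk" in name or "run" in name):
        if "footstep" not in asset.get("events", []):
            problems.append(f"{asset['name']}: walk/run animation has no footstep events")

    # Monster models need the attachment point that effects hook onto.
    if asset["type"] == "model" and asset.get("used_as") == "monster":
        if "frame_upper" not in asset.get("attachment_points", []):
            problems.append(f"{asset['name']}: monster model lacks 'frame_upper'")

    # Objects exported far from the origin are usually a scene-setup mistake.
    center = asset.get("bounds_center")
    if center is not None and math.dist(center, (0.0, 0.0, 0.0)) > max_origin_distance:
        problems.append(f"{asset['name']}: bounding box is far from the origin")

    return problems


good = {"name": "orc_walk", "type": "animation", "events": ["footstep", "footstep"]}
assert validate_asset(good) == []
```

Returning messages instead of raising lets the CI run report every broken asset in one pass.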
We also do a test start of the servers, which includes loading all of their data, and a test start of the client.
All of this means that a build that goes through green is unlikely to cause crashes for other people trying to work on the game.
We do, of course, also have an excellent QA team.
Topical question, since the thread as a whole has focused on the more visible but more finicky-to-test aspects of these games: do your same statements apply more or less to the backend/networking infra? I remember the decent ball of work you guys wrote about a while back when introducing the lockstep change, and I was curious what went on behind the scenes to support and validate that from an ops/maintainability perspective, if you can speak on that at all.
Testing lockstep was quite an endeavour. The approach I used was to add a very verbose log to the client and the server which logs basically every gameplay relevant decision. That includes moving a single unit, updating a stat, etc.
Then every time a line is logged, we update a hash, and then send the hash to the client. The client, being in lockstep, should be able to come up with the same hash. If the hash is different, then the client and server both break and we can then analyse the state and see what is different.
As we discovered issues, we added more and more log lines and fixed bugs until everything ran perfectly the same on the client and server.
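The running-hash idea can be sketched in a few lines. This is an illustration only: `DecisionLog` is a hypothetical name, and SHA-256 is an assumption, since the description above does not say which hash was used.

```python
import hashlib


class DecisionLog:
    """Keeps a running hash over every gameplay decision that gets logged."""

    def __init__(self):
        self._hasher = hashlib.sha256()  # assumed hash function
        self.lines = []

    def log(self, line: str) -> str:
        """Record one decision and return the updated running hash."""
        self.lines.append(line)
        self._hasher.update(line.encode("utf-8") + b"\n")
        return self._hasher.hexdigest()


server, client = DecisionLog(), DecisionLog()
for line in ["unit 7 moves to (3, 4)", "unit 7 hp 100 -> 92"]:
    assert server.log(line) == client.log(line)  # still in lockstep

# A single differing decision changes the hash from that point on.
desynced = server.log("unit 9 gains 5 gold") != client.log("unit 9 gains 6 gold")
assert desynced
```

Because the hash covers the whole history, a mismatch says "something diverged at or before this line", and the verbose logs kept on both sides are what let you find out what.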
Reason being, a high proportion of AAA game development is done using off-the-shelf game engines. Of course it's still necessary to write code for the game logic and features, but because of the stable base this code is written against, it might not be seen as necessary to follow a strict TDD/BDD approach.
I don't think this is particularly true. A high proportion of game development is, but many AAA studios are using either home-rolled engines, or started with an existing engine (Unreal or iDTech) and what they're currently running looks absolutely nothing like the original engine.
The problem is, most of the major game dev companies have their own engines. Unreal Engine is massive in AAA games, but most EA titles use Frostbite, Valve: Source, Ubisoft: Anvil, Crytek: CryEngine, Bethesda: Gamebryo, and the CoD devs use IW, which is a fork of id tech engine.
However, this doesn't mean that they aren't using off-the-shelf engines. As long as you count in-house "shelves".
> https://venturebeat.com/2014/11/04/the-talos-principle-under...
> http://www.gdcvault.com/play/1022784/Fast-Iteration-Tools-in...
Unless the AI also knows to check each possible interaction, movement, etc.
In the tech/data/network/engine areas there may be TDD/BDD if the engine is to run multiple titles beyond the current project, if not the budget may not even allow for time. Possibly other areas may have tests in the behavior/gameplay areas: components, AI etc, this depends on the current dev culture usually.
Really the main thing in game development is shipping and building fun, if tests help you do that faster/better then you do them. I think most devs developing anything that is not game specific but tech/data/network/engine related likes to have tests to rely on, time permitting.
At most studios I have worked with that do some TDD/BDD/automated testing, it usually happens on the "tech" or core/engine teams rather than the "production" or "live" teams. The tech teams are working on engine/network/data while the live/production teams are using the engine/tools/scripts to make assets and gameplay. The tech team isn't as locked to game launch dates/crunches and has more time to be thorough because most of their code will last across multiple titles.
For the most part, in the asset creation/pipeline, behaviors and gameplay area, more of that is visual and QA based. There are testing companies and departments set up just for the gameplay/feature testing. So much of game development is going for fun and a good mechanic that actually playing it and testing it live is more effective than automated tests in some areas. That said, there are some automated test tools in Unity[1], Unreal[2] and custom engines, mostly for very rote areas of gameplay (physics engine, collectables, asset loading, very common unchanging actions etc).
Many times on the production/live side of game development, gameplay and behaviors change so much that automated tests either become a drag on dev/iteration time or aren't updated and stagnate, because the changes happen so quickly day to day. This is especially true in iterative development during prototyping or pre/post-production; since the work is mostly visual, the benefits are outweighed by the speed required to ship.
[1] https://unity3d.com/unity/qa/test-tools
[2] https://docs.unrealengine.com/latest/INT/Programming/Automat...
So you think that if you do testing you are slower overall? Skipping tests makes you faster short term.
Long term, you are slow as hell because of the lack of coverage and the manual effort.
Also, tests especially help you with large-scale changes, because they tell you when something breaks that shouldn't have been broken.
Further, not every test is the same kind of test.
To test efficiently you need to apply the test pyramid,
i.e. have lots of unit tests, fewer integration tests, fewer system tests, and fewer still e2e tests.
Another guiding principle is to focus automated testing on the things that are most crucial for the success of the product, or the most difficult to do manually.
Also, you need to choose tools that support your testing strategy. If you work against your tooling, oh boy, you have a journey ahead.
(For example, the Unity unit testing framework was pretty crappy last time I checked, 3 years ago.)
Not at all in the long run, but testing may be a hindrance in the prototyping stage, or when you are going for a fun gameplay mechanic over coverage. Once you lock something down, tests make sense in this scenario. You'd always want unit tests for data, network, and core tech, but might not need them when you are building gameplay, where the test is really a user experiencing it and tuning it. Even with rendering or gameplay tests, it takes time to get them right before they are actually useful.
Agree on all the other items; there are many levels of testing. Usually core libraries are covered. When it comes to in-progress prototyping/pre-production, testing can definitely slow you down (by adding weight to changes) if the code is changing constantly and not yet locked down. Another factor is the budget and how long the code will be in use: if it is just a tool or something that isn't needed for live code (something for prototyping or to help development, e.g. asset packing or the like), it might not be worth surrounding it with unit/integration tests.
[1] https://engineering.riotgames.com/news/automated-testing-lea...
Looks like they also automatically create tickets for defects!
BDD in particular is tricky, because beyond the level of abstraction of the engine, you don't really know what you're building. How do you express that a thing should be fun, or that weapon reloads should feel punchy, in a BDD spec? How do you automatically test that?
Games code exists purely to elicit reactions in human brains. We don't have the technology yet to examine the desired state change in the target system.
Regular apps have the same problem, but with less expensive assets and less iteration. "As a user, I should feel that the login transition is slick" is a spec I can imagine that is UX-related. However, we can probably all agree on what slick is, and failing that you could user-test it to prove that the implementation is acceptable. A game is made of many complex adaptive interacting systems, so every change needs subjective validation.
On the subject of complex adaptive systems: many games feature them and so exhibit fundamentally unpredictable emergent behaviour. This can be hard to test for.
The most testing I've done in a games project is to figure out what it is you're trying to build, then unit test the implementation once you've worked out that it's fun.
Long answer: The problems with the approach are manifold. There are a few points where testing is automatable, but they don't describe entire game projects well.
* What are you testing when you add gameplay? You are testing for a whole set of design concerns across the project, not just a technical specification or quality of service metric. You cannot do it in isolation and get the results you want at the speed you need, because you need tons of feedback to discover what a complex game is currently strong or weak at. The only reasonable way to gain the necessary feedback is to allow prototyping code to drive near-term changes, manually playtest it, and then factor out the most stable parts of the resulting soup where you can. This works against test-first because you don't really know what you're testing. You throw in a feature on a hunch - "maybe this will give the experience we're looking for" - and see how it behaves hands-on. If it works, there are still usually ramifications and elaborations that didn't surface up-front. And once it does work it's hard to break out into a sandbox because the interesting part of the behavior is in the coupling to other systems and features.
* Games deal with vast quantities of mutable state strapped onto relatively straightforward data pipelines: the parts that are most testable are engine and toolchain core elements like asset builds and rendering systems. These parts do see unit tests and integration tests, although there isn't total consensus on how much testing is needed or how it should be done. They are the most like business and application software, though, since their development can be more structured around technical goals and quality metrics.
* Many bugs are ultimately data bugs. Some designer or artist set up the wrong binding of assets or made a valid yet broken combination of parameters. Games tend to fly apart very suddenly when they operate on bad data. But the code is doing exactly what it should do for that set of inputs - and if it's a failure it's at the level of process and tooling, not the runtime code.
I bet, like anything though, if you go to shops that have strong direction and can work with a small team and over-deliver, those standard paradigms are used.
Check out the tests in the Quake3 Source Code. It is a shining example for sure.
Even in big-spec-up-front triple-A games, there's a lot of exploratory work where you honestly don't know what you want. BDD and TDD don't help there either, rapid prototyping does. (TDD presupposes you know what you want, BDD presupposes someone does and you need to find and talk to them.)
None of which is to say that there aren't many places where test-friendly design and regression tests would help. But the surface isn't the same as the average business app.
Also, last time I wrote any 3D code, it at least appeared to be incredibly hard to test that some visual thing was just so. Of course there are still things like business logic that surely could be tested, but it's just not going to be as straight forward as having some code generate some HTML string and then asserting things against it. Those last-mile things you might want to test are always a challenge IMHO.
I learned a lot of my coding chops at a Rails shop that was very into testing, so that's definitely my perspective, YMMV.
Most of the time we had strict timelines with feature milestones. We were building mobile apps of licenced titles.
The lack of TDD was replaced with full time QA, sometimes multiple on the one game.
Their automated testing was mostly bruteforce stuff like,
* Placing a character at penalty spot and making it shoot with every possible angle and strength, making sure the ball didn't launch to space.
* They would leave a computer vs. computer game running for hours / days to make sure no kind of random crash would happen.
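The penalty-spot sweep could look something like this sketch. `simulate_shot` is a toy projectile integrator standing in for the real game physics, and the grid steps and limits are made-up numbers; the point is only the shape of the test: try every combination, assert nothing ends up in space.

```python
import math


def simulate_shot(angle_deg, strength, dt=0.01, gravity=9.81):
    """Toy stand-in for the game's ball physics (hypothetical)."""
    angle = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = strength * math.cos(angle), strength * math.sin(angle)
    max_height = 0.0
    for _ in range(100_000):  # hard cap so a buggy shot cannot loop forever
        x += vx * dt
        vy -= gravity * dt
        y += vy * dt
        max_height = max(max_height, y)
        if y <= 0.0:  # ball is back on the pitch
            break
    return x, max_height


def sweep_shots(max_height_allowed=200.0, max_distance_allowed=500.0):
    """Try a grid of angle/strength combinations; return the ones that misbehave."""
    failures = []
    for angle_deg in range(0, 91, 5):
        for strength in range(1, 41):
            distance, height = simulate_shot(angle_deg, float(strength))
            ok = (
                math.isfinite(distance)
                and math.isfinite(height)
                and height <= max_height_allowed
                and abs(distance) <= max_distance_allowed
            )
            if not ok:
                failures.append((angle_deg, strength))
    return failures


assert sweep_shots() == []  # no shot launches the ball to space
```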
The main game had no tests... except that you need to realize that we had 480GB of asset compiler integration tests and 32GB of game engine integration tests.
As a heavy TDD person it took me a long time to realize that this can be very sufficient though it's a longer turn around loop. The test coverage just from being able to boot the game up is quite large. This is something that many non-game type programs have real issues with, they can't easily just "run all the user data". I know we had one project where we did a lot of TDD but it wasn't actually as good as just running the entire input set every build which I eventually switched us to. This was a non-game data processing system. Doing this + unit testing gave us great coverage and few bugs.
Generally major bugs were found within 15 minutes by another engineer, eg the next person to sync, build or run the engine or art compiler.
I do think we could have unit tested the core collections libraries in the engine to great effect and with faster turnaround, but the build was not really segregated in a way that would let us take advantage of this. It was just a monolithic build; the collections objects were heavily header included, so there really would have been a build step to build them into their own test exe, run them, then build the game. So I'm not sure how much benefit we would have gotten. I think in 4 years I saw one collections issue that lasted past just launching the engine and not having it crash.
I've seen growing interest in automated testing from mobile apps.
Basically, guys were writing regression tests with a framework like Selenium (UI testing) that goes through the most common scenarios. (I think that was described by huuugegames.com in some presentation.)
The main idea was to let QA focus on "exploratory" tests. (Testing rules of the game, design decisions or performance)
TDD & BDD can be used in gamedev, but not in the places you would think of first.
I will simplify it vastly: programmers make tools for designers to use.
Tools are made by programmers - tools can be done with TDD and/or BDD. Games are made by designers - they use tools to make rules, systems etc.
Therefore gameplay is made mostly out of data (designers' parameters set in the engine/tools) - you can't test your data with TDD/BDD.