> Use real instances of managed dependencies in tests.
Is the author suggesting I should spin up an entire instance of Oracle to run a single 50ms unit test? Because I often want to run my unit tests very frequently as I develop, ensuring I'm still green, etc. And my code may depend on a large, complex database. I could have a continually running instance just for my unit tests, but that's expensive (time, money, hosts, etc.) and means I can't run my unit tests if the wifi goes down.
If we're talking about large, complete systems running very large suites of system tests as part of a CI/CD pipeline, then sure, I'm on board. Let's use a real database. But there's lots of other small, simple tests that absolutely don't need anything more than a mock while I work on business logic bugs.
And no, I'd not like to get into a debate of "you shouldn't bother to write small unit tests". I find them very useful.
Also, most unit tests don't need a db; a real one is more useful for integration tests.
What I don't understand is, a stub to me is almost the same level of coupling as a mock. Your test setup needs to know about the dependency on what you're stubbing. So when you want to change the interface of the dependency or use a different one, or remove the dependency, your tests have to change.
It's not like you can add a query to an operation that didn't have one previously and the result will magically appear without connecting to a real database. So you have to modify the tests to provide the stub. Seems like a logical contradiction to me.
They are still necessary, then move up the stack and you get into the end-to-end tests of a fully running system.
The reason isn’t only speed but also signal to noise ratio. The further up the testing pyramid you go, the less clear it becomes where errors were introduced.
* The more unrealistic the tests generally become (larger % of false positives - tests that fail when they shouldn't - and false negatives - unit tests that simply don't catch bugs at all).
* The less reusable the test infrastructure becomes. Stubbing/mocking individual method calls to the database is an ongoing cost of development whereas building scripts to start the database and shut it down is an investment cost that pays dividends.
On the whole I think the pyramid idea enforces a wrong view that there is a "right" mix of "test levels" across any project. The best mix is determined by the kind of bugs and code you have (integration vs logical) and the kinds of abstractions you need or already have (in general, the worse your abstractions, the higher level you need your tests to be).
A lot of projects are best done with 100% integration tests while others can be done with 100% unit tests (especially small, self-contained, simple-to-interact-with code bases that are 99% about calculations/logical decision making).
I disagree. A failing unit test doesn't even necessarily indicate that an error was introduced: if a unit doesn't do what it's supposed to but the user doesn't see that, then an error wasn't introduced. Sure, when a unit test fails there's often an error, but if an end-to-end test fails, there's always an error--E2E tests are testing from the user's perspective, so what they're testing is actually errors. (This is assuming that both the unit tests and the E2E tests are correctly written).
You're positioning unit tests as a debugging tool, but I'd argue that there are much better debugging tools: REPLs and debuggers give you a lot more information than a unit test, and allow you to ask new questions quickly.
I don't want to come across as being anti-unit tests. On the contrary, I think unit tests are highly valuable. But I don't think the value comes from gathering information, debugging, or even catching bugs (in most cases). I think the value comes from a few things:
1. TDD forces you to design units for reuse from the start. Immediately you're using the code in two contexts: the application and the unit test. So right away your code is inherently reusable (in a binary sense) because de-facto you've re-used it. Reusability is more complicated than that (it's really more of a spectrum than a binary) but having at least two uses from the beginning pushes you toward the reusable side of the spectrum.
2. Unit tests act as living documentation for units. It is often unclear by reading code what the code does, because production concerns such as performance and security can lead you to do things in seemingly complicated ways. But unit tests don't have these concerns (at least not in the same way) so you can write code in unit tests that clearly communicates what a unit does. And unit tests don't fall out of sync with code like plain text documentation does.
3. TDD is incredibly motivating. Moving red->green on a quick cycle takes advantage of the dopamine reward system to increase productivity.
Unit tests on code with dependencies (whether injected so that they can be mocked, or directly referenced so that they end up more like mini integration tests) are less excellent. They're brittle, inhibit refactoring, and either don't test as much as you think they do (if mocking dependencies) or are slow (if not mocking).
The further up the testing pyramid you go, the less work it is to refactor things, because you don't need to rewrite as many tests. OTOH tests are more complex to write and take longer to run.
And now I get to my point: I don't think the blanket statement of "your largest set of tests should be ... unit tests that run quickly" is well-founded. There are trade-offs, and they shouldn't be trivially waved aside.
I built some framework code so that integration tests are as easy to write as unit tests. Almost all of my tests are "integration tests". I never mock the database. My test harness clones a template database at the start of each test; that database is maintained with migrations just like every other database. I run against test accounts at all the 3rd party services I use. It's not super fast - a full CI run takes about 15 minutes. And occasionally a 3rd party service will cause a test to flake. But it's fast enough and most importantly it's thorough.
I still have some unit tests that run without the expensive setup harness, but they're for components that have a lot of algorithmic complexity. Anything that touches the database gets a real instance.
This works pretty well with Postgres, which is free and I can run locally and has a fast db clone operation. YMMV with other databases.
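A minimal sketch of the template-clone idea, in Python. `CREATE DATABASE ... TEMPLATE ...` is standard Postgres; everything else here (the helper names, the `app_template` database name, the naming scheme) is an assumption for illustration, not the harness described above. Running the statements through a driver is left out on purpose.

```python
import re
import uuid

# Only simple lowercase identifiers are accepted, so names are safe to interpolate.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def _check_ident(name: str) -> str:
    """Reject anything that is not a plain lowercase identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe database name: {name!r}")
    return name


def new_test_db_name(prefix: str = "test") -> str:
    """Return a unique, throwaway database name for one test."""
    return f"{_check_ident(prefix)}_{uuid.uuid4().hex}"


def clone_sql(template: str, target: str) -> str:
    """SQL that copies the migrated template database into a fresh test database."""
    return f'CREATE DATABASE "{_check_ident(target)}" TEMPLATE "{_check_ident(template)}"'


def drop_sql(target: str) -> str:
    """SQL that removes the per-test database again."""
    return f'DROP DATABASE IF EXISTS "{_check_ident(target)}"'


name = new_test_db_name()
print(clone_sql("app_template", name))  # run before the test
print(drop_sql(name))                   # run after the test
```

Two things to keep in mind if you try this: Postgres refuses `CREATE DATABASE` inside a transaction block, so the connection that issues it needs to be in autocommit mode, and the template must have no other open sessions while it is being copied.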
That way you can unit test without mocks and without heavy real dependencies either, and leave that for integration tests.
So what we do where I work is this: for unit tests, we mock services and repositories, so unit tests don't go anywhere near the database.
For integration tests, we use an in-memory database.
BUT! Be careful before you embark down the path of running integration tests against a different database than the one you use when running the application. There are SO many pitfalls, nasty bugs, and other warts along that road. EntityFramework alone has quite some weirdness there. Expect these kinds of integration tests to cost a lot of developer effort to build and maintain. For us, it took months of effort to get these kinds of tests working usefully.
Personally after working on applications that have a solid test pyramid, I would recommend:
* Write unit tests as per the standard advice (bottom of pyramid kind of quantity), but try and keep it sane (don't go for 90% coverage just for the sake of hitting an arbitrary number; don't pointlessly unit test your framework/libraries/other dependencies)
* Write some integration tests where they really add value (interaction between 2 or a few complicated components in your system, for example places where the state of a component changes a lot depending on input from another component). Make sure when you start out writing an integration test that it doesn't turn into an "almost" end-to-end test along the way. They have a habit of doing this and it can really cost you later. Integration tests should still be focused.
* Write end-to-end tests that test as much of your system as possible, including a database (preferably the same vendor; one approach is to truncate all tables before each run). IME it is better here to have one e2e test that covers a big chunk of functionality in one run than lots of separate tests covering different things. Why? Because no matter how "different" the things being tested by the e2e tests are, there will still be LOTS of overlap by their very nature, and changing anything where that overlap is will tend to break ALL your e2e tests. Not a fun workflow.
One last comment on E2E tests. This is more opinionated. But try and limit E2E tests to things that are really business critical (e.g. your "sign up" process and your "renew subscription" process, but not every single form in your dashboard). This is just a cost-benefit thing. E2E tests help, but they also slow you down, and the more you have, the more slowly you will be able to change your software. Sometimes minor bugs slipping through is an acceptable cost if it means you release 1 week sooner.
I think this can work for running the tests locally when speed is what you want, but you should still have the tests run on the actual database on commits. It requires keeping a database running just for tests to run (it can be a smaller instance and startup/shutdown on demand), but it will save so much pain later.
Yes, there were knock-on performance questions that wanted answering - how do you avoid the database setup costs for those tests that don't care about it, for instance, the "fast Rails test" movement was a big thing - but by and large those were solved problems by the time I stopped writing Rails code professionally around the 5.1 era.
The answer is, of course, no, you don't spin up an entire instance of Oracle to run a single unit test. You run against a local instance, and you use whatever tricks your-RDBMS-of-choice gives you to make resetting the test tables to a known-good state extremely fast. That way you can have your tests running continuously, giving you fast feedback as you develop, only suffering db overheads when you actually need to run tests that hit it.
If your chosen stack makes it difficult to do this, it's worth asking why.
I know this gets horribly pedantic, but it's easier to see in code:

    def difficult_function(dbconn, a, b, c):
        results = dbconn.execute("SELECT * FROM tbl")
        # <complicated stuff involving the result set and a, b, c>

I would not want to mock the database at this point. Please can we instead do this:

    def difficult_function(dbconn, a, b, c):
        result_set_as_dict = dbconn.execute("SELECT * FROM tbl")
        return insider_function(a, b, c, result_set_as_dict)

    def insider_function(a, b, c, result_set_as_dict):
        # <complicated stuff involving the result set and a, b, c>

`insider_function` can now be tested without mocks quite easily.
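To make that concrete, here is a small self-contained sketch. The body of `insider_function` is hypothetical (the "complicated stuff" is never spelled out above); it is invented only so there is something to assert on. The point is that the test feeds it plain dicts and needs neither a connection nor a mock.

```python
def insider_function(a, b, c, result_set_as_dict):
    """Hypothetical 'complicated stuff': weighted sum over rows whose x exceeds c."""
    return sum(a * row["x"] + b * row["y"] for row in result_set_as_dict if row["x"] > c)


def difficult_function(dbconn, a, b, c):
    """Thin shell: fetch rows, then delegate to the pure function.

    Assumes dbconn.execute() yields mapping-like rows (an assumption for this sketch).
    """
    rows = [dict(r) for r in dbconn.execute("SELECT * FROM tbl")]
    return insider_function(a, b, c, rows)


def test_insider_function():
    # Plain data in, plain value out: no database, no mock.
    rows = [{"x": 1, "y": 10}, {"x": 5, "y": 20}]
    assert insider_function(2, 1, 3, rows) == 30  # only the x=5 row: 2*5 + 1*20
    assert insider_function(2, 1, 0, rows) == 42  # both rows: 12 + 30


test_insider_function()
```

What is left in `difficult_function` is a one-line query plus a call, which is exactly the part that is worth covering with a slower test against a real database.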
I think if you are doing 'difficult' stuff with an external database you are, de facto, writing integration tests.
In fact I would say anything involving a database MUST be treated as an integration test. If it takes ten minutes, that's fine - it's an integration test.
If you want fast and external connections, use SQLite as part of your CI testing suite. But don't moan.
And do not use mocks.
I'm sort of doing that at the moment, with postgres and a series of tests. But it is still useful for unit testing too: pull up the right CREATE tables, INSERT your test data, execute your tests, then drop tables (lots of safeguards here), and repeat. The container loads up in 1.5 sec, and all my tests (~100) are done in 10 seconds.
It's been great in my use case. I'm on postgres, Java, and using testcontainers. They have handy containers available, not limited to RDBMS and not even limited to databases; and here is Oracle Express edition, which should be enough for most tests:
The utility of a unit test suite is roughly proportional to the amount of in-process computation you do. “Glue” systems which mostly transform from one protocol to another usually need more integration tests than unit tests, making skipping unit tests not disastrous.
For instance, if the implementation involves calling up a bunch of value objects from a database, each of which do the same, and all of the code is inhouse so there is no standard mock or stub libraries available, adding unit tests is tantamount to rewriting the whole system without tests.
Existing codebases can also be too complicated for a few people to formalise into unit tests. The algorithm itself might be simple, but again, to discover that, you need to rewrite the whole system without tests. (You can add tests, of course, but when you're adding tests, you're making an assertion you cannot prove. Since you don't know what the code is meant to do, you don't know whether the tests are complete or even accurate.)
Once you're coding in a world where tests passing or tests failing has almost no predictive value on success or failure in release, you're working in a self-fulfilling prophecy where the code will never be tested because the tests literally make things worse.
Many codebases clearly do not have any automated tests at all, but unit tests can be the hardest to add onto a system after the fact.
With regard to TDD specifically: it isn't for everyone, and many consider it to be a bit of a cult. While I don't consider it a cult, it doesn't work with how I personally solve problems. For example, I normally get something extremely rough working, iterate until I consider it to be perfect, and then write my tests to define how it should behave.
In all seriousness, their numbers seem to be diminishing. But you certainly can't expect the Real Programmers to write test code. Why should a programmer write test code when they know they didn't write any bugs?
[1]: http://catb.org/~esr/jargon/html/R/Real-Programmer.html - I'd say as time goes on, this term is relative.
Our production database is postgres.
We kept bumping against things we wanted to do in the application code that worked well with postgres, but would fail in sqlite. We limited our development so that we could keep running the tests. We knew that we could run them against a postgres db, but the development time to rewrite our CI test runner to spin up a fresh database was not worth it.
Recently we migrated from github + Jenkins to gitlab + our own gitlab-runners in AWS. During that switch, we prioritized testing against a Postgres DB and got it all running in a containerized way that spins up a fresh DB for every test run. The tests are slower, but the runners scale horizontally, so we don't mind queueing up as many merge requests as we need to. Deployments through Gitlab environments are also a big improvement over our Jenkins deployment job, so we still ship as often as we want.
Now our biggest testing issue is keeping fixtures up to date.
It's quite fast. I run almost all of my tests this way.
I agree, forcing your app into the lowest common denominator of portable SQL is crippling. JSONB columns in particular are extremely useful in Postgres.
Right now we are pulling a postgres:10 container down from our ECR on every run :laughcry: so definitely some low hanging fruit around that.
I think we will rebuild the postgres container up to our most recent migrations in prod branch then bake that container onto the gitlab runner AMI daily. Then the test runner can just start that and apply any migrations in the merge request and proceed with the tests.
But if the difference is more substantial than those factors would suggest, I'd be interested to see if we can do something about it from the PG side.
Every benefit has a cost. The benefit doesn't always justify the cost.
edit: To add to that - I've seen more than a couple major database engine migrations in my day, so it's not to say that that isn't a concern. But none of them has ever been from one SQL RDBMS to another. More common is migrating among different classes of database. MySQL to Mongo, Oracle to BigTable, Couch to Cassandra, something like that. MS Access to MS SQL Server a couple times, but even those are different enough that it was never going to be as simple as changing the connection string and having a carefree life.
The speculative future proofing that you do almost never manages to work for the future you end up actually living.
Locally we have a flag to keep the test db alive between runs which speeds the tests up and can help with debugging.
The slowest part of the test run in the CI is building the application container and pulling down the postgres container. I'm sure there are improvements to how we are handling this but it isn't enough of an issue to prioritize it now.
Our issue with fixtures has more to do with changing application code and not having a great way to generate/regenerate the fixtures from live data. We've tried a few different libraries to do this but haven't found any that we love.
Anyway,
A story I've seen all too often when mocking the database: A large development effort goes into creating test infrastructure. And then there end up being scads of bugs that weren't caught by the unit tests, because the test doubles for the database don't accurately emulate important behaviors and semantics of the real database.
This isn't just a problem with mocking, mind - it's also a problem I've seen (albeit less often) when using some other DBMS during testing because it's designed to operate in-memory.
Nowadays it's not too hard to configure a RAM disk for the DBMS to use. Especially if your test stack runs it in Docker. If you're having performance problems with your test suite, start there. You might never achieve the same run times as you could with mocking, but, if there's one thing they hammered on in my Six Sigma training that I wholeheartedly agree with, it's that you shouldn't sacrifice quality or correctness for the sake of speed.
It's also not too difficult (not any more difficult than going hog-wild with mocks, anyway) to set up a mechanism that uses transactions or cleanup scripts or similar to ensure test isolation, so I don't find that complaint to be particularly compelling. You can even parallelize your tests if you can set things up so that each test is isolated to its own database or schema.
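As a rough illustration of the transaction approach, here is a sketch in Python. It uses the standard-library `sqlite3` module purely as a stand-in so that it runs anywhere; the `users` table and the helper names are made up, and with a real DBMS you would point the same pattern at your own connection.

```python
import sqlite3
from contextlib import contextmanager


def make_db() -> sqlite3.Connection:
    """Create the shared test database once (schema plus baseline rows)."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/ROLLBACK below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO users (name) VALUES ('alice')")
    return conn


@contextmanager
def isolated(conn: sqlite3.Connection):
    """Run one test inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


conn = make_db()
with isolated(conn) as db:
    db.execute("INSERT INTO users (name) VALUES ('bob')")
    assert count_users(db) == 2  # the test sees its own writes
assert count_users(conn) == 1    # nothing leaks into the next test
```

Rollback only isolates tests that share one connection and never commit themselves. Code under test that commits, or tests that run in parallel, need the other option mentioned above: a separate database or schema per test.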
You write a comprehensive set of integration tests against the real object.
You then set it up so those same tests can run against both the in-memory and the real version.
Any differences should show up for the interactions you have specified, and tests should fail.
You can write sociable unit tests against the in-memory version, knowing it matches the same behaviour as closely as you have specified in the tests.
That said, discrepancies can still sneak in, so I think it's best to also have some baseline level of tests that run against the real dependency, even if they're typically only run overnight on the CI server.
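A sketch of what running the same tests against both versions can look like, in Python. The repository classes and the `check_contract` function are hypothetical names, and SQLite stands in for the "real" database only so the example is runnable on its own.

```python
import sqlite3


class InMemoryUserRepo:
    """Fast fake used by sociable unit tests."""

    def __init__(self):
        self._names = {}
        self._next_id = 1

    def add(self, name: str) -> int:
        if name in self._names.values():
            raise ValueError("duplicate name")
        user_id, self._next_id = self._next_id, self._next_id + 1
        self._names[user_id] = name
        return user_id

    def get(self, user_id: int):
        return self._names.get(user_id)


class SqlUserRepo:
    """'Real' version; SQLite is a stand-in for the production database here."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")

    def add(self, name: str) -> int:
        try:
            return self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,)).lastrowid
        except sqlite3.IntegrityError as exc:
            raise ValueError("duplicate name") from exc

    def get(self, user_id: int):
        row = self._conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None


def check_contract(repo) -> None:
    """The shared specification: every implementation must pass the same assertions."""
    user_id = repo.add("alice")
    assert repo.get(user_id) == "alice"
    assert repo.get(user_id + 1) is None
    try:
        repo.add("alice")
    except ValueError:
        pass
    else:
        raise AssertionError("duplicate names must be rejected")


# Inject either version into the same test.
for make_repo in (InMemoryUserRepo, lambda: SqlUserRepo(sqlite3.connect(":memory:"))):
    check_contract(make_repo())
```

The fake is only trustworthy for behaviour that `check_contract` pins down. The duplicate-name rule is a good example: leave it out of the contract and the two versions can drift apart without any test noticing.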
Which is fine! Unit tests will never catch all the bugs - neither will type safety. Neither will code reviews. Neither will manual testing. But unit tests do catch the kinds of bugs that unit tests are good at catching, which eases the burden on the manual testers.
Commenter implies that bugs occurred in code that was assumed to be tested and correct (according to the test specs) because it did have tests. Which is decidedly Not Fine.
Now, would I consider them to be “unit tests” in this case? Probably not. But the label you decide to slap on the test doesn’t change the fact that a spec was written and code was tested against it and passed (falsely) due to mocking the db.
If that's impossible (e.g. you get charged for the backends or you are controlling physical objects), then generalize the program to support alternative backends and frequently test only those that can work in the test environment, using some more ad-hoc methodologies for the others.
Also, in general, if you need to change your tests for valid changes in the implementation, then your approach to testing is completely broken. An example is a dumb testing strategy where you check that the code produces specific SQL queries instead of checking that the code returns correct results.
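A small Python sketch of a test that checks results rather than query text. The `users` table and the function names are invented for illustration, and SQLite is used only so the example runs on its own.

```python
import sqlite3


def active_user_names(conn: sqlite3.Connection) -> list[str]:
    """Code under test; the exact SQL is an implementation detail."""
    rows = conn.execute("SELECT name FROM users WHERE active = 1 ORDER BY name")
    return [name for (name,) in rows]


def make_conn() -> sqlite3.Connection:
    """Build a throwaway database with a known set of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("bob", 1), ("alice", 1), ("carol", 0)],
    )
    return conn


def test_returns_active_users_sorted():
    # Asserts on the outcome. Rewriting the query (a join, a view, a different
    # WHERE clause) does not break this test as long as the result is the same.
    assert active_user_names(make_conn()) == ["alice", "bob"]


test_returns_active_users_sorted()
```

The brittle alternative would mock `conn.execute` and assert that it was called with one exact SQL string, which fails on any harmless rewrite of the query and still says nothing about whether the query returns the right rows.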
I don't see how this relates to mocking at all. You can write good or bad tests this way regardless of mocking.
Interesting. I said this at my work recently and I got a condescending explanation about how production things are production, we don't touch them. If we need stuff for development, those are dev things.
I now think that whether this is or isn't a good idea depends on specifics. More often than not, I think it makes sense.
It's also possible to test against live production instances and data itself (mostly useful for performance optimization work and testing), although that's more of an ad-hoc process and some kinds of tests are not possible because the actual data there is arbitrary. Also, those tests, if not used for development, might be better expressed as self-checks done on system start and monitoring systems.
The further you steer dev environments from production, the more you'll have these kinds of issues.
Would you mind expanding on their explanation? The way you described doesn't sound like an argument at all.
uBlock Origin, if the default filter lists don't catch it just right-click and block the element.
This also makes it infinitely easier to coordinate complex schema changes. Each developer can sort it all out on their local branch before we even know about it. If we had to share some common test database, this would become a much more painful process.
Also, I do not believe in using mock layers for the database interactions. Our service implementations are tightly-coupled with their backing datastore. This is the only way we are able to make SQLite a viable storage medium for a high-throughput business application. As a consequence, testing our services in absence of their concrete datastores would be an extremely disingenuous endeavor for us.
It makes everything harder: caching, testing, abstracting, balancing, etc.
On another note, a few hours/days spent implementing application code can save you 5 minutes of writing SQL.
One trick I use is that I write an in-memory version. I use that in unit tests.
I then write integration tests that check that the behaviour of the InMemory version and the RealVersion is exactly the same. I inject either version into the same tests. They also check that I haven't broken any code in the RealVersion that isn't covered by unit tests, mostly because it's just external interaction in there.
If you have to verify inter-service interactions, use contract tests.
Unless you count the time taken to retrieve the data as a side effect. If you are implementing a cache with a well specified behavior then you might want to test incoming interactions.
Anyway, this just shows that having a side effect can be a matter of perspective.
All you have is a shared place; mocking is essentially putting up a fence around it to prevent testing into that boundary.
If your database is immutable and is indexed for time travel, you can rewind or fast-forward your database into your desired state. If your database supports speculative writes, you can even build up a non-committable state.
I'm sure Rich Hickey will have a talk on this in relation to databases.