Is that still a belief in some circles? I feel like the shift away from this started like 15 years ago (largely because of RoR in my mind).
Anyways, this essentially launches a pg instance with a postgresql.conf that is aimed for speed (at the risk of possible data loss/corruption). DO NOT DO THIS IN PRODUCTION, but I just bake the following in local/test/CI:
fsync = off
synchronous_commit = off
Some other things I've seen / done in the name of testing with PG:

- Use unlogged tables. Maybe it's a bit faster, but it never really seemed to make much of a difference.
- Drop all foreign keys. This has significant non-performance impact. On the downside, it materially changes the environment your tests run in vs your production system: tests could now pass for otherwise invalid code. On the plus side, it makes setting up fake data _a lot_ easier.
- Run tests with a pool_size of 1. This catches cases where you start a transaction but forget to use the transaction object, e.g.:
    db.begin_transaction(tx ->
        tx.query("update x")
        db.query("delete y") // should have been tx.query
    )
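A toy sketch of why a pool of size 1 makes that bug fail loudly (the `Pool` class here is just a stand-in for whatever driver/pool library you actually use):

```python
import queue

class Pool:
    """Toy connection pool; checkout fails immediately when the pool is exhausted."""
    def __init__(self, size):
        self._conns = queue.Queue()
        for i in range(size):
            self._conns.put(f"conn-{i}")

    def checkout(self):
        return self._conns.get(block=False)  # raises queue.Empty if none are free

pool = Pool(size=1)
tx = pool.checkout()       # the transaction holds the only connection
try:
    pool.checkout()        # the buggy `db.query(...)` would need a second one
    leaked = False
except queue.Empty:
    leaked = True          # with pool_size=1 the mistake surfaces immediately

print(leaked)  # True
```

With a bigger pool the stray `db.query` silently grabs a second connection and runs outside the transaction, so the test passes and the bug ships.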
- A robust factory library (NIH or OSS) is probably the most important thing you can do.
- If you can't pre-seed all the test data, try to write _all_ your tests without ever having to truncate/delete tables between tests. This (a) avoids the slow delete/truncate and (b) lets you run tests in parallel. This means using random keys to avoid duplicates (obviously uuid is the simple answer here) and not selecting all data from a table when verifying/asserting (since who knows what's going to be in there).
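A minimal factory sketch along these lines (the `User` model and field names are illustrative): every field gets a sensible default, overrides are explicit, and keys are random so parallel tests can't collide:

```python
import uuid
from dataclasses import dataclass

@dataclass
class User:
    id: str
    email: str
    name: str = "Test User"

def make_user(**overrides) -> User:
    """Factory: defaults for everything, random keys to avoid duplicates."""
    defaults = {
        "id": uuid.uuid4().hex,
        "email": f"user-{uuid.uuid4().hex[:8]}@example.test",
    }
    defaults.update(overrides)
    return User(**defaults)

a, b = make_user(), make_user(name="Alice")
assert a.id != b.id  # two factories never collide, so tests can run in parallel
```

A real factory library would also insert the row, but the shape is the same: callers only specify what the test actually cares about.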
We use an embedded postgres in our DB tests, and we call those 'Integration Tests' and run them separately from the pure unit tests. While still tremendously valuable, they do take a bit longer to run, and currently aren't written to allow running tests in parallel.
We'd fallen into the habit of writing most of our tests against the DB. Since applying a bit more discipline to remove the DB requirement, we've found the quicker feedback loop makes for a much more pleasant experience.
Your last point is a good one - tests that can run in isolation and are agnostic to the presence of other data (and ideally clean up after themselves) tend to be handy.
If your running system speaks to the DB in parallel and handles pre-existing data, why wouldn't you want your tests to do the same?
What's really nice is that if you run the migrations once to set up the tests, subsequent tests can be fast. Each test runs within a nested transaction that is rolled back at the end. This ensures that each test gets a clean DB to work off of without incurring the cost of re-running all the migrations.
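The pattern, sketched with sqlite3 so it's self-contained (the same shape works with a Postgres driver; the schema here is made up):

```python
import sqlite3
from contextlib import contextmanager

# "Migrations" run once, up front.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage txns
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

@contextmanager
def rollback_test(conn):
    """Each test runs inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")

# A "test": it sees its own writes, but they never survive the test.
with rollback_test(conn) as c:
    c.execute("INSERT INTO users (name) VALUES ('alice')")
    assert c.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

# The next test starts from a clean DB without re-running migrations.
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 0
```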
It’s also stupendously easy if you’re using pytest (though the concurrency caching is not there as IIRC xdist can’t reuse session fixtures):
* create a session fixture to initialise and yield the cache db (and clean it up afterwards)
* create a regular test fixture which copies (using `createdb`) the template to a new database, and hands that off to the test
Then tests which need the db just have to request the second fixture.
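A sketch of those two fixtures (the database name and migration step are illustrative; assumes `createdb`/`dropdb` are on PATH and the cluster is reachable):

```python
import subprocess
import uuid

import pytest

TEMPLATE_DB = "app_template"  # assumed name for the cached template database

@pytest.fixture(scope="session")
def template_db():
    """One-time setup: create the template and migrate it; drop it afterwards."""
    subprocess.run(["createdb", TEMPLATE_DB], check=True)
    # run_migrations(TEMPLATE_DB)  # plug in your migration runner here
    yield TEMPLATE_DB
    subprocess.run(["dropdb", TEMPLATE_DB], check=True)

@pytest.fixture
def test_db(template_db):
    """Per-test copy: `createdb -T template` is a cheap file-level clone."""
    name = f"test_{uuid.uuid4().hex}"
    subprocess.run(["createdb", "-T", template_db, name], check=True)
    yield name
    subprocess.run(["dropdb", name], check=True)
```

A test then just takes `test_db` as an argument and connects to that database.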
This is exactly what we do, and it works really well. We essentially spin up a new tenant for each test suite. Forcing all tests to pass regardless of what other unrelated data is in the db is a great way to ensure cross-tenant data isolation.
The other underrated benefit is that you don't need a separate test database for local development - local dev instance and tests can use the same db. This means you can work on db changes in one place and see their impact on both tests and the running application.
I've had really good success running tests the way Django does it, where you run your entire test within a transaction, and then you just roll back the transaction at the end of the test.
Does that mean you can't use regular transactions in code, since Postgres doesn't support 'true' nested transactions? Or does the Django ORM automatically convert those 'inner' transactions into Postgres SAVEPOINTS?
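For reference, the SAVEPOINT mechanics being asked about, demoed with sqlite3 since it shares the same SAVEPOINT syntax as Postgres (table name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; explicit txns
conn.execute("CREATE TABLE t (v TEXT)")

conn.execute("BEGIN")                            # outer "test" transaction
conn.execute("INSERT INTO t VALUES ('outer')")

conn.execute("SAVEPOINT inner_tx")               # "nested transaction" in app code
conn.execute("INSERT INTO t VALUES ('inner')")
conn.execute("ROLLBACK TO SAVEPOINT inner_tx")   # undoes only the inner work

rows = [r[0] for r in conn.execute("SELECT v FROM t")]
print(rows)  # ['outer']

conn.execute("ROLLBACK")                         # test teardown discards everything
```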
I do this, and for “heavier” services that support some form of isolation/multi-tenancy (e.g. Postgres) I use per-test databases (or whatever the unit of isolation is).
To cap all of this off, when you build your abstractions, build in the notion of clearing the backing store (and make sure you can only run that in test environments), and lots of things become much easier to simulate.
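One way to sketch that guard (the store, its API, and the env-var convention are all made up for illustration):

```python
import os

class UserStore:
    """In-memory stand-in for an abstraction over the backing store."""
    def __init__(self):
        self._rows = {}

    def add(self, user_id, name):
        self._rows[user_id] = name

    def clear(self):
        # The reset hook refuses to run outside test environments.
        if os.environ.get("APP_ENV") != "test":
            raise RuntimeError("clear() is only permitted in test environments")
        self._rows.clear()

os.environ["APP_ENV"] = "test"
store = UserStore()
store.add(1, "alice")
store.clear()
print(len(store._rows))  # 0
```

The same guard on the real implementation (which would TRUNCATE tables, flush queues, etc.) makes "reset the world" a one-liner in tests while staying impossible to invoke in production.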
This approach works across languages, backing services (assuming they’re not cloud only), and testing frameworks.
I wrote a now relatively old post about it[0]
[0]: https://vadosware.io/post/a-recipe-for-running-containers-in...
I want my tests to talk to a real DB, or a real MQ, or a real instance of a service, because that's what the software does in real life. Running tests themselves inside of containers with access to the other services is good practice IMHO.
So what if a connectivity issue or some esoteric SQL thing explodes some rigorous test on a specific, seemingly-unrelated business function. In fact, I want that to happen. I want to discover failure modes and anomalies sooner than later, and to simulate real-world conditions.
The idea of mocks and stubs, and puritanical testing of things in complete isolation seems counterproductive to me. Why do people bother putting effort into abstracting away the most important components of their systems?
Imagine driving a car where every component was rigorously tested in isolation but never as a whole. It would be a disaster. Sometimes it seems like this is how teams approach testing. See the recent NPM registry issue where the integration of a few microservices led to a disastrous security issue.[0]
On my own projects, I go so far as to not even clear my test database between tests, unless it's called for. Why? It surfaces bugs that are otherwise easy to miss when testing against a pristine database.
[0]: https://github.blog/2021-11-15-githubs-commitment-to-npm-eco... (second issue, Nov. 2)
CREATE DATABASE testX TEMPLATE seed;
...which takes about 1 second per test run. The seed data and schema are baked into a docker image, and recreated whenever there are new migration files. Starting the docker image is slow but that doesn't happen on every test run.

- Create a template with necessary extensions, or just install them to "template1", which is the default template if you don't explicitly specify one. Installing PostGIS, for example, takes a few seconds - which is annoying if you create the schema from scratch in test runs. (`create extension if not exists postgis` can still be around in your schema; it'll just return right away.)
- Create a template for the test session, based on the template you've pre-installed extensions in, and apply the schema there.
- Create a database based on the second template for whatever scope makes sense for your test.
If your Postgres cluster is only serving test workloads, `fsync=off` can speed things up as well. (A stock postgresql.conf will point out that this can cause irrecoverable data loss, which I don't care about for test data.)
No docker though. I just run my migration scripts against the template database. In CI it runs every time; for local testing, I drop the template db and run the migrations.
The only annoying thing is that my local pg instance fills up with test runs. The good thing is that it's really easy to go back and inspect the database after a test run. But periodically I have to run a script that drops them all.
More about using full-sized Postgres clones for migration testing and why is that important: https://postgres.ai/products/database-migration-testing
Robust, highly recommended.
Here's the same thing as a docker container you can just keep around which auto-destroys the databases after a set time: https://github.com/ClockworkConsulting/tempgres-server
We originally tried to create temporary databases locally, but found the extra setup required from each dev to be a bit of a pain.
A docker container is on dockerhub as 'cwconsult/tempgres:v2.1.1' (I think I messed up the last publishing and 'latest' doesn't point to that. I should really fix that tag.)
We have a couple of published clients (just for convenience, the REST interface is super-simple, so 'manual' integration is trivial):
- https://github.com/ClockworkConsulting/tempgres-client (Java)
- https://github.com/ClockworkConsulting/django-tempgres (Django)
- https://github.com/BardurArantsson/pg-harness (Haskell)
A few nice things about this one:
- You don't even need a local PostgreSQL
- You can keep one somewhere in your LAN and never have to worry about it again.
- It integrates trivially with GitLab CI where you just use it as a service inside the build.
- Integration is super-simple... simple enough to use even in e.g. shell scripts.
- No need to worry about tear-down
Anyway, just thought I'd mention it since it's relevant to this thread.
The "best" thing we did was actually using a "template database": https://www.postgresql.org/docs/14/manage-ag-templatedbs.htm...
We would start a Postgres process, create a new database, and run all of our migrations and basic data bring-up. Then we would create a new database per test suite, using the one we just ran migrations on as the template.
This meant the initial bring up was a few seconds, but then each test suite would get a new database in a dozen milliseconds (IIRC).
All things considered, an actual database probably gives you the least gray hair, but with some careful test setup I have had good success using the savepoint/rollback trick (and it trivially supports nested fixtures as well).
We install and leverage the Zonky Embedded Postgres Binaries [2] which are normally used for test suites. They have been great for us.
It's amazing how fast and light Postgres runs for these use cases!
1 - https://steampipe.io 2 - https://github.com/zonkyio/embedded-postgres-binaries