Is that still a belief in some circles? I feel like the shift away from this started like 15 years ago (largely because of RoR in my mind).
Anyways, this essentially launches a pg instance with a postgresql.conf that is aimed for speed (at the risk of possible data loss/corruption). DO NOT DO THIS IN PRODUCTION, but I just bake the following in local/test/CI:
fsync = off
synchronous_commit = off
Some other things I've seen / done in the name of testing with PG:

- Use unlogged tables. Maybe it's a bit faster, but it never really seemed to make much of a difference.
- Drop all foreign keys. This has significant non-performance impact. On the downside, it materially changes the environment your tests run in vs your production system: tests could now pass for otherwise invalid code. On the plus side, it makes setting up fake data _a lot_ easier.
- Run tests with a pool_size of 1. This catches cases where you start a transaction but forget to use the transaction object, e.g.:
    db.begin_transaction(tx ->
        tx.query("update x")
        db.query("delete y") // should have been tx.query
    )
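A toy sketch of why a pool of size 1 makes that bug fail loudly (the `Pool` class here is just a stand-in for whatever driver/pool library you actually use):

```python
import queue

class Pool:
    """Toy connection pool; checkout fails immediately when the pool is exhausted."""
    def __init__(self, size):
        self._conns = queue.Queue()
        for i in range(size):
            self._conns.put(f"conn-{i}")

    def checkout(self):
        return self._conns.get(block=False)  # raises queue.Empty if none are free

pool = Pool(size=1)
tx = pool.checkout()       # the transaction holds the only connection
try:
    pool.checkout()        # the buggy `db.query(...)` would need a second one
    leaked = False
except queue.Empty:
    leaked = True          # with pool_size=1 the mistake surfaces immediately

print(leaked)  # True
```

With a bigger pool the stray `db.query` silently grabs a second connection and runs outside the transaction, so the test passes and the bug ships.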
- A robust factory library (NIH or OSS) is probably the most important thing you can do.
- If you can't pre-seed all the test data, try to write _all_ your tests without ever having to truncate/delete tables between tests. This (a) avoids the slow delete/truncate and (b) lets you run tests in parallel. This means using random keys to avoid duplicates (obviously uuid is the simple answer here) and not selecting all data from a table when verifying/asserting (since who knows what's going to be in there).
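A minimal factory sketch along these lines (the `User` model and field names are illustrative): every field gets a sensible default, overrides are explicit, and keys are random so parallel tests can't collide:

```python
import uuid
from dataclasses import dataclass

@dataclass
class User:
    id: str
    email: str
    name: str = "Test User"

def make_user(**overrides) -> User:
    """Factory: defaults for everything, random keys to avoid duplicates."""
    defaults = {
        "id": uuid.uuid4().hex,
        "email": f"user-{uuid.uuid4().hex[:8]}@example.test",
    }
    defaults.update(overrides)
    return User(**defaults)

a, b = make_user(), make_user(name="Alice")
assert a.id != b.id  # two factories never collide, so tests can run in parallel
```

A real factory library would also insert the row, but the shape is the same: callers only specify what the test actually cares about.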
We use an embedded postgres in our DB tests, and we call those 'Integration Tests' and run them separately from the pure unit tests. While still tremendously valuable, they do take a bit longer to run, and currently aren't written to allow running tests in parallel.
We'd fallen into the habit of writing most of our tests against the DB. Since applying a bit more discipline to remove the DB requirement, we've found the quicker feedback loop makes for a much more pleasant experience.
Your last point is a good one - tests that can run in isolation and are agnostic to the presence of other data (and ideally clean up after themselves) tend to be handy.
If your running system speaks to the DB in parallel and handles pre-existing data, why wouldn't you want your tests to do the same?
What's really nice is that if you run the migrations once to set up the tests, subsequent tests can be fast. Each test runs within a nested transaction that is rolled back at the end. This ensures that each test gets a clean DB to work off of without incurring the cost of re-running all the migrations.
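The pattern, sketched with sqlite3 so it's self-contained (the same shape works with a Postgres driver; the schema here is made up):

```python
import sqlite3
from contextlib import contextmanager

# "Migrations" run once, up front.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage txns
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

@contextmanager
def rollback_test(conn):
    """Each test runs inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")

# A "test": it sees its own writes, but they never survive the test.
with rollback_test(conn) as c:
    c.execute("INSERT INTO users (name) VALUES ('alice')")
    assert c.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1

# The next test starts from a clean DB without re-running migrations.
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 0
```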
It’s also stupendously easy if you’re using pytest (though the concurrency caching is not there as IIRC xdist can’t reuse session fixtures):
* create a session fixture to initialise and yield the cache db (and clean it up afterwards)
* create a regular test fixture which copies (using `createdb`) the template to a new database, and hands that off to the test
Then tests which need the db just have to request the second fixture.
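A sketch of those two fixtures (the database name and migration step are illustrative; assumes `createdb`/`dropdb` are on PATH and the cluster is reachable):

```python
import subprocess
import uuid

import pytest

TEMPLATE_DB = "app_template"  # assumed name for the cached template database

@pytest.fixture(scope="session")
def template_db():
    """One-time setup: create the template and migrate it; drop it afterwards."""
    subprocess.run(["createdb", TEMPLATE_DB], check=True)
    # run_migrations(TEMPLATE_DB)  # plug in your migration runner here
    yield TEMPLATE_DB
    subprocess.run(["dropdb", TEMPLATE_DB], check=True)

@pytest.fixture
def test_db(template_db):
    """Per-test copy: `createdb -T template` is a cheap file-level clone."""
    name = f"test_{uuid.uuid4().hex}"
    subprocess.run(["createdb", "-T", template_db, name], check=True)
    yield name
    subprocess.run(["dropdb", name], check=True)
```

A test then just takes `test_db` as an argument and connects to that database.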
This is exactly what we do, and it works really well. We essentially spin up a new tenant for each test suite. Forcing all tests to pass regardless of what other unrelated data is in the db is a great way to ensure cross-tenant data isolation.
The other underrated benefit is that you don't need a separate test database for local development - local dev instance and tests can use the same db. This means you can work on db changes in one place and see their impact on both tests and the running application.
I've had really good success running tests the way Django does it, where you run your entire test within a transaction, and then you just roll back the transaction at the end of the test.
Does that mean you can't use regular transactions in code, since Postgres doesn't support 'true' nested transactions? Or does the Django ORM automatically convert those 'inner' transactions into Postgres SAVEPOINTS?
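For reference, the SAVEPOINT mechanics being asked about, demoed with sqlite3 since it shares the same SAVEPOINT syntax as Postgres (table name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; explicit txns
conn.execute("CREATE TABLE t (v TEXT)")

conn.execute("BEGIN")                            # outer "test" transaction
conn.execute("INSERT INTO t VALUES ('outer')")

conn.execute("SAVEPOINT inner_tx")               # "nested transaction" in app code
conn.execute("INSERT INTO t VALUES ('inner')")
conn.execute("ROLLBACK TO SAVEPOINT inner_tx")   # undoes only the inner work

rows = [r[0] for r in conn.execute("SELECT v FROM t")]
print(rows)  # ['outer']

conn.execute("ROLLBACK")                         # test teardown discards everything
```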
I do this, and for “heavier” services that support some form of isolation/multi-tenancy (e.g. Postgres) I use per-test databases (or whatever the unit of isolation is).
To cap all of this off, when you build your abstractions, build in the notion of clearing the backing store (and make sure you can only run that in test environments), and lots of things become much easier to simulate.
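One way to sketch that guard (the store, its API, and the env-var convention are all made up for illustration):

```python
import os

class UserStore:
    """In-memory stand-in for an abstraction over the backing store."""
    def __init__(self):
        self._rows = {}

    def add(self, user_id, name):
        self._rows[user_id] = name

    def clear(self):
        # The reset hook refuses to run outside test environments.
        if os.environ.get("APP_ENV") != "test":
            raise RuntimeError("clear() is only permitted in test environments")
        self._rows.clear()

os.environ["APP_ENV"] = "test"
store = UserStore()
store.add(1, "alice")
store.clear()
print(len(store._rows))  # 0
```

The same guard on the real implementation (which would TRUNCATE tables, flush queues, etc.) makes "reset the world" a one-liner in tests while staying impossible to invoke in production.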
This approach works across languages, backing services (assuming they’re not cloud only), and testing frameworks.
I wrote a now relatively old post about it[0]
[0]: https://vadosware.io/post/a-recipe-for-running-containers-in...
I want my tests to talk to a real DB, or a real MQ, or a real instance of a service, because that's what the software does in real life. Running tests themselves inside of containers with access to the other services is good practice IMHO.
So what if a connectivity issue or some esoteric SQL thing explodes some rigorous test on a specific, seemingly-unrelated business function. In fact, I want that to happen. I want to discover failure modes and anomalies sooner than later, and to simulate real-world conditions.
The idea of mocks and stubs, and puritanical testing of things in complete isolation seems counterproductive to me. Why do people bother putting effort into abstracting away the most important components of their systems?
Imagine driving a car where every component was rigorously tested in isolation but never as a whole. It would be a disaster. Sometimes it seems like this is how teams approach testing. See the recent NPM registry issue where the integration of a few microservices led to a disastrous security issue.[0]
On my own projects, I go so far as to not even clear my test database between tests, unless it's called for. Why? It surfaces bugs that are otherwise easy to miss when testing against a pristine database.
[0]: https://github.blog/2021-11-15-githubs-commitment-to-npm-eco... (second issue, Nov. 2)
CREATE DATABASE testX TEMPLATE seed;
...which takes about 1 second per test run. The seed data and schema are baked into a docker image, and recreated whenever there are new migration files. Starting the docker image is slow but that doesn't happen on every test run.

- Create a template with necessary extensions, or just install them to "template1", which is the default template if you don't explicitly specify one. Installing PostGIS, for example, takes a few seconds - which is annoying if you create the schema from scratch in test runs. (`create extension if not exists postgis` can still be around in your schema; it'll just return right away.)
- Create a template for the test session, based on the template you've pre-installed extensions in, and apply the schema there.
- Create a database based on the second template for whatever scope makes sense for your test.
If your Postgres cluster is only serving test workloads, `fsync=off` can speed things up as well. (A stock postgresql.conf will point out that this can cause irrecoverable data loss, which I don't care about for test data.)
No docker though. I just run my migration scripts against the template database. In CI it runs every time; for local testing, I drop the template db and run the migrations.
The only annoying thing is that my local pg instance fills up with test runs. The good thing is that it's really easy to go back and inspect the database after a test run. But periodically I have to run a script that drops them all.
More about using full-sized Postgres clones for migration testing and why is that important: https://postgres.ai/products/database-migration-testing
Robust, highly recommended.
Here's the same thing as a docker container you can just keep around which auto-destroys the databases after a set time: https://github.com/ClockworkConsulting/tempgres-server
We originally tried to create temporary databases locally, but found the extra setup required from each dev to be a bit of a pain.
A docker container is on dockerhub as 'cwconsult/tempgres:v2.1.1' (I think I messed up the last publishing and 'latest' doesn't point to that. I should really fix that tag.)
We have a couple of published clients (just for convenience, the REST interface is super-simple, so 'manual' integration is trivial):
- https://github.com/ClockworkConsulting/tempgres-client (Java)
- https://github.com/ClockworkConsulting/django-tempgres (Django)
- https://github.com/BardurArantsson/pg-harness (Haskell)
A few nice things about this one:
- You don't even need a local PostgreSQL
- You can keep one somewhere in your LAN and never have to worry about it again.
- It integrates trivially with GitLab CI where you just use it as a service inside the build.
- Integration is super-simple... simple enough to use even in e.g. shell scripts.
- No need to worry about tear-down
Anyway, just thought I'd mention it since it's relevant to this thread.
The "best" thing we did was actually using a "template database": https://www.postgresql.org/docs/14/manage-ag-templatedbs.htm...
We would start a Postgres process, create a new database, and run all of our migrations and basic data bring-up. Then we would create a new database per test suite, using the one we just ran migrations on as the template.
This meant the initial bring up was a few seconds, but then each test suite would get a new database in a dozen milliseconds (IIRC).
All things considered, an actual database probably gives you the least gray hair, but with some careful test setup I have had good success using the savepoint/rollback trick (and it trivially supports nested fixtures as well).
We install and leverage the Zonky Embedded Postgres Binaries [2] which are normally used for test suites. They have been great for us.
It's amazing how fast and light Postgres runs for these use cases!
1 - https://steampipe.io 2 - https://github.com/zonkyio/embedded-postgres-binaries