What if database branching was easy? (opens in new tab)

(xata.io)

70 pointstee-es-gee2mo ago57 comments

57 comments

48 comments · 18 top-level

sastraxi2mo ago· 9 in thread

I’ve done experiments using BTRFS and ZFS for local Postgres copy-on-write. You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.

Looking at Xata’s technical deep dive, the site claims that we need an additional Postgres instance per replica and proposes a network file system to work around that. But I don’t really understand why that’s needed. Can someone explain to me my misunderstanding here?

tee-es-geeOP2mo ago

> You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.

Are you referring to `file_copy_method = clone` from Postgres 18? For example: https://boringsql.com/posts/instant-database-clones/

I think the key limitation is:

> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one.

sastraxi2mo ago

Yeah, that's the one. My use case is largely for local development, so the active connections thing isn't a limiter for me.

eatonphil2mo ago

I also don't really understand how being correct under physical branching with ZFS, or physical backups of a filesystem, are different from crash safety in general. As long as you replay the WAL at the point where you branch (or take a physical backup of the filesystem) you should not lose data?

At the same time Postgres people don't seem comfortable with the idea in practice so I'm not sure if this is actually ok to do.

hilariously2mo ago

Crash safety does mean rollbacking all things in progress, but yes, if your database cannot safely do it (even if it is yucky) then you do not have a safe database for any crash situation.

tee-es-geeOP2mo ago

For context for the others, I think you are referring to this blog post: https://xata.io/blog/open-source-postgres-branching-copy-on-... (in particular the "The key is in the storage system" section) right?

What I'm saying there is that if you do Postgres with on top of a local ZFS volume, the child branches Postgres instances need to be on the same server. So you are limited in how many branches you can do. One or two are fine, but if you want to do a branch per PR, that will likely not work.

If you separate the compute from storage via the network, this problem goes away.

zbentley2mo ago

ZFS snapshots can be transmitted over the network, with some diff-only and deduplication gains if the remote destination has an older instance of the same ZFS filesystem. It’s not perfect, and the worst case is still a full copy, but the tooling and efficiency wins for the ordinary case are battle-tested and capable.

1 more reply

sastraxi2mo ago

Yes, that's what I'm referring to.

You're still making the assumption in this comment: why does my 2nd (cloned) database need a separate postgres instance? One postgres server can host multiple databases.

1 more reply

wadefletch2mo ago

You can't have any other connections while a Copy-on-Write is happening, not even a logical replication slot. So you keep a read replica that then gets all connections briefly cut for the COW to avoid locking the master instance. Then you re-enable the logical replication slots on both the new, copied instance and the "copyable" read replica to get both back up to date w/ master.

sastraxi2mo ago

Thanks for sharing your workflow. My question is about why two databases on the same server would need two separate postgres instances.

comrade12342mo ago· 4 in thread

I was on a big team working on a giant oracle database over 25-years ago. I dont remember the term but each developer had their own playground of the giant database that wasn't affected by anyone else. The DB admin would set it up for each developer in just a few minutes so it definitely wasn't a copy. Then when a developer needed to reset and go back to the original db again it just took a few minutes. I just don't remember what it's called but I think Postgres has had it now for a few years.

tremon2mo ago

You don't actually need to physically copy data, just create a view for every table that does a replacing merge between the original read-only data and the developer's own copy. And you can put a trigger on the view to redirect writes to the same private-copy table, making the whole thing transparent to the user.

Not disputing that Oracle might have had something like this built-in, but it sounds like something that I could have whipped up in a day or so as a custom solution. I actually proposed a similar system to create anonymized datasets for researchers when I worked at a national archive institute.

TheMrZZ2mo ago

Snowflake uses a similar system with their 0-copy cloning. It starts with the original table's partition, and keeps track of the delta created by subsequent operations. Always found that builtin mechanism pretty neat!

1 more reply

bob10292mo ago

https://docs.oracle.com/database/121/ADMIN/create.htm#ADMIN1...

hilariously2mo ago

Sounds like a snapshot - a file based diff of the pages changed since the last full backup - easy to revert to for the same reasons.

bhouston2mo ago· 4 in thread

I have tried this before:

https://www.dolthub.com/

It was a lot of work and had poor performance with a lot of complications. I am not using it in my latest projects as a result.

zachmu2mo ago

Can you be more specific about what complications you ran into? As for performance, Dolt is faster overall than MySQL on sysbench now.

https://docs.dolthub.com/sql-reference/benchmarks/latency

bhouston2mo ago

It was https://threekit.com. It was a while ago now but we had to use MySQL for our primary copy that users used (e.g. prod), and only when they were working on branches did we use dolt. I think the second complication was that Dolt was not stable enough to use in heavy load scenarios as well.

I can delete this comment if you do not want to discuss this publicly.

2 more replies

evdubs2mo ago

I, too, have used it. It works well and is especially great for data sharing.

johnthescott2mo ago

makes sense. i can see things getting complex very quickly.

seems most versions would be better managed at application level, zfs/btrs snapshots not withstanding.

theaniketmaurya2mo ago· 3 in thread

i was using neon and they had some similar feature but now using planetscale. would be curious to know how you all are doing it?

tee-es-geeOP2mo ago

We have an overview on how it works here: https://xata.io/blog/open-source-postgres-branching-copy-on-...

miketery2mo ago

We used neon at last job. It seemed pretty cool. What made you switch to planetscale?

zwily2mo ago

Planetscale has better performance and uptime. But the branching keeps me going back to neon…

Yes planetscale can branch too, but it takes longer and you pay individually for each branch

1 more reply

Nihilartikel2mo ago· 3 in thread

This kind of magic is the reason that I'm very itchy to be able to line up real work on Datomic or XTDB someday.

xeubie2mo ago

Surprisingly, neither Datomic nor XTDB support branching. See: https://blog.danieljanus.pl/datomic-forking-the-past/

I actually built my own immutable database which does support branching (see profile), so it seems like a huge miss that these ones don't. It's pretty much the main reason I would want an immutable database.

nathell2mo ago

It appears that Datahike [0] is a Datomic workalike that supports branching. I haven’t tried it out myself (yet), but the documentation suggests it’s possible [1].

That said, I’m adding xitdb to the list of tech to try out. Thank you for building it!

Oh, and thanks for linking to my article :-)

[0]: https://github.com/replikativ/datahike

[1]: https://datahike.io/notes/the-git-model-for-databases/

camdez2mo ago

Wait, this statement seems way too strong.

The linked article points out that Datomic doesn't support branching from the past. It absolutely does support branching, and I've built entire test suites that way.

From a cursory glance, I'd say Datomic does exactly what the original parent article is discussing. It works great and it's super convenient.

1 more reply

SkyPuncher2mo ago· 2 in thread

I actually just did this recently. I looked at a bunch of solutions for my dev environment, but Claude kept pushing me back to a really simple one: use Postgres.

Postgres has template database that effectively give you a really easy means of "cloning" a database. On AFS (and several other file systems), copy-on-write is pretty much native.

gonzalohm2mo ago

I was also looking into this for dev environments but I haven't been able to solve "rebases". If someone merges to prod then the dev environment is "out of sync" and you have to clone the DB again. It's pretty painful to orchestrate

SkyPuncher2mo ago

We're using rails, so it mostly just comes down to pulling main and running migration.

I don't really worry about conflicts on branches since most features aren't long lived enough.

e7h4nz2mo ago· 1 in thread

We had a similar journey with Neon's branching. Initially it was a huge win for our CI workflows — spinning up an isolated, production-shaped database per PR made migration testing and integration checks dramatically more realistic than seed fixtures ever were.

That said, we've since pulled back from branching production schemas, and the reason is data masking. In principle you can define masking rules for sensitive columns, but in practice it's very hard to build a process that guarantees every new column, table, or JSON field added by any engineer is covered before it ever touches a branch. The rules drift, reviews miss things, and nothing in the workflow hard-fails when a new sensitive field slips through.

Most of the time that's fine. But "most of the time" isn't the bar for customer data — a single oversight leaking PII into a developer environment is enough to do real damage to trust, and you can't un-leak it. Until masking can be enforced by construction rather than by convention, we'd rather pay the cost of synthetic data than accept that risk.

pd_grahamt2mo ago

This was a big reason Xata acquired privacy dynamics in Jan - I was the founder. Definitely a tough problem to address because pii can take so many forms.

mwexler2mo ago· 1 in thread

I thought this was a repost but I was thinking of dolt which has similar capabilities but a different approach. As in https://news.ycombinator.com/item?id=38109880

zachmu2mo ago

Neon is basically this same idea: postgres on a copy-on-write file system.

These aren't really "branches" though, they're hard forks. You can't merge them back after making changes. Dolt is still the only SQL database with branch and merge.

mininao2mo ago· 1 in thread

Using neon for this and it's an absolute game changer, would recommend implementing database branching whatever solution you pick

gulcin2mo ago

May I ask your use case, I am curious.

tee-es-geeOP2mo ago· 1 in thread

Xata is open source btw (open core): https://github.com/xataio/xata

gulcin2mo ago

Came here to say this :) Anyone using Xata here?

sunny6782mo ago· 1 in thread

feels like branching doesn't replace seeding, just complements it- seeds for speed, branches for realism.

pd_grahamt2mo ago

Branching is more or less instant if you’re keeping staging in sync with a nightly/weekly cron job

sgarland2mo ago

> Imagine you need to add an index to a table with a few million rows. On a seeded database with 200 rows, the migration runs in milliseconds. Obviously. But on a branch with realistic data, it takes 40 seconds and needs CREATE INDEX CONCURRENTLY to avoid locking the table. The branch is isolated, so locking there isn't the issue — the point is that the rehearsal shows the production migration would need CONCURRENTLY.

A few million rows should take at most, on the most awful networked storage available, maybe 10 seconds. I just built an index locally on 10,000,000 rows in 4 seconds. Moreover, though, there are vanishingly few cases where you wouldn't want to use CONCURRENTLY in prod - you shouldn't need to run a test to tell you that.

IMO branching can be a cool feature, but the use I keep seeing touted (indexes) doesn't seem like a good one for it. You should have a pretty good idea how an index is going to behave before you build it, just from understanding the RDBMS. There are also tools like hypopg [0], which are also available on cloud providers.

A better example would be showing testing a large schema change, like normalizing a JSON blob into proper columns or something, where you need to validate performance before committing to it.

0: https://github.com/HypoPG/hypopg

UltraSane2mo ago

My company has a Pure storage array with always on dedupe and it works really well to make multiple copies of databases and only have to store modified data. For enterprise storage in 2026 I consider only storing unique blocks once to be table stakes as it enables so many useful capabilities and saves so much money.

condwanaland2mo ago

This is something that Palantir Foundry supports extremely well. Its data layer is built around the idea that anytime you're making a change, you make a branch, build on branch, only data you modified is copied to the branch, and then you can test it end to end on the branch.

Can't imagine doing it any other way

efficax2mo ago

what if writing a blog post without ai was easy?

SOLAR_FIELDS2mo ago

Wouldn't it be great if Aurora Serverless V2 actually supported this copy-on-write semantic? I would immediately be able to throw out a pile of slow code if they did.

dirtbag__dad2mo ago

Has anyone used this for debugging? Like an error in datadog triggers an agent sandbox and branched database?

throwaway815232mo ago

I mean there's a whole book about this, "Purely Functional Data Structures" (2009) by Chris Okasaki.

j / k navigate · click thread line to collapse

57 comments

48 comments · 18 top-level

sastraxi2mo ago· 9 in thread

tee-es-geeOP2mo ago

> You don’t need anything but vanilla pg and a supported file system to do it anymore; just clone the database using a template and a newish version of Postgres.

Are you referring to `file_copy_method = clone` from Postgres 18? For example: https://boringsql.com/posts/instant-database-clones/

I think the key limitation is:

> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one.

sastraxi2mo ago

Yeah, that's the one. My use case is largely for local development, so the active connections thing isn't a limiter for me.

eatonphil2mo ago

At the same time Postgres people don't seem comfortable with the idea in practice so I'm not sure if this is actually ok to do.

hilariously2mo ago

Crash safety does mean rollbacking all things in progress, but yes, if your database cannot safely do it (even if it is yucky) then you do not have a safe database for any crash situation.

tee-es-geeOP2mo ago

If you separate the compute from storage via the network, this problem goes away.

zbentley2mo ago

1 more reply

sastraxi2mo ago

Yes, that's what I'm referring to.

You're still making the assumption in this comment: why does my 2nd (cloned) database need a separate postgres instance? One postgres server can host multiple databases.

1 more reply

wadefletch2mo ago

sastraxi2mo ago

Thanks for sharing your workflow. My question is about why two databases on the same server would need two separate postgres instances.

comrade12342mo ago· 4 in thread

tremon2mo ago

TheMrZZ2mo ago

1 more reply

bob10292mo ago

https://docs.oracle.com/database/121/ADMIN/create.htm#ADMIN1...

hilariously2mo ago

Sounds like a snapshot - a file based diff of the pages changed since the last full backup - easy to revert to for the same reasons.

bhouston2mo ago· 4 in thread

I have tried this before:

https://www.dolthub.com/

It was a lot of work and had poor performance with a lot of complications. I am not using it in my latest projects as a result.

zachmu2mo ago

Can you be more specific about what complications you ran into? As for performance, Dolt is faster overall than MySQL on sysbench now.

https://docs.dolthub.com/sql-reference/benchmarks/latency

bhouston2mo ago

I can delete this comment if you do not want to discuss this publicly.

2 more replies

evdubs2mo ago

I, too, have used it. It works well and is especially great for data sharing.

johnthescott2mo ago

makes sense. i can see things getting complex very quickly.

seems most versions would be better managed at application level, zfs/btrs snapshots not withstanding.

theaniketmaurya2mo ago· 3 in thread

i was using neon and they had some similar feature but now using planetscale. would be curious to know how you all are doing it?

tee-es-geeOP2mo ago

We have an overview on how it works here: https://xata.io/blog/open-source-postgres-branching-copy-on-...

miketery2mo ago

We used neon at last job. It seemed pretty cool. What made you switch to planetscale?

zwily2mo ago

Planetscale has better performance and uptime. But the branching keeps me going back to neon…

Yes planetscale can branch too, but it takes longer and you pay individually for each branch

1 more reply

Nihilartikel2mo ago· 3 in thread

This kind of magic is the reason that I'm very itchy to be able to line up real work on Datomic or XTDB someday.

xeubie2mo ago

Surprisingly, neither Datomic nor XTDB support branching. See: https://blog.danieljanus.pl/datomic-forking-the-past/

nathell2mo ago

It appears that Datahike [0] is a Datomic workalike that supports branching. I haven’t tried it out myself (yet), but the documentation suggests it’s possible [1].

That said, I’m adding xitdb to the list of tech to try out. Thank you for building it!

Oh, and thanks for linking to my article :-)

[0]: https://github.com/replikativ/datahike

[1]: https://datahike.io/notes/the-git-model-for-databases/

camdez2mo ago

Wait, this statement seems way too strong.

The linked article points out that Datomic doesn't support branching from the past. It absolutely does support branching, and I've built entire test suites that way.

From a cursory glance, I'd say Datomic does exactly what the original parent article is discussing. It works great and it's super convenient.

1 more reply

SkyPuncher2mo ago· 2 in thread

I actually just did this recently. I looked at a bunch of solutions for my dev environment, but Claude kept pushing me back to a really simple one: use Postgres.

Postgres has template database that effectively give you a really easy means of "cloning" a database. On AFS (and several other file systems), copy-on-write is pretty much native.

gonzalohm2mo ago

SkyPuncher2mo ago

We're using rails, so it mostly just comes down to pulling main and running migration.

I don't really worry about conflicts on branches since most features aren't long lived enough.

e7h4nz2mo ago· 1 in thread

pd_grahamt2mo ago

This was a big reason Xata acquired privacy dynamics in Jan - I was the founder. Definitely a tough problem to address because pii can take so many forms.

mwexler2mo ago· 1 in thread

I thought this was a repost but I was thinking of dolt which has similar capabilities but a different approach. As in https://news.ycombinator.com/item?id=38109880

zachmu2mo ago

Neon is basically this same idea: postgres on a copy-on-write file system.

These aren't really "branches" though, they're hard forks. You can't merge them back after making changes. Dolt is still the only SQL database with branch and merge.

mininao2mo ago· 1 in thread

Using neon for this and it's an absolute game changer, would recommend implementing database branching whatever solution you pick

gulcin2mo ago

May I ask your use case, I am curious.

tee-es-geeOP2mo ago· 1 in thread

Xata is open source btw (open core): https://github.com/xataio/xata

gulcin2mo ago

Came here to say this :) Anyone using Xata here?

sunny6782mo ago· 1 in thread

feels like branching doesn't replace seeding, just complements it- seeds for speed, branches for realism.

pd_grahamt2mo ago

Branching is more or less instant if you’re keeping staging in sync with a nightly/weekly cron job

sgarland2mo ago

A better example would be showing testing a large schema change, like normalizing a JSON blob into proper columns or something, where you need to validate performance before committing to it.

0: https://github.com/HypoPG/hypopg

UltraSane2mo ago

condwanaland2mo ago

Can't imagine doing it any other way

efficax2mo ago

what if writing a blog post without ai was easy?

SOLAR_FIELDS2mo ago

Wouldn't it be great if Aurora Serverless V2 actually supported this copy-on-write semantic? I would immediately be able to throw out a pile of slow code if they did.

dirtbag__dad2mo ago

Has anyone used this for debugging? Like an error in datadog triggers an agent sandbox and branched database?

throwaway815232mo ago

I mean there's a whole book about this, "Purely Functional Data Structures" (2009) by Chris Okasaki.

j / k navigate · click thread line to collapse