undefined | Better HN

0 pointsok_dad4y ago0 comments

With Postgres, you might have one server, or one cluster of servers that are coordinated, and then inside there you have tables with users and the users' data with foreign keys tying them together.

With SQLite, you would instead have one database (one file) per user as close to the user as possible that has all of the user's data and you would just read/write to that database. If your application needs to aggregate multiple user's data, then you use something like Litestream to routinely back it up to S3, then when you need to aggregate data you can just access it all there and use a distributed system to do the aggregation on the SQLite database files.

0 comments

9 comments · 3 top-level

danappelxx4y ago· 5 in thread

Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees? You can’t do cross-database transactions (to my knowledge), which means you can end up with corrupted data during aggregations. What am I missing?

mwcampbell4y ago

One database per tenant only makes sense in multi-tenant applications that don't have any cross-tenant actions. I imagine there are many B2B applications that fall into this category.

ok_dadOP4y ago

If you have a use case with data that is extremely partitionable (like controlling individual industrial devices and collecting data to improve their performance or monitor the processes), then SQLite and Litestream could be a great option, but if you can't reasonably partition the data then it's probably better to use a centralized database server.

HelloNurse4y ago

There can also be shared, mostly read-only databases, with no transactions crossing database boundaries.

For example, one database per retail store with sharded customers, orders and inventory (most transactions involve one order of one customer fulfilled from that store) and a shared one with item descriptions and prices (modified, say, daily).

imtringued4y ago

>Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees?

No it doesn't. What gave you that idea? You still have all "ACID guarantees" within each database.

>You can’t do cross-database transactions (to my knowledge),

That's true of most databases. If you have two apps and they use two different databases you won't have transactions across those two apps.

>which means you can end up with corrupted data during aggregations.

No, aggregations within each database work as you would expect.

>What am I missing?

As others have said, you use this pattern only if you don't intend to cross databases. By the way, in NoSQL databases like MongoDB, every document is its own little database so having a per user database is a massive upgrade over NoSQL.

danappelxx4y ago

To clarify:

- I was talking about ACID guarantees across databases (ie across users) - I was talking about aggregations across databases (ie across users)

Of course working inside one database works as you would expect it to. My point was that this pattern of database-per-user seems to be a totally different design than people have used with traditional n-tier designs.

Good point about NoSQL! But, wasn’t part of the reason MongoDB fell out of favor because it was lacking consistency?

1 more reply

atombender4y ago· 1 in thread

This sounds a lot like ActorDB [1], which is a horizontally replicated and sharded SQLite.

With ActorDB, each shard (what it calls actors) is its own dedicated SQLite database (but efficiently colocated in the same block storage as all the others, so essentially a virtual database). The intention with the sharding is to allow data that logically belongs together to be stored and replicated together as a shard; for example, all of a single user's data. When you want to run an SQL query you tell it which shard to access (as part of the query), and it routes the request to the right server.

It has some pretty clever stuff to allow querying across multiple such "actors", so you can still get a complete view of all your data.

Sadly, I don't think it's maintained anymore. Last commit was in 2019.

[1] https://www.actordb.com/

ok_dadOP4y ago

Looks neat, I'll check this out.

For a project I currently have, I am parititioning not only the database but also the software for each "customer" (let's call them that for now), so basically I have my single-threaded Python application that uses SQLite and put that in a container with Litestream and run one of those containers for each customer on one or more kubernetes clusters or on anything else that runs a container (or you can run the software directly). Then you can take the data from one to N of those customers and aggregate it and process it for monitoring and central configuration control and backup with another bit of software that does a distributed query on the S3 data that Litestream is saving. I can also control several of the systems in a coordinated way, if needed, by linking them together in a configuration file and letting them elect a leader that will be the central point of contact for a source of "work" (let's call it that for now) and then that leader can notify the other instances of the work to be done and help to dole out that work in a reasonable manner based on the resources and capabilities of the aggregation.

A few similar projects I worked on that were centralized were a huge mess in both the database and the codebase due to trying to do the same thing for thousands of customers with several dozen horizontal-scaling services that had to be coordinated. In my system, I just write one single-threaded bit of software that does the whole thing, but just for one customer at a time, so the whole system is horizontally scaleable instead of just each service within the system. I can still do the same things to aggregate data from the individual customer software units, and as described above I can still add coordination features for groups of individual customer units, so there is no loss in features, but the whole system is simple to reason and think about with none of the usual synchronization pitfalls you get in the usual distributed system.

avinassh4y ago

How it would handle conflicts?

j / k navigate · click thread line to collapse

0 comments

9 comments · 3 top-level

danappelxx4y ago· 5 in thread

mwcampbell4y ago

One database per tenant only makes sense in multi-tenant applications that don't have any cross-tenant actions. I imagine there are many B2B applications that fall into this category.

ok_dadOP4y ago

HelloNurse4y ago

There can also be shared, mostly read-only databases, with no transactions crossing database boundaries.

imtringued4y ago

>Hold on, doesn’t one-database-per-user totally absolve all ACID guarantees?

No it doesn't. What gave you that idea? You still have all "ACID guarantees" within each database.

>You can’t do cross-database transactions (to my knowledge),

That's true of most databases. If you have two apps and they use two different databases you won't have transactions across those two apps.

>which means you can end up with corrupted data during aggregations.

No, aggregations within each database work as you would expect.

>What am I missing?

danappelxx4y ago

To clarify:

- I was talking about ACID guarantees across databases (ie across users) - I was talking about aggregations across databases (ie across users)

Good point about NoSQL! But, wasn’t part of the reason MongoDB fell out of favor because it was lacking consistency?

1 more reply

atombender4y ago· 1 in thread

This sounds a lot like ActorDB [1], which is a horizontally replicated and sharded SQLite.

It has some pretty clever stuff to allow querying across multiple such "actors", so you can still get a complete view of all your data.

Sadly, I don't think it's maintained anymore. Last commit was in 2019.

[1] https://www.actordb.com/

ok_dadOP4y ago

Looks neat, I'll check this out.

avinassh4y ago

How it would handle conflicts?

j / k navigate · click thread line to collapse