I'm all-in on server-side SQLite

(fly.io)

1353 points · dpeck · 4y ago · 404 comments

217 comments · 75 top-level

bob10294y ago· 36 in thread

> SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

This is the #1 reason my exuberant technical mind likes that we use SQLite for all the things. Latency is the exact reason you would have a problem scaling any large system in the first place. Forcing it all into one cache-coherent domain is a really good way to begin eliminating entire universes of bugs.

Do we all appreciate just how much more throughput you can get in the case described above? A 100x latency improvement doesn't translate directly into the same # of transactions per second, but it's pretty damn close if your I/O subsystem is up to the task.
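A rough way to check the per-query latency claim on your own machine is to time primary-key lookups against an in-process database. This is only a sketch: the table, the row counts, and the `measure_point_query_latency` helper are made up for illustration, and it uses an in-memory database, so a real on-disk file (and a slower language runtime) will give different numbers.

```python
import sqlite3
import time


def measure_point_query_latency(n_rows=10_000, n_queries=1_000):
    """Return the mean latency, in microseconds, of primary-key lookups
    against an in-process SQLite database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")  # lives inside this process, no network hop
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        ((i, f"user{i}") for i in range(n_rows)),
    )
    conn.commit()

    start = time.perf_counter()
    for i in range(n_queries):
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (i % n_rows,)
        ).fetchone()
        if row is None:
            raise RuntimeError("lookup unexpectedly returned no row")
    elapsed = time.perf_counter() - start

    conn.close()
    return elapsed / n_queries * 1e6  # seconds per query -> microseconds
```

Calling `measure_point_query_latency()` returns a single float; compare it with the round-trip time to whatever database server you use today.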

WJW4y ago

How do any writes end up on other horizontally scaled machines though? To me the whole point of a database on another machine is that it is the single point of truth that many horizontally scaled servers can write to and read each others' updates from. If you don't need that, you might as well read the entire dataset into memory and be done with it.

I know TFA says that you can "soon" automagically replicate your sqlite db to another server, but it only allows writes on a single server and all others will be readers. Now you need to think about how to move all write traffic to a single app server. All writes to that server will still take several milliseconds (possibly more, since S3 is eventually consistent) to propagate around all replicas.

In short, 100x latency improvement for reads is great but a bit of a red herring since if you have read-only traffic you don't need sqlite replication. If you do have write traffic, then routing it through S3 will definitely not give you a 100x latency improvement over Postgres or MySQL anymore. Litestream is definitely on my radar, but as a continuous backup system for small apps ("small" meaning it runs and will always run on a single box) rather than a wholesale replacement of traditional client-server databases.

PS: Congrats Ben!

mrkurt4y ago

Litestream does a couple of things. It started as a way to continuously back sqlite files up to s3. Then Ben added read replicas – you can configure Litestream to replicate from a "primary" litestream server. It's still limited to a single writer, but there's no s3 in play. You get async replication to other VMs: https://github.com/fly-apps/litestream-base

We have a feature for redirecting HTTP requests that perform writes to a single VM. This makes Litestream + replicas workable for most fullstack apps: https://fly.io/blog/globally-distributed-postgres/

It's not a perfect setup, though. You have to take the writer down to do a deploy. The next big Litestream release should solve that, and is part of what's teased in the post.

nicoburns4y ago

> If you don't need that, you might as well read the entire dataset into memory and be done with it.

Over in-memory data structures, SQLite gives you:

- Persistence

- Crash tolerance

- Extremely powerful declarative querying capabilities

> if you have read-only traffic you don't need sqlite replication.

I agree with you that the main use-case here is backup and data durability for small apps. Which is a pretty big deal, as a database server is often the most expensive part of running a small app. That said, there are definitely systems where latency of returning a snapshot of the data is important, but which snapshot isn't (if updates take a while to percolate that's fine).

nine_k4y ago

I do understand the point of running SQLite in-process to speed up reads.

I do not understand why SQLite must also handle intense write load with HA, failover, etc.

I would rather have the best of both worlds: a proper DB server (say, Postgres) replicated to super-fast and simple read replicas in SQLite on every node.

(My ideal case would be some kind of natural sharding where each node keeps its own updates, or just a highly available data browsing app, with data in SQLite files updated as entire files, like a deployment.)

bob10294y ago

What if, due to ridiculous latency reductions, your business no longer requires more than 1 machine to function at scale?

I'm talking more about sqlite itself than any given product around it at this point, but I still think it's an interesting thought experiment in this context.

ok_dad4y ago

With Postgres, you might have one server, or one cluster of servers that are coordinated, and then inside there you have tables with users and the users' data with foreign keys tying them together.

With SQLite, you would instead have one database (one file) per user as close to the user as possible that has all of the user's data and you would just read/write to that database. If your application needs to aggregate multiple users' data, then you use something like Litestream to routinely back it up to S3, then when you need to aggregate data you can just access it all there and use a distributed system to do the aggregation on the SQLite database files.
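To make the one-file-per-user idea concrete, here is a minimal sketch. The `notes` table, the directory layout, and all function names are hypothetical, and it assumes `user_id` is a trusted, filesystem-safe string. Aggregation is done with a plain loop over local files, standing in for the S3 plus distributed-job setup described above.

```python
import sqlite3
import tempfile
from pathlib import Path


def user_db_path(root: Path, user_id: str) -> Path:
    # One SQLite file per user; user_id is assumed to be filesystem-safe.
    return root / f"{user_id}.db"


def open_user_db(root: Path, user_id: str) -> sqlite3.Connection:
    conn = sqlite3.connect(user_db_path(root, user_id))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def add_note(root: Path, user_id: str, body: str) -> None:
    conn = open_user_db(root, user_id)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    finally:
        conn.close()


def count_notes_across_users(root: Path) -> int:
    # Aggregate by visiting every per-user file in turn.
    total = 0
    for path in sorted(root.glob("*.db")):
        conn = sqlite3.connect(path)
        try:
            total += conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
        finally:
            conn.close()
    return total
```

Usage would look like `root = Path(tempfile.mkdtemp()); add_note(root, "alice", "hello")`, after which `count_notes_across_users(root)` walks every user's file.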

samatman4y ago

A lot depends on your consistency requirements and data model here.

I use SQLite heavily, and have evaluated litestream and rqlite but not deployed them, so bear that in mind.

If the application is set up so that it serves a user for a session, so a given session ID is reading and writing from the same SQLite database, there are many opportunities to replicate that data optimistically, so that you won't lose it if a meteor hits the server, but it might not live in all the replicas right away, since applying patchsets off the gossip network happens in downtime.

If concerns can't be isolated like this then yes, dedicated swarms of database servers are the way to go. Frequently they can be, and using SQLite punches way above its weight here.

hinkley4y ago

There are many systems that have much higher read to write traffic and so writes only need logarithmic scaling or perhaps with the square root of the system size. Waiting for faster hardware worked for these systems for a long time, and to a small extent, still does.

The dirty secret is that a lot of systems that require very high write traffic are essentially systems built for narcissists. "Social websites" have higher write traffic than simpler consumption based systems, but we've gone beyond those initial steps into very aggressive systems that are based on recording every interaction with the user and providing them instant gratification for many of those.

These applications don't scale in a way that others do, easily. And maybe it's a feature, not a bug, if the tools I use discourage me from jumping into the maelstrom by making it difficult to even consider doing so. Constraints are where creativity comes from, not possibility.

jolux4y ago

S3 is strongly consistent now: https://aws.amazon.com/s3/consistency/

judofyr4y ago

> Latency is the exact reason you would have a problem scaling any large system in the first place.

Let's not forget why we started using separate database servers in the first place…

A web server does quite a lot of things: Parsing/formatting HTTP/JSON/HTML, restructuring data, calculating stuff. This is typically very separate from the data loading aspect and as you get more requests you'll have to put more CPU in order to keep up (regardless of the language).

By separating the web server from the database server you introduce more latency in favor of enabling scalability. Now you can spin up hundreds of web servers which all talk to a single database server. This is a typical strategy for scalability: decouple the logic and scale up individually.

If you couple them together it's more difficult to scale. First of all, in order to spin up a server you need a full version of the database. Good luck autoscaling on-demand! Also, now every write will have to be replicated to all the readers. That's a lot more bandwidth.

There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack. I can see it being useful as a lower-level component: You can use Litestream to build your "own" database server with customized logic which you can talk to using an internal protocol (gRPC?) from your web servers.

tptacek4y ago

I don't think anyone's seriously arguing that the n-tier database architecture is, like, intrinsically bankrupt. Most applications are going to continue to be built with Postgres. We like Postgres; we have a Postgres offering; we're friends with Postgres-providing services; our product uses Postgres.

The point the post is making is that we think people would be surprised how far SQLite can get a typical application. There's a clear win for it in the early phases of an application: managing a database server is operationally (and capitally) expensive, and, importantly, it tends to pin you to a centralized model where it really only makes sense for your application to run in Ashburn --- every request is getting backhauled there anyways.

As the post notes, there's a whole ecosystem of bandaids --- err, tiers --- that mitigate this problem; it's one reason you might sink a lot of engineering work into a horizontally-scaling sharded cache tier, for instance.

The alternative the post proposes is: just use SQLite. Almost all of that complexity melts away, to the point where even your database access code in your app gets simpler (N+1 isn't a game-over problem when each query takes microseconds). Use Litestream and read-only replicas to scale read out horizontally; scale the write leader vertically.

Eventually you'll need to make a decision: scale "out" of SQLite into Postgres (or CockroachDB or whatever), or start investing engineering dollars into making SQLite scale (for instance: by using multiple databases, which is a SQLite feature people sleep on). But the bet this post is making is that the actual value of "eventually" is "surprisingly far into the future", "far enough that it might not make sense to prematurely optimize for it", especially early on when all your resources, cognitively and financially and temporally, are scarce.

We might be very wrong about this! There isn't an interesting blog post (or technical bet) to make about "I'm all in on the n-tier architecture of app servers and database servers". We're just asking people to think about the approach, not saying you're crazy if you don't adopt it.
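For anyone who hasn't used the "multiple databases" feature mentioned above: presumably this refers to SQLite's `ATTACH DATABASE`, which lets one connection query several database files at once. A small sketch, with made-up file names, tables, and helper functions:

```python
import sqlite3
import tempfile
from pathlib import Path


def build_demo(root: Path):
    """Create two separate database files (hypothetical schema)."""
    users_path = root / "users.db"
    events_path = root / "events.db"

    conn = sqlite3.connect(users_path)
    with conn:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users VALUES (1, 'ada')")
    conn.close()

    conn = sqlite3.connect(events_path)
    with conn:
        conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT)")
        conn.executemany(
            "INSERT INTO events VALUES (?, ?)", [(1, "login"), (1, "post")]
        )
    conn.close()
    return users_path, events_path


def events_per_user(users_path: Path, events_path: Path):
    """Join across both files through a single connection."""
    conn = sqlite3.connect(users_path)
    try:
        # The second file becomes visible under the schema name "ev".
        conn.execute("ATTACH DATABASE ? AS ev", (str(events_path),))
        return conn.execute(
            "SELECT u.name, COUNT(e.kind) "
            "FROM users AS u "
            "LEFT JOIN ev.events AS e ON e.user_id = u.id "
            "GROUP BY u.id ORDER BY u.name"
        ).fetchall()
    finally:
        conn.close()
```

Splitting data this way means each file has its own write lock, so writes to `events.db` do not contend with writes to `users.db`.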

nicoburns4y ago

> There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack

If you're using a language like Node.js then horizontal scaling makes a lot of sense, but I've been working with Rust a lot recently. And Rust is so efficient that you typically end up in a place where a single application server can easily saturate the database. At that point moving them both onto the same box can start to make sense.

This is especially true for low-traffic apps. I could probably run most of my Rust apps on a VM with 128MB RAM (or even less) and not even a whole CPU core and still get excellent performance. In that context, sticking a SQLite database that backs up to object storage on the same box becomes very attractive from a cost perspective.

ithrow4y ago

As they say, "you are not twitter" ;)

Access to monstrous machines is easy today and you have very fast runtimes like Go and the JVM that can leverage this hardware.

closeparen4y ago

This is a large part of what Rich Hickey emphasizes about Datomic, too. We're so used to the database being "over there" but it's actually very nice to have it locally. Datomic solves this in the context of a distributed database by having the read-only replicas local to client applications while the transaction-running parts are remote.

abraxas4y ago

Only trouble with that particular implementation is that the Datomic Transactor is a single threaded single process that serializes every transaction going through it. As long as you don't need to scale writes it works like a charm. However, the workloads I somehow always end up working with are write heavy or at best 50/50 between read and write.

carry_bit4y ago

It's exciting to see Datomic's architecture realized using more conventional technology.

throwaway8943454y ago

If you're pushing the database up into the application layer, do you have to route all write operations through a single "master" application instance? If not, is there some multi-master scheme, and if so, is it cheaper to propagate state all the time than it is to have the application write to a master database instance over a network? Moreover, how does it affect the operations of your application? Are you still as comfortable bouncing an application instance as you would otherwise be?

mrkurt4y ago

The answer is: yes, you do have to write through a single primary application instance.

So far.

The two important things here are:

1. Fly.io makes it really easy to write through a single primary application instance

2. There are ways to solve this problem so your application doesn't have to worry about it.

Right now, you have to be a little careful bouncing app instances. If you bounce the writer, you can't perform writes for 15s or whatever. This is a big problem during deploys.

There are a tremendous number of Fly.io users that are fine with this limitation, though. It's pretty valuable for some segment of our customers right now.

funstuff0074y ago

This is exactly the reason I am so skeptical of the cloud. I don't care how easy it is to stand up VMs, containers, k8s, etc. What I need to know is how hard is it to lug my data to my application and vice versa. My feelings on this are so strong as I work mostly on database read-heavy applications.

teleforce4y ago

Local-first software is the future:

[1] Local-First Software: You Own Your Data, in spite of the Cloud:

https://martin.kleppmann.com/papers/local-first.pdf

sanderjd4y ago

What confuses me about this architecture I guess is: why have a SQL database at all? This sounds like a local cache. Which sure, of course those are super fast. But why does it need to be relational if all the data fits on the edge?

zarzavat4y ago

You get SQL and ACID. If you don't need those then you pay a performance price for having them. If you do need them, then you pay a price for not having them.

The best solution depends on the unit economics of the problem you are trying to solve. If you have a small number of high value users, then these approaches are premature optimisation, just use Postgres. If your business model is ad eyeballs then squeezing every last drop begins to seem very attractive because you can multiply your profitability (potentially).

pgwhalen4y ago

Most data is relational, so why not store it that way?

Or, from another angle, what would your “local cache” be?

a-dub4y ago

if you can tolerate eventual consistency and have the disk/ram on the application vms, then sure, keeping the data and the indices close to the code has the added benefit of keeping request latency down.

downside of course is the complexity added in synchronization, which is what they're tackling here.

personally i like the idea of per-tenant databases with something like this to scale out for each tenant. it encourages architectures that are more conducive for e2ee or procedures that allow for better guarantees around customer privacy than big central databases with a customer id column.

mwcampbell4y ago

> personally i like the idea of per-tenant databases with something like this to scale out for each tenant.

So do I. And that type of architecture has come up a few times now in this comment thread. Given that Fly has the lead developer of the Phoenix web framework on staff, maybe it would make sense for him to work on integrating this type of architecture, with Litestream-based replication and the ability to have different master regions for different tenants, into Phoenix.

vmception4y ago

> SQLite isn't just on the same machine as your application, but actually built into your application process.

How is that different than what's commonly happening? Android and iOS do this... right? ... but it's still accessing the filesystem to use it.

Am I missing something or is what they are describing just completely commonplace that is only interesting to people that use microservices and never knew what was normal?

mrkurt4y ago

This is how client apps use sqlite, yes. Single instance client apps. Litestream is one method of making sqlite work for server side apps. The hard part on the server is solving for multiple processes/vms/containers writing to one sqlite db.

tlb4y ago

It's normal (and HN does something similar, working from in-process data) for systems that don't have to scale beyond one server. If you need multiple servers you have to do something, such as Litestream.

deepstack4y ago

A few years back I was working on a Java project. We used H2 instead of Postgres and embedded the H2 db in the application for in-memory access. It sped up queries tremendously. There is just no beating an in-application db.

errantmind4y ago

Just wait until (some) devs realize they don't even need sqlite, and can serialize their data directly to binary flat files with simple locking synchronization for backups.

I'm half joking but I've witnessed many devs use databases when a binary file will do. I've done this personally for years for most of my 'fits-in-RAM', non-transactional, denormalized datasets, which is almost all of them.

Better yet, use both if you have both types of data. The performance benefits are enormous and well worth the complexity tradeoff in my experience.

kortex4y ago

That seems exactly opposite to the growing trend of "sqlite-as-application-file-format". There's a lot of nice features you get "for free" doing this, primarily way better consistency, than you do rolling your own binary format.

I don't want to have to deal with locks if at all possible. Binary works fine if each file is atomic, but that does not sound like the case you are advocating.

raxxorraxor4y ago

The usual use case for a database is that it has multiple users in different places which would be difficult with SQLite. But for other use cases I don't see a problem. I don't know how it scales ad infinitum, but you can manage a lot of data with it and the usual SQL server has limits too. Could be a good choice even before you care about latency.

overview4y ago

> Latency is the exact reason you would have a problem scaling any large system in the first place.

Not always. It depends on the architecture and your hosting strategy. I think it’s more likely for an instance of a web app to receive more requests than it can handle, causing the app to not service any requests.

iveqy4y ago

Just the latency is really important to me! I even built an ERP system that has a response time below 100 ms for all operations, it's a design goal.

My thought is that if you can see consumer changes depending on latency (for example on amazon or google) it is equally important for internal tools. Employee time is expensive.

kumarvvr4y ago

Throughput for a single service / app improves, but does it really scale? Across a cluster, you will have to have data replication and sync routines, that are a whole mess themselves.

The latency is not reduced, it is shifted elsewhere.

mekster4y ago

I want functions.

dsincl124y ago· 14 in thread

Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

Why does everyone seem so hyped on this when it can't really work properly IRL? If you have large amounts of data that need to be stored the app would die instantly, or leave all your users waiting for their changes to be saved.

What am I missing?

masklinn4y ago

> Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

"Lots of readers few writers" is an extremely common application property tho. Your average HN has significantly more reads than it has writes, especially if you bump the "worthless" writes (e.g. vote counts) out of using the DB and only flush them to the durable DB once in a while, for instance.

And with SQLite's WAL support it's supported even better: while it still has a singular writer, the writer doesn't block the readers anymore, which is a workload issue in the exclusive case (as the single writer would bring read concurrency down to 0).
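Here is a small sketch of the WAL behaviour described above: one connection holds an open, uncommitted write transaction while a second connection keeps reading the last committed snapshot. The `votes` table and the `wal_demo` and `count_rows` names are invented for the example, and it assumes the database sits on a local (non-network) filesystem.

```python
import sqlite3
import tempfile
from pathlib import Path


def count_rows(conn: sqlite3.Connection) -> int:
    # fetchall() runs the statement to completion, so no read
    # transaction is left open on this connection afterwards.
    return conn.execute("SELECT COUNT(*) FROM votes").fetchall()[0][0]


def wal_demo(db_path):
    writer = sqlite3.connect(db_path)
    # Switch the database file to write-ahead logging.
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    with writer:
        writer.execute("CREATE TABLE votes (item TEXT)")
        writer.execute("INSERT INTO votes VALUES ('a')")

    # This INSERT implicitly opens a write transaction that is
    # deliberately left uncommitted for now.
    writer.execute("INSERT INTO votes VALUES ('b')")

    reader = sqlite3.connect(db_path)
    seen_during_write = count_rows(reader)  # last committed snapshot

    writer.commit()
    seen_after_commit = count_rows(reader)  # now includes the new row

    reader.close()
    writer.close()
    return mode, seen_during_write, seen_after_commit
```

There is still only one writer at a time; a second connection trying to write while the first transaction is open would have to wait or fail with a busy error.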

littlecranky674y ago

Another pattern to avoid "worthless" writes is using statistical sampling. I.e. if you have votes in the >100.000 range, generate a random number p in [0, 1], and only perform a write if p < 0.01 - when reading votes, multiply by 100 etc. Of course, you have to assess individually if it's feasible for your operation, hence the "worthless".
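A minimal sketch of that sampling idea, assuming a hypothetical `sampled_votes` table and a 1-in-100 sampling rate. The result is an estimate, not an exact count, so it only suits counters where being off by a few percent is acceptable.

```python
import random
import sqlite3

SAMPLE_EVERY = 100  # persist roughly 1 vote in 100


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sampled_votes ("
        "item_id INTEGER PRIMARY KEY, hits INTEGER NOT NULL)"
    )


def record_vote(conn: sqlite3.Connection, item_id: int, rng: random.Random) -> bool:
    """Write the vote only with probability 1/SAMPLE_EVERY.
    Returns True if a write actually happened."""
    if rng.random() >= 1 / SAMPLE_EVERY:
        return False  # skipped: no database write at all
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO sampled_votes (item_id, hits) VALUES (?, 0)",
            (item_id,),
        )
        conn.execute(
            "UPDATE sampled_votes SET hits = hits + 1 WHERE item_id = ?",
            (item_id,),
        )
    return True


def estimated_votes(conn: sqlite3.Connection, item_id: int) -> int:
    """Scale the sampled hits back up to an estimate of the true count."""
    row = conn.execute(
        "SELECT hits FROM sampled_votes WHERE item_id = ?", (item_id,)
    ).fetchone()
    hits = row[0] if row else 0
    return hits * SAMPLE_EVERY
```

For small counts the estimate is very noisy (an item with 30 real votes will usually show 0 or 100), which is why this only makes sense in the high-volume range.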

pc864y ago

Is there something you can point to that explains this "flush them to the durable DB once in a while" pattern in more detail?
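One possible reading of that pattern, as a sketch rather than a reference to any particular write-up: keep the counts in process memory and write the accumulated deltas to SQLite in a single transaction every so often. The `BufferedVoteCounter` class, the `votes` table, and the count-based flush threshold are all assumptions made for illustration; a timer-based flush would work the same way.

```python
import sqlite3
from collections import Counter


class BufferedVoteCounter:
    """Accumulate votes in memory and flush them to SQLite in batches."""

    def __init__(self, conn: sqlite3.Connection, flush_every: int = 100):
        self.conn = conn
        self.flush_every = flush_every
        self.pending = Counter()  # item_id -> votes not yet written
        self.unflushed = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS votes ("
            "item_id INTEGER PRIMARY KEY, total INTEGER NOT NULL)"
        )

    def vote(self, item_id: int) -> None:
        self.pending[item_id] += 1
        self.unflushed += 1
        if self.unflushed >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        with self.conn:  # one transaction for the whole batch
            for item_id, delta in self.pending.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO votes (item_id, total) VALUES (?, 0)",
                    (item_id,),
                )
                self.conn.execute(
                    "UPDATE votes SET total = total + ? WHERE item_id = ?",
                    (delta, item_id),
                )
        self.pending.clear()
        self.unflushed = 0

    def durable_count(self, item_id: int) -> int:
        row = self.conn.execute(
            "SELECT total FROM votes WHERE item_id = ?", (item_id,)
        ).fetchone()
        return row[0] if row else 0
```

The trade-off is that a crash loses whatever is still in `pending`, which is why it is only applied to writes you can afford to lose.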

jbverschoor4y ago

- Most transactions are read-only

- "Large" applications can usually be sharded by account. This means 1 file per account, and can easily be put on the most optimal geolocation of the account

- If you defer locking until commit, you can allow multiple writers ( https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi... ). This is good enough for most applications anyway.

- SQLite is simple, fast enough for almost anything, supports a good set of features and datatypes, and is very easy to embed.

tidenly4y ago

Why would I bake all of those assumptions and limitations into my system though just on the hope it won't ever become a problem?

samwillis4y ago

Quite right it’s not one size fits all but for any site that’s mostly read only it’s a brilliant solution.

Simon Willison has written about it and coined the term “baked data”: https://simonwillison.net/2021/Jul/28/baked-data/

Mozilla.org uses this architecture, Django app running off SQLite with the db rsync’ed to each application server.

quickthrower24y ago

The confusion is probably a lot of us work at smaller companies that serve a wide solution to a niche customer, and that kind of app has a lot of reads and writes but doesn't need to scale. This app might be doing the invoicing/shipments/specialist parts of a business for example.

Whereas there is another different kind of Engineering which I probably will never be a part of (simply due to mathematics of available positions doing it) where you are scaling something up for millions of users but the app is much simpler like a Twitter or Reddit, and the challenge is in the scaling.

phaedrus4y ago

The default settings of SQLite are very conservative and essentially enforce serial writes. With tuning and loosening that enforcement, you can go from 50 writes per second to 50,000.

Edit: forgot to mention that yes a major part of that is batching writes into fewer, bigger transactions; AFAIK you can't really get around that.

andai4y ago

https://www.sqlite.org/faq.html#q19

>INSERT is really slow - I can only do few dozen INSERTs per second

>Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.

>By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction.

>Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
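The difference the FAQ describes can be tried with a short script. This sketch (table name, row count, and function names are arbitrary) inserts the same rows twice into an on-disk database: once committing after every INSERT, once inside a single transaction. It reports timings but makes no claim about what they will be on your hardware.

```python
import sqlite3
import tempfile
import time
from pathlib import Path


def insert_one_transaction_each(conn, rows):
    for row in rows:
        with conn:  # commits after every single INSERT
            conn.execute("INSERT INTO t (v) VALUES (?)", row)


def insert_single_transaction(conn, rows):
    with conn:  # one commit for the whole batch
        conn.executemany("INSERT INTO t (v) VALUES (?)", rows)


def compare_insert_strategies(n=200):
    """Return {strategy: (row_count, seconds)} for both approaches."""
    strategies = {
        "per_statement": insert_one_transaction_each,
        "batched": insert_single_transaction,
    }
    rows = [(i,) for i in range(n)]
    results = {}
    for name, insert in strategies.items():
        with tempfile.TemporaryDirectory() as tmp:
            conn = sqlite3.connect(Path(tmp) / "bench.db")
            conn.execute("CREATE TABLE t (v INTEGER)")
            start = time.perf_counter()
            insert(conn, rows)
            elapsed = time.perf_counter() - start
            count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
            conn.close()
            results[name] = (count, elapsed)
    return results
```

Both strategies end up with the same rows; only the number of commits (and therefore the number of times SQLite waits for the disk) differs.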

dagw4y ago

> What am I missing?

Many sites are Read (almost) Only. For sites where users interactively query/view/explore the data, but (almost) never write their own, it works great.

unicornporn4y ago

Speaking of this, I really wish there was SQLite support in WordPress...

beberlei4y ago

use more than one SQLite file? we have one per day and project for example.

smt884y ago

I don't know if you're joking or not, but this would just be reinventing the Postgres/SQL Server/Oracle/MySQL wheel using duct tape and wishes.

If you're doing something that multiple systems have had millions of hours of development to do, just use one of those.

sph4y ago

But why? That seems such an unnecessary hack.

paulhodge4y ago· 8 in thread

Wow Litestream sounds really interesting to me. I was just starting on an architecture, that was either stupid or genius, of using many SQLite databases on the server. Each user's account gets their own SQLite file. So the service's horizontal scaling is good (similar to the horizontal scaling of a document DB), and it naturally mitigates data leaks/injections. Also opens up a few neat tricks like the ability to do blue/green rollouts for schema changes. Anyway Litestream seems pretty ideal for that, will be checking it out!

mwcampbell4y ago

An architecture like yours has certainly been done before, though AFAIK it never went mainstream. In particular, check out this post from Glyph Lefkowitz of Twisted Python fame, particularly the section about the (apparently dead) Mantissa application server:

https://glyph.twistedmatrix.com/2008/06/this-word-scaling.ht...

hantusk4y ago

Same pattern is ActorDB: https://github.com/biokoda/actordb

freedomben4y ago

I actually did something very similar to this for an app that produced a lot of data. I wrote a small middleware that automatically figured out which shard to use so the app logic could pretend that it was all just one big db. The app ultimately ended up in the can so it never needed to scale, but I always wonder how it would have gone.

Scarbutt4y ago

> Each user's account gets their own SQLite file.

So now you need one database connection per user...

tptacek4y ago

And? It's SQLite; it's a file handle and some cache, not a connection pool.

mwcampbell4y ago

Depending on how you define "account", that can be quite reasonable. In a B2B application, each business customer could get their own SQLite database, and the number of SQLite connections would likely be quite manageable, even though some customers have many users.

freedomben4y ago

Without knowing details about the app, it's hard to know if that would matter. If only a small number of concurrent users would ever be using it, I would think it would be NBD.

robertlagrant4y ago

If by connection you mean in-process database.

swaraj4y ago· 8 in thread

Looks v cool, but I feel like I'm missing a big part of the story, how do 2 app 'servers/process' connect to same sqlite/litestream db?

Do you 'init' (restore) the db from each app process? When one app makes a write, is it instantly reflected on the other app's local sqlite?

judofyr4y ago

Each server would have one copy of the SQLite database. Only one of the servers would support writes, and those writes will be replicated to the other server. Reads in the other server will be transactionally safe, but might be slightly out of date.

zepolen4y ago

I don't think you understand what transactionally safe means. SQLite used in this manner is not a database, it's a cache. Thinking otherwise will give you a bad time when the value you're writing is based on the stale value you read.
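The failure mode is easy to reproduce with two databases standing in for a primary and a lagging replica. Everything here (the `account` table, the helper names, the balances) is invented for illustration, and replication itself is not modelled; the replica is simply a copy that has not yet seen the latest write.

```python
import sqlite3


def make_db(balance: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, ?)", (balance,))
    conn.commit()
    return conn


def balance(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


def deposit_from_stale_read(primary, replica, amount: int) -> None:
    # Anti-pattern: read from the (possibly stale) replica,
    # compute the new value in the app, then write it to the primary.
    stale = balance(replica)
    with primary:
        primary.execute(
            "UPDATE account SET balance = ? WHERE id = 1", (stale + amount,)
        )


def deposit_on_primary(primary, amount: int) -> None:
    # Safer: let the primary do the read-modify-write in one statement.
    with primary:
        primary.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 1", (amount,)
        )
```

If the primary already holds 150 while the replica still shows 100, `deposit_from_stale_read(primary, replica, 10)` leaves the primary at 110 and silently discards the earlier 50, whereas `deposit_on_primary(primary, 10)` yields 160.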

swaraj4y ago

This is my main q: are the writes replicated in real-time? Do the apps that just need read access have to repeatedly call 'restore'?

johnrrk4y ago

I also investigated SQLite and it's not clear how we can use it with multiple servers.

The WAL documentation [1] says "The wal-index greatly improves the performance of readers, but the use of shared memory means that all readers must exist on the same machine. This is why the write-ahead log implementation will not work on a network filesystem."

So it seems that we can't have 2 Node.js servers accessing the same SQLite file on a shared volume.

I'm not sure how to do zero downtime deployment (like starting server 2, checking it works, and shutting down server 1, seems risky since we'll have 2 servers accessing the same SQLite file temporarily)

[1] https://sqlite.org/wal.html

tptacek4y ago

The point of Litestream is that you don't have multiple servers accessing the same SQLite file. They all have their own SQLite databases. Of course, you only write to one of them, but that's a common constraint for database clusters.

gizzlon4y ago

> I'm not sure how to do zero downtime deployment

AFAIK, you either:

1) Don't, and eat a few seconds of downtime (f.ex if the clients re-try in the background, or..)

2) Start two processes on the same machine (believe that's always safe)

3) Share the database over the network in a way that's safe with sqlite3. Think it's possible, but at this point things are getting too complicated to be worth it IMO.

thruflo4y ago

Also how does the WAL page based replication maintain consistency / handle concurrent updates?

infogulch4y ago

It doesn't, this gives you a read-only replica only.

wasd4y ago· 6 in thread

Fly is putting together a pretty great team and interesting tech stack. It's the service I see as a true disruptor to Heroku because it's doing something novel (not just cheaper).

I'm still a little murky on the tradeoffs with Fly (and litestream). @ben / @fly, you should write a tutorial on hosting a todo app using rails with litestream and any expected hurdles at different levels of scale (maybe comparing to Heroku).

the_biot4y ago

If only they could keep their website reachable, that would be the icing on the cake. Like every time I see them linked on HN, I click and cannot connect to their website.

Last time somebody from fly said they'd look into it, but alas. It was related to IPv6 on their end, as far as I could tell.

mrkurt4y ago

We have been chasing this down for weeks and can't find the actual bug/workaround here. It's definitely IPv6 related, we think having something to do with weird MTUs. Are you using an IPv6 tunnel or connecting via a vpn by chance?

quickthrower24y ago

How does Vercel fit in? I am having a lot of pleasure using their free tier and would be happy to pay if needed. My only concern is the pricing model being 0/20/Call us. I think clear usage-based pricing plans going 0-infinity should be the norm.

purplerabbit4y ago

Render is more of the successor IMO. Fly is a bit of a wildcard — they are bleeding edge, certainly, but they seem to shy away from focusing on implementation of some of the “boring” but extremely useful features present in most managed services (e.g., scaling volumes for Postgres)

michaeldwan4y ago

We're not shying away from "boring" stuff at all. We just have a small team with bigger priorities that's spread too thin. There's a million things like resizable volumes we need to ship and we're aggressively hiring to get them done.

tptacek4y ago

There's no one "successor to Heroku". The successor to Heroku is a collection of different companies that work well together. What's important is the Heroku idea of what an application is, as a developer-first prospect rather than an ops-first prospect like Kubernetes running on a cloud platform.

tiffanyh4y ago· 6 in thread

@dang, the actual title is “I'm All-In on Server-Side SQLite”

Maybe I missed it but where in the article does it say Fly acquired Litestream?

EDIT: Ben Johnson says he just joined Fly. Nothing about Fly “acquiring” Litestream.

https://mobile.twitter.com/benbjohnson/status/15237489883352...

dang4y ago

Elsewhere in this thread he says "the project was acquired" which is more or less "Fly.io Buys Litestream" (the submitted title).

I'm honestly not sure whether we should change it or not - minimizing complaints is the goal - what's it called when a function has two points that it keeps unstably jumping between?

mrkurt4y ago

that function is correct when it agrees with me.

lnsp4y ago

> Litestream has a new home at Fly.io, but it is and always will be an open-source project. My plan for the next several years is to keep making it more useful, no matter where your application runs, and see just how far we can take the SQLite model of how databases can work.

As far as I understood it, Fly.io hired the person working on Litestream and pays them to keep working on Litestream.

tiffanyh4y ago

That’s how I understood it and that’s radically different than how this HN post got titled.

Ben Johnson confirms how you framed it here:

https://mobile.twitter.com/benbjohnson/status/15237489883352...

gamblor9564y ago

"Litestream has a new home at Fly.io, but it is and always will be an open-source project"

Very bottom of the post. Technically, Litestream remains an open-source project, so it's more accurate to say that Fly.io acquired the brand IP and the owner of that IP.

bussetta4y ago

The tweet[1] links the blog post and says Litestream is part of fly.io now.

[1]https://twitter.com/flydotio/status/1523743433109692416

anyfactor4y ago· 5 in thread

Story time!

A client told me that they will use a DigitalOcean droplet for a web app. Because the database was very small I chose to use SQLite3.

After delivery, the client said their devops guy wasn’t available, so they would like to deploy to Heroku. Heroku, being an ephemeral cloud service, couldn’t handle the same-directory SQLite3 db I had there. The only solution was to use their Postgres database service.

For some reason, it was infuriating that I had to use a database like that to store a few thousand rows of data. Moreover, I would have had to rewrite a ton of stuff to accommodate the change to Postgres.

I ended up using firestore.

---

I think something like this could have saved me a ton of hassle that day.

luhn4y ago

It was too much work to migrate from SQLite to PostgreSQL, so you migrated to... a NoSQL DB?

pjot4y ago

I think they’re referring to the trade from managing one system (DO + SQLite) to two (Heroku + pg), and choosing Firestore instead as it’s only one system to manage.

szundi4y ago

He wrote it was a “day” at the end. This guy is fast.

me_me_mu_mu4y ago

Please let me know if you’ve ever had to move data out of firestore. I’m currently using firestore for some real time requirements but the data is written to Postgres before the relevant data for real time needs (client needs to show some data updating constantly) is written to firestore.

Just curious if you’ve ever had to migrate data out of firestore.

somishere4y ago

Migrating data out of firestore is a bit tedious, but not difficult. It just requires a lot of iteration. That said, if I were simply looking for realtime updates for a subset of my data (and was determined to use the firebase system) I would go for realtimedb over firestore every time. It's much simpler, cheaper, and export, when necessary, is straightforward.

ignoramous4y ago· 5 in thread

Looking forward to ditching my PlanetScale plans for this!

> ...people use Litestream today is to replicate their SQLite database to S3 (it's remarkably cheap for most SQLite databases to live-replicate to S3).

Cloudflare R2 would make that even cheaper. Cloudflare set to open beta registration this week.

And if you squint just enough, you'd see R2, S3 et al are NoSQL KV stores themselves, masquerading as disk drives, and used here to back up a SQL db...

> My claim is this: by building reliable, easy-to-use replication for SQLite, we make it attractive for all kinds of full-stack applications to run entirely on SQLite.

Disruption (? [0]) playing out as expected? That said, the word reliable is doing a lot of heavy lifting. Reliability in distributed systems is hard (well... easy if your definition of reliability is different ;) [1])

> And if you don't need the Postgres features, they're a liability.

Reminds me of WireGuard, and how it accomplishes so much more by doing so much less [2].

Congratulations Ben (but really, could have taken a chance with heavybit)!

----

[0] https://hbr.org/2015/12/what-is-disruptive-innovation

[1] God help me, the person on the orange site saying they need to run Jepson tests to verify Litestream WAL-shipping. Stand back! You don’t want to get barium sulfated!, https://twitter.com/tqbf/status/1510066302530072580

[2] "...there’s something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case. And that made it something really powerful to build on top of", https://www.lastweekinaws.com/podcast/screaming-in-the-cloud...

chloerei4y ago

> Cloudflare set to open beta registration this week.

Any source? I've been waiting for a long time.

jgrahamc4y ago

That's correct. Tomorrow. R2 open beta and a hell of a lot more.

ignoramous4y ago

May 11: https://archive.is/2u5Rt

unmole4y ago

> Cloudflare R2 would make that even cheaper.

Cloudflare R2 has free egress. The read and write operations themselves are not that much cheaper than S3.

ignoramous4y ago

Ah, you're right! Litestream isn't likely to be egress/ingress heavy...

foodstances4y ago· 5 in thread

Just curious, is there any financial compensation/support going to Richard Hipp with all of this money changing hands?

When I see these startups making a business that is so heavily based on open-source software (like Tailscale on top of Wireguard), I have to wonder what these companies do to actually support the author(s) of the software that so much of their company is based on.

mrkurt4y ago

Yes. We (Fly.io) are buying a sqlite support agreement. We also send money WireGuard's way. I'm pretty sure Tailscale does too.

We have also given OSS authors advisor equity. A couple of folks wrote libraries that were important to keeping us going, and we've granted them shares the same way some startups would to MBA advisors.

defen4y ago

> We have also given OSS authors advisor equity

That's a fantastic idea. In retrospect it's a really obvious idea but I've never heard of anyone doing it before. Is this a common thing that I'm just oblivious to?

foodstances4y ago

That's great to hear, thank you!

qbasic_forever4y ago

I agree Richard Hipp should be compensated, but he explicitly releases SQLite into the public domain: https://www.sqlite.org/copyright.html Not Apache, not MIT, not GPL... public domain. You can do almost anything with it and not be beholden to any demands. You can tell people you built your business on SQLite... or not. It's public domain.

That said SQLite has a business model of selling support and premium features like encryption: https://www.sqlite.org/prosupport.html

foodstances4y ago

Sure, but Apache, MIT, and GPL licenses don't require payment to the author either. That's why it's up to the company to decide to offer compensation without being required to, and why I'm curious which companies actually do it.

It's like when Red Hat went public and offered pre-IPO stock to open source developers.

kall4y ago· 4 in thread

I am as obsessed with sub 100ms responses as the people at fly.io, so I think the one writer and many, many readers architecture is smart and fits quite a few applications. When litestream adds actual replication it will get really exciting.

> it won't work well on ephemeral, serverless platforms or when using rolling deployments

That's... a lot of new applications these days.

mwcampbell4y ago

> it won't work well on ephemeral, serverless platforms or when using rolling deployments

I assumed that was what Fly was hiring Ben to work on.

mrkurt4y ago

Yes. Yes it is.

emptysea4y ago

Yeah the rolling deployments gotcha really stuck out to me. I think most PaaS will provide that by default anyways because who wants downtime during deploys?

mwcampbell4y ago

mrkurt specifically mentioned that a solution for that is in the works. https://news.ycombinator.com/item?id=31319544

otoolep4y ago· 4 in thread

Congratulations to Ben! This project has been like a rocket ship.

wolfhumble4y ago

I always thought that SQLite was kind of operating in stealth mode. Everyone was talking nicely about it, but it lacked a few things, so it was a "super DB" but not in the "big boys league". And now it is taking off and the other DBs are saying "you here?", and SQLite goes "Yup, bye bye" ;-)

This is really useful and fun, thanks! Godspeed on this new part of the journey!

benbjohnson4y ago

Thanks, Philip!

Loic4y ago

Thank you Ben.

We have a small server[0], running since 2016, pushing a great amount of data incredibly fast, with BoltDB as backend. In the past two months we have been restructuring it to use SQLite, it will come online with more data in June. It looks like we are going to continue using your software... knowing first hand the quality of BoltDB, I will have no problems trusting your work with SQLite!

[0]: https://www.chemeo.com/search?q=methane

abrookewood4y ago

Hey Ben, any chance you can sit next to Chris McCord and get SQLite support in Phoenix :)

no_wizard4y ago· 3 in thread

This is a great and interesting offering! I think this fits well with fly.io and their model of computing.

I now wish that I had engaged with an idea very similar to litestream that I had about a year and a half ago. I always thought SQLite just needed a distribution layer to be extremely effective as a distributed database of sorts. Its flat file architecture means it's easy to provision, restore and back up. SQLite also has incremental snapshotting and reproducible WAL logs that can be used to do incremental backups, restores, writes etc. It just needs a "frontend" to handle those bits. Latency has gotten to the point where you can replicate a database by propagating its continued snapshots out to object / blob storage (which is, on a high level, what litestream appears to be doing). You could even achieve brute force consensus with this approach if you ran it in a truly distributed way (though RAFT is probably more efficient).

Reason I didn't do this? I thought to myself - why in the world in 2020 would someone choose to use SQLite at scale instead of something like Firebase, Spanner, Fauna, or even Postgres? So after I did an initial prototype (long gone, never pushed it to GitHub) I just felt like...there was no appetite for it.

Now I regret it!

Just a long winded way of saying, congrats! This is awesome! Thanks for doing exactly what I wanted to do but didn't have the guts to follow through with.

evntdrvn4y ago

there’s some stuff out there:

- https://github.com/rqlite/rqlite
- https://github.com/chiselstrike/chiselstore
- https://dqlite.io/

I’m sure there’s more, those are just the ones I remember.

epilys4y ago

I implemented exactly this setup, in Rust, last year for a client. Distributed WAL with write locks on a RAFT scheme. Custom VFS in Rust for sqlite3 to handle the IO. I asked the client to opensource it but it's probably not gonna happen... It's definitely doable though.

ComputerGuru4y ago

Did you write your own rust raft implementation or reuse something already available?

NeutralForest4y ago· 3 in thread

There's something I don't understand: it says that the "data is next to the application". What does that mean? Where is it stored and how is it accessed by the application?

tptacek4y ago

The data lives in a file the application reads/writes directly (and in a cache that the sqlite libraries can park inside the application itself). The point is that you're not calling out over the network to a "database server"; your app server is the database server.

NeutralForest4y ago

Thanks for the explanation!

ledauphin4y ago

it means the data is stored in a file on the local drive of a computer that is also running the application.

it also means that it is the application itself (via the SQLite library) that reads and modifies that database file. There is no separate database process.
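
A minimal Python sketch of what that looks like, using the standard-library `sqlite3` module. The file name `app.db` and the `notes` table are made up for illustration; the point is only that "connecting" opens a local file inside the application's own process.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the database file; any local path works.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

# "Connecting" opens a local file inside this process: no server, no socket.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")
with conn:  # commits on success, rolls back on an exception
    conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))

# Queries are library calls that read the file (or the in-process page cache).
rows = conn.execute("SELECT id, body FROM notes").fetchall()
conn.close()

# The whole database is an ordinary file sitting next to the application.
file_exists = os.path.exists(db_path)
print(rows, file_exists)
```

Every call here is a function call into the SQLite library linked into the Python process, which is why there is no network round trip to measure.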

jchw4y ago· 2 in thread

This is interesting! I like using Fly.io today, but I’m currently using a single node for most stuff with SQLite. Having some kind of failover and replication would be pretty awesome. I have yet to try Litestream and it does sound like there’s some issues to work out that could be pretty nasty, but I’ll definitely be watching.

Fly.io is very nice. It’s what I hoped Hyper.sh would be, except it isn’t dead. That said, there are a couple things I worry about… like, there’s no obvious way to resize disks, you pretty much need to make a new disk that’s larger, launch a new instance with it mounted, and transfer data from an existing instance. If it was automated, I probably wouldn’t care, though a zero downtime way of resizing disks would be a massive improvement. Another huge concern is just how good the free tier is. I actually am bothered that I basically don’t get billed. Hyper.sh felt a bit overpriced, and by comparison Fly.io does scale up in price but for small uses it feels like theft.

michaeldwan4y ago

> there’s no obvious way to resize disks

Yes, this sucks right now. Resizable disks are on our list, we just need somebody to spend a few days on it. Luckily we're hiring platform engineers [1] to work on fun problems like that.

> I actually am bothered that I basically don’t get billed.

We actually had a bug that skipped charging a bunch of accounts. :) Regardless, we're not overly concerned about making $1/mo from small accounts. Large customers more than make up for it. Turns out building something devs _choose_ to use in their free time often leads to using it at work too.

[1] https://fly.io/jobs/platform-product-engineer/

ignoramous4y ago

> Yes, this sucks right now.

If I may, you really need to hire sudhirj back or get someone doing the tedious work of answering dumb/advanced questions in the forums and doing follow-ups! Even if it doesn't scale, this high-touch forum engagement may not only help inform the product roadmap but also help eventually cultivate a stronger community.

tyingq4y ago· 2 in thread

Dqlite is also interesting, and in a similar space. It seems to have evolved from the LXC/LXD team wanting a replacement for etcd. It's SQLite with Raft replication and also a networked client protocol.

https://dqlite.io/docs/architecture

tptacek4y ago

There's also rqlite. There's definitely a place for this kind of stuff. But we already use a bunch of stuff that does distributed consensus in our stack, and the experience has left us wary of it, especially for global distribution. We almost used rqlite for a statekeeping feature internally, but today we'd certainly just use sqlite+litestream for the same kinds of features, just because it's easier to reason about and to deal with operationally when there's problems.

https://fly.io/blog/a-foolish-consistency/

otoolep4y ago

rqlite author here. Anything else you can tell me about why you decided against it? Just simpler, as you say, to avoid a distributed system when you can (something I understand).

obiwanpallav14y ago· 2 in thread

In which scenario would you use litestream[1] vs rqlite[2]?

1 - https://github.com/benbjohnson/litestream

2 - https://github.com/rqlite/rqlite

otoolep4y ago

rqlite author here. The way I think about it is that both systems add reliability to SQLite, but in addition rqlite also offers high-availability. Another important difference is that Litestream does not require you to change how your application interacts with the SQLite database, but rqlite does.

Another way I think about it (I'm sure Ben may have other ideas!) is that if you want to add a layer of reliability to a SQLite-based application, Litestream will work very well and is quite elegant. But if you have a set of data that you absolutely must have access to at all times, and you want to store that data in a SQLite database, rqlite could meet your needs.

Check out the rqlite FAQ for more.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#How-...

benbjohnson4y ago

Litestream author here. I agree with Philip. Litestream relaxes some guarantees about durability and availability in order to make it simpler from an operational perspective. I would say the two projects generally don't have overlap in the applications they would be used for. If your application is ok with the relaxed guarantees of Litestream, it's probably what you want. If you need stronger guarantees, then use rqlite.

mwcampbell4y ago· 2 in thread

Congratulations to Ben on getting a well-funded player like Fly to buy into this vision. I'm looking forward to seeing a complete, ready-to-deploy sample app, when the upcoming Litestream enhancements are ready.

I know that Fly also likes Elixir and Phoenix; they hired Chris McCord, after all. So would it make sense for Phoenix applications deployed in production on Fly to use SQLite and Litestream? Is support for SQLite in the Elixir ecosystem, particularly Ecto, good enough for this?

warmwaffles4y ago

> Is support for SQLite in the Elixir ecosystem, particularly Ecto, good enough for this?

Why yes it is. I maintain the `exqlite` and `ecto_sqlite3` libraries and it was just integrated in with `kino_db` which is used by `livebook`.

https://github.com/elixir-sqlite/exqlite

lawik4y ago

I still love you for making this happen.

netcraft4y ago· 2 in thread

This is similar to what I hoped websql had eventually grown into. sqlite in the browser, but let me sync it up and down with a server. Every user gets their own database, the first time to the app they "install" the control and system data, then their data, then writes are synced to the server. If it became standard, it could be super easy - conflict resolution notwithstanding.

bambax4y ago

You can make webapps using exactly this approach, with json in localstorage as the client db, and occasional, asynchronous writes to the server. I'm now building a simple webapp exactly like this, and the server db is sqlite. So far it works perfectly fine.

netcraft4y ago

In my experience the size limitations of localstorage keep this from really being viable. And I just really like SQL. But your point is well taken, it is possible to do it today. My hope back then was that there would be libraries over it that would have made it easy and commonplace.

_vufv4y ago· 2 in thread

I absolutely love this. I think so called n-tier architecture as a pattern should be aggressively battled in the attempt to reduce the n. Software is so much more reliable when the communication between different computational modules of the system are function calls as opposed to IPC calls. Why does everything that computes something or provides some data need to be a process? It doesn't.

Postgresql and every other server/process should have first class support for a single CLI command that: spins up the DB that slurps up the config and the data storage, takes the SQL command provided through the CLI arguments, runs it, returns results and terminates. Effectively, every server/process software should be a library first, since it's easy to make a server out of a library and the reverse is anything but.

jjeaff4y ago

If you want to maintain much of the data in memory, wouldn't that require a process?

_vufv4y ago

Sure. If you need your software to be a process I think you should build it to be both: a library first and a process second. Libraries are so much easier to use, test and reason about.

melony4y ago· 2 in thread

Note that the popular Node.js ORM Prisma does not support WAL.

https://github.com/prisma/prisma/issues/3303

tylergetsay4y ago

It also crashes if you try to write to the DB while it's open https://github.com/prisma/prisma/issues/2955

LAC-Tech4y ago

Best option for SQLite with Node is this.

https://github.com/JoshuaWise/better-sqlite3

Author is all over the issues section, and seems very knowledgeable about how SQLite works.

beck54y ago· 2 in thread

I have found it easy to overload SQLite with too many write operations (20+ concurrently). Is this the typical behaviour referred to in the post, or is it just a write-heavy workload?

benbjohnson4y ago

It can depend on a lot of factors such as the journaling mode you're using as well as your hardware. SQLite has a single-writer-at-a-time restriction so it's important to manage the size of your writes. I typically see very good write throughput using WAL mode and synchronous=normal on modern SSDs.
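
For anyone who wants to try those settings, here is a small sketch with Python's standard `sqlite3` module. The path, the `events` table, and the batch size of 1000 are arbitrary choices for illustration, not recommendations; the two PRAGMAs are the ones mentioned above.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "app.db")  # hypothetical path
conn = sqlite3.connect(db_path)

# PRAGMA journal_mode returns the mode that is actually in effect afterwards.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA synchronous=NORMAL")
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1 means NORMAL

conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Only one writer runs at a time, so group many small writes into one
# short transaction instead of committing each row separately.
batch = [("event-%d" % i,) for i in range(1000)]
with conn:
    conn.executemany("INSERT INTO events (payload) VALUES (?)", batch)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
print(journal_mode, synchronous, count)
```

Note that WAL mode is stored in the database file, while `synchronous` is a per-connection setting and has to be set again on every new connection.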

Scarbutt4y ago

How big are the writes? are you storing blobs?

endisneigh4y ago· 2 in thread

What’s an example of a popular app (more than 100K users) that uses Litestream? Curious to see what this looks like in production

benbjohnson4y ago

Litestream author here. That's a good question. There's not very good visibility into open source usage so it's hard to say unless folks write blog posts about it. For example, I know Tailscale runs part of their infrastructure with SQLite & Litestream[1].

I wrote a database called BoltDB before and I have no idea how widespread it is exactly. It's used in a lot of open source projects like Consul & etcd but I don't know anything about non-public usage.

[1]: https://tailscale.com/blog/database-for-2022/

jkaplowitz4y ago

Tailscale: https://tailscale.com/blog/database-for-2022/

I don't know their user count, but they are growing well and just raised their Series B.

seanwilson4y ago· 2 in thread

SQLite uses dynamic types? Is this an issue in practice, especially for large apps? Don't you lose guarantees about your data which makes it messy to handle on the backend?

Context from https://www.sqlite.org/datatype3.html: "SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statements that work on statically typed databases work the same way in SQLite. However, the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases. Flexible typing is a feature of SQLite, not a bug."

ripley124y ago

You can use SQLite in strict mode if you prefer. https://www.sqlite.org/stricttables.html
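
A quick way to see the difference, sketched with Python's standard `sqlite3` module. The table names `loose` and `tight` are made up. STRICT tables need SQLite 3.37.0 or newer, so the sketch checks the linked library version before trying them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ordinary table: the declared type is only an affinity, so text that
# cannot be converted to an integer is simply stored as text.
conn.execute("CREATE TABLE loose (n INTEGER)")
conn.execute("INSERT INTO loose VALUES ('not a number')")
loose_type = conn.execute("SELECT typeof(n) FROM loose").fetchone()[0]

# STRICT table: the same insert is refused (only tried on SQLite >= 3.37.0).
strict_supported = sqlite3.sqlite_version_info >= (3, 37, 0)
strict_rejected = None  # stays None when STRICT is unavailable
if strict_supported:
    conn.execute("CREATE TABLE tight (n INTEGER) STRICT")
    try:
        conn.execute("INSERT INTO tight VALUES ('not a number')")
        strict_rejected = False
    except sqlite3.DatabaseError:
        strict_rejected = True
conn.close()
print(loose_type, strict_supported, strict_rejected)
```

So the guarantees are opt-in per table rather than lost altogether.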

aliswe4y ago

This sounds like schemalessness to me? serious "question".

nojvek4y ago· 2 in thread

Somebody needs to build litestream for duckdb (columnstore oriented sqlite like db).

That would be epic. DuckDB speed is crazy fast when it comes to aggregate/analysis queries.

viktour194y ago

Yes please!

safehell4y ago

DuckDB doesn't need an equivalent to litestream, it already has Parquet file and object storage support.

jasfi4y ago· 2 in thread

Is there a good DB admin GUI that supports both SQLite and Postgres?

jwaterhouse4y ago

You mean like DBeaver?

https://dbeaver.io/

jpcapdevila4y ago

I love Datagrip by jetbrains.

0xbadcafebee4y ago· 1 in thread

Things I would like in a database:

- All changes stored as diff trees with signed cryptographic hashes. I want to check out the state of the world at a specific commit, write a change, a week later write another change, revert the first change 3 weeks later. And I want it atomic and side-loaded with no performance hit or downtime.

- Register a Linux container as a UDF or stored procedure. Use with pub/sub to create data-adjacent arbitrary data processing of realtime data

- Fine-grained cryptographically-verified least-privilege access control. No i/o without a valid short-lived key linked to rules allowing specific record access.

- Virtual filesystem. I want to ls /db/sql/SELECT/name/IN/mycorp/myproduct/mysite/users/logged-in/WHERE/Country/EQUALS/USA. (Yes, this is stupid, but I still want it. I don't want to ever have to figure out how to connect to another not-quite-compatible SQL database again.)

sa464y ago

> I want to check out the state of the world at a specific commit, write a change, a week later write another change, revert the first change 3 weeks later.

This is sort of like temporal tables but with the ability to branch from a previous point of history. I'm not sure it would play well with foreign keys.

You could branch the entire database with either point-in-time recovery or with the file-system using ZFS. Postgres.ai turned this into a product.

splitrocket4y ago· 1 in thread

There are a couple of interesting options in a similar space: BedrockDB ( https://bedrockdb.com/ ) Dqlite ( https://dqlite.io/ ) Rqlite ( https://github.com/rqlite/rqlite )

I'm interested in how this performs and particularly, what are the tradeoffs relative to the other options above.

benbjohnson4y ago

Litestream author here. The tl;dr is that Litestream trades operational complexity for reduced durability guarantees and increased write performance. Those 3 options mentioned use distributed consensus to ensure higher durability but that consensus also takes time so writes can be slowed. Litestream is an async replication tool so you can have a configurable window (1 second by default) where you could lose data if you have a catastrophic failure.

LunaSea4y ago· 1 in thread

I wonder if we'll ever see an embedded version of PostgreSQL?

nicoburns4y ago

That's basically what SQLite is (notably, SQLite makes an effort to be compatible with Postgres's SQL syntax). If you mean based off the actual PostgreSQL codebase, then I highly doubt it.

learndeeply4y ago· 1 in thread

Since both Fly.io and Litestream founders are here - why not disclose the price?

benbjohnson4y ago

Litestream author here. I just posted it as a reply here: https://news.ycombinator.com/item?id=31319556

downut4y ago· 1 in thread

(I am attempting my first "as much as possible make the database do the work" app right now, after 35 years in the business. Yeah I started out on the scientific side, and then the sort of things SQLite is obviously great for.)

I do not understand how one implements the multi-role access system on top of SQLite that postgresql gives you for free.

Other than do it from scratch (eeek!) on the app side.

Just as an example, think of the smallest db backed factory situation you can imagine... as small as you like. There will need to be multiple roles if more than one role accesses the database tables.

tptacek4y ago

I spent from 2005 to 2020 doing almost nothing but vulnerability research, where the modal client project was a SAAS-type app, and my experience is that only a tiny fraction of companies building on Postgres actually use Postgres authorization features. It's far more typical to build this logic into the application than to build off the database's authorization features.

Nevertheless, if you're building an app that takes advantage of database auth features, that's a powerful reason to keep on using Postgres. You actually have one of the major problems Postgres solves for!
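
To make "build this logic into the application" concrete, here is a deliberately tiny Python sketch. The roles, the `work_orders` table, and the helper names are all hypothetical; a real app would tie this to its own session and user model.

```python
import sqlite3

# Hypothetical roles and permissions, kept in the application, not the database.
PERMISSIONS = {
    "operator": {("work_orders", "read")},
    "supervisor": {("work_orders", "read"), ("work_orders", "write")},
}

def require(role, table, action):
    """Raise PermissionError unless the role may perform the action on the table."""
    if (table, action) not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} may not {action} {table}")

def add_work_order(conn, role, title):
    require(role, "work_orders", "write")
    with conn:
        conn.execute("INSERT INTO work_orders (title) VALUES (?)", (title,))

def list_work_orders(conn, role):
    require(role, "work_orders", "read")
    return [row[0] for row in conn.execute("SELECT title FROM work_orders ORDER BY id")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_orders (id INTEGER PRIMARY KEY, title TEXT)")
add_work_order(conn, "supervisor", "replace belt")
try:
    add_work_order(conn, "operator", "skip inspection")
    operator_blocked = False
except PermissionError:
    operator_blocked = True
titles = list_work_orders(conn, "operator")
print(operator_blocked, titles)
```

The tradeoff is the obvious one: every code path that touches the database has to go through helpers like these, whereas database roles are enforced no matter which client connects.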

ok_dad4y ago· 1 in thread

I was just about to start using this for a project, I hope the license won’t change.

Congrats to the author though, no matter what! I wish everyone could be so successful.

benbjohnson4y ago

Litestream author here. It'll continue to be open source under an Apache 2 license.

ilrwbwrkhv4y ago· 1 in thread

For how much?

benbjohnson4y ago

Litestream author here. I've been on the fence about disclosing the amount. I'm generally open about everything but I know some people get weird about money stuff. I'm also autistic so I tend to not navigate social norms very well. That all being said, the project was acquired for $500k.

kondro4y ago· 1 in thread

Curious about the costs of this. Wouldn't it cost at least $13/month just in PutObject request costs to replicate Sqlite to S3 at the default of 1 sync per second? Or is it smart enough to only sync if there have been additions to the WAL?

jpcapdevila4y ago

It only PUTs if there's writes.
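
The back-of-the-envelope math, as a Python sketch. The $0.005 per 1,000 PUT requests figure is an assumed S3 Standard list price, so check current pricing; `active_fraction` is a made-up knob for the share of seconds in which the database actually sees a write.

```python
# Assumed list price for S3 Standard PUT requests (verify against current pricing).
PUT_PRICE_PER_1000 = 0.005  # USD
SECONDS_PER_DAY = 24 * 60 * 60

def monthly_put_cost(puts_per_second, active_fraction=1.0, days=30):
    """Estimated monthly PUT cost; active_fraction is the share of seconds with writes."""
    requests = puts_per_second * SECONDS_PER_DAY * days * active_fraction
    return requests / 1000 * PUT_PRICE_PER_1000

always_writing = monthly_put_cost(1)                     # a write in every second
mostly_idle = monthly_put_cost(1, active_fraction=0.05)  # writes in 5% of seconds
print(round(always_writing, 2), round(mostly_idle, 2))
```

A database written to in every second of the month lands at roughly the $13 in the question; anything less busy pays proportionally less.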

daniel_iversen4y ago· 1 in thread

For most people's purposes I'd assume that ease of use, ease of maintenance, relatively good speed, safety, documentation, feature richness and scalability are important. I like SQLite, and while it's cool that they've fixed some big things around safety and clustering, it still seems like a "below-bare-minimum" choice for a lot of production systems, or am I just being old school? MariaDB (/ MySQL) really has a whole lot of good features that I thought would just make it a safer choice? What do people think and why?

jpcapdevila4y ago

Could you elaborate on the features that would make MySQL a safer choice?

coliveira4y ago· 1 in thread

What I find interesting is that people write articles about technology architectures without even bothering to try the said architecture. I would be very interested in reading from someone who actually used sqlite in a large scale application in the way he described, and who can then tell what worked or not in this setup. Until then, this article is nothing more than a proposal, another kind of vaporware.

case4y ago

Fwiw, Tailscale has done this, and written about it:

https://tailscale.com/blog/database-for-2022/

zsims4y ago· 1 in thread

> It was reasonable to overlook this option 170 years ago, when the Rails Blog Tutorial was first written.

Woah. Rails is really old

faitswulff4y ago

This whole article is written in an amusing way. It was really easy reading.

jrochkind14y ago

While the title is about a business acquisition, the article is mostly about the technology itself -- replicating SQLite, suggested as a superior option to a more traditional separate-process rdbms, for real large-scale production workloads.

I'd be curious to hear reactions to/experiences with that suggestion/technology, inside or outside the context of fly.io.

mtlynch4y ago

Super cool! Congrats, Ben!

I've been building all of my projects for the last year with SQLite + fly.io + Litestream. It's already such a great experience, but I'm excited to see what develops now that Litestream is part of fly.

aidenn04y ago

The SQLite team has done a good job over the years establishing an ethos (in the rhetorical sense) of writing reliable software. The degree to which this can transfer to Litestream is the degree to which Litestream is intrusive on the SQLite code.

Another way of saying it: I trust the SQLite team's statements of stability for SQLite because of history and a track record of following stringent development processes. The same is not true of the Litestream team. Does anybody know how much any potential damage introduced by the Litestream code could affect the integrity of my data on disk? Obviously replication added by Litestream will be only as good as the Litestream team makes it, but to what degree is the local data-store affected?

kgeist4y ago

>But database optimization has become less important for typical applications. <..> As much as I love tuning SQL queries, it's becoming a dying art for most application developers.

We thought so, too, but as our business started to grow, we had to spend months, if not years, rewriting and fine-tuning most of our queries because every day there were reports about query timeouts in large clients' accounts... Some clients left because they were disappointed with performance. Another issue is growing the development team. We made the application stateless so we can spin up additional app instances at no cost, or move them around between nodes, to make sure the load is evenly distributed across all nodes/CPUs (often a node simply dies for some reason). Since they are stateless, if an app instance crashes or becomes unstable, nothing happens, no data is lost, it's just restarted or moved to a less busy node. DB instances are now managed by the SRE team which consists of a few very experienced devs, while the app itself (microservices) is written by several teams of varying experience and you worry less about the app bringing down the whole production because microservice instances are ephemeral and can be quickly killed/restarted/moved around. Simple solutions are attractive but I'd rather invest in a more complex solution from the very beginning, because moving away from SQLite to something like Postgres can be costlier than investing some time in setting up 3-tier if you plan your business to grow, otherwise eventually you can end up reinventing 3-tier, but with SQLite. But that's just my experience, maybe I'm too used to our architecture.

krts-4y ago

A great project with awesome implications. Well deserved, and the fly.io team are very pragmatic.

This will be even more brilliant than it already is when fly.io can get some slick sidecar/multi-process stuff.

I ended up back with Postgres after my misconfigs left me a bit burned with S3 costs and data stuff. But I think a master VM backed by persistent storage on fly with read replicas as required is maybe the next step: I love the simplicity of SQLite.

rco87864y ago

All of the action around SQLite recently is very exciting!

RcouF1uZ4gsC4y ago

I love Litestream! It is so simple and it just works!

Congratulations, Ben, on making a great product and on the sale!

One thing I have had in the back of my mind, but have not had the time to pursue, is using SQLite replication to make something similar to Cloudflare's Durable Objects but more open.

A "durable object" would be an SQLite database and some program that processes requests and accesses the SQLite database. There would be a runtime that transparently replicates the (database, program) pair where they are needed and routes to them.

That way, I can just start out locally developing my program with an SQLite database, and then run a command and have it available globally. At the same time, since it is just accessing an SQLite database, there would be much less risk of lock-in.

scwoodal4y ago

> According to the conventional wisdom, SQLite has a place in this architecture: as a place to run unit tests.

Be careful with this approach. Frameworks like Django have DB engine specific features[1]. When you start using them in your application you can no longer use a different DB (SQLite) to run your unit tests.

[1] https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/f...

farmin4y ago

> The upcoming release of Litestream will let you live-replicate SQLite directly between databases, which means you can set up a write-leader database with distributed read replicas. Read replicas can catch writes and redirect them to the leader; most applications are read-heavy, and this setup gives those applications a globally scalable database.

Would this make Litestream a possible fit to sync a mobile device to a user's own silo of data on a 'server'? It would need a port of Litestream to Dart.

CGamesPlay4y ago

I agree with this article! I even went so far as to write a Prisma-like SQL client generator that uses better-sqlite3 under the hood, so you get the nice API of Prisma and the synchronous performance of better-sqlite3. I’ve been using it for a few small projects, but I just released it at 1.0 yesterday.

https://github.com/CGamesPlay/rapid-cg

onetom4y ago

+1 for SQLite! I've used it from Clojure, via HoneySQL, so no ORM, no danger of SQL injection. It was really wonderful!

https://github.com/seancorfield/honeysql

I used it to quickly iterate on the development of migration SQL scripts for a MySQL DB, which was running in production on RDS.

I might have switched to H2 DB later, because that was more compatible with MariaDB, but I could use the same Clojure code, representing the SQL queries, because HoneySQL can emit different syntaxes. Heck, we are even using it to generate queries for the SQL-variant provided by the QuickBooks HTTP API! :)

https://www.hugsql.org/ is pretty good too, btw! It's just a bit too much magic for me personally :)

Also, you should really look into JetBrains database tooling, like the one in IntelliJ Ultimate or their standalone DataGrip product! It's freaking amazing, compared to other tools I tried. If you are an Emacs person, then I think even with some inferior shells to the command-line interfaces of the various SQL systems, you can go very far, a lot more conveniently than through some ORMs.

Either way, one secret to developing SQL queries comfortably is to utilize some more modern features, like the WITH clause, to provide test data to your queries: https://www.sqlite.org/lang_with.html

You can use it to just type up some static data, but you can also compute test data dynamically and even randomly!
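A minimal sketch of that idea, assuming Python's built-in `sqlite3` module as the host language; the `orders` table, its columns, and the values are made up for illustration. The first query types static rows straight into a CTE, the second generates rows with a recursive CTE and `random()`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Static test data typed directly into a CTE; no real table is created.
# The table and column names here are hypothetical.
STATIC_QUERY = """
WITH orders(customer, amount) AS (
    VALUES ('alice', 10), ('alice', 15), ('bob', 7)
)
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
ORDER BY customer
"""
static_totals = conn.execute(STATIC_QUERY).fetchall()

# Generated test data: a recursive CTE yields n = 1..100, and random()
# attaches an arbitrary amount in the range 0..49 to each row.
GENERATED_QUERY = """
WITH RECURSIVE seq(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 100
),
orders(id, amount) AS (
    SELECT n, abs(random() % 50) FROM seq
)
SELECT COUNT(*), MIN(amount), MAX(amount) FROM orders
"""
row_count, min_amount, max_amount = conn.execute(GENERATED_QUERY).fetchone()
```

Once the query under development behaves as expected, the CTE can be swapped for the real table of the same name without touching the rest of the statement.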

Another little-known feature is the RETURNING clause for INSERT/UPDATE/DELETE: https://www.sqlite.org/lang_returning.html

It can highly simplify your host-code, which embeds SQL, because you don't have to introduce UUID keys everywhere, just so you can generate them without coordination.
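A rough sketch of what that looks like from host code, assuming Python's `sqlite3` module and a hypothetical `users` table. RETURNING needs SQLite 3.35 or newer, so the sketch falls back to the cursor's `lastrowid` on older builds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def insert_user(conn: sqlite3.Connection, name: str) -> int:
    """Insert a row and return the key the database assigned to it."""
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        # RETURNING hands back the generated id as an ordinary result row.
        rows = conn.execute(
            "INSERT INTO users (name) VALUES (?) RETURNING id", (name,)
        ).fetchall()
        return rows[0][0]
    # Older SQLite builds: fall back to the cursor's last inserted rowid.
    return conn.execute("INSERT INTO users (name) VALUES (?)", (name,)).lastrowid

first_id = insert_user(conn, "alice")
second_id = insert_user(conn, "bob")
conn.commit()
```

The database hands out the key, so the application does not need to generate an identifier before the insert.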

DeathArrow4y ago

>SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

We will make up for those latency losses by throwing more microservices into our fat microservices architectures and adding more message brokers to the flow. For sure we will find a way to bring those milliseconds back. :)

3np4y ago

A common gotcha with sqlite and WAL is that it's not supported on networked filesystems, so anyone trying to keep their data volumes replicated over glusterfs, ceph, and similar will get bitten by corruption.

Let's say we're running a vendored application (forking it is not an option) utilizing WAL and want to store the db on one of those filesystems not traditionally suitable for WAL'd sqlite.

Would dropping in Litestream on the db allow us to do so safely?

mrcwinn4y ago

I have really enjoyed using Fly. Great service and support.

mkleczek4y ago

I guess I am in the minority here now but... Embedding an SQL database in the application is really missing the point of having an RDBMS. The goal of an RDBMS is not merely to persist application data but to _share_ data between different applications.

And while the current trend is to implement sharing by applications, I expect this to change in the future as it is much more economical to use an RDBMS to share data.

whazor4y ago

I like the idea. It indeed sounds faster to redirect all write APIs via your own proxy to a single remote write instance (or maybe multiple via sharding).

Via Kubernetes you could have a cross region cluster that will deal with nodes going offline and like the author said, you would have a couple of seconds downtime with speeds nowadays. Which you could resolve by smarter frontends.

DeathArrow4y ago

In terms of CAP theorem, you give up consistency and partition tolerance, leaving only availability.

For many, giving up consistency would be a big deal.

fareesh4y ago

For me the dream seems to be a relational, real-time (with optionally configurable JSON/HTML snippet updates going to client applications), with extremely good latency, offline sync, etc. Bonus if the client can pick the fields it wants a-la graphql. Some sort of Rails + Hotwire + Firebase combination which works with web pages and apps alike.

steve_gh4y ago

Thank you Ben! This is exactly what I need for the data science and analytics problems I work on. We import data from a variety of sources via an ETL process, but we want to distribute the data analytics to multiple read-only process nodes.

This gives us the speed of SQLite plus easy replication and a single source of truth.

Chapeau!!!

swlkr4y ago

The reduction in complexity from using sqlite + litestream as a server side database is great to see!

quintes4y ago

What’s the use case here, a single web app with inproc db?

More complex use cases?

I remember I could do this on Azure at one point in time with App Services, not sure if it’s still a thing... but heavy writes and scaling of those types of apps would lead you to rethink this approach, right?

jl64y ago

Perhaps you could avoid the need for an additional replication tool if you happened to have some kind of synchronous stretch clustered SAN storage on which to place the SQLite database file. Moving HA to the infra layer?

thdxr4y ago

in practice how do you make a single application node the writer?

do you now need your nodes to be clustered + electing a leader and shipping writes there?

know fly.io did this with PG + Elixir but BEAM makes this type of stuff pretty easy

PhineasRex4y ago

It's been a while since we reinvented the wheel, hasn't it.

mro_name4y ago

> The conventional wisdom could use some updating.

how true in so many fields.

swah4y ago

Reminds me of https://github.com/fpereiro/backendlore

kukabynd4y ago

Great move, congrats to everyone involved. Fly is a very promising player in the space. Pipeline looks amazing, and I’ll be trying more of your offerings down the road.

DeathArrow4y ago

With SQLite you embed the DB in the application. If I have 6 Kubernetes pods and the pod containing the writer dies, all other 5 pods will be useless.

InitEnabler4y ago

SQLite has to be one of my favorite databases. It's always improving and the story behind its creation is really quite something.

DeathArrow4y ago

If we don't need SQL capabilities of SQLite, we can use the file system as a document database. Rsync will take care of replication.

pjmlp4y ago

If it comes with the same tooling as Oracle and SQL Server, I might think about using it server side, until then not really.

tybit4y ago

I think this architecture would be really powerful paired with the actor model to shard databases to nodes.

rullopat4y ago

My question is: what would happen if my server blows up while Litestream is still streaming to S3?

boesboes4y ago

How well does this scale for larger data sets? Could I use it with 100GB of data for instance?

nh24y ago

The article doesn't seem to discuss one of the most fundamental guarantees of current-day DB-application interaction:

Acknowledged writes must not be lost.

For example, if a user hits "Delete my account", and gets a confirmation "Your account was deleted", that answer must be final. It would be bad if the account reappeared afterwards. Similarly, if a user uploads some data, and gets a confirmation (say via HTTP 200), they should be able to assume that the data was durably stored on the other side, and that they can delete it locally.

Most applications make this assumption, and that makes sense: Otherwise you could never know how much longer a client needs to hold onto the data until being sure that the DB stored it.

This can only be achieved reliably with a server-side network roundtrip on write ("synchronous replication"), because a single machine can fry any time.

The approach presented in the article does not provide this guarantee. It provides low latency by writing to the local SSD, acknowledging the write to the client, and then performing "asynchronous replication" with some delay afterwards. If the server dies after the local SSD write, but before the WAL is shipped, the acknowledged write will be lost. It will still be on the local SSD, but that is not of much use if the server's mainboard is fried (long time to recovery) and another server with old data takes over as the source of truth.
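The failure window described above can be made concrete with a toy model. This is only an illustrative simulation under stated assumptions, not Litestream's or Postgres's actual mechanism: the `Node` and `Primary` classes, the record names, and the point at which the primary "dies" are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A toy database node: just an ordered list of committed writes."""
    log: list = field(default_factory=list)

@dataclass
class Primary:
    replica: Node
    synchronous: bool
    local: Node = field(default_factory=Node)
    unshipped: list = field(default_factory=list)

    def write(self, record: str) -> str:
        self.local.log.append(record)        # durable on the local disk only
        if self.synchronous:
            self.replica.log.append(record)  # network round trip before the ack
        else:
            self.unshipped.append(record)    # shipped later, in the background
        return "ack"                         # the client now believes it is stored

    def ship(self) -> None:
        """Background step of async replication: send pending writes."""
        self.replica.log.extend(self.unshipped)
        self.unshipped.clear()

def lost_after_failover(synchronous: bool) -> list:
    """Acknowledge two writes, kill the primary, report what the replica lacks."""
    replica = Node()
    primary = Primary(replica=replica, synchronous=synchronous)
    acked = []
    for record in ("upload-1", "delete-account"):
        if primary.write(record) == "ack":
            acked.append(record)
        if record == "upload-1":
            primary.ship()  # only the first write is shipped before the crash
    # The primary dies here; the replica takes over as the source of truth.
    return [r for r in acked if r not in replica.log]

async_lost = lost_after_failover(synchronous=False)
sync_lost = lost_after_failover(synchronous=True)
```

In the asynchronous run the second write was acknowledged but never reached the replica, so it is gone after failover. The synchronous run loses nothing, but every write pays for the round trip.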

This is why I think it's justified that some other commenters call this approach a "cache" when compared with a multi-AZ DB cluster doing synchronous replication.

The Litestream approach seems to provide roughly the same properties as postgres-on-localhost with async replication turned on. (I also wonder if that would be an interesting implementation of this approach for Fly.io -- it should provide similar microsecond latency while also providing all features that Postgres has.)

As I understand it, Fly.io provides Postgres with synchronous replication (kurt wrote "You can also configure your postgres to use synchronous replication", https://community.fly.io/t/early-look-postgresql-on-fly-we-w...), and https://fly.io/docs/reference/postgres/#high-availability explains that it uses Stolon, which does support synchronous replication if you turn it on. But the "Postgres on Fly" page doesn't seem to explain whether sync or async is the default, and how exactly I can turn on sync mode on Fly.

So I think it would be helpful if the article stated clearly "this is asynchronous replication", thus making clear that it will likely forget acknowledged writes on machine failure, and maybe link to Fly's Postgres offering that provides more guarantees.

Maksadbek4y ago

SQLite is known for having many extensions. If streaming replication is so important, why didn't the SQLite authors create such an extension before?

pbowyer4y ago

Not surprised. Congratulations Ben!

vinay_ys4y ago

In the past two decades we have done this enough times to know better. Here's what we know:

1. Compute and storage should be decoupled because the compute vs storage hardware performance increases at different rate over generations of hardware and if our application is coupled, then choosing an efficient shape of the server hardware is very difficult.

2. We know making a single server highly reliable is very difficult (expensive) but making a bunch of servers in aggregate reliable is much much easier. Hence, we should spread our workload on a bunch of servers to reduce the blast radius of any one single server failing.

3. We know making a single server very big (scale vertically) and utilise it efficiently is also very difficult (again, read: expensive). But using a bunch of smaller servers efficiently is relatively easier and more cost effective. Here, big vs small is relative at any given point in time – the median/average size server is whatever is most popularly used – hence it is mass manufactured and sold at volume-pricing-margins and popular software has caught up to use it efficiently (read: linux kernel and popular server software).

4. We know data is ever growing and application is ever more hungry to use more data in 'smart' ways. Hence, overall size of data upon which we want to operate is ever increasing. Hence scalable data architectures are very crucial to keep up with the market competition. (Even if you believe your app can be dumb and simple, the market competition forces will move you towards becoming more data 'smart').

5. We know a lot of business models are viable only at huge scale of users. At smaller scales, the margins are so low that it isn't viable to operate. Again this is due to competition. Only operators at scale survive. Hence, we know building architectures that don't scale to "millions of users" (even in enterprise software world) isn't viable anymore.

6. We know such scale brings more complexity – multi-tenancy, multiple regions, multiple jurisdictions etc. Internet world is becoming very complex, geo-politically etc. Multi-tenant usage based pricing models bring interesting challenges w.r.t usage metering, isolation, utilisation efficiency and security challenges. Multi-region and multi-jurisdiction brings interesting challenges w.r.t high-availability/continuity and traffic routing and cross-region data storage/replication along with encryption and key-management.

7. With all this, we have learned that layered architecture is critical to managing complexity while providing both feature agility and non-functional stability. Hence we know a lot of these complex capabilities should be solved by the lower layers in a reusable high-leverage way and not be tied to application layers. This is crucial for application layer to rapidly iterate on features to find product-market fit without destabilising these crucial non-functional core capabilities.

8. We know being able to refactor your application domain logic rapidly and efficiently is a super power for a startup hunting product market fit, for a big tech keeping up the innovation speed or any company in between just surviving the competition everyday. This refactoring super-power is crucial for keeping tech debt in control (and being able to take tech debt strategically) and not blowing up your engineering budget by having to hire like crazy (throwing bodies at the problem).

We know all this..and more.. but I'll stop here... for now.

OOPMan4y ago

Cool technical marketing blog story bro

bob10294y ago· 36 in thread

WJW4y ago

PS: Congrats Ben!

mrkurt4y ago

It's not a perfect setup, though. You have to take the writer down to do a deploy. The next big Litestream release should solve that, and is part of what's teased in the post.

nicoburns4y ago

> If you don't need that, you might as well read the entire dataset into memory and be done with it.

Over in-memory data structures, SQLite gives you:

- Persistence

- Crash tolerance

- Extremely powerful declarative querying capabilities

> if you have read-only traffic you don't need sqlite replication.

nine_k4y ago

I do understand the point of running SQLite in-process to speed up reads.

I do not understand why SQLite must also handle intense write load with HA, failover, etc.

I would rather have the best of both worlds: a proper DB server (say, Postgres) replicated to super-fast and simple read replicas in SQLite on every node.

bob10294y ago

What if, due to ridiculous latency reductions, your business no longer requires more than 1 machine to function at scale?

I'm talking more about sqlite itself than any given product around it at this point, but I still think it's an interesting thought experiment in this context.

ok_dad4y ago

With Postgres, you might have one server, or one cluster of servers that are coordinated, and then inside there you have tables with users and the users' data with foreign keys tying them together.

samatman4y ago

A lot depends on your consistency requirements and data model here.

I use SQLite heavily, and have evaluated litestream and rqlite but not deployed them, so bear that in mind.

If concerns can't be isolated like this then yes, dedicated swarms of database servers are the way to go. Frequently they can be, and using SQLite punches way above its weight here.

hinkley4y ago

jolux4y ago

S3 is strongly consistent now: https://aws.amazon.com/s3/consistency/

judofyr4y ago

> Latency is the exact reason you would have a problem scaling any large system in the first place.

Let's not forget why we started using a separate database server in the first place…

tptacek4y ago

nicoburns4y ago

> There are definitely use cases for Litestream, but it's far from a replacement for your typical Node + PostgreSQL stack

ithrow4y ago

As they say, "you are not twitter" ;)

Access to monstrous machines is easy today and you have very fast runtimes like Go and the JVM that can leverage this hardware.

closeparen4y ago

abraxas4y ago

carry_bit4y ago

It's exciting to see Datomic's architecture realized using more conventional technology.

throwaway8943454y ago

mrkurt4y ago

The answer is: yes, you do have to write through a single primary application instance.

So far.

The two important things here are:

1. Fly.io makes it really easy to write through a single primary application instance

2. There are ways to solve this problem so your application doesn't have to worry about it.

Right now, you have to be a little careful bouncing app instances. If you bounce the writer, you can't perform writes for 15s or whatever. This is a big problem during deploys.

There are a tremendous number of Fly.io users that are fine with this limitation, though. It's pretty valuable for some segment of our customers right now.

funstuff0074y ago

teleforce4y ago

Local-first software is the future:

[1] Local-First Software: You Own Your Data, in spite of the Cloud:

https://martin.kleppmann.com/papers/local-first.pdf

sanderjd4y ago

zarzavat4y ago

You get SQL and ACID. If you don't need those then you pay a performance price for having them. If you do need them, then you pay a price for not having them.

pgwhalen4y ago

Most data is relational, so why not store it that way?

Or, from another angle, what would your “local cache” be?

a-dub4y ago

downside of course is the complexity added in synchronization, which is what they're tackling here.

mwcampbell4y ago

> personally i like the idea of per-tenant databases with something like this to scale out for each tenant.

vmception4y ago

> SQLite isn't just on the same machine as your application, but actually built into your application process.

How is that different from what's commonly happening? Android and iOS do this... right? ... but it's still accessing the filesystem to use it.

Am I missing something, or is what they are describing just completely commonplace and only interesting to people who use microservices and never knew what was normal?

mrkurt4y ago

tlb4y ago

deepstack4y ago

A few years back I was working on a Java project. We used H2 instead of Postgres, and embedded the H2 DB for in-application memory access. It sped up queries tremendously. There is just no beating an in-application DB.

errantmind4y ago

Just wait until (some) devs realize they don't even need sqlite, and can serialize their data directly to binary flat files with simple locking synchronization for backups.

Better yet, use both if you have both types of data. The performance benefits are enormous and well worth the complexity tradeoff in my experience.

kortex4y ago

I don't want to have to deal with locks if at all possible. Binary works fine if each file is atomic, but that does not sound like the case you are advocating.

raxxorraxor4y ago

overview4y ago

> Latency is the exact reason you would have a problem scaling any large system in the first place.

iveqy4y ago

Just the latency is really important to me! I even built an ERP system that has a response time below 100 ms for all operations, it's a design goal.

My thought is that if you can see consumer changes depending on latency (for example on amazon or google) it is equally important for internal tools. Employee time is expensive.

kumarvvr4y ago

Throughput for a single service / app improves, but does it really scale? Across a cluster, you will have to have data replication and sync routines, that are a whole mess themselves.

The latency is not reduced, it is shifted elsewhere.

mekster4y ago

I want functions.

dsincl124y ago· 14 in thread

Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

What am I missing?

masklinn4y ago

> Uhm... experience from a large project that used SQLite was that we were hit with SQLite only allowing one write transaction at a time. That is madness for any web app really.

littlecranky674y ago

pc864y ago

Is there something you can point to that explains this "flush them to the durable DB once in a while" pattern in more detail?

jbverschoor4y ago

- Most transactions are read-only

- "Large" applications can usually be sharded by account. This means 1 file per account, and can easily be put on the most optimal geolocation of the account

- You can defer locking until commit, allowing multiple writers ( https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi... ). This is good enough for most applications anyway.

- SQLite is simple, fast enough for almost anything, supports a good set of features and datatypes, and is very easy to embed.
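The "1 file per account" item in the list above could look roughly like this. It is a minimal sketch assuming Python's `sqlite3` module; the `<data_dir>/<account_id>.db` layout, the `notes` table, and the helper names are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical layout: <data_dir>/<account_id>.db, one file per account.
SCHEMA = "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"

def open_account_db(data_dir: Path, account_id: int) -> sqlite3.Connection:
    """Open (creating if needed) the database file belonging to one account."""
    conn = sqlite3.connect(str(data_dir / f"{account_id}.db"))
    conn.execute(SCHEMA)
    return conn

def add_note(data_dir: Path, account_id: int, body: str) -> None:
    conn = open_account_db(data_dir, account_id)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.close()

def count_notes(data_dir: Path, account_id: int) -> int:
    conn = open_account_db(data_dir, account_id)
    (count,) = conn.execute("SELECT COUNT(*) FROM notes").fetchone()
    conn.close()
    return count

data_dir = Path(tempfile.mkdtemp())
add_note(data_dir, 1, "first note for account 1")
add_note(data_dir, 1, "second note for account 1")
add_note(data_dir, 2, "only note for account 2")
```

Each account's writes contend only with that account's own file, and a file can be moved to another machine or region on its own. The cost is that queries spanning accounts have to open many files.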

tidenly4y ago

Why would I bake all of those assumptions and limitations into my system though just on the hope it won't ever become a problem

samwillis4y ago

Quite right it’s not one size fits all but for any site that’s mostly read only it’s a brilliant solution.

Simon Willison has written about it and coined the term “baked data”: https://simonwillison.net/2021/Jul/28/baked-data/

Mozilla.org uses this architecture, Django app running off SQLite with the db rsync’ed to each application server.

quickthrower24y ago

phaedrus4y ago

The default settings of SQLite are very conservative and essentially enforce serial writes. With tuning and loosening that enforcement, you can go from 50 writes per second to 50,000.

Edit: forgot to mention that yes a major part of that is batching writes into fewer, bigger transactions; AFAIK you can't really get around that.
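The comment does not say which settings were changed. As an assumption, two pragmas that are commonly adjusted for write throughput are `journal_mode` and `synchronous`; the sketch below (Python's `sqlite3`, with a throwaway database path) shows how they are set, not what the commenter actually used:

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database; WAL mode does not apply to :memory: databases.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(db_path)

# Write-ahead logging: readers and the writer no longer block each other,
# and commits append to a WAL file instead of rewriting pages in place.
# There is still only one writer at a time.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# synchronous=NORMAL syncs to disk less often than the default FULL. In WAL
# mode a power loss may drop the most recent commits, which is the kind of
# loosened guarantee being traded for speed here.
conn.execute("PRAGMA synchronous=NORMAL")
synchronous = conn.execute("PRAGMA synchronous").fetchone()[0]  # 1 means NORMAL
```

Whether the durability trade-off is acceptable depends on the application, so treat these values as a starting point to measure rather than a recommendation.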

andai4y ago

https://www.sqlite.org/faq.html#q19

>INSERT is really slow - I can only do few dozen INSERTs per second

>Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.

>By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction.
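A small sketch of the difference the FAQ describes, using Python's `sqlite3` module with a hypothetical `events` table. An in-memory database is used so the snippet runs anywhere; it will not show the speed gap itself, because the per-transaction cost the FAQ refers to comes from waiting on the disk at each commit.

```python
import sqlite3

def insert_one_by_one(conn: sqlite3.Connection, rows: list) -> None:
    """One transaction per INSERT: every row pays for its own commit."""
    for row in rows:
        conn.execute("INSERT INTO events (name) VALUES (?)", row)
        conn.commit()

def insert_batched(conn: sqlite3.Connection, rows: list) -> None:
    """All INSERTs in one transaction: a single commit for the whole batch."""
    with conn:  # the connection context manager commits once at the end
        conn.executemany("INSERT INTO events (name) VALUES (?)", rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for this illustration.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(f"event-{i}",) for i in range(1000)]
insert_one_by_one(conn, rows[:10])
insert_batched(conn, rows[10:])
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Pointing both functions at an on-disk file and timing them is a quick way to see the effect on a given machine.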

dagw4y ago

> What am I missing?

Many sites are Read (almost) Only. For sites where users interactively query/view/explore the data, but (almost) never write their own, it works great.

unicornporn4y ago

Speaking of this, I really wish there was SQLite support in WordPress...

beberlei4y ago

use more than one SQLite file? we have one per day and project for example.

smt884y ago

I don't know if you're joking or not, but this would just be reinventing the Postgres/SQL Server/Oracle/MySQL wheel using duct tape and wishes.

If you're doing something that multiple systems have had millions of hours of development to do, just use one of those.

sph4y ago

But why? That seems such an unnecessary hack.

paulhodge4y ago· 8 in thread

mwcampbell4y ago

https://glyph.twistedmatrix.com/2008/06/this-word-scaling.ht...

hantusk4y ago

Same pattern is ActorDB: https://github.com/biokoda/actordb

freedomben4y ago

Scarbutt4y ago

Each user's account gets their own SQLite file.

So now you need one database connection per user...

tptacek4y ago

And? It's SQLite; it's a file handle and some cache, not a connection pool.

mwcampbell4y ago

freedomben4y ago

Without knowing details about the app, it's hard to know if that would matter. If a small number of concurrent users would ever be using it, I would think it would be NBD.

robertlagrant4y ago

If by connection you mean in-process database.

swaraj4y ago· 8 in thread

Looks v cool, but I feel like I'm missing a big part of the story, how do 2 app 'servers/process' connect to same sqlite/litestream db?

Do you 'init' (restore) the db from each app process? When one app makes a write, is it instantly reflected on the other app's local sqlite?

judofyr4y ago

zepolen4y ago

swaraj4y ago

This is my main q: are the writes replicated in real-time? Do the apps that just need read access have to repeatedly call 'restore'?

johnrrk4y ago

I also investigated SQLite and it's not clear how we can use it with multiple servers.

So it seems that we can't have 2 Node.js servers accessing the same SQLite file on a shared volume.

[1] https://sqlite.org/wal.html

tptacek4y ago

gizzlon4y ago

> I'm not sure how to do zero downtime deployment

AFAIK, you either:

1) Don't, and eat a few seconds of downtime (f.ex if the clients re-try in the background, or..)

2) Start two processes on the same machine (believe that's always safe)

3) Share the database over the network in a way that's safe with sqlite3. Think it's possible, but at this point things are getting too complicated to be worth it IMO.

thruflo4y ago

Also how does the WAL page based replication maintain consistency / handle concurrent updates?

infogulch4y ago

It doesn't, this gives you a read-only replica only.

wasd4y ago· 6 in thread

Fly is putting together a pretty great team and interesting tech stack. It's the service I see as a true disruptor to Heroku because it's doing something novel (not just cheaper).

the_biot4y ago

If only they could keep their website reachable, that would be the icing on the cake. Like every time I see them linked on HN, I click and cannot connect to their website.

Last time somebody from fly said they'd look into it, but alas. It was related to IPv6 on their end, as far as I could tell.

mrkurt4y ago

quickthrower24y ago

purplerabbit4y ago

michaeldwan4y ago

tptacek4y ago

tiffanyh4y ago· 6 in thread

@dang, the actual title is “I'm All-In on Server-Side SQLite”

Maybe I missed it but where in the article does it say Fly acquired Litestream?

EDIT: Ben Johnson says he just joined Fly. Nothing about Fly “acquiring” Litestream.

https://mobile.twitter.com/benbjohnson/status/15237489883352...

dang4y ago

Elsewhere in this thread he says "the project was acquired" which is more or less "Fly.io Buys Litestream" (the submitted title).

I'm honestly not sure whether we should change it or not - minimizing complaints is the goal - what's it called when a function has two points that it keeps unstably jumping between?

mrkurt4y ago

that function is correct when it agrees with me.

lnsp4y ago

As far as I understood it, Fly.io hired the person working on Litestream and pays them to keep working on Litestream.

tiffanyh4y ago

That’s how I understood it and that’s radically different than how this HN post got titled.

Ben Johnson confirms how you framed it here:

https://mobile.twitter.com/benbjohnson/status/15237489883352...

gamblor9564y ago

"Litestream has a new home at Fly.io, but it is and always will be an open-source project"

Very bottom of the post. Technically, Litestream remains an open-source project, so it's more accurate to say that Fly.io acquired the brand IP and the owner of that IP.

bussetta4y ago

The tweet[1] links the blog post and says Litestream is part of fly.io now.

[1] https://twitter.com/flydotio/status/1523743433109692416

anyfactor4y ago· 5 in thread

Story time!

A client told me that they will use a DigitalOcean droplet for a web app. Because the database was very small I chose to use SQLite3.

For some reason, it was infuriating that I have to use a database like that to store a few thousand rows of data. Moreover, I would have to rewrite a ton of stuff to accommodate the change to Postgres.

I ended up using firestore.

---

I think something like this could have saved me a ton of hassle that day.

luhn4y ago

It was too much work to migrate from SQLite to PostgreSQL, so you migrated to... a NoSQL DB?

pjot4y ago

I think they’re referring to the trade from managing one system (DO + SQLite) to two (Heroku + pg) and choosing Firestore instead as it’s only one system to manage.

szundi4y ago

He wrote it was a “day” at the end. This guy is fast.

me_me_mu_mu4y ago

Just curious if you’ve ever had to migrate data out of firestore.

somishere4y ago

ignoramous4y ago· 5 in thread

Looking forward to ditching my PlanetScale plans for this!

> ...people use Litestream today is to replicate their SQLite database to S3 (it's remarkably cheap for most SQLite databases to live-replicate to S3).

Cloudflare R2 would make that even cheaper. Cloudflare set to open beta registration this week.

And if you squint just enough, you'd see R2, S3 et al are nosql KV store themselves, masquerading as disk drives, and used here to back-up a sql db...

> My claim is this: by building reliable, easy-to-use replication for SQLite, we make it attractive for all kinds of full-stack applications to run entirely on SQLite.

> And if you don't need the Postgres features, they're a liability.

Reminds me of WireGuard, and how it accomplishes so much more by doing so much less [2].

Congratulations Ben (but really, could have taken a chance with heavybit)!

----

[0] https://hbr.org/2015/12/what-is-disruptive-innovation

chloerei4y ago

> Cloudflare set to open beta registration this week.

Any source? I wait for a long time.

jgrahamc4y ago

That's correct. Tomorrow. R2 open beta and a hell of a lot more.

ignoramous4y ago

May 11: https://archive.is/2u5Rt

unmole4y ago

> Cloudflare R2 would make that even cheaper.

Cloudflare R2 has free egress. The read and write operations themselves are not that much cheaper than S3.

ignoramous4y ago

Ah, you're right! Litestream isn't likely to be egress/ingress heavy...

foodstances4y ago· 5 in thread

Just curious, is there any financial compensation/support going to Richard Hipp with all of this money changing hands?

mrkurt4y ago

Yes. We (Fly.io) are buying a sqlite support agreement. We also send money WireGuard's way. I'm pretty sure Tailscale does too.

defen4y ago

> We have also given OSS authors advisor equity

That's a fantastic idea. In retrospect it's a really obvious idea but I've never heard of anyone doing it before. Is this a common thing that I'm just oblivious to?


foodstances4y ago

That's great to hear, thank you!

qbasic_forever4y ago

That said SQLite has a business model of selling support and premium features like encryption: https://www.sqlite.org/prosupport.html

foodstances4y ago

It's like when Red Hat went public and offered pre-IPO stock to open source developers.

kall4y ago· 4 in thread

> it won't work well on ephemeral, serverless platforms or when using rolling deployments

That's... a lot of new applications these days.

mwcampbell4y ago

> it won't work well on ephemeral, serverless platforms or when using rolling deployments

I assumed that was what Fly was hiring Ben to work on.

mrkurt4y ago

Yes. Yes it is.

emptysea4y ago

Yeah the rolling deployments gotcha really stuck out to me. I think most PaaS will provide that by default anyways because who wants downtime during deploys?

mwcampbell4y ago

mrkurt specifically mentioned that a solution for that is in the works. https://news.ycombinator.com/item?id=31319544

otoolep4y ago· 4 in thread

Congratulations to Ben! This project has been like a rocket ship.

wolfhumble4y ago

This is really useful and fun, thanks! Godspeed on this new part of the journey!

benbjohnson4y ago

Thanks, Philip!

Loic4y ago

Thank you Ben.

[0]: https://www.chemeo.com/search?q=methane

abrookewood4y ago

Hey Ben, any chance you can sit next to Chris McCord and get SQLite support in Phoenix :)


no_wizard4y ago· 3 in thread

This is a great and interesting offering! I think this fits well with fly.io and their model of computing.

Now I regret!

Just a long winded way of saying, congrats! This is awesome! Thanks for doing exactly what I wanted to do but didn't have the guts to follow through with.

evntdrvn4y ago

there’s some stuff out there:

- https://github.com/rqlite/rqlite
- https://github.com/chiselstrike/chiselstore
- https://dqlite.io/

I’m sure there’s more, those are just the ones I remember.

epilys4y ago

ComputerGuru4y ago

Did you write your own rust raft implementation or reuse something already available?


NeutralForest4y ago· 3 in thread

There's something I don't understand: it says that the "data is next to the application". What does that mean? Where is it stored, and how is it accessed by the application?

tptacek4y ago

NeutralForest4y ago

Thanks for the explanation!

ledauphin4y ago

it means the data is stored in a file on the local drive of a computer that is also running the application.

it also means that it is the application itself (via the SQLite library) that reads and modifies that database file. There is no separate database process.
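To make the "no separate database process" point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The file name `app.db`, the `notes` table, and the function name are made up for illustration; the only thing the sketch relies on is that opening a SQLite database means opening a local file from inside the application itself.

```python
import os
import sqlite3
import tempfile


def demo_local_database() -> list:
    """Create a SQLite file, write to it, and read it back in-process."""
    # Hypothetical location; any path on a local disk would do.
    db_path = os.path.join(tempfile.mkdtemp(), "app.db")

    # connect() opens (or creates) the file directly; no server is contacted.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello",))
        conn.commit()
        rows = conn.execute("SELECT id, body FROM notes").fetchall()
    finally:
        conn.close()

    # The whole database is this one file sitting next to the application.
    assert os.path.exists(db_path)
    return rows


if __name__ == "__main__":
    print(demo_local_database())
```

Every query here is a function call into the SQLite library linked into the Python process, which is where the low per-query latency quoted from the article comes from.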


jchw4y ago· 2 in thread

michaeldwan4y ago

> there’s no obvious way to resize disks

Yes, this sucks right now. Resizable disks are on our list; we just need somebody to spend a few days on it. Luckily we're hiring platform engineers [1] to work on fun problems like that.

> I actually am bothered that I basically don’t get billed.

[1] https://fly.io/jobs/platform-product-engineer/

ignoramous4y ago

> Yes, this sucks right now.

tyingq4y ago· 2 in thread

https://dqlite.io/docs/architecture

tptacek4y ago

https://fly.io/blog/a-foolish-consistency/

otoolep4y ago

rqlite author here. Anything else you can tell me about why you decided against it? Just simpler, as you say, to avoid a distributed system when you can (something I understand).


obiwanpallav14y ago· 2 in thread

In which scenario would you use litestream[1] vs rqlite[2]?

1 - https://github.com/benbjohnson/litestream

2 - https://github.com/rqlite/rqlite

otoolep4y ago

Check out the rqlite FAQ for more.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#How-...

benbjohnson4y ago


mwcampbell4y ago· 2 in thread

warmwaffles4y ago

> Is support for SQLite in the Elixir ecosystem, particularly Ecto, good enough for this?

Why yes it is. I maintain the `exqlite` and `ecto_sqlite3` libraries and it was just integrated in with `kino_db` which is used by `livebook`.

https://github.com/elixir-sqlite/exqlite

lawik4y ago

I still love you for making this happen.

netcraft4y ago· 2 in thread

bambax4y ago

netcraft4y ago

_vufv4y ago· 2 in thread

jjeaff4y ago

If you want to maintain much of the data in memory, wouldn't that require a process?

_vufv4y ago

Sure. If you need your software to be a process I think you should build it to be both: a library first and a process second. Libraries are so much easier to use, test and reason about.

melony4y ago· 2 in thread

Note that the popular Node.js ORM Prisma does not support WAL.

https://github.com/prisma/prisma/issues/3303

tylergetsay4y ago

It also crashes if you try to write to the DB while it's open: https://github.com/prisma/prisma/issues/2955

LAC-Tech4y ago

The best option for SQLite with Node is this:

https://github.com/JoshuaWise/better-sqlite3

Author is all over the issues section, and seems very knowledgeable about how SQLite works.

beck54y ago· 2 in thread

I have found it easy to overload SQLite with too many write operations (20+ concurrently). Is this the typical behaviour referred to in the post, or does that count as a write-heavy workload?

benbjohnson4y ago

Scarbutt4y ago

How big are the writes? Are you storing blobs?

endisneigh4y ago· 2 in thread

What’s an example of a popular app (more than 100K users) that uses Litestream? Curious to see what this looks like in production.

benbjohnson4y ago

[1]: https://tailscale.com/blog/database-for-2022/


jkaplowitz4y ago

Tailscale: https://tailscale.com/blog/database-for-2022/

I don't know their user count, but they are growing well and just raised their Series B.

seanwilson4y ago· 2 in thread

SQLite uses dynamic types? Is this an issue in practice, especially for large apps? Don't you lose guarantees about your data which makes it messy to handle on the backend?

ripley124y ago

You can use SQLite in strict mode if you prefer. https://www.sqlite.org/stricttables.html
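As a rough illustration of the difference, the sketch below uses Python's standard `sqlite3` module to insert a non-numeric string into an `INTEGER` column, first in an ordinary table and then in a `STRICT` one. The table and function names are invented for the example, and `STRICT` needs SQLite 3.37.0 or newer, so that half is skipped on older builds.

```python
import sqlite3


def flexible_vs_strict() -> dict:
    """Compare an ordinary table with a STRICT table on a bad insert."""
    conn = sqlite3.connect(":memory:")
    result = {}

    # Ordinary table: the INTEGER column still accepts non-numeric text.
    conn.execute("CREATE TABLE loose (n INTEGER)")
    conn.execute("INSERT INTO loose VALUES ('not a number')")
    result["loose_type"] = conn.execute(
        "SELECT typeof(n) FROM loose"
    ).fetchone()[0]

    # STRICT tables require SQLite 3.37.0+; report None if unavailable.
    if sqlite3.sqlite_version_info >= (3, 37, 0):
        conn.execute("CREATE TABLE tight (n INTEGER) STRICT")
        try:
            conn.execute("INSERT INTO tight VALUES ('not a number')")
            result["strict_rejected"] = False
        except sqlite3.Error:
            # The strict table refuses the value instead of storing it as text.
            result["strict_rejected"] = True
    else:
        result["strict_rejected"] = None

    conn.close()
    return result


if __name__ == "__main__":
    print(flexible_vs_strict())
```

In the ordinary table the value is stored with type `text` despite the column declaration, which is the behaviour the question is worried about; with `STRICT` the insert fails instead.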

aliswe4y ago

This sounds like schemalessness to me? serious "question".


nojvek4y ago· 2 in thread

Somebody needs to build Litestream for DuckDB (a column-store-oriented, SQLite-like DB).

That would be epic. DuckDB speed is crazy fast when it comes to aggregate/analysis queries.

viktour194y ago

Yes please!

safehell4y ago

DuckDB doesn't need an equivalent to Litestream; it already has Parquet file and object storage support.


jasfi4y ago· 2 in thread

Is there a good DB admin GUI that supports both SQLite and Postgres?

jwaterhouse4y ago

You mean like DBeaver?

https://dbeaver.io/

jpcapdevila4y ago

I love Datagrip by jetbrains.

0xbadcafebee4y ago· 1 in thread

Things I would like in a database:

- Register a Linux container as a UDF or stored procedure. Use with pub/sub to create data-adjacent arbitrary data processing of realtime data

- Fine-grained cryptographically-verified least-privilege access control. No i/o without a valid short-lived key linked to rules allowing specific record access.

sa464y ago

> I want to check out the state of the world at a specific commit, write a change, a week later write another change, revert the first change 3 weeks later.

This is sort of like temporal tables but with the ability to branch from a previous point of history. I'm not sure it would play well with foreign keys.

You could branch the entire database with either point-in-time recovery or with the file-system using ZFS. Postgres.ai turned this into a product.

splitrocket4y ago· 1 in thread

There are a couple of interesting options in a similar space: BedrockDB ( https://bedrockdb.com/ ) Dqlite ( https://dqlite.io/ ) Rqlite ( https://github.com/rqlite/rqlite )

I'm interested in how this performs and particularly, what are the tradeoffs relative to the other options above.

benbjohnson4y ago

LunaSea4y ago· 1 in thread

I wonder if we'll ever see an embedded version of PostgreSQL?

nicoburns4y ago

That's basically what SQLite is (notably, SQLite makes an effort to be compatible with Postgres's SQL syntax). If you mean based off the actual PostgreSQL codebase, then I highly doubt it.


learndeeply4y ago· 1 in thread

Since both Fly.io and Litestream founders are here - why not disclose the price?

benbjohnson4y ago

Litestream author here. I just posted it as a reply here: https://news.ycombinator.com/item?id=31319556

downut4y ago· 1 in thread

I do not understand how one implements the multi-role access system on top of SQLite that postgresql gives you for free.

Other than do it from scratch (eeek!) on the app side.

Just as an example, think of the smallest db backed factory situation you can imagine... as small as you like. There will need to be multiple roles if more than one role accesses the database tables.

tptacek4y ago

ok_dad4y ago· 1 in thread

I was just about to start using this for a project, I hope the license won’t change.

Congrats to the author though, no matter what! I wish everyone could be so successful.

benbjohnson4y ago

Litestream author here. It'll continue to be open source under an Apache 2 license.

ilrwbwrkhv4y ago· 1 in thread

For how much?

benbjohnson4y ago


kondro4y ago· 1 in thread

jpcapdevila4y ago

It only PUTs if there are writes.


daniel_iversen4y ago· 1 in thread

jpcapdevila4y ago

Could you elaborate on the features that would make MySQL a safer choice?

coliveira4y ago· 1 in thread

case4y ago

Fwiw, Tailscale has done this, and written about it:

https://tailscale.com/blog/database-for-2022/

zsims4y ago· 1 in thread

> It was reasonable to overlook this option 170 years ago, when the Rails Blog Tutorial was first written.

Woah. Rails is really old

faitswulff4y ago

This whole article is written in an amusing way. It was really easy reading.

jrochkind14y ago

I'd be curious to hear reactions to/experiences with that suggestion/technology, inside or outside the context of fly.io.

mtlynch4y ago

Super cool! Congrats, Ben!

aidenn04y ago

kgeist4y ago

>But database optimization has become less important for typical applications. <..> As much as I love tuning SQL queries, it's becoming a dying art for most application developers.

krts-4y ago

A great project with awesome implications. Well deserved, and the fly.io team are very pragmatic.

This will be even more brilliant than it already is when fly.io can get some slick sidecar/multi-process stuff.

rco87864y ago

All of the action around SQLite recently is very exciting!

RcouF1uZ4gsC4y ago

I love Litestream! It is so simple and it just works!

Congratulations, Ben, on making a great product and on the sale!

One thing I have had in the back of my mind, but have not had the time to pursue is using SQLite replication to make something similar to CloudFlare's durable objects but more open.

scwoodal4y ago

> According to the conventional wisdom, SQLite has a place in this architecture: as a place to run unit tests.

[1] https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/f...

farmin4y ago

Would this make Litestream a possible fit to sync a mobile device to a user's own silo of data on a 'server'? It would need a port of Litestream to Dart.

CGamesPlay4y ago

https://github.com/CGamesPlay/rapid-cg

onetom4y ago

+1 for SQLite! I've used it from Clojure, via HoneySQL, so no ORM, no danger of SQL injection. It was really wonderful!

https://github.com/seancorfield/honeysql

I used it to quickly iterate on the development of migration SQL scripts for a MySQL DB, which was running in production on RDS.

https://www.hugsql.org/ is pretty good too, btw! It's just a bit too much magic for me personally :)

Either way, one secret to developing SQL queries comfortably is to utilize some more modern features, like the WITH clause, to provide test data to your queries: https://www.sqlite.org/lang_with.html

You can use it to just type up some static data, but you can also compute test data dynamically and even randomly!

Another little-known feature is the RETURNING clause for INSERT/UPDATE/DELETE: https://www.sqlite.org/lang_returning.html

It can highly simplify your host-code, which embeds SQL, because you don't have to introduce UUID keys everywhere, just so you can generate them without coordination.
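The WITH-clause trick can be sketched roughly like this, here driven from Python's standard `sqlite3` module rather than Clojure. The `orders` data and the function name are invented for the example; the point is only that the query under development reads from an inline CTE instead of a real table.

```python
import sqlite3

# The CTE stands in for a real table while the query is being developed.
QUERY = """
WITH orders(customer, amount) AS (
    VALUES ('alice', 10), ('alice', 5), ('bob', 7)
)
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
ORDER BY customer
"""


def totals_from_inline_data() -> list:
    """Run the query against the inline test data; no schema is needed."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(totals_from_inline_data())  # [('alice', 15), ('bob', 7)]
```

Once the query behaves as expected, the CTE can be dropped so the same SELECT runs against the real `orders` table.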

DeathArrow4y ago

>SQLite isn't just on the same machine as your application, but actually built into your application process. When you put your data right next to your application, you can see per-query latency drop to 10-20 microseconds. That's micro, with a μ. A 50-100x improvement over an intra-region Postgres query.

3np4y ago

Let's say we're running a vendored application (forking it is not an option) utilizing WAL and want to store the db on one of those filesystems not traditionally suitable for WAL'd sqlite.

Would dropping in Litestream on the db allow us to do so safely?

mrcwinn4y ago

I have really enjoyed using Fly. Great service and support.

mkleczek4y ago

And while current trend is to implement sharing by applications I expect this to change in the future as it is much more economical to use RDBMS to share data.

whazor4y ago

I like the idea. It indeed sounds faster to redirect all write APIs via your own proxy to a single remote write instance (or maybe multiple via sharding).

DeathArrow4y ago

In terms of CAP theorem, you give up consistency and partition tolerance, leaving only availability.

For many, giving up consistency would be a big deal.

fareesh4y ago

steve_gh4y ago

This gives us the speed of SQLite plus easy replication and a single source of truth.

Chapeau!!!

swlkr4y ago

The reduction in complexity from using sqlite + litestream as a server side database is great to see!

quintes4y ago

What’s the use case here, a single web app with inproc db?

More complex use cases?

jl64y ago

thdxr4y ago

in practice how do you make a single application node the writer?

do you now need your nodes to be clustered + electing a leader and shipping writes there?

I know fly.io did this with PG + Elixir, but BEAM makes this type of stuff pretty easy

PhineasRex4y ago

It's been a while since we reinvented the wheel, hasn't it.

mro_name4y ago

> The conventional wisdom could use some updating.

how true in so many fields.

swah4y ago

Reminds me of https://github.com/fpereiro/backendlore

kukabynd4y ago

Great move, congrats to everyone involved. Fly is a very promising player in the space. The pipeline looks amazing, and I’ll be trying more of your offerings down the road.

DeathArrow4y ago

With SQLite you embed the DB in the application. If I have 6 Kubernetes pods and the pod containing the writer dies, all other 5 pods will be useless.

InitEnabler4y ago

SQLite has to be one of my favorite databases. It's always improving and the story behind its creation is really quite something.


DeathArrow4y ago

If we don't need SQL capabilities of SQLite, we can use the file system as a document database. Rsync will take care of replication.

pjmlp4y ago

If it comes with the same tooling as Oracle and SQL Server, I might think about using it server side, until then not really.

tybit4y ago

I think this architecture would be really powerful paired with the actor model to shard databases to nodes.

rullopat4y ago

My question is: what would happen if my server blows up while Litestream is still streaming to S3?

boesboes4y ago

How well does this scale for larger data sets? Could I use it with 100GB of data for instance?

nh24y ago

The article doesn't seem to discuss one of the most fundamental guarantees of current-day DB-application interaction:

Acknowledged writes must not be lost.

Most applications make this assumption, and that makes sense: otherwise you could never know how much longer a client needs to hold onto the data until being sure that the DB stored it.

This can only be achieved reliably with a server-side network roundtrip on write ("synchronous replication"), because a single machine can fry any time.

This is why I think it's justified that some other commenters call this approach a "cache" when compared with a multi-AZ DB cluster doing synchronous replication.

Maksadbek4y ago

SQLite is known for having many extensions. If streaming replication is so important, why didn't the SQLite authors create one before?

pbowyer4y ago

Not surprised. Congratulations Ben!

vinay_ys4y ago

In the past two decades we have done this enough times to know better. Here's what we know:

We know all this..and more.. but I'll stop here... for now.

OOPMan4y ago

Cool technical marketing blog story bro
