https://tailscale.com/blog/database-for-2022
I only say this because I have made this mistake at my previous startup. We built these really cool distributed databases on top of a similar storage engine (RocksDB) plus Kafka, but it ended up being more trouble than it was worth. We should have just used a battle-tested relational database instead.
Using SQLite for these applications is really fun, and it seems like a good idea on paper. But in practice I just don't think it's worth it. YMMV though.
Also databases all use the same fundamental primitives and it's up to you to choose the level of abstraction you need. For example, FoundationDB is a durable distributed database that uses SQLite underneath as the storage layer but exposes an ordered key/value API, but then allows you to build your own relational DB on top.
If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices like CockroachDB/Yugabyte/TiDB/Memsql/etc that can serve the purpose instead of building your own.
This might have limited utility if the goal were to combine RocksDB with something else. And even less for SQLite and something else.
The big push of interest in SQLite serverside isn't driven by people who have never set up pgbounce, but rather by developers who have both read the SQLite docs very carefully and have used the library extensively, and know what it's good for.
If you're saying that a replicated Postgres setup would be simpler than what you're built, I agree; but SQLite+Litestream probably would be too.
I've been looking for a turn key solution that is better than me running a single node Postgres instance "bare metal" or in a container.
postgres-operator seems cool but... k8s, pretty heavy I guess.
https://en.wikipedia.org/wiki/POPFile https://getpopfile.org/browser/trunk/engine/POPFile/Database...
> As of version 3.33.0 (2020-08-14), the SQLite library consists of approximately 143.4 KSLOC of C code. ... By comparison, the project has 640 times as much test code and test scripts - 91911.0 KSLOC.
I don't usually place much stock in those sort of counts, but 640x is notable.
It makes sense considering the wide variety of use-cases, from embedded devices to edge computing and everything in between.
[1]: https://www.sqlite.org/testing.html [2]: https://sqlite.org/src/dir?ci=trunk
Problem is, there's still a huge market for these apps but everything has moved to the web (no one is making desktop apps anymore). So having a full-blown RDMS is overkill for these kind of app, and now SQLite is starting to fill these web app needs.
@sqlite - if you are reading this, any word on merging WAL2 and BEGIN CONCURRENT into main? There clearly is a new class of needs to do so in this world that has completely moved over to web app development (which introduces concurrency problems never experienced on desktop). Any thoughts of focusing more on these web related needs for SQLite (or maybe even fork your own code base to have a more enhanced SQLite version targeted at web needs)?
At least that's been my observation since I started coming around here.
The fact that it's just a file on disk limits the usage.
Any state mutation is ultimately ordered in time and how that that ordering is accomplished depends on the abstractions you're using: in your app, network layer, database, etc.
It’s interesting because you have to consider how to scale your database as well as your application. The fact that you don’t have one central database opens up more possibilities. But it doesn’t work for all instances (such as a shared read-write data source for all users). For example, this approach wouldn’t work for something like Twitter (at least the original architecture).
sqlite is a great embedded database and thanks to use by browsers and on mobile the most used database in the world by orders of magnitude.
But it also comes with lots of limitations.
* there is no type safety, unless you run with the new strict mode, which comes with some significant drawbacks (eg limited to the handful of primitive types)
* very narrow set of column types and overall functionality in general
* the big one for me: limited migration support, requiring quite a lot of ceremony for common tasks (eg rewriting a whole table and swapping it out)
These approaches (like fly.io s) with read replication also (apparently?) seem to throw away read after write consistency. Which might be fine for certain use cases and even desirable for resilience, but can impact application design quite a lot.
With sqlite you have do to a lot more in your own code because the database gives you fewer tools. Which is usually fine because most usage is "single writer, single or a few local readers". Moving that to a distributed setting with multiple deployed versions of code is not without difficulty.
This seems to be mitigated/solved here though by the ability to run worker code "next to the database".
I'm somewhat surprised they went this route. It probably makes sense given the constraints of Cloudflares architecture and the complexity of running a more advanced globally distributed database.
On the upside: hopefully this usage in domains that are somewhat unusual can lead to funding for more upstream sqlite features.
I don't know where this idea of having to swap a whole table in SQLite came from, but it simply isn't true. Over the last 13 years I have upgraded production HashBackup databases at customer sites a total of 35 times without rewriting and swapping out tables by using the ALTER statement, just like other databases:
https://www.sqlite.org/lang_altertable.html
For the most recent upgrade, I upgraded to strict tables, which I could also have done without a rebuild/swap. I chose to do a rebuild/swap this one time because I wanted to reorder some columns. Why? Because columns stored with default or null values don't have row space allocated if the column is at the end of the row.
I'm embarrassed to admit that I didn't realize RENAME COLUMN was actually added in 3.25, almost four years ago.
DROP COLUMN was only just added last year in 3.35.
I'm surprised a database schema lasted 9/12 years without ever renaming or dropping a column.
This changes things! But even now, ALTER TABLE is not transactional. So especially with many concurrent readers there can definitely be situations where you'd still want to rewrite.
`FULL OUTER JOIN` is the secret to diff'ing table sources. `MERGE` is just a diff operation + insert/update/delete statements to make the target table more like the source one (or even completely like the source one).
`FULL OUTER JOIN` is essential to implementing `MERGE`. Granted, one could implement `MERGE` without implementing `FULL OUTER JOIN` as a public feature, but that seems silly.
Sadly, the SQLite3 dev team specifically says they will not implement `FULL OUTER JOIN`[0].
Implementing `MERGE`-like updates without `FULL OUTER JOIN` is possible (using two `LEFT OUTER JOIN`s), but it's an O(N log N) operation instead of O(N).
The lack of `FULL OUTER JOIN` is a serious flaw in SQLite3. IMO.
[0] https://www.sqlite.org/omitted.htmlPrisma Migrate can automatically generate these steps, removing most of the pain. I'm sure other migration tools can do this as well.
Sounds like there will be no synchronous replication and instead there will be a background process to "constantly keep [read-only clones] up-to-date". This means that a stale read from an older read replica can occur even after a write transaction has successfully committed on the "primary" used for writes.
So, while the consistency is not "thrown away", it's no longer a strong consistency? Anyway, Kyle from Jepsen will figure it out soon, I guess :)
Disclaimer: I work at Cloudflare :)
Edit: But that would mean that durable objects can't be replicated asynchronously? That would mean a big latency hit. Then what's the difference to a central DB in one datacenter?
It's more of a "quickstart" than a peek under the hood.
all the hip service providers seem to be all over it which would indicate pretty good modernity to me at least.
Also, I wonder how hard it will be to migrate existing PostgreSQL databases and SQL statements. Of course, I understand if Cloudflare is focused on greenfield applications.
- SQLite is traditionally embedded in an application, so Prisma interacts with it by mounting a file. Workers does not have a local filesystem, and D1 is exposed over the network through an API accessible from a Worker. Prisma will have to create a specific connector for D1. - Workers have a script size limit which is currently 1MB. My understanding is that Cloudflare will be increasing this in the future. We also have specific work to decrease the size of Prisma. Both of those will have to happen before Prisma could be used with D1.
Note that Prisma already support querying Postgres, MySQL, SQL Server and MongoDB from Cloudflare Workers through the Prisma Data Proxy, which will see a GA release next month.
We are also very excited about D1 as a way to bring a subset of data closer to users in order to deliver faster experiences. We hope this will be a way to bring the benefit of edge computing to larger organisations who cannot simply rearchitect everything to run on Workers.
I am also excited about this :)
Source: someone who avoided learning SQL for 20 years.
I too am eagerly waiting for a good serverless nodejs framework that is "batteries included". I've deployed on Lambda using the "Serverless Framework" but once your app grows to a certain size everything starts to fall apart and you lose some of the magic. Unfortunately, most of the things that advertise themselves as serverless/lambda/worker nodejs frameworks are monoliths and/or an existing monolith framework that "supports" lambda (with a billion asterisks after that). There is absolutely nothing wrong with monolith frameworks, I love them, but just not for lambda, I want to deploy a single endpoint as a single function (or as a cron, or queue listener, etc), not all of my code for every function (you hit size limits quick with this method).
I want express/nestjs/etc-type routes that I define with code or annotations that result in /only/ that function (endpoint) being bundled up and deployed. I ended up rolling my own "framework" on top of Serverless Framework (uses serverless.ts config file that scans my directories for a special file that defines the routes defined in that directory) but Serverless Framework is pretty shaky ground. Their documentation is a mess, Serverless Components appears dead, and they seem to be busy with their own "cloud" so I don't know how much longer I can keep building on top of them.
When it works it's like magic but there are a ton of walls you run headfirst into: Cloud formation entity limits, package size limits, typescript/bundling support, clear disregard for medium/large projects ("Just use multiple services", this leads to a terrible dev experience), and long deploy times.
I wish CF Workers had been out when I first started building my current project, I might have gone in that direction instead, I still might.
Deno can compile to wasm, so it can plug in through that vertical. But it's just TS on the frontend.
I'm mainly a python programmer, but Deno's been the most alluring development in the JS ecosystem since typescript for me. Might be helpful to you all to capture some steam from source.
[1]: https://deno.land/
I guess it would count as a client focused ORM :)
I'll be reaching out from jp@javascriptdb.com
Great addition, congrats!
I'm guessing this is a single master database with multiple read replicas. That means it's not consistent anymore (the C in ACID). Obviously reads after a write will see stale data until the write propogates.
I'm a bit curious how that replication works. Ship the whole db? Binary diffs of the master? Ship the SQL statements that did the write and reapply them? Lots of performance and other tradeoffs here.
What's the latency like? This likely doesn't run in every edge location. Does the database ship out on the first request. Get cached with an expiry? Does the request itself move to the database instead of running at the edge - like maybe this runs on a select subset of locations?
So many questions, but no details yet.
In fact, I don't see anything D1 is doing that is not already offered by something like rqlite[1], which is also a super-easy-to-use distributed database built on SQLite. Of course Cloudflare will run the database for you, which is a great help -- they take care of the uptime, monitoring, backups, etc. And that's important obviously, because in the real-world databases must be operated.
Disclaimer: I am the creator of rqlite.
I believe that the power of what Cloudflare offers here isn’t in the actual database. It’s the packaging and how it sits in their serverless world. Even with rqlite, I still need ip addresses to run a resilient system. As someone who sometimes needs a table here snd there, I really, really don’t want a server. I want a table to store a thousand records in and that’s it. This is where I would very much enjoy using something like D1.
A combo of D1, R2 and Workers is a serious contender for over-the-top serverless distributed apps. This is great.
TFA does say that read-replicas will be present at every edge location, which makes sense for a product like Workers. But it doesn't mention writes at all.
Single master with read replicas is fully consistent if commits don't return until propagated to and acknowledged by replicas (the expense here being commit latency.)
Disclaimer: I am the creator of rqlite.
sqlite is accessed via a socket? defeats the whole purpose of using sqlite.
Many here are mentioning using one sqlite file per customer but that sounds like a nightmare for migrations and analytics.
SQLite is great and all these new services and articles are nice but intentionally shadowing lots of complexity.
Also, any plans to support PATCH x-update-range so SQLite can be used entirely in the browser via SQLite.js?
Can someone enlighten me with the types of use cases this would be better for vs say Postgres?
I don’t understand how it will be usable at all in a website with multiple users. Is the idea to make your site to every user gets their own database? How do you stop SQL injection?
Once you solve all of these problems aren’t you better off just using Postgres?
Cloudflare and fly.io both promise hassle free read replicas and backup. They will both offer only a single node capable of writes, because that’s how SQLite rolls.
This is a pretty good fit for a read heavy load that requires SQL and very low latency.
Those offerings are great for use-cases that don't need that kind of consistency, which are many.
It all depends on the use-case, of course. A traditional hosted Postgres or MySQL database or cluster is certainly the go-to solution for all who need advanced features or full consistency, which only synchronous replication could provide.
I see cloudflare people are on this post, any chance to compar D1 vs postgres in terms of DB features?
Insert ... Returning
Stored procedures and triggers
Etc etc
Would be really helpful to get a comparison like cockroachDB did here https://www.cockroachlabs.com/docs/stable/postgresql-compati...
Or even better, a general sql compatibility matrix like this https://www.cockroachlabs.com/docs/stable/sql-feature-suppor...
Kudos to the cloudflare team!
RETURNING is covered.
Stored procedures are indirectly there by running your own code "next to the database", as mentioned in the post. Which is arguably much nicer than having to use some database specific language, given that you can run WASM on workers.
"indirectly" is a keyword here, because running code when data is modified potentially won't replace triggers since they'll probably execute outside the running transaction.
Long story short, don't expect anything fancy. Support for alter table is limited, and concurrency can be an issue.
Am I missing something? Is there no bandwidth cost at all?
edit: maybe one day! this looks cool regardless
If the latter, and anyone from CloudFlare is here, is there any chance to have SpatiaLite enabled?
I'm thinking of sqlx in Rust (or any other language binding / ORM for that matter), which has compile time schema safety. This is a nice capability, and because this interface seems non-standard (possibly for good reason), I guess we are being asked to give some of those things up.
I am getting a bit ahead of myself on the Rust part (presumably that will eventually be supported as part of workers-rs), but I think the feelings still stand if you consider the JS ecosystem.
Edit: I may actually be wrong, but presumably the entire surface isn't covered because there's no file opening, etc.
(It's not really HTTP as in it might never cross a TCP socket, just get shuffled from one V8 isolate to another, but it looks like a `fetch` call to the Javascript.)
It's also worth remembering that SQLite itself has no wire protocol, it's a library. And there is no such thing as a "SQL wire protocol". It sure isn't gonna be Postgres wire protocol either.
From the article:
> D1’s API includes batching: anywhere you can send a single SQL statement you can also provide an array of them, meaning you only need a single HTTP round-trip to perform multiple operations. This is perfect for transactions that need to execute and commit atomically:
The demo is also a bit buggy: orders are duplicated as many times as there are products, but clicking on the various lines of the same order leads to the same record, where the user can only see the first product...
I also think the demo would have more impact if it wasn't read-only (although I understand that this could lead to broken pages if visitors mess up with the data).
Anyway, kudos to the CloudFlare team!
Its perfect for content type sites that want search and querying.
Anyone from CF here, is it using Litestream (https://litestream.io) for its replication or have you built your own replication system?
I assume this first version is somewhat limited on write performance having a single "main" instance and SQLite laking concurrent writes? It seems to me that using SQLite sessions[0] would be a good way to build an eventually consistent replication system for SQLite, would be perfect for an edge first sql database, maybe D2?
You weren't lying, and this is super cool - the SQLite hype train also seems to be in full force.
In 2-3 years from now, these services will be so mature and strong they will be crushing the cloud market.
They're turning dreams into reality, one after another.
CF people around, I would love to chat, if anyone is interested please reach out at: jp@javascriptdb.com
I'll be applying to this beta for sure!
One thing I hope to see in the future is a better product filtering experience. When I worked on a jquery product filter I realized the DOM bloat was the main problem.
I wonder if D1 can help devs build instant product filtering pages that don’t require the reload like microcenter or Newegg does.
It’s not DOM bloat… it’s too many records. If you’re building a DOM node for each record, that’s bloat, but you still have the problem even if the results are stored in a JSON object and dynamically queried on the client side.
So, for each new filter or new query you need to hit the server anyway. If that’s an asynchronous query that returns a json blob or a full refresh, IMHO, it doesn’t really matter that much. Either way, you’re rebuilding a large portion of the DOM with the new results. The only thing that skews things in favor of an async call is if the rest of the page is so heavyweight that reloading the page takes a significant amount of time. This is probably what you’re taking about.
Having a SQLite db close to your worker node really isn’t going to affect this problem all that much.
and well R2 and D2 would make for a great naming scheme.
But it is really hard getting some useful information from this article. I can't even tell if it is not there or just buried in all this marketing hot air.
So, what is it really? Is there one Write-Master that is asynchronously replicated to all other locations? Will writes be forwarded to this master and then replicated back?
I'm very curious about how it performs in real life. Especially considering the locking behavior (SQLite has always the isolation level 'serializable' iirc). The more you put in a transaction or the longer you have to wait for another process to finish their writes, the more likely you have to deal with stale data.
But overall I'm very excited. Also by the fly.io announcement, of course. Lots of innovation and competition. Good times for customers.
Not a lot of detail, but that is mentioned:
"But we're going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance—each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work."
One thing I've noticed that many commenters miss about read-replicated SQLite is assuming that the only valid model is having one, giant, centralized database with all the data. Lets be honest with ourselves, the vast majority of applications hold personal or B2B data and don't need centralized transactions, and at scale will use multi-tenant primary keys or manual sharding anyways. For private data, a single SQLite database per user / business will easily satisfy the write load of all but the most gigantic corporations. With this model you have unbounded compute scaling for new users because they very likely don't need online transactions across multiple databases at once.
Some questions:
Will D1 be able to deliver this design of having many thousands of separate databases for a single application? Will this be problematic from a cost perspective?
> since we're building on the redundant storage of Durable Objects, your database can physically move locations as needed
Will D1 be able to easily migrate the "primary" at will? CockroachDB described this as "follow the sun" primary.
From the blog post it says read-only replicas are created close to users and kept up to date with the latest data.
- How should I think about this in terms of CAP? If there's a write and I query a replica what happens?
- How are writes handled? Do they go to a single location or are they handled by various locations?
I'm excited to try this. It's so cool to see databases being distributed "on CDNs" for lack of a better term.
> Embedded compute
> But we're going further. With D1, it will be possible to define a chunk of your Worker code that runs directly next to the database, giving you total control and maximum performance — each request first hits your Worker near your users, but depending on the operation, can hand off to another Worker deployed alongside a replica or your primary D1 instance to complete its work.
Good: Workers, KV, Durable Objects, Cron Triggers
Bad: Spectrum, Zaraz, R2, D1
> Zaraz
That's the name of the company they acquired. Though, I do agree that more descriptive naming is nice.
E.g.
Zaraz = SafeXXS
D1 = LDS (light database system)
R2 = ObjectStore
Spectrum = Reverse Proxy
* How do you replicate it consistently?
* Who has the master privilege (or masters if sharded)? What's the failover story?
I am guessing a blob store is involved, but I have gaps in my understanding here.
In comparison, everything at CF is global. And its not just "global" from an AWS perspective of "we've got 14 regions and your stuff runs in all of them"; its global from 300+ points-of-presence, within 50ms of like 98% of all humans. CDN for compute, databases, etc.
CF has a way to go in DevEx on many of their products. For example; Workers, being based on V8 Isolates, is a pain to use even compared to e.g. Lambda. It's a battle of figuring out what's possible and what isn't within the runtime. But I'm sure it'll be improved!
* How exactly is the read replication implemented? Is it using litestream behind the scenes to stream the WAL somewhere? How do the readers keep up? Last I saw you just had to poll it, but that could be computationally expensive depending on the size of the data (since I thought you had to download the whole DB), and could potentially introduce a bit of latency in propagation. Any idea what the metrics are for latency in propagation?
* How are writes handled? Does it do the Fly thing about sending all requests to one worker?
I don't quite know what a "worker" is but I'm assuming it's kind of like a Lambda? If you have it replicated around the world, is that one worker all running the same code, and Cloudflare somehow manages the SQL replicating and write forwarding? Or would those all be separate workers?
That's important to understand because that's one of the key advantages of SQLite compared to the usual client-server architecture of databases like PostgreSQL or MySQL: https://www.sqlite.org/np1queryprob.html
That's interesting to me. It opens the door for Cloudflare to offer something more like a "normal" serverless offering. One that can run containers, or least natively run Python/Golang/Java/etc, like AWS Lambda does. And with this ecosystem described above that can conditionally route between the lighter edge Workers and the heavier central serverless functions. To me, that's the tipping point where they start to threaten larger portions of AWS.
The key is to let the user decide what really needs ACID and what doesn't. If someone wants to make the next Facebook or Reddit they'll need huge write throughput and if some votes or updates are lost, that may be a good trade-off.
[1] You could add a BEW file (like WAL file) to sqlite for Best Effort Writes.
Are there any limitations, for example on the number of tables or size of the database?
Is the data replicated to all regions?
This enables entirely new classes of applications where everything can now be hosted by Cloudflare.
Questions:
a. To help with concurrent writes, will Cloudflare be using WAL2 and BEGIN CONCURRENT branches of SQLite?
b. How is Cloudflare replicating the data cross region? Will it be Litestream.io behind the scenes?
c. Will our Worker code need to be written differently to ensure only a single-writer is writing to SQLite database?
d. How does data persistency and database file size get factored in? I have to imagine their is a limit to how much storage can be used, whether or not that storage is local to the Worker machine, and if its persistent.
Maybe they will announce a Hashicorp competitor in their next reveal. Who knows.