and written by someone who should have spent more time studying the history of database systems.
> Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself
This is not a good idea. It has been done many times and never quite caught on because it is not a good idea.
From a security perspective it is a nightmare.
Or if you do put all the correct isolation around the code to protect the database, then you have basically created an "app server" (old term) inside a database, and it would happily, run outside of the database since in essence it is already doing so.
It was quite normal in 70s 80s and even 90s; all the ms sql, db2, oracle and as400 systems I encountered in those days had all or almost all logic as stored procs. Very large ones.
Instead, we haul that shit to the NIC, then across the network, then copy it into memory on some server (probably inefficiently), do the operations there, or we have to reinvent this with pushdown functions for distributed databases.
There are many cases where moving function-to-data is the right answer.
Largely thanks to NeXT showing the way with EOF (the original ORM).
You can drastically cut down on end-to-end latency by eliminating the network hop from your app server to your DB, especially in cases where you are forced to make multiple back-to-back DB queries per request.
> Advanced inter-document relations and analysis. No JOINs. No pain. […] queries allow for multi-table, multi-depth document retrieval, efficiently in the database, without the use of complicated JOINs
That sounds like they have a networked database, not a relational database. That might be fine, but as I understand it, relational databases won because they offer more flexible access patterns and it’s easier to write correct queries.
https://github.com/surrealdb/surrealdb/blob/ed60a35b9b539e1b...
I could be missing something from their branding but I'd rather see projects with non-open source licenses from the beginning that bait-and-switch license change when they want to become profitable. (Although, I'd much rather just see a fully open source project)
They have a page on their website where they claim to be "an open source company" (https://surrealdb.com/opensource)
The strategy is to break the subscription up into listens based on the read-set ranges of the query. Then you put the individual read-set ranges into a system table that you index. Finally when writes happen, you notify all queries who's read-set intersects with the write-set.
For example, say I have a query `SELECT * from block WHERE parent_id = X AND last_modified_at > Y`.
This query might create two subscriptions for its read sets:
{ query_id: Q, table: block, column: parent_id, min: X, max: X }
{ query_id: Q, table: block, column: last_modified_at, min: Y, max: Infinity }
Now a write happens: `INSERT INTO block { id: Z, parent_id: X, last_modified_at: Y + 10, title: "hi" }`We can find our subscriptions to notify by doing queries like these:
SELECT query_id FROM subscription WHERE
table = block
AND (
(column = id
AND min <= new_block.id
AND max >= new_block.id
)
OR
(column = parent_id
AND min <= new_block.parent_id
AND max >= new_block.parent_id)
OR
(column = last_modified_at
AND min <= new_block.last_modified_at
AND max >= new_block.last_modified_at)
)
Then you notify all those query_ids.I'm sure there's a lot of details and optimizations you can do on top of this; finding the right read sets seems pretty tricky for complex queries. Plus stuff like batching/debouncing, etc.
There’s several challenges with this approach that come up for me (which unless I’m mistaken is the naive approach of checking each write against a notification set).
The first is that maintaining the read set seems very expensive since it scales with the number of live queries installed. In a multi tenant DB, that seems tricky to manage in terms of memory used.
The second is that the computation of intersection with a read range seems potentially very expensive. Imagine you have a string column. You now have to do a string comparison for every insertion that might hypothetically match the query.
Finally, computing that read set correctly seems challenging (as you mention) and it’s not immediately clear to me it’s always tractable (eg could you do this with arbitrary complicated joins?).
Additionally, in your description, each write has an implicit table scan to do the select to find the intersection. That will tank write throughput even for small total intersection sets (eg there’s a reason LSM databases do deletes by inserting a tombstone instead of checking whether the data exists first, same with the merge operator in RocksDB - a db read in the write path significantly kills performance)
I would suspect this is what Google does with Cloud Firestore.
Batching only helps if you’re able to amortize the cost of notifications by doing that, but it’s not immediately clear to me that that there’s an opportunity to amortize since by definition all the book keeping to keep track of the writes would have to happen (roughly) in the synchronous write path (+ require a fair amount of RAM when scaling). The sibling poster made mention of taking read / write sets and doing intersections, but I don’t think that answers the question for several reasons I listed (ie it seems to me like you’d be taking a substantial performance hit on the order of O(num listeners) for all writes even if the write doesn’t match any listener). You could maybe shrink that to log(N) if you sort the set of listeners first, but that’s still an insertion speed of m log(n) for m items and n listeners or O(mn) if you can’t shrink it). That seems pretty expensive to me because it impacts all tenants of the DB, not just those using this feature…
I know the license is okay, but from my reading of it, sounds like if SurreaDB really wanted to shut you down they could.
Otherwise, I find the architecture of it to be beautiful and compelling.
A “Database Service” is a commercial offering that
allows third parties (other than your employees and
contractors) to access the functionality of the
Licensed Work by creating tables whose schemas are
controlled by such third parties.
I don't see how SurrealDB could shut you down?I worked with Madeleine for a few years and while the performance was amazing, RDBM systems are ahead because of the many years of standardisation, tooling and capabilities (e.g. migrations)
The other very important thing is - DB latency is an issue only in the hands of a novice. In the real world, network latency is a couple of magnitudes higher.
Io also has persistence baked into it: https://iolanguage.org/reference/
But I'm sure for most of the HN crowd one has Java in the URL and the other is a esoteric cool language from the early 00's so.. not practical. :)
thinks of Oracle APEX
thinks of IBM AS400
shudders
API has a cool design though.
The dev team needs to look at perf before the complexity hole gets too insane if it hasn't already.
The issue is still open as we haven’t resolved all of it yet so makes a good reference.
Don’t need to. Majority of my work is de-coupling systems “designed” around this idea.
Why can’t the database be the database, the backend be the backend, etc ? You can’t have a toolbox full of multi use tools. Sometimes you just need a fucking Phillips head screwdriver.