Announcing SurrealDB 1.0 (opens in new tab)

(surrealdb.com)

63 pointstinrab2y ago55 comments

55 comments

41 comments · 15 top-level

ThinkBeat2y ago· 8 in thread

This feels like satire.

and written by someone who should have spent more time studying the history of database systems.

> Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself

This is not a good idea. It has been done many times and never quite caught on because it is not a good idea.

From a security perspective it is a nightmare.

Or if you do put all the correct isolation around the code to protect the database, then you have basically created an "app server" (old term) inside a database, and it would happily, run outside of the database since in essence it is already doing so.

anonzzzies2y ago

> never quite caught on

It was quite normal in 70s 80s and even 90s; all the ms sql, db2, oracle and as400 systems I encountered in those days had all or almost all logic as stored procs. Very large ones.

foobiekr2y ago

There's also a ton of optimization savings in doing this. your DB already has to move from media to cache, so why not do the pipelined processing in place where it's cheapest?

Instead, we haul that shit to the NIC, then across the network, then copy it into memory on some server (probably inefficiently), do the operations there, or we have to reinvent this with pushdown functions for distributed databases.

There are many cases where moving function-to-data is the right answer.

threeseed2y ago

During the Java era the standard quickly became ORM instead of stored procedures.

Largely thanks to NeXT showing the way with EOF (the original ORM).

1 more reply

10000truths2y ago

> Or if you do put all the correct isolation around the code to protect the database, then you have basically created an "app server" (old term) inside a database, and it would happily, run outside of the database since in essence it is already doing so.

You can drastically cut down on end-to-end latency by eliminating the network hop from your app server to your DB, especially in cases where you are forced to make multiple back-to-back DB queries per request.

jddj2y ago

It's possible and even relatively straightforward to eliminate that network hop without moving all of the application logic into the database.

1 more reply

CyberDildonics2y ago

It's a nightmare from a lot of angles, especially debugging. Mashing everything together is a recipe for disaster. To debug something you want to isolate a small unit of execution and be able to replay the same data through it continually until you get the right results back.

hakanito2y ago

A world where the database is seamlessly embedded in the backend code instead would be more useful. Oh wait…

sa462y ago

This line in particular is a head-scratcher:

> Advanced inter-document relations and analysis. No JOINs. No pain. […] queries allow for multi-table, multi-depth document retrieval, efficiently in the database, without the use of complicated JOINs

That sounds like they have a networked database, not a relational database. That might be fine, but as I understand it, relational databases won because they offer more flexible access patterns and it’s easier to write correct queries.

dwb2y ago· 4 in thread

Not Open Source (in an OSI way):

https://github.com/surrealdb/surrealdb/blob/ed60a35b9b539e1b...

benrutter2y ago

Is it claiming anywhere to be open source? I've heard an interview with the creators and they seemed pretty open about which aspects where open source and which aspects where shut down?

I could be missing something from their branding but I'd rather see projects with non-open source licenses from the beginning that bait-and-switch license change when they want to become profitable. (Although, I'd much rather just see a fully open source project)

dwb2y ago

Sure. I wasn't really trying to insinuate anything, although I think in a more-ideal world the licensing arrangements of all projects would be in a prominent place, especially if they use a "Star us on GitHub" call-to-action. It's a vital part of evaluating this sort of thing.

khuey2y ago

> Is it claiming anywhere to be open source?

They have a page on their website where they claim to be "an open source company" (https://surrealdb.com/opensource)

1 more reply

anonzzzies2y ago

It would indeed be better to state that clearly, then I don’t have to check it (our company has a policy which won’t get this passed).

vlovich1232y ago· 4 in thread

Is there any trick to implement live updates in a scaleable way? In the limit for a naive implementation, every mutation would have to check every subscriber to see if the change is relevant which seems like it would cause large bottlenecks for writes.

jitl2y ago

One of the guys behind Convex explained the rough idea; I hope I do it justice.

The strategy is to break the subscription up into listens based on the read-set ranges of the query. Then you put the individual read-set ranges into a system table that you index. Finally when writes happen, you notify all queries who's read-set intersects with the write-set.

For example, say I have a query `SELECT * from block WHERE parent_id = X AND last_modified_at > Y`.

This query might create two subscriptions for its read sets:

    { query_id: Q, table: block, column: parent_id, min: X, max: X }
    { query_id: Q, table: block, column: last_modified_at, min: Y, max: Infinity }

Now a write happens: `INSERT INTO block { id: Z, parent_id: X, last_modified_at: Y + 10, title: "hi" }`

We can find our subscriptions to notify by doing queries like these:

    SELECT query_id FROM subscription WHERE
      table = block
      AND (
        (column = id
        AND min <= new_block.id
        AND max >= new_block.id
        )
        OR
        (column = parent_id
        AND min <= new_block.parent_id
        AND max >= new_block.parent_id)
        OR
        (column = last_modified_at
        AND min <= new_block.last_modified_at
        AND max >= new_block.last_modified_at)
      )

Then you notify all those query_ids.

I'm sure there's a lot of details and optimizations you can do on top of this; finding the right read sets seems pretty tricky for complex queries. Plus stuff like batching/debouncing, etc.

vlovich1232y ago

Is there a talk you can point me to?

There’s several challenges with this approach that come up for me (which unless I’m mistaken is the naive approach of checking each write against a notification set).

The first is that maintaining the read set seems very expensive since it scales with the number of live queries installed. In a multi tenant DB, that seems tricky to manage in terms of memory used.

The second is that the computation of intersection with a read range seems potentially very expensive. Imagine you have a string column. You now have to do a string comparison for every insertion that might hypothetically match the query.

Finally, computing that read set correctly seems challenging (as you mention) and it’s not immediately clear to me it’s always tractable (eg could you do this with arbitrary complicated joins?).

Additionally, in your description, each write has an implicit table scan to do the select to find the intersection. That will tank write throughput even for small total intersection sets (eg there’s a reason LSM databases do deletes by inserting a tombstone instead of checking whether the data exists first, same with the merge operator in RocksDB - a db read in the write path significantly kills performance)

1 more reply

mpercival5312y ago

Listener callout does not have to be part of the synchronous writes. Say if you keep a changelog, listeners can be asynchronously notified. These checks can be batched and throttled as well to minimize call-outs.

I would suspect this is what Google does with Cloud Firestore.

vlovich1232y ago

If you’re throttling, then your database can’t actually keep up with broadcasting changes. Throttling only helps ensure the system keeps working even when there’s a sudden spike.

Batching only helps if you’re able to amortize the cost of notifications by doing that, but it’s not immediately clear to me that that there’s an opportunity to amortize since by definition all the book keeping to keep track of the writes would have to happen (roughly) in the synchronous write path (+ require a fair amount of RAM when scaling). The sibling poster made mention of taking read / write sets and doing intersections, but I don’t think that answers the question for several reasons I listed (ie it seems to me like you’d be taking a substantial performance hit on the order of O(num listeners) for all writes even if the write doesn’t match any listener). You could maybe shrink that to log(N) if you sort the set of listeners first, but that’s still an insertion speed of m log(n) for m items and n listeners or O(mn) if you can’t shrink it). That seems pretty expensive to me because it impacts all tenants of the DB, not just those using this feature…

omneity2y ago· 3 in thread

Strong vibes from RethinkDB. I hope SurrealDB will have a different fate, I'm a big big fan of live queries and change feeds and don't like most existing implementations.

drdaeman2y ago

This immediately sprung into my mind. Already got bitten by RethinkDB (trusted it too much to do some logic, rewrite was painful), so I’d rather have reasonably easily replaceable third-party components than depend on something unique, especially without any guarantees.

grounder2y ago

Can you give a quick elaboration on what you mean about the fate of RethinkDB? I'm not familiar with it.

omneity2y ago

The project didn't really take off as a business and the team joined Stripe. See the "History" section. https://en.wikipedia.org/wiki/RethinkDB

canadiantim2y ago· 2 in thread

My biggest concern with SurrealDB is the license, otherwise I'd be very interested in adopting it.

I know the license is okay, but from my reading of it, sounds like if SurreaDB really wanted to shut you down they could.

Otherwise, I find the architecture of it to be beautiful and compelling.

gpm2y ago

Unless you're operating a "database service", a well defined term (in the license) that excludes almost every use-case

    A “Database Service” is a commercial offering that
    allows third parties (other than your employees and
    contractors) to access the functionality of the
    Licensed Work by creating tables whose schemas are
    controlled by such third parties.

I don't see how SurrealDB could shut you down?

auggierose2y ago

So if you create schemas on the fly, you cannot? Why would I use such a db?

1 more reply

dgudkov2y ago· 2 in thread

While I applaud any attempts to innovate in this area, I'd be more interested in seeing the opposite approach - integrating persistent storage in a programming language. If we already have default implementations of dictionaries, linked lists, and other in-memory data structures in programming languages, why not have default implementations of permanent data structures such as object collections and KV-stores?

thunderbong2y ago

There have been many attempts over the years. There's Prevayler [0], in Java, and Madeleine [1], in Ruby. Neither of them became wildly popular.

I worked with Madeleine for a few years and while the performance was amazing, RDBM systems are ahead because of the many years of standardisation, tooling and capabilities (e.g. migrations)

The other very important thing is - DB latency is an issue only in the hands of a novice. In the real world, network latency is a couple of magnitudes higher.

[0]: https://github.com/prevayler/prevayler

[1]: https://github.com/ghostganz/madeleine

PedroBatista2y ago

Check this out: https://microstream.one/products/microstream-for-java/

Io also has persistence baked into it: https://iolanguage.org/reference/

But I'm sure for most of the HN crowd one has Java in the URL and the other is a esoteric cool language from the early 00's so.. not practical. :)

robertlagrant2y ago· 1 in thread

> Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself

thinks of Oracle APEX

thinks of IBM AS400

shudders

beders2y ago

Hahaha, loved coding RPG on the AS400. The file system on top of the database layer! Awesome stuff

stallmanwasrigh2y ago· 1 in thread

It benches very very slow and no options to horizontally scale outside of manual sharding.

API has a cool design though.

The dev team needs to look at perf before the complexity hole gets too insane if it hasn't already.

jitl2y ago

What’s your source on the benchmark thing? It seems like they have a lot of backend KV store options including FoundationDB and TiKV which are both distributed, so I’d be curious how the backends perform relative to one another.

https://surrealdb.com/features

zamazan4ik2y ago· 1 in thread

1.0 version but https://github.com/surrealdb/surrealdb/issues/1548 is still open :)

obinnaspeaks2y ago

A lot of the issues here have been resolved in documentation. For example there’s now new docs on permissions and deployment also more information on backups. Others are currently being worked ;)..

The issue is still open as we haven’t resolved all of it yet so makes a good reference.

fredthedeadhead2y ago

Fireship did a 'SurrealDB in 100 seconds' in case anyone would like a quick video summary: https://youtu.be/C7WFwgDRStM?si=va17q7zagn4lIoi3

pryelluw2y ago

“Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself.”

Don’t need to. Majority of my work is de-coupling systems “designed” around this idea.

Why can’t the database be the database, the backend be the backend, etc ? You can’t have a toolbox full of multi use tools. Sometimes you just need a fucking Phillips head screwdriver.

esafak2y ago

I like where they are going. Are there benchmarks out yet?

h1fra2y ago

I feel like databases are the new JS framework, there is a new one every week.

pavlov2y ago

Is it a database where the only date/time type is a melting clock, and a join is defined as “the chance encounter of a sewing machine and an umbrella on an operating table”?

jauntywundrkind2y ago

Live queries & change-feeds! Reactive data stores are great; neat to see.

j / k navigate · click thread line to collapse

55 comments

41 comments · 15 top-level

ThinkBeat2y ago· 8 in thread

This feels like satire.

and written by someone who should have spent more time studying the history of database systems.

> Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself

This is not a good idea. It has been done many times and never quite caught on because it is not a good idea.

From a security perspective it is a nightmare.

anonzzzies2y ago

> never quite caught on

It was quite normal in 70s 80s and even 90s; all the ms sql, db2, oracle and as400 systems I encountered in those days had all or almost all logic as stored procs. Very large ones.

foobiekr2y ago

There's also a ton of optimization savings in doing this. your DB already has to move from media to cache, so why not do the pipelined processing in place where it's cheapest?

There are many cases where moving function-to-data is the right answer.

threeseed2y ago

During the Java era the standard quickly became ORM instead of stored procedures.

Largely thanks to NeXT showing the way with EOF (the original ORM).

1 more reply

10000truths2y ago

jddj2y ago

It's possible and even relatively straightforward to eliminate that network hop without moving all of the application logic into the database.

1 more reply

CyberDildonics2y ago

hakanito2y ago

A world where the database is seamlessly embedded in the backend code instead would be more useful. Oh wait…

sa462y ago

This line in particular is a head-scratcher:

dwb2y ago· 4 in thread

Not Open Source (in an OSI way):

https://github.com/surrealdb/surrealdb/blob/ed60a35b9b539e1b...

benrutter2y ago

Is it claiming anywhere to be open source? I've heard an interview with the creators and they seemed pretty open about which aspects where open source and which aspects where shut down?

dwb2y ago

khuey2y ago

> Is it claiming anywhere to be open source?

They have a page on their website where they claim to be "an open source company" (https://surrealdb.com/opensource)

1 more reply

anonzzzies2y ago

It would indeed be better to state that clearly, then I don’t have to check it (our company has a policy which won’t get this passed).

vlovich1232y ago· 4 in thread

jitl2y ago

One of the guys behind Convex explained the rough idea; I hope I do it justice.

For example, say I have a query `SELECT * from block WHERE parent_id = X AND last_modified_at > Y`.

This query might create two subscriptions for its read sets:

    { query_id: Q, table: block, column: parent_id, min: X, max: X }
    { query_id: Q, table: block, column: last_modified_at, min: Y, max: Infinity }

Now a write happens: `INSERT INTO block { id: Z, parent_id: X, last_modified_at: Y + 10, title: "hi" }`

We can find our subscriptions to notify by doing queries like these:

    SELECT query_id FROM subscription WHERE
      table = block
      AND (
        (column = id
        AND min <= new_block.id
        AND max >= new_block.id
        )
        OR
        (column = parent_id
        AND min <= new_block.parent_id
        AND max >= new_block.parent_id)
        OR
        (column = last_modified_at
        AND min <= new_block.last_modified_at
        AND max >= new_block.last_modified_at)
      )

Then you notify all those query_ids.

I'm sure there's a lot of details and optimizations you can do on top of this; finding the right read sets seems pretty tricky for complex queries. Plus stuff like batching/debouncing, etc.

vlovich1232y ago

Is there a talk you can point me to?

There’s several challenges with this approach that come up for me (which unless I’m mistaken is the naive approach of checking each write against a notification set).

The first is that maintaining the read set seems very expensive since it scales with the number of live queries installed. In a multi tenant DB, that seems tricky to manage in terms of memory used.

Finally, computing that read set correctly seems challenging (as you mention) and it’s not immediately clear to me it’s always tractable (eg could you do this with arbitrary complicated joins?).

1 more reply

mpercival5312y ago

I would suspect this is what Google does with Cloud Firestore.

vlovich1232y ago

If you’re throttling, then your database can’t actually keep up with broadcasting changes. Throttling only helps ensure the system keeps working even when there’s a sudden spike.

omneity2y ago· 3 in thread

Strong vibes from RethinkDB. I hope SurrealDB will have a different fate, I'm a big big fan of live queries and change feeds and don't like most existing implementations.

drdaeman2y ago

grounder2y ago

Can you give a quick elaboration on what you mean about the fate of RethinkDB? I'm not familiar with it.

omneity2y ago

The project didn't really take off as a business and the team joined Stripe. See the "History" section. https://en.wikipedia.org/wiki/RethinkDB

canadiantim2y ago· 2 in thread

My biggest concern with SurrealDB is the license, otherwise I'd be very interested in adopting it.

I know the license is okay, but from my reading of it, sounds like if SurreaDB really wanted to shut you down they could.

Otherwise, I find the architecture of it to be beautiful and compelling.

gpm2y ago

Unless you're operating a "database service", a well defined term (in the license) that excludes almost every use-case

    A “Database Service” is a commercial offering that
    allows third parties (other than your employees and
    contractors) to access the functionality of the
    Licensed Work by creating tables whose schemas are
    controlled by such third parties.

I don't see how SurrealDB could shut you down?

auggierose2y ago

So if you create schemas on the fly, you cannot? Why would I use such a db?

1 more reply

dgudkov2y ago· 2 in thread

thunderbong2y ago

There have been many attempts over the years. There's Prevayler [0], in Java, and Madeleine [1], in Ruby. Neither of them became wildly popular.

I worked with Madeleine for a few years and while the performance was amazing, RDBM systems are ahead because of the many years of standardisation, tooling and capabilities (e.g. migrations)

The other very important thing is - DB latency is an issue only in the hands of a novice. In the real world, network latency is a couple of magnitudes higher.

[0]: https://github.com/prevayler/prevayler

[1]: https://github.com/ghostganz/madeleine

PedroBatista2y ago

Check this out: https://microstream.one/products/microstream-for-java/

Io also has persistence baked into it: https://iolanguage.org/reference/

But I'm sure for most of the HN crowd one has Java in the URL and the other is a esoteric cool language from the early 00's so.. not practical. :)

robertlagrant2y ago· 1 in thread

> Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself

thinks of Oracle APEX

thinks of IBM AS400

shudders

beders2y ago

Hahaha, loved coding RPG on the AS400. The file system on top of the database layer! Awesome stuff

stallmanwasrigh2y ago· 1 in thread

It benches very very slow and no options to horizontally scale outside of manual sharding.

API has a cool design though.

The dev team needs to look at perf before the complexity hole gets too insane if it hasn't already.

jitl2y ago

https://surrealdb.com/features

zamazan4ik2y ago· 1 in thread

1.0 version but https://github.com/surrealdb/surrealdb/issues/1548 is still open :)

obinnaspeaks2y ago

A lot of the issues here have been resolved in documentation. For example there’s now new docs on permissions and deployment also more information on backups. Others are currently being worked ;)..

The issue is still open as we haven’t resolved all of it yet so makes a good reference.

fredthedeadhead2y ago

Fireship did a 'SurrealDB in 100 seconds' in case anyone would like a quick video summary: https://youtu.be/C7WFwgDRStM?si=va17q7zagn4lIoi3

pryelluw2y ago

“Imagine a world where the majority of your backend logic is seamlessly embedded within the database itself.”

Don’t need to. Majority of my work is de-coupling systems “designed” around this idea.

Why can’t the database be the database, the backend be the backend, etc ? You can’t have a toolbox full of multi use tools. Sometimes you just need a fucking Phillips head screwdriver.

esafak2y ago

I like where they are going. Are there benchmarks out yet?

h1fra2y ago

I feel like databases are the new JS framework, there is a new one every week.

pavlov2y ago

Is it a database where the only date/time type is a melting clock, and a join is defined as “the chance encounter of a sewing machine and an umbrella on an operating table”?

jauntywundrkind2y ago

Live queries & change-feeds! Reactive data stores are great; neat to see.

j / k navigate · click thread line to collapse