Skip to content

Top Best Ask Show New Jobs

Is an All-in-One Database the Future?

10 pointsEtai1y ago11 comments

Just posting a thought experiment here.

As data challenges become more defined and difficult, purpose-built databases have emerged to optimize specific problems, leading to a proliferation of specialized solutions. This has resulted in complex and messy data infrastructures, patched together by third-party data pipelines and event streaming products.

Companies rarely rely on a single primary data storage system, often deploying multiple databases to handle different needs, which creates an intricate web of interconnected data systems.

But, the idea of a generalist, all-in-one database that is scalable, performant across contexts, and commercially appealing remains elusive.

Different database types, such as relational OLTP, non-relational document, and memory-based cache, are optimized for specific use cases and face unique challenges that prevent effective consolidation. While tools like Object-Relational Mappers (ORMs) attempt to simplify interactions across databases, they fall short in managing non-relational types, underscoring the complexity of creating a unified solution.

Postgres, with its extensibility through plugins, comes close but still falls short of being a one-stop-shop. Ultimately, the technical and practical hurdles make an all-in-one database unfeasible, leaving the modern data stack as a complex but necessary reality.

In theory, an all-in-one database would struggle with optimization, data model overhead, and latency. Each database type is built for specific use cases, making a universal solution inefficient.

Closest Solution Today: Postgres

While not a full one-stop-shop, Postgres can be extended with plugins like pg_vector for vector search. However, it's not intended to solve every problem efficiently, evidenced by complex data stacks in companies using Postgres.

11 comments

9 comments · 5 top-level

yen2231y ago· 3 in thread

I would be curious to know if anyone tried using one of the newer data-warehouse databases (Clickhouse, Snowflake, etc) as an OLTP database, to run their crud app.

We came close, as we had architects who knew little besides Snowflake.

While there are all kinds of issues it can't handle for CRUD, such as locking and small updates, it's also expensive to do small reads due to the reliance on auto-scaling instead of timesharing. If your app pageloads and issues dozens of queries to populate its widgets, SF will be both slow and expensive.

You can't use ClickHouse this way. It does not have ACID transactions and the main table type (MergeTree) is optimized for reads on append-only workloads that operate on large blocks.

MySQL & PostgreSQL are vastly better for OLTP use cases.

yen2231y ago

I know that you shouldn't, but I'm curious if anyone did.

I don't think it is impossible to use Clickhouse as a backend to a CRUD app, if you didn't particularly care about things like transactional guarantees or read performance. You still can write rows to it, and read rows from it.

I've also worked in the business long enough to know that people make all kinds of suboptimal engineering decisions all the time.

tacostakohashi1y ago· 1 in thread

Hmm, ever used Oracle?

I'm not here to defend it, but in practical terms, it does everything anyone is likely to need out of the box. Relational, XML, JSON, replication, document...

EtaiOP1y ago

Where's it stand compared to some of the more traditional DW systems? Can it handle the workloads snowflake would for analytics?

iamlucaswolf1y ago

I think this question mixes up two orthogonal dimensions: transactionality (i.e., OLTP vs. OLAP) and data model (relational, document-based, graph, vector, ...).

Regarding transactionality: There is an entire area of research on "hybrid transactional and analytical processing" (HTAP) systems that unifies OLAP and OLTP systems. Hyper [1] pioneered this path at TU Munich, it's successor Umbra [2] recently incorporated as CedarDB [3]. There are lots of others. Most of these systems, AFAIK, are relational.

Regarding data model: What we've seen in the past few decades is that non-relational DBMS (excluding key-value stores) only make sense in rare edge cases that require huge scale. There has, e.g. been research [4] that shows that graph databases are still, well, lacking, compared to relational systems. The common pattern seems to be: unless you need to service very specific workloads at huge scales, SQL is probably enough [5]. Then again, it really comes down to intrinsics. If you were to, for example, implement distributed locking using Postgres, you would likely run into problems with MVCC and Xids very quickly.

So, as you already mentioned, there is no silver bullet. But even today, unless you are Meta or Google, SQL is probably enough for a long time and lots of use cases.

(Full disclosure: I'm working on Hyper full-time).

[1]: https://hyper-db.de/ [2]: https://umbra-db.com/ [3]: https://cedardb.com/ [4]: https://homepages.cwi.nl/~boncz/edbt2022.pdf [5]: https://www.youtube.com/watch?v=VxKt245X_ws

EtaiOP1y ago

Here's my full thoughts: https://www.secoda.co/blog/is-an-all-in-one-database-the-fut...

lifeinthevoid1y ago

It all depends on the use-case, a generalist database can never achieve the performance of specialized databases once the workload gets large enough. It's like asking for a car that's great off-road but also beats an F1 car on the circuit.

j / k navigate · click thread line to collapse