Building a Relational Database Using Kafka

(yokota.blog)

121 points · rayokota · 6y ago · 33 comments

humbleMouse6y ago· 6 in thread

This is fetishizing complexity for no reason. Makes for an interesting blog post, but if any company told me they were doing this I would run the other direction immediately.

thinkersilver6y ago

Great article. I think you might be taking a subtle point in the article for granted. Namely, the ability to assemble a fairly complex distributed system from open source components.

Democratisation of complex machinery like Raft and other consensus algorithms, schedulers, append logs, query optimisers and so on is a superb thing. It is something that has only been possible in the last three or four years and would have been devilishly difficult before then without significant upfront work.

Fetishising complexity might be a bit strong here. Seeing more posts like this is a good thing.

mLuby6y ago

You make a good counterpoint: it's awesome that this can be done concisely enough these days to fit in a hobby blog post. That's new and demonstrates the approachability of these tools.

At the same time, OP is right: a business doing this is making poor engineering decisions.

humbleMouse6y ago

I agree my comment may be a knee-jerk reaction, and it is quite a good technical article with clear examples.

I also agree that the open source tech used in this solution is very powerful and it is great we are at a point where a solution like this could exist. I love avro schema registries and kafka just as much as the next guy.

foxyv6y ago

If you are using this for your shopping website I would definitely agree. If you are building a managed database service like DynamoDB or Aurora then complexity may be warranted. Great for companies like Google and Amazon that need something that will work best for them and have the engineering talent to make it stick.

Projects like this are how we get solutions that blow existing stuff out of the water. Although a lot just end up being fun projects that amount to nothing but an entertaining blog article.

yowlingcat6y ago

Agreed. To expand on your point, I believe this is pretty close to exactly what Amazon did to build Aurora, which is probably the most exciting development I've seen as an engineering leader in the past decade. Suddenly, the one crucial, stateful component of my system that could ever present sharding issues at scale became something I could pay a premium to not have to think about (to a point -- believe it's still 64TB for Aurora).

What's extra cool about this for me is how it illustrates how far the open source components in the Apache Kafka ecosystem have come: each fairly complex component needed to expose a full RDBMS on top of the basic building block of a log has been developed (to some production capacity) in that ecosystem. That's amazing! What does this mean?

It means that you could theoretically (please don't hurt me) implement a relational DB backed by a twitter DB log, or a blockchain, or any other event log that you can't necessarily use traditionally as a WAL. It also introduces a lot of really interesting possibilities for data integration. Definitely has my mind reeling a bit.

dominotw6y ago

how else would one implement microservices then?

Do you run away from microservices too?

xmpir6y ago· 6 in thread

Why would I want a relational database living in Kafka? MySQL or Postgres are great at what they do.

mothsonasloth6y ago

Why would I want DOOM running on a 148x64 LCD Canon printer screen?

Because I can - https://www.youtube.com/watch?v=NPWi5yJK3zo

lmm6y ago

A lot of the low-level behaviour is hard to control, at least in terms of having a well-known public interface. E.g. transaction isolation level is database-global, limited control over when updates to indices happen, limited direct control over MVCC. The internals of MySQL or Postgres look a lot like Kafka, but the event-transforming parts are hidden inside a black box.
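To make the comparison concrete, here is a minimal Python sketch of a table materialized by replaying an append-only log, which is roughly the shape shared by a WAL-backed storage engine and a Kafka-backed store. `Event` and `materialize` are hypothetical names made up for illustration; nothing here uses the actual APIs of Kafka, MySQL or Postgres.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One log record: an upsert when value is set, a delete (tombstone) when None."""
    key: str
    value: Optional[dict]


def materialize(log: list[Event], upto: Optional[int] = None) -> dict[str, dict]:
    """Replay the first `upto` events (all of them if None) into a key -> row table."""
    table: dict[str, dict] = {}
    for event in log[:upto]:
        if event.value is None:
            table.pop(event.key, None)      # tombstone removes the row
        else:
            table[event.key] = event.value  # later events overwrite earlier ones
    return table


# Illustrative data only.
log = [
    Event("alice", {"age": 30}),
    Event("bob", {"age": 25}),
    Event("alice", {"age": 31}),
    Event("bob", None),
]

current = materialize(log)           # state after the whole log
snapshot = materialize(log, upto=2)  # state as of the first two events
```

Replaying only a prefix of the log yields a point-in-time view. That is the kind of version control a conventional engine keeps inside the black box, and which becomes an explicit, public interface once the log itself is exposed.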

mamcx6y ago

I love PG and RDBMSs in general, but there are certainly SO MANY things that could be improved in RDBMSs.

1- RDBMSs implement an imperfect view of the relational model. It's like saying Java is the only OO language implemented in the world.

2- SQL is the standard... and it is not practical to say we should eliminate it... BUT RDBMSs are constrained by the subpar language that SQL is. Modern additions are nice... but that is like saying PostScript is a nice way to build apps.

Also, a much better language could expand the role of RDBMS.

This is not weird. I lived this way when working in FoxPro:

- UI on fox lang.
- DB on fox lang.
- Reports on fox lang.
- Scripting on fox lang.
- Web on fox lang.
- Triggers on fox lang.
- OO on fox lang.

And all that still with clean separation of logic and components.

3- RDBMSs were made with a certain mindset and use case of the 80s. Still so good that they remain valid today, but they could be extended. JSON support is just a tiny example of that.

4- Why can't you do "SELECT .. FROM index"? Why do you need to create a table to get an index or FTS? It's a weird limitation.

The relational model is fine too for KV stores.

----

"Relational database" means a database built on top of the relational MODEL. You could have MANY IMPLEMENTATIONS of it for different use cases, similar to how the functional MODEL and the OO MODEL are not frozen into a single language... one that is not that good for app development.

With a better lang, you could eliminate massive ORMs in a nice way!

P.S.: Remember, this IS NOT theory. It is proven. This is how it was with the dBase family of langs!

xjoins6y ago

I came here to ask this question. It's a really solid technical article. But apart from the sake of doing it, why would anyone ever want to? I'm not suggesting there's no scenario in which one would want to build a SQL database using Kafka, I'm just asking if anybody knows of one.

wefarrell6y ago

One use case for Kafka is log aggregation. Using SQL to analyze those logs would be one such scenario.
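As a rough illustration of what "SQL over logs" buys you, here is a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for whatever SQL layer would sit on top of the log; the log lines and the `logs` table layout are made up for the example, and no Kafka client is involved.

```python
import sqlite3

# Made-up log lines in "timestamp level message" form.
log_lines = [
    "2020-01-01T00:00:01 ERROR payment timeout",
    "2020-01-01T00:00:02 INFO user login",
    "2020-01-01T00:00:03 ERROR payment declined",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts TEXT, level TEXT, message TEXT)")

# Split each line into exactly three fields; the message keeps its spaces.
rows = [line.split(" ", 2) for line in log_lines]
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)

# An aggregate that is awkward with grep but trivial in SQL.
counts = conn.execute(
    "SELECT level, COUNT(*) FROM logs GROUP BY level ORDER BY level"
).fetchall()
```

In a log-backed setup the `INSERT` step would be replaced by consuming records from the log, while the query side stays the same.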

dominotw6y ago

> SQL database

what is your definition of sql database?

lichtenberger6y ago· 2 in thread

I like the idea of using Apache Kafka as the underlying log. It's in principle what Martin Kleppmann suggests.

As I also want to use a distributed log in the future: Do you know Apache Pulsar or the underlying BookKeeper, which I think was especially made for a distributed WAL?

I might either want to use Apache BookKeeper or Apache Pulsar for a distributed log for scaling my Open Source temporal database, too. Furthermore I'd like to expose the API for streaming changes into the Browser or wherever you want to :-)

eternalban6y ago

I've been itching for a while to build something on top of BookKeeper's DistributedLog. I suggest ditching the Pulsar and just using the DL.

lichtenberger6y ago

I've already been in touch with one of the core committers to BookKeeper and yes, I think I'll use it.

That said, I'm always looking for users and contributors to https://sirix.io/ or https://github.com/sirixdb/sirix. That would be super awesome, but I'm sure you have your own ideas already for using BookKeeper.

I'm currently not sure whether I first want to build a frontend (I'm a backend engineer, but would like to learn some TypeScript along with Vue.js and D3.js) to interact with SirixDB and to build interactive visualizations that compare revisions of JSON or XML resources in SirixDB (stored in a binary format, of course, highly optimized for space-efficient snapshots).

I think as I'm lacking users it might be more useful, but for sure I'm at least as eager to put forth the idea of scalable SirixDB databases :-)

So, I'd like to use BookKeeper most probably (single writer, read your own write consistency, using synchronous -- for a quorum -- and asynchronous replication for the rest, exactly once semantics...). The thing I don't like is that we also need ZooKeeper, but yeah.

BTW: Why do you think BookKeeper is better than Kafka for this purpose? :-)

tener6y ago· 2 in thread

Interesting article, but I think it is missing a solid comparison of such "overlay SQL" with a more traditional database engine.

It isn't enough to show something can be done. One must also think about whether it should be done.

swagonomixxx6y ago

The author is doing this purely for fun (I think, I'm not the author). They link to this: https://www.confluent.io/product/ksql/ which is apparently a production tested system that provides a SQL interface to a Kafka backend.

dominotw6y ago

> provides a SQL interface to a Kafka backend.

Not sure if that's an accurate description of ksql.

pram6y ago· 1 in thread

From personal experience, considering the reliability of Kafka, I’d implement this in reverse. Most relational databases are far more resilient in operation.

One idea I’ve had in the past is to implement a Kafka and Zookeeper layer for FoundationDB.

EdwardDiego6y ago

Interesting, what were the failure modes you encountered?

outworlder6y ago

I am amazed by the number of negative comments this article is getting.. on Hacker News of all places.

I don't see the article claiming people should drop PostgreSQL and use this, it's just asking people to give it a try if they are interested.

Much more interesting would be to discuss what capabilities would be different if this were a mature product.

For instance, this sounds interesting:

> One advantage of using Kafka is that multiple servers can all “tail” the same set of topics. This allows multiple KarelDB servers to run as a cluster, with no single-point of failure. In this case, one of the servers will be elected as the leader while the others will be followers (or replicas). When a follower receives a JDBC request, it will use the Avatica JDBC driver to forward the JDBC request to the leader. If the leader fails, one of the followers will be elected as a new leader.
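The quoted behaviour can be sketched in a few lines of Python. This is a toy model under stated assumptions, not KarelDB's implementation: `Server` and `Cluster` are hypothetical classes, the "election" simply picks the first live server in list order, and forwarding is a plain method call rather than a JDBC request sent through Avatica.

```python
class Server:
    """A node that executes requests if it is the leader, otherwise forwards them."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.alive = True

    def handle(self, request):
        leader = self.cluster.leader()
        if leader is self:
            return f"{self.name} executed {request}"
        # Follower path: hand the request to the current leader.
        return leader.handle(request)


class Cluster:
    def __init__(self, names):
        self.servers = [Server(name, self) for name in names]

    def leader(self):
        # Toy election rule (an assumption): the first live server wins.
        for server in self.servers:
            if server.alive:
                return server
        raise RuntimeError("no live servers")


cluster = Cluster(["a", "b", "c"])
follower = cluster.servers[2]

before = follower.handle("SELECT 1")  # forwarded to leader "a"
cluster.servers[0].alive = False      # the leader fails
after = follower.handle("SELECT 1")   # forwarded to new leader "b"
```

The client keeps talking to the same follower across the failover. Only the forwarding target changes, which is the "no single point of failure" property the article describes.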
