Work is coming along on this project to build a distributed SQL database from scratch, mostly as a reference for newcomers to get an idea about the inner workings.
Looking for contributors who are interested in anything from parsers, disk paging, building out a REPL, defining an IR grammar, implementing consensus and more!
Anyone interested in contributing in these areas is more than welcome!
* if you're writing a distributed database, a novel and valuable feature would be to consider the network partition case foremost. For example, design the database from the standpoint of a node being down for a month.
* do adequate logging so that an operator can understand what is going right or wrong
* how can terminated nodes be rebuilt automatically and efficiently?
* all configuration settings should be dynamic
Source: experienced DBA, worked with Cassandra, InfluxDB and most SQL RDBMSs
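To make the "dynamic configuration" and "adequate logging" points concrete, here is a minimal sketch of a settings store that can be changed at runtime without a restart and that logs every change for the operator. Everything in it (`DynamicConfig`, `subscribe`, the `log_level` key) is a hypothetical name for illustration, not something taken from the project.

```python
import logging
import threading

logger = logging.getLogger("dynamic_config")


class DynamicConfig:
    """Hypothetical runtime-updatable settings store (illustration only)."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._values = dict(initial)
        self._listeners = []

    def get(self, key):
        with self._lock:
            return self._values[key]

    def subscribe(self, listener):
        """Register a callback invoked as listener(key, old, new) on each change."""
        with self._lock:
            self._listeners.append(listener)

    def set(self, key, value):
        # Update under the lock, but call listeners outside it so a slow
        # listener cannot block readers.
        with self._lock:
            old = self._values.get(key)
            self._values[key] = value
            listeners = list(self._listeners)
        # Log the change so an operator can see what was altered and when.
        logger.info("config %s changed: %r -> %r", key, old, value)
        for listener in listeners:
            listener(key, old, value)


config = DynamicConfig({"log_level": "info"})
changes = []
config.subscribe(lambda key, old, new: changes.append((key, old, new)))
config.set("log_level", "debug")
```

Components that care about a setting subscribe once and react to changes, instead of reading a file at startup and never again.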
Is this really a requirement for all distributed databases, or only for decentralized ones? If I'm using a distributed database and I control all the nodes, and one of them goes down, I don't plan on spinning it back up (assuming the data is replicated across other nodes). I guess what I'm saying is that if I need 20 nodes, I'm going to make sure I always have 20 nodes running.
It uses Raft, so this should be handled by nature. If you are referring to writes to said node (not that Raft would allow it), you are delving into CAP theorem and what you are suggesting is impossible (unless you don't care about consistency).
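As a rough illustration of why a Raft-style cluster tolerates a missing node but refuses writes on the minority side of a partition, here is the majority-quorum rule on its own. This is only the counting rule, not Raft itself (no terms, logs, or elections), and the function names are made up for the example.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def can_commit(cluster_size: int, reachable_nodes: int) -> bool:
    """True if a leader that can reach `reachable_nodes` members
    (counting itself) has enough acknowledgements to commit a write."""
    return reachable_nodes >= majority(cluster_size)


# A 5-node cluster split 3/2: only the 3-node side can accept writes.
majority_side = can_commit(5, 3)
minority_side = can_commit(5, 2)

# A 20-node cluster with one node down for a month keeps committing.
one_node_down = can_commit(20, 19)
```

The node that was away cannot have accepted conflicting writes on its own, so when it returns it only has to catch up from the leader.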
You may or may not be aware, but Andy Pavlo records his courses and puts them on YouTube. His latest playlist covers the main database topics with the last five or so lectures covering distributed databases: https://www.youtube.com/playlist?list=PLSE8ODhjZXjbohkNBWQs_...
edit: ^suggesting Pavlo's work since he introduces database concepts very well, so it may be worth structuring the reference architecture in the same way.
I feel strongly about spreading the skills required to build non-trivial products. I hope to enable others to deliver similar technical projects.
I guess this is important information to put in the README, along with whether you have any plans to create a persistent one and how you will approach it (for example, whether you will try to replicate the SQLite B-tree here).
Also, I guess the easiest way to get a persistent B-tree is to take the one from that neat key-value store in Go that mimics the LMDB B-tree (I don't recall the name right now).
The algorithm behind the LMDB kind doesn't need journal files, so it will be reliable AND fast (is it inspired by Rodeh's B-trees?). I just don't know whether its paging is as good as what SQLite uses.
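For anyone wondering why the copy-on-write approach can skip the journal, here is a small sketch of the idea: an update never modifies existing nodes, it copies the path from the root down to the change and then publishes the new root in a single step. To keep it short this uses an in-memory binary search tree rather than a paged B-tree, so it shows the copy-on-write principle only, not LMDB's actual layout; `Node`, `Store`, `insert` and `lookup` are names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node; frozen so an existing version can never change."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: int, value: str) -> Node:
    """Return a new tree containing key, copying only the path to it."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)  # overwrite existing key


def lookup(node: Optional[Node], key: int) -> Optional[str]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


class Store:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def put(self, key: int, value: str) -> None:
        new_root = insert(self.root, key, value)  # old version stays intact
        self.root = new_root  # the "commit" is one reference swap


store = Store()
for k, v in [(2, "b"), (1, "a"), (3, "c")]:
    store.put(k, v)
snapshot = store.root   # a reader holding the old root
store.put(1, "z")       # writer creates a new version
```

If the process dies before the root swap, the old tree is still complete and valid, which is the role a journal would otherwise play. On disk the swap would correspond to atomically updating a root pointer after the new pages are written.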