I’m now completely lost as to why you believe this was a good idea over using something like MySQL/Postgres/Aurora. As I see it, you’ve added complexity in three different dimensions (novel DB API, novel infra/maintenance, and novel oncall/incident response) with minimal gain in availability and no gain in performance. What am I missing?
(FWIW, I worked on Bigtable/Megastore/Spanner/Firestore in a previous job. I’m pretty familiar with what goes into consensus, although it’s been a few years since I’ve had to debug Paxos.)
> I was trying to drive home the point that you don't need a massively distributed system to make a useful startup. I think some founders go the opposite direction and try to build something that scales to a billion users before they even get their first user.
This reads to me as exactly the opposite: overengineering for a problem that you don’t have.
For exactly the reasons you describe, I would argue the burden of proof is on you to demonstrate why Redis, MySQL, Postgres, SQLite, and other comparable options are insufficient for your use case.
To offer you an example: let’s say your Big Customer decides “hey, let’s split our repo into N micro repos!” and they now want you to create N copies of their instance so they can split things up. As implemented, you’ll now need to implement a ton of custom logic for the necessary data transforms. With Postgres, there’s a really good chance you could do all of that by manipulating the backups with a few lines of SQL.