You have no idea what you're talking about.
You can easily batch writes into neat transactions: make a queue, have your POST handler put (row, callback) onto the queue, and await the callback. Have a queue reader grab a chunk, push it to the db in one batch, and execute all of the callbacks on commit. Each callback returns a 200 to its client, or a 500 if the commit failed. This can all happen fast enough to feel "real time" (it's bounded by however often your queue worker flushes batches).
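A minimal sketch of this queue-and-callback pattern in plain Scala (no ZIO), using only `java.util.concurrent`; the names (`Pending`, `handlePost`, `flushOnce`) and the 8k chunk size are illustrative, and `commitBatch` stands in for the real transactional db write:

```scala
import java.util.concurrent.{ArrayBlockingQueue, CountDownLatch}
import scala.jdk.CollectionConverters.*

// One pending write: the row plus a callback fired after the batch commits.
final case class Pending(row: String, callback: Boolean => Unit)

val queue = new ArrayBlockingQueue[Pending](65536)

// What the POST handler does: enqueue and block until the batch commits,
// then answer 200 or 500 depending on the commit result.
def handlePost(row: String): Int = {
  var status = 0
  val done = new CountDownLatch(1)
  queue.put(Pending(row, ok => { status = if (ok) 200 else 500; done.countDown() }))
  done.await()
  status
}

// One pass of the queue reader: take a chunk, commit it in one transaction
// (commitBatch is a stand-in for the real db call), then run the callbacks.
def flushOnce(commitBatch: Seq[String] => Boolean): Unit = {
  val buf = new java.util.ArrayList[Pending]()
  buf.add(queue.take())       // wait for at least one item
  queue.drainTo(buf, 8191)    // then grab up to 8k total without waiting
  val items = buf.asScala.toSeq
  val ok = commitBatch(items.map(_.row))
  items.foreach(_.callback(ok))
}
```

In a real server `flushOnce` runs in a loop on a background thread; each HTTP worker thread blocks only for the few milliseconds its batch takes to commit.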
You can do all of this in a couple dozen lines of code with something like Scala/ZIO.
Computationally, it is totally doable. The biggest constraint is the cost of storage.
When I said that you can't batch, I meant that each tweet will be at least one transaction. Async writes with batching like you suggest would make for a horrible user experience. Besides, your client is often a mobile device or browser: how do you deliver a callback from the server there?
Nah, how many inserts your laptop can do per second and how many tweets are created per unit of time are two unrelated metrics.
You don't. The callback completes on the server before it ever responds to the request; the client just sees a synchronous response with a few extra milliseconds of delay.
Here, I slapped this together to demonstrate the technique[0]. Your HTTP route handler becomes:
override def create(body: String): Task[Long] =
  for
    rspP <- zio.Promise.make[Nothing, Long]               // promise for the response
    _    <- createQueue.offer(InsertRequest(body, rspP))  // enqueue the work
    rsp  <- rspP.await                                    // block until the batch commits
  yield rsp
i.e. make a promise, put your work on a queue, and have the HTTP response be the result of the promise. Then you have a background worker:

_ <- ZStream.fromQueue(createQueue)
       .groupedWithin(8192, 10.milliseconds)
       .run(ZSink.foreach(repo.createChunk))
       .forkDaemon
This processes up to 8k requests at a time, waiting up to 10 ms to fill a batch. The worker does a bulk db insert and completes the promises with the generated ids.
Similar techniques should work for read batching, but I haven't tried that. You can also speed the writes up some more with the COPY protocol, but IIRC you then need to be more careful about escaping/SQL injection. The example I wrote uses prepared statements/parameter binding.
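For the bulk insert itself, one approach is a single multi-row prepared statement, so parameter binding still handles escaping. A sketch of the statement builder (the table and column names are made up, and RETURNING is Postgres-specific):

```scala
// Build one parameterized multi-row INSERT for a chunk of n rows.
// Values stay as ? placeholders; with JDBC you'd bind each tweet body
// and read the generated ids back to complete the promises.
def bulkInsertSql(n: Int): String = {
  require(n > 0, "need at least one row")
  val placeholders = Seq.fill(n)("(?)").mkString(", ")
  s"INSERT INTO tweet (body) VALUES $placeholders RETURNING id"
}
```

Preparing one statement per distinct chunk size keeps the statement cache small; COPY avoids the per-statement parsing overhead entirely but, as noted, puts escaping back in your hands.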
On my 6 year old mid-range desktop (this CPU[1] and this disk[2]) this program can process ~30k `create`s per second. For about $1500, I could buy a new computer with a Ryzen 9 7950 with 4x the core count/8x the thread count and 2x the single-threaded performance, so around ~10x more processing power, 128 GB of RAM, and a Samsung 980 Pro SSD, which can do 1M Write IOPS (25x more than my SSD) or 5GB/s sequential writes (10x more). So a $1500 computer with a single disk should be able to do around 300k/s. PCIe gen 5 is now coming out, which will allow for another doubling of disk performance.
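Spelling out the arithmetic behind that estimate (the 10x factor is my rough guess above, not a benchmark):

```scala
val measuredCreatesPerSec = 30000   // observed on the i5-6600K/SATA-SSD box
val cpuFactor = 10                  // ~4x core count x ~2x single-thread, rounded
val estimatedCreatesPerSec = measuredCreatesPerSec * cpuFactor
// roughly 300k creates/second on the hypothetical $1500 build
```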
128 GB of RAM means you can keep at least 100M rows' worth of index in memory (that budgets over 1 KB of index per row, which is generous). It's not that expensive (under $10k) to build a server with 1 TB of RAM.
Totally feasible for a hobbyist to do without tons of tricky optimization (the code I posted is purely functional Scala!); people spend $20k on a jetski or $80k on a truck. Like I said, the most expensive part is going to be the storage, but you could do something like only store the most recent 1000 tweets per person, and charge $10 to bump that up to the most recent 10 million tweets or something. You'd come out at a substantial profit with that model if you got a few thousand takers. Similarly you could charge to let someone follow more than a few thousand people so you could pay for a read replica or two.
[0] https://github.com/ndriscoll/twit/commit/19b245677b978b42a6f...
[1] https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-6600K...
[2] https://www.disctech.com/SanDisk-SDSSDHP-256G-256GB-SATA-SSD