If your application language/framework allows it, you can do the batching there. For example, have each request handler put its work onto an in-memory queue. A separate thread or async worker then pulls batches off the queue, does the db work for the whole batch at once, and hands each result back to the handler that is waiting on it. In an HTTP context this is all still synchronous from the client's perspective, and you can get 2-10x throughput at a cost of roughly 2 ms of extra latency under load.
I gave more detail with a toy example here:
https://news.ycombinator.com/item?id=39245416
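For a rough idea of the shape without following the link, here is a minimal thread-based sketch in Python. It is not the toy example from the linked comment. The names `handle_request`, `insert_many`, `MAX_BATCH` and `MAX_WAIT_S` are made up for illustration, `insert_many` is a stand-in for one real batched database call, and the short fill window is just one assumed way to collect a batch.

```python
import queue
import threading
import time
from concurrent.futures import Future

MAX_BATCH = 100      # assumed cap on items per database round trip
MAX_WAIT_S = 0.002   # assumed ~2 ms window to let a batch fill up

work_queue = queue.Queue()  # holds (value, Future) pairs
batch_sizes = []            # recorded only so the batching is observable


def insert_many(values):
    """Stand-in for ONE batched db call (e.g. a multi-row INSERT)."""
    batch_sizes.append(len(values))
    return [v * 10 for v in values]  # pretend these are generated ids


def worker():
    while True:
        batch = [work_queue.get()]  # block until there is at least one item
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(work_queue.get(timeout=remaining))
            except queue.Empty:
                break
        try:
            results = insert_many([value for value, _ in batch])
            for (_, fut), result in zip(batch, results):
                fut.set_result(result)  # wakes the handler waiting on this item
        except Exception as exc:
            for _, fut in batch:
                fut.set_exception(exc)  # fail the whole batch together


def handle_request(value):
    """What a single request handler would call; blocks until its result is ready."""
    fut = Future()
    work_queue.put((value, fut))
    return fut.result()


threading.Thread(target=worker, daemon=True).start()
```

Each caller of `handle_request` still sees an ordinary blocking call; only the worker knows the work was grouped. How many items end up in each batch depends on timing and load.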
I've since played around with this a bit more, and you can do it pretty generically: at the very least you can make the worker generic, so that you give it a function `Chunk[A] => Task[Chunk[Result[B]]]` that does the database logic. I don't have that code handy to post right now, but you're probably not using Scala anyway, so the details aren't that relevant.
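As a rough illustration of what a generic worker could look like, here is a Python asyncio sketch. It is not the Scala code mentioned above. `Batcher`, `submit`, `run`, `run_batch` and `max_batch` are hypothetical names; `run_batch` plays the role of the `Chunk[A] => Task[Chunk[Result[B]]]` function, with a list standing in for `Chunk` and "a value or an Exception instance" standing in for `Result[B]`. This version takes whatever is already queued instead of waiting for a batch to fill, which is just one possible choice.

```python
import asyncio
from typing import Awaitable, Callable, Generic, List, TypeVar, Union

A = TypeVar("A")
B = TypeVar("B")


class Batcher(Generic[A, B]):
    """Hypothetical generic batching worker; the db logic lives in run_batch."""

    def __init__(
        self,
        run_batch: Callable[[List[A]], Awaitable[List[Union[B, Exception]]]],
        max_batch: int = 100,  # assumed default, tune for your database
    ) -> None:
        self._run_batch = run_batch
        self._max_batch = max_batch
        self._queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, item: A) -> B:
        """Called by a request handler; resolves once the item's batch is done."""
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut

    async def run(self) -> None:
        """Worker loop: take at least one item, then whatever else is queued."""
        while True:
            batch = [await self._queue.get()]
            while len(batch) < self._max_batch and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            try:
                outcomes = await self._run_batch([item for item, _ in batch])
                if len(outcomes) != len(batch):
                    raise RuntimeError("run_batch must return one outcome per item")
            except Exception as exc:
                outcomes = [exc] * len(batch)  # whole batch failed
            for (_, fut), outcome in zip(batch, outcomes):
                if fut.done():  # e.g. the caller was cancelled while waiting
                    continue
                if isinstance(outcome, Exception):
                    fut.set_exception(outcome)  # per-item failure
                else:
                    fut.set_result(outcome)


async def demo():
    # Hypothetical batch function: one "query" for the batch, one outcome per item.
    async def lookup(ids):
        return [KeyError(i) if i < 0 else f"row-{i}" for i in ids]

    batcher = Batcher(lookup, max_batch=16)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(
        *(batcher.submit(i) for i in (1, -2, 3)), return_exceptions=True
    )
    worker.cancel()
    return results
```

Returning one outcome per item lets a single bad row fail only its own request while the rest of the batch still succeeds; an exception raised by `run_batch` itself fails every request in that batch.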
I've tried a similar thing in Rust, and it's a lot more finicky but still doable there. It should be similar in Go, I'd think.