I would confidently state that most large-scale applications have run into the situation where they scale up their application worker fleet and then crash their database with too many connections. Coordinating the size of the worker connection pool in a world of elastic application workers is enough of a task that deploying a proxy is really the simplest and best solution.
Unlike Postgres, Oracle and SQL Server can support thousands of connections, but they see performance degradation past a certain point. So I have never seen them crash from too many connections (although they definitely get slower).
I pool ~10,000 connections down to under 500. I can't do application-level pooling because the application is just thousands of individual processes; you often see this with Python, PHP, or Node.js applications.
500 open connections is far less overhead on the Postgres server than 10,000. I'm very happy to "pass the buck" of connection pooling to separate machines running pgBouncer.
Additionally, the "transaction" mode of pgBouncer can increase connection utilization further: a server connection is released between transactions, so it can serve a different client while the first one holds its client-side connection but is busy doing something else (e.g. waiting on a call to an external service).
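For illustration, a minimal pgbouncer.ini along those lines (the host, database name, and pool sizes here are made up, just to show the shape of the config):

```ini
[databases]
; clients connect to pgBouncer as if it were the database itself
mydb = host=10.0.0.5 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; transaction mode: a server connection is borrowed only for the
; duration of a transaction, then handed to the next waiting client
pool_mode = transaction

; accept thousands of client connections...
max_client_conn = 10000
; ...while opening at most this many real connections per database/user pair
default_pool_size = 100
```

One caveat worth knowing: transaction mode breaks session-level features (prepared statements, session-level `SET`, advisory locks held across transactions), so it isn't a drop-in for every application.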
So the question is: how many applications have enough CPU load for 50 pods but still cannot saturate a single database, so that sharding isn't on the table yet? My gut feeling is that they are few, more like exceptions.
From my personal experience, pgBouncer is used a lot when the framework/library does not support application-level pooling in the first place, and not so much when it does.
If you have a single "real" connection pool, the idle pool is effectively shared among all application instances. You can have 100 active connections and 10 (or maybe 20) idle.
I have run into this problem, but I solved it with careful tuning of the automatic instance-scaling parameters, connection pool size, and timeouts. But this only gets you so far; a single real connection pool would be more effective (at the cost of added complexity).
But yeah, for less distributed applications, just have N worker threads and don't close the DB connection after each job.
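A minimal sketch of that pattern in Python, with `sqlite3` standing in for a real database driver (the worker count and queries are purely illustrative): each worker thread opens one connection at startup and reuses it for every job, instead of opening and closing a connection per job.

```python
import queue
import sqlite3
import threading

NUM_WORKERS = 4          # N long-lived worker threads
jobs: queue.Queue = queue.Queue()
results: list[int] = []
lock = threading.Lock()

def worker() -> None:
    # One connection per worker, opened once and kept for the thread's
    # whole lifetime -- NOT reopened for each job.
    conn = sqlite3.connect(":memory:")
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut this worker down
            break
        (doubled,) = conn.execute("SELECT ? * 2", (job,)).fetchone()
        with lock:
            results.append(doubled)
    conn.close()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for job in range(10):
    jobs.put(job)
for _ in threads:
    jobs.put(None)               # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

With this shape the number of database connections is exactly the number of workers, which you control directly, and there's no per-job connection churn.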
That's a big assumption. If your app is just an HTTP wrapper for SQL queries against a poorly-optimized database, sure.
But there are plenty of applications that spend the majority of their time doing other stuff (HTTP requests, FFI calls, GPU/CPU-intensive processing, etc.) where the database interaction is a small part of the overall performance profile - certainly not the bottleneck, even with hundreds of concurrent clients.
In those cases, rather than artificially throttling concurrency to stay within your global max_connections, you would ideally run as many workers as you can (i.e. fully utilize your infrastructure) and grab database connections from an external pool only when you need them.
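A rough sketch of that check-out/check-in pattern, assuming a hypothetical homegrown pool class and using `sqlite3` in place of a real driver (a production setup would use pgBouncer or a driver's built-in pool instead): twenty concurrent workers share only three real connections, each borrowed just for the query itself.

```python
import queue
import sqlite3
import threading

class ConnectionPool:
    """Minimal bounded pool: many workers, few real connections."""
    def __init__(self, size: int) -> None:
        self._free: queue.Queue = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets any worker borrow the handle;
            # the pool ensures only one thread uses it at a time
            self._free.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self) -> sqlite3.Connection:
        return self._free.get()      # blocks while all connections are busy

    def release(self, conn: sqlite3.Connection) -> None:
        self._free.put(conn)

pool = ConnectionPool(size=3)        # far fewer connections than workers
results: list[int] = []
lock = threading.Lock()

def handle_request(n: int) -> None:
    # ... non-database work (HTTP calls, CPU-heavy processing) happens
    # here, WITHOUT holding a database connection ...
    conn = pool.acquire()            # borrow only around the query itself
    try:
        (value,) = conn.execute("SELECT ? + 1", (n,)).fetchone()
    finally:
        pool.release(conn)
    with lock:
        results.append(value)

threads = [threading.Thread(target=handle_request, args=(i,))
           for i in range(20)]      # 20 concurrent "requests", 3 connections
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [1, 2, ..., 20]
```

The key point is that concurrency (threads) and database connections are sized independently: the pool size caps load on the database while the rest of the work runs as wide as the hardware allows.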