Btw, this is an interesting take on "serverless" terminology too: most other providers would call this "managed PostgreSQL".
Yet this exact feature might enable something akin to "serverless Postgres": quick spin-up and tear-down, allowing for short-lived instances that respond to load (the scaling settings should be laid out differently too: the current scaling configuration is too "serverful"). Whatever Neon had before, though, could not be called "serverless".
Any choice in datastore performance is made up front by choosing how spread out the data is: basically, maximum throughput is the sum of the maximum throughput of each storage rack hosting the data. To simplify, if we had a datastore replicating data to 10 instances with peak IOPS of 100k each, we are looking at max 1M IOPS: the fact that we can cheaply scale our CPU instances for Postgres infinitely won't change the upper bound for IO (unless we scale the storage too, which is recognized as the slow part and is what their implementation avoids).
Network can play a similar role.
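To put numbers on the IO ceiling, here is a rough Python sketch. The function names are mine and purely illustrative, and it assumes throughput simply adds up across storage instances, which is the simplification I'm making above, not a description of how Neon's storage actually behaves.

```python
def max_storage_iops(instances: int, iops_per_instance: int) -> int:
    """Upper bound on aggregate IOPS, assuming throughput adds up across instances."""
    return instances * iops_per_instance


def achievable_iops(compute_demand_iops: int, instances: int, iops_per_instance: int) -> int:
    """IO that can actually be served: compute demand, capped by the storage ceiling."""
    return min(compute_demand_iops, max_storage_iops(instances, iops_per_instance))


# The example from the comment: 10 instances at 100k IOPS each.
ceiling = max_storage_iops(10, 100_000)

# Adding compute so that demand reaches 5M IOPS does not move the ceiling.
served = achievable_iops(5_000_000, 10, 100_000)
```

Here `ceiling` and `served` both come out at 1,000,000: past that point, adding Postgres compute only adds demand that the storage cannot serve.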
And finally, Postgres instances benefit from being long-lived as opposed to serverless (in a more traditional sense): with enough RAM, all the important indexes can stay permanently in the cache, thus improving performance by orders of magnitude compared to cold Postgres instances that need to read indexes off disk.
In theory, one could pre-warm memory caches for Postgres read replicas (akin to suspend/hibernate), but depending on the memory size needing to be read from slow storage, it might not really beat simply reloading the indexes.
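A back-of-the-envelope way to compare the two options, as a Python sketch. It assumes both are limited by the same sequential read bandwidth from storage and ignores everything else (random IO, decompression, network), and the sizes and bandwidth are made-up illustrative values, not measurements.

```python
def load_seconds(bytes_to_read: int, read_bytes_per_second: int) -> float:
    """Time to pull a given amount of data from storage at a fixed bandwidth."""
    return bytes_to_read / read_bytes_per_second


GIB = 2**30

# Hypothetical replica: 64 GiB memory image, 40 GiB of hot indexes, 1 GiB/s reads.
restore_image = load_seconds(64 * GIB, 1 * GIB)    # pre-warm from a saved memory image
reload_indexes = load_seconds(40 * GIB, 1 * GIB)   # cold start, read only the indexes
```

Under these assumptions, restoring the memory image only wins if it is smaller than the set of indexes the replica would otherwise have to read anyway.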
Obviously, my concerns are related to large databases, but those are the ones usually needing scale out.
> Replicas then update cache pages in the shared buffers. This ensures eventual consistency for read replicas within the same region as your database.
> Data Consistency: Reading data from a single source ensures data consistency. This addresses a common challenge in traditional read replicas where there might be a replication lag.
> Resource Customization: Neon allows you to allocate different CPU and memory resources for each replica.
So there is a single source of data, but replicas can have different CPU and memory, and there is only eventual consistency within the same region?