Same-Region Read Replicas to Serverless Postgres

(neon.tech)

3 points · throw14082020 · 2y ago · 6 comments

6 comments · 4 top-level

necovek · 2y ago · 1 in thread

One generally introduces read replicas to overcome I/O and network bottlenecks: how does relying on a single datastore perform in circumstances where having multiple colocated copies of the data would speed things up?

Btw, this is an interesting take on "serverless" terminology too: most other providers would call this "managed PostgreSQL".

Yet this exact feature might enable something akin to "serverless Postgres": quick spin-up and tear-down, allowing for short-lived instances that respond to load (the scaling settings should be laid out differently too, since the current scaling configuration is too "serverful"), but whatever Neon had before could not be "serverless".

necovek · 2y ago

Let me expand on that first point.

Any choice in datastore performance is made up front by choosing how spread out the data is: basically, maximum throughput is the sum of the maximum throughput of each storage rack hosting the data. To simplify, if we had a datastore replicating data to 10 instances with a peak of 100k IOPS each, we are looking at a max of 1M IOPS: the fact that we can cheaply scale our CPU instances for Postgres infinitely won't change the upper bound for I/O (without scaling the storage too, which is recognized as the slow part, and which their implementation avoids).
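To make the arithmetic concrete, here is a minimal sketch of that upper bound. The function name, its parameters, and the per-replica demand figure are hypothetical, chosen only for illustration; the 10 nodes and 100k IOPS are the numbers from the example above, and the model ignores caching and network limits.

```python
def max_read_iops(storage_nodes: int, iops_per_node: int,
                  compute_replicas: int, iops_demand_per_replica: int) -> int:
    """Achievable read IOPS under a simple additive model (an assumption).

    Storage capacity is the sum of each node's peak IOPS; adding compute
    replicas raises demand but cannot raise the storage ceiling.
    """
    storage_ceiling = storage_nodes * iops_per_node
    total_demand = compute_replicas * iops_demand_per_replica
    return min(storage_ceiling, total_demand)


# 10 storage nodes at 100k IOPS each cap the system at 1M IOPS.
print(max_read_iops(10, 100_000, 5, 50_000))    # demand below the ceiling
print(max_read_iops(10, 100_000, 50, 50_000))   # demand above the ceiling
```

With 5 replicas the demand (250k) is served in full; with 50 replicas the result stays pinned at the 1M storage ceiling, which is the point about compute scaling not lifting the I/O bound.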

Network can play a similar role.

And finally, Postgres instances benefit from being long-lived as opposed to serverless (in a more traditional sense): with enough RAM, all the important indexes can stay permanently in the cache thus improving performance by orders of magnitude compared to cold Postgres instances needing to read indexes off disk.

In theory, one could pre-warm memory caches for Postgres read replicas (akin to suspend/hibernate), but depending on the memory size needing to be read from slow storage, it might not really beat simply reloading the indexes.
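As a rough back-of-the-envelope check on that trade-off, the sketch below estimates how long a cold replica needs to pull its hot set from storage. The function name and the 64 GiB / 512 MiB/s figures are made up for illustration, and the model assumes a purely sequential read at a constant rate, which real index loading will not match.

```python
def warmup_seconds(hot_set_gib: float, read_throughput_mib_s: float) -> float:
    """Seconds to read a hot set of the given size at a constant throughput.

    Simplifying assumption: one sequential stream, no contention, no
    partial cache hits.
    """
    if read_throughput_mib_s <= 0:
        raise ValueError("read throughput must be positive")
    return hot_set_gib * 1024 / read_throughput_mib_s


# Hypothetical numbers: 64 GiB of indexes and hot pages at 512 MiB/s.
print(warmup_seconds(64, 512))  # 128.0
```

Under these assumed numbers a new replica spends about two minutes warming up, whichever way the bytes get into memory, so the size of the hot set dominates.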

Obviously, my concerns are related to large databases, but those are the ones usually needing scale out.

throw14082020 (OP) · 2y ago · 1 in thread

From reading it, I'm not really sure if it's eventually consistent or not.

> Replicas then update cache pages in the shared buffers. This ensures eventual consistency for read replicas within the same region as your database.

> Data Consistency: Reading data from a single source ensures data consistency. This addresses a common challenge in traditional read replicas where there might be a replication lag.

> Resource Customization: Neon allows you to allocate different CPU and memory resources for each replica.

So there is a single source of data, but replicas can have different CPU and memory, and they are eventually consistent within the same region?

nikita · 2y ago

Replicas are eventually consistent. Consider a replica a few ms behind.

nikita · 2y ago

Thank you for submitting (CEO of Neon here). Happy to answer questions on our design decisions.

nikita · 2y ago

(CEO of Neon). There is a TL;DR of this blog post: https://twitter.com/nikitabase/status/1680636526823370752
