How a single PostgreSQL config change improved slow query performance by 50x

(amplitude.engineering)

218 points · ryanashcraft · 8y ago · 45 comments

39 comments · 9 top-level

danjoc · 8y ago · 10 in thread

Thanks for the tip. This probably affects more people than realize it. It would be really interesting if PG would use machine learning to discover this sort of tuning on its own.

dmichulke · 8y ago

Your suggestion sounds very much like that of a Silicon Valley start-up guy ("Machine Learning to the rescue!") but I believe here a deterministic detection of the underlying drive, possibly tied to a warning on start-up, would be easier to code, less bug-prone, and faster.

OTOH, if you speak about the general problem of optimizing configuration, then I distinctly recall having read something about automatic server configuration, IMO on AWS, but quite possibly only as a feature request.

viraptor · 8y ago

I suspect it was the paper about OtterTune: http://ottertune.cs.cmu.edu (that was InnoDB, not pg though)

matt4077 · 8y ago

There was an article and software posted here about automatic tuning of pg with machine learning. Sorry for not searching; I'm on the road.

IIRC it was an academic paper and the process was somewhat byzantine when I tried to recreate it. But the results looked good.

davidcuddeback · 8y ago

PostgreSQL does have built-in support for a genetic algorithm based query optimizer. I haven't tried it yet, so can't comment on how well it works. Docs are here: https://www.postgresql.org/docs/9.6/static/geqo.html.

pgaddict · 8y ago

The idea is that up until some number of relations (geqo_threshold, 12 by default) the join tree is searched exhaustively, then it switches to the genetic algorithm. So it's kinda automatic.
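To get a feel for why an exhaustive search has to stop at some point, here is a rough Python sketch that counts candidate join orders. It only counts left-deep orderings (n! for n relations), which is a simplification and not how PostgreSQL actually enumerates plans; the function name is made up for illustration.

```python
import math


def left_deep_join_orders(num_relations: int) -> int:
    """Count left-deep join orderings for n relations, i.e. n! permutations.

    Illustrative only: a real planner prunes heavily and also considers
    bushy trees, so this is neither an upper nor a lower bound on its work.
    """
    if num_relations < 1:
        raise ValueError("need at least one relation")
    return math.factorial(num_relations)


if __name__ == "__main__":
    # The count grows factorially with the number of joined relations.
    for n in (2, 4, 8, 12, 16):
        print(f"{n:2d} relations -> {left_deep_join_orders(n):,} orderings")
```

The growth is factorial, which is the usual argument for falling back to a heuristic search once a query joins many relations.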

quizotic · 8y ago

+1 for the idea of tuning query optimization based on ML. I don't know of any DBMSs that take this advice, and that's remarkable in this day and age.

rev112 · 8y ago

This looks quite relevant: https://aws.amazon.com/blogs/ai/tuning-your-dbms-automatical... ("Tuning Your DBMS Automatically with Machine Learning")

The authors propose a way to do some automatic tuning for MySQL and Postgres.

Link to the paper itself: http://db.cs.cmu.edu/papers/2017/tuning-sigmod2017.pdf

power · 8y ago

Postgres has a query optimizer that uses Genetic Algorithms: https://www.postgresql.org/docs/current/static/geqo.html

Clarification: this is for planning how to execute a query, not for tuning the db settings

jve · 8y ago

SQL Server 2017 introduced two automatic database tuning features: automatic plan correction and automatic index management (the latter for Azure SQL only).

https://docs.microsoft.com/en-us/sql/relational-databases/au...

vthriller · 8y ago

Oracle already promised to bring something similar back in September: https://www.oracle.com/database/autonomous-database/index.ht...

quizotic · 8y ago · 7 in thread

The prescription of setting random_page_cost equal to seq_page_cost is certainly reasonable for SSD, but I wonder if the underlying issues aren't somewhat deeper and more interesting. One difference between a sequential scan and an index scan is the amount of data being scanned. PostgreSQL stores information horizontally as rows, and a sequential scan has to read in all column values of all rows. An index scan reads through the values of a _single_ column. The 50x performance difference _might_ just be that the whole row is 50x wider than the indexed join column.

An interesting second factor relates to the nature of the SSD storage. With SSDs a read request will pull back a 4K page, even if the read request was smaller. So it's not quite right to say that a sequential read and a random read cost the same on SSD, particularly if the same 4K page must be read multiple times. I suspect that the particular index technique used by PostgreSQL tends to organize data such that successive indexed values reside in the same 4K SSD page. IOW, it's not so much that the cost of random SSD access is the same as sequential SSD access (though that's true), as it is that the PostgreSQL index mechanism doesn't require multiple reads of the same 4K page.

If a hash-based index were used instead of a B-tree-based index, and if the table were narrower, the sequential scan might have outperformed the index scan.

yipenghuang · 8y ago

A sequential scan could also be faster if the join selectivity is poor. As an extreme example, suppose every id in event_type appeared in prop_keys for a given app. Scanning the index repeatedly would then be a waste of time, since you would have to read the whole table anyway.

brianwawok · 8y ago

Which is the point of stats, right? To know when it's better to use an index vs just read the whole table because you need 50%+ of it anyway.
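The index-versus-table trade-off can be sketched with a toy cost model in Python. This is a deliberately crude assumption (one random page fetch per matching row, no caching, no CPU costs), not PostgreSQL's real cost formula, and the function names and table sizes are made up.

```python
def seq_scan_cost(table_pages: int, seq_page_cost: float = 1.0) -> float:
    """Toy cost of reading every page of the table in order."""
    return table_pages * seq_page_cost


def index_scan_cost(matching_rows: int, random_page_cost: float = 4.0) -> float:
    """Toy cost of an index scan: pessimistically one random page per row."""
    return matching_rows * random_page_cost


def cheaper_plan(table_pages: int, matching_rows: int,
                 random_page_cost: float = 4.0) -> str:
    """Return which of the two toy plans has the lower estimated cost."""
    index_cost = index_scan_cost(matching_rows, random_page_cost)
    seq_cost = seq_scan_cost(table_pages)
    return "index scan" if index_cost < seq_cost else "seq scan"


if __name__ == "__main__":
    pages = 10_000  # hypothetical table size in pages
    for rows in (1_000, 5_000, 500_000):
        for rpc in (4.0, 1.0):
            plan = cheaper_plan(pages, rows, rpc)
            print(f"rows={rows:>7} random_page_cost={rpc}: {plan}")
```

With these made-up numbers, the 5,000-row case flips from a sequential scan to an index scan when random_page_cost drops from 4 to 1. The row estimate comes from the statistics, so a bad estimate pushes the planner to the wrong side of the crossover.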

cesarb · 8y ago

PostgreSQL uses 8K pages, so it won't read less than that. IIRC, it also has its own cache of recently read pages, so it won't have to read the same page over and over.

pgaddict · 8y ago

You can rebuild it with smaller pages, including 4kB, which may be beneficial for various reasons. The packages however stick to 8kB.

And yes, the database has its own cache (aka shared buffers), on top of the page cache (filesystem cache).

idontgetproton · 8y ago

You're saying that during sequential scans, the time it takes per row is O(n) where n is the number of columns? I find that hard to believe. Can anyone confirm / deny this?

pg314 · 8y ago

The number of columns in PostgreSQL is limited to 250-1600 [1] since a tuple (a row) can't span more than one page of memory. Since O-notation talks about asymptotic behavior, it doesn't really apply here.

But yes, tables with more columns normally take more time to scan sequentially. The complete tuple is always loaded (excluding the data of TOAST [2] attributes), there is no way to only load one column. This is one of the reasons that column-oriented databases can be faster than row-oriented databases [3].

[1] https://www.postgresql.org/message-id/42C3C382.5020108@cinec... [2] https://www.postgresql.org/docs/9.5/static/storage-toast.htm... [3] https://en.wikipedia.org/wiki/Column-oriented_DBMS

yipenghuang · 8y ago

The time complexity of a sequential scan is O(r), because every row must be read once. Of course rows are made up of columns, so you can also specify it as O(rc), where c is the average column length.
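A back-of-the-envelope version of the row-width argument, as a Python sketch. It assumes the default 8 kB page and ignores page and tuple headers, fill factor, and TOAST; the row count and byte widths are hypothetical, not taken from the article.

```python
import math

PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


def pages_to_scan(row_count: int, bytes_per_entry: int,
                  page_size: int = PAGE_SIZE) -> int:
    """Estimate pages needed to hold row_count entries of a given width.

    Rough estimate only: no per-page or per-tuple overhead is modeled.
    """
    entries_per_page = max(1, page_size // bytes_per_entry)
    return math.ceil(row_count / entries_per_page)


if __name__ == "__main__":
    rows = 1_000_000
    heap_pages = pages_to_scan(rows, bytes_per_entry=400)  # wide table rows
    index_pages = pages_to_scan(rows, bytes_per_entry=8)   # narrow index keys
    print(f"table pages: {heap_pages}")
    print(f"index pages: {index_pages}")
    print(f"ratio: {heap_pages / index_pages:.1f}x")
```

With a 400-byte row against an 8-byte key, the table needs roughly 50 times as many pages as the index, which is the kind of gap quizotic speculates about above.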

ahachete · 8y ago · 4 in thread

For those interested in postgresql.conf tuning, I recently gave a presentation at two conferences; hope it's interesting. Slides: https://speakerdeck.com/ongres/postgresql-configuration-for-...

[shameless plug]

aargh_aargh · 8y ago

There's a link to postgresqlco.nf in the description. Do you run the service? It doesn't accept a file being dropped. In fact, it doesn't seem to do anything (I looked at the HTML).

ahachete · 8y ago

It's a WIP. It's coming, stay tuned... (you may subscribe to get notified).

lccarrasco · 8y ago

They look great, thanks for sharing! Do you know if the talks were recorded?

ahachete · 8y ago

The pgconf.eu talk wasn't recorded. The HighLoad++ one was, but I think it hasn't been published yet.

olavgg · 8y ago · 2 in thread

I always increase the seq scan cost, but another thing that helped me more was updating the table statistics. In that way you can make the planner better aware of your indexes.

In my case I increased STATISTICS to 5000 and the planner immediately started using the index instead of a full table scan.

https://blog.pgaddict.com/posts/common-issues-with-planner-s...

pgaddict · 8y ago

Please don't mess with the seq_page_cost. It's considered a reference value, so leave it set to 1.0 and tweak the other values.

mnw21cam · 8y ago

Agreed. The query planner always has to have up-to-date and sufficiently detailed statistics in order to decide the best plan.

pgaddict · 8y ago · 2 in thread

Unfortunately the author does not say some pretty basic things - which PostgreSQL version, how much data, how much of it fits into RAM, what storage (and hardware in general) ...

If I understand it correctly, PostgreSQL was using the default configuration. Which is rather inefficient, and is more about "must start everywhere".

Decreasing random_page_cost makes sense if you have storage that can handle random I/O well (although I wouldn't go to 1 even if it's an SSD). But who knows if the data was read from storage at all? Maybe it'd fit into RAM (and just increasing effective_cache_size would be enough for the planner to realize that).

rpedela · 8y ago

Setting random_page_cost = 1 is pretty common advice for SSD and works well in my experience. Typical advice for HDD RAID or SCSI is random_page_cost = 2 and SSDs are faster than them.

pgaddict · 8y ago

I don't know who recommends random_page_cost=1, but IMNSHO it's a bit silly. Even SSDs handle sequential I/O better than random I/O. Values between 1.5 and 2.0 are more appropriate. I wouldn't really recommend 1.0 except when you know the data fits into RAM. There are other options that affect costs of random I/O, e.g. effective_cache_size.

stubish · 8y ago · 2 in thread

You also get good results tweaking this particular knob if you have large amounts of RAM. If your blocks are almost always in OS cache, you are almost never going to make random seeks even if the PostgreSQL planner thinks you are.

pja · 8y ago

That feels like something the database could do for itself - checking whether a range of mmap()ed blocks are actually in memory or not is a single syscall.

I guess for large indexes the overhead of walking the page tables is going to be large though, so it’s not necessarily going to be a net win.

pgaddict · 8y ago

But you don't know which blocks you'll need at planning time, so you can't really check that.

You could of course check if the total database size is within RAM, but it's much more common to have a database much larger than RAM (say 1TB on a machine with 128GB of RAM) while the actual working set (recent data processed by queries) is much smaller.

emilfihlman · 8y ago · 2 in thread

Why aren't these tuned automatically? Should be pretty easy.

pgaddict · 8y ago

It sounds simple until you actually try doing that. The thing is, reducing the costing to these two parameters is a significantly simplified model of what happens in practice. So you can't just run some I/O benchmark to measure random vs. sequential requests. For example the defaults that worked fine for a long time (seq_page_cost=1 and random_page_cost=4) certainly do not reflect the difference between random and sequential I/O on rotational devices (where the device can easily do 100MB/s in sequential access, but less than 1MB/s in random).

ric2b · 8y ago

> For example the defaults that worked fine for a long time (seq_page_cost=1 and random_page_cost=4) certainly do not reflect the difference between random and sequential I/O on rotational devices (where the device can easily do 100MB/s in sequential access, but less than 1MB/s in random)

The PostgreSQL documentation explains why. They assume HDD random access is 40x slower than seq access but that you'll have a 90% cache hit rate, so random_page_cost=4 reflects 10% of 40x slower.
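The arithmetic behind that default, written out as a small Python sketch. The function name is made up, and treating a cache hit as costing nothing is a simplifying assumption of this sketch.

```python
def modeled_random_page_cost(raw_slowdown: float, cache_hit_rate: float) -> float:
    """Effective random-read cost relative to a sequential read.

    raw_slowdown:   how many times slower an uncached random read is
    cache_hit_rate: fraction of random reads served from cache (0.0 to 1.0)

    Simplification: cached reads are treated as free.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return raw_slowdown * (1.0 - cache_hit_rate)


if __name__ == "__main__":
    # 40x slower on disk, 90% of random reads cached -> roughly 4
    print(round(modeled_random_page_cost(40, 0.90), 2))
```

Under the same model, a higher cache hit rate or a smaller raw slowdown (as on SSDs) pulls the effective value down, which matches the intuition elsewhere in the thread.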

topbanana · 8y ago · 1 in thread

This seems like something that should be measured at startup?

jeltz · 8y ago

Sadly that is not (at least easily) possible, because random_page_cost vs. seq_page_cost is not just about the IO system but about all factors which can affect the cost of reading pages randomly vs. reading pages sequentially, e.g. how often PostgreSQL's tables are in the file cache. So how much RAM your machine has available for PostgreSQL and your access patterns matter too.

Also, I imagine some expensive SAN solutions would be pretty tricky to measure given how smart they try to be with caching and moving data between different kinds of disks.

branko_d · 8y ago

Hmm... The author is filling a combo-box (probably just one column), yet the query is selecting all columns (SELECT *).

I would have tried selecting just the needed column (let's call it "foo"), with the following indexes:

event_types (app, id)

prop_keys (event_id, foo)

This should cover the entire query with indexes (i.e. allow for index-only scan).
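A quick way to see the covering-index idea in action, sketched in Python. Note the loud assumptions: this uses SQLite (via the standard sqlite3 module) as a stand-in, not PostgreSQL, so the planner and its output differ; the join condition, the extra columns "name" and "other", and the index names are all hypothetical.

```python
import sqlite3

# Hypothetical query: only "foo" is selected, as suggested above.
QUERY = """
    SELECT p.foo
    FROM event_types AS e
    JOIN prop_keys AS p ON p.event_id = e.id
    WHERE e.app = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database with the two suggested indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE event_types (id INTEGER, app INTEGER, name TEXT);
        CREATE TABLE prop_keys (event_id INTEGER, foo TEXT, other TEXT);
        CREATE INDEX event_types_app_id ON event_types (app, id);
        CREATE INDEX prop_keys_event_id_foo ON prop_keys (event_id, foo);
    """)
    return conn


def plan_details(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


if __name__ == "__main__":
    conn = build_demo_db()
    for line in plan_details(conn, QUERY, (1,)):
        print(line)
```

If the plan lines mention a covering index, the query is being answered from the indexes alone. In PostgreSQL the equivalent check would be to look for an index-only scan in the EXPLAIN output.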
