OTOH, if you speak about the general problem of optimizing configuration, then I distinctly recall having read something about automatic server configuration, IMO on AWS, but quite possibly only as a feature request.
IIRC it was an academic paper and the process was somewhat byzantine when I tried to recreate it. But the results looked good.
The authors propose a way to do some automatic tuning for MySQL and Postgres.
Link to the paper itself: http://db.cs.cmu.edu/papers/2017/tuning-sigmod2017.pdf
Clarification: this is for planning how to execute a query, not for tuning the db settings
https://docs.microsoft.com/en-us/sql/relational-databases/au...
An interesting second factor relates to the nature of the SSD storage. With SSDs a read request will pull back a 4K page, even if the read request was smaller. So it's not quite right to say that a sequential read and a random read cost the same on SSD, particularly if the same 4K page must be read multiple times. I suspect that the particular index technique used by PostgreSQL tends to organize data such that successive indexed values reside in the same 4K SSD page. IOW, it's not so much that the cost of random SSD access is the same as sequential SSD access (though that's true), as it is that the PostgreSQL index mechanism doesn't require multiple reads of the same 4K page.
if a Hash-based index was used instead of a Btree-based index, and if the table width was narrower, the sequential scan might have outperformed the index scan.
And yes, the database has it's own cache (aka shared buffers), on top of page cache (filesystem cache).
But yes, tables with more columns normally take more time to scan sequentially. The complete tuple is always loaded (excluding the data of TOAST [2] attributes), there is no way to only load one column. This is one of the reasons that column-oriented databases can be faster than row-oriented databases [3].
[1] https://www.postgresql.org/message-id/42C3C382.5020108@cinec... [2] https://www.postgresql.org/docs/9.5/static/storage-toast.htm... [3] https://en.wikipedia.org/wiki/Column-oriented_DBMS
[shamless plug]
In my case I increased STATISTICS to 5000 and the planner immediately start using the index instead of full table scan.
https://blog.pgaddict.com/posts/common-issues-with-planner-s...
If I understand it correctly, PostgreSQL was using the default configuration. Which is rather inefficient, and is more about "must start everywhere".
Decreasing random_page_cost makes sense if you have storage that can handle random I/O well (although I wouldn't go to 1 even if it's an SSD). But who knows if the data was read from storage at all? Maybe it'd fit into RAM (and just increasing effective_cache_size would be enough for the planner to realize that).
I guess for large indexes the overhead of walking the page tables is going to be large though, so it’s not necessarily going to be a net win.
You could of course check if the total database size is within RAM, but it's much more common to have database much larger than RAM (say 1TB on a machine with 128GB of RAM), but the actual working set (recent data processed by queries) is much smaller.
The postresql documentation explains why. They assume HDD random access is 40x slower than seq access but that you'll have a 90% cache hit rate, so random_page_cost=4 reflects 10% of 40x slower.
Also I imagine the some expensive SAN solutions would be pretty tricky to measure given how smart they try to be with caching and moving between different kinds of disks.
I would have tried selecting just the needed column (let's call it "foo"), with following indexes:
event_types (app, id)
prop_keys (event_id, foo)
This should cover the entire query with indexes (i.e. allow for index-only scan).