I find it interesting how these vectorized processing engines with DuckDB and Photon Engine of Databricks try to combine row and columnar-oriented strengths.
> When to use a data warehouse > Data warehouses are good for OLAP (online analytical processing) workloads such as the following:
> A small number of users, each of which may execute heavy analytics workloads
Not necessarily. ClickHouse is being used for user-facing analytics where every query can take in the order of 10 ms while supporting many concurrent queries.
> Downtime is permitted – generally not used as the one-and-only operational system
Not necessarily. ClickHouse is being used for HA setups replicated across multiple regions when the service has to survive the outage of a whole region.
this is the first explanation i've seen that directly links disk latency and data architecture decisions. always felt it intuitively but never did the math.
If we take the extreme - ClickHouse as the most thoroughly optimized column-oriented DBMS and compare it with Postgres, the difference will be more than 100 times on average.