csdvrx 4y ago

> This is also a funny statement, because TimescaleDB is built on PostgreSQL.

I know, but doing timeseries with postgres is "cool", not standard, not boring. I'd even say "risky".

> We actually take great pride in being a "boring" option

No, you're not there yet: doing timeseries with timescale is way riskier than with clickhouse, which is both a bit older (not much) and more mature (much more), while also being more widely used (even if you are doing a lot of outreach like these posts)

akulkarni 4y ago

> No, you're not there yet: doing timeseries with timescale is way riskier than with clickhouse, which is both a bit older (not much) and more mature (much more), while also being more widely used

This is not true at all, and we explain why in the post:

1. TimescaleDB's reliability is PostgreSQL's reliability. ClickHouse has a lot of advantages, but "more reliable than PostgreSQL" is not one of them.

From the post:

  PostgreSQL has the benefit of 20+ years of development and usage, which has resulted in not just a reliable database, but also a broad spectrum of rigorously tested tools: streaming replication for high availability and read-only replicas, pg_dump and pg_restore for full database snapshots, pg_basebackup and log shipping / streaming for incremental backups and arbitrary point-in-time recovery, pgBackrest or WAL-E for continuous archiving to cloud storage, and robust COPY FROM and COPY TO tools for quickly importing/exporting data with a variety of formats. This enables PostgreSQL to offer a greater “peace of mind” - because all of the skeletons in the closet have already been found (and addressed).

2. ClickHouse, being a newer database, still has several reliability "gotchas": e.g., no data consistency in backups (because it lacks transactions and all data modification is asynchronous).

From the post:

  One last aspect to consider as part of the ClickHouse architecture and its lack of support for transactions is that there is no data consistency in backups. As we've already shown, all data modification (even sharding across a cluster) is asynchronous, therefore the only way to ensure a consistent backup would be to stop all writes to the database and then make a backup. Data recovery struggles with the same limitation.

  The lack of transactions and data consistency also affects other features like materialized views because the server can't atomically update multiple tables at once. If something breaks during a multi-part insert to a table with materialized views, the end result is an inconsistent state of your data.
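To make the materialized-view point concrete, here is a minimal toy sketch in Python. It is not ClickHouse code: `ToyStore`, `events`, `count_view`, and the `fail_before_view` flag are all made-up names, and the "view" is just a running row count. It only illustrates what happens when a write to a source table and the update of a dependent view are two separate steps with no transaction around them.

```python
class ToyStore:
    """Hypothetical store: a source table plus a derived row-count 'view'."""

    def __init__(self):
        self.events = []      # source table
        self.count_view = 0   # derived "materialized view": number of rows

    def insert(self, block, fail_before_view=False):
        # Step 1: write the block to the source table.
        self.events.extend(block)
        # Simulated crash between the two steps; nothing rolls step 1 back.
        if fail_before_view:
            raise RuntimeError("simulated failure before view update")
        # Step 2: update the derived view.
        self.count_view += len(block)


store = ToyStore()
store.insert([1, 2])
try:
    store.insert([3], fail_before_view=True)
except RuntimeError:
    pass

# The source table and the view now disagree.
print(len(store.events), store.count_view)
```

With a transaction wrapping both steps, the failed insert would leave neither the table nor the view changed.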

Now this trade-off - accepting less reliability for faster OLAP queries - may be fine with you. And that's OK. But stating that ClickHouse is more reliable than PostgreSQL/TimescaleDB is just not true.

hodgesrm 4y ago

You are right about transactional differences between ClickHouse and PostgreSQL but you are comparing apples and oranges. ClickHouse prioritizes speed, efficiency, and scale over consistency. These are reasonable choices, especially in the largely append-only use cases which dominate analytics.

1. I've seen relatively few messed up source tables and mat views over thousands of support cases. When they happen they can be bad for some use cases like financial analytics. They simply aren't very common. And for use cases like observability or log management it just doesn't matter to have a few lost or duplicated blocks over huge datasets.

2. ClickHouse overall is eventually consistent. There are generally differences between replicas when load is active, yet it causes relatively few practical problems in most applications as they load balance queries over replicas. Serialization is expensive and simply not very highly valued here.

3. ClickHouse uses mechanisms other than ACID transactions to ensure consistency. One good example is discarding duplicate blocks on insert into replicated tables. If there's any doubt whether an insert succeeded, you can just insert the block again. ClickHouse checks the hash and discards it. This is incredibly efficient and works without requiring expensive integrity constraints (e.g., unique indexes).

4. It's just about always possible to get ClickHouse to boot even when you have corrupt underlying data (e.g., due to file system problems). I don't know how you define reliability but at least in this sense ClickHouse is extremely robust. I've never seen a server fail to start, though you might need a bit of surgery beforehand.

5. ClickHouse doesn't have transactional DDL. What it does have is features like altering tables to add new columns in a fraction of a second without locking regardless of the size of the dataset. Its behavior is close to NoSQL in this regard.
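The block deduplication in point 3 can be sketched roughly as follows. This is a toy model in Python, not ClickHouse's implementation: `DedupTable` and `insert_block` are hypothetical names, the hash function is an arbitrary choice, and the set of seen hashes is unbounded here purely for simplicity.

```python
import hashlib


class DedupTable:
    """Hypothetical append-only table that discards blocks it has already seen."""

    def __init__(self):
        self.rows = []
        self.seen_hashes = set()

    def insert_block(self, block):
        # Identical blocks serialize identically, so they hash identically.
        digest = hashlib.sha256(repr(block).encode("utf-8")).hexdigest()
        if digest in self.seen_hashes:
            return False  # duplicate block: discard it
        self.seen_hashes.add(digest)
        self.rows.extend(block)
        return True


table = DedupTable()
block = [(1, "a"), (2, "b")]
table.insert_block(block)   # first attempt is stored
table.insert_block(block)   # retry after an uncertain failure is discarded
print(len(table.rows))
```

The design point is that the client does not need to know whether the first attempt succeeded: retrying the same block is safe either way.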

I could go on, but I think these points illustrate that ClickHouse has a different set of design choices for different problems. I would never use it for eCommerce, but it's great for analytics.

Disclaimer: I work on ClickHouse at Altinity.

akulkarni 4y ago

> I could go on, but I think these points illustrate that ClickHouse has a different set of design choices for different problems. I would never use it for eCommerce, but it's great for analytics.

I agree with this. You are poking at a straw man.

My reply was in response to this comment by the OP:

> No, you're not there yet: doing timeseries with timescale is way riskier than with clickhouse, which is both a bit older (not much) and more mature (much more), while also being more widely used

TimescaleDB - which some don't realize - inherits all of the reliability of PostgreSQL, i.e., the 20+ years of usage and tuning (and broad tooling ecosystem).

What I was disproving was the statement that TimescaleDB/PostgreSQL was somehow riskier than ClickHouse.

ClickHouse is impressive, but its deployments are still far behind those of PostgreSQL. ClickHouse is also younger and less mature than PostgreSQL.

I can see that you are the CEO of Altinity. Nice to meet you. I'm the CEO of Timescale. I think it's important that we strive for transparency in our industry, which includes admitting our own products' shortcomings and accepting valid criticism.

We've done that many times in this HN thread (and in the blog post). I think we would have had a more productive discussion in this HN thread if ClickHouse developers were also as transparent with ClickHouse's shortcomings.

I'm happy to continue this conversation offline if you'd like. The database market is large, the journey is long, and in many ways companies like ours are fellow travelers. ajay (at) timescale (dot) com

hodgesrm 4y ago

Hi Ajay! Thanks for the thoughtful response and email. I would love a direct meeting and will contact you shortly.

I don't mean to gloss over ClickHouse imperfections. There are lots of them. For my money the biggest is that it still takes way too much expertise in ClickHouse for ordinary developers to use it effectively. Part of that is SQL compatibility, part of it is a lack of tools, of which simple backup is certainly one. To the extent that ClickHouse is risky, the risk is finding (and retaining) staff who can use it properly. Our business at Altinity exists in large part because of this risk, so I know it's real.

The big aha! experience for me has been that things like the lack of ACID transactions or weak backup mechanisms are not necessarily the biggest issues for most ClickHouse users. I came to ClickHouse from a long background in RDBMS and transactional replication. Things that would be game ending in that environment are not in analytic systems.

What's more interesting (mind-expanding even) is that techniques like deduplication of inserted blocks and async multi-master replication turn out to be just as important as ACID & backups to achieve reliable systems. Furthermore, services like Kafka that allow you to have DC-level logs are an essential part of building analytic applications that are reliable and performant at scale. We're learning about these mechanisms in the same way that IBM and others developed ACID transaction ideas in the 1970s--by solving problems in real systems. It's really fun to be part of it.

My comment didn't convey this clearly, for which I heartily apologize. I certainly don't intend to portray ClickHouse as perfect and still less to bash Timescale. I don't know enough about the latter to make any criticism worth reading.

P.S. Non-transactional insert (specifically non-atomicity across blocks and tables) is an undisputed problem. It's being fixed in https://github.com/ClickHouse/ClickHouse/issues/22086. Altinity and others are working on backups. Backup comes up in my job just about every day.
