undefined | Better HN

0 pointsencoderer4y ago0 comments

> I wouldn't let my user-facing app write to Clickhouse

I’ve been thinking of doing exactly that. What are your concerns?

0 comments

8 comments · 2 top-level

dreyfan4y ago· 4 in thread

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

has some good thoughts. The main thing you'll likely need is some sort of a buffer layer so you can do bulk inserts. Do not write a high-volume of single-row inserts into Clickhouse.

shaklee34y ago

Chproxy is designed to handle this

https://github.com/Vertamedia/chproxy

encodererOP4y ago

Thanks for sharing the link! I’ve heard the bulk insert thing before and to be honest I’ve always thought that RDBMSs don’t love single row inserts either. Seems clickhouse takes that to a new level.

In our case we are using sqs and usually insert 20-100 rows into the db at a time so I’m going to benchmark how that does in clickhouse.

zepearl4y ago

With Clickhouse you can use a "buffer table", which uses just RAM and sits on top of a normal table: https://clickhouse.com/docs/en/engines/table-engines/special...

Rows inserted into the buffer table are then flushed to the normal/base table when one of the limits (defined when the buffer table is created) is reached (limits are max rows, max bytes, max time since the last flush), or when you drop the buffer table.

I'm using it and it works (performance difference can be huge compared to perform single inserts directly into a real/normal table), but be careful - the flushed rows don't give a guarantee of which row is flushed in which sequence, so using a buffer table is a very bad idea if your base table is something which relies on the correct sequences of rows that it receives.

RhodesianHunter4y ago

On a project I worked on we found the sweet spot to be 20k-60k rows per insert.

claytonjy4y ago· 2 in thread

I suppose it depends what you're going to let your user do, but OLAPs in general and Clickhouse in particular don't do well under row-oriented workloads, as described in the post here. I'm imagining users primarily operating on small numbers of rows and sometimes making updates to or deleting them, a worst-case scenario for Clickhouse but best-case for an OLTP like Postgres.

encodererOP4y ago

Ah totally. Thanks for sharing your thoughts! In my case I’m evaluating clickhouse as a source of truth for customer telemetry data. Totally agree about the OLTP limitations.

willvarfar4y ago

(Remember that clickhouse is not reliable. It doesn’t pretend to be.

Clickhouse is great for lots of common query workloads, but if losing your data would be a big deal then it makes a lot of sense to have your data in a reliable and backed up place (eg timescale or just s3 files or whatever) too.

Of course lots of times people chuck stuff into clickhouse and it’s fine if they lose a bit sometimes. YMMV.)

1 more reply

j / k navigate · click thread line to collapse

0 comments

8 comments · 2 top-level

dreyfan4y ago· 4 in thread

https://blog.cloudflare.com/http-analytics-for-6m-requests-p...

has some good thoughts. The main thing you'll likely need is some sort of a buffer layer so you can do bulk inserts. Do not write a high-volume of single-row inserts into Clickhouse.

shaklee34y ago

Chproxy is designed to handle this

https://github.com/Vertamedia/chproxy

encodererOP4y ago

In our case we are using sqs and usually insert 20-100 rows into the db at a time so I’m going to benchmark how that does in clickhouse.

zepearl4y ago

With Clickhouse you can use a "buffer table", which uses just RAM and sits on top of a normal table: https://clickhouse.com/docs/en/engines/table-engines/special...

RhodesianHunter4y ago

On a project I worked on we found the sweet spot to be 20k-60k rows per insert.

claytonjy4y ago· 2 in thread

encodererOP4y ago

Ah totally. Thanks for sharing your thoughts! In my case I’m evaluating clickhouse as a source of truth for customer telemetry data. Totally agree about the OLTP limitations.

willvarfar4y ago

(Remember that clickhouse is not reliable. It doesn’t pretend to be.

Of course lots of times people chuck stuff into clickhouse and it’s fine if they lose a bit sometimes. YMMV.)

1 more reply

j / k navigate · click thread line to collapse