I have not found a way or tool to replace this. Many of the tools fail at dynamic data with high cardinality. I wanted to use ClickHouse like this, adding columns as they are discovered, but it did not go well, though that was a while ago. Also, adding replicas is not as easy as with Elasticsearch.
Does anyone have a similar use case implemented with ClickHouse, where the data is not known beforehand and unfolds as time goes by?
It's also something we're working on! Shameless plug - I happen to work at Sneller (sneller.io, open source at https://github.com/SnellerInc/sneller), which might be interesting to you.
A couple of key ideas - first, we bypass the need for any sort of 'semi-structured to relational' ETL/ELT overhead by running vectorized SQL on a (compressed) binary form of the JSON data which preserves its original structure. So we're schema-on-read first and foremost - you don't need to worry about adding new fields in the source JSON as long as your queries know of these new fields.
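To make the schema-on-read idea concrete, here is a toy sketch in Python. It is not how Sneller works internally (there is no vectorization or binary encoding here); it only shows that fields are resolved when the query runs, so a record that gains a new field needs no migration. The names `RAW_ROWS`, `get_path`, and `select` are made up for this example.

```python
import json

# Raw JSON rows stored as-is; later rows carry fields the first one lacks.
RAW_ROWS = [
    '{"level": "info", "msg": "started"}',
    '{"level": "error", "msg": "timeout", "http": {"status": 504}}',
    '{"level": "error", "msg": "refused", "http": {"status": 502}, "region": "eu"}',
]


def get_path(record, path):
    """Resolve a dotted path at read time; a missing field yields None."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def select(rows, fields, where=None):
    """Scan raw JSON rows and project the requested paths.

    No schema is declared up front; only the query names the fields.
    """
    out = []
    for line in rows:
        record = json.loads(line)
        if where is None or where(record):
            out.append(tuple(get_path(record, f) for f in fields))
    return out


result = select(
    RAW_ROWS,
    ["msg", "http.status", "region"],
    where=lambda r: get_path(r, "level") == "error",
)
print(result)
```

The `region` field only exists in the last row, and the query still runs over all of them; rows without it simply return `None` for that column.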
Second, we completely separate storage from compute. Unlike CH, we don't use local disk as any sort of storage tier; we use cloud object stores as our _primary_ storage tier. So all your data (including the compressed binary version of your source JSON) lives in S3 buckets under your control.
Feel free to check us out and let us know what you think!
1. Github - https://github.com/SnellerInc/sneller
2. Intro blog - https://github.com/SnellerInc/blogs/blob/main/introducing-sn...
You can create an API endpoint and send the JSON to it. In the "destination" part, it can sync to ClickHouse (one of many choices, such as Redshift and Snowflake) very quickly and flatten the JSON into columns. If a new key is found in the JSON, it will create a new column in ClickHouse.
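The flatten-and-add-columns behavior described above can be sketched roughly like this. This is only an illustration, not the tool's actual implementation: the function names (`flatten`, `new_column_statements`), the `events` table, the underscore-joined column names, and typing every new column as `Nullable(String)` are all assumptions made for the example.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level with underscore-joined keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def new_column_statements(table, known_columns, record):
    """Return ALTER TABLE statements for keys that are not yet columns.

    Every new column is typed Nullable(String) for simplicity (an
    assumption); a real pipeline would infer a type per key.
    """
    statements = []
    for column in sorted(flatten(record)):
        if column not in known_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS "
                f"`{column}` Nullable(String)"
            )
    return statements


record = json.loads('{"user": {"id": 7, "plan": "pro"}, "event": "login"}')
known = {"event", "user_id"}  # columns the hypothetical table already has
statements = new_column_statements("events", known, record)
for statement in statements:
    print(statement)
```

Here only `user_plan` is new, so a single `ALTER TABLE ... ADD COLUMN IF NOT EXISTS` statement is produced. With high-cardinality keys this approach can create a very large number of columns, which is the problem raised at the top of the thread.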
1. The clickhouse-git-import tool: what it is and how to use it.
2. ClickHouse as a monitoring agent with the system.asynchronous_metric_log table.
3. Hosting static MergeTree tables on any web server.
2. ClickHouse as a monitoring agent with the system.asynchronous_metric_log table.
Can you share some links about it?
It's very simple. Every ClickHouse instance, even an empty one, collects system-wide monitoring information. You can query this table directly from a Grafana dashboard, for example.
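For anyone who wants to try this without Grafana, here is a rough sketch of querying the table over ClickHouse's HTTP interface. Several things are assumptions on my part: that the table meant above is `system.asynchronous_metric_log` (the name in the ClickHouse docs), that it has `event_time`, `metric`, and `value` columns (older versions may name the metric column differently), and that the server listens on the default HTTP port 8123 on localhost. The helper names `metric_query` and `http_url` are made up for the example.

```python
from urllib.parse import urlencode


def metric_query(metric, minutes=10):
    """Build a SQL query for recent samples of one metric.

    The metric name is restricted to letters, digits, '_' and '.' so it
    can be placed in the SQL string literal safely.
    """
    if not metric.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected metric name: {metric!r}")
    return (
        "SELECT event_time, value "
        "FROM system.asynchronous_metric_log "
        f"WHERE metric = '{metric}' "
        f"AND event_time > now() - INTERVAL {int(minutes)} MINUTE "
        "ORDER BY event_time "
        "FORMAT JSONEachRow"
    )


def http_url(query, host="localhost", port=8123):
    """Build a URL for the ClickHouse HTTP interface (default port assumed)."""
    return f"http://{host}:{port}/?{urlencode({'query': query})}"


url = http_url(metric_query("LoadAverage1", minutes=5))
print(url)
```

Fetching the resulting URL with `urllib.request.urlopen(url).read()` against a running server should return one JSON object per sample. A Grafana panel would run essentially the same `SELECT` through the ClickHouse data source.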
Can you share some docs on that? AFAIK CH can use data sources hosted on a web server, but I don't recall CH being able to host MergeTree tables on HTTP servers.
I am a proponent of the modern data stack, but it’s always at the back of my mind that an OLAP database on a big box negates the need for a lot of it.
Quality: That takes a diligent human, like anywhere.
Data Modeling: I disagree. What do you think it is missing? I’ve never encountered anything I couldn’t achieve, and the ClickHouse-specific aggregates, lambda support, and array functionality really simplify many problems.
Permissioning: It has very granular permissions.
https://github.com/grafana/clickhouse-datasource
https://grafana.com/blog/2022/05/05/introducing-the-official...
(Disclaimer: I work at Altinity.)