Any pipeline tool for ClickHouse, similar to Snowflake's Dynamic Tables

(snowflake.com)

2 points · tingfirst · 9mo ago · 6 comments

tingfirst (OP) · 9mo ago · 2 in thread

Is there a native SQL pipeline tool for ClickHouse that processes real-time data incrementally, with low latency, high throughput and high efficiency, similar to Snowflake’s Dynamic Tables [1]?

[1] Dynamic Tables: One of Snowflake’s Fastest-Adopted Features: https://www.snowflake.com/en/blog/reimagine-batch-streaming-...

Sep142324 · 9mo ago

Dynamic Tables are interesting for declarative streaming. In the ClickHouse ecosystem, you might want to look at materialized views combined with streaming engines.

For real-time transformations, there are a few approaches:

- Native ClickHouse MaterializedViews with AggregatingMergeTree
- Stream processors that write to ClickHouse (Flink, Spark Streaming)
- Streaming SQL engines that can read/write ClickHouse
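To make the first approach concrete, here is a minimal conceptual sketch in plain Python of the idea behind a materialized view feeding an AggregatingMergeTree table: each inserted block is reduced to mergeable partial states, and those states are combined later instead of rescanning the raw data. This is a simulation, not ClickHouse code; `AvgState`, `aggregate_block`, and `merge_into` are hypothetical names made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AvgState:
    """Partial state for an average. Unlike the final value, it can be merged."""
    total: float = 0.0
    count: int = 0

    def merge(self, other: "AvgState") -> "AvgState":
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self) -> float:
        return self.total / self.count if self.count else 0.0


def aggregate_block(rows):
    """Reduce one inserted block of (key, value) rows to per-key partial states."""
    states = defaultdict(AvgState)
    for key, value in rows:
        state = states[key]
        state.total += value
        state.count += 1
    return dict(states)


def merge_into(target, block_states):
    """Fold a block's partial states into the target table, in place."""
    for key, state in block_states.items():
        target[key] = target.get(key, AvgState()).merge(state)


# Two separate inserts; only the new block is processed each time.
target = {}
merge_into(target, aggregate_block([("page_a", 10.0), ("page_b", 4.0)]))
merge_into(target, aggregate_block([("page_a", 20.0)]))

# Finalize at query time.
result = {key: state.finalize() for key, state in target.items()}
```

The point of keeping `total` and `count` rather than the average itself is that the work per insert depends only on the size of the new block, which is what makes this style of pipeline incremental.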

We've been working on streaming SQL at Proton (github.com/timeplus-io/proton) which handles similar use cases - continuous queries that maintain state and can write results back to ClickHouse. The key difference from Dynamic Tables is handling unbounded streams vs micro-batches.

What's your specific use case? Happy to discuss the tradeoffs.

tingfirst (OP) · 9mo ago

Data sources are usually Kafka, or operational databases like Postgres or MySQL.

1. Table A: fact events, high-throughput (10k~1M events per second), high-cardinality.

2. Tables B, C, D: a couple of dimension tables (fast- or slow-changing).

The use case is straightforward: join/enrich/look up everything into one big, flattened, analytics-friendly table in ClickHouse.

What’s the best pipeline approach to achieve this in real-time and efficiently?
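As a rough illustration of the enrichment step described above, the sketch below keeps each dimension table in an in-memory cache that is updated by change events, and flattens every fact event by looking up its foreign keys. It is plain Python under stated assumptions, not the API of any tool mentioned in this thread; `DimensionCache`, `enrich`, and the column names are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Row = Dict[str, Any]


class DimensionCache:
    """Latest known row per key for one dimension table (hypothetical helper)."""

    def __init__(self, key_field: str, prefix: str) -> None:
        self.key_field = key_field
        self.prefix = prefix
        self._rows: Dict[Any, Row] = {}

    def upsert(self, row: Row) -> None:
        """Apply a change event (insert or update) from the source database."""
        self._rows[row[self.key_field]] = row

    def lookup(self, key: Any) -> Optional[Row]:
        return self._rows.get(key)


def enrich(events: Iterable[Row], dims: Dict[str, DimensionCache]) -> Iterator[Row]:
    """Left-join each fact event against every dimension, yielding flat rows.

    `dims` maps a foreign-key column in the event to the cache it points at.
    """
    for event in events:
        flat = dict(event)
        for foreign_key, dim in dims.items():
            match = dim.lookup(event.get(foreign_key))
            if match is None:
                continue  # no dimension row yet: keep the event unenriched
            for column, value in match.items():
                if column != dim.key_field:
                    flat[f"{dim.prefix}_{column}"] = value
        yield flat


users = DimensionCache(key_field="user_id", prefix="user")
users.upsert({"user_id": 1, "country": "DE"})

events = [
    {"event_id": 100, "user_id": 1, "amount": 9.5},
    {"event_id": 101, "user_id": 2, "amount": 3.0},
]
flat_rows = list(enrich(events, {"user_id": users}))
```

Note that this is a processing-time lookup: each event gets whatever dimension value is cached when it is processed, and a later dimension update does not rewrite rows that were already emitted. Whether that is acceptable for fast-changing dimensions is one of the tradeoffs to weigh.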

gangtao · 9mo ago

You can check out https://github.com/timeplus-io/proton, which provides a stream processing pipeline.
