If you look at some of the discussions, while a lot of the fixes come from the ClickHouse team, it would be unjust to say that the corner-case discussions don't contribute to the fixes.
I think part of the reason is that ClickHouse, being sort of a unique offering, sometimes brings with it a quite competent bunch of users who go beyond "I want this feature, please implement".
We've also been very active in diagnosing problems, logging issues, and contributing ideas for solutions. Alexey Milovidov has logged the most issues of anyone (2376), but the next two people (1012, 810) are from Altinity. The #6 and #9 contributors of issues are also from Altinity.
This question is important regardless of who raises it. Projects like Kafka, Spark, PostgreSQL, and Kubernetes (among others) have solved it while allowing good returns to those who contribute.
P.S. We spent 7% of budget on marketing last month. A sizable fraction of our budget is devoted to open source contributions to ClickHouse and ecosystem projects.
this whole thing reminds me of elastic and hashicorp, and it's hard to pick sides given that the core maintainers worked their asses off building it in public, and the community contributors also put their effort into it.
i think this common theme is unlocking a new era of software where core maintainers productize the main product slapped with a bsl license and the community (incl other businesses) maintain their own fork.
it's great that discussions like this are being brought up and talked about.
If this blog post raises any legitimate concerns, I missed them. All I see is entitlement to continue getting work for free. If you're concerned about gaps in ClickHouse functionality, maybe pick up the slack and contribute back?
It takes a village to raise a database, as the saying goes.
The problem with open core is there's no great answer.
Either it's merged, in which case there are now two codebases implementing the same feature (one open, one closed), and the company's revenue stream is imperiled.
Or it's rejected (either explicitly or quietly ignored), in which case work is wasted and the project is less useful than it could be.
How did open core companies historically handle this?
As ugly as it is, it feels like permissive OSS (e.g. MIT) core + open but anti-SaaS non-OSS cloud-only/closed feature is a more sustainable model that encourages development in the open.
E.g. an MIT-alike license for select features that says "free-as-in-beer license up to X users, otherwise talk to our sales team and get a commercial license"
At the end of the day, I want OSS to succeed and be great, but especially nowadays that takes a large team, which takes funding, which requires a competitive revenue model.
Just to make sure we are on the same page.
For example, I saw this with Tailwind UI.
FWIW, you can check out clickbench.com, which benchmarks ClickHouse and DuckDB on partitioned Parquet.
What is it about DuckDB and its strange cult-like following? It's nice that it's in-process, but then it's an incremental improvement over Pandas. Nice tool and well implemented, but I don't see what is transformative about it.
- local
- server
- cloud (*)
- serverless
- in-process https://github.com/chdb-io/chdb similar to DuckDB
(*) except for the forked cloud versions, ClickHouse Inc, Huawei, etc ...
ClickHouse is instantly differentiated from Snowflake, Databricks, BigQuery and Redshift by the open source offering that you can deploy yourself. There are lots of other options, but ClickHouse has the most mindshare and is the techie's choice.
I find myself rooting for them and recommending them for that before you even get into any technical comparison.
For example, ReplacingMergeTree uses a distributed algorithm to process changes without incurring excessive INSERT time expense. It's quite elegant.
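To make the idea concrete, here's a rough Python toy model of the behavior, not ClickHouse's actual implementation. The `Row`, `insert`, and `merge` names are made up for illustration. The assumption is the documented semantics: an INSERT just writes a new part, and duplicates sharing a sorting key are collapsed later at merge time, keeping the row with the highest version (the last inserted row wins a tie).

```python
from typing import NamedTuple


class Row(NamedTuple):
    key: str      # stands in for the sorting key (what ORDER BY covers)
    version: int  # stands in for the version column; highest wins
    value: str


def insert(parts: list[list[Row]], rows: list[Row]) -> None:
    """An INSERT only appends a new sorted part; it never looks up old rows."""
    parts.append(sorted(rows, key=lambda r: r.key))


def merge(parts: list[list[Row]]) -> list[Row]:
    """A merge combines parts and keeps one row per key."""
    latest: dict[str, Row] = {}
    for part in parts:  # parts are visited in insertion order
        for row in part:
            current = latest.get(row.key)
            # >= so that on a version tie the later-inserted row wins
            if current is None or row.version >= current.version:
                latest[row.key] = row
    return sorted(latest.values(), key=lambda r: r.key)


parts: list[list[Row]] = []
insert(parts, [Row("user:1", 1, "free"), Row("user:2", 1, "free")])
insert(parts, [Row("user:1", 2, "paid")])  # an "update" is just another insert

merged = merge(parts)
print(merged)
```

The point of the sketch is that the cost of replacing a row is paid in the merge, not in the insert path. Until a merge happens, both versions of `user:1` still sit in separate parts, which is why queries on a real table may need to deduplicate at read time.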
Presto, created by FB, was required to let any FB engineer merge without OWNERS (because Facebook doesn't have OWNERS files unless it would create a SEV1).
Subsequently, the original creators of Presto forked it to PrestoSQL.
So Facebook trademarked the name Presto.
So the creators renamed it Trino.
You should be using ReplacingMergeTree if you are doing updates at the current moment.
Indeed. Altinity and other community users like ContentSquare made numerous contributions to make it more usable. It's a promising approach to updates at scale and has improved markedly over the last few months.
That said, you can't currently use RMT very efficiently in S3 because of overall limitations in MergeTree S3 table storage. We need to think about whether the improvements we're proposing will also enhance RMT. Thanks for bringing that up.
*Yes, you can use a VPC gateway, but you need to waste public IPs to set up the BGP/IP routes.
The choice of ClickHouse for a new project in my company has always been a no-brainer, but the recent move by ClickHouse Inc. toward a closed-source version has made this choice less straightforward.
If you want open source, go fund non-profit organisations and/or charities. The fact we don’t see developers do that tells me a lot.
We see it with Oracle (MySQL), where most of the innovation is happening in the cloud-only "HeatWave", or MongoDB, where MongoDB Atlas is increasingly getting features not available in their Community (SSPL) version.