This has been the largest criticism of InfluxDB in the past. Kudos to the team for acknowledging and solving it!
> IOx supports SQL natively and our cloud customers can connect using Postgres-compatible clients like psql, Grafana’s Postgres data source, and BI tools like PowerBI and Tableau.
Initially InfluxDB had InfluxQL, a SQL-like language for querying data. Then they transitioned to Flux, indicating it was superior to writing complex SQL queries over time series data. Now they are highlighting native SQL support. Since this was only announced today, hopefully there will be clear messaging on which query languages will be supported going forward.
It’s also worth noting that queries can be executed over an HTTP API that platforms like PowerBI can consume today.
> First introduced in 2020 as the open source project InfluxDB IOx, the new storage engine is the product of sustained development by InfluxData and considerable contribution from the InfluxDB open source developer community. Today, the new engine based on IOx arrives first in InfluxData’s multi-tenant InfluxDB Cloud service, available to developers worldwide.
Will this later be available in an OSS package for self-hosting?
Right now we're focused on our cloud offering. We'll have official open source releases and documentation in the future.
https://arrow.apache.org/datafusion/user-guide/sql/sql_statu...
It's a framework for writing query engines in Rust that takes care of a lot of heavy lifting around parsing SQL, type casting, constructing and transforming query plans and optimizing them. It's pluggable, making it easy to write custom data sources, optimizer rules, query nodes etc.
It has very good single-node performance (there's even a way to compile it with SIMD support) and Ballista [1] extends that to build it into a distributed query engine.
Plenty of other projects use it besides IOx, including VegaFusion, ROAPI, and Cube.js's pre-aggregation store. We're heavily using it to build Seafowl [2], an analytical database that's optimized for running SQL queries directly from the user's browser (caching, CDNs, low latency, some WASM support, all that fun stuff).
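To make the "query plans plus pluggable optimizer rules and data sources" idea concrete, here is a toy sketch in Python. It is not DataFusion's API: every name in it (`Scan`, `Project`, `Filter`, `push_filter_below_project`, `rewrite`, `execute`) is made up for illustration, and a real engine handles far more node types and expressions.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, object]


@dataclass(frozen=True)
class Scan:
    """Leaf node: reads rows from a named, pluggable data source."""
    source: str


@dataclass(frozen=True)
class Project:
    """Keeps only the listed columns."""
    child: "Plan"
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Filter:
    """Keeps rows where row[column] > threshold."""
    child: "Plan"
    column: str
    threshold: int


Plan = Union[Scan, Project, Filter]


def push_filter_below_project(node: Plan) -> Plan:
    """Toy optimizer rule: Filter(Project(x)) becomes Project(Filter(x))."""
    if isinstance(node, Filter) and isinstance(node.child, Project):
        proj = node.child
        pushed = Filter(proj.child, node.column, node.threshold)
        # Keep pushing in case there is another Project underneath.
        return Project(push_filter_below_project(pushed), proj.columns)
    return node


def rewrite(plan: Plan, rule: Callable[[Plan], Plan]) -> Plan:
    """Apply a rule bottom-up: children first, then the node itself."""
    if not isinstance(plan, Scan):
        plan = replace(plan, child=rewrite(plan.child, rule))
    return rule(plan)


def execute(plan: Plan, sources: Dict[str, Callable[[], List[Row]]]) -> List[Row]:
    """Naive interpreter for the three node types above."""
    if isinstance(plan, Scan):
        return list(sources[plan.source]())
    rows = execute(plan.child, sources)
    if isinstance(plan, Filter):
        return [r for r in rows if r[plan.column] > plan.threshold]
    return [{c: r[c] for c in plan.columns} for r in rows]


# A custom "data source" is just a callable that returns rows.
sources = {
    "cpu": lambda: [
        {"host": "a", "usage": 90, "region": "eu"},
        {"host": "b", "usage": 10, "region": "us"},
    ]
}

plan = Filter(Project(Scan("cpu"), ("host", "usage")), "usage", 50)
optimized = rewrite(plan, push_filter_below_project)
```

The optimized plan filters before projecting, and both plans return the same rows. In this toy, swapping in a different rule or data source means passing a different callable, which is roughly the kind of pluggability described above.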
[0] https://github.com/apache/arrow-datafusion
Interesting. Where does Seafowl fit in when I compare it with, say, a data-stack-in-a-box approach, e.g. meltano + dbt + duckdb + superset [0]? Is my thinking right that Seafowl possibly replaces both duckdb (with IOx) and superset (if there's a web front-end)?
Incidentally, dagster had an article up just yesterday making a case for poor-man's datalake with dbt + dagster + duckdb [1]. What does splitgraph replace if I were to use it in a similar setup?
Thanks.
It's a fairly different use case from DuckDB (query execution for Web applications vs fast embedded analytical database for notebooks) and the rest of the modern data stack (which mostly is about analytics internal to a company). Just to clarify, we're not related to IOx directly (only via us both using Apache DataFusion).
If we had to place Seafowl _inside_ of the modern data stack, it'd be mostly a warehouse, but one that is optimized for being queried from the Internet, rather than by a limited set of internal users. Or, a potential use case could be extracting internal data from your warehouse to Seafowl in order to build public applications that use it.
We don't currently ship a Web front-end and so can't serve as a replacement for Superset: it's exposed to the developer as an HTTP API that can be queried directly from the end user's Web browser. But we have some ideas around a frontend component: some kind of middleware, where the Web app can pre-declare the queries it will need to run at build time and we can compute some pre-aggregations to speed those up at runtime. Currently we recommend querying it with Observable [0] for an end-to-end query + visualization experience (or use a different viz library like d3/Vega).
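As a rough illustration of the "pre-declare queries at build time" idea, here is a hypothetical Python sketch. It does not describe an existing component: the function names, the manifest shape, and the `/q/<id>` URL layout are all assumptions made for the example. The idea is that each query gets a stable ID derived from its text, so a GET request keyed on that ID can be cached by browsers and CDNs.

```python
import hashlib
from typing import Dict, Iterable


def normalize(sql: str) -> str:
    """Collapse runs of whitespace so formatting changes don't change the ID.

    Simplification: this would also collapse whitespace inside string literals.
    """
    return " ".join(sql.split())


def query_id(sql: str) -> str:
    """Stable ID: SHA-256 hex digest of the normalized SQL text."""
    return hashlib.sha256(normalize(sql).encode("utf-8")).hexdigest()


def build_manifest(queries: Iterable[str]) -> Dict[str, str]:
    """Build-time step (hypothetical): map each query ID to its SQL text."""
    return {query_id(sql): normalize(sql) for sql in queries}


def query_url(base_url: str, sql: str) -> str:
    """Runtime step (hypothetical URL layout): a cache-friendly GET URL."""
    return f"{base_url.rstrip('/')}/q/{query_id(sql)}"


pretty = "SELECT host, avg(usage)\n  FROM cpu GROUP BY host"
compact = "SELECT host, avg(usage) FROM cpu GROUP BY host"
manifest = build_manifest([pretty, compact])
```

Both spellings of the query map to the same manifest entry, so the server side would know ahead of time exactly which queries to expect and could prepare pre-aggregations for them.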
Re: the second question about Splitgraph for a data lake, the intention behind Splitgraph is to orchestrate all those tools and there the use case is indeed the modern data stack in a box. It's kind of similar to dbt Labs's Sinter [1] which was supposed to be the end-to-end data platform before they focused on dbt and dbt Cloud instead: being able to run Airbyte ingestion, dbt transformations, be a data warehouse (using PostgreSQL and a columnar store extension), let users organize and discover data at the same time. There's a lot of baggage in Splitgraph though, as we moved through a few iterations of the product (first Git/Docker for data, then a platform for the modern data stack). Currently we're thinking about how to best integrate Splitgraph and Seafowl in order to build a managed pay-as-you-go Seafowl, kind of like Fauna [2] for analytics.
Hope this helps!
[0] https://observablehq.com/@seafowl/interactive-visualization-...
2 years and 9,500+ commits is a hell of a feat.
It didn’t really work.
I’m not stupid and I can read docs.
My feeling was it’s like Elastic. Default configuration is so flawed and inscrutable, on purpose, you can forget about using it yourself.
I use Thanos now. At least it fucking works.
I suppose if I need fast queries, I’ll use Postgres.
You guys need to focus on making stuff that works. It’s competitive out there and you don’t have the insights into people who try and wind up hating your guts for being annoying.
Some constructive criticism around naming... You don't have to have Flux in every single damn thing you create!
InfluxDB IOx is not replacing InfluxDB v2 because... It's just a new storage engine.
For querying we have Flux or InfluxQL...
1) Alpha / Beta phase where we experimented with several off-the-shelf key-value stores (RocksDB, LevelDB, & BoltDB). During this early phase, we learned from observing a wide variety of workloads / use-cases that we needed a custom built engine to achieve our early performance goals. But, using these off-the-shelf key-value stores allowed our (at the time) very small team to focus on developing a useful beta product and gathering user feedback.
2) TSM storage engine for 1.0 - Developed from scratch based on our learnings from phase 1, this was the first production storage engine that shipped with 1.0 in 2016 and carried us through 2.0. It served as the workhorse for 3 - 4 years as both the number of users and size of their workloads skyrocketed, eventually bumping into architectural limits of TSM.
3) IOx - equipped with a larger engineering team and years of experience with a wide variety of workloads and use-cases, IOx was developed to handle users' rapidly growing time series workloads.
The InfluxDB Cloud platform uses a variation of TSM that's tailored for a distributed SaaS rather than stand-alone nodes (this was originally intended to be used in InfluxDB v2 OSS as well, but alpha-testing showed that the old engine performed better there so it ultimately was reverted for the beta release).
So IOx is really the first major new storage engine in InfluxDB.
Why that vs. Fred Brooks' "Plan to throw the first one away" idea?