By clever sharding, you can work around the performance issues somewhat but it'll never be as efficient as an OLAP column store like ClickHouse or MemSQL:
- Timestamps and metric values compress very nicely using delta-of-delta encoding.
- Compression dramatically improves scan performance.
- Aligning data by columns means much faster aggregation. A typical time series query does min/max/avg aggregations by timestamp. You can load data straight from disk into memory, use SSE/AVX instructions and only the small subset of data you aggregate on will have to be read from disk.
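To make the first bullet concrete, here is a minimal sketch of delta-of-delta encoding on integer timestamps. The function names are made up for illustration, and real column stores add bit-packing on top, which this leaves out. The point is just that regularly spaced timestamps collapse into a run of zeros and tiny integers:

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode as [first value, first delta, then the change in delta per step]."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev = timestamps[0]
    prev_delta = 0  # so the first emitted entry is the first delta itself
    for ts in timestamps[1:]:
        delta = ts - prev
        encoded.append(delta - prev_delta)
        prev, prev_delta = ts, delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode."""
    if not encoded:
        return []
    decoded = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        decoded.append(decoded[-1] + delta)
    return decoded


# A 10-second scrape interval with one sample arriving a second late.
samples = [1000, 1010, 1020, 1030, 1041, 1051]
print(delta_of_delta_encode(samples))  # [1000, 10, 0, 0, 1, -1]
```

Everything after the first two entries is close to zero, which is what makes the column so cheap to store and scan.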
So what's the use case for TimescaleDB? Complex queries that OLAP databases can't handle? Small amounts of metrics where storage cost is irrelevant, but PostgreSQL compatibility matters?
Storing time series data in TimescaleDB takes at least 10x (if not more) space compared to, say, ClickHouse or the Prometheus TSDB.
TimescaleDB is more performant than you may think. We've benchmarked this extensively, e.g. outperforming InfluxDB [1] [2], Cassandra [3], and Mongo [4].
We've also open-sourced the benchmarking suite so others can run these themselves and verify our results. [5]
We also beat MemSQL regularly for enterprise engagements (unfortunately can't share those results publicly).
I think the scalability of ClickHouse is quite compelling, and if you need more than 1-2M inserts a second and 100TBs of storage, then that would be one case where I'd recommend another database over our own. But horizontal scalability is something we have been working on for nearly a year, so we expect this to be less of an issue in the near future (will have more to share later this month).
You are correct however that TimescaleDB requires more storage than some of these other options. If storage is the most important criterion for you (i.e. more important than usability or performance), then again I would point you to one of the other databases that are more optimized for compression. However, you can get 6-8x compression by running TimescaleDB on ZFS today, and we are also currently working on additional techniques for achieving higher compression rates.
[1] https://blog.timescale.com/timescaledb-vs-influxdb-for-time-...
[2] https://blog.timescale.com/what-is-high-cardinality-how-do-t...
[3] https://blog.timescale.com/time-series-data-cassandra-vs-tim...
[4] https://blog.timescale.com/how-to-store-time-series-data-mon...
How can I not respond to that!
As far as I know we've only faced off against TimescaleDB on one small account in the IoT space.
You can't really compare columnstore storage (MemSQL) to rowstore storage (Timescale) for scanning and filtering large amounts of data for analytics use cases (of which time series use cases are a subset). I think this fact is reasonably well established at this point (the idea was popularized by the C-Store project a decade ago [1]). Even at the small end, scanning compressed data in columnstore format is so much faster than rowstore [2] (the data fits nicely into CPU caches and is well suited for SIMD instructions).
I would be happy to compare public customer references with Timescale though. MemSQL is well established in the Fortune 100 at this point:
- https://www.memsql.com/blog/real-time-analytics-at-uber-scale/
- https://www.memsql.com/blog/pandora/
- https://www.memsql.com/blog/pinterest-apache-spark-use-case/
- https://www.memsql.com/releases/akamai-real-time-analytics/
- https://www.memsql.com/blog/real-time-stream-processing-with-hadoop/
- https://www.datanami.com/2018/05/14/how-disney-built-a-pipeline-for-streaming-analytics/
[1]: http://db.csail.mit.edu/projects/cstore/vldb.pdf
[2]: https://www.memsql.com/blog/memsql-processing-shatters-trillion-rows-per-second-barrier/
At my previous job we implemented custom sharding and aggregation on top of Postgres 9.4 for timeseries for a monitoring product. We did it to simplify operations as we built a new product (team <4) and we knew it would be years before our scale motivated us to adopt a specialized store.
We were pleasantly surprised, however, with how far this solution took us. 3 years later we were pushing ~30 TB every two weeks and Postgres was handling it well with predictable performance characteristics. We still didn't feel a pressing need to replace Postgres (although we were moving in that direction).
It's also worth mentioning that this was Postgres 9.4, which predates the partitioning and parallelization improvements that have been landing since 9.6. So I would expect even vanilla Postgres to handle it even better.
Anyway, I'm a fan of Timescale's work and share your sentiments here almost exactly.
This is a weird answer since compression is used by columnar databases like MemSQL and ClickHouse to both save on storage and accelerate queries. Compare this to using generic filesystem compression, which would both compress worse and make the system slower.
- we use Postgres as our main database, so being able to keep our time-series data in the same place is a big win
- perhaps because it's a Postgres extension, the learning curve is small
- it keeps timerange-constrained queries over our event data super fast, because it knows which chunks to search across
- deleting old data (e.g. for a data retention policy) is instantaneous, as TimescaleDB just deletes the physical files that back the timerange being deleted
- it has some nice functions built-in, like `time_bucket_gapfill`. Yes, you could write your own functions to do this, but it's nice to have maintained, tested functions available OOTB
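For anyone wondering why the time-range queries and deletes in the list above are cheap, here is a toy model of time-partitioned chunks. This is only a sketch of the general idea, not TimescaleDB's actual implementation: the class, method names and the one-hour chunk width are all invented for the example.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (unix timestamp, value)


class ChunkedSeries:
    """Toy time-partitioned store: one bucket of rows per fixed-width time range."""

    def __init__(self, chunk_seconds: int = 3600) -> None:
        self.chunk_seconds = chunk_seconds
        self.chunks: Dict[int, List[Row]] = {}

    def _chunk_start(self, ts: int) -> int:
        return ts - ts % self.chunk_seconds

    def insert(self, ts: int, value: float) -> None:
        self.chunks.setdefault(self._chunk_start(ts), []).append((ts, value))

    def query(self, start: int, end: int) -> Tuple[List[Row], int]:
        """Return rows with start <= ts < end, plus how many chunks were scanned."""
        rows: List[Row] = []
        scanned = 0
        for chunk_start in sorted(self.chunks):
            chunk_end = chunk_start + self.chunk_seconds
            if chunk_end <= start or chunk_start >= end:
                continue  # chunk exclusion: this range cannot contain matches
            scanned += 1
            rows.extend(r for r in self.chunks[chunk_start] if start <= r[0] < end)
        return rows, scanned

    def drop_chunks_older_than(self, cutoff: int) -> int:
        """Retention: discard whole chunks that end at or before cutoff."""
        old = [s for s in self.chunks if s + self.chunk_seconds <= cutoff]
        for s in old:
            del self.chunks[s]  # no per-row delete, the whole chunk goes at once
        return len(old)


series = ChunkedSeries()
for ts in range(0, 4 * 3600, 600):  # four hours of 10-minute samples
    series.insert(ts, float(ts))
rows, scanned = series.query(3600, 7200)
print(len(rows), scanned)  # 6 1
```

The query touches one chunk out of four, and dropping old data is a dictionary delete per chunk rather than a scan over rows, which is the in-memory analogue of removing the backing files.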
https://medium.com/@valyala/high-cardinality-tsdb-benchmarks...
We're considering a move from OpenTSDB to Timescale currently, and something that stands out in Timescale is the wide-table format; we get bundles of metrics at each tick, and having them aligned makes usage easier, and perhaps also saves us some space over having the timestamps repeated per metric.
They said they didn't want to reinvent a database engine to solve the timeseries problem, so you get what you pay for.
As our CEO mentioned in a sibling, we are working on a horizontal/scale-out solution for even higher ingest rates, as well as sharding. We're also doing some work for better compression to reduce our disk footprint.
Also, since 1.2 we have support for automatic retention policies that help keep disk usage in check. Yesterday we released 1.3, which contains our first iteration of continuous aggregates, which let you materialize aggregates over the raw data for faster querying. In a future iteration, we'll also allow you to remove the underlying/raw data but keep the aggregates -- another way to improve the disk usage of your data.
All that is to say we do consider ourselves useful for larger use cases, and have a lot of features coming down the pipe to make it even better.
I'd recommend taking a look at Prometheus [1]. It has its own _very_ performant TSDB, there are exporters for just about everything, it's the de facto way that things like Kubernetes expose metrics, and it has first-class support in Grafana for visualization.
We POC'd Zabbix, Icinga, ScienceLogic, Instana, Sensu, and Prometheus. Prometheus was our favorite. Take a look at the comparison between it and other popular monitoring products to see if it fits your needs though [2].
[1] https://github.com/prometheus/prometheus
[2] https://prometheus.io/docs/introduction/comparison/
That said it does some really cool stuff like tree walking across all the HP switches on our network, auto monitoring all ports it finds and then reporting on their stats and on any UP/DOWN states for every port.
Good for detecting unauthorized usage or a device which is rebooting itself.
Its IPMI support is also pretty good, we had it monitoring Supermicro IPMI interfaces with zero issue.
It handles vSphere and auto scans the entire cluster, adding all guests and monitoring them without needing to install an agent on every VM.
All in all a very good solution with some very cool features, but a steep learning curve and not much help on their forums although the docs are pretty good.
Nagios is not great, but it’s reliable and when it breaks you can figure it out.
That said, I'm not very familiar with the alternatives.
If you're happy with Icinga2, stick with that. I've used that too at a previous gig and found it better than Zabbix, but that's just my personal take. YMMV.
One feature I particularly like is the zabbix_sender command, which I use to push the status of shell-scripted Borg backup jobs into Zabbix.
If better visuals are needed, I would hook it up to Grafana. I have previously used Grafana with Graphite as backend but it was too unreliable. If it actually works with Zabbix then it could be the perfect match.
Do the same benchmarks against a pg_partman managed partitioned db and you'll get the exact same performance. We do, at least - 150k or so metrics per second, 10 columns per metric.
Not trying to crap on the TimescaleDB guys, I've found a lot of their writeups extremely useful and can totally see how their commercially supported product fits. However, I like to see pg_partman at least mentioned somewhere in the article/comments. It's awesome and does the same job.
[0] https://github.com/timescale/timescaledb/blob/master/LICENSE
Hey, just wanted to clarify: the vast majority of TimescaleDB code is Apache2, and you can easily compile (and we ourselves build & distribute) Apache2-only binaries.
When we announced a new license in December, we didn't relicense any code, we just said that some future features will be available under a Community or Enterprise License. The code under this "Timescale License" is clearly marked and in a separate subdirectory, and for virtually all users (except the public cloud DBaaS providers), the community features are free.
This is the actual top-level LICENSE file in the repo: https://github.com/timescale/timescaledb/blob/master/LICENSE
And here's a blog post discussing in more depth: https://blog.timescale.com/how-we-are-building-an-open-sourc...
Perhaps what's not immediately obvious is that the TSL license is there to protect against cloud providers offering hosted TimescaleDB without contributing back - systems that add value (e.g. are backed by TimescaleDB for DML) can use TSL-licensed code without any issue.
On the query side we implement a whole bunch of planner and execution time optimizations that don't come with plain PostgreSQL (and pg_partman does not implement any query optimizations AFAIK). These include optimizations that have to do with ordering based on time_bucket/date_trunc, execution-time chunk exclusion, etc. These result in query speedups of more than 1000x on many common time-series queries.
TimescaleDB is much more automated than pg_partman and thus easier to maintain and administer. There are far fewer knobs to tune and far fewer things to go wrong in TimescaleDB.
We implement analytical features necessary for time-series analysis: gap-filling, common time-series functions like time_bucket, first, last, etc.
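To illustrate what bucketing plus gap-filling means here, a rough Python sketch follows. It is not TimescaleDB's code: the helper names are hypothetical, timestamps are plain integers, and only an average is computed. Each bucket in the requested range shows up in the output, with None where no rows landed.

```python
from typing import Dict, List, Optional, Tuple


def time_bucket(width: int, ts: int) -> int:
    """Round ts down to the start of its width-sized bucket (integer timestamps)."""
    return ts - ts % width


def bucket_avg_gapfill(
    rows: List[Tuple[int, float]], width: int, start: int, end: int
) -> List[Tuple[int, Optional[float]]]:
    """Average per bucket over [start, end), emitting None for empty buckets."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for ts, value in rows:
        if start <= ts < end:
            bucket = time_bucket(width, ts)
            sums[bucket] = sums.get(bucket, 0.0) + value
            counts[bucket] = counts.get(bucket, 0) + 1
    result: List[Tuple[int, Optional[float]]] = []
    for bucket in range(time_bucket(width, start), end, width):
        if bucket in counts:
            result.append((bucket, sums[bucket] / counts[bucket]))
        else:
            result.append((bucket, None))  # the "gap" that a plain GROUP BY would omit
    return result


rows = [(0, 1.0), (30, 3.0), (130, 5.0)]
print(bucket_avg_gapfill(rows, width=60, start=0, end=180))
# [(0, 2.0), (60, None), (120, 5.0)]
```

Without the gap-fill step the 60-second bucket would simply be missing, which is awkward for charts and for joins against other series.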
We also implement a lot of data management functionality geared towards time-series data: scheduled data reordering, scheduled data dropping/expiration, etc.
This past Monday we released a major feature called continuous aggregates, which automatically maintain a materialized view of aggregates over your time-series data, updating it as new data comes in and correctly handling backfilled data as well.
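As a rough mental model of an incrementally maintained aggregate, here is a small Python sketch. It is an assumption-laden toy and not how TimescaleDB implements the feature: the class name is invented, and it keeps a running sum and count per bucket so that late (backfilled) rows update the right bucket without rescanning the raw data.

```python
from typing import Dict


class ContinuousAvg:
    """Toy materialized per-bucket average, updated on every insert."""

    def __init__(self, width: int) -> None:
        self.width = width
        self._sum: Dict[int, float] = {}
        self._count: Dict[int, int] = {}

    def insert(self, ts: int, value: float) -> None:
        # Works the same for fresh and backfilled rows: only one bucket changes.
        bucket = ts - ts % self.width
        self._sum[bucket] = self._sum.get(bucket, 0.0) + value
        self._count[bucket] = self._count.get(bucket, 0) + 1

    def view(self) -> Dict[int, float]:
        """Read the aggregate without touching any raw rows."""
        return {b: self._sum[b] / self._count[b] for b in sorted(self._sum)}


agg = ContinuousAvg(width=60)
agg.insert(0, 1.0)
agg.insert(70, 4.0)
agg.insert(30, 3.0)  # arrives late, still lands in the first bucket
print(agg.view())  # {0: 2.0, 60: 4.0}
```

Storing sum and count rather than the average itself is what makes the update cheap; an average alone cannot be corrected when a late row arrives.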
The two projects are really not comparable in breadth or scope IMHO.
Having to resize/grow/stripe/etc. them is a pain.
So we came up with a clever solution that batches chunks to S3:
https://www.youtube.com/watch?v=x_WqBuEA7s8
$10/day for 100M records (100GB data), all costs!
And best yet, reduced DevOps! Very practical, super simple.