The site says "production ready in March" - it seemed to me like there's at least 3 months of work there given that most of the clustering features (e.g. how to rebuild a failed node, how to expand the cluster, distributed queries) are not there.
My other concern is that InfluxDB might follow the fate of FoundationDB - get acquired by a giant corporation and disappear.
The distributed queries part isn't a large amount of work because of how we've designed things. Under the covers, the query engine already represents each query as a MapReduce job to be run.
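To make the MapReduce idea concrete, here is a minimal sketch of how a mean-per-time-bucket query could be split into a per-shard map step and a merge step. This is only an illustration, not InfluxDB's actual query engine: the function names, the (timestamp, value) tuple layout, and the sample shards are all assumptions made up for the example.

```python
from collections import defaultdict


def map_shard(points, bucket_seconds):
    """Map step (hypothetical): run on each shard, emit partial (sum, count) per bucket."""
    partial = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket = ts - ts % bucket_seconds  # start of the time bucket
        partial[bucket][0] += value
        partial[bucket][1] += 1
    return {bucket: (s, c) for bucket, (s, c) in partial.items()}


def reduce_mean(partials):
    """Reduce step (hypothetical): merge partials from all shards into a mean per bucket."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for bucket, (s, c) in partial.items():
            merged[bucket][0] += s
            merged[bucket][1] += c
    return {bucket: s / c for bucket, (s, c) in sorted(merged.items())}


# Two made-up shards holding (timestamp_seconds, value) points.
shard_a = [(0, 1.0), (30, 3.0), (60, 5.0)]
shard_b = [(10, 2.0), (70, 7.0)]

result = reduce_mean([map_shard(shard_a, 60), map_shard(shard_b, 60)])
print(result)
```

Because each shard only ships back a small (sum, count) pair per bucket, the map step can run wherever the data lives and the coordinator only has to merge the partials.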
For cluster expansion, work is starting on that today. Again it's just a matter of wiring some things up. Node replacement is also starting today.
We may miss the March goal but it won't be by anything close to 3 months. Glad you're paying attention to the project though :)
For the Foundation problem, I thought they were never open source. Just free for 5 nodes or less, no?
I think the key to avoiding this fate is to build an active community of contributors outside the company. Luckily we have people submitting PRs every week. We'll be trying to document more of the code and make it easier for outsiders to get involved as we go along.
That way if the worst happens, at least the community can fork and keep the project going forward. I'd love nothing more than for Influx to become bigger than this company.
Another thing that I think might be a critical (or at least interesting) characteristic is back-filling optimization, i.e. when you need to load a trillion data points of historical data - this YouTube video explains it pretty well and talks about how OpenTSDB addresses it: https://www.youtube.com/watch?v=SgD3RD2Shg4
Anyhow - keep up the good work, I very much believe that in the next couple of years "Time Series" is going to become a resume-must-include buzzword :)
Remember, in software development there are lies, damn lies, and delivery estimates.
We'll get it out as quickly as possible, sorry for any delays.
I'm currently running the latest 0.8.x release and there are a few issues:
1. My influxdb instances stop servicing reads once every 12 hours, so I have a cron job that force-restarts them. https://github.com/influxdb/influxdb/issues/1116
2. Enabling the graphite plugin on the first run can crash the process (the creation of the default cluster admin user seems to be racy). Not a big deal except in automated deployment scenarios.
3. I lost an entire database (luckily it was just used for storing grafana graph definitions and not actual data).
4. I'm not sure if anyone's currently working on their admin UI. I submitted a pull request to their admin UI to sort shards by ID because currently it randomizes the order on every load (I presume because of golang's randomized map iteration). It's sat there since January. The last PR they merged into that repo was in May of 2014.
I really want influxdb to be successful. Every organization I've worked for in the last few years has had serious graphite scaling issues and influxdb is well positioned to fix those. I think even in its current state it's a better option than graphite (and the influxdb-graphite plugin gives you all the graphite features).
We're heads down working on 0.9.0 and won't be doing any more releases in the 0.8.x line (except to create a migration path to 0.9.0). So we are merging PRs, but only those that apply to 0.9.0 (which includes the admin UI).
* No dependencies. Compare this with setting up HDFS/HBase and Graphite which is a real pain in the neck to manage, especially since my tsdb has to run on an arbitrary machine pool in a sandbox.
* Active development. This is a big one. Releases have been coming steadily and Paul & co. do a good job of having a real roadmap and chipping away at it; this is probably my tipping point over Graphite.
* Clustering. Maybe it's not there yet, but see above. Most tools in this space are not elastic at all.
* Grafana integration - seems like there is a good bit of momentum in that project in general which is promising.
PS Reading this, it almost sounds like an ad. No, I'm not affiliated with influx.
PPS logfile configuration for rotation/cleanup would be a nice-to-have enhancement ;)
For logfile rotation our recommended solution is to use logrotate. We'll be updating the install to include a config. See https://github.com/influxdb/influxdb/issues/1943
OpenTSDB relies on HBase, KairosDB has a configurable and pluggable datastore - but the only production-ready one so far uses Cassandra.
OpenTSDB always interpolates values for aggregation (which I found to be a hazardous decision), KairosDB does not really do proper "vertical" aggregation across series (by vertical I mean not downsampling).
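A small sketch of why interpolation during aggregation can surprise you. It sums two series whose timestamps do not line up, once with linear interpolation and once using only the points that actually exist. This is a simplified illustration assuming linear interpolation, not OpenTSDB's or KairosDB's real code; the function names and sample data are made up.

```python
def interpolate(series, ts):
    """Linearly interpolate a {timestamp: value} series at ts; None outside its range."""
    points = sorted(series.items())
    if ts < points[0][0] or ts > points[-1][0]:
        return None
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= ts <= t1:
            return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
    return points[0][1]  # single-point series queried at its only timestamp


def sum_with_interpolation(series_list):
    """Sum across series at every known timestamp, filling gaps by interpolation."""
    timestamps = sorted(set().union(*series_list))
    out = {}
    for ts in timestamps:
        values = [interpolate(s, ts) for s in series_list]
        out[ts] = sum(v for v in values if v is not None)
    return out


def sum_raw(series_list):
    """Sum across series using only the points actually stored at each timestamp."""
    timestamps = sorted(set().union(*series_list))
    return {ts: sum(s[ts] for s in series_list if ts in s) for ts in timestamps}


# Hypothetical series reported on offset schedules.
series_a = {0: 10.0, 10: 20.0, 20: 30.0}
series_b = {5: 100.0, 15: 200.0}

interpolated = sum_with_interpolation([series_a, series_b])
raw = sum_raw([series_a, series_b])
print(interpolated)
print(raw)
```

With interpolation the sum at t=10 includes a value for series_b that was never recorded, and the total jumps at the edges where one series has no data yet. Without it the sum zigzags depending on which series happened to report. Neither is obviously right, which is why I'd rather have the choice than have it made for me.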
OpenTSDB is GPL, KairosDB is Apache 2.0 (which matters for closed-source integrations).
OpenTSDB supports only numerical data but supports annotations, KairosDB supports strings and numerical data in its baseline and is compatible with any data type, but it does not have annotations (you may use strings for that).
On their baseline OpenTSDB produces graphs on the server, KairosDB produces graphs on the client.
Both are integrated with the Grafana time series dashboard, OpenTSDB has more side projects, KairosDB is the only time series database I know of that is integrated with a reporting tool (BIRT).
OpenTSDB requires you to create the metrics in advance using a special tool (it needs to lock the cluster to allocate a new ID), KairosDB can accept any kind of new metric on the fly.
If you need something modular for building custom features I strongly recommend KairosDB. Look at the code, it's really nicely crafted.
Otherwise, they both have their merits. I found that KairosDB is also much less limited in the corner cases (while having fewer side projects), and we now use KairosDB intensively.
Mike (BMark Admin)