InfluxDB has taken its open-source business to Silicon Valley

(technical.ly)

46 points · rhoml · 11y ago · 22 comments

gtrubetskoy · 11y ago

Incidentally, I wrote a blog post on it last week: http://grisha.org/blog/2015/03/20/influxdb-data/

The site says "production ready in March"; it seemed to me there's at least 3 months of work left, given that most of the clustering features (e.g. how to rebuild a failed node, how to expand the cluster, distributed queries) are not there yet.

My other concern with InfluxDB is that it might follow the fate of FoundationDB: get acquired by a giant corporation and disappear.

pauldix · 11y ago

Hi Grisha, I saw that post, thanks for writing it! The coming features you're talking about are the work we're focused on for finishing this release. The three you mention should drop in an RC within two weeks.

The distributed queries part isn't a large amount of work because of how we've designed things. Under the covers, the query engine already represents each query as a MapReduce job to be run.
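As a rough illustration of the MapReduce framing (a sketch under stated assumptions, not InfluxDB's actual query engine): a mean over points spread across shards can be computed by having each shard emit a partial sum and count (map) and then merging those partials (reduce). The names `Partial`, `map_shard`, and `reduce_mean` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Partial:
    """Intermediate (sum, count) pair emitted by the map step for one shard."""
    total: float = 0.0
    count: int = 0


def map_shard(points: Iterable[float]) -> Partial:
    """Map step: summarize the points held by a single shard."""
    p = Partial()
    for v in points:
        p.total += v
        p.count += 1
    return p


def reduce_mean(parts: Iterable[Partial]) -> Optional[float]:
    """Reduce step: merge per-shard partials into one mean (None if there are no points)."""
    total, count = 0.0, 0
    for p in parts:
        total += p.total
        count += p.count
    return total / count if count else None


# Three hypothetical shards; the last one holds no points for the query range.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0], []]
print(reduce_mean(map_shard(s) for s in shards))  # 3.0
```

Carrying (sum, count) instead of per-shard means keeps the result correct when shards hold different numbers of points: averaging the two non-empty shard means here would give 3.25 rather than 3.0.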

For cluster expansion, work is starting on that today. Again it's just a matter of wiring some things up. Node replacement is also starting today.

We may miss the March goal but it won't be by anything close to 3 months. Glad you're paying attention to the project though :)

For the FoundationDB problem, I thought they were never open source. Just free for 5 nodes or fewer, no?

I think the key to avoiding this fate is to build an active community of contributors outside the company. Luckily we have people submitting PRs every week. We'll be trying to document more of the code and make it easier for outsiders to get involved as we go along.

That way if the worst happens, at least the community can fork and keep the project going forward. I'd love nothing more than for Influx to become bigger than this company.

gtrubetskoy · 11y ago

Thanks Paul! So you're saying it's all a SMOP :)

Another thing that I think might be a critical (or at least interesting) characteristic is back-filling optimization, i.e. what happens when you need to load a trillion data points of historical data. This YouTube talk explains it pretty well and covers how OpenTSDB addresses it: https://www.youtube.com/watch?v=SgD3RD2Shg4

Anyhow, keep up the good work. I very much believe that in the next couple of years "Time Series" is going to become a must-include résumé buzzword :)

CSDude · 11y ago

I am using InfluxDB in my research to analyze the resource utilization of running applications, and it has been very useful to me, but I think it was supposed to be production ready this March. There are still some bugs that occur occasionally.

pauldix · 11y ago

We're busy at work on the production ready version. We're targeting March, but we won't release until it's ready (even if that means slipping our target).

Remember, in software development there are lies, damn lies, and delivery estimates.

We'll get it out as quickly as possible, sorry for any delays.

eik3_de · 11y ago

Can you say something about the upgrade path? Will it be possible to do it live?

shanemhansen · 11y ago

I really want to love InfluxDB because I think the world needs a better answer to time series databases, one that doesn't involve Java (OpenTSDB, Cassandra). The underlying storage engine (LevelDB/RocksDB) is quite solid. I'm currently running 3 nodes in production (for collecting stats) and doing a few thousand writes/s. I'm not using any of the clustering features; I probably won't even evaluate them until 0.9.

I'm currently running the latest 0.8.x release and there are a few issues:

1. My InfluxDB instances stop servicing reads once every 12 hours, so I have a cron job that force-restarts them. https://github.com/influxdb/influxdb/issues/1116

2. Enabling the graphite plugin on the first run can crash the process (the creation of the default cluster admin user seems to be racy). Not a big deal except in automated deployment scenarios.

3. I lost an entire database (luckily it was just used for storing grafana graph definitions and not actual data).

4. I'm not sure if anyone's currently working on their admin UI. I submitted a pull request to their admin UI to sort shards by ID because currently it randomizes the order on every load (I presume because of golang's randomized map iteration). It's sat there since January. The last PR they merged into that repo was in May of 2014.
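Go deliberately randomizes the iteration order of maps, so a list built by ranging over a map can come out in a different order on every page load. The fix described in the PR amounts to sorting explicitly by shard ID before display. A minimal sketch of that idea in Python, assuming shards are held in a dictionary keyed by an integer ID; `shards_for_display` and the record fields are hypothetical, not the admin UI's actual code or InfluxDB's shard schema.

```python
from typing import Dict, List


def shards_for_display(shards_by_id: Dict[int, dict]) -> List[dict]:
    """Return shard records in ascending ID order, regardless of the order
    in which the container happens to yield its keys."""
    return [shards_by_id[shard_id] for shard_id in sorted(shards_by_id)]


# Hypothetical shard records keyed by ID, inserted out of order.
shards = {7: {"id": 7, "host": "b"}, 2: {"id": 2, "host": "a"}, 5: {"id": 5, "host": "c"}}
print([s["id"] for s in shards_for_display(shards)])  # [2, 5, 7]
```

Sorting at the point of rendering makes the displayed order a property of the data rather than of the container, so it stays stable even if the backing map's iteration order changes between requests.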

I really want InfluxDB to be successful. Every organization I've worked for in the last few years has had serious Graphite scaling issues, and InfluxDB is well positioned to fix those. I think even in its current state it's a better option than Graphite (and the influxdb-graphite plugin gives you all the Graphite features).

pauldix · 11y ago

Hi Shane, thanks for the encouragement and sorry you're having a few problems with the current 0.8.8 release.

We're heads down working on 0.9.0 and won't be doing any more releases in the 0.8.x line (except to create a migration path to 0.9.0). So we are merging PRs, but only those that apply to 0.9.0 (which includes the admin UI).

infinotize · 11y ago

Another InfluxDB user here. I'd done some evaluations with OpenTSDB and the Graphite suite, and while I had some concerns about stability and maturity, the main things that sold me on it were:

* No dependencies. Compare this with setting up HDFS/HBase and Graphite, which is a real pain in the neck to manage, especially since my TSDB has to run on an arbitrary machine pool in a sandbox.

* Active development. This is a big one. Releases have been coming steadily and Paul & co. do a good job of having a real roadmap and chipping away at it; this is probably my tipping point over Graphite.

* Clustering. Maybe it's not there yet, but see above. Most tools in this space are not elastic at all.

* Grafana integration. There seems to be a good bit of momentum in that project in general, which is promising.

PS: Reading this back, it almost sounds like an ad. No, I'm not affiliated with Influx.

PPS: Logfile configuration for rotation/cleanup would be a nice-to-have enhancement ;)

pauldix · 11y ago

Thanks, we're working hard on getting the clustering features complete so we have a real answer for HA, failover, and scalability (up to a point based on current design).

For logfile rotation our recommended solution is to use logrotate. We'll be updating the install to include a config. See https://github.com/influxdb/influxdb/issues/1943

erichmond · 11y ago

Great product and smart pivot, I hope they do well. Them blowing up would be another win for the NYC tech scene (indirectly).

pauldix · 11y ago

Thanks Eric!

jacques_chester · 11y ago

Denver office? Sounds like someone recruited a few Pivotal Labs alumni :)

pauldix · 11y ago

We have, but sadly not in Denver... yet ;)

bad_user · 11y ago

Any comparisons with KairosDB or OpenTSDB?

iolco51 · 11y ago

KairosDB is less popular and smaller, but it has different limitations and is more flexible. It was inspired by the OpenTSDB design but then took a different path.

OpenTSDB relies on HBase. KairosDB has a configurable, pluggable datastore, but so far the only production-ready one is Cassandra.

OpenTSDB always interpolates values when aggregating (which I found to be a hazardous decision), while KairosDB does not really do proper "vertical" aggregation across series (by vertical I mean aggregation that is not downsampling).
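To make the interpolation concern concrete, here is a minimal sketch of summing two series whose timestamps don't line up. It compares linear interpolation at every timestamp of either series against summing only where both have a real sample. This is an illustration of the general technique under stated assumptions, not OpenTSDB's or KairosDB's actual code; the helper names are hypothetical, and the choice to drop a series outside its own time range (rather than extrapolate) is an assumption of this sketch.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Series = List[Tuple[int, float]]  # (timestamp, value) pairs, sorted by timestamp


def value_at(series: Series, t: int) -> Optional[float]:
    """Linearly interpolate the series at time t; None outside its time range."""
    times = [ts for ts, _ in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]  # a real sample exists at t
    if i == 0 or i == len(times):
        return None  # assumption: no extrapolation before the first or after the last point
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def sum_interpolated(a: Series, b: Series) -> Series:
    """Sum at every timestamp of either series, filling gaps by interpolation."""
    out = []
    for t in sorted({ts for ts, _ in a} | {ts for ts, _ in b}):
        vals = [v for v in (value_at(a, t), value_at(b, t)) if v is not None]
        out.append((t, sum(vals)))
    return out


def sum_exact(a: Series, b: Series) -> Series:
    """Sum only at timestamps where both series have a real sample."""
    b_by_time = dict(b)
    return [(t, v + b_by_time[t]) for t, v in a if t in b_by_time]


a = [(0, 10.0), (20, 30.0)]
b = [(10, 1.0), (20, 2.0)]
print(sum_interpolated(a, b))  # [(0, 10.0), (10, 21.0), (20, 32.0)]
print(sum_exact(a, b))         # [(20, 32.0)]
```

The interpolated result at t=10 includes a value of 20.0 for series `a` that was never written, which is exactly the kind of synthesized data the comment calls hazardous; the exact-match variant avoids that but discards most timestamps.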

OpenTSDB is GPL, while KairosDB is Apache 2.0 (which matters for closed-source integrations).

OpenTSDB supports only numeric data but supports annotations. KairosDB supports strings and numeric values out of the box and is designed to accommodate any data type; it does not have annotations (but you can use strings for that).

Out of the box, OpenTSDB renders graphs on the server, while KairosDB renders them on the client.

Both integrate with the Grafana time series dashboard. OpenTSDB has more side projects, while KairosDB is the only time series database I know of that integrates with a reporting tool (BIRT).

OpenTSDB requires metrics to be created in advance using a special tool (it needs to lock the cluster to allocate a new ID), whereas KairosDB accepts any new metric on the fly.

If you need something modular for building custom features, I strongly recommend KairosDB. Look at the code; it's really nicely crafted.

Otherwise, they both have their strengths. I found KairosDB much less limited in corner cases (though it has fewer side projects), and we now use KairosDB intensively.

lasermike026 · 11y ago

Congrats Paul!

Mike (BMark Admin)

pauldix · 11y ago

Thanks Mike!