I've been meaning to open source our timeseries implementation for a while now. It is very similar to the linked article, but uses a "{key}:YYYY:MM:DD:hh:mm:ss" pattern on hashes, where you pick the stored granularity and TTL for each time unit. For example: store second granularity as "{key}:YYYY:MM:DD:hh:mm": {0-59: count} for 8 hours, minute granularity as "{key}:YYYY:MM:DD:hh": {0-59: count} for 24 hours, hour granularity as "{key}:YYYY:MM:DD": {0-23: count} for 3 days, and the rest forever. Very similar to https://github.com/jimeh/redistat or other implementations.
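A minimal sketch of the write path for that layout, in Python. The retention table, the `record_event_commands` helper and the extra day/month buckets standing in for "the rest forever" are assumptions for illustration, not our actual implementation. It only builds the HINCRBY/EXPIRE commands, so it runs without a Redis server.

```python
from datetime import datetime

# Hypothetical retention table. Each entry: (hash key suffix format, datetime attribute
# used as the hash field, TTL in seconds or None to keep forever). The TTLs follow the
# example above: seconds kept 8 hours, minutes 24 hours, hours 3 days, the rest forever.
GRANULARITIES = [
    ("%Y:%m:%d:%H:%M", "second", 8 * 3600),   # {key}:YYYY:MM:DD:hh:mm -> {0-59: count}
    ("%Y:%m:%d:%H", "minute", 24 * 3600),     # {key}:YYYY:MM:DD:hh    -> {0-59: count}
    ("%Y:%m:%d", "hour", 3 * 24 * 3600),      # {key}:YYYY:MM:DD       -> {0-23: count}
    ("%Y:%m", "day", None),                   # {key}:YYYY:MM          -> {1-31: count}
    ("%Y", "month", None),                    # {key}:YYYY             -> {1-12: count}
]


def record_event_commands(key, ts, amount=1):
    """Return the Redis commands, as tuples, that record `amount` events at time `ts`."""
    commands = []
    for key_format, field_attr, ttl in GRANULARITIES:
        hash_key = f"{key}:{ts.strftime(key_format)}"
        field = str(getattr(ts, field_attr))
        commands.append(("HINCRBY", hash_key, field, amount))
        if ttl is not None:
            # Re-arming the TTL on every write keeps the bucket for `ttl` seconds after its last event.
            commands.append(("EXPIRE", hash_key, ttl))
    return commands


if __name__ == "__main__":
    for command in record_event_commands("tasks", datetime(2014, 3, 7, 9, 5, 42)):
        print(command)
```

With redis-py you should be able to replay each tuple through a pipeline (`pipe.execute_command(*command)` for each, then `pipe.execute()`) so every bucket is updated in one round trip.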
Fun!
More good implementation info here http://blog.apiaxle.com/post/storing-near-realtime-stats-in-....
Btw, is there a Node.js module for voting? I've built it myself for one app, but it would be nice to see other solutions.
Just curious, are you guys not already using statsd/graphite? I think you at least used to, since you contributed to my small script[0] to automate the installation of graphite. So I'm curious whether it wasn't good enough, or whether this has different requirements that graphite wasn't suitable for.
We installed statsd/graphite early on to experiment with visualizing our task and request logs for Zaps. We've since settled on Elasticsearch and Graylog, which are phenomenal for debugging and support, but they have their growing pains.
The timeseries stuff is used more at the application layer, rather than the pure logging layer. For example, I believe we're using it to track how many tasks an account has done over the last 30 days for pricing/plans.
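Roughly how a trailing 30-day count can be read back from daily buckets. The key format `tasks:{account}:{YYYY-MM-DD}` and the helper names are hypothetical; this is a sketch of the idea, not our code.

```python
from datetime import date, timedelta


def daily_keys(account_id, today, days=30):
    """Hypothetical key layout: one counter per account per day, e.g. tasks:42:2014-03-07."""
    return [f"tasks:{account_id}:{(today - timedelta(days=i)).isoformat()}" for i in range(days)]


def tasks_in_last_n_days(store, account_id, today, days=30):
    """Sum the daily counters for the trailing window, treating missing days as zero.

    `store` is any mapping of key -> count. Against Redis, the same keys could be
    fetched with a single MGET, which returns None for keys that don't exist.
    """
    return sum(int(store.get(key) or 0) for key in daily_keys(account_id, today, days))


if __name__ == "__main__":
    counts = {"tasks:42:2014-03-07": "5", "tasks:42:2014-02-06": "3", "tasks:42:2014-02-05": "100"}
    print(tasks_in_last_n_days(counts, 42, date(2014, 3, 7)))  # prints 8: 2014-02-05 is outside the window
```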
Redis will run into conflicts (in this article's example, the locks won't actually "lock" the thing you're locking), and, depending on how persistence is configured, it can lose minutes of committed operations on an unexpected stop.
Does that make Redis useless? Hell no. Can it help scale your app if carefully considered, with regards to its properties? Sure. Does it "scale SQL"? No.
The locks are a minor bullet point in a much larger picture. Redis is never going to generate "conflicts" in a classical sense, but there are race conditions with the specific lock implementation. I definitely didn't suggest they were strong.
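To make the race concrete: a sketch of the single-instance lock pattern described in the Redis SET documentation (SET with NX and PX plus a random token, released only if the token still matches), run against a hypothetical in-memory `FakeRedis` so it needs no server. This is not necessarily the lock the article uses.

```python
import uuid


class FakeRedis:
    """In-memory stand-in for the two operations a simple Redis lock relies on:
    SET key value NX PX ttl, and an atomic delete-if-value-matches. Time is passed
    in explicitly so the race below is reproducible."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key, now):
        item = self._data.get(key)
        if item is not None and item[1] <= now:
            del self._data[key]  # expired, like a key whose PX ran out
            return None
        return item

    def set_nx_px(self, key, value, ttl, now):
        if self._live(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)

    def delete_if_equals(self, key, value, now):
        # Real Redis needs this check-and-delete to be atomic, usually via a small Lua script.
        item = self._live(key, now)
        if item is not None and item[0] == value:
            del self._data[key]
            return True
        return False


def third_client_gets_lock(release_checks_token):
    r = FakeRedis()
    token_a, token_b = str(uuid.uuid4()), str(uuid.uuid4())
    r.set_nx_px("lock:job", token_a, ttl=100, now=0)    # A acquires at t=0
    r.set_nx_px("lock:job", token_b, ttl=100, now=150)  # A stalls past its TTL; B acquires at t=150
    if release_checks_token:
        r.delete_if_equals("lock:job", token_a, now=160)  # no-op: the lock now belongs to B
    else:
        r.delete("lock:job")                              # A blindly deletes B's lock
    # Can C acquire at t=170 while B still believes it holds the lock?
    return r.set_nx_px("lock:job", "token-c", ttl=100, now=170)


if __name__ == "__main__":
    print("blind DEL:", third_client_gets_lock(False))
    print("token check:", third_client_gets_lock(True))
```

The token check only stops A from releasing B's lock. From t=150 until A notices, A and B both believe they hold the lock, which is why this pattern is not a strong lock.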
I disagree with this part. We may not have great options for it now, but we're largely stuck with the requirement of a hard lock for data consistency; someday someone will figure out how to mitigate the effect.
We manage this quite well at GigaSpaces (http://www.gigaspaces.com). I have some examples up at http://gigaspacesinanger.wordpress.com that show some use cases.
I think the chaps over at HyperDex.org may strongly disagree with you.
A "scaling SQL" article that suggests adding Redis is like a "make more beer" article that suggests adding water.
There are performant algorithms for durable operations (as seen in frameworks like LMAX's Disruptor) which are simply not explored by Redis. The Disruptor is not canonical ACID, but it is durable.
They stumbled upon scalable durability because they had no other choice. As a trading platform, they were required to be durable by law, and required to scale by their clients.
A blanket all-or-nothing statement like "it will never scale" stops you before you even try to research the space of possible solutions.
Never noticed anything like this and I've been using Redis for 3+ years.
Because Redis executes commands on a single thread, if one part of your application needs fast, consistent GETs while another application runs a slower SORT, SUNION, SDIFF, etc. on the same db, or even on other dbs on the same Redis server, EVERY other client request has to wait for that slower command to finish. http://redis.io/topics/latency
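A toy simulation of that head-of-line blocking: commands are served one at a time in arrival order, as on a single Redis thread. The timings are made-up numbers chosen for illustration, not measurements, and `completion_latencies` is a hypothetical helper.

```python
def completion_latencies(commands):
    """Simulate a single-threaded server that runs commands one at a time in arrival order.

    commands: list of (arrival_us, service_us, name) tuples (hypothetical timings).
    Returns (arrival_us, name, latency_us) for each command, where latency is the
    time spent waiting behind earlier commands plus the command's own run time.
    """
    clock = 0  # time at which the server becomes free
    results = []
    for arrival, service, name in sorted(commands):
        start = max(clock, arrival)  # must wait for whatever is currently running
        clock = start + service
        results.append((arrival, name, clock - arrival))
    return results


if __name__ == "__main__":
    # Fast GETs (100 us each) arrive every millisecond; one slow SORT (2000 us) lands at 1500 us.
    workload = [(t, 100, "GET") for t in range(0, 5000, 1000)] + [(1500, 2000, "SORT")]
    for arrival, name, latency in completion_latencies(workload):
        print(f"{name:<4} arrived at {arrival:>4} us, latency {latency:>4} us")
```

With these hypothetical numbers, the GET arriving just after the SORT waits 1600 us instead of 100 us, even though it touches none of the SORT's data.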
This is something you really have to engineer around to use Redis in an environment that requires high performance and consistent latency. At our load of thousands of requests per second, it was simply unacceptable for latency to be affected, sometimes by a factor of 10, by one slower command.
I do love all the sort, diff, union commands.
If the datasets aren't disjoint, then you're trying to do fast and slow ops on the same data, which, if you need accurate values, is going to be mildly hairy even with multithreading. You'll either need to lock the data while you do the slow op (which excludes the GETs, causing high latency), or have some kind of transaction-based stable view to operate on (e.g. transactional memory?).
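One way to get that "stable view" without excluding the GETs is copy-on-write snapshots: writers publish a new immutable version, and the slow op iterates whichever version it grabbed. This is a swapped-in illustration in plain Python (`SnapshotStore` and `slow_union` are hypothetical names), not something Redis does. It relies on reference assignment being atomic in CPython, and its cost is copying the whole map on every write.

```python
import threading
from types import MappingProxyType


class SnapshotStore:
    """Copy-on-write store: writers publish a new read-only dict; readers never wait on slow ops."""

    def __init__(self):
        self._write_lock = threading.Lock()  # serializes writers only, never readers
        self._current = MappingProxyType({})

    def get(self, key, default=None):
        return self._current.get(key, default)  # fast read of the latest version

    def set(self, key, value):
        with self._write_lock:
            new = dict(self._current)  # copy the current version...
            new[key] = value
            self._current = MappingProxyType(new)  # ...then swap the reference in one step

    def snapshot(self):
        # A stable, read-only version a slow operation can iterate without blocking GETs.
        return self._current


def slow_union(view):
    # Stand-in for a slow SUNION-style op: it only ever sees one consistent version.
    result = set()
    for members in view.values():
        result |= members
    return result
```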
For sure having multiple instances will help with some of this, but it adds more complexity. Do you have your app write to multiple instances, then serve the low-latency reads from one and the slow reads from another? Is that data now consistent? Do you set up Redis replication, make sure it works right, and then read from different replicas? Or perhaps you engineer some queue that doesn't block writes, groups them together and writes them to Redis in a separate thread. Then you have to maintain all of this, make sure it's correct, back it up, and work out the corner cases and failure modes.
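For the queue option, a minimal write-behind sketch: callers enqueue without waiting on Redis, and one background thread drains the queue in batches. `WriteBehindBuffer` and `apply_batch` are hypothetical names; in practice `apply_batch` might send one pipeline per batch. It also shows one of the failure modes mentioned above: anything still queued is lost if the process dies before `close()` flushes it.

```python
import queue
import threading


class WriteBehindBuffer:
    """Hypothetical write-behind queue: callers enqueue without blocking on the backend;
    a single background thread drains the queue and applies writes in batches."""

    def __init__(self, apply_batch, max_batch=100):
        self._apply_batch = apply_batch  # e.g. a function that sends one Redis pipeline
        self._max_batch = max_batch
        self._queue = queue.Queue()
        self._stop = object()  # sentinel telling the worker to flush and exit
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write(self, key, value):
        self._queue.put((key, value))

    def close(self):
        """Flush everything still queued, then stop the worker."""
        self._queue.put(self._stop)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()  # block until at least one write is waiting
            batch, stopping = [], False
            while True:
                if item is self._stop:
                    stopping = True
                    break
                batch.append(item)
                if len(batch) >= self._max_batch:
                    break
                try:
                    item = self._queue.get_nowait()  # grab whatever else is already queued
                except queue.Empty:
                    break
            if batch:
                self._apply_batch(batch)
            if stopping:
                return
```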
In my experience, if you want to engineer things well, you end up essentially building the same subsystems that a larger db engine such as InnoDB already has. I'm smart enough to know that I'm not smart enough to build a one-off complex system more correctly than really smart people who have spent many years iterating on and improving something like InnoDB.
There are very rare, very specific cases where I would use Redis over something else if I were building something realtime, large and important.
Also, you would think rate limiting would be handled at the load-balancing layer with Nginx, Apache, Layer7, etc., well before requests get close to your app.
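For comparison, the counter behind a basic rate limiter is small wherever it lives. Below is an in-memory sketch of a fixed-window limiter with the same shape as the INCR + EXPIRE pattern described in the Redis INCR documentation; `FixedWindowLimiter` and its parameters are hypothetical.

```python
import time


class FixedWindowLimiter:
    """In-memory sketch of a fixed-window counter: one count per (client, window).
    In Redis this would be INCR on a per-window key plus an EXPIRE to clean it up."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._counts = {}  # (client, window_index) -> count

    def allow(self, client, now=None):
        now = time.time() if now is None else now
        window_index = int(now // self.window)
        # Drop counters from older windows (Redis would let EXPIRE do this).
        for stale in [k for k in self._counts if k[1] < window_index]:
            del self._counts[stale]
        key = (client, window_index)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.limit
```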
Not criticising Sentry for doing things a bit differently. Redis is a fantastic technology.
https://github.com/seivan/redis-friendships