Also, the name threw me off. I thought it was "Citrus" and not "Citus".
[Full disclosure: I am writing an open source, distributed, behavioral database - https://github.com/skylandlabs/sky]
That can be a bit heavyweight if you want to simply compare people who did action "A" or "B", filter based on complex criteria, or apply simple analytic functions. Also, apart from standard relational algebra operators, SQL provides a lot of convenience functions for math operations, string manipulations, date and time formatting, pattern matching, and so forth. These may come in handy to users who want to quickly gather insights out of their data.
Beyond that, you have to worry about whether A, B and C are all within the same session. Trying to define a session such as "all events that occurred until there is 30 minutes of idle time" is going to be damn near impossible in a SQL query.
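To make the session rule concrete, here is a minimal Python sketch of it outside SQL. It is only an illustration, not how Sky or Citus implements anything: the function names `sessionize` and `contains_in_order` are made up, events are assumed to be `(timestamp, action)` pairs for a single user, and a gap of 30 minutes or more is assumed to start a new session.

```python
from datetime import datetime, timedelta

def sessionize(events, idle=timedelta(minutes=30)):
    """Split one user's (timestamp, action) events into sessions.

    Assumption: a gap of `idle` or longer between consecutive events
    closes the current session and opens a new one.
    """
    sessions = []
    for event in sorted(events, key=lambda e: e[0]):
        if sessions and event[0] - sessions[-1][-1][0] < idle:
            sessions[-1].append(event)   # still within the idle window
        else:
            sessions.append([event])     # idle gap reached: new session
    return sessions

def contains_in_order(session, actions):
    """True if `actions` occur in this order (not necessarily adjacent)."""
    remaining = iter(action for _, action in session)
    # `a in remaining` consumes the iterator up to the first match,
    # so each later action must appear after the previous one.
    return all(a in remaining for a in actions)

if __name__ == "__main__":
    t0 = datetime(2013, 1, 1, 12, 0)
    events = [
        (t0, "A"),
        (t0 + timedelta(minutes=10), "B"),
        (t0 + timedelta(minutes=50), "C"),  # 40 minutes idle: new session
    ]
    for s in sessionize(events):
        print([a for _, a in s], contains_in_order(s, ["A", "B", "C"]))
```

Here A and B land in one session and C in another, so "A then B then C within the same session" is false even though all three events exist for the user.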
A few points:
(i) We can do ad-hoc realtime analytics on hundreds of millions of data points.
(ii) We can also do realtime analytics on billions of data points as long as we pre-compute along one dimension.
(iii) We could do a lot better at (i) and (ii) if we invested more heavily in hardware (and Citus would make this pretty painless, actually).
(iv) I'd normally not consider a closed-source solution personally, but since Citus is based so heavily on PostgreSQL (protocol-level compatibility, configuration, codebase), this has been a non-issue for us. We can still lean on the amazing PostgreSQL community and documentation, and for the parts we don't have the source code for, the Citus team has been very helpful in explaining how things work.
(v) Fault tolerance is immaculate. At the node level, PostgreSQL is famously one of the most reliable and robust databases available. At the cluster level, Citus will magically fall back to a replica mid-query when a server dies.
(vi) Although realtime inserts are not supported out of the box, the system is flexible enough that we were able to get this working on our own without help from Citus.
(vii) Schema migrations are also not supported out of the box, but we built a schema migration framework that takes care of this for us.
(viii) We're not worried about vendor lock-in, since the data is just stored on our servers, in the PostgreSQL serialization format. If we wanted to, we could just give up the features that Citus gives us and build our own data-access layer on top of our cluster.
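As a rough illustration of what "pre-compute along one dimension" in (ii) can mean, here is a small Python sketch that rolls raw rows up by day so later queries read the small rollup instead of the raw data. This is a generic example, not our actual pipeline: the row shape `(day, ad_id, count)` and the function names are assumptions made up for the sketch.

```python
from collections import defaultdict
from datetime import date

def build_daily_rollup(impressions):
    """Pre-aggregate raw (day, ad_id, count) rows into per-(day, ad_id) totals."""
    rollup = defaultdict(int)
    for day, ad_id, count in impressions:
        rollup[(day, ad_id)] += count
    return dict(rollup)

def impressions_for_ad(rollup, ad_id, start, end):
    """Answer a date-range query from the rollup without touching raw rows."""
    return sum(
        total
        for (day, rolled_ad), total in rollup.items()
        if rolled_ad == ad_id and start <= day <= end
    )

if __name__ == "__main__":
    raw = [
        (date(2013, 1, 1), "ad-1", 3),
        (date(2013, 1, 1), "ad-1", 2),
        (date(2013, 1, 2), "ad-1", 4),
        (date(2013, 1, 2), "ad-2", 7),
    ]
    rollup = build_daily_rollup(raw)
    print(impressions_for_ad(rollup, "ad-1", date(2013, 1, 1), date(2013, 1, 2)))
```

The trade-off is the usual one: queries along the pre-computed dimension get cheap, but anything that needs detail finer than a day has to go back to the raw rows.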
Anyway, it won't be everything to everyone, but it works very well for our OLAP use-case of timeseries ad impression data. I'd definitely recommend looking into it if you're otherwise considering Hadoop, Vertica, Aster, Greenplum, or a sharded MySQL/PostgreSQL setup.
Full disclosure: I am extremely biased since I've gotten to know the team very well after using Citus. I'm definitely one of their biggest fans, if for no other reason than the amount of time they've saved us at MixRank.
For the distributed query processor, we can efficiently parallelize SQL queries that involve look-ups, complex selections, groupings and orderings, analytic functions, and joins between one large and multiple small tables. We also have a lot more coming; are there any queries that you are particularly interested in?
Is this a bulk/batch load only system then?