SamReidHughes 11y ago

You would get the same behavior as Postgres in terms of how data is traversed and aggregated -- that is, not by building a bunch of groups and counting them after the fact. I do think RethinkDB ought to be able to apply aggregations to group queries on the fly, though... I'm not really up to date on that.
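To make the distinction concrete, here is a minimal Python sketch of the two strategies. It is not RethinkDB or Postgres code; the function names and sample rows are made up for illustration. The first version materializes every group and counts afterwards, the second keeps one running counter per group while scanning.

```python
from collections import defaultdict

def count_after_grouping(rows, key):
    # Hypothetical "build groups, then count": every row is held in memory.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return {k: len(v) for k, v in groups.items()}

def count_streaming(rows, key):
    # Hypothetical on-the-fly aggregation: only one integer per group is kept.
    counts = defaultdict(int)
    for row in rows:
        counts[key(row)] += 1
    return dict(counts)

# Made-up sample data.
rows = [{"city": "Oslo"}, {"city": "Bergen"}, {"city": "Oslo"}]
by_city = lambda row: row["city"]

# Both strategies agree on the result; they differ in memory use.
assert count_after_grouping(rows, by_city) == count_streaming(rows, by_city)
```

The streaming version uses memory proportional to the number of groups rather than the number of rows.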

Postgres will still have better numbers, I'm sure. It has a schema for starters.

lobster_johnson 11y ago

While I don't know how RethinkDB is structured internally, I don't see any technical reason why a non-mapreduce group-by needs to load the entire table into memory instead of streaming it, or why a mapreduce group-by needs to be slow. M/R only becomes a slow algorithm once you involve shards and network traffic; any classical relational aggregation plan uses a kind of M/R anyway.
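As a rough illustration of the "kind of M/R" shape, here is a small Python sketch: each chunk of rows is mapped to partial per-group counts, and the partials are then reduced by merging. The names `map_chunk`, `reduce_partials`, and `grouped_count` are hypothetical, and this is not how either database is actually implemented.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk, key):
    # Map step: one partial count per group for a single chunk (e.g. a page).
    return Counter(key(row) for row in chunk)

def reduce_partials(left, right):
    # Reduce step: merge two partial results by summing per-group counts.
    return left + right

def grouped_count(chunks, key):
    # Chunks are consumed lazily, so only the running totals and the
    # current chunk's partial counts are held at once.
    partials = (map_chunk(chunk, key) for chunk in chunks)
    return reduce(reduce_partials, partials, Counter())

# Made-up data split into two chunks.
chunks = [[{"k": "a"}, {"k": "b"}], [{"k": "a"}]]
totals = grouped_count(chunks, lambda row: row["k"])
```

In a single process the merge is cheap; the cost mentioned above only appears when the partials have to cross shards or the network.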

Postgres has a schema, of course, but it still needs to look up the column map (the ItemIdData) in each page as it scans it, the main difference being that this map is of fixed length, whereas in a schemaless page it would be variable-length.

Anyway, I'm hoping RethinkDB will get better at this. I sure like a lot about it.

SamReidHughes (OP) 11y ago

Generally speaking, RethinkDB doesn't do query optimization, except in deterministic ways, unless they've changed policy on this. I don't see any reason why a plain group/aggregate query couldn't be evaluated appropriately. I know it is when the grouping is done using an index; maybe it is now when the grouping is done otherwise (I don't know, but it would be sensible; I'm out of date).

SamReidHughes (OP) 11y ago

(Also, it would be nice if it did/does, because otherwise performance will still be terrible if you have too many groups.)