
TallGuyShort · 13y ago

If you run enough queries, it's worth taking the time to convert the data to RCFile for Hive, in which case Redshift won't come out _that_ much faster. The point is that the 17 hours is not negligible.



dromidas · 13y ago

That is a good case, since customers who typically need a data warehouse aren't just going to upload data once... they'll probably upload it frequently.

TallGuyShort (OP) · 13y ago

You're missing my point and resorting to sarcasm. Very nice </sarcasm>. My point is not that Hive is the better choice because everyone is going to reload their data frequently. My point is that if you want a fair benchmark, you shouldn't use an obviously slow data format for Hive. They spent time importing the data in a format optimized for Redshift, but took a very naive approach for Hive. I'm sure Redshift will still be faster, but not 10 times faster.
