hota_mazi 9y ago

> I don't know. 10MB still sounds really too big.

Can't tell if sarcastic or... o_O.

In the unlikely case you're actually serious, you really need to rethink your perception of memory costs in 2017.

geocar 9y ago

No, I'm really quite serious.

KDB[1] is about 1000x faster than Spark[2], and is only about 600 KB (and most of that is shared library dynamic linker stuff that makes interfacing with the rest of the OS easier). A big part of why it's fast is that it's small -- once you're inside cache memory everything gets faster.

That's the real cost of memory in 2017. So what did we gain for paying it?

[1]: https://news.ycombinator.com/item?id=13481824

[2]: http://tech.marksblogg.com/billion-nyc-taxi-rides-spark-2-1-...

ipsi 9y ago

You're comparing completely, utterly different results here, and it's really hurting any point you're trying to make.

You're comparing KDB running on 4x Intel Xeon Phi 7210 CPUs, totaling 256 physical cores.

The best result for Java/Spark, by contrast, was running on 11x m3.xlarge instances on AWS. That's only 44 vCPUs, and because it's AWS rather than 100% dedicated hardware, it's tough to tell what sort of impact the virtualization + EBS has on performance. Plus, from the AWS page: "Each vCPU is a hyperthread of an Intel Xeon core except for T2 and m3.medium", which does not do anything good for the results.

Yes, technically, KDB was 199.80x faster (not 1000!) than Java/Spark, when it was given vastly superior, dedicated hardware without virtualization, and when tackling a problem that the hardware setup is optimized for. Note that the author calls this out by saying "This isn't dissimilar to using graphics cards" when talking about the setup he was using for the KDB benchmarks.

To get a sensible idea of the relative difference in performance, you would have to compare KDB and Java/Spark both running on the Xeon Phis, and/or running both on 11x m3.xlarge AWS instances - and even then, if Java/Spark does poorly on the Xeon Phi test, that might just mean that the Java/Spark developers haven't optimized for that particular setup.
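
As a rough illustration of how much of the gap the core counts alone could account for, here is a back-of-the-envelope sketch. It uses only the figures quoted above (256 cores, 44 vCPUs, 199.80x) and assumes perfectly linear scaling with core count, which neither system is known to achieve; the function name `per_core_speedup` is made up for this example. It also treats a Xeon Phi core and an AWS hyperthread as equivalent, which they are not, so read the result as a crude sanity check and not as a benchmark.

```python
# Crude normalization of the quoted speedup by core count.
# Assumption: performance scales linearly with cores (almost certainly not true in practice).
KDB_CORES = 4 * 64      # 4x Xeon Phi 7210, 256 physical cores as quoted above
SPARK_VCPUS = 11 * 4    # 11x m3.xlarge, 44 vCPUs as quoted above
RAW_SPEEDUP = 199.80    # raw KDB-vs-Spark speedup quoted above


def per_core_speedup(raw_speedup: float, fast_cores: int, slow_cores: int) -> float:
    """Divide the raw speedup by the core-count ratio between the two setups."""
    return raw_speedup / (fast_cores / slow_cores)


if __name__ == "__main__":
    print(f"core ratio: {KDB_CORES / SPARK_VCPUS:.2f}x")
    print(f"per-core speedup: {per_core_speedup(RAW_SPEEDUP, KDB_CORES, SPARK_VCPUS):.1f}x")
```

Under those assumptions the 199.80x figure shrinks to roughly 34x per core, before accounting for virtualization, storage, or hyperthreading at all.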

srpeck 9y ago

Have a look at the benchmarks here: http://kparc.com/q4/readme.txt

Also: https://hn.algolia.com/?query=http:%2F%2Fkparc.com%2Fq4%2Fre...

geocar 9y ago

> You're comparing completely, utterly different results here, and it's really hurting any point you're trying to make.

Then argue with the point you think I could be making instead of the point that you think I'm making.[1]

[1]: http://philosophy.lander.edu/oriental/charity.html

> you would have to compare KDB and Java/Spark both running on the Xeon Phis, and/or running both on 11x m3.xlarge AWS instances - and even then, if Java/Spark does poorly on the Xeon Phi test...

If Spark can solve the business problem in less real time in another way, I think that would be worth talking about, but it's my understanding that a bunch of mid/large machines connected to shared storage is the typical Spark deployment, and the hardware costs are similar to the Phi solution.

So my larger question still stands: What is the value in this approach, if it's not faster or cheaper?
