
maartenbreddels · 7y ago

You are right, I'm actually underselling it. One second is the typical time for a 2D histogram (or other binned statistics), since that also involves writing to memory.
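To illustrate why a binned statistic costs more than a plain reduction, here is a minimal NumPy sketch of a 2D count histogram on a regular grid. It is not vaex's implementation; the helper name `histogram2d_counts` and the half-open `[lo, hi)` bin convention are assumptions made for illustration. A mean only accumulates into a single number, whereas here every row performs a scattered increment into the output grid.

```python
import numpy as np


def histogram2d_counts(x, y, xlim, ylim, shape):
    """Count points per cell of a regular 2D grid (hypothetical helper).

    Bins are half-open intervals [lo, hi); points outside the limits are dropped.
    """
    nx, ny = shape
    # Map each coordinate to an integer bin index along its axis.
    ix = np.floor((x - xlim[0]) / (xlim[1] - xlim[0]) * nx).astype(np.int64)
    iy = np.floor((y - ylim[0]) / (ylim[1] - ylim[0]) * ny).astype(np.int64)
    # Keep only points that land inside the grid.
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    # Flatten (ix, iy) into one index into an nx * ny array.
    flat = ix[inside] * ny + iy[inside]
    # bincount does the per-row scattered writes into the output grid.
    return np.bincount(flat, minlength=nx * ny).reshape(nx, ny)


x = np.array([0.1, 0.6, 0.6, 0.9])
y = np.array([0.1, 0.1, 0.7, 0.9])
counts = histogram2d_counts(x, y, (0.0, 1.0), (0.0, 1.0), (2, 2))
print(counts.tolist())  # [[1, 0], [1, 2]]
```

For a real dataset the grid is much larger than in this toy example, so the increments are spread across many memory locations instead of being kept in a single register-sized accumulator.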

I just ran a quick benchmark:

  In [7]: %timeit -r3 -n3 df.mean(df.ra)
  330 ms +- 5.46 ms per loop (mean +- std. dev. of 3 runs, 3 loops each)

  In [11]: f'{len(df):,}'
  Out[11]: '1,692,919,135'

  In [12]: 330/len(df)*1e9
  Out[12]: 194.92957057278463
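For readers outside IPython, `%timeit -r3 -n3` means 3 runs of 3 loops each, reporting the mean and standard deviation of the per-loop time. Below is a rough plain-Python approximation using the standard `timeit` module. It assumes a NumPy array as a stand-in for the vaex column `df.ra`, and `time_per_loop` is a hypothetical helper, not an IPython API.

```python
import statistics
import timeit

import numpy as np


def time_per_loop(func, repeat=3, number=3):
    """Approximate IPython's `%timeit -r{repeat} -n{number}` summary.

    Each of the `repeat` runs calls `func` `number` times; a run's per-loop
    time is its total divided by `number`. Returns (mean, std) in seconds,
    using the population standard deviation across runs.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Stand-in for the vaex column: 10 million float64 values already in RAM.
col = np.random.default_rng(0).random(10_000_000)
mean_s, std_s = time_per_loop(col.mean)
print(f"{mean_s * 1e3:.1f} ms +- {std_s * 1e3:.2f} ms per loop")
# Same normalisation as In [12]: time per row, in picoseconds.
print(f"{mean_s / len(col) * 1e12:.0f} ps per row")
```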

So that is 330 ms for 1.7 billion rows, or roughly 0.2 s per billion rows (about 195 ps per row). Each row is an 8-byte float64, so the scan reads len(df)*8/1024**3 ≈ 12.6 GiB in 0.33 s, which is about 38 GiB/s. (This is a high-end machine; on my laptop I get ~12 GB/s.)
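The throughput arithmetic can be wrapped in a small helper. This is a sketch: `throughput` is a made-up name, it assumes every row contributes one 8-byte float64, and it takes the time directly from the 330 ms `%timeit` result and the 1,692,919,135 rows reported above.

```python
def throughput(n_rows, seconds, bytes_per_row=8):
    """Return (picoseconds per row, GiB per second) for a full column scan.

    Assumes every row contributes `bytes_per_row` bytes (8 for float64).
    """
    ps_per_row = seconds / n_rows * 1e12
    gib_read = n_rows * bytes_per_row / 1024**3
    return ps_per_row, gib_read / seconds


ps, gib_per_s = throughput(1_692_919_135, 0.330)
print(f"{ps:.0f} ps/row, {gib_per_s:.1f} GiB/s")  # 195 ps/row, 38.2 GiB/s
```

Passing `bytes_per_row=4` gives the corresponding figure for a float32 column, which only has to move half as many bytes.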

We don't use float32 much in science, since you really need to know what you are doing to avoid precision problems with it. It does give some extra performance (not much, though), and it halves the memory and cache footprint.
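A short NumPy sketch of both sides of the float32 trade-off: half the bytes, but visibly worse rounding in a naive running sum and a hard limit on exactly representable integers. The array size and values are arbitrary choices for illustration; `np.cumsum` is used because it accumulates sequentially in the input dtype, unlike `np.sum`, which uses pairwise summation and hides much of the error.

```python
import numpy as np

n = 10_000_000
x64 = np.full(n, 0.1, dtype=np.float64)
x32 = x64.astype(np.float32)

# float32 halves the memory (and cache) footprint of a column.
print(x64.nbytes, x32.nbytes)  # 80000000 40000000

# Sequential running sums; the exact answer is 1,000,000.
run64 = np.cumsum(x64)[-1]  # stays very close to 1e6
run32 = np.cumsum(x32)[-1]  # drifts far away: each step rounds to float32
print(run64, run32)

# Above 2**24, float32 cannot represent every integer, so adding 1 is lost.
big = np.float32(2**24)
print(big + np.float32(1) == big)  # True
```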



aw3c2 · 7y ago

Is this cold data? Or already in RAM? What about a billion rows that are not in RAM yet?

How does it compare to plain numpy or pandas?
