While we are trying to minimize the disruptions, there is one thing project maintainers should do right now: pin the maximum NumPy version to <2.0 in their `pyproject.toml` project dependencies. This will ensure they do not inadvertently upgrade before they are ready to do so. Once NumPy 2.0 is released, you can check that your code works with it, and then release the pin.
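For projects using PEP 621 metadata, the pin looks like this (the lower bound here is illustrative, not part of the advice):

```toml
[project]
dependencies = [
    # Cap NumPy below 2.0 until the project is verified against it
    "numpy>=1.22,<2.0",
]
```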
Definitely good advice on pinning the upper range for major versions. Pandas 2.0 inadvertently broke Spark's toPandas for (at the time) all existing Spark versions, which some of our users lean on. The pandas response was apparently to suggest upgrading Spark to a version that had not yet been released. We lost about half a day's work from a small handful of devs trying to identify why some of our users were seeing sudden failures.
I think that ndarray is the most successful abstraction I've come across. Numerical computing is a domain ripe for terrible code, but multi-indexing, broadcasting, masked arrays, .reshape(), .where(), linspace(), and all that are so well made and useful that they are now the standard grammar of data science. Yes, you've seen horrible numpy code before, but how much worse would it be if it had been written by the same person in raw C with only malloc and pointer arithmetic?
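A few lines are enough to sketch that "standard grammar" (the values here are arbitrary, just to show the shapes):

```python
import numpy as np

# linspace: 6 evenly spaced samples on [0, 1]
x = np.linspace(0.0, 1.0, 6)

# reshape: view the same buffer as a 2x3 grid
grid = x.reshape(2, 3)

# broadcasting: subtract a per-column mean with no explicit loop
centered = grid - grid.mean(axis=0)

# where: elementwise select, again loop-free
clipped = np.where(centered < 0, 0.0, centered)
print(clipped.shape)  # (2, 3)
```

Each of these would be a hand-rolled loop (and an opportunity for an off-by-one bug) in raw C.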
Just FYI, numpy by no means pioneered this concept. There was Fortran before, as a language built around n-dimensional arrays. And even Fortran was not the first one, as some array-oriented programming languages (such as APL, IIRC) have similar concepts. Languages such as R and eventually Matlab (as a kind-of nicer frontend to Fortran libs) also popularized this concept. However, Numpy was the first library to bring this into a general-purpose language like Python.
Fortran, Matlab and all array-based languages (APL, J, K, Nial, etc.) have these constructs at their core. Ndarray may be a great implementation, but the ideas have all been there. I've been exploring J recently and the flexibility and compactness is tremendous.
Numeric Python, the original numpy, was based mainly on the MATLAB array object. Of course, it was also influenced by FORTRAN and other languages, but ultimately, it was taken from the mental model of MATLAB.
I remember implementing a sliding, overlapping 2D windowing by playing with the strides. I don't know how common that trick is, but at the time it felt like magic.
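That trick now has a safe wrapper in NumPy itself: `numpy.lib.stride_tricks.sliding_window_view` (added in 1.20). A small sketch of overlapping 2D windows, using an arbitrary 4x4 array:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(16).reshape(4, 4)

# Every overlapping 3x3 window, with no data copied:
# the result is a strided view into the original buffer.
windows = sliding_window_view(a, (3, 3))
print(windows.shape)  # (2, 2, 3, 3)
```

The sharper tool underneath is `numpy.lib.stride_tricks.as_strided`, which reinterprets the same memory with hand-built strides (and happily reads out of bounds if you get them wrong), which is presumably what the manual version of this trick used.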
numpy is a great library, but I find the pytorch take on it superior. It allows for much more method chaining, which I find to be very readable in longer computations. I frequently want to reach for it in numpy, only to see that the operation exists only as an np.* function. In functional languages I would use an infix pipeline combinator like (|>), but that's not possible in python.
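The asymmetry the comment describes can be seen in numpy alone (the computation here is arbitrary, chosen only to contrast the two styles):

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 5)

# Chaining works where NumPy exposes ndarray methods...
y = x.clip(0).reshape(5, 1).sum()

# ...but many operations exist only as np.* functions
# (there is no x.sqrt() or x.abs() method), so the chain
# breaks and the expression must be read inside-out:
z = np.sqrt(np.abs(x)).sum()
```

In PyTorch, `x.abs().sqrt().sum()` would keep the left-to-right reading order throughout.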
I’m working on adding missing data support for strings as part of adding a UTF-8 variable-width string type to NumPy. Not a general solution but should help with a lot of use-cases. https://numpy.org/neps/nep-0055-string_dtype.html
np.nan? Not trying to be funny, but hoping to learn whether I'm missing something about limitations of np.nan which would be solved by some other kind of missing value indicator.
np.nan is only for floats; it doesn't help with integer, boolean, string, etc. Also, datetimes have NaT, but it's troublesome to, e.g., call different checks (np.isnan() or np.isnat()) depending on the data type. And we don't even have np.nat, but need np.datetime64("NaT"), so it's just confusing.
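A small demonstration of that asymmetry (dates are arbitrary):

```python
import numpy as np

f = np.array([1.0, np.nan])
print(np.isnan(f))   # [False  True]

# Integer arrays simply cannot hold a missing value:
# np.array([1, np.nan], dtype=np.int64) raises ValueError.

# Datetimes need a different sentinel and a different check:
d = np.array(["2024-01-01", "NaT"], dtype="datetime64[D]")
print(np.isnat(d))   # [False  True]

# And np.isnan rejects datetime arrays with a TypeError,
# so generic "is this missing?" code needs dtype dispatch.
```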