> all(map(func3, filter(func2, map(func1, zip(a, b)))))
> a.zip(b).map(func1).filter(func2).forall(func3)
The original is indeed terrible and the second version is a bit better. A lot better than either one, though, is splitting your logic into multiple lines and assigning a descriptive identifier to each step. Maybe even throw in some inline comments if you're particularly respectful of others' time.
As tempting as it is to do something super clever and cram a ton of functionality into a small number of lines or characters (it does feel good), it's just better to be a bit more verbose and write simple, obvious code. I feel like code should be read like a book, not a puzzle.
Though to be fair, having explicit intermediate variables is idiomatic in Python, from what I've seen. It's one of my biggest pet-peeves about the language, but it's not without precedent.
It's not being reused and one of the following is true... I don't want to leave behind intermediary objects for whatever reason is relevant, or I feel it's worth it to compress the logic to make it possible to use a language feature that requires an expression, like lambdas or list/dict comprehensions.
Let's make this a somewhat concrete example.
---
heights = [1,2,3]
widths = [4,5,6]
# printing area greater than 10
# functional
heights.zip(widths).map(to_area).filter(lambda area: area > 10).forall(lambda a: print("Area " + str(a)))
# verbose way
hw_zipped = heights.zip(widths)
areas = hw_zipped.map(to_area)
big_areas = areas.filter(lambda a: a > 10)
for a in big_areas: print("Area " + str(a))
---
Which do you prefer? I would argue the right level of abstraction is the functional way in this example, and it's often the case in my experience, especially in Python, where you don't often use a namespace to store these intermediary variables and you can't rely on typing.
result = [area for x,y in zip(heights,widths) if (area := to_area(x,y)) > 10]
I don't think that's very easy to read; I'd opt for two list comps like
areas = [to_area(x,y) for x,y in zip(heights,widths)]
result = [area for area in areas if area > 10]
But I agree with OP that map+filter is easier to read.
for x, y in zip(a,b):
    area = to_area(x, y)
    if area > 10:
        print(f"Area {area}")
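For anyone who wants to run the variants from this subthread side by side, here is a minimal sketch. `to_area` is never defined in the thread, so the height-times-width version below is an assumption, as is using `itertools.starmap` to stand in for the fluent `.map(to_area)` call.

```python
from itertools import starmap


def to_area(height, width):
    # Assumed definition; the thread never shows to_area.
    return height * width


heights = [1, 2, 3]
widths = [4, 5, 6]

# Single comprehension with the walrus operator (Python 3.8+).
walrus = [area for h, w in zip(heights, widths) if (area := to_area(h, w)) > 10]

# Two comprehensions with a named intermediate.
areas = [to_area(h, w) for h, w in zip(heights, widths)]
two_step = [area for area in areas if area > 10]

# Built-in map/filter style; starmap unpacks each (height, width) pair.
functional = list(
    filter(lambda area: area > 10, starmap(to_area, zip(heights, widths)))
)

# Plain loop.
looped = []
for h, w in zip(heights, widths):
    area = to_area(h, w)
    if area > 10:
        looped.append(area)

for area in looped:
    print(f"Area {area}")
```

All four produce the same list, so the disagreement here really is about readability rather than behavior.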
> in python where you don't often use a namespace to store these intermediary variables
Hm? Most python code is within a function, in my experience.
arr.map(func)
vs. list(map(func, arr))
Working with data scientists, in practice, these identifiers are usually "arr1", "arr2", &c. I'd rather have method chaining. Often the intermediates are not meaningful.
It's probably the core skill of good programmers though, so it should be taught more. I don't think anyone sets out to use misleading names, but it's easy for name and code to diverge, and it's crippling to readability.
However, often when refactoring/updating such data scientist code (or even understanding), I need to break apart the long method chains, and this is much, much more annoying than dealing with crummy names.
At least I can print the values associated with the names, which is not easily possible in the really long method chain.
I find fluent style often clearer as well as more terse than with superfluous intermediate variables. Verbosity isn't the same thing as clarity.
(But in Python, comprehensions/genexps are often clearer than either.)
The idiomatic Python 3 version uses generators to compose the computation and to avoid unnecessary memory allocations. Does funct.Array also do this?
- https://docs.python.org/3/library/functions.html#map - https://docs.python.org/3/library/functions.html#filter
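To make the laziness point concrete, here is a small sketch using only the built-in `map` and `filter`. It does not say anything about whether funct.Array is lazy; the `calls` list and `square` function are just illustrative names for observing when work actually happens.

```python
calls = []


def square(x):
    calls.append(x)  # record every time the function actually runs
    return x * x


# Building the pipeline does no work: map and filter return lazy iterators.
pipeline = filter(lambda y: y > 10, map(square, range(1_000_000)))
calls_before = list(calls)

# Pulling one result evaluates only as many inputs as needed.
first = next(pipeline)
calls_after = list(calls)

print(calls_before)  # []
print(first)         # 16
print(calls_after)   # [0, 1, 2, 3, 4]
```

No intermediate list of a million squares is ever allocated, which is the memory benefit being asked about.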
> all(func3(a) for h,w in zip(a,b) for a in func1(h,w) if func2(a))
bool (__bool__) Returns whether all elements evaluate to True.
I’d be worried that this will trip people up who use the
if l:
    print(l[0])  # or whatever
pattern.
Good point. However, setting
def __bool__(self): return self.nonEmpty
would mess up certain methods, e.g. .index for nested Arrays, as __eq__ is computed elementwise and bool(Array(False, False)) would evaluate to True.
Maybe a warning would be appropriate? (as is the case with ndarrays)
Isn't that consistent with the built-in `list`, though, because `bool([False, False])` is True?
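A quick sketch of the two truthiness conventions being compared. `AllTruthy` is a hypothetical stand-in written for this comment, not the actual funct.Array class; it only mimics the documented "all elements evaluate to True" behavior.

```python
class AllTruthy(list):
    """Hypothetical list subclass whose truthiness means 'all elements are truthy'."""

    def __bool__(self):
        return all(self)


plain = [False, False]
custom = AllTruthy([False, False])
empty = AllTruthy([])

print(bool(plain))   # True: a built-in list is truthy when non-empty
print(bool(custom))  # False: not all elements are truthy
print(bool(empty))   # True: all() of an empty sequence is True
```

The last line is the one that bites the `if l:` pattern: an empty container passes the check, so indexing `l[0]` afterwards would raise an IndexError.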
Many operations on lists are implemented as iterators in Python, like filter and groupby. Looking at your code, it looks like you're not doing lazy computation (correct me if I'm wrong). This could have a huge performance impact, depending on how the list is used.
Regarding the performance, Arrays aren't meant to be super high performing but rather a simple way to manipulate sequences. For the best performance you should go with generic Python, toolz or other.
The whole point of some things being functions versus methods is that they are generic rather than specialized. The generic iterator protocol is probably the best feature about the Python language, and it's both a damn shame and bad design to not use it.
If you really wanted to make an improvement over built in lists, the thing to do would be to implement some kind of fully lazy "query planning" engine, like what Apache Spark has. Every method call registers a new method to be applied with the query planner, but does not execute it. Execution only occurs when you explicitly request it. That way you can effectively compile in efficient but readable code that takes multiple passes over the data into efficient operations internally that only make one pass, or at least fewer passes. This also naturally lends itself to parallelization/concurrency.
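As a rough sketch of the "register now, execute later" idea, here is a toy wrapper. `LazySeq` and `collect` are hypothetical names, and this is nowhere near a real query planner like Spark's (there is no optimization step); it only shows method calls being recorded and then fused into a single pass over the data.

```python
class LazySeq:
    """Hypothetical wrapper that records operations and runs them on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, func):
        # Record the step; nothing is executed here.
        return LazySeq(self._source, self._steps + (("map", func),))

    def filter(self, pred):
        return LazySeq(self._source, self._steps + (("filter", pred),))

    def __iter__(self):
        # One pass over the source: each item flows through all recorded steps.
        for item in self._source:
            keep = True
            for kind, func in self._steps:
                if kind == "map":
                    item = func(item)
                elif not func(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        # Execution only happens when explicitly requested.
        return list(self)


query = (
    LazySeq(range(10))
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x + 1)
)
result = query.collect()
print(result)  # [1, 5, 17, 37, 65]
```

The chain still reads top to bottom, but the three steps cost one traversal instead of three intermediate lists.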
I don't think so. Very frequently the intermediate values represent nothing in particular and naming them simply results in visual noise.
I think this is comparable to SQL or LINQ statements. Consider what those would look like if you had to name every intermediate value instead of being able to filter and group on-the-fly.
Of course you can make a mess out of those too, by building huge unreadable expressions, but that's also an extreme, similar to naming every intermediate step.
I use numpy & pandas, lists & dicts every day. I read your docs/github page, but can you help me see the value?
However, I do think there are lots of common tasks that need to be done with lists that should be methods rather than fancy footwork =)
For example: https://stackoverflow.com/questions/3462143/get-difference-b...
As you allude to w your zip loop: https://stackoverflow.com/questions/1919044/is-there-a-bette...
Naturally if you're dealing with big arrays/tensors, numpy is the best choice for operating on sequences.
However, ndarrays have downsides for certain use cases - as ndarrays are fixed size, adding elements is very slow, also they don't support functional methods (or rather you have to create a new array every time you apply e.g. a map), and ndarrays of any other type than numbers doesn't really make sense.
Many of the methods are wrappers for built-ins, but I find the syntax of Arrays cleaner than the weirdness of the builtins.
For example, while applying an async "starmap" to an Array is just a method call, with built-in lists you would have to go through the whole hassle of importing both ThreadPoolExecutor and starmap, creating an executor, scheduling the function, and finally converting the result back to a list.
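For reference, the built-in route described above looks roughly like this. It is one reading of what an "async starmap" might mean (run the whole starmap in a worker thread and get a future back), not a claim about how Array implements it; `area` and `pairs` are made-up example names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import starmap


def area(height, width):
    return height * width


pairs = [(1, 4), (2, 5), (3, 6)]

with ThreadPoolExecutor() as executor:
    # Schedule the work; starmap is lazy, so list() forces it inside the worker.
    future = executor.submit(lambda: list(starmap(area, pairs)))
    # Block until the worker is done and fetch the list.
    result = future.result()

print(result)  # [4, 10, 18]
```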
That resonates with me now that you explain that I can't do it.
I do like chaining things in pandas like `df.select_types("float").head(100).plot.hist()`
> a |> zip(b) |> map(func1) |> filter(func2) |> forall(func3)
The advantage would be that this would work with all lists/iterables, so no need to make special types.
In short: it's a lot more complicated than it seems, but I agree that this style makes this type of thing 1000x more readable.
from functools import partial
a |> partial(zip, b) |> partial(map, func1) |> partial(filter, func2) |> partial(forall, func3)
Obviously it's a bit more verbose than if the currying was done implicitly, but it's not too bad, I think. You could also import partial under a shorter name if you want.
partial does have an advantage over implicit currying in that you can use keyword arguments to neatly curry on a parameter other than the first, although this isn't properly utilized by Python because most of the built-in functions take positional rather than keyword arguments. In languages with implicit currying you have to use anonymous function expressions or functions like flip (flip(f, x, y) = f(y, x)) to deal with this.
It might also be worth noting that |> doesn't essentially need to be an operator, it would just be syntactic sugar:
def chain(x, *fs):
    y = x
    for f in fs:
        y = f(y)  # feed the previous result into the next function
    return y
chain(a, partial(zip, b), partial(map, func1), partial(filter, func2), partial(forall, func3))
Obviously having it as an infix operator is nicer, and produces fewer parentheses.
Pandas allows the first param in a pipe to be a tuple[callable, str], where the second element names the keyword parameter that should receive the piped value, e.g. `val |> (func, "param_name")`, which gives some flexibility.
But yeah, if you open up to piping, there are a lot of possible choices to be made and easy to go overboard also IMO.
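Since Python has no `|>`, one way to approximate the infix style today is to overload `|` on a small wrapper. `Pipe` below is a hypothetical helper written for this comment, not something from the standard library or from funct.Array.

```python
from functools import partial


class Pipe:
    """Hypothetical wrapper: Pipe(value) | f | g evaluates g(f(value))."""

    def __init__(self, value):
        self.value = value

    def __or__(self, func):
        # Apply the next function and re-wrap so the chain can continue.
        return Pipe(func(self.value))


a = [1, 2, 3]
b = [4, 5, 6]

result = (
    Pipe(a)
    | partial(zip, b)  # partial fixes the FIRST argument, so pairs are (b_i, a_i)
    | partial(map, lambda pair: pair[0] * pair[1])
    | partial(filter, lambda area: area > 10)
    | list
).value

print(result)  # [18]
```

The comment on `partial(zip, b)` is worth noticing: the piped value lands in the second position, which is exactly the argument-placement problem mentioned above.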
from collections import defaultdict
d = defaultdict(int)
d['non_existant_key']
d.get("non_existant_key", default)
That is super readable to me. Working left to right or inside out. There is one clear, balanced, familiar, consistently used punctuation to guide you, parens, if you need it, but it adds little noise if you don't.
The “bunch of functions taking and returning an iterator” is a great paradigm. So clean, flexible, and powerful, especially combined with Python’s “many things are iterable”, and it is trivial to write your own iterator.
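A small illustration of that paradigm with hand-written generator functions. The function names and the sample input are made up for the example; each stage takes an iterable and yields values, so the stages compose freely.

```python
def read_numbers(lines):
    # Skip blank lines and parse the rest as integers.
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def running_total(numbers):
    total = 0
    for n in numbers:
        total += n
        yield total


def take_while_below(values, limit):
    # Stop the whole pipeline as soon as a value reaches the limit.
    for v in values:
        if v >= limit:
            return
        yield v


lines = ["1\n", "\n", "2\n", "3\n", "4\n"]
totals = list(take_while_below(running_total(read_numbers(lines)), 10))
print(totals)  # [1, 3, 6]
```

`lines` could just as well be an open file or a socket wrapper; nothing in the stages cares, as long as it is iterable.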
Any sort of reading inside out, right to left is a barrier to easy reading. This is why people like pipes in functional languages, right? You just read it in one direction.
I'm sure the author is aware of it, but readers might not be.
You definitely wouldn’t do this in “traditional Python”. You’d use a comprehension of some kind, or even the walrus operator, which is quite possibly faster and more readable than several chained lambdas.
all(func3(y) for y in (func1(x) for x in zip(a, b)) if func2(y))
It most likely is a bit faster, but I wouldn't say it's more readable.
It looks like Array mostly consolidates functional features already available in standard libraries, and the main innovation is a redesigned swiss-army-knife API.
Good APIs are important, but my instinct is they aren’t this important. Using enhanced versions of built-in container types sounds nice, but do you really want to be keeping track of whether something is a normal list or an Array? Do you want to force people who read your code to learn this library to work with something as fundamental as lists? It’s not an impossible bar to clear (e.g. NumPy, Pandas, Dask, xarray) but it’s a high one.
I’m sure Array’s not for everyone, but for some, including me it’s a nifty tool. I don’t expect people to memorise all the features of the library - the aim was to name and document each feature clearly such that finding the right method would be easy with the help of an IDE.
One use case for the chaining/FP style that I find particularly powerful is building out logic on the REPL. The chaining style allows me to incrementally grow my chain like a unix pipeline, see the results, use that to tweak the chain, until I finally have what I want.
This type of instantaneous feedback loop is both highly productive and also extremely fun.
If, however, you need the dynamic nature of the built-in list or functional methods with a touch of numpyness, you should give Array a spin.
E.g.
def removeByIndex(self, b):
    """ Removes the value at specified index or indices. """
    ...
def removeByIndex_(self, b):
    """ Removes the value at specified index or indices in-place. """
    ...
If you were to follow typical naming conventions, these would be either
def remove_by_index(self, b): ...
def remove_by_index_inplace(self, b): ...
Or pandas-like:
def remove_by_index(self, b, inplace=False): ...
Or, one more step, use explicit typing as well (which also makes it more clear that the method returns self), and give a better name to the method argument rather than 'b':
def remove_by_index(
    self,
    index: Union[int, Iterable[int]],
    inplace: bool = False,
) -> 'Array': ...
Explicit type signatures in libraries like this make many things self-explanatory, like the one above.
This is due to classes without `__slots__` gaining a `__dict__` attribute for dynamic attribute assignment.
Currently:
>>> sys.getsizeof([])
56
>>> a = Array()
>>> sys.getsizeof(a)
72
>>> sys.getsizeof(a.__dict__)
104
with `__slots__ = []` in the Array class definition:
>>> a = Array()
>>> sys.getsizeof(a)
56
>>> sys.getsizeof(a.__dict__)
AttributeError: 'Array' object has no attribute '__dict__'
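The same effect can be reproduced without the library. `PlainArray` and `SlottedArray` are hypothetical stand-ins for the two versions of the class; the exact byte counts depend on the Python version and platform, so the sketch prints them rather than asserting specific numbers.

```python
import sys


class PlainArray(list):
    """List subclass without __slots__: every instance can carry a __dict__."""


class SlottedArray(list):
    """List subclass with empty __slots__: no per-instance __dict__."""

    __slots__ = ()


plain, slotted = PlainArray(), SlottedArray()

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False
print(sys.getsizeof(plain), sys.getsizeof(slotted))

# The trade-off: arbitrary attributes can no longer be attached.
plain.tag = "ok"
try:
    slotted.tag = "nope"
    slotted_accepts_attrs = True
except AttributeError:
    slotted_accepts_attrs = False
```

So the memory saving is real, but anyone who relied on sticking ad hoc attributes onto instances would be affected.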