A look at some of Python's useful itertools

(naiquevin.github.io)

69 points · naiquevin · 13y ago · 28 comments

wting · 13y ago

My intention is not to be snarky, but people post all the time about discovering the itertools or collections library. I notice it's a common gap in newer Python programmers.

Save yourself time and effort down the road and read through both libraries' documentation; they're well worth the effort:

http://docs.python.org/3.3/library/itertools.html

http://docs.python.org/3.3/library/collections.html

I tend to use defaultdict, deque (thread safe), namedtuple, imap, izip, and dropwhile/takewhile. In Python 3, the built-in map and zip are lazy and take the place of itertools.imap and izip.
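
As a rough illustration of the tools listed above, here is a minimal Python 3 sketch. The sample data and the names `Point`, `words`, and `recent` are made up for the example.

```python
from collections import defaultdict, deque, namedtuple
from itertools import dropwhile, takewhile

# defaultdict: group words by first letter without checking for missing keys.
words = ["apple", "avocado", "banana", "cherry", "blueberry"]
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# deque: cheap appends and pops at both ends; maxlen keeps a sliding window.
recent = deque(maxlen=3)
for n in range(6):
    recent.append(n)

# namedtuple: a lightweight record with named fields.
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

# takewhile/dropwhile: split a sequence at the first item failing the test.
nums = [1, 2, 3, 10, 4, 5]
head = list(takewhile(lambda n: n < 5, nums))
tail = list(dropwhile(lambda n: n < 5, nums))
```

Note that `dropwhile` stops testing after the first failure, so later small values (the 4 and 5 here) stay in `tail`.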

I blame Haskell for all the lazy evaluation influence. :P

naiquevin (OP) · 13y ago

Author here. I completely agree with you. I could have known all this earlier had I just read the itertools docs beyond permutations and combinations. It was only after using Scala's groupBy while solving an assignment problem that I thought about looking for an equivalent in Python. And thanks to having worked with the Stream class from Scala collections, the rest of itertools also made sense for the first time. Better late than never I guess :-)
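
One difference worth knowing when coming from Scala: `itertools.groupby` only groups consecutive items with equal keys, so the input usually needs to be sorted by the same key first. A small sketch with made-up data:

```python
from itertools import groupby

words = ["apple", "bob", "grape", "cat"]

# Without sorting, equal keys that are not adjacent land in separate groups.
unsorted_keys = [key for key, _ in groupby(words, key=len)]

# Sorting by the same key first gives one group per key.
grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=len), key=len)
}
```

Here `unsorted_keys` contains each length twice, while `grouped` has a single entry per length.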

Aqueous · 13y ago

Is there any reason that you don't switch over to Scala for your main programming language?

I'm in the process of converting a mid-sized PHP codebase completely over to Scala. It is not uncommon for me to see functions of 15 to 30 lines shrink to 3 or 4 lines thanks to the Collections API. On top of that, its performance is something you'll never see in PHP or Python. I love Python for its readability and wonderful syntax, but Scala is starting to have an even greater pull on me for its built-in concision and the ability to use the entire Java ecosystem without submitting to Java's imperative style.

pjmlp · 13y ago

>My intention is not to be snarky, but people post all the time about discovering the itertools or collections library. I notice it's a common gap in newer Python programmers.

Not only in Python, but programming languages in general.

I still find people writing Java or .NET code who aren't aware of all the nice classes that are part of the runtime and end up creating their own half-baked solutions to their problems.

Nowadays developers seem to code without reading.

Silhouette · 13y ago

> Nowadays developers seem to code without reading.

When your standard library documentation is so vast that it would take weeks to read and understand it all, and you'd never remember most of it anyway without context and experience using it, I don't think "coding without reading" is really a fair complaint.

We as an industry need to get better at documentation, and in particular about separating tutorial/overview documentation that presents a map and summary of what's available from reference documentation, or we're going to keep reinventing wheels like this.

Python is a particularly unfortunate example, because while its documentation is vast, it has very little tutorial/overview material beyond the very basics. For example, given that a substantial proportion of Python's standard library actually doesn't work very well in practice, it would be helpful to have a deeper tutorial/map document somewhere that introduced the various areas of the standard library and that also promoted the good ones and suggested popular alternatives for the not so good ones where they exist.

naiquevin (OP) · 13y ago

> Nowadays developers seem to code without reading.

As someone who started programming 4 years back (which from the conversation seems to be pretty much "nowadays" :-)), I think it is a bit of a generalization. I am not saying it's not true. There are certainly places such as Stack Overflow, mailing lists etc. that attract newbies early on because they provide quick answers or even code, but at some point every developer who is serious about programming as a long-term profession does need to start reading the docs. There is no alternative, and one eventually comes to realize that it's much faster than arbitrarily hunting for code and asking questions on mailing lists and IRC.

It also depends on the style of programming language (imperative/functional) and the previous experience of the developer, IMO. For example, I find myself reading the docs significantly more in Erlang/Scala than in Python, and more in Python than in PHP/JS. That is also the reverse of the order in which I learnt these languages. Of course, that is my personal experience.

shavenwarthog2 · 13y ago

Seconded. I use itertools constantly for data-wrangling on network server and web applications. Specifically:

- processing files with imap and ifilter to rapidly grab data, find a subset of it, then process it with a function

- defaultdict(list) is incredibly useful for collecting data, arranging it by a certain key (like date or object id), then collecting into a list

- namedtuple is occasionally useful for efficiently stuffing data into an object with a few named attributes.
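
The three uses above could be combined roughly as follows. This is a Python 3 sketch, where the built-in `map` and `filter` play the role of `imap` and `ifilter`; the log format, the `LogEntry` type, and the `parse` helper are hypothetical.

```python
from collections import defaultdict, namedtuple

# Hypothetical log format: "<date> <status> <path>"
LogEntry = namedtuple("LogEntry", ["date", "status", "path"])

lines = [
    "2013-05-01 200 /index",
    "2013-05-01 404 /missing",
    "2013-05-02 200 /about",
    "2013-05-02 500 /crash",
]

def parse(line):
    date, status, path = line.split()
    return LogEntry(date, int(status), path)

# Both steps are lazy: nothing is parsed until the loop below pulls items.
entries = map(parse, lines)
errors = filter(lambda e: e.status >= 400, entries)

# Collect the matching entries into lists keyed by date.
errors_by_date = defaultdict(list)
for entry in errors:
    errors_by_date[entry.date].append(entry.path)
```

With a real file, `lines` could be the open file object itself, so only one line is held in memory at a time.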

masklinn · 13y ago

    def flatmap(f, items):
        return itertools.chain(*map(f, items))

1. in Python 2 `map` is eager which — as with the previous `even` filter — may lead to unnecessary work if you only need part of the list (or a dead process if the input is infinite...). itertools.imap (or a generator comprehension) would be better. This is "fixed" in Python 3 (where the `map` builtin has become lazy and `itertools.imap` has been removed) but

2. it's being eagerly unpacked through *; itertools.chain also provides a from_iterable method (introduced in 2.6) which doesn't have that issue and can be used to flatten infinite streams

So `flatmap` would probably be better as:

    import itertools

    def flatmap(f, items):
        # Python 2: itertools.imap is the lazy counterpart of map.
        return itertools.chain.from_iterable(
            itertools.imap(
                f, items))
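
On Python 3, where `itertools.imap` no longer exists and the built-in `map` is lazy, the same idea might look like the sketch below. The infinite `count(1)` input and the lambda are only there to show that nothing is evaluated eagerly.

```python
from itertools import chain, count, islice

def flatmap(f, items):
    # Built-in map is lazy in Python 3, and chain.from_iterable
    # consumes the outer iterable one element at a time.
    return chain.from_iterable(map(f, items))

# count(1) is infinite, yet only as much as needed is evaluated.
first_six = list(islice(flatmap(lambda n: (n, -n), count(1)), 6))
```

The `chain(*map(f, items))` form would never return on this input, because the `*` unpacking has to exhaust the iterable before `chain` is called.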

naiquevin (OP) · 13y ago

Thanks for the corrections. I have made an edit (although not sure how long it will take to clear the github-pages cache)

serjeem · 13y ago

I wrote my favorite function ever last semester with itertools! It (roughly) lazily generates a list of dictionaries that map players to their moves for all possible moves. It turns out you can do that with a chain of combinations, two cartesian products, and an imap: https://github.com/shargoj/acquire/blob/master/gametree.py#L...
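
For readers who want a feel for the general shape of this, here is a much simplified Python 3 sketch. It is not the linked code: the players, the moves, and the `all_move_assignments` helper are invented for illustration, and it uses a single cartesian product rather than the full chain described above.

```python
from itertools import product

# Hypothetical players and their legal moves.
legal_moves = {"alice": ["buy", "pass"], "bob": ["sell", "hold", "pass"]}

def all_move_assignments(legal_moves):
    players = list(legal_moves)
    # product yields one tuple of moves per combination, lazily.
    combos = product(*(legal_moves[p] for p in players))
    # map is lazy too, so each dict is only built when requested.
    return map(lambda moves: dict(zip(players, moves)), combos)

assignments = list(all_move_assignments(legal_moves))
```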

davvolun · 13y ago

On the other hand, I suspect some early programmers might get ahold of this and perform a lot of premature optimizations. A piece of code that runs 20 loops instead of 8 once every couple of hours probably doesn't need to be optimized. A piece of code that does two checks when one would suffice that runs 1000 times every second might need optimization. Profile first, then optimize.
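
A quick way to measure before optimizing is the standard library's `timeit`. In this sketch the two functions are hypothetical stand-ins for "two checks" versus "one check"; the timings will vary by machine, so no expected numbers are given.

```python
import timeit

def two_checks(n):
    # Two separate divisibility tests.
    return n % 2 == 0 and n % 3 == 0

def one_check(n):
    # A single equivalent test.
    return n % 6 == 0

# Measure both versions before deciding whether the rewrite is worth it.
t_two = timeit.timeit(lambda: two_checks(12), number=100_000)
t_one = timeit.timeit(lambda: one_check(12), number=100_000)
```

For whole programs, `cProfile` gives a per-function breakdown that shows where the time actually goes.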
