[Tutor] help with itertools.izip_longest

Oscar Benjamin oscar.j.benjamin at gmail.com
Sat Mar 16 22:32:54 CET 2013


On 16 March 2013 21:14, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
> Hey Guys
>
> I am trying to use itertools.izip_longest to read a large file in
> chunks based on the examples I was able to find on the web. However I
> am not able to understand the behaviour of the following python code.
> (contrived form of example)
>
> for x in itertools.izip_longest(*[iter([1,2,3])]*2):
>     print x
>
>
> ###output:
> (1, 2)
> (3, None)
>
>
> It gives me the right answer but I am not sure how it is doing it. I
> also referred to the itertools doc but could not comprehend much. In
> essence I am trying to understand the intricacies of the following
> documentation from the itertools package.
>
> "The left-to-right evaluation order of the iterables is guaranteed.
> This makes possible an idiom for clustering a data series into
> n-length groups using izip(*[iter(s)]*n)."
>
> How is *n able to group the data and the meaning of '*' in the
> beginning just after izip.

The '*n' part multiplies the list so that its contents repeat n times.
This works for most sequence types in Python:

>>> a = [1,2,3]
>>> a * 2
[1, 2, 3, 1, 2, 3]

In this particular case we multiply a list containing only one item,
the iterator over s (a in the session below). The new list therefore
contains the same iterator object twice (note the identical
addresses), not two separate iterators:

>>> it = iter(a)
>>> [it]
[<listiterator object at 0x166c990>]
>>> [it] * 2
[<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
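
An identity check is a quick way to confirm that both slots hold the
very same object rather than two copies. A minimal sketch, reusing the
names a and it from the session above (d and same_object are just
illustrative names):

```python
a = [1, 2, 3]
it = iter(a)
d = [it] * 2

# List repetition copies references, not objects, so both
# elements refer to the one iterator created above.
same_object = d[0] is d[1]
print(same_object)  # True
```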

Since every element of the list is the same iterator, calling next()
on any of them advances that one shared iterator. The values come out
in order no matter which element we call next() on:

>>> d = [it]*2
>>> d
[<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
>>> next(d[1])
1
>>> next(d[0])
2
>>> next(d[0])
3
>>> next(d[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(d[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
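
For contrast, a list built from two separate iter() calls behaves
differently, because each iterator keeps its own position. A minimal
sketch (the names shared and separate are only for illustration):

```python
a = [1, 2, 3]

shared = [iter(a)] * 2          # one iterator, referenced twice
separate = [iter(a), iter(a)]   # two independent iterators

# With the shared iterator the second call continues where the first stopped.
shared_values = [next(shared[0]), next(shared[1])]
# With independent iterators each one starts from the beginning.
separate_values = [next(separate[0]), next(separate[1])]

print(shared_values)    # [1, 2]
print(separate_values)  # [1, 1]
```

This is why the idiom needs [iter(s)]*n rather than n separate calls
to iter(s).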

The * just after izip is for argument unpacking. This allows you to
call a function with arguments unpacked from a list:

>>> def f(x, y):
...     print('x is %s' % x)
...     print('y is %s' % y)
...
>>> f(1, 2)
x is 1
y is 2
>>> args = [1,2]
>>> f(args)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() takes exactly 2 arguments (1 given)
>>> f(*args)
x is 1
y is 2
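
Note that in izip(*[iter(s)]*2) the leading * applies to the whole
expression [iter(s)]*2, not just to [iter(s)]. A small sketch that
makes this visible, using a hypothetical helper count_args that only
reports how many positional arguments it received:

```python
def count_args(*args):
    # Hypothetical helper: returns the number of positional arguments.
    return len(args)

it = iter([1, 2, 3])

print(count_args([it] * 2))   # 1: the list itself is the single argument
print(count_args(*[it] * 2))  # 2: the repeated list is unpacked into two arguments
```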

So the original expression, izip(*[iter(s)]*2), is another way of writing

it = iter(s)
izip(it, it)
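
The grouping comes from the way izip builds each tuple: it takes one
item from each argument, left to right. Because both arguments are the
same iterator, each tuple consumes two consecutive items. The sketch
below imitates that for two arguments; pairs is a hypothetical
stand-in written for illustration, not the actual implementation of
izip:

```python
def pairs(first, second):
    # Hypothetical stand-in for izip with two arguments: build each
    # tuple by pulling one item from each argument, left to right,
    # and stop as soon as either argument is exhausted.
    while True:
        try:
            x = next(first)
            y = next(second)
        except StopIteration:
            return
        yield (x, y)

it = iter([1, 2, 3, 4])
result = list(pairs(it, it))
print(result)  # [(1, 2), (3, 4)]
```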

And izip(*[iter(s)]*10) is equivalent to

izip(it, it, it, it, it, it, it, it, it, it)

Writing it out like this gets unwieldy for larger group sizes, say
100, so the preferred form is izip(*[iter(s)]*n), which also lets us
choose the value of n without changing anything else in the code.
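
Putting the pieces together, the idiom can be wrapped in a small
function. This is a sketch modelled on the grouper recipe in the
itertools documentation; the try/except import is an assumption added
so that the same code runs on Python 3, where izip_longest is named
zip_longest:

```python
try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3

def grouper(iterable, n, fillvalue=None):
    # Collect items into tuples of length n. The last tuple is padded
    # with fillvalue when the data runs out early.
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

chunks = list(grouper([1, 2, 3, 4, 5], 2))
print(chunks)  # [(1, 2), (3, 4), (5, None)]
```

An open file is itself an iterator over its lines, so passing a file
object to grouper should yield n lines at a time, which is the
chunked reading you were after.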


Oscar

