[Tutor] help with itertools.izip_longest

Abhishek Pratap abhishek.vit at gmail.com
Sat Mar 16 22:45:03 CET 2013


On Sat, Mar 16, 2013 at 2:32 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 16 March 2013 21:14, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
>> Hey Guys
>>
>> I am trying to use itertools.izip_longest to read a large file in
>> chunks, based on examples I found on the web. However, I do not
>> understand the behaviour of the following Python code (a contrived
>> version of the example):
>>
>> import itertools
>>
>> for x in itertools.izip_longest(*[iter([1,2,3])]*2):
>>     print x
>>
>>
>> ###output:
>> (1, 2)
>> (3, None)
>>
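The None in the last tuple comes from izip_longest itself. It keeps going until the longest input is exhausted and pads the missing positions with its fillvalue argument, which defaults to None. Plain izip stops as soon as any input runs out, so the leftover 3 would be silently dropped. Below is a minimal sketch of the difference, written for Python 3, where izip is the built-in zip and izip_longest was renamed itertools.zip_longest. The two helper functions are hypothetical names used only for illustration.

```python
from itertools import zip_longest

def pairs_shortest(seq):
    # One shared iterator passed twice: zip stops at the first exhausted
    # argument, so an incomplete final group is dropped.
    it = iter(seq)
    return list(zip(it, it))

def pairs_longest(seq, fill=None):
    # zip_longest keeps going and pads the short final group with `fill`.
    it = iter(seq)
    return list(zip_longest(it, it, fillvalue=fill))

print(pairs_shortest([1, 2, 3]))    # [(1, 2)]
print(pairs_longest([1, 2, 3]))     # [(1, 2), (3, None)]
print(pairs_longest([1, 2, 3], 0))  # [(1, 2), (3, 0)]
```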
>>
>> It gives me the right answer, but I am not sure how it does it. I
>> also read the itertools docs but could not make much sense of them. In
>> essence, I am trying to understand the intricacies of the following
>> passage from the itertools documentation:
>>
>> "The left-to-right evaluation order of the iterables is guaranteed.
>> This makes possible an idiom for clustering a data series into
>> n-length groups using izip(*[iter(s)]*n)."
>>
>> How is *n able to group the data, and what does the '*' at the
>> beginning, just after izip, mean?
>
> The '*n' part multiplies the list so that its contents repeat n
> times. This works for most sequence types in Python:
>
>>>> a = [1,2,3]
>>>> a * 2
> [1, 2, 3, 1, 2, 3]
>
> In this particular case, we multiply a list containing only one item:
> the iterator over s. This means that the new list contains the same
> iterator object twice:
>>>> it = iter(a)
>>>> [it]
> [<listiterator object at 0x166c990>]
>>>> [it] * 2
> [<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
>
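List multiplication copies references, not objects, so both slots hold one iterator. The following Python 3 sketch (zip plays the role of izip, and the variable names are illustrative) contrasts that with two independent iterators over the same list, which is what you would get from [iter(a), iter(a)]:

```python
a = [1, 2, 3]

shared = [iter(a)] * 2              # one iterator object, referenced twice
separate = [iter(a), iter(a)]       # two independent iterators over a

print(shared[0] is shared[1])       # True
print(separate[0] is separate[1])   # False

# Both slots of `shared` advance the same iterator, so each pair holds
# consecutive items; the independent iterators each start again at 1.
shared_pairs = list(zip(shared[0], shared[1]))
separate_pairs = list(zip(separate[0], separate[1]))
print(shared_pairs)                 # [(1, 2)]
print(separate_pairs)               # [(1, 1), (2, 2), (3, 3)]
```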
> So every element of the list is the same iterator, and calling next()
> on any of them advances that one iterator, producing the values in
> order no matter which element we use:
>>>> d = [it]*2
>>>> d
> [<listiterator object at 0x166c990>, <listiterator object at 0x166c990>]
>>>> next(d[1])
> 1
>>>> next(d[0])
> 2
>>>> next(d[0])
> 3
>>>> next(d[0])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
>>>> next(d[1])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
>
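Since every argument is the same iterator, each next() call picks up where the previous one left off. This is why the documentation's guarantee of left-to-right evaluation matters: izip fills each tuple by asking its arguments for a value in order. CPython's izip is written in C, so the pure-Python loop below is only a simplified sketch of that behaviour. The name my_zip is hypothetical.

```python
def my_zip(*iterators):
    """Simplified stand-in for zip/izip (hypothetical helper, not the real
    implementation): build each tuple by calling next() on the arguments
    strictly from left to right, stopping as soon as one is exhausted."""
    while True:
        group = []
        for it in iterators:
            try:
                group.append(next(it))
            except StopIteration:
                return  # any exhausted argument ends the whole zip
        yield tuple(group)

it = iter([1, 2, 3, 4, 5, 6])
print(list(my_zip(it, it, it)))     # [(1, 2, 3), (4, 5, 6)]
```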
> The * just after izip is for argument unpacking. This allows you to
> call a function with arguments unpacked from a list:
>
>>>> def f(x, y):
> ...     print('x is %s' % x)
> ...     print('y is %s' % y)
> ...
>>>> f(1, 2)
> x is 1
> y is 2
>>>> args = [1,2]
>>>> f(args)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: f() takes exactly 2 arguments (1 given)
>>>> f(*args)
> x is 1
> y is 2
>
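The two pieces combine in a fixed order. First, [iter(s)]*n is evaluated to build a list of n references to one iterator. The leading * then spreads that list into n separate arguments. A small Python 3 sketch using a hypothetical helper, show_args, checks that the function receives the very same object in every position:

```python
def show_args(*args):
    # Report how many positional arguments arrived and whether they are
    # all the very same object.
    return len(args), all(arg is args[0] for arg in args)

s = "abcdef"
# [iter(s)] * 3 is evaluated first, then * unpacks it into three arguments.
print(show_args(*[iter(s)] * 3))    # (3, True)
print(show_args(iter(s), iter(s)))  # (2, False)
```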
> So the original expression, izip(*[iter(s)]*2), is another way of writing
>
> it = iter(s)
> izip(it, it)
>
> And izip(*[iter(s)]*10) is equivalent to
>
> izip(it, it, it, it, it, it, it, it, it, it)
>
> Writing it out like this obviously gets unwieldy if we want
> izip(*[iter(s)]*100), so the preferred form is izip(*[iter(s)]*n),
> which also lets us choose n without changing anything else in the
> code.
>
>
> Oscar
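
To connect this back to the original goal of reading a large file in chunks, here is a sketch modelled on the grouper recipe from the itertools documentation. It is written for Python 3, where izip_longest is zip_longest. The in-memory io.StringIO object is an assumption that stands in for a real file handle. Because a file object is itself an iterator over its lines, the same code should apply to a file opened with open(path).

```python
import io
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to ONE iterator: zip_longest pulls n consecutive items
    # per tuple and pads the final, short tuple with fillvalue.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# io.StringIO stands in for a real file; iterating over either yields one
# line at a time, so the whole file is never loaded into memory at once.
fake_file = io.StringIO("a\nb\nc\nd\ne\n")
chunks = []
for chunk in grouper(fake_file, 2):
    # Drop the None padding from the last chunk and strip the newlines.
    chunks.append([line.rstrip("\n") for line in chunk if line is not None])

print(chunks)  # [['a', 'b'], ['c', 'd'], ['e']]
```

The filter uses `is not None` rather than truthiness so that an empty string in the data is not mistaken for padding. If None could be a real value, pass a dedicated sentinel object as fillvalue instead.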


Thanks a bunch, Oscar. This is why I love this community. It is
absolutely clear now. Funnily enough, I am getting the solution over the
mailing list while I am at PyCon :)


best,
-Abhi

