[docs] sum( ) in generator bugged

Anselm Kiefner python at anselm.kiefner.de
Tue Dec 8 16:08:37 EST 2015


Hey Zach,

Thanks for your answer and explanation.
I just posted here following the advice on
https://docs.python.org/3.5/bugs.html ("If you’re short on time, you can
also email your bug report to docs at python.org. ‘docs@’ is a mailing list
run by volunteers; your request will be noticed, though it may take a
while to be processed.").

Now, I'm fairly aware of how that code works and how to work around it,
but I must say I almost felt insulted by your claim that my "expectations
are bugged". Considering the Zen of Python ("There should be one-- and
preferably only one --obvious way to do it") and the Principle of least
astonishment, I'd rather argue that my expectations on this one are
rather well within what could be considered normal.

When I, as a user, replace a list or a list comprehension with a
generator - that is simply by replacing [] with (), I would expect that
the items are now generated on the fly and not held in memory anymore
(that's the speed to memory tradeoff you were talking about) - but the
result of both should be the same, logically.
This is also how it works in most cases without any trouble.
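
To illustrate the usual case, here is a minimal sketch (the data and the
variable names are made up for this example, they are not from my
original report). As long as each iterable is consumed only once, the
list comprehension and the generator expression give the same items:

```python
# Hypothetical data, for illustration only.
values = [1, 2, 3]

squares_list = [x * x for x in values]   # built eagerly, held in memory
squares_gen = (x * x for x in values)    # produced lazily, one item at a time

# Iterated once, both yield the same items.
same_items = list(squares_gen) == squares_list

# sum() also agrees when each iterable is consumed only once.
total_from_list = sum([x * x for x in values])
total_from_gen = sum(x * x for x in values)
```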

Now, applying sum() to a list in general returns the same result as it
does when applied to a generator, and it works as expected inside the
list comprehension when applied to the list. So you see, the described
behaviour of sum() applied to a generator inside a list comprehension is
clearly an exception to the general behaviour.
Let me quote the Zen of Python again: Special cases aren't special
enough to break the rules.
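
To make the special case concrete, here is a sketch that reduces the
snippet from my original report to the two lines that differ. The
comments describe what I understand to happen, namely that the for
clause and sum() share the single iterator L_g:

```python
L = [1, 2, 3]

# List: sum(L) can re-read L on every iteration.
a = [x * sum(L) for x in L]

# Generator: L_g is one iterator, shared by the for clause and sum().
L_g = (x for x in L)
b = [x * sum(L_g) for x in L_g]
# The for clause takes 1, sum(L_g) drains 2 and 3, so b is [1 * (2 + 3)].
# On the next step L_g is already exhausted and the loop ends.

print(a, b)  # [6, 12, 18] [5]
```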

Yes, surely there are ways to work around it, but I hope you agree now
that the flaw is not in my expectations but rather in the code.

Thanks for your time and kind regards,
Anselm





On 08.12.2015 at 18:56, Zachary Ware wrote:
> Hi Anselm,
> 
> On Fri, Nov 27, 2015 at 6:51 AM, Anselm Kiefner
> <python at anselm.kiefner.de> wrote:
>> Hi,
>>
>> I just found this bug:
>>
>> Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
>> [GCC 5.2.1 20151010] on linux
>>
>> >>> L = [1,2,3]
>> >>> L_g = (x for x in L)
>> >>> a = [x*sum(L) for x in L]
>> >>> b = (x*sum(L_g) for x in L_g)
>> >>> print(a, list(b))
>> [6, 12, 18] [5]
>>
>> whether b is a generator or not doesn't make a difference, it seems to
>> be a problem with sum() operating on L_g while L_g is consumed.
>> I stumbled over the problem first in ipython notebook running python
>> kernel 3.5.0, but couldn't find anything about it in the bugtracker.
> 
> First off, this is not the place to report bugs; that should be done
> on the bug tracker or reported to python-list at python.org (and someone
> there will either explain why it's not a bug, or make sure it's
> reported properly).  This list is for discussion about the
> documentation of Python rather than its inner workings (except where
> that affects the documentation :)).
> 
> However, this is not a bug in Python, but rather in your expectations.
> In `b = (x*sum(L_g) for x in L_g)` you're trying to loop over the same
> iterator multiple times, which actually works out such that `for x in
> L_g` consumes the first value (1), then `x*sum(L_g)` consumes the rest
> (2, 3), so `x*sum(L_g)` expands to `1*(2 + 3)`, which gives 5.  Next
> time around the loop, L_g is already exhausted, so the for loop ends.
> If you want to iterate over the same values multiple times, you need
> either a concrete list of the values, or to calculate the values
> separately on every iteration.  Either of the following will work the
> way you expected in the first place; performance will depend on your
> real use case; if you're concerned about memory use but have cycles to
> spare, go for the first; if you have memory but want shorter runtime,
> go for the second.
> 
> First:
>    >>> L = [1,2,3]
>    >>> def gen(lst):
>    ...     for x in lst:
>    ...         yield x
>    ...
>    >>> b = (x * sum(gen(L)) for x in gen(L))
>    >>> list(b)
>    [6, 12, 18]
> 
> Second:
>    >>> g_L = list(x for x in L)
>    >>> b2 = (x*sum(g_L) for x in g_L)
>    >>> list(b2)
>    [6, 12, 18]
> 
> Hope this helps,
> 

