[Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

Tim Peters tim.peters at gmail.com
Tue Apr 10 01:32:26 EDT 2018


[Peter O'Connor <peter.ed.oconnor at gmail.com>]
> Ok, so it seems everyone's happy with adding an initial_value argument.

Heh - that's not clear to me ;-)


> Now, I claim that while it should be an option, the initial value should NOT
> be returned by default.  (i.e. the returned generator should by default
> yield N elements, not N+1).

-1 on that.

- It goes against prior art.  Haskell's scanl does return the initial
value, and nobody on the planet has devoted more quality thought to
how streams "should work" than those folks.  The Python itertoolz
package's accumulate already supports an optional `initial=` argument,
and also returns it when specified.  It requires truly compelling
arguments to go against prior art.

- It's "obvious" that the first value should be returned if specified.
The best evidence:  in this thread's first message, it was "so
obvious" to Raymond that the implementation he suggested did exactly
that.  I doubt it even occurred to him to question whether it should.
It didn't occur to me either, but my mind is arguably "polluted" by
significant prior exposure to functional languages.

- In all but one "real life" example so far (including the
slice-summer class I stumbled into today), the code _wanted_ the
initial value to be returned.  The sole exception was one of the three
instances in Will Ness's wheel sieve code, where he discarded the
unwanted (in that specific case) initial value via a plain

    next(wheel)

Which is telling:  it's easy to discard a value you don't want, but to
inject a value you _do_ want but don't get requires something like
reintroducing the

    chain([value_i_want], the_iterable_that_didn't_give_the_value_i_want)

trick the new optional argument is trying to get _away_ from.  Talk
about ironic ;-)
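
To make the asymmetry concrete, here is a minimal sketch. It assumes a hypothetical helper named `scan` (not an existing itertools function) that emulates the proposed optional argument by prepending the initial value before accumulating:

```python
from itertools import accumulate, chain
import operator

_MISSING = object()  # sentinel so that None remains a legal initial value


def scan(iterable, func=operator.add, initial=_MISSING):
    """Hypothetical stand-in for accumulate() with an initial value.

    When `initial` is given, it is the first value yielded, as with
    Haskell's scanl.
    """
    if initial is not _MISSING:
        iterable = chain([initial], iterable)
    return accumulate(iterable, func)


# The initial value is returned: N inputs give N+1 outputs.
with_initial = list(scan([1, 2, 3], initial=10))   # [10, 11, 13, 16]

# Discarding an unwanted initial value takes a single next() call.
sums = scan([1, 2, 3], initial=10)
next(sums)
without_initial = list(sums)                       # [11, 13, 16]

# Going the other way means reintroducing the chain() trick.
reinjected = list(chain([10], without_initial))    # [10, 11, 13, 16]
```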


I would like to see a simple thing added to itertools to make dropping
unwanted values easier, though:

"""
drop(iterable, n=None)
Return an iterator whose next() method returns all but the first `n`
values from the iterable.  If specified, `n` must be an integer >= 0.
By default (`n`=None), the iterator is run to exhaustion.
"""

Then, e.g.,

- drop(it, 0) would effectively be a synonym for iter(it).

- drop(it, 1) would skip over the first value from the iterable.

- drop(it) would give "the one obvious way" to consume an iterator
completely (for some reason that's a semi-FAQ, and is usually
answered by suggesting the excruciatingly obscure trick of feeding the
iterable to a 0-size collections.deque constructor).

Of course Haskell has had `drop` all along, although not the "run to
exhaustion" part.
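
A rough pure-Python sketch of that docstring follows. `drop` is only the function proposed here, not something itertools provides, and the negative-`n` check and the choice of ValueError are assumptions of the sketch:

```python
from collections import deque
from itertools import islice


def drop(iterable, n=None):
    """Return an iterator over all but the first `n` values of `iterable`.

    With n=None the iterable is run to exhaustion and the (now empty)
    iterator is returned.
    """
    it = iter(iterable)
    if n is None:
        # The "obscure trick": a 0-size deque consumes everything
        # and stores nothing.
        deque(it, maxlen=0)
        return it
    if n < 0:
        raise ValueError("n must be an integer >= 0")
    # islice skips the first n values lazily, on the first next() call.
    return islice(it, n, None)


same = list(drop([1, 2, 3], 0))      # behaves like iter(): [1, 2, 3]
skipped = list(drop([1, 2, 3], 1))   # [2, 3]
source = iter(range(5))
drop(source)                         # consumes `source` completely
leftover = list(source)              # []
```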


> Example: suppose we're doing the toll booth thing, and we want to yield a
> cumulative sum of tolls so far.  Suppose someone already made a
> reasonable-looking generator yielding the cumulative sum of tolls for today:
>
> def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
>     return accumulate(get_tolls_from_day(day), initial=toll_amount_so_far)
>
> And now we want to make a way to get all tolls from the month.  One might
> reasonably expect this to work:
>
> def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
>     for day in month:
>         for cumsum_tolls in iter_cumsum_tolls_from_day(day, toll_amount_so_far=toll_amount_so_far):
>             yield cumsum_tolls
>         toll_amount_so_far = cumsum_tolls
>
> But this would actually DUPLICATE the last toll of every day - it appears
> both as the last element of the day's generator and as the first element of
> the next day's generator.

I didn't really follow the details there, but the suggestion would be
the same regardless: drop the duplicates you don't want.
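
As a sketch of what "drop the duplicates" could look like for the toll example: the helper `cumsum_from`, the representation of a month as a list of per-day toll lists, and the sample numbers are all made up for illustration, and the starting value is emulated with chain().

```python
from itertools import accumulate, chain


def cumsum_from(values, start):
    """Running totals that begin with `start` itself (N+1 values)."""
    return accumulate(chain([start], values))


def iter_cumsum_tolls_from_month(month, toll_amount_so_far=0):
    total = toll_amount_so_far
    yield total                       # the month's starting value, once
    for day_tolls in month:
        day_sums = cumsum_from(day_tolls, total)
        next(day_sums)                # discard the repeated starting value
        for total in day_sums:
            yield total


# Four days of tolls, one of them with no traffic at all.
month = [[1, 2], [3], [], [4]]
running = list(iter_cumsum_tolls_from_month(month))   # [0, 1, 3, 6, 10]
```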

Note that making up an example in your head isn't nearly as persuasive
as "real life" code.  Code can be _contrived_ to "prove" anything.


> This is why I think that there should be an additional
> "include_initial_in_return=False" argument.  I do agree that it should be an
> option to include the initial value (your "find tolls over time-span"
> example shows why), but that if you want that you should have to show that
> you thought about that by specifying "include_initial_in_return=True"

It's generally poor human design to have a second optional argument
modify the behavior of yet another optional argument.  If the presence
of the latter can have two distinct modes of operation, then people
_will_ forget which one the default mode is, making code harder to
write and harder to read.

Since "return the value" is supported by all known prior art, and by
the bulk of "real life" Python code known so far, "return the value"
should be the default.  But far better to make it the only mode rather
than _just_ the default mode.  Then there's nothing to be forgotten
:-)

