[Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

Peter O'Connor peter.ed.oconnor at gmail.com
Mon Apr 9 23:55:55 EDT 2018


Ok, so it seems everyone's happy with adding an initial_value argument.

Now, I claim that while it should be an option, the initial value should
NOT be returned by default.  (i.e. the returned generator should by default
yield N elements, not N+1).

Example: suppose we're doing the toll booth thing, and we want to yield a
cumulative sum of tolls so far.  Suppose someone already made a
reasonable-looking generator yielding the cumulative sum of tolls for today:

def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
    return accumulate(get_tolls_from_day(day), initial=toll_amount_so_far)

And now we want to make a way to get all tolls from the month.  One might
reasonably expect this to work:

def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
    for day in month:
        for cumsum_tolls in iter_cumsum_tolls_from_day(
                day, toll_amount_so_far=toll_amount_so_far):
            yield cumsum_tolls
        toll_amount_so_far = cumsum_tolls

But this would actually DUPLICATE the last toll of every day - it appears
both as the last element of the day's generator and as the first element of
the next day's generator.
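
A minimal sketch of the duplication, assuming made-up toll data (the TOLLS
table and the day names are hypothetical). Since accumulate() has no initial
argument yet, it is emulated here as accumulate(chain([S], xs)), the
equivalence Tim gives in the quoted message:

```python
from itertools import accumulate, chain

# Hypothetical toll data: two days, two tolls each.
TOLLS = {"mon": [1, 2], "tue": [3, 4]}

def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
    # Stand-in for accumulate(xs, initial=S): the initial value is yielded first.
    return accumulate(chain([toll_amount_so_far], TOLLS[day]))

def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
    for day in month:
        for cumsum_tolls in iter_cumsum_tolls_from_day(
                day, toll_amount_so_far=toll_amount_so_far):
            yield cumsum_tolls
        # Carry the last running total into the next day.
        toll_amount_so_far = cumsum_tolls

result = list(iter_cumsum_tolls_from_month(["mon", "tue"], 0))
print(result)  # [0, 1, 3, 3, 6, 10]
```

The running total 3 shows up twice: once as the last value for "mon" and
again as the first value for "tue".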

This is why I think there should be an additional
"include_initial_in_return=False" argument.  I do agree that including the
initial value should be an option (your "find tolls over time-span" example
shows why), but if you want that, you should have to show that you thought
about it by specifying "include_initial_in_return=True".

On Mon, Apr 9, 2018 at 10:30 PM, Tim Peters <tim.peters at gmail.com> wrote:

> [Tim]
> >> while we have N numbers, there are N+1 slice indices.  So
> >> accumulate(xs) doesn't quite work.  It needs to also have a 0 inserted
> >> as the first prefix sum (the empty prefix sum(xs[:0])).
> >>
> >> Which is exactly what a this_is_the_initial_value=0 argument would do
> >> for us.
>
> [Greg Ewing <greg.ewing at canterbury.ac.nz>]
> > In this case, yes. But that still doesn't mean it makes
> > sense to require the initial value to be passed *in* as
> > part of the input sequence.
> >
> > Maybe the best idea is for the initial value to be a
> > separate argument, but be returned as the first item in
> > the list.
>
> I'm not sure you've read all the messages in this thread, but that's
> exactly what's being proposed.  That, e.g., a new optional argument:
>
>     accumulate(xs, func, initial=S)
>
> act like the current
>
>      accumulate(chain([S], xs), func)
>
> Note that in neither case is the original `xs` modified in any way,
> and in both cases the first value generated is S.
>
> Note too that the proposal is exactly the way Haskell's `scanl` works
> (although `scanl` always requires specifying an initial value - while
> the related `scanl1` doesn't allow specifying one).
>
> And that's all been so since the thread's first message, in which
> Raymond gave a proposed implementation:
>
>         _sentinel = object()
>
>         def accumulate(iterable, func=operator.add, start=_sentinel):
>             it = iter(iterable)
>             if start is _sentinel:
>                 try:
>                     total = next(it)
>                 except StopIteration:
>                     return
>             else:
>                 total = start
>             yield total
>             for element in it:
>                 total = func(total, element)
>                 yield total
>
> > I can think of another example where this would make
> > sense. Suppose you have an initial bank balance and a
> > list of transactions, and you want to produce a statement
> > with a list of running balances.
> >
> > The initial balance and the list of transactions are
> > coming from different places, so the most natural way
> > to call it would be
> >
> >    result = accumulate(transactions, initial = initial_balance)
> >
> > If the initial value is returned as item 0, then the
> > result has the following properties:
> >
> >    result[0] is the balance brought forward
> >    result[-1] is the current balance
> >
> > and this remains true in the corner case where there are
> > no transactions.
>
> Indeed, something quite similar often applies when parallelizing
> search loops of the form:
>
>      for candidate in accumulate(chain([starting_value], cycle(deltas))):
>
> For a sequence that eventually becomes periodic in the sequence of
> deltas it cycles through, multiple processes can run independent
> searches starting at carefully chosen different starting values "far"
> apart.  In effect, they're each a "balance brought forward" pretending
> that previous chunks have already been done.
>
> Funny:  it's been weeks now since I wrote an accumulate() that
> _didn't_ want to specify a starting value - LOL ;-)