[Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

Sun Apr 8 21:43:38 EDT 2018

[Raymond]
> The Bayesian world view isn't much different except they would
> prefer "prior" instead of "initial" or "start" ;-)
>
>     my_changing_beliefs = accumulate(stream_of_new_evidence, bayes_rule, prior=what_i_used_to_think)
>
> Though the two analogies are cute, I'm not sure they tell us much.  In ...

They're not intended to.  They're just intended to shake people loose
from picturing nothing subtler than adding a list of 3 integers ;-)

> My own experience with actually using accumulations in algorithmic
> code falls neatly into two groups.  Many years ago, I used APL
> extensively in accounting work and my recollection is that a part
> of the convenience of "\+" was that the sequence length didn't change
> (so that the various data arrays could interoperate with one another).

Sure.

> My other common case for accumulate() is building cumulative
> probability distributions from probability mass functions (see the
> code for random.choice() for example, or typical code for a K-S test).

So, a question:  why wasn't itertools.accumulate() written to accept
iterables of _only_ numeric types?  Akin to `sum()`.  I gather from
one of Nick's messages that it was so restricted in 3.2.  Then why was
it generalized to allow any 2-argument function?

Given that it was, `sum()` is no longer particularly relevant:  the
closest thing by far is now `functools.reduce()`, which does support
an optional `initial` argument.  Which it really should, because it's
impossible for the implementation to guess a suitable starting value
for an arbitrary user-supplied dyadic function.

My example using accumulate() to generate list prefixes got snipped,
but same thing there:  it's impossible for that snippet to work unless
an empty list is supplied as the starting value.  And it's impossible
for the accumulate() implementation to guess that.

In short, for _general_ use `accumulate()` needs `initial` for exactly
the same reasons `reduce()` needed it.

BTW, the type signatures on the scanl (requires an initial value) and
scanl1 (does not support an initial value) implementations I pasted
from Haskell's Standard Prelude give a deeper reason:  without an
initial value, a list of values of type A can only produce another
list of values of type A via scanl1.  The dyadic function passed must
map As to As.  But with an initial value supplied of type B, scanl can
transform a list of values of type A to a list of values of type B.
While that may not have been obvious in the list prefix example I
gave, that was at work:  a list of As was transformed into a list _of_
lists of As.  That's impossible for scanl1 to do, but easy for scanl.

Or, in short, someone coming from a typed functional language
background sees all sorts of things that rarely (if ever) come up in
number-crunching languages.  Their sensibilities should count too -
although not much ;-)  They should get _some_ extra consideration in
this context, though, because `itertools` is one of the first things
they dig into when they give Python a try.

> For neither of those use case categories did I ever want an initial value

As above, in all your related experiences "0" was a suitable base
value, so you had no reason to care.

> and it would have been distracting to even had the option.

Distracting for how long?  One second or two? ;-)

> ...
> Because of this background, I was surprised to have the question ever
> come up at all (other than the symmetry argument that sum() has "start"
> so accumulate() must as well).

As above, the real parallel here is to reduce().  `sum()` became an
historical red herring when `accumulate()` was generalized.

With a different background, you may just as well have been surprised
if the question _hadn't_ come up.  For example, this is a standard
example in the Haskell world for how to define an infinite Fibonacci
sequence with the initial two values f0 and f1:

fibs = f0 : scanl (+) f1 fibs

The part from `scanl` onward would be spelled in Python as

    accumulate(fibs, initial=f1)

although it requires some trickery to get the recursive reference to
work (details on request, but I'm sure you already know how to do
that).

> When writing itertools.accumulate(), I started by looking to see what
> other languages had done.  Since accumulate() is primarily a
> numerical tool, I expected that the experience of numeric-centric
> languages would have something to teach us.  My reasoning was
> that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica,
> perhaps it really was just noise.

In the itertools context, I also would have looked hard at Haskell experience.

BTW, whoever wrote the current `accumulate()` docs also found a use
for `initial`, but hacked around it:

"""
First-order recurrence relations can be modeled by supplying the
initial value in the iterable and using only the accumulated total in
func argument:
"""

followed by:

>>> logistic_map = lambda x, _:  r * x * (1 - x)
>>> r = 3.8
>>> x0 = 0.4
>>> inputs = repeat(x0, 36)     # only the initial value is used
>>> [format(x, '.2f') for x in accumulate(inputs, logistic_map)]

That would be better on several counts to my eyes as:

    inputs = repeat(None, 35)  # no values actually used
    ... for x in accumulate(inputs, logistic_map, initial=x0)

In particular, filling inputs with `None` would lead to an exception
if the programmer screwed up and the input values actually _were_
being used.  I expect we'll both overlook that writing a generator
using the obvious loop would be a lot clearer regardless ;-)

> My views may be dated though.  Looking at the wheel sieve and collatz glide
> record finder, I see something new, a desire to work with lazy, potentially
> infinite accumulations (something that iterators do well but almost never
> arises in the world of fixed-length sequences or cumulative probability distributions).

Amazingly enough, those are both just doing integer running sums - all
the stuff above about catering to general types doesn't apply in these
cases.  `initial` isn't needed for _correctness_ in these cases, it
would just add convenience and some clarity.

> So I had been warming up to the idea, but got concerned that Nick
> could have had such a profoundly different idea about what the code
> should do.

He appeared to withdraw his objections after agreement _not_ to name
the argument "start" was reached.  Something about `sum()` also having
an optional argument named "start" caused confusion.

> That cooled my interest a bit, especially when thinking about two
> key questions, "Will it create more problems than it solves?"

Like?  It's an optional argument.  One brief example in the docs is
all it takes to understand what it does.

>>> list(accumulate([1, 2, 3]))
[11, 13, 16]
>>> list(accumulate([1, 2, 3], initial=10))
[10, 11, 13, 16]

> and "Will anyone actually use it?".

As above, the docs could change to use it.  And I bet the test suite
too.  How much more could you want from a feature?! ;-)