[Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

Raymond Hettinger raymond.hettinger at gmail.com
Mon Apr 9 00:38:00 EDT 2018



> On Apr 8, 2018, at 6:43 PM, Tim Peters <tim.peters at gmail.com> wrote:
> 
>> My other common case for accumulate() is building cumulative
>> probability distributions from probability mass functions (see the
>> code for random.choice() for example, or typical code for a K-S test).
> 
> So, a question:  why wasn't itertools.accumulate() written to accept
> iterables of _only_ numeric types?  Akin to `sum()`.  I gather from
> one of Nick's messages that it was so restricted in 3.2.  Then why was
> it generalized to allow any 2-argument function?

Prior to 3.2, accumulate() was in the recipes section as pure Python code.  It had no particular restriction to numeric types.

I received a number of requests for accumulate() to be promoted to a real itertool (fast, tested, documented C code with a stable API).  I agreed and accumulate() was added to itertools in 3.2.  It worked with anything supporting __add__, including str, bytes, lists, and tuples.  More specifically, accumulate_next() called PyNumber_Add() without any particular type restriction.

Subsequently, I got requests to generalize accumulate() to support any arity-2 function (with operator.mul offered as the motivating example).  Given that there were user requests and there were ample precedents in other languages, I acquiesced despite having some reservations (if used with a lambda, the function call overhead might make accumulate() slower than a plain Python for-loop without the function call). So, that generalized API extension went into 3.3 and has remained unchanged ever since.
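To make that history concrete, here is a small sketch of my own (not code from the thread) showing both the original default-add behavior and the 3.3 generalization:

    from itertools import accumulate
    from operator import mul

    # 3.2 behavior: the default is addition, so anything supporting __add__ works.
    list(accumulate([1, 2, 3, 4]))                # [1, 3, 6, 10]
    list(accumulate(['a', 'b', 'c']))             # ['a', 'ab', 'abc']
    list(accumulate([0.10, 0.25, 0.65]))          # cumulative weights for a CDF

    # 3.3 generalization: any arity-2 function, e.g. a running product.
    list(accumulate([1, 2, 3, 4], mul))           # [1, 2, 6, 24]

    # The reservation: with a lambda, per-item call overhead can make this
    # slower than the equivalent plain for-loop.
    list(accumulate([1, 2, 3, 4], lambda x, y: x * y))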

Afterwards, I was greeted with the sound of crickets.  Either it was nearly perfect or no one cared or both ;-)  

It remains one of the least used itertools.


> Given that it was, `sum()` is no longer particularly relevant:  the
> closest thing by far is now `functools.reduce()`, which does support
> an optional `initial` argument.  Which it really should, because it's
> impossible for the implementation to guess a suitable starting value
> for an arbitrary user-supplied dyadic function.
> 
> My example using accumulate() to generate list prefixes got snipped,
> but same thing there:  it's impossible for that snippet to work unless
> an empty list is supplied as the starting value.  And it's impossible
> for the accumulate() implementation to guess that.

Honestly, I couldn't immediately tell what this code was doing:

    list(accumulate([8, 4, "k"], lambda x, y: x + [y], first_result=[]))

This may be a case where a person would be better off without accumulate() at all.
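For comparison, the plain-Python version is short and, to my eyes, easier to follow (a sketch):

    def prefixes(iterable):
        # Yields [], [x0], [x0, x1], ... -- the same result as the
        # accumulate() call above, with no start-value argument needed.
        result = []
        yield result[:]
        for x in iterable:
            result.append(x)
            yield result[:]

    # list(prefixes([8, 4, "k"]))  ->  [[], [8], [8, 4], [8, 4, 'k']]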


> In short, for _general_ use `accumulate()` needs `initial` for exactly
> the same reasons `reduce()` needed it.

The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category.  But yes, there may be wisdom there.


> BTW, the type signatures on the scanl (requires an initial value) and
> scanl1 (does not support an initial value) implementations I pasted
> from Haskell's Standard Prelude give a deeper reason:  without an
> initial value, a list of values of type A can only produce another
> list of values of type A via scanl1.  The dyadic function passed must
> map As to As.  But with an initial value supplied of type B, scanl can
> transform a list of values of type A to a list of values of type B.
> While that may not have been obvious in the list prefix example I
> gave, that was at work:  a list of As was transformed into a list _of_
> lists of As.  That's impossible for scanl1 to do, but easy for scanl.

Thanks for pointing that out.  I hadn't considered that someone might want to transform one type into another using accumulate().  That is pretty far from my mental model of what accumulate() was intended for.  Also, I'm still not sure whether we would want code like that buried in an accumulate() call rather than as a regular for-loop where I can see the logic and trace through it with pdb.

As for scanl, I'm not sure what this code means without seeing a Python equivalent.

    scanl            :: (a -> b -> a) -> a -> [b] -> [a]
    scanl f q xs     =  q : (case xs of
                               []   -> []
                               x:xs -> scanl f (f q x) xs)


    scanl1           :: (a -> a -> a) -> [a] -> [a]
    scanl1 f (x:xs)  =  scanl f x xs
    scanl1 _ []      =  []
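
My best guess at a Python translation is something like this sketch:

    def scanl(f, q, xs):
        # Yields q, f(q, x0), f(f(q, x0), x1), ...  Note that the results
        # may be a different type than the elements of xs.
        yield q
        for x in xs:
            q = f(q, x)
            yield q

    def scanl1(f, xs):
        # No initial value: an empty input yields nothing, and the result
        # type must match the element type.
        it = iter(xs)
        try:
            q = next(it)
        except StopIteration:
            return
        yield from scanl(f, q, it)

    # list(scanl(lambda acc, x: acc + [x], [], [8, 4, "k"]))
    #                                     ->  [[], [8], [8, 4], [8, 4, 'k']]
    # list(scanl1(lambda a, b: a + b, [1, 2, 3]))  ->  [1, 3, 6]
    # scanl1(f, xs) behaves like accumulate(xs, f); scanl(f, q, xs) is
    # accumulate with the proposed start value q.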


> Or, in short, someone coming from a typed functional language
> background sees all sorts of things that rarely (if ever) come up in
> number-crunching languages.  Their sensibilities should count too -
> although not much ;-)  They should get _some_ extra consideration in
> this context, though, because `itertools` is one of the first things
> they dig into when they give Python a try.

I concur.


>> and it would have been distracting to even have had the option.
> 
> Distracting for how long?  One second or two? ;-)

Possibly forever.  In my experience, if a person initially frames a problem wrong (or perhaps in a hard to solve way), it can take them a long time to recover.  For example with discounted cash flows, people who think of the initial value as being special or distinct from the other cash flows will have a hard time adapting to problem variants like annuity due, balloon payments, internal rate of return, coupon stripping, valuing a transaction that takes place in the future, etc.

I don't want to overstate the case, but I do think a function signature that offers a "first_value" option is an invitation to treat the first value as being distinct from the rest of the data stream.
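For instance, a running net present value can treat the initial outlay as just the first element of the stream rather than as a special start value (made-up numbers, purely to illustrate the framing):

    from itertools import accumulate

    rate = 0.08
    cashflows = [-1000, 300, 300, 300, 300]   # year-0 outlay, then receipts
    discounted = (cf / (1 + rate) ** t for t, cf in enumerate(cashflows))
    running_npv = list(accumulate(discounted))
    # The outlay is just cashflows[0]; nothing about it is structurally special.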


> With a different background, you may just as well have been surprised
> if the question _hadn't_ come up.  For example, this is a standard
> example in the Haskell world for how to define an infinite Fibonacci
> sequence with the initial two values f0 and f1:
> 
> fibs = f0 : scanl (+) f1 fibs
> 
> The part from `scanl` onward would be spelled in Python as
> 
>    accumulate(fibs, initial=f1)
> 
> although it requires some trickery to get the recursive reference to
> work (details on request, but I'm sure you already know how to do
> that).

Do we want the tool to encourage such trickery?

Don't get me wrong, I think it is cool that you could write such code, but could and should aren't always the same.
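
For the record, the trickery in question would look something like this (my sketch, assuming accumulate() grows the proposed start/initial argument):

    from itertools import accumulate, islice
    from operator import add

    def fibs(f0=0, f1=1):
        # Python spelling of Haskell's  fibs = f0 : scanl (+) f1 fibs
        produced = []                 # fibs' own output, re-read one step behind

        def replay():
            i = 0
            while True:
                yield produced[i]     # safe: output stays two elements ahead
                i += 1

        def gen():
            yield f0
            yield from accumulate(replay(), add, initial=f1)

        for value in gen():
            produced.append(value)
            yield value

    # list(islice(fibs(), 10))  ->  [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]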


>> When writing itertools.accumulate(), I started by looking to see what
>> other languages had done.  Since accumulate() is primarily a
>> numerical tool, I expected that the experience of numeric-centric
>> languages would have something to teach us.  My reasoning was
>> that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica,
>> perhaps it really was just noise.
> 
> In the itertools context, I also would have looked hard at Haskell experience.


Haskell probably is a good source of inspiration, but I don't know the language and find its docs to be inscrutable.

So, I have to just trust when you say something like,
"""
Haskell has millions of functions ;-)  `mapAccumL` is a God-awful
mixture of Python's map(), reduce(), and accumulate() :-(  The
examples here should convince you it's nearly incomprehensible:
"""

In fact, yes, you've convinced me that there is an intelligibility issue ;-)


> That would be better on several counts to my eyes as:
> 
>    inputs = repeat(None, 35)  # no values actually used
>    ... for x in accumulate(inputs, logistic_map, initial=x0)
> 
> In particular, filling inputs with `None` would lead to an exception
> if the programmer screwed up and the input values actually _were_
> being used.  I expect we'll both overlook that writing a generator
> using the obvious loop would be a lot clearer regardless ;-)
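
For reference, the plain-generator version Tim alludes to would be roughly this sketch (r = 3.8 is just an illustrative parameter):

    def logistic_iterates(x0, r=3.8, n=35):
        # The "obvious loop": no accumulate() and no dummy input stream needed.
        x = x0
        for _ in range(n):
            yield x
            x = r * x * (1 - x)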

The winks make reading your posts fun, but I really can't tell whether your position is, "yes, let's do this because someone can do wild things with it", or "no, let's not do this because people would commit atrocities with it".


>> and "Will anyone actually use it?".
> 
> As above, the docs could change to use it.  And I bet the test suite
> too.  How much more could you want from a feature?! ;-)

I'm concerned that the total number of actual users will be exactly two (you and the writer of the wheel-sieve) and that you each would have used it exactly once in your life.  That's a pretty small user base for a standard library feature ;-)

Tim, if you could muster an honest-to-goodness, real +1, that would be good enough for me.  Otherwise, I'm back to -0 and prefer not to see Pythonistas writing the Haskell magics described in this thread.

If this does go forward, I would greatly prefer "start" rather than "first_value" or "initial".  

The conversation has been enjoyable (perhaps because the stakes are so low) and educational (I learn something new every time you post). 

I'll close with a few random thoughts on itertools that don't seem to fit anywhere else.

1) When itertools was created, it was one of the easiest ways to get C-like performance without writing C.  However, once PyPy matured, we got other ways to do that.  And in the world of PyPy, plain Python for-loops outperform their iterator-chain equivalents, so we lost one motivation to use itertools.

2) While I personally like function chains operating on iterators, my consulting and teaching experience has convinced me that very few people think that way.  Accordingly, I almost never use compress, filterfalse, takewhile, dropwhile, etc.  As people started adopting PEP 289 generator expressions, interest in itertools-style thinking seems to have waned.
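To illustrate the stylistic difference in point 2 (a small sketch):

    from itertools import takewhile

    prices = [3, 5, 6, 12, 4, 2]

    # itertools-style function chain
    leading = list(takewhile(lambda p: p < 10, prices))

    # the style most people actually reach for
    leading2 = []
    for p in prices:
        if p >= 10:
            break
        leading2.append(p)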

Putting these two together has left me with a preference for itertools to only cover the simplest and most common cases, leaving the rest to be expressed as plain, everyday pure python.  (The combinatoric itertools are an exception because they are more algorithmically interesting).


Raymond