
[Tim]
>> The Bayesian world view isn't much different except they would prefer "prior" instead of "initial" or "start" ;-)
>>
>>     my_changing_beliefs = accumulate(stream_of_new_evidence, bayes_rule, prior=what_i_used_to_think)

[Raymond]
> Though the two analogies are cute, I'm not sure they tell us much. In ...

They're not intended to. They're just intended to shake people loose from picturing nothing subtler than adding a list of 3 integers ;-)
> My own experience with actually using accumulations in algorithmic code falls neatly into two groups. Many years ago, I used APL extensively in accounting work, and my recollection is that part of the convenience of "+\" was that the sequence length didn't change (so that the various data arrays could interoperate with one another).
Sure.
> My other common case for accumulate() is building cumulative probability distributions from probability mass functions (see the code for random.choices() for example, or typical code for a K-S test).
So, a question: why wasn't itertools.accumulate() written to accept iterables of _only_ numeric types? Akin to `sum()`. I gather from one of Nick's messages that it was so restricted in 3.2. Then why was it generalized to allow any 2-argument function?

Given that it was, `sum()` is no longer particularly relevant: the closest thing by far is now `functools.reduce()`, which does support an optional initial value. Which it really should, because it's impossible for the implementation to guess a suitable starting value for an arbitrary user-supplied dyadic function. My example using accumulate() to generate list prefixes got snipped, but the same thing applies there: it's impossible for that snippet to work unless an empty list is supplied as the starting value. And it's impossible for the accumulate() implementation to guess that.

In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.

BTW, the type signatures on the scanl (requires an initial value) and scanl1 (does not support an initial value) implementations I pasted from Haskell's Standard Prelude give a deeper reason: without an initial value, a list of values of type A can only produce another list of values of type A via scanl1. The dyadic function passed must map As to As. But with an initial value of type B supplied, scanl can transform a list of values of type A into a list of values of type B. While that may not have been obvious in the list prefix example I gave, it was at work there: a list of As was transformed into a list _of_ lists of As. That's impossible for scanl1 to do, but easy for scanl.

Or, in short, someone coming from a typed functional language background sees all sorts of things that rarely (if ever) come up in number-crunching languages. Their sensibilities should count too - although not much ;-) They should get _some_ extra consideration in this context, though, because `itertools` is one of the first things they dig into when they give Python a try.
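To make the type point concrete: the Prelude signatures are `scanl :: (b -> a -> b) -> b -> [a] -> [b]` versus `scanl1 :: (a -> a -> a) -> [a] -> [a]`, and the snipped prefix example was essentially the following - just a sketch, written as if accumulate() already accepted the proposed `initial=` keyword, with made-up sample data:

    from itertools import accumulate

    xs = [8, 4, 6]
    # Build every prefix of xs, shortest first.  The seed _must_ be the
    # empty list - there's no way for accumulate() to guess that.
    prefixes = accumulate(xs, lambda prefix, x: prefix + [x], initial=[])
    print(list(prefixes))   # [[], [8], [8, 4], [8, 4, 6]]

A list of ints goes in and a list of lists of ints comes out - exactly the type-changing trick scanl1 can't express.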
> For neither of those use case categories did I ever want an initial value
As above, in all your related experiences "0" was a suitable base value, so you had no reason to care.
> and it would have been distracting to even have had the option.
Distracting for how long? One second or two? ;-)
> ... Because of this background, I was surprised to have the question ever come up at all (other than the symmetry argument that sum() has "start" so accumulate() must as well).
As above, the real parallel here is to reduce(). `sum()` became an historical red herring when `accumulate()` was generalized.

With a different background, you may just as well have been surprised if the question _hadn't_ come up. For example, this is a standard example in the Haskell world for how to define an infinite Fibonacci sequence with the initial two values f0 and f1:

    fibs = f0 : scanl (+) f1 fibs

The part from `scanl` onward would be spelled in Python as

    accumulate(fibs, initial=f1)

although it requires some trickery to get the recursive reference to work (details on request, but I'm sure you already know how to do that).
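For anyone else following along, here's one way the trickery might go - just a sketch, again written as if the proposed `initial=` argument existed, and the feedback loop via tee() is only one possible spelling:

    from itertools import accumulate, islice, tee
    from operator import add

    def fibs(f0=0, f1=1):
        # Python spelling of Haskell's  fibs = f0 : scanl (+) f1 fibs.
        # The stream consumes a tee'd copy of its own output ("feedback"),
        # which always lags two elements behind, so tee's buffer stays tiny.
        def gen():
            yield f0
            yield from accumulate(feedback, add, initial=f1)
        out, feedback = tee(gen())
        return out

    print(list(islice(fibs(), 10)))   # 0 1 1 2 3 5 8 13 21 34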
> When writing itertools.accumulate(), I started by looking to see what other languages had done. Since accumulate() is primarily a numerical tool, I expected that the experience of numeric-centric languages would have something to teach us. My reasoning was that if the need hadn't arisen for APL, R, Numpy, Matlab, or Mathematica, perhaps it really was just noise.
In the itertools context, I also would have looked hard at Haskell experience.

BTW, whoever wrote the current `accumulate()` docs also found a use for `initial`, but hacked around it:

"""
First-order recurrence relations can be modeled by supplying the initial value in the iterable and using only the accumulated total in func argument:
"""

followed by:
    logistic_map = lambda x, _: r * x * (1 - x)
    r = 3.8
    x0 = 0.4
    inputs = repeat(x0, 36)     # only the initial value is used
    [format(x, '.2f') for x in accumulate(inputs, logistic_map)]
That would be better on several counts to my eyes as:

    inputs = repeat(None, 35)   # no values actually used
    ... for x in accumulate(inputs, logistic_map, initial=x0)

In particular, filling inputs with `None` would lead to an exception if the programmer screwed up and the input values actually _were_ being used. I expect we'll both overlook that writing a generator using the obvious loop would be a lot clearer regardless ;-)
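And, for the record, the "obvious loop" version - a sketch of what I mean, nothing clever:

    def logistic(r, x, n):
        # Yield x0 and then successive iterates of the logistic map.
        for _ in range(n):
            yield x
            x = r * x * (1 - x)

    [format(x, '.2f') for x in logistic(3.8, 0.4, 36)]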
> My views may be dated though. Looking at the wheel sieve and Collatz glide record finder, I see something new, a desire to work with lazy, potentially infinite accumulations (something that iterators do well but almost never arises in the world of fixed-length sequences or cumulative probability distributions).
Amazingly enough, those are both just doing integer running sums - all the stuff above about catering to general types doesn't apply to them. `initial` isn't needed for _correctness_ here; it would just add convenience and some clarity.
> So I had been warming up to the idea, but got concerned that Nick could have had such a profoundly different idea about what the code should do.
He appeared to withdraw his objections after agreement was reached _not_ to name the argument "start". Something about `sum()` also having an optional argument named "start" caused confusion.
> That cooled my interest a bit, especially when thinking about two key questions, "Will it create more problems than it solves?"
Like? It's an optional argument. One brief example in the docs is all it takes to understand what it does.
    list(accumulate([1, 2, 3]))
    [1, 3, 6]

    list(accumulate([1, 2, 3], initial=10))
    [10, 11, 13, 16]
and "Will anyone actually use it?".
As above, the docs could change to use it. And I bet the test suite too. How much more could you want from a feature?! ;-)