
[Tim]
Then why was [accumulate] generalized to allow any 2-argument function?
[Raymond]
Prior to 3.2, accumulate() was in the recipes section as pure Python code. It had no particular restriction to numeric types.
I received a number of requests for accumulate() to be promoted to a real itertool (fast, tested, documented C code with a stable API). I agreed and accumulate() was added to itertools in 3.2. It worked with anything supporting __add__, including str, bytes, lists, and tuples.
So that's the restriction Nick had in mind: a duck-typing kind, in that it would blow up on types that don't participate in PyNumber_Add():
More specifically, accumulate_next() called PyNumber_Add() without any particular type restriction.
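For example, both of these worked then and still do, since str and tuple each define __add__:

    from itertools import accumulate

    list(accumulate(["a", "b", "c"]))       # ['a', 'ab', 'abc']
    list(accumulate([(1,), (2,), (3,)]))    # [(1,), (1, 2), (1, 2, 3)]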
Subsequently, I got requests to generalize accumulate() to support any arity-2 function (with operator.mul offered as the motivating example).
Sucker ;-)
Given that there were user requests and there were ample precedents in other languages, I acquiesced despite having some reservations (if used with a lambda, the function call overhead might make accumulate() slower than a plain Python for-loop without the function call). So, that generalized API extension went into 3.3 and has remained unchanged ever since.
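Spelled out, the motivating example was just running products:

    from itertools import accumulate
    from operator import mul

    list(accumulate([1, 2, 3, 4, 5], mul))  # [1, 2, 6, 24, 120]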
Afterwards, I was greeted with the sound of crickets. Either it was nearly perfect or no one cared or both ;-)
Or nobody cared _enough_ to endure a 100-message thread arguing about an objectively minor change ;-) If you can borrow Guido's time machine, I'd suggest going back to the original implementation, except name it `cumsum()` instead, and leave `accumulate()` to the 3rd-party itertools packages (at least one of which (itertoolz) has supported an optional "initial" argument all along).
It remains one of the least used itertools.
I don't see how that can be known. There are at least tens of thousands of Python programmers nobody on this list has ever heard about - or from - writing code that's invisible to search engines. I _believe_ it - I just don't see how it can be known.
... Honestly, I couldn't immediately tell what this code was doing:
list(accumulate([8, 4, 6], lambda x, y: x + [y], first_result=[]))
Of course you couldn't: you think of accumulate() as being about running sums, and _maybe_ some oddball out there using it for running products. But that's a statement about your background, seeing code you've never seen before, not about the function. Nobody knows immediately, at first sight, what list(accumulate([8, 4, 6], lambda x, y: x + y, first_result=0)) does either. It's learned. If your background were in, e.g., Haskell instead, then in the latter case you'd picture a list [a, b, c, ...] and figure it out from thinking about what the prefixes of 0 + a + b + c + .... compute. In exactly the same way, in the former case you'd think about what the prefixes of [] + [a] + [b] + [c] + ... compute. They're equally obvious _after_ undertaking that easy exercise, but clear as mud before doing so.
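Spelled with today's accumulate (chaining the start value on as the first element), the exercise works out to:

    from itertools import accumulate, chain

    xs = [8, 4, 6]
    list(accumulate(chain([[]], xs), lambda acc, y: acc + [y]))
    # [[], [8], [8, 4], [8, 4, 6]] - the prefixes of [] + [8] + [4] + [6]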
This may be a case where a person would be better-off without accumulate() at all.
De gustibus non est disputandum.
In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.
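The parallel in miniature:

    from functools import reduce
    from operator import add

    reduce(add, [1, 2, 3])   # 6
    reduce(add, [], 0)       # 0 - only possible because an initial value can be given
    # reduce(add, [])        # TypeError: empty iterable and no initial value
    # likewise, accumulate([], add, initial=0) would sanely yield [0],
    # where today's accumulate([], add) yields nothing at all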
The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there.
The current accumulate() isn't just akin to reduce(), it _is_ reduce(), except a drunken reduce() so nauseated it vomits its internal state out after each new element it eats ;-)
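Or, in code:

    from functools import reduce
    from itertools import accumulate
    from operator import add

    xs = [8, 4, 6]
    reduce(add, xs)            # 18 - only the final internal state
    list(accumulate(xs, add))  # [8, 12, 18] - every intermediate state, reduce's answer last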
BTW, the type signatures on the scanl (requires an initial value) and scanl1 (does not support an initial value) implementations I pasted from Haskell's Standard Prelude give a deeper reason: without an initial value, a list of values of type A can only produce another list of values of type A via scanl1. The dyadic function passed must map As to As. But with an initial value supplied of type B, scanl can transform a list of values of type A to a list of values of type B. While that may not have been obvious in the list prefix example I gave, that was at work: a list of As was transformed into a list _of_ lists of As. That's impossible for scanl1 to do, but easy for scanl.
Thanks for pointing that out. I hadn't considered that someone might want to transform one type into another using accumulate(). That is pretty far from my mental model of what accumulate() was intended for.
It's nevertheless what the current function supports - nothing being suggested changes that one whit. It's "worse" in Python because while only `scanl` in Haskell can "change types", the current `scanl1`-like Python `accumulate` can change types too. Perhaps the easiest way to see that is by noting that map(f, xs) is generally equivalent to accumulate(xs, lambda x, y: f(y)) right now. That is, just ignore the "accumulator" argument, and you're left with a completely arbitrary transformation of the elements. If you like, you can, e.g., use accumulate() right now to generate a sequence of socket connections from a sequence of deques.

That can't be done by Haskell's scanl1, because that language's strict static type system can't express the notion that "well, given a list-of-A, the accumulation function f has arguments of types A and A on the first call, but types type-of-f(A) and A on subsequent calls". By supplying an initial value of type B, `scanl`'s accumulation function returns type B and always has arguments of type B and A. The type system has no problem with that.
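To make that concrete:

    from itertools import accumulate

    xs = [8, 4, 6]
    list(accumulate(xs, lambda acc, y: str(y)))
    # [8, '4', '6'] - the accumulator is ignored and the element types change mid-stream;
    # the first element slips through untouched, which is why it's only "generally"
    # equivalent to map()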
Also, I'm still not sure whether we would want code like that buried in an accumulate() call rather than as a regular for-loop where I can see the logic and trace through it with pdb.
De gustibus non est disputandum again ;-) These kinds of arguments belong in a style guide, wiki, tutorial, or nagging Stackoverflow answer. They really - to me - have nothing to do with the issue at hand. Again, nothing above depends in any way on whether or not accumulate grows an optional argument. All the things you see as "bad" are already not just possible, but easy.
As for scanl, I'm not sure what this code means without seeing some python equivalent.
    scanl :: (a -> b -> a) -> a -> [b] -> [a]
    scanl f q xs = q : (case xs of
                          []   -> []
                          x:xs -> scanl f (f q x) xs)

    scanl1 :: (a -> a -> a) -> [a] -> [a]
    scanl1 f (x:xs) = scanl f x xs
    scanl1 _ []     = []
My apologies for that - I assumed you had a reading knowledge of Haskell, and were just unaware of these specific functions. The details don't really matter. You can just take my word for it that `scanl1` is a type-safe-restricted form of Python's current `accumulate`, and that the more basic `scanl` is like what `accumulate` would be if an initial value were a _required_ argument.

BTW, Haskell has no native notion of optional (or default, or variable, or keyword) arguments. All functions are "curried": they have exactly one argument. So, e.g., while there's generally no harm in viewing the Haskell

    f x y

as being like the Python

    f(x, y)

it's _really_ more like

    functools.partial(f, x)(y)

In any case, "all functions have exactly one argument" is why, in Haskell, even the tiniest variation in functionality is usually implemented by creating a function with a new name for each variation, rather than piling on ever-growing sequences of required flag arguments.

I'll also note that `scanl` and `scanl1` are defined in the Standard Prelude, which is akin to being a Python builtin: they're part of the core language, always available. As such, every experienced Haskell programmer is aware of them.
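If a Python rendering helps, here's a rough sketch of the pair (written as generators; the names just mirror the Haskell):

    def scanl(f, q, xs):
        # requires an initial value q; the result type can differ from the element type
        yield q
        for x in xs:
            q = f(q, x)
            yield q

    def scanl1(f, xs):
        # no initial value; behaves like today's accumulate(xs, f)
        it = iter(xs)
        try:
            first = next(it)
        except StopIteration:
            return              # empty input -> empty output
        yield from scanl(f, first, it)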
and it would have been distracting to even have had the option.
Distracting for how long? One second or two? ;-)
Possibly forever. In my experience, if a person initially frames a problem wrong (or perhaps in a hard to solve way), it can take them a long time to recover. For example with discounted cash flows, people who think of the initial value as being special or distinct from the other cash flows will have a hard time adapting to problem variants like annuity due, balloon payments, internal rate of return, coupon stripping, valuing a transaction that takes place in the future, etc.
I don't want to overstate the case, but I do think a function signature that offers a "first_value" option is an invitation to treat the first value as being distinct from the rest of the data stream.
For a _general_ accumulate, which we already have, _sometimes_ the first value really is distinct. Greg Ewing recently made that point eloquently, so I won't elaborate. But again, one example in the docs is all it takes:
    >>> list(accumulate([1, 2, 3]))
    [1, 3, 6]
    >>> list(accumulate([1, 2, 3], initial=10))
    [10, 11, 13, 16]
99% of programmers will see that, think "huh! why would I want initial=?" and never give it another thought. 0.001% of the remaining 1% will ask on Stackoverflow, and get an answer showing "advanced" uses for accumulate. The remaining 0.999% of the remaining 1% will eventually find one of those answers.
With a different background, you may just as well have been surprised if the question _hadn't_ come up. For example, this is a standard example in the Haskell world for how to define an infinite Fibonacci sequence with the initial two values f0 and f1:
fibs = f0 : scanl (+) f1 fibs
The part from `scanl` onward would be spelled in Python as
accumulate(fibs, initial=f1)
although it requires some trickery to get the recursive reference to work ...
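For the record, one way the trickery can be pulled off (a sketch, not a recommendation; the helper names are mine):

    from itertools import accumulate, chain, tee
    from operator import add

    def fibs(f0=0, f1=1):
        # fibs = f0 : scanl (+) f1 fibs, more or less literally: tee() the output
        # stream so accumulate() can read a copy of it, lagging safely behind
        def gen():
            yield f0
            # with the proposed argument: accumulate(lagged, add, initial=f1)
            yield from accumulate(chain([f1], lagged), add)
        out, lagged = tee(gen())
        return out

    # list(islice(fibs(), 10)) -> [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]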
Do we want the tool to encourage such trickery?
Don't get me wrong, I think it is cool that you could write such code, but could and should aren't always the same.
Sorry, the original point got lost in the weeds there: the point isn't that such code is desirable, it's just that Haskell programmers _are_ aware of scanl, and that one of its very-well-known toy uses exploits that it specifies an initial value. So _if_ you had that background too, it would be natural as death to wonder "huh - so how come Python's accumulate _doesn't_ have such an argument?".
... Haskell probably is a good source of inspiration, but I don't know the language and find its docs to be inscrutable.
That's what happens when a worldwide network of obsessed PhDs struggles for decades to push the state of the art ;-)
... That would be better on several counts to my eyes as:
    inputs = repeat(None, 35)   # no values actually used
    ...
    for x in accumulate(inputs, logistic_map, initial=x0)
In particular, filling inputs with `None` would lead to an exception if the programmer screwed up and the input values actually _were_ being used. I expect we'll both overlook that writing a generator using the obvious loop would be a lot clearer regardless ;-)
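Concretely, with r and x0 picked arbitrarily (the assert just checks that the two spellings agree, assuming the initial= argument under discussion):

    from itertools import accumulate, repeat

    r, x0 = 3.8, 0.4
    logistic_map = lambda x, _: r * x * (1 - x)   # ignores the input element entirely

    # today: the seed is smuggled in as the first (and only meaningful) input value
    out_now = list(accumulate(repeat(x0, 36), logistic_map))

    # proposed: the inputs carry no information at all, only a count
    out_new = list(accumulate(repeat(None, 35), logistic_map, initial=x0))

    assert out_now == out_new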
The winks may make reading your posts fun, but I really can't tell whether your position is, "yes, let's do this because someone can do wild things with it", or "no, let's not do this because people would commit atrocities with it".
I mean both, of course ;-) The world is absurd, and often all I can do is chuckle and accept that all options suck in their own endearing ways.

Did you write those docs? If so, that's one of life's absurdities: half of the accumulate() docs are devoted to giving a weird example of an application that ignores the input sequence values entirely, taking from the input sequence _only_ its total length. I enjoyed the example, but I can't believe you'd _recommend_ it as good practice. It's just "a trick" you _can_ do, like (as above) you _can_ ignore the accumulator argument to make accumulate() work like map() instead.
and "Will anyone actually use it?".
As above, the docs could change to use it. And I bet the test suite too. How much more could you want from a feature?! ;-)
I'm concerned that the total number of actual users will be exactly two (you and the writer of the wheel-sieve) and that you each would have used it exactly once in your life. That's a pretty small user base for a standard library feature ;-)
You can't know that, though. It's not my only use, but it's the only use I wrote about because I had just done it the day before this thread started. I also have a prime-generating function inspired by Will Ness's. BTW, that's a beautiful thing in real life, but not primarily because of the wheel gimmicks: you don't have to specify an upper limit in advance (or ever), but it nevertheless consumes memory proportional to the number of primes less than the _square root_ of the largest prime you've generated so far. That's easy if you know the upper limit in advance, but not if you don't. Will's solution to _that_ part is clever, elegant, and eminently practical. In any case, Will is "a functional language" person, and didn't even know itertools existed. Janne Karila pointed it out to him, and Will was like a kid in a candy store. I expect (but don't know) _he_ wondered "huh - how come accumulate() doesn't have an initial value like scanl has?", but rather than gripe about it on a Python list created the "chain a singleton list" workaround instead. Regardless, I wouldn't be surprised a bit if Janne Karila also had similar code - or any number of other Python programmers we don't know about writing code we'll never see.
Tim, if you could muster an honest to goodness, real +1, that would be good enough for me.
On purely _technical_ grounds, given that accumulate() is already thoroughly general, I think adding an optional start value is not just a +1, but a no-brainer. I've still seen no technical argument against it, and several sound technical arguments in its favor have been made by several people. OTOH ... meh. There appears to be scant demand, and code churn also carries costs. I've already wormed around it, so it doesn't scratch any practical itch I still have. It's your job to worry about future generations ;-)
Otherwise, I'm back to -0 and prefer not to see Pythonistas writing the Haskell magics described in this thread.
Whether or not the argument is added is simply irrelevant to what's possible to do with accumulate() already - it just affects how transparently a special initial value can be supplied when one is desired. In all of the (few!) "real life" current uses you've seen, it would obviously make already-sane code simpler & clearer. Against that, worrying about masses of Pythonistas who _could_ make absurd (in Python) code a bit _less_ absurd just doesn't engage me.
If this does go forward, I would greatly prefer "start" rather than "first_value" or "initial".
Bike-shedding the name holds little interest for me. Against "start", Nick vehemently objects to that name. I _expected_ that you would too, because you've generally seen _no_ value to specifying an initial value for sums, and "start" is the name `sum()` gives to its optional starting value.

In favor of "initial", the current accumulate() _is_ a generalization not of sum() but of reduce(), and:

    """
    Help on built-in function reduce in module _functools:

    reduce(...)
        reduce(function, sequence[, initial]) -> value
    """

That is, the reduce docstring has used "initial" forever. And there's also that the itertoolz accumulate() already has the feature in question, and named its argument "initial".
The conversation has been enjoyable (perhaps because the stakes are so low) and educational (I learn something new every time you post).
Yup, I too prefer fighting to the death over things that don't matter ;-)
I'll leave this with a few random thoughts on itertools that don't seem to fit anywhere else.
1) When itertools was created, it was one of the easiest ways to get C-like performance without writing C. However, when PyPy matured we got other ways to do it. And in the world of PyPy, plain Python for-loops outperform their iterator chain equivalents, so we lost one motivation to use itertools.
I should try PyPy again. I got out of the habit years ago, when I kept running out of RAM. I have a lot more RAM now ;-)
2) While I personally like function chains operating on iterators, my consulting and teaching experience has convinced me that very few people think that way.
Matches my less extensive experience.
Accordingly, I almost never use compress, filterfalse, takewhile, dropwhile, etc.
I've never really used any of those, except once each when they were new. I'm aware of them, but they just never seem to fit the problem at hand. So, just to keep things interesting, let's add an optional `start=` argument to _all_ itertools functions ;-)
As people started adopting PEP 289 generator expressions, interest in itertool-style thinking seems to have waned.
Putting these two together has left me with a preference for itertools to only cover the simplest and most common cases, leaving the rest to be expressed as plain, everyday pure python. (The combinatoric itertools are an exception because they are more algorithmically interesting).
Besides all the combinatoric ones, these are the ones I'd sorely miss:

    count
    cycle
    repeat
    accumulate
    chain[.from_iterable]
    islice
    tee

I'm surprised I left groupby() off that list! I always liked that one "in theory", but in practice - for whatever reasons - whenever I use it I usually end up rewriting the code, in such a way that groupby() no longer makes sense to use. Curious!