Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]
On Friday, April 6, 2018 at 8:14:30 AM UTC-7, Guido van Rossum wrote:
On Fri, Apr 6, 2018 at 7:47 AM, Peter O'Connor <peter.ed...@gmail.com> wrote:
So some more humble proposals would be:
1) An initializer to itertools.accumulate
functools.reduce already has an initializer; I can't see any controversy in adding an initializer to itertools.accumulate
See if that's accepted in the bug tracker.
It did come up once but was closed for a number of reasons, including lack of use cases. However, Peter's signal processing example does sound interesting, so we could re-open the discussion.

For those who want to think through the pluses and minuses, I've put together a Q&A as food for thought (see below). Everybody's design instincts are different -- I'm curious what you all think about the proposal.

Raymond

---------------------------------------------

Q. Can it be done?
A. Yes, it wouldn't be hard.

    _sentinel = object()

    def accumulate(iterable, func=operator.add, start=_sentinel):
        it = iter(iterable)
        if start is _sentinel:
            try:
                total = next(it)
            except StopIteration:
                return
        else:
            total = start
        yield total
        for element in it:
            total = func(total, element)
            yield total

Q. Do other languages do it?
A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.

    * http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.h...
    * https://stat.ethz.ch/R-manual/R-devel/library/base/html/cumsum.html
    * http://microapl.com/apl/apl_concepts_chapter5.html
          \+ 1 2 3 4 5
        1 3 6 10 15
    * https://reference.wolfram.com/language/ref/Accumulate.html
    * https://www.haskell.org/hoogle/?hoogle=mapAccumL

Q. How much work for a person to do it currently?
A. Almost zero effort to write a simple helper function:

    myaccum = lambda it, func, start: accumulate(chain([start], it), func)

Q. How common is the need?
A. Rare.

Q. Which would be better, a simple for-loop or a customized itertool?
A. The itertool is shorter but more opaque (especially with respect to the argument order for the function call):

    result = [start]
    for x in iterable:
        y = func(result[-1], x)
        result.append(y)

   versus:

    result = list(accumulate(iterable, func, start=start))

Q. How readable is the proposed code?
A. Look at the following code and ask yourself what it does:

    accumulate(range(4, 6), operator.mul, start=6)

   Now test your understanding:

       How many values are emitted?
       What is the first value emitted?
       Are the two sixes related?
       What is this code trying to accomplish?

Q. Are there potential surprises or oddities?
A. Is it readily apparent which of these assertions will succeed?

    a1 = sum(range(10))
    a2 = sum(range(10), 0)
    assert a1 == a2

    a3 = functools.reduce(operator.add, range(10))
    a4 = functools.reduce(operator.add, range(10), 0)
    assert a3 == a4

    a5 = list(accumulate(range(10), operator.add))
    a6 = list(accumulate(range(10), operator.add, start=0))
    assert a5 == a6

Q. What did the Python 3.0 Whatsnew document have to say about reduce()?
A. "Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."

Q. What would this look like in real code?
A. We have almost no real-world examples, but here is one from a StackExchange post:

    def wsieve():       # wheel-sieve, by Will Ness.  ideone.com/mqO25A->0hIE89
        wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
                 4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
        cs = accumulate(cycle(wh11), start=11)
        yield( next( cs))       # cf. ideone.com/WFv4f
        ps = wsieve()           # codereview.stackexchange.com/q/92365/9064
        p = next(ps)            # 11
        psq = p*p               # 121
        D = dict( zip( accumulate(wh11, start=0), count(0)))   # start from
        sieve = {}
        for c in cs:
            if c in sieve:
                wheel = sieve.pop(c)
                for m in wheel:
                    if not m in sieve:
                        break
                sieve[m] = wheel    # sieve[143] = wheel@187
            elif c < psq:
                yield c
            else:          # (c==psq)
                # map (p*) (roll wh from p) = roll (wh*p) from (p*p)
                x = [p*d for d in wh11]
                i = D[ (p-11) % 210]
                wheel = accumulate(cycle(x[i:] + x[:i]), start=psq)
                p = next(ps) ; psq = p*p
                next(wheel) ; m = next(wheel)
                sieve[m] = wheel
[Raymond Hettinger <raymond.hettinger@gmail.com>]
... Q. How readable is the proposed code? A. Look at the following code and ask yourself what it does:
accumulate(range(4, 6), operator.mul, start=6)
Now test your understanding:
How many values are emitted?
3
What is the first value emitted?
6
Are the two sixes related?
No.
What is this code trying to accomplish?
It's quite obviously trying to bias the reader against the proposal by presenting a senseless example ;-)

Assuming there's any real reason to write that code at all, a better question is whether it's more comprehensible than

    accumulate(itertools.chain([6], range(4, 6)), operator.mul)
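For concreteness, here is what either spelling produces - the chain() form runs today, while the start= form is just Raymond's sketch from the top of the thread:

    from itertools import accumulate, chain
    import operator

    list(accumulate(chain([6], range(4, 6)), operator.mul))   # [6, 24, 120] today

    # Under Raymond's proposed start= sketch (not in any released Python):
    # list(accumulate(range(4, 6), operator.mul, start=6))    # [6, 24, 120]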
... Q. What would this look like in real code? A. We have almost no real-world examples, but here is one from a StackExchange post:
def wsieve(): # wheel-sieve, by Will Ness. ideone.com/mqO25A->0hIE89 ...
By sheer coincidence, I happened to write another yesterday. This is from a program looking for the smallest integers that yield new records for Collatz sequence lengths. The details don't matter, except that - like Will Ness's wheel sieve code - it needs to generate an unbounded increasing sequence of integers with a periodic, but complicated, sequence of deltas, starting at a more-or-less arbitrary point.

    def buildtab(SHIFT, LIMIT):
        ...
        # Candidates are of the form i*LIMIT + j, for i >= 1 and j in
        # goodix.  However, a new record can't be set for a number of
        # the form 3k+2:  that's two steps after 2k+1, so the smaller
        # 2k+1 has a glide 2 longer.  We want to arrange to never try
        # numbers of the form 3k+2 to begin with.
        base = 0
        ix2 = []
        for i in range(3):
            base += LIMIT
            for ix in goodix:
                num = base + ix
                if num % 3 != 2:
                    ix2.append(num)
        ix2.append(ix2[0] + 3 * LIMIT)
        assert len(ix2) == 2 * len(goodix) + 1
        del goodix
        deltas = tuple(ix2[i] - ix2[i-1]
                       for i in range(1, len(ix2)))
        return tuple(result), ix2[0], deltas

A note on "complicated": the tuple of deltas here can contain millions of integers, and that's the smallest length at which it becomes periodic.

Later:

    def coll(SHIFT=24):
        ...
        from itertools import accumulate, chain, cycle
        ...
        LIMIT = 1 << SHIFT
        ...
        abc, first, deltas = buildtab(SHIFT, LIMIT)
        ...
        for num in accumulate(chain([first], cycle(deltas))):
            assert num % 3 != 2

As in Will's code, it would be more readable as:

        for num in accumulate(cycle(deltas), start=first):

That says what it does pretty clearly, whereas deducing the behavior from "OK, it's chaining together a singleton list and a cycle, because ...?" is a bit of a head scratcher at first.

That said, if the need came up often, as you noted it's dead easy to write a helper function to encapsulate the "head scratcher" part, and with no significant loss of efficiency.

So I'd be -0 overall, _except_ that "chain together a singleton list and a cycle" is so obscure on the face of it that I'm not sure most programmers who wanted the functionality of `start=` would ever think of it. I'm not sure that I would have, except that I studied Ness's wheel sieve code a long time ago and the idea stuck. So that makes me +0.4.
On Apr 6, 2018, at 9:06 PM, Tim Peters <tim.peters@gmail.com> wrote:
What is this code trying to accomplish?
It's quite obviously trying to bias the reader against the proposal by presenting a senseless example ;-)
FWIW, the example was not from me. It was provided by the OP on the tracker. I changed the start point from 10 to a 6 so it at least made some sense as the continuation of a factorial sequence: 6 24 120
By sheer coincidence, I happened to write another yesterday. This is from a program looking for the smallest integers that yield new records for Collatz sequence lengths.
Nice. That brings the number of real-world examples up to a total of three (collatz, wheel sieve, and signal processing). Prior to today, that total was only one (which was found after much digging).
Later:
    def coll(SHIFT=24):
        ...
        from itertools import accumulate, chain, cycle
        ...
        LIMIT = 1 << SHIFT
        ...
        abc, first, deltas = buildtab(SHIFT, LIMIT)
        ...
        for num in accumulate(chain([first], cycle(deltas))):
            assert num % 3 != 2
As in Will's code, it would be more readable as:
for num in accumulate(cycle(deltas), start=first):
That does read better. I am curious how you would have written it as a plain for-loop before accumulate() was added (part of the argument against reduce() was that a plain for-loop would be clearer 99% of the time).
That said, if the need came up often, as you noted it's dead easy to write a helper function to encapsulate the "head scratcher" part, and with no significant loss of efficiency.
So I'd be -0 overall, _except_ that "chain together a singleton list and a cycle" is so obscure on the face of it that I'm not sure most programmers who wanted the functionality of `start=` would ever think of it. I'm not sure that I would have, except that I studied Ness's wheel sieve code a long time ago and the idea stuck. So that makes me +0.4.
Agreed that the "chain([x], it)" step is obscure. That's a bit of a bummer -- one of the goals for the itertools module was to be a generic toolkit for chopping-up, modifying, and splicing iterator streams (sort of a CRISPR for iterators). The docs probably need another recipe to show this pattern:

    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)

Thanks for taking a look at the proposal. I was -0 when it came up once before. Once I saw a use case pop up on this list, I thought it might be worth discussing again.

Raymond
On 7 April 2018 at 08:44, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Agreed that the "chain([x], it)" step is obscure. That's a bit of a bummer -- one of the goals for the itertools module was to be a generic toolkit for chopping-up, modifying, and splicing iterator streams (sort of a CRISPR for iterators). The docs probably need another recipe to show this pattern:
    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)
Thanks for taking a look at the proposal. I was -0 when it came up once before. Once I saw a use case pop-up on this list, I thought it might be worth discussing again.
I don't have much to add here - I typically agree that an explicit loop is simpler, but my code tends not to be the sort that does this type of operation, so my experience is either where it's not appropriate, or where I'm unfamiliar with the algorithms, so terseness is more of a problem to me than it would be to a domain expert.

Having said that, I find the arguments that it's easy to add and that it broadens the applicability of the function to be significant. Certainly, writing a helper is simple, but as Tim pointed out, the trick to writing that helper is obscure.

Also, in the light of the itertools design goal to be a toolkit for iterators, I often find that the tools are just slightly *too* low level for my use case - they are designed to be combined, certainly, but in practice I find that building my own loop is often quicker than working out how to combine them. (I don't have concrete examples, unfortunately - this feeling comes from working back from the question of why I don't use itertools more than I do). So I tend to favour such slight extensions to the use cases of itertools functions.

A recipe would help, but I don't know how much use the recipes see in practice. I see a lot of questions where "there's a recipe for that" is the answer - indicating that people don't always spot the recipes.

So I guess I'm +0 on the change - but as a matter of principle, rather than from a pressing need for it, so don't take that vote too seriously.

Paul
... [Tim]
Later:
    def coll(SHIFT=24):
        ...
        from itertools import accumulate, chain, cycle
        ...
        LIMIT = 1 << SHIFT
        ...
        abc, first, deltas = buildtab(SHIFT, LIMIT)
        ...
        for num in accumulate(chain([first], cycle(deltas))):
            assert num % 3 != 2
As in Will's code, it would be more readable as:
for num in accumulate(cycle(deltas), start=first):
[Raymond]
That does read better. I am curious how you would have written it as a plain for-loop before accumulate() was added
The original loop was quite different, a nested loop pair reflecting directly that candidates are of the form i * LIMIT + j for i >= 1 and j in goodix:

    for base in itertools.count(LIMIT, LIMIT):
        for ix in goodix:
            num = base + ix
            if num % 3 == 2:
                continue

It was later I noticed that, across every 3 full iterations of the outer loop, exactly one third of the "num % 3 == 2" tests were true. It took some thought & a bit of proof to show that all and only the num % 3 != 2 candidates could be generated directly by the shorter code.

BTW, count(LIMIT, LIMIT) is a bit of a head-scratcher itself ;-)

Without `accumulate()`, I suppose I would have done this instead:

    num = first
    for delta in chain([0], cycle(deltas)):
        num += delta

That's worse to my eyes! The `chain()` trick is still needed, but in this case to inject a 0 delta at the start so that `num` remains `first` across the first iteration.

I should note that this is "a search loop" that rarely finds what it's looking for. There are several places in the body that give up on the current `num` and want to move on to the next candidate. So it's of great pragmatic value that it be written in a way such that a plain `continue` in the body does the right thing. For that reason, I would _not_ have written it as, e.g.,

    num = first
    for delta in cycle(deltas):
        # masses of tests that may want to give up
        # early, excruciatingly nested so that "give up"
        # falls through to the end
        ...
        num += delta
(part of the argument against reduce() was that a plain for-loop would be clearer 99% of the time).
Except that isn't true: 99% of `reduce()` instances were replaced by `sum()` when the latter was introduced :-)

"Sum reduction" and "running-sum accumulation" are primitives in many people's brains. In generalizing those to other dyadic operations, it's the abstraction itself that's responsible for the loss of clarity - now you're building a higher-order functional that's not a primitive in anyone's brain except for Ken Iverson and Kirby Urner ;-) The rest of us are better off seeing the moving pieces in a loop body. But that's simply not so for addition, which is why introducing `sum()` was a great idea.

BTW, note that `sum()` also supports an optional `start=` argument. I expect (but don't know) that `accumulate()` is overwhelmingly used to do running sums (the only use I've had for it), so it's a bit odd on that count that it doesn't.
... Agreed that the "chain([x], it)" step is obscure. That's a bit of a bummer -- one of the goals for the itertools module was to be a generic toolkit for chopping-up, modifying, and splicing iterator streams (sort of a CRISPR for iterators).
I'd say it was overwhelmingly successful at that goal. The rub here appears to be that `x` on its own is not a stream - it has to be wrapped inside an iterable first to play nice with stream-based tools. In a stream-based language (like Haskell), there's usually a "cons" operation built in to prepend a scalar to a stream (in Haskell, `x : it` is pretty much the same as `chain([x], it)`).
The docs probably need another recipe to show this pattern:
    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)
+1. Whether `accumulate()` should grow a `start=` argument still seems a distinct (albeit related) issue to me, though.
On 8 April 2018 at 08:09, Tim Peters <tim.peters@gmail.com> wrote: [Raymond wrote]:
The docs probably need another recipe to show this pattern:
    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)
+1. Whether `accumulate()` should grow a `start=` argument still seems a distinct (albeit related) issue to me, though.
I didn't have a strong opinion either way until Tim mentioned sum() and then I went and checked the docs for both that and for accumulate.

First sentence of the sum() docs:

    Sums *start* and the items of an *iterable* from left to right and returns the total.

First sentence of the accumulate docs:

    Make an iterator that returns accumulated sums, ...

So I now think that having "start" as a parameter to one but not the other counts as a genuine API discrepancy.

Providing start to accumulate would then mean the same thing as providing it to sum(): it would change the basis point for the first addition operation, but it wouldn't change the *number* of cumulative sums produced.

By contrast, using the prepend() approach with accumulate() not only changes the starting value, it also changes the number of cumulative sums produced.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
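To illustrate the distinction being drawn here, a small sketch - the start= outputs below are what each reading would imply, not current behavior:

    from itertools import accumulate

    list(accumulate([1, 2, 3]))     # [1, 3, 6]  (today, no start)

    # sum()-style reading of start= (shifts the basis, same output length):
    #   list(accumulate([1, 2, 3], start=10))  ->  [11, 13, 16]

    # prepend-style reading (Raymond's sketch, one extra output):
    #   list(accumulate([1, 2, 3], start=10))  ->  [10, 11, 13, 16]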
[Nick Coghlan <ncoghlan@gmail.com>]
I didn't have a strong opinion either way until Tim mentioned sum() and then I went and checked the docs for both that and for accumulate.
First sentence of the sum() docs:
Sums *start* and the items of an *iterable* from left to right and returns the total.
First sentence of the accumulate docs:
Make an iterator that returns accumulated sums, ...
So I now think that having "start" as a parameter to one but not the other, counts as a genuine API discrepancy.
Genuine but minor ;-)
Providing start to accumulate would then mean the same thing as providing it to sum(): it would change the basis point for the first addition operation, but it wouldn't change the *number* of cumulative sums produced.
That makes no sense to me. `sum()` with a `start` argument always returns a single result, even if the iterable is empty.
>>> sum([], 42)
42
As the example shows, it's possible that `sum()` does no additions whatsoever. It would be exceedingly bizarre if the same stuff passed to `accumulate()` returned an empty iterator instead:
>>> list(accumulate([], start=42))
[]
It should return [42]. It seems obvious to me that a sane implementation would maintain the invariant:

    sum(xs, s) == list(accumulate(xs, start=s))[-1]

and there's nothing inherently special about `xs` being empty. It seems also obviously desirable that

    accumulate(xs, start=s)

generate the same results as

    accumulate(chain([s], xs))

That's obviously desirable because it's _so_ obvious that Raymond implicitly assumed that's how it would work in his first message ;-)

Or think of it this way: if you're adding N numbers, there are N-1 additions, and N partial sums. Whether it's `sum(xs)` or `accumulate(xs)`, if len(xs) == K then specifying `start` too changes the number of addends from K to K+1.
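A quick concrete check of that invariant, with the proposed start= behavior emulated via chain() since the argument doesn't exist yet:

    from itertools import accumulate, chain

    xs, s = [1, 2, 3], 10
    assert sum(xs, s) == 16

    # accumulate(xs, start=s) under the proposed semantics, emulated today:
    out = list(accumulate(chain([s], xs)))          # [10, 11, 13, 16]
    assert out[-1] == sum(xs, s)

    # And the empty case:
    assert sum([], 42) == 42
    assert list(accumulate(chain([42], []))) == [42]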
By contrast, using the prepend() approach with accumulate() not only changes the starting value, it also changes the number of cumulative sums produced.
As it should :-) Note that in the "real life" example code I gave, it was essential that `accumulate()` with `start` yield the starting value first. There were three uses in Will Ness's wheel sieve code, two of which wanted the starting value on its own, and the last of which didn't. In that last case, it was just a matter of doing next(wheel) on its own to discard the (in that specific case) unwanted starting value. If you have to paste the starting value _in_ instead (when it is wanted), then we're reintroducing a need for the "chain a singleton list with the iterator" hack that introducing `start=` is trying to eliminate.
On 8 April 2018 at 13:17, Tim Peters <tim.peters@gmail.com> wrote:
[Nick Coghlan <ncoghlan@gmail.com>]
So I now think that having "start" as a parameter to one but not the other, counts as a genuine API discrepancy.
Genuine but minor ;-)
Agreed :)
Providing start to accumulate would then mean the same thing as providing it to sum(): it would change the basis point for the first addition operation, but it wouldn't change the *number* of cumulative sums produced.
That makes no sense to me. `sum()` with a `start` argument always returns a single result, even if the iterable is empty.
>>> sum([], 42)
42
Right, but if itertools.accumulate() had the semantics of starting with a sum() over an empty iterable, then it would always start with an initial zero.

It doesn't - it starts with "0+first_item", so the length of the output iterator matches the number of items in the input iterable:

    >>> list(accumulate([]))
    []
    >>> list(accumulate([1, 2, 3, 4]))
    [1, 3, 6, 10]

That matches the output you'd get from a naive O(n^2) implementation of cumulative sums:

    data = list(iterable)
    for stop in range(1, len(data) + 1):
        yield sum(data[:stop])

So if the new parameter were to be called start, then I'd expect the semantics to be equivalent to:

    data = list(iterable)
    for stop in range(1, len(data) + 1):
        yield sum(data[:stop], start)

rather than the version Raymond posted at the top of the thread (where setting start explicitly also implicitly increases the number of items produced).

That concern mostly goes away if the new parameter is deliberately called something *other than* "start" (e.g. "prepend=value", or "first=value"), but it could also be addressed by offering a dedicated "yield_start" toggle, such that the revised semantics were:

    def accumulate(iterable, func=operator.add, start=0, yield_start=False):
        it = iter(iterable)
        total = start
        if yield_start:
            yield total
        for element in it:
            total = func(total, element)
            yield total

That approach would have the advantage of making the default value of "start" much easier to document (since it would just be zero, the same as it is for sum()), and only the length of the input iterable and "yield_start" would affect how many partial sums were produced.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
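Under that sketch, usage would look roughly like this (hypothetical, since neither parameter exists today):

    list(accumulate([1, 2, 3], start=10))                     # [11, 13, 16]
    list(accumulate([1, 2, 3], start=10, yield_start=True))   # [10, 11, 13, 16]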
Given that two respected members of the community so strongly disagree whether accumulate([], start=0) should behave like accumulate([]) or like accumulate([0]), maybe in the end it's better not to add a start argument. (The disagreement suggests that we can't trust users' intuition here.) On Sat, Apr 7, 2018 at 9:14 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 April 2018 at 13:17, Tim Peters <tim.peters@gmail.com> wrote:
[Nick Coghlan <ncoghlan@gmail.com>]
So I now think that having "start" as a parameter to one but not the other, counts as a genuine API discrepancy.
Genuine but minor ;-)
Agreed :)
Providing start to accumulate would then mean the same thing as providing it to sum(): it would change the basis point for the first addition operation, but it wouldn't change the *number* of cumulative sums produced.
That makes no sense to me. `sum()` with a `start` argument always returns a single result, even if the iterable is empty.
>>> sum([], 42)
42
Right, but if itertools.accumulate() had the semantics of starting with a sum() over an empty iterable, then it would always start with an initial zero.
It doesn't - it starts with "0+first_item", so the length of the output iterator matches the number of items in the input iterable:
>>> list(accumulate([]))
[]
>>> list(accumulate([1, 2, 3, 4]))
[1, 3, 6, 10]
That matches the output you'd get from a naive O(n^2) implementation of cumulative sums:
data = list(iterable)
for stop in range(1, len(data) + 1):
    yield sum(data[:stop])
So if the new parameter were to be called start, then I'd expect the semantics to be equivalent to:
data = list(iterable)
for stop in range(1, len(data) + 1):
    yield sum(data[:stop], start)
rather than the version Raymond posted at the top of the thread (where setting start explicitly also implicitly increases the number of items produced).
That concern mostly goes away if the new parameter is deliberately called something *other than* "start" (e.g. "prepend=value", or "first=value"), but it could also be addressed by offering a dedicated "yield_start" toggle, such that the revised semantics were:
def accumulate(iterable, func=operator.add, start=0, yield_start=False):
    it = iter(iterable)
    total = start
    if yield_start:
        yield total
    for element in it:
        total = func(total, element)
        yield total
That approach would have the advantage of making the default value of "start" much easier to document (since it would just be zero, the same as it is for sum()), and only the length of the input iterable and "yield_start" would affect how many partial sums were produced.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido)
Nick, sorry, but your arguments still make little sense to me. I think you're pushing an analogy between `sum()` details and `accumulate()` waaaaay too far, changing a simple idea into a needlessly complicated one.

`accumulate()` can do anything at all it wants to do with a `start` argument (if it grows one), and a "default" of start=0 makes no sense: unlike `sum()`, `accumulate()` is not

    specifically for use with numeric values and may reject non-numeric types

[from the `sum()` docs]. `accumulate()` accepts any two-argument function.
>>> itertools.accumulate([1, 2, 3], lambda x, y: str(x) + str(y))
<itertools.accumulate object at 0x0000028AB1B3B448>
>>> list(_)
[1, '12', '123']
Arguing that it "has to do" something exactly the way `sum()` happens to be implemented just doesn't follow - not even if they happen to give the same name to an optional argument. If the function were named `accumulate_sum()`, and restricted to numeric types, maybe - but it's not. [Nick Coghlan <ncoghlan@gmail.com>]
... That concern mostly goes away if the new parameter is deliberately called something *other than* "start" (e.g. "prepend=value", or "first=value"), but it could also be addressed by offering a dedicated "yield_start" toggle, such that the revised semantics were:
def accumulate(iterable, func=operator.add, start=0, yield_start=False):
    it = iter(iterable)
    total = start
    if yield_start:
        yield total
    for element in it:
        total = func(total, element)
        yield total
That approach would have the advantage of making the default value of "start" much easier to document (since it would just be zero, the same as it is for sum()), and only the length of the input iterable and "yield_start" would affect how many partial sums were produced.
As above, start=0 is senseless for `accumulate` (despite that it makes sense for `sum`). Raymond gave the obvious implementation in his original message. If you reworked your implementation to accommodate that NO sensible default for `start` exists except for the one Raymond used (a unique object private to the implementation, so he knows for sure whether or not `start` was passed), you'd end up with his implementation ;-)

`yield_start` looks like a nuisance in any case. As already explained, most uses want the `start` value if it's given, and in the cases where it isn't wanted it's trivial to discard by doing `next()` once on the result. Of course it could be added - but why bother?
On 8 April 2018 at 14:31, Guido van Rossum <guido@python.org> wrote:
Given that two respected members of the community so strongly disagree whether accumulate([], start=0) should behave like accumulate([]) or like accumulate([0]), maybe in the end it's better not to add a start argument. (The disagreement suggests that we can't trust users' intuition here.)
The potential ambiguity I see is created mainly by calling the proposed parameter "start", while having it do more than just adjust the individual partial sum calculations by adding an extra partial result to the output series.

If it's called something else (e.g. "first_result"), then the potential "sum(iterable, start=start)" misinterpretation goes away, and it can have Tim's desired effect of defining the first output value (effectively prepending it to the input iterable, without the boilerplate and overhead of actually doing so).

A name like "first_result" would also make it clearer to readers that passing that parameter has an impact on the length of the output series (since you're injecting an extra result), and also that the production of the first result skips calling func completely (as can be seen in Tim's str coercion example).

So where I'd be -1 on:

    >>> list(accumulate([1, 2, 3]))
    [1, 3, 6]
    >>> list(accumulate([1, 2, 3], start=0))
    [0, 1, 3, 6]
    >>> list(accumulate([1, 2, 3], start=1))
    [1, 2, 4, 7]

I'd be +1 on:

    >>> list(accumulate([1, 2, 3]))
    [1, 3, 6]
    >>> list(accumulate([1, 2, 3], first_result=0))
    [0, 1, 3, 6]
    >>> list(accumulate([1, 2, 3], first_result=1))
    [1, 2, 4, 7]

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 8 April 2018 at 15:00, Tim Peters <tim.peters@gmail.com> wrote:
`accumulate()` accepts any two-argument function.
>>> itertools.accumulate([1, 2, 3], lambda x, y: str(x) + str(y))
<itertools.accumulate object at 0x0000028AB1B3B448>
>>> list(_)
[1, '12', '123']
Arguing that it "has to do" something exactly the way `sum()` happens to be implemented just doesn't follow - not even if they happen to give the same name to an optional argument. If the function were named `accumulate_sum()`, and restricted to numeric types, maybe - but it's not.
When first added in 3.2, it did have that restriction, and the default behaviour without a function argument still parallels repeated use of the sum() builtin. Extending the parallel to *also* include a parameter called "start" would create the expectation for me that that parameter would adjust the partial result calculations, without adding an extra result. Other parameter names (like the "first_result" I suggested in my reply to Guido) wouldn't have the same implications for me, so this objection is specific to the use of "start" as the parameter name, not to the entire idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Top-posting just to say I agree with Nick's bottom line (changing the name to `first_result=`). I remain just +0.5, although that is up a notch from yesterday's +0.4 ;-) --- nothing new below --- On Sun, Apr 8, 2018 at 12:19 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 April 2018 at 14:31, Guido van Rossum <guido@python.org> wrote:
Given that two respected members of the community so strongly disagree whether accumulate([], start=0) should behave like accumulate([]) or like accumulate([0]), maybe in the end it's better not to add a start argument. (The disagreement suggests that we can't trust users' intuition here.)
The potential ambiguity I see is created mainly by calling the proposed parameter "start", while having it do more than just adjust the individual partial sum calculations by adding an extra partial result to the output series.
If it's called something else (e.g. "first_result"), then the potential "sum(iterable, start=start)" misinterpretation goes away, and it can have Tim's desired effect of defining the first output value (effectively prepending it to the input iterable, without the boilerplate and overhead of actually doing so).
A name like "first_result" would also make it clearer to readers that passing that parameter has an impact on the length of the output series (since you're injecting an extra result), and also that the production of the first result skips calling func completely (as can be seen in Tim's str coercion example).
So where I'd be -1 on:
>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], start=0))
[0, 1, 3, 6]
>>> list(accumulate([1, 2, 3], start=1))
[1, 2, 4, 7]
I'd be +1 on:
>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], first_result=0))
[0, 1, 3, 6]
>>> list(accumulate([1, 2, 3], first_result=1))
[1, 2, 4, 7]
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Just a nit here: [Tim]
... Arguing that it "has to do" something exactly the way `sum()` happens to be implemented just doesn't follow - not even if they happen to give the same name to an optional argument. If the function were named `accumulate_sum()`, and restricted to numeric types, maybe - but it's not.
[Nick]
When first added in 3.2, it did have that restriction, and the default behaviour without a function argument still parallels repeated use of the sum() builtin.
They're not quite the same today. For example, if you have an object `x` of a custom numeric class, sum([x]) invokes x.__radd__ once (it really does do `x.__radd__(0)`), but list(itertools.accumulate([x])) just returns [x] without invoking x.__add__ or x.__radd__ at all.
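A rough way to see the difference - the class here is made up purely to trace which hooks get called:

    import itertools

    class Num:
        def __init__(self, v):
            self.v = v
        def __add__(self, other):
            print("__add__ called")
            return Num(self.v + getattr(other, "v", other))
        def __radd__(self, other):
            print("__radd__ called")
            return Num(self.v + other)

    x = Num(5)
    total = sum([x])                        # prints "__radd__ called"
    out = list(itertools.accumulate([x]))   # prints nothing; out == [x]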
Extending the parallel to *also* include a parameter called "start" would create the expectation for me that that parameter would adjust the partial result calculations, without adding an extra result.
Other parameter names (like the "first_result" I suggested in my reply to Guido) wouldn't have the same implications for me, so this objection is specific to the use of "start" as the parameter name, not to the entire idea.
Yup! That's fine by me too - and preferable even ;-)
FYI: [Raymond]
... Q. Do other languages do it? A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.
Haskell has millions of functions ;-) `mapAccumL` is a God-awful mixture of Python's map(), reduce(), and accumulate() :-( The examples here should convince you it's nearly incomprehensible:

    http://zvon.org/other/haskell/Outputlist/mapAccumL_f.html

A more-than-less direct equivalent to Python's `accumulate` is Haskell's `scanl1`:

    http://zvon.org/comp/r/ref-Haskell.html#Functions~Prelude.scanl1

That doesn't allow specifying an initial value. But, Haskell being Haskell, the closely related `scanl` function requires specifying an initial value, which is also the first element of the list it returns:

    http://zvon.org/comp/r/ref-Haskell.html#Functions~Prelude.scanl

Of the two, `scanl` is more basic - indeed, the Standard Prelude defines scanl1 by peeling off the first element of the list and passing it as the initial value for scanl to use:

    scanl            :: (a -> b -> a) -> a -> [b] -> [a]
    scanl f q xs     =  q : (case xs of
                                []   -> []
                                x:xs -> scanl f (f q x) xs)

    scanl1           :: (a -> a -> a) -> [a] -> [a]
    scanl1 f (x:xs)  =  scanl f x xs
    scanl1 _ []      =  []

There are also analogous `scanr` and `scanr1` functions for doing "right-to-left" accumulation.

So, bottom line: as usual, when looking to Haskell for prior art, "all of the above - for a start" applies ;-)
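For readers who don't speak Haskell, a rough Python rendering of those two (the names are just for illustration):

    import operator

    def scanl(f, q, xs):
        # like Haskell's scanl: the initial value q is the first output
        yield q
        for x in xs:
            q = f(q, x)
            yield q

    def scanl1(f, xs):
        # like Haskell's scanl1: peel off the first item and use it as q
        it = iter(xs)
        try:
            q = next(it)
        except StopIteration:
            return
        yield from scanl(f, q, it)

    list(scanl(operator.add, 10, [1, 2, 3]))   # [10, 11, 13, 16]
    list(scanl1(operator.add, [1, 2, 3]))      # [1, 3, 6]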
Well if you can get Raymond to agree on that too I suppose you can go ahead. Personally I'm -0 but I don't really write this kind of algorithmic code enough to know what's useful. I do think that the new parameter name is ugly. But maybe that's the point. On Sat, Apr 7, 2018 at 10:26 PM, Tim Peters <tim.peters@gmail.com> wrote:
Top-posting just to say I agree with Nick's bottom line (changing the name to `first_result=`). I remain just +0.5, although that is up a notch from yesterday's +0.4 ;-)
--- nothing new below ---
On Sun, Apr 8, 2018 at 12:19 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 8 April 2018 at 14:31, Guido van Rossum <guido@python.org> wrote:
Given that two respected members of the community so strongly disagree whether accumulate([], start=0) should behave like accumulate([]) or like accumulate([0]), maybe in the end it's better not to add a start argument. (The disagreement suggests that we can't trust users' intuition here.)
The potential ambiguity I see is created mainly by calling the proposed parameter "start", while having it do more than just adjust the individual partial sum calculations by adding an extra partial result to the output series.
If it's called something else (e.g. "first_result"), then the potential "sum(iterable, start=start)" misinterpretation goes away, and it can have Tim's desired effect of defining the first output value (effectively prepending it to the input iterable, without the boilerplate and overhead of actually doing so).
A name like "first_result" would also make it clearer to readers that passing that parameter has an impact on the length of the output series (since you're injecting an extra result), and also that the production of the first result skips calling func completely (as can be seen in Tim's str coercion example).
So where I'd be -1 on:
>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], start=0))
[0, 1, 3, 6]
>>> list(accumulate([1, 2, 3], start=1))
[1, 2, 4, 7]
I'd be +1 on:
>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], first_result=0))
[0, 1, 3, 6]
>>> list(accumulate([1, 2, 3], first_result=1))
[1, 2, 4, 7]
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido)
Another bit of prior art: the Python itertoolz package also supplies `accumulate()`, with an optional `initial` argument. I stumbled into that when reading a Stackoverflow "how can I do Haskell's scanl in Python?" question.

https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.accumulate

On Fri, Apr 6, 2018 at 8:02 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Friday, April 6, 2018 at 8:14:30 AM UTC-7, Guido van Rossum wrote:
On Fri, Apr 6, 2018 at 7:47 AM, Peter O'Connor <peter.ed...@gmail.com> wrote:
So some more humble proposals would be:
1) An initializer to itertools.accumulate
functools.reduce already has an initializer; I can't see any controversy in adding an initializer to itertools.accumulate
See if that's accepted in the bug tracker.
It did come up once but was closed for a number of reasons, including lack of use cases. However, Peter's signal processing example does sound interesting, so we could re-open the discussion.
For those who want to think through the pluses and minuses, I've put together a Q&A as food for thought (see below). Everybody's design instincts are different -- I'm curious what you all think about the proposal.
Raymond
---------------------------------------------
Q. Can it be done? A. Yes, it wouldn't be hard.
_sentinel = object()
def accumulate(iterable, func=operator.add, start=_sentinel):
    it = iter(iterable)
    if start is _sentinel:
        try:
            total = next(it)
        except StopIteration:
            return
    else:
        total = start
    yield total
    for element in it:
        total = func(total, element)
        yield total
Q. Do other languages do it? A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.
* http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.h...
* https://stat.ethz.ch/R-manual/R-devel/library/base/html/cumsum.html
* http://microapl.com/apl/apl_concepts_chapter5.html
      \+ 1 2 3 4 5
    1 3 6 10 15
* https://reference.wolfram.com/language/ref/Accumulate.html
* https://www.haskell.org/hoogle/?hoogle=mapAccumL
Q. How much work for a person to do it currently? A. Almost zero effort to write a simple helper function:
myaccum = lambda it, func, start: accumulate(chain([start], it), func)
Q. How common is the need? A. Rare.
Q. Which would be better, a simple for-loop or a customized itertool? A. The itertool is shorter but more opaque (especially with respect to the argument order for the function call):
result = [start]
for x in iterable:
    y = func(result[-1], x)
    result.append(y)
versus:
result = list(accumulate(iterable, func, start=start))
Q. How readable is the proposed code? A. Look at the following code and ask yourself what it does:
accumulate(range(4, 6), operator.mul, start=6)
Now test your understanding:
How many values are emitted?
What is the first value emitted?
Are the two sixes related?
What is this code trying to accomplish?
Q. Are there potential surprises or oddities? A. Is it readily apparent which of assertions will succeed?
a1 = sum(range(10)) a2 = sum(range(10), 0) assert a1 == a2
a3 = functools.reduce(operator.add, range(10)) a4 = functools.reduce(operator.add, range(10), 0) assert a3 == a4
a5 = list(accumulate(range(10), operator.add))
a6 = list(accumulate(range(10), operator.add, start=0))
assert a5 == a6
Q. What did the Python 3.0 Whatsnew document have to say about reduce()? A. "Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."
Q. What would this look like in real code? A. We have almost no real-world examples, but here is one from a StackExchange post:
    def wsieve():       # wheel-sieve, by Will Ness.  ideone.com/mqO25A->0hIE89
        wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
                 4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
        cs = accumulate(cycle(wh11), start=11)
        yield( next( cs))       # cf. ideone.com/WFv4f
        ps = wsieve()           # codereview.stackexchange.com/q/92365/9064
        p = next(ps)            # 11
        psq = p*p               # 121
        D = dict( zip( accumulate(wh11, start=0), count(0)))   # start from
        sieve = {}
        for c in cs:
            if c in sieve:
                wheel = sieve.pop(c)
                for m in wheel:
                    if not m in sieve:
                        break
                sieve[m] = wheel    # sieve[143] = wheel@187
            elif c < psq:
                yield c
            else:          # (c==psq)
                # map (p*) (roll wh from p) = roll (wh*p) from (p*p)
                x = [p*d for d in wh11]
                i = D[ (p-11) % 210]
                wheel = accumulate(cycle(x[i:] + x[:i]), start=psq)
                p = next(ps) ; psq = p*p
                next(wheel) ; m = next(wheel)
                sieve[m] = wheel
2018-04-08 8:19 GMT+03:00 Nick Coghlan <ncoghlan@gmail.com>:
A name like "first_result" would also make it clearer to readers that passing that parameter has an impact on the length of the output series (since you're injecting an extra result), and also that the production of the first result skips calling func completely (as can be seen in Tim's str coercion example).
So where I'd be -1 on:
>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], start=0))
[0, 1, 3, 6]
>>> list(accumulate([1, 2, 3], start=1))
[1, 2, 4, 7]
I'd be +1 on:
>>> list(accumulate([1, 2, 3]))
[1, 3, 6]
>>> list(accumulate([1, 2, 3], first_result=0))
[0, 1, 3, 6]
>>> list(accumulate([1, 2, 3], first_result=1))
[1, 2, 4, 7]
It is a fair point! But the usual way to understand how to use an additional argument is to try it or to look at examples in the documentation.

Concerning the topic of the relationship between `sum` and `accumulate`, I have another point of view. If `start` means something to `sum`, there are no grounds for believing that it should mean the same for `accumulate`; these functions are not really comparable and are fundamentally different. The closest friend of `sum` is `functools.reduce`, which uses `initial` instead of `start` (the documentation uses `initializer` and the docstring uses `initial`; as for me, I prefer `initial`), and so there is already a discrepancy.

Having said this, I think it does not matter much which name is more suitable - `start` or `initial` or `first_result` or `first`. I would prefer `initial`, to be the same as in the `itertoolz` package.

Regarding the second point - should it yield one more element if provided - I think everyone here agrees that yes, it should.

With kind regards, -gdg
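To make the reduce() parallel above concrete - the accumulate() output shown is what the proposed argument would produce, emulated here with chain() since the argument doesn't exist yet:

    import functools, operator
    from itertools import accumulate, chain

    functools.reduce(operator.add, [1, 2, 3], 10)    # 16

    list(accumulate(chain([10], [1, 2, 3])))         # [10, 11, 13, 16]
    # ... whose last element matches the reduce() result above.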
[Guido]
Well if you can get Raymond to agree on that too I suppose you can go ahead. Personally I'm -0 but I don't really write this kind of algorithmic code enough to know what's useful.
Actually, you do - but you don't _think_ of problems in these terms. Neither do I.

For those who do: consider any program that has state and responds to inputs. When you get a new input, the new state is a function of the existing state and the input. That's `accumulate`! It generates a sequence of new states as the sequence of incrementally updated states derived from a sequence of inputs according to the function passed to accumulate. In that view of the world, specifying a starting value is merely specifying the program's initial state - and from that view of the world, not allowing one to specify a starting value is bizarre.

While Will Ness's wheel sieve program, and my Collatz glide-record-finder, don't _derive_ from that view of the world, it turns out that specifying (and returning) an initial value is also useful for them, despite that they're just doing integer running sums.

A simple example from the broader view of the world is generating all the prefixes of a list:
>>> from itertools import *
>>> list(accumulate(chain([[]], [8, 4, "k"]), lambda x, y: x + [y]))
[[], [8], [8, 4], [8, 4, 'k']]
That's obviously easier to follow if written, e.g.,

    list(accumulate([8, 4, "k"], lambda x, y: x + [y], first_result=[]))
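A toy rendering of that "state machine" view - the event names and transition function are made up, and first_result= is hypothetical, so chain() stands in for it:

    from itertools import accumulate, chain

    def step(state, event):
        # new state = function of (existing state, input); here just a
        # running count of each kind of event
        new = dict(state)
        new[event] = new.get(event, 0) + 1
        return new

    events = ["load", "click", "click", "error"]
    initial_state = {}
    # list(accumulate(events, step, first_result=initial_state))   # proposed
    states = list(accumulate(chain([initial_state], events), step))
    # [{}, {'load': 1}, {'load': 1, 'click': 1}, {'load': 1, 'click': 2}, ...]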
I do think that the new parameter name ["first_result"] is ugly. But maybe that's the point.
I noted later that the `accumulate` in the Python itertoolz package names its optional argument "initial". That's not ugly, and works just as well for me.
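If I'm reading the toolz docs right, its spelling puts the function first and is used roughly like this (third-party package, not the stdlib):

    from operator import add
    from toolz import accumulate as t_accumulate

    list(t_accumulate(add, [1, 2, 3]))               # [1, 3, 6]
    list(t_accumulate(add, [1, 2, 3], initial=10))   # [10, 11, 13, 16]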
On Apr 8, 2018, at 12:22 PM, Tim Peters <tim.peters@gmail.com> wrote:
[Guido]
Well if you can get Raymond to agree on that too I suppose you can go ahead. Personally I'm -0 but I don't really write this kind of algorithmic code enough to know what's useful.
Actually, you do - but you don't _think_ of problems in these terms. Neither do I. For those who do: consider any program that has state and responds to inputs. When you get a new input, the new state is a function of the existing state and the input.
The Bayesian world view isn't much different except they would prefer "prior" instead of "initial" or "start" ;-)

    my_changing_beliefs = accumulate(stream_of_new_evidence, bayes_rule, prior=what_i_used_to_think)

Though the two analogies are cute, I'm not sure they tell us much. In running programs or Bayesian analysis, we care more about the result than about the accumulation of intermediate results.

My own experience with actually using accumulations in algorithmic code falls neatly into two groups. Many years ago, I used APL extensively in accounting work and my recollection is that part of the convenience of "\+" was that the sequence length didn't change (so that the various data arrays could interoperate with one another). My other common case for accumulate() is building cumulative probability distributions from probability mass functions (see the code for random.choice() for example, or typical code for a K-S test).

For neither of those use case categories did I ever want an initial value, and it would have been distracting to even have had the option. For example, when doing a discounted cash flow analysis, I was taught to model the various flows as a single sequence of up and down arrows rather than thinking of the initial balance as a distinct concept¹.

Because of this background, I was surprised to have the question ever come up at all (other than the symmetry argument that sum() has "start" so accumulate() must as well).

When writing itertools.accumulate(), I started by looking to see what other languages had done. Since accumulate() is primarily a numerical tool, I expected that the experience of numeric-centric languages would have something to teach us. My reasoning was that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica, perhaps it really was just noise.

My views may be dated though. Looking at the wheel sieve and collatz glide record finder, I see something new: a desire to work with lazy, potentially infinite accumulations (something that iterators do well but that almost never arises in the world of fixed-length sequences or cumulative probability distributions).

So I had been warming up to the idea, but got concerned that Nick could have had such a profoundly different idea about what the code should do. That cooled my interest a bit, especially when thinking about two key questions: "Will it create more problems than it solves?" and "Will anyone actually use it?".

Raymond

¹ http://www.chegg.com/homework-help/questions-and-answers/solve-present-worth...
² https://www.mathworks.com/help/matlab/ref/accumarray.html
Raymond Hettinger wrote:
For neither of those use case categories did I ever want an initial value and it would have been distracting to even had the option. For example, when doing a discounted cash flow analysis, I was taught to model the various flows as a single sequence of up and down arrows rather than thinking of the initial balance as a distinct concept¹
There's always an initial value, even if it's implicit. The way accumulate() works can be thought of in two ways:

(1) The initial value is implicitly the identity of whatever operation the function performs.

(2) The first item in the list is the initial value, and the rest are items to be accumulated.

Both of these are somewhat dodgy, IMO.

The first one works only if the assumed identity is what you actually want, *and* there is always at least one item to accumulate. If those conditions don't hold, you need to insert the initial value as the first item. But this is almost certainly going to require extra code. The initial value and the items are conceptually different things, and are unlikely to start out in the same list together.

What's more, the first thing the implementation of accumulate() does is extract the first item and treat it differently from the rest. So your code goes out of its way to insert the initial value, and then accumulate() goes out of its way to pull it out again. Something smells wrong about that.

As an example, suppose you have a list [1, 2, 3] and you want to construct [], [1], [1, 2], [1, 2, 3]. To do that with accumulate() you need to write something like:

    accumulate([[], 1, 2, 3], lambda x, y: x + [y])

The fact that the first element of the list doesn't even have the same *type* as the rest should be a strong hint that forcing them to occupy the same list is an unnatural thing to do.

-- Greg
Because of this background, I was surprised to have the question ever come up at all (other than the symmetry argument that sum() has "start" so accumulate() must as well).
When writing itertools.accumulate(), I started by looking to see what other languages had done. Since accumulate() is primarily a numerical tool, I expected that the experience of numeric-centric languages would have something to teach us. My reasoning was that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica, perhaps it really was just noise.
My views may be dated though. Looking at the wheel sieve and collatz glide record finder, I see something new, a desire to work with lazy, potentially infinite accumulations (something that iterators do well but almost never arises in the world of fixed-length sequences or cumulative probability distributions).
So I had been warming up to the idea, but got concerned that Nick could have had such a profoundly different idea about what the code should do. That cooled my interest a bit, especially when thinking about two key questions, "Will it create more problems than it solves?" and "Will anyone actually use it?".
Raymond
¹ http://www.chegg.com/homework-help/questions-and-answers/solve-present-worth...
² https://www.mathworks.com/help/matlab/ref/accumarray.html
[Raymond]
The Bayesian world view isn't much different except they would prefer "prior" instead of "initial" or "start" ;-)
my_changing_beliefs = accumulate(stream_of_new_evidence, bayes_rule, prior=what_i_used_to_think)
Though the two analogies are cute, I'm not sure they tell us much. In ...
They're not intended to. They're just intended to shake people loose from picturing nothing subtler than adding a list of 3 integers ;-)
My own experience with actually using accumulations in algorithmic code falls neatly into two groups. Many years ago, I used APL extensively in accounting work and my recollection is that a part of the convenience of "\+" was that the sequence length didn't change (so that the various data arrays could interoperate with one another).
Sure.
My other common case for accumulate() is building cumulative probability distributions from probability mass functions (see the code for random.choice() for example, or typical code for a K-S test).
So, a question: why wasn't itertools.accumulate() written to accept iterables of _only_ numeric types? Akin to `sum()`. I gather from one of Nick's messages that it was so restricted in 3.2. Then why was it generalized to allow any 2-argument function?

Given that it was, `sum()` is no longer particularly relevant: the closest thing by far is now `functools.reduce()`, which does support an optional `initial` argument. Which it really should, because it's impossible for the implementation to guess a suitable starting value for an arbitrary user-supplied dyadic function.

My example using accumulate() to generate list prefixes got snipped, but same thing there: it's impossible for that snippet to work unless an empty list is supplied as the starting value. And it's impossible for the accumulate() implementation to guess that.

In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.

BTW, the type signatures on the scanl (requires an initial value) and scanl1 (does not support an initial value) implementations I pasted from Haskell's Standard Prelude give a deeper reason: without an initial value, a list of values of type A can only produce another list of values of type A via scanl1. The dyadic function passed must map As to As. But with an initial value supplied of type B, scanl can transform a list of values of type A to a list of values of type B. While that may not have been obvious in the list prefix example I gave, that was at work: a list of As was transformed into a list _of_ lists of As. That's impossible for scanl1 to do, but easy for scanl.

Or, in short, someone coming from a typed functional language background sees all sorts of things that rarely (if ever) come up in number-crunching languages. Their sensibilities should count too - although not much ;-) They should get _some_ extra consideration in this context, though, because `itertools` is one of the first things they dig into when they give Python a try.
For neither of those use case categories did I ever want an initial value
As above, in all your related experiences "0" was a suitable base value, so you had no reason to care.
and it would have been distracting to even had the option.
Distracting for how long? One second or two? ;-)
... Because of this background, I was surprised to have the question ever come up at all (other than the symmetry argument that sum() has "start" so accumulate() must as well).
As above, the real parallel here is to reduce(). `sum()` became an historical red herring when `accumulate()` was generalized.

With a different background, you may just as well have been surprised if the question _hadn't_ come up. For example, this is a standard example in the Haskell world for how to define an infinite Fibonacci sequence with the initial two values f0 and f1:

    fibs = f0 : scanl (+) f1 fibs

The part from `scanl` onward would be spelled in Python as

    accumulate(fibs, initial=f1)

although it requires some trickery to get the recursive reference to work (details on request, but I'm sure you already know how to do that).
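For the record, one way to pull off the self-reference in Python today - surely not the only way, and not an efficient one (each recursion level nests another generator):

    from itertools import accumulate, chain, islice
    from operator import add

    def fibs(f0=0, f1=1):
        # fibs = f0 : scanl (+) f1 fibs, with the recursive reference made
        # by calling fibs() again inside its own body
        yield f0
        yield from accumulate(chain([f1], fibs(f0, f1)), add)

    list(islice(fibs(), 10))   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]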
When writing itertools.accumulate(), I started by looking to see what other languages had done. Since accumulate() is primarily a numerical tool, I expected that the experience of numeric-centric languages would have something to teach us. My reasoning was that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica, perhaps it really was just noise.
In the itertools context, I also would have looked hard at Haskell experience.

BTW, whoever wrote the current `accumulate()` docs also found a use for `initial`, but hacked around it:

    """
    First-order recurrence relations can be modeled by supplying the initial value in the iterable and using only the accumulated total in func argument:
    """

followed by:
    >>> logistic_map = lambda x, _:  r * x * (1 - x)
    >>> r = 3.8
    >>> x0 = 0.4
    >>> inputs = repeat(x0, 36)     # only the initial value is used
    >>> [format(x, '.2f') for x in accumulate(inputs, logistic_map)]
That would be better on several counts to my eyes as:

    inputs = repeat(None, 35)    # no values actually used
    ...
    for x in accumulate(inputs, logistic_map, initial=x0)

In particular, filling inputs with `None` would lead to an exception if the programmer screwed up and the input values actually _were_ being used. I expect we'll both overlook that writing a generator using the obvious loop would be a lot clearer regardless ;-)
My views may be dated though. Looking at the wheel sieve and collatz glide record finder, I see something new, a desire to work with lazy, potentially infinite accumulations (something that iterators do well but almost never arises in the world of fixed-length sequences or cumulative probability distributions).
Amazingly enough, those are both just doing integer running sums - all the stuff above about catering to general types doesn't apply in these cases. `initial` isn't needed for _correctness_ in these cases, it would just add convenience and some clarity.
So I had been warming up to the idea, but got concerned that Nick could have had such a profoundly different idea about what the code should do.
He appeared to withdraw his objections after agreement _not_ to name the argument "start" was reached. Something about `sum()` also having an optional argument named "start" caused confusion.
That cooled my interest a bit, especially when thinking about two key questions, "Will it create more problems than it solves?"
Like? It's an optional argument. One brief example in the docs is all it takes to understand what it does.
    list(accumulate([1, 2, 3]))
    [1, 3, 6]
    list(accumulate([1, 2, 3], initial=10))
    [10, 11, 13, 16]
and "Will anyone actually use it?".
As above, the docs could change to use it. And I bet the test suite too. How much more could you want from a feature?! ;-)
On Apr 8, 2018, at 6:43 PM, Tim Peters <tim.peters@gmail.com> wrote:
My other common case for accumulate() is building cumulative probability distributions from probability mass functions (see the code for random.choice() for example, or typical code for a K-S test).
So, a question: why wasn't itertools.accumulate() written to accept iterables of _only_ numeric types? Akin to `sum()`. I gather from one of Nick's messages that it was so restricted in 3.2. Then why was it generalized to allow any 2-argument function?
Prior to 3.2, accumulate() was in the recipes section as pure Python code. It had no particular restriction to numeric types.

I received a number of requests for accumulate() to be promoted to a real itertool (fast, tested, documented C code with a stable API). I agreed and accumulate() was added to itertools in 3.2. It worked with anything supporting __add__, including str, bytes, lists, and tuples. More specifically, accumulate_next() called PyNumber_Add() without any particular type restriction.

Subsequently, I got requests to generalize accumulate() to support any arity-2 function (with operator.mul offered as the motivating example). Given that there were user requests and there were ample precedents in other languages, I acquiesced despite having some reservations (if used with a lambda, the function call overhead might make accumulate() slower than a plain Python for-loop without the function call). So, that generalized API extension went into 3.3 and has remained unchanged ever since.

Afterwards, I was greeted with the sound of crickets. Either it was nearly perfect or no one cared or both ;-) It remains one of the least used itertools.
Given that it was, `sum()` is no longer particularly relevant: the closest thing by far is now `functools.reduce()`, which does support an optional `initial` argument. Which it really should, because it's impossible for the implementation to guess a suitable starting value for an arbitrary user-supplied dyadic function.
My example using accumulate() to generate list prefixes got snipped, but same thing there: it's impossible for that snippet to work unless an empty list is supplied as the starting value. And it's impossible for the accumulate() implementation to guess that.
Honestly, I couldn't immediately tell what this code was doing:

    list(accumulate([8, 4, "k"], lambda x, y: x + [y], first_result=[]))

This may be a case where a person would be better-off without accumulate() at all.
In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.
The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there.
BTW, the type signatures on the scanl (requires an initial value) and scanl1 (does not support an initial value) implementations I pasted from Haskell's Standard Prelude give a deeper reason: without an initial value, a list of values of type A can only produce another list of values of type A via scanl1. The dyadic function passed must map As to As. But with an initial value supplied of type B, scanl can transform a list of values of type A to a list of values of type B. While that may not have been obvious in the list prefix example I gave, that was at work: a list of As was transformed into a list _of_ lists of As. That's impossible for scanl1 to do, but easy for scanl.
Thanks for pointing that out. I hadn't considered that someone might want to transform one type into another using accumulate(). That is pretty far from my mental model of what accumulate() was intended for. Also, I'm still not sure whether we would want code like that buried in an accumulate() call rather than as a regular for-loop where I can see the logic and trace through it with pdb.

As for scanl, I'm not sure what this code means without seeing some python equivalent.

    scanl :: (a -> b -> a) -> a -> [b] -> [a]
    scanl f q xs = q : (case xs of
                          []   -> []
                          x:xs -> scanl f (f q x) xs)

    scanl1 :: (a -> a -> a) -> [a] -> [a]
    scanl1 f (x:xs) = scanl f x xs
    scanl1 _ []     = []
Or, in short, someone coming from a typed functional language background sees all sorts of things that rarely (if ever) come up in number-crunching languages. Their sensibilities should count too - although not much ;-) They should get _some_ extra consideration in this context, though, because `itertools` is one of the first things they dig into when they give Python a try.
I concur.
and it would have been distracting to even have the option.
Distracting for how long? One second or two? ;-)
Possibly forever. In my experience, if a person initially frames a problem wrong (or perhaps in a hard to solve way), it can take them a long time to recover. For example with discounted cash flows, people who think of the initial value as being special or distinct from the other cash flows will have a hard time adapting to problem variants like annuity due, balloon payments, internal rate of return, coupon stripping, valuing a transaction that takes place in the future, etc. I don't want to overstate the case, but I do think a function signature that offers a "first_value" option is an invitation to treat the first value as being distinct from the rest of the data stream.
With a different background, you may just as well have been surprised if the question _hadn't_ come up. For example, this is a standard example in the Haskell world for how to define an infinite Fibonacci sequence with the initial two values f0 and f1:
fibs = f0 : scanl (+) f1 fibs
The part from `scanl` onward would be spelled in Python as
accumulate(fibs, initial=f1)
although it requires some trickery to get the recursive reference to work (details on request, but I'm sure you already know how to do that).
Do we want the tool to encourage such trickery? Don't get me wrong, I think it is cool that you could write such code, but could and should aren't always the same.
When writing itertools.accumulate(), I started by looking to see what other languages had done. Since accumulate() is primarily a numerical tool, I expected that the experience of numeric-centric languages would have something to teach us. My reasoning was that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica, perhaps it really was just noise.
In the itertools context, I also would have looked hard at Haskell experience.
Haskell probably is a good source of inspiration, but I don't know the language and find its docs to be inscrutable. So, I have to just trust when you say something like, """ Haskell has millions of functions ;-) `mapAccumL` is a God-awful mixture of Python's map(), reduce(), and accumulate() :-( The examples here should convince you it's nearly incomprehensible: """ In fact, yes, you've convinced me that there is an intelligibility issue ;-)
That would be better on several counts to my eyes as:
    inputs = repeat(None, 35)   # no values actually used
    ... for x in accumulate(inputs, logistic_map, initial=x0)
In particular, filling inputs with `None` would lead to an exception if the programmer screwed up and the input values actually _were_ being used. I expect we'll both overlook that writing a generator using the obvious loop would be a lot clearer regardless ;-)
The winks may make reading your posts fun, but I really can't tell whether your position is, "yes, let's do this because someone can do wild things with it", or "no, let's not do this because people would commit atrocities with it".
and "Will anyone actually use it?".
As above, the docs could change to use it. And I bet the test suite too. How much more could you want from a feature?! ;-)
I'm concerned that the total number of actual users will be exactly two (you and the writer of the wheel-sieve) and that you each would have used it exactly once in your life. That's a pretty small user base for a standard library feature ;-)

Tim, if you could muster an honest to goodness, real +1, that would be good enough for me. Otherwise, I'm back to -0 and prefer not to see Pythonistas writing the Haskell magics described in this thread.

If this does go forward, I would greatly prefer "start" rather than "first_value" or "initial".

The conversation has been enjoyable (perhaps because the stakes are so low) and educational (I learn something new every time you post).

I'll leave this with a few random thoughts on itertools that don't seem to fit anywhere else.

1) When itertools was created, they were one of the easiest ways to get C-like performance without writing C. However, when PyPy matured we got other ways to do it. And in the world of PyPy, plain Python for-loops outperform their iterator chain equivalents, so we lost one motivation to use itertools.

2) While I personally like function chains operating on iterators, my consulting and teaching experience has convinced me that very few people think that way. Accordingly, I almost never use compress, filterfalse, takewhile, dropwhile, etc. As people started adopting PEP 289 generator expressions, interest in itertool-style thinking seems to have waned.

Putting these two together has left me with a preference for itertools to only cover the simplest and most common cases, leaving the rest to be expressed as plain, everyday pure Python. (The combinatoric itertools are an exception because they are more algorithmically interesting.)

Raymond
Raymond Hettinger wrote:
I don't want to overstate the case, but I do think a function signature that offers a "first_value" option is an invitation to treat the first value as being distinct from the rest of the data stream.
I conjecture that the initial value is *always* special, and the only cases where it seems not to be are where you're relying on some implicit initial value such as zero. -- Greg
On 9 April 2018 at 14:38, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Apr 8, 2018, at 6:43 PM, Tim Peters <tim.peters@gmail.com> wrote: In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.
The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there.
Weirdly (or perhaps not so weirdly, given my tendency to model computational concepts procedurally), I find the operation of reduce() easier to understand when it's framed as "last(accumulate(iterable, binop, initial=value)))". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
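For readers who like to see that framing concretely, a minimal sketch (editorial illustration, not code from the thread; `last()` is a hypothetical helper, and the initial= argument shown is the one under discussion, which later landed in Python 3.8):

    from functools import reduce
    from itertools import accumulate
    import operator

    def last(iterable):
        # exhaust the iterable and return its final value (assumes it is non-empty)
        result = None
        for result in iterable:
            pass
        return result

    xs = [1, 2, 3, 4]
    assert reduce(operator.add, xs, 0) == last(accumulate(xs, operator.add, initial=0))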
It seems clear that the name "accumulate" has been kind of antiquated since the "func" argument was added and "sum" became just a default. And people seem to disagree about whether the result should have a length N or length N+1 (where N is the number of elements in the input iterable). The behaviour where the first element of the return is the same as the first element of the input can be weird and confusing. E.g. compare:
    list(itertools.accumulate([2, 3, 4], lambda accum, val: accum - val))
    [2, -1, -5]
    list(itertools.accumulate([2, 3, 4], lambda accum, val: val - accum))
    [2, 1, 3]
One might expect that since the second function returned the negative of the first function, and both are linear, that the results of the second would be the negative of the first, but that is not the case.

Maybe we can instead let "accumulate" fall into deprecation, and instead add a new, more general itertools "reducemap" method:

    def reducemap(iterable: Iterable[Any], func: Callable[[Any, Any], Any],
                  initial: Any, include_initial_in_return: bool = False) -> Generator[Any, None, None]:
        ...

Benefits:
- The name is more descriptive of the operation (a reduce operation where we keep values at each step, like a map)
- The existence of include_initial_in_return=False makes it somewhat clear that the initial value will by default NOT be provided in the returning generator
- The mandatory initial argument forces you to think about initial conditions.

Disadvantages:
- The most common use cases (summation, product) have a "natural" first element (0 and 1, respectively) which you'd now be required to write out. (But we could just leave accumulate for sum.)

I still prefer a built-in language comprehension syntax for this, like:

    (y := f(y, x) for x in x_vals from y=0)

but for a huge discussion on that see the other thread.

------- More Examples (using "accumulate" as the name for now) -------

    # Kalman filters
    def kalman_filter_update(state, measurement):
        ...
        return state

    online_trajectory_estimate = accumulate(measurement_generator, func=kalman_filter_update, initial=initial_state)

---

    # Bayesian stats
    def update_model(prior, evidence):
        ...
        return posterior

    model_history = accumulate(evidence_generator, func=update_model, initial=prior_distribution)

---

    # Recurrent neural networks:
    def recurrent_network_layer_step(last_hidden, current_input):
        new_hidden = ...
        return new_hidden

    hidden_state_generator = accumulate(input_sequence, func=recurrent_network_layer_step, initial=initial_hidden_state)

On Mon, Apr 9, 2018 at 7:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 9 April 2018 at 14:38, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Apr 8, 2018, at 6:43 PM, Tim Peters <tim.peters@gmail.com> wrote: In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.
The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there.
Weirdly (or perhaps not so weirdly, given my tendency to model computational concepts procedurally), I find the operation of reduce() easier to understand when it's framed as "last(accumulate(iterable, binop, initial=value)))".
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Also, Tim Peters' one-line example of:

    print(list(itertools.accumulate([1, 2, 3], lambda x, y: str(x) + str(y))))

I think makes it clear that itertools.accumulate is not the right vehicle for this change - we should make a new itertools function with a required "initial" argument.

On Mon, Apr 9, 2018 at 1:44 PM, Peter O'Connor <peter.ed.oconnor@gmail.com> wrote:
It seems clear that the name "accumulate" has been kind of antiquated since the "func" argument was added and "sum" became just a default.
And people seem to disagree about whether the result should have a length N or length N+1 (where N is the number of elements in the input iterable).
The behaviour where the first element of the return is the same as the first element of the input can be weird and confusing. E.g. compare:
    list(itertools.accumulate([2, 3, 4], lambda accum, val: accum - val))
    [2, -1, -5]
    list(itertools.accumulate([2, 3, 4], lambda accum, val: val - accum))
    [2, 1, 3]
One might expect that since the second function returned the negative of the first function, and both are linear, that the results of the second would be the negative of the first, but that is not the case.
Maybe we can instead let "accumulate" fall into deprecation, and instead add a new more general itertools "reducemap" method:
    def reducemap(iterable: Iterable[Any], func: Callable[[Any, Any], Any],
                  initial: Any, include_initial_in_return: bool = False) -> Generator[Any, None, None]:
        ...
Benefits:
- The name is more descriptive of the operation (a reduce operation where we keep values at each step, like a map)
- The existence of include_initial_in_return=False makes it somewhat clear that the initial value will by default NOT be provided in the returning generator
- The mandatory initial argument forces you to think about initial conditions.

Disadvantages:
- The most common use cases (summation, product) have a "natural" first element (0 and 1, respectively) which you'd now be required to write out. (But we could just leave accumulate for sum.)
I still prefer a built-in language comprehension syntax for this like: (y := f(y, x) for x in x_vals from y=0), but for a huge discussion on that see the other thread.
------- More Examples (using "accumulate" as the name for now) -------
    # Kalman filters
    def kalman_filter_update(state, measurement):
        ...
        return state

    online_trajectory_estimate = accumulate(measurement_generator, func=kalman_filter_update, initial=initial_state)

---

    # Bayesian stats
    def update_model(prior, evidence):
        ...
        return posterior

    model_history = accumulate(evidence_generator, func=update_model, initial=prior_distribution)

---

    # Recurrent neural networks:
    def recurrent_network_layer_step(last_hidden, current_input):
        new_hidden = ...
        return new_hidden

    hidden_state_generator = accumulate(input_sequence, func=recurrent_network_layer_step, initial=initial_hidden_state)
On Mon, Apr 9, 2018 at 7:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 9 April 2018 at 14:38, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Apr 8, 2018, at 6:43 PM, Tim Peters <tim.peters@gmail.com> wrote: In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.
The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there.
Weirdly (or perhaps not so weirdly, given my tendency to model computational concepts procedurally), I find the operation of reduce() easier to understand when it's framed as "last(accumulate(iterable, binop, initial=value)))".
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
[Tim]
Then why was [accumulate] generalized to allow any 2-argument function?
[Raymond]
Prior to 3.2, accumulate() was in the recipes section as pure Python code. It had no particular restriction to numeric types.
I received a number of requests for accumulate() to be promoted to a real itertool (fast, tested, documented C code with a stable API). I agreed and accumulate() was added to itertools in 3.2. It worked with anything supporting __add__, including str, bytes, lists, and tuples.
So that's the restriction Nick had in mind: a duck-typing kind, in that it would blow up on types that don't participate in PyNumber_Add():
More specifically, accumulate_next() called PyNumber_Add() without < any particular type restriction.
Subsequently, I got requests to generalize accumulate() to support any arity-2 function (with operator.mul offered as the motivating example).
Sucker ;-)
Given that there were user requests and there were ample precedents in other languages, I acquiesced despite having some reservations (if used with a lambda, the function call overhead might make accumulate() slower than a plain Python for-loop without the function call). So, that generalized API extension went into 3.3 and has remained unchanged ever since.
Afterwards, I was greeted with the sound of crickets. Either it was nearly perfect or no one cared or both ;-)
Or nobody cared _enough_ to endure a 100-message thread arguing about an objectively minor change ;-) If you can borrow Guido's time machine, I'd suggest going back to the original implementation, except name it `cumsum()` instead, and leave `accumulate()` to the 3rd-party itertools packages (at least one of which (itertoolz) has supported an optional "initial" argument all along).
It remains one of the least used itertools.
I don't see how that can be known. There are at least tens of thousands of Python programmers nobody on this list has ever heard about - or from - writing code that's invisible to search engines. I _believe_ it - I just don't see how it can be known.
... Honestly, I couldn't immediately tell what this code was doing:
list(accumulate([8, 4, "k"], lambda x, y: x + [y], first_result=[]))
Of course you couldn't: you think of accumulate() as being about running sums, and _maybe_ some oddball out there using it for running products. But that's a statement about your background, seeing code you've never seen before, not about the function. Nobody knows immediately, at first sight, what

    list(accumulate([8, 4, 6], lambda x, y: x + y, first_result=0))

does either. It's learned. If your background were in, e.g., Haskell instead, then in the latter case you'd picture a list [a, b, c, ...] and figure it out from thinking about what the prefixes of

    0 + a + b + c + ...

compute. In exactly the same way, in the former case you'd think about what the prefixes of

    [] + [a] + [b] + [c] + ...

compute. They're equally obvious _after_ undertaking that easy exercise, but clear as mud before doing so.
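To make the parallel concrete, a small sketch using today's chain() workaround (illustration only, not code from the thread):

    from itertools import accumulate, chain

    # running sum with an explicit starting value of 0
    list(accumulate(chain([0], [8, 4, 6])))
    # -> [0, 8, 12, 18]

    # running list-prefix with an explicit starting value of []
    list(accumulate(chain([[]], [8, 4, 6]), lambda x, y: x + [y]))
    # -> [[], [8], [8, 4], [8, 4, 6]]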
This may be a case where a person would be better-off without accumulate() at all.
De gustibus non est disputandum.
In short, for _general_ use `accumulate()` needs `initial` for exactly the same reasons `reduce()` needed it.
The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there.
The current accumulate() isn't just akin to reduce(), it _is_ reduce(), except a drunken reduce() so nauseated it vomits its internal state out after each new element it eats ;-)
BTW, the type signatures on the scanl (requires an initial value) and scanl1 (does not support an initial value) implementations I pasted from Haskell's Standard Prelude give a deeper reason: without an initial value, a list of values of type A can only produce another list of values of type A via scanl1. The dyadic function passed must map As to As. But with an initial value supplied of type B, scanl can transform a list of values of type A to a list of values of type B. While that may not have been obvious in the list prefix example I gave, that was at work: a list of As was transformed into a list _of_ lists of As. That's impossible for scanl1 to do, but easy for scanl.
Thanks for pointing that out. I hadn't considered that someone might want to transform one type into another using accumulate(). That is pretty far from my mental model of what accumulate() was intended for.
It's nevertheless what the current function supports - nothing being suggested changes that one whit. It's "worse" in Python because while only `scanl` in Haskell can "change types", the current `scanl1`-like Python `accumulate` can change types too. Perhaps the easiest way to see that is by noting that

    map(f, xs)

is generally equivalent to

    accumulate(xs, lambda x, y: f(y))

right now. That is, just ignore the "accumulator" argument, and you're left with a completely arbitrary transformation of the elements. If you like, you can, e.g., use accumulate() right now to generate a sequence of socket connections from a sequence of deques.

That can't be done by Haskell's scanl1 because that language's strict static type system can't express the notion of "well, given a list-of-A, the accumulation function f has arguments of types A and A on the first call, but types type-of-f(A) and A on subsequent calls". By supplying an initial value of type B, `scanl`'s accumulation function returns type B and always has arguments of type B and A. The type system has no problem with that.
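A tiny sketch of that equivalence (illustration only, not from the thread); note the very first element passes through untouched, which is the one spot where the parallel with map() is inexact:

    from itertools import accumulate

    xs = [1, 2, 3, 4]
    f = lambda y: y * 10

    list(map(f, xs))                          # [10, 20, 30, 40]
    list(accumulate(xs, lambda x, y: f(y)))   # [1, 20, 30, 40] - accumulator argument ignored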
Also, I'm still not sure whether we would want code like that buried in an accumulate() call rather than as a regular for-loop where I can see the logic and trace through it with pdb.
De gustibus non est disputandum again ;-) These kinds of arguments belong in a style guide, wiki, tutorial, or nagging Stackoverflow answer. They really - to me - have nothing to do with the issue at hand. Again, nothing above depends in any way on whether or not accumulate grows an optional argument. All the things you see as "bad" are already not just possible, but easy.
As for scanl, I'm not sure what this code means without seeing some python equivalent.
    scanl :: (a -> b -> a) -> a -> [b] -> [a]
    scanl f q xs = q : (case xs of
                          []   -> []
                          x:xs -> scanl f (f q x) xs)

    scanl1 :: (a -> a -> a) -> [a] -> [a]
    scanl1 f (x:xs) = scanl f x xs
    scanl1 _ []     = []
My apologies for that - I assumed you had a reading knowledge of Haskell, and were just unaware of these specific functions. The details don't really matter. You can just take my word for it that `scanl1` is a type-safe-restricted form of Python's current `accumulate`, and that the more basic `scanl` is like what `accumulate` would be if an initial value were a _required_ argument.

BTW, Haskell has no native notion of optional (or default, or variable, or keyword) arguments. All functions are "curried": they have exactly one argument. So, e.g., while there's generally no harm in viewing the Haskell

    f x y

as being like the Python

    f(x, y)

it's _really_ more like

    functools.partial(f, x)(y)

In any case, "all functions have exactly one argument" is why, in Haskell, even the tiniest variation in functionality is usually implemented by creating a function with a new name for each variation, rather than pile on ever-growing sequences of required flag arguments.

I'll also note that `scanl` and `scanl1` are defined in the Standard Prelude, which is akin to being a Python builtin: they're part of the core language, always available. As such, every experienced Haskell programmer is aware of them.
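For reference, a rough Python rendering of those two Haskell functions (an editorial sketch, not code from the thread):

    def scanl(f, q, xs):
        # the initial value q is required, and it is the first value produced
        yield q
        for x in xs:
            q = f(q, x)
            yield q

    def scanl1(f, xs):
        # no initial value: the first element of xs plays that role,
        # which is how the current itertools.accumulate(xs, f) behaves
        it = iter(xs)
        try:
            q = next(it)
        except StopIteration:
            return
        yield from scanl(f, q, it)

    # list(scanl(lambda a, b: a + b, 0, [1, 2, 3]))  -> [0, 1, 3, 6]
    # list(scanl1(lambda a, b: a + b, [1, 2, 3]))    -> [1, 3, 6]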
and it would have been distracting to even have the option.
Distracting for how long? One second or two? ;-)
Possibly forever. In my experience, if a person initially frames a problem wrong (or perhaps in a hard to solve way), it can take them a long time to recover. For example with discounted cash flows, people who think of the initial value as being special or distinct from the other cash flows will have a hard time adapting to problem variants like annuity due, balloon payments, internal rate of return, coupon stripping, valuing a transaction that takes place in the future, etc.
I don't want to overstate the case, but I do think a function signature that offers a "first_value" option is an invitation to treat the first value as being distinct from the rest of the data stream.
For a _general_ accumulate, which we already have, _sometimes_ the first value really is distinct. Greg Ewing recently made that point eloquently, so I won't elaborate. But again, one example in the docs is all it takes:
    list(accumulate([1, 2, 3]))
    [1, 3, 6]
    list(accumulate([1, 2, 3], initial=10))
    [10, 11, 13, 16]
99% of programmers will see that, think "huh! why would I want initial=?" and never give it another thought. 0.001% of the remaining 1% will ask on Stackoverflow, and get an answer showing "advanced" uses for accumulate. The remaining 0.999% of the remaining 1% will eventually find one of those answers.
With a different background, you may just as well have been surprised if the question _hadn't_ come up. For example, this is a standard example in the Haskell world for how to define an infinite Fibonacci sequence with the initial two values f0 and f1:
fibs = f0 : scanl (+) f1 fibs
The part from `scanl` onward would be spelled in Python as
accumulate(fibs, initial=f1)
although it requires some trickery to get the recursive reference to work ...
Do we want the tool to encourage such trickery?
Don't get me wrong, I think it is cool that you could write such code, but could and should aren't always the same.
Sorry, the original point got lost in the weeds there: the point isn't that such code is desirable, it's just that Haskell programmers _are_ aware of scanl. and that one of its very-well-known toy uses exploits that it specifies an initial value. So _if_ you had that background too, it would be natural as death to wonder "huh - so how come Python's accumulate _doesn't_ have such an argument?".
... Haskell probably is a good source of inspiration, but I don't know the language and find its docs to be inscrutable.
That's what happens when a worldwide network of obsessed PhDs struggles for decades to push the state of the art ;-)
... That would be better on several counts to my eyes as:
inputs = repeat(None, 35) # no values actually used ... for x in accumulate(inputs, logistic_map, initial=x0)
In particular, filling inputs with `None` would lead to an exception if the programmer screwed up and the input values actually _were_ being used. I expect we'll both overlook that writing a generator using the obvious loop would be a lot clearer regardless ;-)
The winks may make reading your posts fun, but I really can't tell whether your position is, "yes, let's do this because someone can do wild things with it", or "no, let's not do this because people would commit atrocities with it".
I mean both, of course ;-) The world is absurd, and often all I can do is chuckle and accept that all options suck in their own endearing ways. Did you write those docs? If so, that's one of life's absurdities: half of the accumulate() docs are devoted to giving a weird example of an application that ignores the input sequence values entirely, taking from the input sequence _only_ its total length. I enjoyed the example, but I can't believe you'd _recommend_ it as good practice. It's just "a trick" you _can_ do, like (as above) you _can_ ignore the accumulator argument instead to make accumulate() work like map() instead.
and "Will anyone actually use it?".
As above, the docs could change to use it. And I bet the test suite too. How much more could you want from a feature?! ;-)
I'm concerned that the total number of actual users will be exactly two (you and the writer of the wheel-sieve) and that you each would have used it exactly once in your life. That's a pretty small user base for a standard library feature ;-)
You can't know that, though. It's not my only use, but it's the only use I wrote about because I had just done it the day before this thread started. I also have a prime-generating function inspired by Will Ness's. BTW, that's a beautiful thing in real life, but not primarily because of the wheel gimmicks: you don't have to specify an upper limit in advance (or ever), but it nevertheless consumes memory proportional to the number of primes less than the _square root_ of the largest prime you've generated so far. That's easy if you know the upper limit in advance, but not if you don't. Will's solution to _that_ part is clever, elegant, and eminently practical. In any case, Will is "a functional language" person, and didn't even know itertools existed. Janne Karila pointed it out to him, and Will was like a kid in a candy store. I expect (but don't know) _he_ wondered "huh - how come accumulate() doesn't have an initial value like scanl has?", but rather than gripe about it on a Python list created the "chain a singleton list" workaround instead. Regardless, I wouldn't be surprised a bit if Janne Karila also had similar code - or any number of other Python programmers we don't know about writing code we'll never see.
Tim, if you could muster an honest to goodness, real +1, that would be good enough for me.
On purely _technical_ grounds, given that accumulate() is already thoroughly general, I think adding an optional start value is not just a +1, but a no-brainer. I've still seen no technical argument against it, and several sound technical arguments in its favor have been made by several people. OTOH ... meh. There appears to be scant demand, and code churn also carries costs. I've already wormed around it, so it doesn't scratch any practical itch I still have. It's your job to worry about future generations ;-)
Otherwise, I'm back to -0 and prefer not to see Pythonistas writing the Haskell magics described in this thread.
Whether or not the argument is added is simply irrelevant to what's possible to do with accumulate() already - it just affects how transparently a special initial value can be supplied when one is desired. In all of the (few!) "real life" current uses you've seen, it would obviously make already-sane code simpler & clearer. Against that, worrying about masses of Pythonistas who _could_ make absurd (in Python) code a bit _less_ absurd just doesn't engage me.
If this does go forward, I would greatly prefer "start" rather than "first_value" or "initial".
Bike-shedding the name holds little interest for me. Against "start", Nick vehemently objects to that name. I _expected_ that you would too, because you've generally seen _no_ value to specifying an initial value for sums, and "start" is the name `sum()` gives to its optional starting value.

In favor of "initial", the current accumulate() _is_ a generalization not of sum() but of reduce(), and:

    Help on built-in function reduce in module _functools:

    reduce(...)
        reduce(function, sequence[, initial]) -> value

That is, the reduce docstring has used "initial" forever. And there's also that the itertoolz accumulate() already has the feature in question, and named its argument "initial".
The conversation has been enjoyable (perhaps because the stakes are so low) and educational (I learn something new every time you post).
Yup, I too prefer fighting to the death over things that don't matter ;-)
I'll leave this with a few random thoughts on itertools that don't seem to fit anywhere else.
1) When itertools was created, they were one of the easiest ways to get C-like performance without writing C. However, when PyPy matured we got other ways to do it. And in the world of PyPy, plain Python for-loops outperform their iterator chain equivalents, so we lost one motivation to use itertools.
I should try PyPy again. I got out of the habit years ago, when I kept running out of RAM. I have a lot more RAM now ;-)
2) While I personally like function chains operating on iterators, my consulting and teaching experience has convinced me that very few people think that way.
Matches my less extensive experience.
Accordingly, I almost never use compress, filterfalse, takewhile, dropwhile, etc.
I've never used any of those, except once each when they were new. I'm aware of them, but they just never seem to fit the problem at hand. So, just to keep things interesting, let's add an optional `start=` argument to _all_ itertools functions ;-)
As people started adopting PEP 289 generator expressions, interest in itertool-style thinking seems to have waned.
Putting these two together has left me with a preference for itertools to only cover the simplest and most common cases, leaving the rest to be expressed as plain, everyday pure python. (The combinatoric itertools are an exception because they are more algorithmically interesting).
Besides all the combinatoric ones, these are the ones I'd sorely miss:

    count
    cycle
    repeat
    accumulate
    chain[.from_iterable]
    islice
    tee

I'm surprised I left groupby() off that list! I always liked that one "in theory", but in practice - for whatever reasons - whenever I use it I usually end up rewriting the code, in such a way that groupby() no longer makes sense to use. Curious!
On Friday, April 6, 2018 at 9:03:05 PM UTC-4, Raymond Hettinger wrote:
On Friday, April 6, 2018 at 8:14:30 AM UTC-7, Guido van Rossum wrote: On Fri, Apr 6, 2018 at 7:47 AM, Peter O'Connor <peter.ed...@gmail.com> wrote:
So some more humble proposals would be:
1) An initializer to itertools.accumulate functools.reduce already has an initializer, I can't see any controversy to adding an initializer to itertools.accumulate
See if that's accepted in the bug tracker.
It did come-up once but was closed for a number reasons including lack of use cases. However, Peter's signal processing example does sound interesting, so we could re-open the discussion.
For those who want to think through the pluses and minuses, I've put together a Q&A as food for thought (see below). Everybody's design instincts are different -- I'm curious what you all think think about the proposal.
Raymond
---------------------------------------------
Q. Can it be done? A. Yes, it wouldn't be hard.
_sentinel = object()
def accumulate(iterable, func=operator.add, start=_sentinel): it = iter(iterable) if start is _sentinel: try: total = next(it) except StopIteration: return else: total = start yield total for element in it: total = func(total, element) yield total
Q. Do other languages do it? A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.
Isn't numpy a yes? https://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.... They definitely support it for add and multiply. It's defined, but doesn't seem to work on custom ufuncs (the result of frompyfunc).
* http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.h... * https://stat.ethz.ch/R-manual/R-devel/library/base/html/cumsum.html * http://microapl.com/apl/apl_concepts_chapter5.html \+ 1 2 3 4 5 1 3 6 10 15 * https://reference.wolfram.com/language/ref/Accumulate.html * https://www.haskell.org/hoogle/?hoogle=mapAccumL
Q. How much work for a person to do it currently? A. Almost zero effort to write a simple helper function:
myaccum = lambda it, func, start: accumulate(chain([start], it), func)
Q. How common is the need? A. Rare.
Q. Which would be better, a simple for-loop or a customized itertool? A. The itertool is shorter but more opaque (especially with respect to the argument order for the function call):
result = [start] for x in iterable: y = func(result[-1], x) result.append(y)
versus:
result = list(accumulate(iterable, func, start=start))
Q. How readable is the proposed code? A. Look at the following code and ask yourself what it does:
accumulate(range(4, 6), operator.mul, start=6)
Now test your understanding:
How many values are emitted? What is the first value emitted? Are the two sixes related? What is this code trying to accomplish?
Q. Are there potential surprises or oddities? A. Is it readily apparent which of assertions will succeed?
a1 = sum(range(10)) a2 = sum(range(10), 0) assert a1 == a2
a3 = functools.reduce(operator.add, range(10)) a4 = functools.reduce(operator.add, range(10), 0) assert a3 == a4
a4 = list(accumulate(range(10), operator.add)) a5 = list(accumulate(range(10), operator.add, start=0)) assert a5 == a6
Q. What did the Python 3.0 Whatsnew document have to say about reduce()? A. "Removed reduce(). Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."
Q. What would this look like in real code? A. We have almost no real-world examples, but here is one from a StackExchange post:
    def wsieve():       # wheel-sieve, by Will Ness.  ideone.com/mqO25A->0hIE89
        wh11 = [ 2,4,2,4,6,2,6,4,2,4,6,6, 2,6,4,2,6,4,6,8,4,2,4,2,
                 4,8,6,4,6,2,4,6,2,6,6,4, 2,4,6,2,6,4,2,4,2,10,2,10]
        cs = accumulate(cycle(wh11), start=11)
        yield( next( cs))       # cf. ideone.com/WFv4f
        ps = wsieve()           # codereview.stackexchange.com/q/92365/9064
        p = next(ps)            # 11
        psq = p*p               # 121
        D = dict( zip( accumulate(wh11, start=0), count(0)))    # start from
        sieve = {}
        for c in cs:
            if c in sieve:
                wheel = sieve.pop(c)
                for m in wheel:
                    if not m in sieve:
                        break
                sieve[m] = wheel    # sieve[143] = wheel@187
            elif c < psq:
                yield c
            else:       # (c==psq)
                # map (p*) (roll wh from p) = roll (wh*p) from (p*p)
                x = [p*d for d in wh11]
                i = D[ (p-11) % 210]
                wheel = accumulate(cycle(x[i:] + x[:i]), start=psq)
                p = next(ps) ; psq = p*p
                next(wheel) ; m = next(wheel)
                sieve[m] = wheel
Woo hoo! Another coincidence. I just happened to be playing with this problem today:

You have a large list - xs - of N numbers. It's necessary to compute slice sums sum(xs[i:j]) for a great many slices, 0 <= i <= j <= N. For concreteness, say xs is a time series representing toll booth receipts by hour across years. "Management" may ask for all sorts of sums - by 24-hour period, by week, by month, by year, by season, ...

A little thought showed that sum(xs[i:j]) = sum(xs[:j]) - sum(xs[:i]), so if we precomputed just the prefix sums, the sum across an arbitrary slice could be computed thereafter in constant time. Hard to beat that ;-)

But computing the prefix sums is exactly what accumulate() does! With one twist: while we have N numbers, there are N+1 slice indices. So accumulate(xs) doesn't quite work. It needs to also have a 0 inserted as the first prefix sum (the empty prefix sum(xs[:0])).

Which is exactly what a this_is_the_initial_value=0 argument would do for us.

As is, using the chain trick:

    class SliceSummer:

        def __init__(self, xs):
            from itertools import accumulate, chain
            self.N = N = len(xs)
            if not N:
                raise ValueError("need a non-empty sequence")
            self.prefixsum = list(accumulate(chain([0], xs)))
            assert len(self.prefixsum) == N+1

        def slicesum(self, i, j):
            N = self.N
            if not 0 <= i <= j <= N:
                raise ValueError(f"need 0 <= {i} <= {j} <= {N}")
            return self.prefixsum[j] - self.prefixsum[i]

    def test(N):
        from random import randrange
        xs = [randrange(-10, 11) for _ in range(N)]
        ntried = 0
        ss = SliceSummer(xs)
        NP1 = N + 1
        for i in range(NP1):
            for j in range(i, NP1):
                ntried += 1
                assert ss.slicesum(i, j) == sum(xs[i:j])
        assert ntried == N * NP1 // 2 + NP1, ntried
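A quick usage sketch (hypothetical numbers, not from the thread) showing the constant-time slice sums:

    ss = SliceSummer([5, 3, 8, 1])    # hypothetical hourly receipts
    print(ss.slicesum(1, 3))          # 11, i.e. 3 + 8, answered from two prefix sums
    print(ss.slicesum(0, 4))          # 17, the total across all four hours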
Peter O'Connor wrote:
The behaviour where the first element of the return is the same as the first element of the input can be weird and confusing. E.g. compare:
    list(itertools.accumulate([2, 3, 4], lambda accum, val: accum - val))
    [2, -1, -5]
    list(itertools.accumulate([2, 3, 4], lambda accum, val: val - accum))
    [2, 1, 3]
This is another symptom of the fact that the first item in the list is taken to be the initial value. There's no way to interpret these results in terms of an assumed initial value, because neither of those functions has a left identity. -- Greg
Tim Peters wrote:
while we have N numbers, there are N+1 slice indices. So accumulate(xs) doesn't quite work. It needs to also have a 0 inserted as the first prefix sum (the empty prefix sum(xs[:0]).
Which is exactly what a this_is_the_initial_value=0 argument would do for us.
In this case, yes. But that still doesn't mean it makes sense to require the initial value to be passed *in* as part of the input sequence.

Maybe the best idea is for the initial value to be a separate argument, but be returned as the first item in the list.

I can think of another example where this would make sense. Suppose you have an initial bank balance and a list of transactions, and you want to produce a statement with a list of running balances. The initial balance and the list of transactions are coming from different places, so the most natural way to call it would be

    result = accumulate(transactions, initial=initial_balance)

If the initial value is returned as item 0, then the result has the following properties:

    result[0] is the balance brought forward
    result[-1] is the current balance

and this remains true in the corner case where there are no transactions.

-- Greg
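A small sketch of that bank-statement example (illustration only; the initial= spelling is the argument under discussion, available in itertools.accumulate since Python 3.8):

    from itertools import accumulate

    initial_balance = 1000
    transactions = [-120, 450, -65]      # hypothetical statement data

    balances = list(accumulate(transactions, initial=initial_balance))
    # balances     == [1000, 880, 1330, 1265]
    # balances[0]  -> balance brought forward
    # balances[-1] -> current balance

    # the corner case with no transactions still yields the opening balance:
    assert list(accumulate([], initial=initial_balance)) == [1000]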
[Tim]
while we have N numbers, there are N+1 slice indices. So accumulate(xs) doesn't quite work. It needs to also have a 0 inserted as the first prefix sum (the empty prefix sum(xs[:0]).
Which is exactly what a this_is_the_initial_value=0 argument would do for us.
[Greg Ewing <greg.ewing@canterbury.ac.nz>]
In this case, yes. But that still doesn't mean it makes sense to require the initial value to be passed *in* as part of the input sequence.
Maybe the best idea is for the initial value to be a separate argument, but be returned as the first item in the list.
I'm not sure you've read all the messages in this thread, but that's exactly what's being proposed. That, e.g., a new optional argument:

    accumulate(xs, func, initial=S)

act like the current

    accumulate(chain([S], xs), func)

Note that in neither case is the original `xs` modified in any way, and in both cases the first value generated is S.

Note too that the proposal is exactly the way Haskell's `scanl` works (although `scanl` always requires specifying an initial value - while the related `scanl1` doesn't allow specifying one).

And that's all been so since the thread's first message, in which Raymond gave a proposed implementation:

    _sentinel = object()

    def accumulate(iterable, func=operator.add, start=_sentinel):
        it = iter(iterable)
        if start is _sentinel:
            try:
                total = next(it)
            except StopIteration:
                return
        else:
            total = start
        yield total
        for element in it:
            total = func(total, element)
            yield total
I can think of another example where this would make sense. Suppose you have an initial bank balance and a list of transactions, and you want to produce a statement with a list of running balances.
The initial balance and the list of transactions are coming from different places, so the most natural way to call it would be
result = accumulate(transactions, initial = initial_balance)
If the initial value is returned as item 0, then the result has the following properties:
result[0] is the balance brought forward result[-1] is the current balance
and this remains true in the corner case where there are no transactions.
Indeed, something quite similar often applies when parallelizing search loops of the form: for candidate in accumulate(chain([starting_value], cycle(deltas))): For a sequence that eventually becomes periodic in the sequence of deltas it cycles through, multiple processes can run independent searches starting at carefully chosen different starting values "far" apart. In effect, they're each a "balance brought forward" pretending that previous chunks have already been done. Funny: it's been weeks now since I wrote an accumulate() that _didn't_ want to specify a starting value - LOL ;-)
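A toy sketch of that loop shape (hypothetical deltas, not from the thread):

    from itertools import accumulate, chain, cycle, islice

    deltas = [2, 4]          # step pattern visiting numbers congruent to +-1 (mod 6)
    starting_value = 5
    candidates = accumulate(chain([starting_value], cycle(deltas)))
    print(list(islice(candidates, 8)))   # [5, 7, 11, 13, 17, 19, 23, 25]

    # a second worker can pretend earlier chunks are already done simply by
    # choosing a different "balance brought forward", e.g. starting_value = 6000005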
Ok, so it seems everyone's happy with adding an initial_value argument.

Now, I claim that while it should be an option, the initial value should NOT be returned by default. (i.e. the returned generator should by default yield N elements, not N+1.)

Example: suppose we're doing the toll booth thing, and we want to yield a cumulative sum of tolls so far. Suppose someone already made a reasonable-looking generator yielding the cumulative sum of tolls for today:

    def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
        return accumulate(get_tolls_from_day(day, initial=toll_amount_so_far))

And now we want to make a way to get all tolls from the month. One might reasonably expect this to work:

    def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
        for day in month:
            for cumsum_tolls in iter_cumsum_tolls_from_day(day, toll_amount_so_far=toll_amount_so_far):
                yield cumsum_tolls
            toll_amount_so_far = cumsum_tolls

But this would actually DUPLICATE the last toll of every day - it appears both as the last element of the day's generator and as the first element of the next day's generator.

This is why I think that there should be an additional "include_initial_in_return=False" argument. I do agree that it should be an option to include the initial value (your "find tolls over time-span" example shows why), but that if you want that you should have to show that you thought about it by specifying "include_initial_in_return=True".

On Mon, Apr 9, 2018 at 10:30 PM, Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
while we have N numbers, there are N+1 slice indices. So accumulate(xs) doesn't quite work. It needs to also have a 0 inserted as the first prefix sum (the empty prefix sum(xs[:0]).
Which is exactly what a this_is_the_initial_value=0 argument would do for us.
[Greg Ewing <greg.ewing@canterbury.ac.nz>]
In this case, yes. But that still doesn't mean it makes sense to require the initial value to be passed *in* as part of the input sequence.
Maybe the best idea is for the initial value to be a separate argument, but be returned as the first item in the list.
I'm not sure you've read all the messages in this thread, but that's exactly what's being proposed. That. e.g., a new optional argument:
accumulate(xs, func, initial=S)
act like the current
accumulate(chain([S], xs), func)
Note that in neither case is the original `xs` modified in any way, and in both cases the first value generated is S.
Note too that the proposal is exactly the way Haskell's `scanl` works (although `scanl` always requires specifying an initial value - while the related `scanl1` doesn't allow specifying one).
And that's all been so since the thread's first message, in which Raymond gave a proposed implementation:
_sentinel = object()
def accumulate(iterable, func=operator.add, start=_sentinel): it = iter(iterable) if start is _sentinel: try: total = next(it) except StopIteration: return else: total = start yield total for element in it: total = func(total, element) yield total
I can think of another example where this would make sense. Suppose you have an initial bank balance and a list of transactions, and you want to produce a statement with a list of running balances.
The initial balance and the list of transactions are coming from different places, so the most natural way to call it would be
result = accumulate(transactions, initial = initial_balance)
If the initial value is returned as item 0, then the result has the following properties:
result[0] is the balance brought forward result[-1] is the current balance
and this remains true in the corner case where there are no transactions.
Indeed, something quite similar often applies when parallelizing search loops of the form:
for candidate in accumulate(chain([starting_value], cycle(deltas))):
For a sequence that eventually becomes periodic in the sequence of deltas it cycles through, multiple processes can run independent searches starting at carefully chosen different starting values "far" apart. In effect, they're each a "balance brought forward" pretending that previous chunks have already been done.
Funny: it's been weeks now since I wrote an accumulate() that _didn't_ want to specify a starting value - LOL ;-)
* correction to brackets from first example:

    def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
        return accumulate(get_tolls_from_day(day), initial=toll_amount_so_far)

On Mon, Apr 9, 2018 at 11:55 PM, Peter O'Connor <peter.ed.oconnor@gmail.com> wrote:
Ok, so it seems everyone's happy with adding an initial_value argument.
Now, I claim that while it should be an option, the initial value should NOT be returned by default. (i.e. the returned generator should by default yield N elements, not N+1).
Example: suppose we're doing the toll booth thing, and we want to yield a cumulative sum of tolls so far. Suppose someone already made a reasonable-looking generator yielding the cumulative sum of tolls for today:
    def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
        return accumulate(get_tolls_from_day(day, initial=toll_amount_so_far))
And now we want to make a way to get all tolls from the month. One might reasonably expect this to work:
    def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
        for day in month:
            for cumsum_tolls in iter_cumsum_tolls_from_day(day, toll_amount_so_far=toll_amount_so_far):
                yield cumsum_tolls
            toll_amount_so_far = cumsum_tolls
But this would actually DUPLICATE the last toll of every day - it appears both as the last element of the day's generator and as the first element of the next day's generator.
This is why I think that there should be an additional " include_initial_in_return=False" argument. I do agree that it should be an option to include the initial value (your "find tolls over time-span" example shows why), but that if you want that you should have to show that you thought about that by specifying "include_initial_in_return=True"
On Mon, Apr 9, 2018 at 10:30 PM, Tim Peters <tim.peters@gmail.com> wrote:
[Tim]
while we have N numbers, there are N+1 slice indices. So accumulate(xs) doesn't quite work. It needs to also have a 0 inserted as the first prefix sum (the empty prefix sum(xs[:0]).
Which is exactly what a this_is_the_initial_value=0 argument would do for us.
[Greg Ewing <greg.ewing@canterbury.ac.nz>]
In this case, yes. But that still doesn't mean it makes sense to require the initial value to be passed *in* as part of the input sequence.
Maybe the best idea is for the initial value to be a separate argument, but be returned as the first item in the list.
I'm not sure you've read all the messages in this thread, but that's exactly what's being proposed. That. e.g., a new optional argument:
accumulate(xs, func, initial=S)
act like the current
accumulate(chain([S], xs), func)
Note that in neither case is the original `xs` modified in any way, and in both cases the first value generated is S.
Note too that the proposal is exactly the way Haskell's `scanl` works (although `scanl` always requires specifying an initial value - while the related `scanl1` doesn't allow specifying one).
And that's all been so since the thread's first message, in which Raymond gave a proposed implementation:
_sentinel = object()
def accumulate(iterable, func=operator.add, start=_sentinel): it = iter(iterable) if start is _sentinel: try: total = next(it) except StopIteration: return else: total = start yield total for element in it: total = func(total, element) yield total
I can think of another example where this would make sense. Suppose you have an initial bank balance and a list of transactions, and you want to produce a statement with a list of running balances.
The initial balance and the list of transactions are coming from different places, so the most natural way to call it would be
result = accumulate(transactions, initial = initial_balance)
If the initial value is returned as item 0, then the result has the following properties:
result[0] is the balance brought forward result[-1] is the current balance
and this remains true in the corner case where there are no transactions.
Indeed, something quite similar often applies when parallelizing search loops of the form:
for candidate in accumulate(chain([starting_value], cycle(deltas))):
For a sequence that eventually becomes periodic in the sequence of deltas it cycles through, multiple processes can run independent searches starting at carefully chosen different starting values "far" apart. In effect, they're each a "balance brought forward" pretending that previous chunks have already been done.
Funny: it's been weeks now since I wrote an accumulate() that _didn't_ want to specify a starting value - LOL ;-)
[Peter O'Connor <peter.ed.oconnor@gmail.com>]
Ok, so it seems everyone's happy with adding an initial_value argument.
Heh - that's not clear to me ;-)
Now, I claim that while it should be an option, the initial value should NOT be returned by default. (i.e. the returned generator should by default yield N elements, not N+1).
-1 on that.

- It goes against prior art. Haskell's scanl does return the initial value, and nobody on the planet has devoted more quality thought to how streams "should work" than those folks. The Python itertoolz package's accumulate already supports an optional `initial=` argument, and also returns it when specified. It requires truly compelling arguments to go against prior art.

- It's "obvious" that the first value should be returned if specified. The best evidence: in this thread's first message, it was "so obvious" to Raymond that the implementation he suggested did exactly that. I doubt it even occurred to him to question whether it should. It didn't occur to me either, but my mind is arguably "polluted" by significant prior exposure to functional languages.

- In all but one "real life" example so far (including the slice-summer class I stumbled into today), the code _wanted_ the initial value to be returned. The sole exception was one of the three instances in Will Ness's wheel sieve code, where he discarded the unwanted (in that specific case) initial value via a plain

    next(wheel)

Which is telling: it's easy to discard a value you don't want, but to inject a value you _do_ want but don't get requires something like reintroducing the

    chain([value_i_want], the_iterable_that_didn't_give_the_value_i_want)

trick the new optional argument is trying to get _away_ from. Talk about ironic ;-)

I would like to see a simple thing added to itertools to make dropping unwanted values easier, though:

    drop(iterable, n=None)
        Return an iterator whose next() method returns all but the first
        `n` values from the iterable. If specified, `n` must be an integer
        >= 0. By default (`n`=None), the iterator is run to exhaustion.

Then, e.g.,

- drop(it, 0) would effectively be a synonym for iter(it).

- drop(it, 1) would skip over the first value from the iterable.

- drop(it) would give "the one obvious way" to consume an iterator completely (for some reason that's a semi-FAQ, and is usually answered by suggesting the excruciatingly obscure trick of feeding the iterable to a 0-size collections.deque constructor).

Of course Haskell has had `drop` all along, although not the "run to exhaustion" part.
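A sketch of how such a drop() helper might be written (editorial illustration built on the itertools "consume" recipe, not code from the thread):

    from collections import deque
    from itertools import islice

    def drop(iterable, n=None):
        # skip the first n values of the iterable; n=None runs it to exhaustion
        it = iter(iterable)
        if n is None:
            deque(it, maxlen=0)              # the obscure "consume everything" idiom, hidden away
        else:
            next(islice(it, n, n), None)     # advance exactly n steps
        return it

    # list(drop(range(5), 2))  ->  [2, 3, 4]
    # list(drop(range(5)))     ->  []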
Example: suppose we're doing the toll booth thing, and we want to yield a cumulative sum of tolls so far. Suppose someone already made a reasonable-looking generator yielding the cumulative sum of tolls for today:
    def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
        return accumulate(get_tolls_from_day(day), initial=toll_amount_so_far)
And now we want to make a way to get all tolls from the month. One might reasonably expect this to work:
    def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
        for day in month:
            for cumsum_tolls in iter_cumsum_tolls_from_day(day, toll_amount_so_far=toll_amount_so_far):
                yield cumsum_tolls
            toll_amount_so_far = cumsum_tolls
But this would actually DUPLICATE the last toll of every day - it appears both as the last element of the day's generator and as the first element of the next day's generator.
I didn't really follow the details there, but the suggestion would be the same regardless: drop the duplicates you don't want. Note that making up an example in your head isn't nearly as persuasive as "real life" code. Code can be _contrived_ to "prove" anything.
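For what it's worth, a sketch of how "drop the duplicates you don't want" might look in the made-up toll example above (assuming each day's stream starts by yielding the carried-over balance):

    from itertools import islice

    def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
        first_day = True
        for day in month:
            day_sums = iter_cumsum_tolls_from_day(day, toll_amount_so_far=toll_amount_so_far)
            if not first_day:
                # the first value duplicates yesterday's closing balance - drop it
                day_sums = islice(day_sums, 1, None)
            for cumsum_tolls in day_sums:
                yield cumsum_tolls
            toll_amount_so_far = cumsum_tolls
            first_day = False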
This is why I think that there should be an additional "include_initial_in_return=False" argument. I do agree that it should be an option to include the initial value (your "find tolls over time-span" example shows why), but that if you want that you should have to show that you thought about that by specifying "include_initial_in_return=True"
It's generally poor human design to have a second optional argument modify the behavior of yet another optional argument. If the presence of the latter can have two distinct modes of operation, then people _will_ forget which one the default mode is, making code harder to write and harder to read. Since "return the value" is supported by all known prior art, and by the bulk of "real life" Python codes known so far, "return the value" should be the default. But far better to make it the only mode rather than _just_ the default mode. Then there's nothing to be forgotten :-)
Tim Peters writes:
"Sum reduction" and "running-sum accumulation" are primitives in many peoples' brains.
I wonder what Kahneman would say about that. He goes to some length to explain that people are quite good (as human abilities go) at perceiving averages over sets but terrible at summing the same. Maybe they substitute the abstraction of summation for the ability to perform the operation? Steve
On Mon, Apr 9, 2018 at 11:35 PM, Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Tim Peters writes:
"Sum reduction" and "running-sum accumulation" are primitives in many peoples' brains.
I wonder what Kahneman would say about that. He goes to some length to explain that people are quite good (as human abilities go) at perceiving averages over sets but terrible at summing the same. Maybe they substitute the abstraction of summation for the ability to perform the operation?
[OT] How is that human ability tested? I am a visual learner and I would propose that if you have a set of numbers, you can graph it in different ways to make it easier to perceive one or the other (or maybe both): - to emphasize the average, draw a line graph -- in my mind I draw a line through the average (getting the trend for free) - to emphasize the sum, draw a histogram -- in my mind I add up the sizes of the bars -- --Guido van Rossum (python.org/~guido)
[Tim]
Woo hoo! Another coincidence. I just happened to be playing with this problem today:
You have a large list - xs - of N numbers. It's necessary to compute slice sums
sum(xs[i:j])
for a great many slices, 0 <= i <= j <= N.
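For context, the standard trick for that is prefix sums: with a leading zero, sum(xs[i:j]) == prefix[j] - prefix[i]. A minimal sketch along those lines (assuming that's the sort of "slice-summer class" meant; the chain trick stands in for the missing initial value):

    from itertools import accumulate, chain

    class SliceSummer:
        """Precompute prefix sums so any sum(xs[i:j]) costs O(1)."""
        def __init__(self, xs):
            # prefix[k] == sum(xs[:k]); the leading 0 is the "initial value"
            self.prefix = list(accumulate(chain([0], xs)))

        def slicesum(self, i, j):
            return self.prefix[j] - self.prefix[i]

    xs = [3, 1, 4, 1, 5, 9]
    ss = SliceSummer(xs)
    assert ss.slicesum(2, 5) == sum(xs[2:5])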
Which brought to mind a different problem: we have a list of numbers, `xs`. For each index position `i`, we want to know the largest sum among all segments ending at xs[i], and the number of elements in a maximal-sum slice ending at xs[i].

`accumulate()` is a natural way to approach this, for someone with a functional language background. You'll have to trust me on that ;-) But there are some twists:

- The identity element for max() is minus infinity, which accumulate() can't know.
- We want to generate a sequence of 2-tuples, despite that xs is a sequence of numbers.
- In this case, we do _not_ want to see the special initial value.

For example, given the input xs

    [-10, 3, -1, 7, -9, -7, -9, 7, 4]

we want to generate

    (-10, 1), (3, 1), (2, 2), (9, 3), (0, 4), (-7, 1), (-9, 1), (7, 1), (11, 2)

Note: the largest sum across all non-empty slices is then max(that_result)[0]. The code below could be easily changed to keep track of that incrementally too, but this is already so different from "plain old running sum" that I don't want more novelty than necessary to make the points (a special initial value is needed, and it's not necessarily insane to want to produce results of a different type than the inputs).

The key part is the state updating function:

    def update(state, x):
        prevmax, count = state
        newsum = prevmax + x
        if newsum > x:
            return newsum, count + 1
        else:
            return x, 1

That's all there is to it! Then, e.g.,
    >>> from itertools import accumulate, chain
    >>> import math
    >>> xs = [-10, 3, -1, 7, -9, -7, -9, 7, 4]
    >>> initial = (-math.inf, 1)
    >>> result = accumulate(chain([initial], xs), update)
    >>> next(result)  # discard unwanted value
    (-inf, 1)
    >>> list(result)
    [(-10, 1), (3, 1), (2, 2), (9, 3), (0, 4), (-7, 1), (-9, 1), (7, 1), (11, 2)]
[Raymond Hettinger <raymond.hettinger@gmail.com>]
Q. Do other languages do it? A. Numpy, no. R, no. APL, no. Mathematica, no. Haskell, yes.
* http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.accumulate.h...
* https://stat.ethz.ch/R-manual/R-devel/library/base/html/cumsum.html
* http://microapl.com/apl/apl_concepts_chapter5.html
      \+ 1 2 3 4 5
    1 3 6 10 15
* https://reference.wolfram.com/language/ref/Accumulate.html
* https://www.haskell.org/hoogle/?hoogle=mapAccumL
There's also C++, which is pretty much "yes" to every variation discussed so far:

* partial_sum() is like Python's current accumulate(), including defaulting to doing addition. http://en.cppreference.com/w/cpp/algorithm/partial_sum
* inclusive_scan() is also like accumulate(), but allows an optional "init" argument (which is returned if specified), and there's no guarantee of "left-to-right" evaluation (it's intended for associative binary functions, and wants to allow parallelism in the implementation). http://en.cppreference.com/w/cpp/algorithm/inclusive_scan
* exclusive_scan() is like inclusive_scan(), but _requires_ an "init" argument (which is not returned). http://en.cppreference.com/w/cpp/algorithm/exclusive_scan
* accumulate() is like Python's functools.reduce(), but the operation is optional and defaults to addition, and an "init" argument is required. http://en.cppreference.com/w/cpp/algorithm/accumulate
Since this started, I've been using variants of the following in my own code, wrapping `accumulate`:

    from itertools import accumulate, chain, cycle, islice

    def accum(iterable, func=None, *, initial=None, skipfirst=False):
        if initial is not None:
            iterable = chain([initial], iterable)
        if func is None:
            iterable = accumulate(iterable)
        else:
            iterable = accumulate(iterable, func)
        if skipfirst:
            iterable = islice(iterable, 1, None)
        return iterable

I quite like it! It handles everything that's come up pretty naturally.

A difference from what was discussed before: I opposed having a `skipfirst` argument because I didn't want an optional argument that _only_ applied if another optional argument was specified. But `skipfirst` in the above applies regardless of whether `initial` is specified. It turned out to be useful in cases where the _effect_ of `initial` had been gotten instead via slamming the initial value into the first object returned by `iterable`, and then ignored later by a special first case to throw away the first result `accumulate` returned.

I'm writing this now because it turned out I used it eight times yesterday, which raised it above background noise level ;-)

The first use captures a sequence countless thousands of Python programmers have crafted in various ways by hand:

    def firstfactor(n, limit=1000):
        for p in chain([2, 3], accum(cycle([2, 4]), initial=5)):
            # 2, 3, and integers not divisible by 2 or 3
            # 2, 3, 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, ...
            if p*p > n:
                return n
            if p > limit:
                return 0
            if n % p == 0:
                return p

The same sequence can be gotten more obscurely via:

    for p in chain([2, 3], accum(cycle([4, 2]), initial=1, skipfirst=True)):

The remaining 7 uses were in transcribing Lehman's elegant[1] optimization of Fermat's "difference of squares" factorization method:

    def large(deltas, init):
        for k in accum(cycle(deltas), initial=init):
            ...

    # ints divisible by 30
    large([30], 30)

    # ints divisible by 24 but not by 30
    large([24, 24, 24, 48], 24)

    # ints divisible by 12 but not by 30 or 24
    large([24, 48, 24, 24], 12)

    # ints divisible by 18 but not by 30, 24, or 12
    large([36, 72, 36, 36], 18)

    # ints divisible by 6 but not by 30, 24, 12, or 18
    large([36, 24, 12, 24, 12, 24, 36, 12], 6)

    # ints divisible by 2 but not by 30, 24, 12, 18, or 6
    large([2, 4], 2)

    # ints not divisible by 30, 24, 12, 18, 6, or 2
    large([2], 1)

Lehman built his sequence generator "by hand" in Algol, where the delta array (`c` in his code) is inherited from `large`'s enclosing scope, and `large()` is passed just the number of deltas and the initial value. A more literal transcription of his code would have used:

    def large(deltas, init):
        for k in accum(cycle(deltas), initial=init, skipfirst=True):
            ...

instead, with the delta lists rotated and with a different initial value. For example, instead of

    large([24, 24, 24, 48], 24)

above, his code uses

    large([48, 24, 24, 24], -24)

The latter needs to ignore the initial (-24) value (except to add it to the first delta: -24 + 48 = 24). That was more natural in context, due to the way he advanced the generator in a `while True:` loop built out of gotos ;-)

[1] https://math.boisestate.edu/~liljanab/Cryptology1Fall2013/LehmanFactoring.pd...

Historical gloss: Lehman surprisingly transformed Fermat's O(n) method into an O(cube_root(n)) method. I believe that was the first deterministic factoring method known beating trial division's O(sqrt(n)) time.
Alas for Lehman, Pollard very soon after published his `rho` method, which runs in probabilistic O(sqrt(p)) time where p is n's smallest prime factor, so in worst-case probabilistic O(fourth_root(n)) time. That relegated Lehman's method to an academic curiosity, although it remains supremely effective if N is the product of two odd primes whose ratio is close to a fraction with "small" numerator and denominator.
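Returning to the `accum` wrapper above, here's a quick illustration of its four modes (expected outputs shown as comments):

    list(accum([1, 2, 3]))                              # [1, 3, 6]
    list(accum([1, 2, 3], initial=10))                  # [10, 11, 13, 16]
    list(accum([1, 2, 3], initial=10, skipfirst=True))  # [11, 13, 16]
    list(accum([1, 2, 3], skipfirst=True))              # [3, 6]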
On Mon, Jul 8, 2019 at 3:35 PM john alexa <alexajohn05@gmail.com> wrote:
[spam removed]
This is pure spam - are list admins able to delete it from the archive? ChrisA
[spam removed]
[spam removed]
On Fri, Oct 4, 2019 at 7:41 AM <trekha89@gmail.com> wrote:
[spam removed]
Why is this 18-month-old thread in particular attracting spammers? The last four messages to this thread have been three spam links and one complaint about spam. Is there any reason for this thread to be bumped in the future with legitimate discussion posts, and if not, would it make sense for the list admins to just shut this thread down, and is there a way to do that (e.g. by blacklisting or auto-rejecting future messages to the thread)?
[spam removed]
On Sat, Apr 7, 2018 at 4:48 AM Paul Moore <p.f.moore@gmail.com> wrote:
On 7 April 2018 at 08:44, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Agreed that the "chain([x], it)" step is obscure. That's a bit of a bummer -- one of the goals for the itertools module was to be a generic toolkit for chopping-up, modifying, and splicing iterator streams (sort of a CRISPR for iterators). The docs probably need another recipe to show this pattern:

    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)
Thanks for taking a look at the proposal. I was -0 when it came up once
before. Once I saw a use case pop-up on this list, I thought it might be worth discussing again.
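As a quick illustration (not from the original message), the recipe above recovers the effect of a start value in the earlier multiplication example:

    from itertools import accumulate, chain
    import operator

    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)

    list(accumulate(prepend(6, range(4, 6)), operator.mul))  # [6, 24, 120]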
I don't have much to add here - I typically agree that an explicit loop is simpler, but my code tends not to be the sort that does this type of operation, so my experience is either where it's not appropriate, or where I'm unfamiliar with the algorithms, so terseness is more of a problem to me than it would be to a domain expert.
Having said that, I find that the arguments that it's easy to add and it broadens the applicability of the function to be significant. Certainly, writing a helper is simple, but as Tim pointed out, the trick to writing that helper is obscure. Also, in the light of the itertools design goal to be a toolkit for iterators, I often find that the tools are just slightly *too* low level for my use case - they are designed to be combined, certainly, but in practice I find that building my own loop is often quicker than working out how to combine them. (I don't have concrete examples, unfortunately - this feeling comes from working back from the question of why I don't use itertools more than I do). So I tend to favour such slight extensions to the use cases of itertools functions.
A recipe would help, but I don't know how much use the recipes see in practice. I see a lot of questions where "there's a recipe for that" is the answer - indicating that people don't always spot the recipes.
Part of the problem with the recipes is, as far as I am aware, the license. The recipes appear to be under the Python-2.0 license, which complicates the licensing of any project you use them in that isn't already under that license.
On Wed, Oct 23, 2019 at 7:03 AM Todd <toddrjen@gmail.com> wrote:
Part of the problem with the recipes is, as far as I am aware, the license. The recipes appear to be under the Python-2.0 license, which complicates the licensing of any project you use them in that isn't already under that license.
That can be solved. We could explicitly license the recipes in the docs under a simpler license. Please start a new thread, as this one has attracted too much spam. -- --Guido van Rossum (python.org/~guido)
participants (17)
- Chris Angelico
- dparul000@gmail.com
- Greg Ewing
- Guido van Rossum
- john alexa
- Jonathan Goble
- Kirill Balunov
- Neil Girdhar
- Nick Coghlan
- Paul Moore
- Peter O'Connor
- Raymond Hettinger
- Stephen J. Turnbull
- straic009@gmail.com
- Tim Peters
- Todd
- trekha89@gmail.com