[Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]
Tim Peters
tim.peters at gmail.com
Sat Apr 7 18:09:36 EDT 2018
...
[Tim]
>> Later:
>>
>> def coll(SHIFT=24):
>>     ...
>>     from itertools import accumulate, chain, cycle
>>     ...
>>     LIMIT = 1 << SHIFT
>>     ...
>>     abc, first, deltas = buildtab(SHIFT, LIMIT)
>>     ...
>>     for num in accumulate(chain([first], cycle(deltas))):
>>         assert num % 3 != 2
>>
>> As in Will's code, it would be more readable as:
>>
>> for num in accumulate(cycle(deltas), start=first):
[Raymond]
> That does read better. I am curious how you would have
> written it as a plain for-loop before accumulate() was added.
The original loop was quite different, a nested loop pair reflecting
directly that candidates are of the form i * LIMIT + j for i >= 1 and
j in goodix:
    for base in itertools.count(LIMIT, LIMIT):
        for ix in goodix:
            num = base + ix
            if num % 3 == 2:
                continue
It was later I noticed that, across every 3 full iterations of the
outer loop, exactly one third of the "num % 3 == 2" tests were true.
It took some thought & a bit of proof to show that all and only the
num % 3 != 2 candidates could be generated directly by the shorter
code. BTW,
count(LIMIT, LIMIT)
is a bit of a head-scratcher itself ;-)
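For what it's worth, the head-scratcher just enumerates the positive
multiples of LIMIT - count(start, step) with the same value for both
arguments. A sketch with a toy LIMIT (the real one is 1 << SHIFT):

```python
from itertools import count, islice

# Illustrative only: a small LIMIT stands in for 1 << SHIFT.
LIMIT = 8

# count(LIMIT, LIMIT) starts at LIMIT and steps by LIMIT, so it yields
# LIMIT, 2*LIMIT, 3*LIMIT, ... -- every positive multiple of LIMIT.
bases = list(islice(count(LIMIT, LIMIT), 5))
print(bases)  # [8, 16, 24, 32, 40]
```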
Without `accumulate()`, I suppose I would have done this instead:
    num = first
    for delta in chain([0], cycle(deltas)):
        num += delta
That's worse to my eyes! The `chain()` trick is still needed, but in
this case to inject a 0 delta at the start so that `num` remains
`first` across the first iteration.
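A toy comparison (hypothetical small values for `first` and `deltas`;
the real ones come from buildtab()) shows the two spellings walk in
lockstep:

```python
from itertools import accumulate, chain, cycle, islice

# Hypothetical stand-ins for buildtab()'s outputs.
first, deltas = 5, [4, 2, 4]
N = 7

# The accumulate() spelling: splice first in front of the delta stream.
via_accumulate = list(islice(accumulate(chain([first], cycle(deltas))), N))

# The manual loop: a 0 delta keeps num == first on the first iteration.
via_loop = []
num = first
for delta in islice(chain([0], cycle(deltas)), N):
    num += delta
    via_loop.append(num)

print(via_accumulate == via_loop)  # True
```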
I should note that this is "a search loop" that rarely finds what it's
looking for. There are several places in the body that give up on the
current `num` and want to move on to the next candidate. So it's of
great pragmatic value that it be written in a way such that a plain
`continue` in the body does the right thing. For that reason, I would
_not_ have written it as, e.g.,
    num = first
    for delta in cycle(deltas):
        # masses of tests that may want to give up
        # early, excruciatingly nested so that "give up"
        # falls through to the end
        ...
        num += delta
> (part of the argument against reduce() was that a plain
> for-loop would be clearer 99% of the time).
Except that isn't true: 99% of `reduce()` instances were replaced by
`sum()` when the latter was introduced :-) "Sum reduction" and
"running-sum accumulation" are primitives in many peoples' brains. In
generalizing those to other dyadic operations, it's the abstraction
itself that's responsible for the loss of clarity - now you're
building a higher-order functional that's not a primitive in anyone's
brain except for Ken Iverson and Kirby Urner ;-)
The rest of us are better off seeing the moving pieces in a loop body.
But that's simply not so for addition, which is why introducing
`sum()` was a great idea. BTW, note that `sum()` also supports an
optional `start=` argument. I expect (but don't know) that
`accumulate()` is overwhelmingly used to do running sums (the only use
I've had for it), so it's a bit odd on that count that it doesn't.
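A minimal sketch of the parallel, using chain() to splice in the start
value that accumulate() lacks a keyword for:

```python
from itertools import accumulate, chain

# sum() takes an optional start value directly:
total = sum([10, 20, 30], 100)
print(total)  # 160

# accumulate() has no such argument, so the starting value has to be
# spliced into the stream itself:
running = list(accumulate(chain([100], [10, 20, 30])))
print(running)  # [100, 110, 130, 160]
```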
> ...
> Agreed that the "chain([x], it)" step is obscure. That's a bit of a bummer --
> one of the goals for the itertools module was to be a generic toolkit for
> chopping-up, modifying, and splicing iterator streams (sort of a CRISPR
> for iterators).
I'd say it was overwhelmingly successful at that goal. The rub here
appears to be that `x` on its own is not a stream - it has to be
wrapped inside an iterable first to play nice with stream-based tools.
In a stream-based language like Haskell, there's usually a "cons"
operation built in to prepend a scalar to a stream; Haskell's `x : it`
is pretty much the same as `chain([x], it)`.
> The docs probably need another recipe to show this pattern:
>
> def prepend(value, iterator):
>     "prepend(1, [2, 3, 4]) -> 1 2 3 4"
>     return chain([value], iterator)
+1. Whether `accumulate()` should grow a `start=` argument still
seems a distinct (albeit related) issue to me, though.
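For concreteness, here's the proposed recipe dropped into the
running-sum pattern from the loop above (toy values for `first` and
`deltas` again):

```python
from itertools import accumulate, chain, cycle, islice

def prepend(value, iterator):
    "prepend(1, [2, 3, 4]) -> 1 2 3 4"
    return chain([value], iterator)

# Hypothetical stand-ins for buildtab()'s outputs.
first, deltas = 5, [4, 2, 4]

# The recipe reads like cons and makes the starting value explicit:
nums = accumulate(prepend(first, cycle(deltas)))
result = list(islice(nums, 5))
print(result)  # [5, 9, 11, 15, 19]
```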